Re: clarification regarding Tez DAGs

2016-11-28 Thread Hitesh Shah
Hello Robert, 

Some of the questions may be better answered on the Hive list but I will take a 
first crack at some of them. 

From a Tez perspective, let's use vertices and ignore Maps and Reduces for now. 
Hive uses the Map/Reduce naming only as a convention, to indicate that a vertex 
is either reading data from HDFS (map) or has an inbound shuffle edge (reduce).

For a given vertex, each task in the vertex is composed of a set of inputs, a 
processor and a set of outputs. 

The key-value constructs are defined by the kind of Input and Output being 
used. Today, pretty much all I/Os are key-value based.

The edge types define how data is transferred, but they do not completely 
control how data is manipulated before being sent across the edge. A lot of 
that is defined within the Inputs and Outputs. To clarify, a broadcast edge 
implies that a task from an upstream vertex will send all of its output to all 
tasks in the downstream vertex. However, a broadcast edge does not imply 
whether the data is sorted or unsorted. Likewise for the scatter-gather edge: 
it allows each task in an upstream vertex to generate partitioned data that can 
be distributed to all downstream tasks. This can be used to mimic the MR 
shuffle by having the Output in the upstream vertex generate partitioned and 
sorted data that is sent to a downstream Input, which does a merge+sort over 
all relevant partitions that it needs from all upstream tasks. It also allows 
plugging in a shuffle-like edge implementation that does not sort data but only 
partitions it ( or groups it ).
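To make the "partitions but does not sort" idea concrete, here is a minimal plain-Java sketch of the partitioning step an upstream Output would perform on a scatter-gather edge. The class and method names are made up for illustration; Tez's real Outputs do this inside the runtime library, not in user code.

```java
import java.util.*;

// Sketch of sort-free partitioning on a scatter-gather edge: each record is
// assigned to one downstream partition by key hash (as in MR's default
// HashPartitioner), with no sorting within a partition.
public class HashPartitionSketch {
    // Non-negative hash modulo the number of downstream tasks.
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Bucket a task's output keys by target partition.
    static Map<Integer, List<String>> partition(List<String> keys, int numPartitions) {
        Map<Integer, List<String>> byPartition = new HashMap<>();
        for (String k : keys) {
            byPartition.computeIfAbsent(partitionFor(k, numPartitions),
                                        p -> new ArrayList<>()).add(k);
        }
        return byPartition;
    }

    public static void main(String[] args) {
        // Two downstream tasks; each key lands deterministically in one bucket.
        System.out.println(partition(Arrays.asList("a", "b", "a", "c"), 2));
    }
}
```

A sorted shuffle would add a sort step per bucket before the data is fetched by the downstream Input; this sketch deliberately omits it.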

To answer your questions: 

>>> for (2) and (3)

Yes. The processor can generate a different key-value pair if it wants to. A 
simple use of an MRR chain would be a case where you want to do a group by X 
followed by an order by Y. It can be done in some form via a 2-stage DAG, but a 
simpler model would be a 3-stage DAG where stage 2 does the grouping and 
stage 3 the order by.
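A plain-Java sketch of those two stages, with streams standing in for the two reduce vertices (group by word, then order by count); this is illustrative only and not Tez API code:

```java
import java.util.*;
import java.util.stream.*;

// The "group by X followed by order by Y" pattern behind an M-R-R chain:
// stage 2 aggregates per key, stage 3 re-keys by the aggregate and sorts.
public class GroupThenOrder {
    // Stage 2: group by word, producing (word, count).
    static Map<String, Long> groupByWord(List<String> words) {
        return words.stream()
                    .collect(Collectors.groupingBy(w -> w, Collectors.counting()));
    }

    // Stage 3: order the grouped output by count, descending.
    static List<Map.Entry<String, Long>> orderByCount(Map<String, Long> counts) {
        return counts.entrySet().stream()
                     .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> out =
                orderByCount(groupByWord(Arrays.asList("a", "b", "a", "a", "b", "c")));
        System.out.println(out); // most frequent word first
    }
}
```

Note how stage 3's sort key (the count) did not exist until stage 2 produced it, which is exactly why the processor must be free to emit different keys than it received.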

>>> for (4) and (5)

I am not sure I understand the question. Could you clarify what M2 expects in 
terms of its input? If you combined the logic of M1 and M2 into a single task, 
would that retain the behavior that you want? If a reduce stage and a map stage 
in the middle of a DAG are both expecting an inbound shuffled input, then there 
is no difference between them except for their logical names. 

Feel free to send more questions to the list to get more clarifications.

thanks
— Hitesh
  

> On Nov 28, 2016, at 3:44 PM, Robert Grandl  wrote:
> 
> Hi all,
> 
> I am trying to get a better understanding of the DAGs generated by Hive atop 
> Tez. However, I have some more fundamental questions about the types of 
> tasks/edges in a Tez DAG. 
> 
> 1) In case of MapReduce:
> Map - takes records and generates key-value pairs.
> Reduce - takes key-value pairs and reduces the list of values for the 
> same key. 
> Question: That means the reducer does not change the keys, right?
> 
> In case of Tez, things can be more complex:
> 2) For example, Map tasks can be in the middle of the DAG too. My 
> understanding is that in this case the input is a set of key-value pairs 
> and the output can be a set of different key-value pairs. 
> Is this true for any type of input edge (scatter-gather, broadcast, 
> one-to-one)?
> 
> 3) Reduce tasks can be in the middle as well. Can I assume that the reducer 
> also can change the keys? For example, in case of Map -> Reduce_1 -> Reduce_2 
> patterns, what is the main reason for having Reduce_2? Is it because the keys 
> are changed by Reduce_2 while Reduce_1 preserves the ones from the Map?
> 
> 4) On a related note, in the case of Map_1 -> Map_2 patterns, is it possible 
> for Map_2 to preserve the keys generated by Map_1, or will there be new keys?
> 
> 5) If my guess is right that both Map and Reduce stages can change the 
> keys, what is the main difference between having Map and Reduce stages in the 
> middle of the DAG (i.e. not input stages or leaf stages)?
> 
> Thanks,
> - Robert
> 



Re: Bad Log URL

2016-11-14 Thread Hitesh Shah
For the logs of a container on the NM, the NM’s http address is obtained from 
YARN APIs. Is this the only page in which the “:” is missing, or is it missing 
in other rows’ links within the task attempts table? Can you confirm that the 
links to the NodeManagers work correctly from the ResourceManager UI?

thanks
— Hitesh


> On Nov 13, 2016, at 9:55 PM, Premal Shah  wrote:
> 
> Hi,
> Looks like the log url does not have a colon in it. Is this bad config on my 
> end or should I open a ticket for this?
> 
> 
> 
> -- 
> Regards,
> Premal Shah.



Re: Issue with the job progress in the UI

2016-11-09 Thread Hitesh Shah
Hello Premal,

This is likely a combination of a lag in publishing the history events to YARN 
Timeline (which the UI consumes) and the fact that the UI relies more on YARN 
Timeline for data, as opposed to reading the information directly from the 
Tez AM. The Hive client gets its info directly from the AM, but the UI uses a 
mix of both, leading to this kind of confusion. 

I believe there might already be an open JIRA for this, but if you don’t mind, 
can you go ahead and create a new one in any case with the details that you 
have noticed?

thanks
— Hitesh


> On Nov 9, 2016, at 8:48 AM, Premal Shah  wrote:
> 
> Hi.
> I'm running a hive on tez query and this is what shows up in the shell
> 
> Logging initialized using configuration in 
> file:/usr/lib/apache-hive-2.0.1-bin/conf/hive-log4j2.properties
> OK
> Time taken: 2.422 seconds
> OK
> Time taken: 0.089 seconds
> Query ID = hadoop_20161109162801_2432d154-c75b-4173-af86-257a00b849f8
> Total jobs = 1
> Launching Job 1 out of 1
> 
> 
> Status: Running (Executing on YARN cluster with App id 
> application_1478236559237_1263)
> 
> --
> VERTICES     MODE       STATUS     TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> --
> Map 1 .....  container  SUCCEEDED     48         48        0        0       0       0
> Map 10 ....  container  SUCCEEDED      7          7        0        0       0       0
> Map 11 ....  container  SUCCEEDED     14         14        0        0       0       0
> Map 7 .....  container  SUCCEEDED     44         44        0        0       0       0
> Map 8 .....  container  SUCCEEDED     12         12        0        0       0       0
> Map 9 .....  container  SUCCEEDED     44         44        0        0       0       0
> Reducer 3 .  container  RUNNING       32         31        1        0       0       0
> Reducer 4    container  RUNNING       27          0       27        0       0       0
> Reducer 5    container  INITED        31          0        0       31       0       0
> Reducer 6    container  INITED        32          0        0       32       0       0
> --
> VERTICES: 06/10  [=>>-] 68%   ELAPSED TIME: 893.31 s
> --
> 
> The query is still running.
> 
> When I head over to the UI, it does not show the reducers
> 
> 
> 
> 
> And this does not happen for every query. Is there a limit to number of rows 
> in the UI table?
> 
> 
> -- 
> Regards,
> Premal Shah.



Re: Hive+Tez staging dir and scratch dir

2016-11-01 Thread Hitesh Shah
Hello Dharmesh,

The Tez staging dir is where scratch data is kept for the lifetime of the Tez 
session, i.e. data which can be deleted once the application completes.

Staging data includes the following:
  - recovery logs used by the Tez AM for checkpointing state
  - configs and/or DAG plan payloads that are sent across to the AM via the 
staging dir 

This staging directory location is configurable and can be overridden by the 
upper-layer application. In the case of Hive, the Hive scratch dir is used as 
the Tez staging dir for the lifetime of the Hive session. 
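For reference, a minimal sketch of how the Tez-side property might be set. The property name is the one from the question; the path values are purely illustrative and should be checked against the defaults shipped with your distro:

```xml
<!-- tez-site.xml: illustrative values only -->
<property>
  <name>tez.staging-dir</name>
  <value>/tmp/${user.name}/staging</value> <!-- per-session Tez scratch data -->
</property>

<!-- hive-site.xml: Hive re-uses its scratch dir as the Tez staging dir -->
<property>
  <name>hive.exec.scratchdir</name>
  <value>/tmp/hive</value>
</property>
```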

For the actual usage of the hive staging dir and scratch dir, I suggest trying 
out the user@hive mailing list. 

thanks
— Hitesh 


> On Oct 31, 2016, at 2:41 PM, Dharmesh Kakadia  wrote:
> 
> Hi,
> 
> I am trying to understand meaning and relation between following 
> configurations when running Hive on Tez.
> 
> hive.exec.stagingdir
> tez.staging-dir
> hive.exec.scratchdir  
> 
> Thanks,
> Dharmesh



Re: Vertex Parallelism

2016-10-31 Thread Hitesh Shah
I suggest writing a custom InputFormat or modifying your existing InputFormat 
to generate more splits and, at the same time, disabling split grouping for the 
vertex in question to ensure that you get the high level of parallelism that 
you want to achieve.

The log snippet just indicates that the vertex has been set up with -1 tasks 
because the splits are being calculated in the AM, and that the vertex 
parallelism will be set via the initializer/controller (based on the splits 
from the InputFormat).
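If it helps as a starting point, these are the grouping knobs usually involved; the exact key names and defaults should be verified against your Tez version (they live in the split grouper configuration), and the values below are illustrative:

```properties
# Illustrative values; verify key names against your Tez version.
tez.grouping.min-size=16777216     # lower bound on a grouped split's size (bytes)
tez.grouping.max-size=1073741824   # upper bound on a grouped split's size (bytes)
tez.grouping.split-count=32        # hint for the desired number of grouped splits
```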

— Hitesh

> On Oct 31, 2016, at 3:33 PM, Madhusudan Ramanna  wrote:
> 
> Hello Tez team,
> 
> We have a native Tez application.  The first vertex in the graph is a 
> downloader.  This vertex takes a CSV or sequence file that contains the 
> "urls" as input, downloads content and passes the content on to the next 
> vertex.  This input to vertex is smaller than the min split size.   However, 
> we'd like to have more than one task for running for this vertex to help 
> throughput. How do we set the tasks on this particular vertex to be greater 
> than one ?  Of course for other vertices in the graph,  number of tasks as 
> computed by data size fits perfectly fine. 
> 
> Currently, we're seeing this in the logs:
> 
> >
> 
> Root Inputs exist for Vertex: download : {_initial={InputName=_initial}, 
> {Descriptor=ClassName=org.apache.tez.mapreduce.input.MRInput, 
> hasPayload=true}, 
> {ControllerDescriptor=ClassName=org.apache.tez.mapreduce.common.MRInputAMSplitGenerator,
>  hasPayload=false}}
> Num tasks is -1. Expecting VertexManager/InputInitializers/1-1 split to set 
> #tasks for the vertex vertex_1477944280627_0004_1_00 [download]
> Vertex will initialize from input initializer. vertex_1477944280627_0004_1_00 
> [download]
> <
> 
> 
> 
> Thanks for your help !
> 
> Madhu
> 
> 
> 



Re: Tez containers and input splits

2016-10-28 Thread Hitesh Shah
That is similar to MR’s approach. 

When a task requests containers, it specifies that it needs to be scheduled on 
a particular set of hosts/racks via a TaskLocationHint. The TaskLocationHint is 
converted into a container ask to YARN, i.e. one container on any of the hosts 
or racks specified in the location hint. 

See the MRInputHelpers and TaskLocationHint classes for more details. 

Once a container is allocated to Tez, it will try its best to achieve 
host-level locality. After a certain back-off time, if rack-level fallbacks are 
enabled, it will try to match an unassigned task to a rack, and then eventually 
fall back to an “any” match if that fallback option is enabled. 
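As a toy model of that fallback sequence (the thresholds and method names here are made up for illustration; Tez's real scheduler uses configurable delays):

```java
// Toy model of locality fallback when matching an allocated container to a
// pending task: host-level matches first, then rack, then "any", with each
// relaxation gated on how long the container has waited. Delays illustrative.
public class LocalityFallbackSketch {
    enum Level { HOST, RACK, ANY }

    // Pick the most relaxed match level allowed for a container that has
    // been waiting for waitMillis.
    static Level allowedLevel(long waitMillis,
                              boolean rackFallbackEnabled,
                              boolean anyFallbackEnabled) {
        if (!rackFallbackEnabled || waitMillis < 250) return Level.HOST; // prefer host locality
        if (!anyFallbackEnabled || waitMillis < 750) return Level.RACK;  // then relax to rack
        return Level.ANY;                                                // finally match anywhere
    }

    public static void main(String[] args) {
        System.out.println(allowedLevel(0, true, true));    // HOST
        System.out.println(allowedLevel(500, true, true));  // RACK
        System.out.println(allowedLevel(1000, true, true)); // ANY
    }
}
```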

— Hitesh 

> On Oct 28, 2016, at 12:10 PM, Madhusudan Ramanna  wrote:
> 
> Hello Hitesh,
> 
> Thanks for that explanation ! Could you clarify about how locality of input 
> splits is used.. 
> 
> thanks,
> Madhu
> 
> 
> On Thursday, October 27, 2016 11:19 PM, Hitesh Shah  wrote:
> 
> 
> Hello Madhusudan,
> 
> I will start with how container allocations work and make my way back to 
> explaining splits. 
> 
> At the lowest level, each vertex will have decided to run a number of tasks. 
> At a high level, when a task is ready to run, it tells the global DAG 
> scheduler about its requirements ( i.e. what kind of container resources it 
> needs, additional container specs such as env, local resources, etc. and also 
> where it wants to be executed for locality. 
> 
> The global scheduler then requests the ResourceManager for as many containers 
> as there are pending tasks. When YARN allocates a container to the Tez AM, 
> the Tez AM decides which is the highest priority task ( vertices at the top 
> of the tree run first ) that matches the container allocated and runs the 
> task on it. Re-used containers are given higher priority over new containers 
> due to JVM launch costs. And YARN may not give Tez all the containers it 
> requested so Tez will make do with whatever it has. It may end up releasing 
> containers which don’t match if there are non-matching tasks that need to be 
> run.
> 
> Now, let us take a “map” vertex which is reading data from HDFS. In MR, each 
> task represented one split ( or a group if you use something like Hive’s 
> CombineFileInputFormat ). In Tez, there are a couple of differences: 
>   
> 1) The InputFormat is invoked in the AM i.e. splits are calculated in the AM 
> ( can be done on the client but most folks now run those in the AM)
> 2) Splits are grouped based on the wave configurations ( 
> https://cwiki.apache.org/confluence/display/TEZ/How+initial+task+parallelism+works
>  ).
> 
> Each grouped split will be mapped to one task. This will then define what 
> kind of container is requested. 
> 
> Let us know if you have more questions.
> 
> thanks
> — Hitesh
> 
> 
> > On Oct 27, 2016, at 5:06 PM, Madhusudan Ramanna  wrote:
> > 
> > Hello Folks,
> > 
> > We have a native Tez application.  My question is mainly about MR inputs 
> > and tez allocated containers.  How does tez grab containers ? Is it one per 
> > input split ?  Could someone shed some light on this ?
> > 
> > thanks,
> > Madhu
> 
> 



Re: Tez containers and input splits

2016-10-27 Thread Hitesh Shah
Hello Madhusudan,

I will start with how container allocations work and make my way back to 
explaining splits. 

At the lowest level, each vertex will have decided to run a number of tasks. At 
a high level, when a task is ready to run, it tells the global DAG scheduler 
about its requirements ( i.e. what kind of container resources it needs, 
additional container specs such as env, local resources, etc., and also where 
it wants to be executed for locality ). 

The global scheduler then requests the ResourceManager for as many containers 
as there are pending tasks. When YARN allocates a container to the Tez AM, the 
Tez AM decides which is the highest priority task ( vertices at the top of the 
tree run first ) that matches the container allocated and runs the task on it. 
Re-used containers are given higher priority over new containers due to JVM 
launch costs. And YARN may not give Tez all the containers it requested so Tez 
will make do with whatever it has. It may end up releasing containers which 
don’t match if there are non-matching tasks that need to be run.

Now, let us take a “map” vertex which is reading data from HDFS. In MR, each 
task represented one split ( or a group if you use something like Hive’s 
CombineFileInputFormat ). In Tez, there are a couple of differences: 
  
1) The InputFormat is invoked in the AM i.e. splits are calculated in the AM ( 
can be done on the client but most folks now run those in the AM)
2) Splits are grouped based on the wave configurations ( 
https://cwiki.apache.org/confluence/display/TEZ/How+initial+task+parallelism+works
 ).

Each grouped split will be mapped to one task. This will then define what kind 
of container is requested. 
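A rough plain-Java sketch of the wave idea: pick a desired task count from a "waves" multiplier over available containers, then bin the raw splits into that many grouped splits. The numbers and the round-robin binning are illustrative only; Tez's real grouper also weighs split sizes and locality.

```java
// Sketch of wave-based split grouping: desired parallelism is derived from
// a waves multiplier over available containers, then raw split ids are
// round-robined into that many grouped splits (each grouped split = 1 task).
public class SplitGroupingSketch {
    static int desiredTasks(int availableContainers, double waves) {
        return Math.max(1, (int) Math.ceil(availableContainers * waves));
    }

    // Distribute numRawSplits split ids over at most desiredTasks groups.
    static int[][] groupSplits(int numRawSplits, int desiredTasks) {
        int groups = Math.min(desiredTasks, numRawSplits);
        int[][] grouped = new int[groups][];
        for (int g = 0; g < groups; g++) {
            // Earlier groups absorb the remainder so sizes differ by at most 1.
            int size = numRawSplits / groups + (g < numRawSplits % groups ? 1 : 0);
            grouped[g] = new int[size];
        }
        int g = 0;
        int[] fill = new int[groups];
        for (int s = 0; s < numRawSplits; s++) {
            grouped[g][fill[g]++] = s;
            g = (g + 1) % groups;
        }
        return grouped;
    }

    public static void main(String[] args) {
        int tasks = desiredTasks(10, 1.7);          // 17 tasks for 10 containers
        System.out.println(tasks);
        System.out.println(groupSplits(100, tasks).length); // 100 splits -> 17 groups
    }
}
```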

Let us know if you have more questions.

thanks
— Hitesh


> On Oct 27, 2016, at 5:06 PM, Madhusudan Ramanna  wrote:
> 
> Hello Folks,
> 
> We have a native Tez application.  My question is mainly about MR inputs and 
> tez allocated containers.   How does tez grab containers ? Is it one per 
> input split ?  Could someone shed some light on this ?
> 
> thanks,
> Madhu



Re: Tez Sessions

2016-10-20 Thread Hitesh Shah
HiveServer2 today maintains Tez sessions ( when running with perimeter 
security, i.e. Ranger/Sentry ) and re-uses a session across queries. 

Tez AM recovery works for the most part. It will try to recover the completed 
tasks of the last running DAG and complete the ones that were still running. It 
does not handle cases where a committer was in the middle of a commit though, 
so those DAGs will abort when trying to recover. Given the complexity of 
recovery, there are probably bugs that we have not discovered yet, but for the 
most part it functions well.

There are a few issues you should consider when trying to use a single AM:
   - on secure clusters, the delegation token max lifetime is 7 days, so you 
will need to recycle apps on a weekly basis. 
   - YARN does not clean up data/logs for an app until the app completes, so 
this can add space pressure on the YARN local dirs. That said, there is some 
work happening as part of TEZ-3334 to help clean up intermediate data on a 
regular basis, and a couple of other JIRAs have been filed recently to look at 
cleaning up data more frequently.

thanks
— Hitesh   


> On Oct 20, 2016, at 2:35 PM, Madhusudan Ramanna  wrote:
> 
> Ok, no worries. I agree that this single AM model would be very close to a 
> mini-job tracker.  One of the options we're investigating having 1 yarn Tez 
> AM running all our DAGs. Given this AM already has all the 
> resources/containers, we were thinking this could save on the cost of AM, and 
> container initialization.
> 
> We haven't looked into tez recovery as well.  Durability is one of our big 
> concerns as well.
> 
> 
> On Thursday, October 20, 2016 12:44 PM, Hitesh Shah  wrote:
> 
> 
> Not supported as of now. There are multiple aspects to supporting this 
> properly. One of the most important issues to address would be to do proper 
> QoS across various DAGs i.e. what kind of policies would need to be built out 
> to run multiple DAGs to completion within a limited amount of resources. The 
> model would become close to a mini-jobtracker or a spark-standalone cluster.
> 
> Could you provide more details on what you are trying to achieve? We could 
> try and provide different viewpoints on trying to get you to a viable 
> solution.
> 
> — Hitesh
> 
> > On Oct 20, 2016, at 10:52 AM, Madhusudan Ramanna  
> > wrote:
> > 
> > Hello Folks,
> > 
> > http://hortonworks.com/blog/introducing-tez-sessions/
> > 
> > From the above post it seems like DAGs can only be executed serially.  
> > Could DAGs be executed in parallel on one Tez AM ?  
> > 
> > thanks,
> > Madhu
> 
> 



Re: Tez Sessions

2016-10-20 Thread Hitesh Shah
Not supported as of now. There are multiple aspects to supporting this 
properly. One of the most important issues to address would be to do proper QoS 
across various DAGs i.e. what kind of policies would need to be built out to 
run multiple DAGs to completion within a limited amount of resources. The model 
would become close to a mini-jobtracker or a spark-standalone cluster.

Could you provide more details on what you are trying to achieve? We could try 
and provide different viewpoints on trying to get you to a viable solution.

— Hitesh

> On Oct 20, 2016, at 10:52 AM, Madhusudan Ramanna  wrote:
> 
> Hello Folks,
> 
> http://hortonworks.com/blog/introducing-tez-sessions/
> 
> From the above post it seems like DAGs can only be executed serially.  Could 
> DAGs be executed in parallel on one Tez AM ?  
> 
> thanks,
> Madhu



Re: Container settings at vertex level

2016-10-20 Thread Hitesh Shah
Hello Madhu,

If you are using Tez via Hive, then this would need a fix in Hive. I don’t 
believe Hive supports different settings for each vertex in a given query today.

However, for native jobs, Tez already supports different specs for each vertex:

Vertex::setTaskResource() ( configuring yarn resources i.e. memory/cpu )
Vertex::setTaskLaunchCmdOpts() ( java opts, etc )
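As a non-runnable sketch of those two setters (this assumes the Tez and YARN client jars on the classpath; the vertex names, processor descriptors, and sizes are made up for illustration):

```java
// Non-runnable sketch: assumes org.apache.tez.dag.api.Vertex and
// org.apache.hadoop.yarn.api.records.Resource are available.
// "heavyProcessor"/"lightProcessor" and all sizes are illustrative.
Vertex heavy = Vertex.create("MemoryHeavyStage", heavyProcessor, -1);
heavy.setTaskResource(Resource.newInstance(8192, 2)); // 8 GB, 2 vcores
heavy.setTaskLaunchCmdOpts("-Xmx6g");                 // JVM opts for this vertex only

Vertex light = Vertex.create("LightStage", lightProcessor, -1);
light.setTaskResource(Resource.newInstance(1024, 1)); // 1 GB, 1 vcore
```

This way only the memory-intensive vertex asks YARN for large containers, instead of a single large default limiting concurrency for the whole DAG.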

Does the above help? Or are you looking for something different? 

thanks
— Hitesh


> On Oct 20, 2016, at 10:44 AM, Madhusudan Ramanna  wrote:
> 
> Hello Folks,
> 
> Some vertices require more memory than other vertices. These vertices are 
> memory intensive.  The graph, in general, takes a long(ish) time to complete. 
>  Default allocation of a huge chunk of memory to this one DAG/application 
> severely limits concurrent yarn containers that can be run.  How can we 
> influence Tez Runtime to request and execute some vertices in specialized 
> containers ? What is a good solution to this problem?
> 
> thanks,
> Madhu



Re: Tez UI

2016-10-19 Thread Hitesh Shah
Thanks for doing all the investigative work and patiently trying out the 
various options we provided. 

If you get a chance, feel free to try the following ( and hopefully provide a 
patch if it works better for you as a long-term solution ):
   - compile the Tez source against CDH’s distribution of Hadoop ( this 
entails adding the CDH maven repo to the pom as well as updating 
hadoop.version; hence the suggestion to put this under a maven profile for the 
patch )
   - the above, I believe, will pull in the transitive jackson dependencies 
from the Hadoop version compiled against.
   - this allows you to retain the CDH components as-is and only modify Tez, 
instead of messing with the Hadoop install. 
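A sketch of what such a profile in the tez pom might look like; the hadoop.version value is the one reported earlier in this thread, and the repository URL should be double-checked against Cloudera's current documentation:

```xml
<!-- Illustrative profile for the tez root pom: build against CDH's hadoop. -->
<profile>
  <id>cdh</id>
  <properties>
    <hadoop.version>2.6.0-cdh5.4.7</hadoop.version>
  </properties>
  <repositories>
    <repository>
      <id>cloudera</id>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
  </repositories>
</profile>
```

Activated with something like `mvn clean package -Pcdh`, this keeps the default build untouched while letting a CDH user opt in.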

thanks
— Hitesh


> On Oct 18, 2016, at 9:28 PM, Stephen Sprague  wrote:
> 
> i'll be a monkey's uncle.  that did it.
> 
> this is what i did:
> 
> * i untarred share/tez.tar.gz
> 
> * i then set these two env vars:
>  export YARN_USER_CLASSPATH_FIRST=true
>  export YARN_USER_CLASSPATH=/usr/lib/apache-tez-0.8.4-bin/share/*
> 
> * and started up the ATS as such:
> 
>  sudo -u yarn -- ./yarn-daemon.sh --config /etc/hadoop/conf start 
> timelineserver
> 
> 
> hard to believe that jackson version from 1.8.8 to 1.9.13 made that 
> difference.
> 
> great call Hitesh.  many thanks. I've been wrestling with this for quite some 
> time.
> 
> Cheers,
> Stephen.
> 
> On Tue, Oct 18, 2016 at 8:09 PM, Stephen Sprague  wrote:
> * hadoop version: Hadoop 2.6.0-cdh5.4.7
> 
> * tez version: 0.8.4 (i am using the bin distro from apache - so i didn't 
> "make" it)
> 
> * i found these in the tez distro tarball.
> 
>$ cd /home/dwr/downloads/apache-tez-0.8.4-bin
>$ find . -name '*jers*'
> ./lib/jersey-client-1.9.jar
> ./lib/jersey-json-1.9.jar
> 
> * i found these in the tez server-side tarball
> $ tar ztf share/tez.tar.gz | grep jers
> lib/jersey-json-1.9.jar
> lib/jersey-core-1.9.jar
> lib/jersey-client-1.9.jar
> lib/jersey-guice-1.9.jar
> 
> 
>  $ tar ztf share/tez.tar.gz | grep jack
>  lib/jackson-core-asl-1.9.13.jar
>  lib/jackson-mapper-asl-1.9.13.jar
>  lib/jackson-jaxrs-1.9.13.jar
>  lib/jackson-xc-1.9.13.jar
> 
> * i found these in the hadoop ATS timeline server classpath:
> 
>  * /usr/lib/hadoop/lib/jersey-server-1.9.jar
>  * /usr/lib/hadoop/lib/jersey-core-1.9.jar
>  * /usr/lib/hadoop/lib/jersey-json-1.9.jar
>  * /usr/lib/hadoop-yarn/lib/jersey-client-1.9.jar
>  * /usr/lib/hadoop-yarn/lib/jersey-guice-1.9.jar
> 
>  * /usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar
>  * /usr/lib/hadoop/lib/jackson-xc-1.8.8.jar
>  * /usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar
>  * /usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar
> 
> 
> 
> so jersey jars look to be in sync but the jackson ones look a step behind.  
> 
> what do you think?  should i force the 1.9's into the ATS CLASSPATH?  can't 
> hurt would be my guess. lemme try.
> 
> Cheers,
> Stephen.
> 
> 
> On Mon, Oct 17, 2016 at 2:44 PM, Stephen Sprague  wrote:
> Thanks Hitesh.   i'll look into this tonight.
> 
> On Mon, Oct 17, 2016 at 10:31 AM, Hitesh Shah  wrote:
> Hello Stephen,
> 
> I checked branch-2.4.0 of hadoop just to make sure - it does contain 
> “eventinfo” as a member of the TimelineEvent class so this does not seem to 
> indicate any issue in terms of a potential mismatch or a missing patch in the 
> version of hadoop that you are running.
> 
> Based on the logs, YARN_APPLICATION_ATTEMPT is data being written by the YARN 
> RM into YARN Timeline and that seems to be working. What is not working is 
> the Tez AM talking to YARN Timeline. I have not come across the property not 
> found issue in the past. One guess I have is that this could potentially be 
> due either to something incompatible with the timeline client class on the 
> Tez AM’s classpath or to a combination of the jackson/jersey jars in use.
> 
> There are a few things you should look into and update this thread with the 
> following info:
>- what version of hadoop you are running
>- what version of Tez ( and also what version of hadoop it was compiled 
> against )
>- check the hadoop classpath for jackson/jersey jars and compare the 
> versions in it to the versions in the tez tarball.
> 
> thanks
> — Hitesh
> 
> > On Oct 16, 2016, at 9:24 PM, Stephen Sprague  wrote:
> >
> > thanks Allan.  so i enabled DEBUG,console on the ATS.  I see this in that 
> > log:
> >
> > 16/10/16 21:07:59 DEBUG mortbay.log: call filter Cross Origin Filter
> > 16/10/16 21:07:59 DEBUG mortbay.log:

Re: Tez UI

2016-10-17 Thread Hitesh Shah
> 
> 
> again not sure how to read it. 
> 
> so far this seems to be the smoking gun to me from the Tez AM.
> 
> 2016-10-16 16:14:06,106 [DEBUG] [HistoryEventHandlingThread] 
> |impl.TimelineClientImpl|: HTTP error code: 404 Server response : 
> {"exception":"
> UnrecognizedPropertyException","message":"Unrecognized field \"eventinfo\" 
> 
> 
> On Sun, Oct 16, 2016 at 5:53 PM, Allan Wilson  wrote:
> I can send you my TEZ file later
> 
> Sent from my iPhone
> 
> On Oct 16, 2016, at 1:32 PM, Stephen Sprague  wrote:
> 
>> Hi Hitesh,
>> Bingo!
>> 
>> Log Type: syslog_dag_1476593404620_0001_1
>> 
>> Log Upload Time: Sat Oct 15 22:03:47 -0700 2016
>> 
>> Log Length: 75813
>> 
>> Showing 4096 bytes of 75813 total. Click here for the full log.
>> 
>> 6-10-15 21:51:35,970 [WARN] [IPC Server handler 25 on 40353] 
>> |app.TezTaskCommunicatorImpl|: Received task heartbeat from unknown 
>> container with id: container_1476593404620_0001_
>> 01_50, asking it to die
>> 2016-10-15 21:51:35,972 [WARN] [IPC Server handler 27 on 40353] 
>> |app.TezTaskCommunicatorImpl|: Received task heartbeat from unknown 
>> container with id: container_1476593404620_0001_
>> 01_08, asking it to die
>> 2016-10-15 21:51:35,973 [WARN] [IPC Server handler 3 on 40353] 
>> |app.TezTaskCommunicatorImpl|: Received task heartbeat from unknown 
>> container with id: container_1476593404620_0001_
>> 01_07, asking it to die
>> 2016-10-15 21:51:35,974 [WARN] [IPC Server handler 29 on 40353] 
>> |app.TezTaskCommunicatorImpl|: Received task heartbeat from unknown 
>> container with id: container_1476593404620_0001_
>> 01_11, asking it to die
>> 2016-10-15 21:51:35,987 [ERROR] [HistoryEventHandlingThread] 
>> |impl.TimelineClientImpl|: Failed to get the response from the timeline 
>> server.
>> 2016-10-15 21:51:35,987 [WARN] [HistoryEventHandlingThread] 
>> |ats.ATSHistoryLoggingService|
>> : Could not handle history events
>> org.apache.hadoop.yarn.
>> exceptions.YarnException: Failed to get the response from the timeline 
>> server.
>>  at org.apache.hadoop.yarn.client.
>> api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.
>> java:339)
>>  at org.apache.hadoop.yarn.client.
>> api.impl.TimelineClientImpl.putEntities(
>> TimelineClientImpl.java:301)
>>  at org.apache.tez.dag.history.
>> logging.ats.ATSHistoryLoggingService.handleEvents(ATSHistoryLoggingService.java:
>> 357)
>>  at org.apache.tez.dag.history.
>> logging.ats.ATSHistoryLoggingService.access$700(ATSHistoryLoggingService.java:
>> 53)
>>  at org.apache.tez.dag.history.
>> logging.ats.ATSHistoryLoggingService$1.run(ATSHistoryLoggingService.
>> java:190)
>>  at java.lang.Thread.run(Thread.
>> java:745)
>> 2016-10-15 21:51:35,987 [WARN] [IPC Server handler 6 on 40353] 
>> |app.TezTaskCommunicatorImpl|: Received task heartbeat from unknown 
>> container with id: container_1476593404620_0001_
>> 01_58, asking it to die
>> 2016-10-15 21:51:35,989 [WARN] [IPC Server handler 24 on 40353] 
>> |app.TezTaskCommunicatorImpl|: Received task heartbeat from unknown 
>> container with id: container_1476593404620_0001_
>> 01_51, asking it to die
>> 2016-10-15 21:51:36,021 [ERROR] [HistoryEventHandlingThread] 
>> |impl.TimelineClientImpl|: Failed to get the response from the timeline 
>> server.
>> 2016-10-15 21:51:36,021 [WARN] [HistoryEventHandlingThread] 
>> |ats.ATSHistoryLoggingService|
>> : Could not handle history events
>> org.apache.hadoop.yarn.
>> exceptions.YarnException: Failed to get the response from the timeline 
>> server.
>>  at org.apache.hadoop.yarn.client.
>> api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.
>> java:339)
>>  at org.apache.hadoop.yarn.client.
>> api.impl.TimelineClientImpl.putEntities(
>> TimelineClientImpl.java:301)
>>  at org.apache.tez.dag.history.
>> logging.ats.ATSHistoryLoggingService.handleEvents(ATSHistoryLoggingService.java:
>> 357)
>>  at org.apache.tez.dag.history.
>> logging.ats.ATSHistoryLoggingService.access$700(ATSHistoryLoggingService.java:
>> 53)
>>  at org.apache.tez.dag.history.
>> logging.ats.ATSHistoryLoggingService$1.run(ATSHistoryLoggingService.
>> java:190)
>>  at java.lang.Thread.run(Thread.
>> java:745)
>> 
>> 
>> i'm running the hive cli on host=dwrdevnn1.
>> 
>> i updated yarn-site.xml on dwrdevnn1.
>> 

Re: Tez UI

2016-10-16 Thread Hitesh Shah
> $ sudo netstat -lanp | grep 31168
> tcp0  0 172.19.103.136:102000.0.0.0:*   LISTEN
>   31168/java
> tcp0  0 172.19.103.136:8188 0.0.0.0:*       LISTEN
>   31168/java
> 
> 
> might there be a debug log level i can set on impl.TimelineClientImpl to see 
> what is happening on the connection event?  
> 
> thank you again!
> 
> Cheers,
> Stephen.
> 
> 
> 
> 
> On Sun, Oct 16, 2016 at 9:54 AM, Hitesh Shah  wrote:
> Hello Stephen,
> 
> yarn-site.xml needs to be updated wherever the Tez client is used. i.e if you 
> are using Hive, then wherever you launch the Hive CLI and also where the 
> HiveServer2 is installed ( HS2 will need a restart ).
> 
> To see if the connection to timeline is/was an issue, please check the yarn 
> app logs for any Tez application ( the application master logs to be more 
> specific: syslog_dag* files) to see if there are any warnings/exceptions 
> being logged related to history event handling.
> 
> thanks
> — Hitesh
> 
> > On Oct 15, 2016, at 9:58 PM, Stephen Sprague  wrote:
> >
> > hmm... made that change to yarn-site.xml and retarted the timelineserver 
> > and RM.
> >
> > $ sudo netstat -lanp | grep 31168 #timelineserver
> >
> > tcp0  0 172.19.103.136:102000.0.0.0:*   LISTEN  
> > 31168/java
> > tcp0  0 172.19.103.136:8188 0.0.0.0:*   LISTEN  
> > 31168/java
> > tcp0  0 172.19.103.136:8188 172.19.103.136:45299
> > ESTABLISHED 31168/java
> > tcp0  0 172.19.103.136:8188 172.19.103.136:45298
> > ESTABLISHED 31168/java

Re: Tez UI

2016-10-16 Thread Hitesh Shah
Hello Stephen,

yarn-site.xml needs to be updated wherever the Tez client is used, i.e. if you 
are using Hive, then wherever you launch the Hive CLI and also wherever 
HiveServer2 is installed (HS2 will need a restart). 

To see if the connection to timeline is/was an issue, please check the yarn app 
logs for any Tez application ( the application master logs to be more specific: 
syslog_dag* files) to see if there are any warnings/exceptions being logged 
related to history event handling. 
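
As a concrete sketch of that check, the snippet below greps a fabricated AM log excerpt for timeline-related warnings; the log lines here are invented for illustration, and on a real cluster you would run the same grep over the output of `yarn logs -applicationId <app id>`:

```shell
# Fabricated AM log excerpt; real logs come from
# "yarn logs -applicationId <app id>" (look at the syslog_dag* files).
cat > /tmp/sample_am_log <<'EOF'
2016-10-16 09:00:01 [INFO] |ats.ATSHistoryLoggingService|: Initializing ATSHistoryLoggingService
2016-10-16 09:00:05 [WARN] |ats.ATSHistoryLoggingService|: Could not post history event to ATS
2016-10-16 09:00:06 [INFO] |impl.TimelineClientImpl|: Timeline service address: http://host:8188/ws/v1/timeline/
EOF

# Surface any timeline/history warnings or exceptions:
grep -iE 'WARN|ERROR|Exception' /tmp/sample_am_log
```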

thanks
— Hitesh

> On Oct 15, 2016, at 9:58 PM, Stephen Sprague  wrote:
> 
> hmm... made that change to yarn-site.xml and restarted the timelineserver and 
> RM.
> 
> $ sudo netstat -lanp | grep 31168 #timelineserver
> 
> tcp0  0 172.19.103.136:102000.0.0.0:*   LISTEN
>   31168/java
> tcp0  0 172.19.103.136:8188 0.0.0.0:*   LISTEN
>   31168/java
> tcp0  0 172.19.103.136:8188 172.19.103.136:45299
> ESTABLISHED 31168/java
> tcp0  0 172.19.103.136:8188 172.19.103.136:45298
> ESTABLISHED 31168/java
> tcp0  0 172.19.103.136:8188 172.19.103.136:45322
> ESTABLISHED 31168/java
> tcp0  0 172.19.103.136:8188 172.19.103.136:45297
> ESTABLISHED 31168/java
> tcp0  0 172.19.103.136:8188 172.19.103.136:45316
> ESTABLISHED 31168/java
> tcp0  0 172.19.103.136:8188 172.19.103.136:45318
> ESTABLISHED 31168/java
> tcp0  0 172.19.103.136:8188 172.19.103.136:45317
> ESTABLISHED 31168/java
> tcp0  0 172.19.103.136:8188 172.19.103.136:45321
> ESTABLISHED 31168/java
> tcp0  0 172.19.103.136:8188 172.19.103.136:45326
> ESTABLISHED 31168/java
> tcp0  0 172.19.103.136:8188 172.19.103.136:45314
> ESTABLISHED 31168/java
> tcp0  0 172.19.103.136:8188 172.19.103.136:45315
> ESTABLISHED 31168/java
> tcp0  0 172.19.103.136:8188 172.19.103.136:45313
> ESTABLISHED 31168/java
> tcp0  0 172.19.103.136:8188 172.19.103.136:45320
> ESTABLISHED 31168/java
> tcp0  0 172.19.103.136:8188 172.19.103.136:45324
> ESTABLISHED 31168/java
> tcp0  0 172.19.103.136:8188 172.19.103.136:45325
> ESTABLISHED 31168/java
> tcp0  0 172.19.103.136:8188 172.19.103.136:45319
> ESTABLISHED 31168/java
> unix  2  [ ] STREAM CONNECTED 1455259739 31168/java
> unix  2  [ ] STREAM CONNECTED 1455253313 31168/java
> 
> 
> still no dice though.  same error.   i only changed yarn-site.xml on the 
> namenode though.  you think i need to copy it to all the datanodes and 
> restart the NM's too?
> 
> any other suggestions?
> 
> 'ppreciate the help!
> 
> 
> Cheers,
> Stephen.
> 
> On Sat, Oct 15, 2016 at 8:46 PM, Allan Wilson  wrote:
> Just saw Gopals response...that def needs updating too.
> 
> Sent from my iPhone
> 
> On Oct 15, 2016, at 9:31 PM, Stephen Sprague  wrote:
> 
>> thanks guys. lemme answer.
>> 
>> Sreenath-
>> 1. yarn.acl.enable = false  (ie. i did not set it)
>> 2.  this:  http://dwrdevnn1.sv2.trulia.com:9766 displays index.html with an 
>> *empty* list
>> 
>> Gopal-
>> 3. i'll replace 0.0.0.0 with dwrdevnn1.sv2.trulia.com and see what happens...
>> 
>> Allan-
>> 4. yes, metrics are enabled.
>> 
>> 
>> I'll let you know what happens with Gopal's suggestion.
>> 
>> 
>> Cheers,
>> Stephen.
>> 
>> On Sat, Oct 15, 2016 at 8:20 PM, Allan Wilson  wrote:
>> Are you emitting metrics to the ATS? 
>> 
>> yarn.timeline-service.enabled=true
>> 
>> Sent from my iPhone
>> 
>> On Oct 15, 2016, at 8:36 PM, Sreenath Somarajapuram 
>>  wrote:
>> 
>>> Hi Stephen,
>>> 
>>> The error message is coming from ATS, and it says that the application data 
>>> is not available.
>>> And yes, tez_application_1476574340629_0001 is a legit value. It can be 
>>> considered as the id for Tez application details.
>>> 
>>> Please help me with these:
>>> 1. Do you have yarn.acl.enable = true in yarn-site.xml?
>>> 2. On going to http://dwrdevnn1.sv2.trulia.com:9766 from your browser 
>>> window, the UI is supposed to display a list of DAGs. Are you able to view 
>>> them?
>>> 
>>> Thanks,
>>> Sreenath
>>> 
>>> From: Stephen Sprague 
>>> Reply-To: "user@tez.apache.org" 
>>> Date: Sunday, October 16, 2016 at 7:16 AM
>>> To: "user@tez.apache.org" 
>>> Subject: Tez UI
>>> 
>>> hey guys,
>>> i'm having a hard time getting the Tez UI to work.  I'm sure i'm doing 
>>> something wrong but i can't seem to figure it out.  Here's my scenario.
>>> 
>>> 1. i'm using nginx as the webserver. port 9766.   using that port without 
>>> params correctly displays index.html.  (i followed the instructions on 
>>> unzipping the war file - that seems ok - i'm using tez-ui2 )
>>> 
>>> 
>>> 2. i run a Tez job. It runs fine.
>>> 
>>> 
>>> 3. i click on the "History" hyperlink in the RM UI at 8088.
>>> 
>>> 
>>> 4. it attempts to r

Re: Origin of failed tasks

2016-10-12 Thread Hitesh Shah
If you have the logs for the application master, you can try the following: 

grep -F '[HISTORY]' <AM log file> | grep 'TASK_ATTEMPT_FINISHED'

This will give you info on any failed task attempts. 

The AM logs have history events being published to them. You can do grep 
-F '[HISTORY]' <AM log file> | grep '<entity type>_<event type>' where entity type is one of 
DAG, VERTEX, TASK, TASK_ATTEMPT and event type is STARTED or FINISHED. 

The logs are also split into diff files. e.g. 
The AM logs use a syslog_dag… format to split across dags. 
Task/Container logs use syslog_attempt* format to split out logs for different 
task attempts. 
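
To make the grep concrete, here is a self-contained sketch run against a fabricated two-event AM log (the IDs, timestamps, and file name are invented; with a real app you would grep the syslog_dag* files fetched via `yarn logs`). Note that `grep -F` is used so the literal `[HISTORY]` brackets are not treated as a character class:

```shell
# Fabricated AM syslog sample; real entries carry many more fields.
cat > /tmp/syslog_dag_sample <<'EOF'
2016-10-12 10:00:01 [HISTORY][DAG:dag_1_1][Event:TASK_ATTEMPT_STARTED]: attempt_1_1_00_0
2016-10-12 10:00:09 [HISTORY][DAG:dag_1_1][Event:TASK_ATTEMPT_FINISHED]: attempt_1_1_00_0, status=FAILED
2016-10-12 10:00:10 some unrelated log line
EOF

# Find finished (including failed) task attempts, as suggested above:
grep -F '[HISTORY]' /tmp/syslog_dag_sample | grep 'TASK_ATTEMPT_FINISHED'
```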

If you have YARN timeline enabled, you can use the analyzers to do more 
analysis on the dag specific data. These are more related to perf tuning and 
not failure diagnostics though.

thanks
— Hitesh


> On Oct 11, 2016, at 5:09 PM, Allan Wilson  wrote:
> 
> Use the yarn logs command.  That's your only chance without the TEZ UI.  I 
> set up the TEZ UI 
> in our shop and it is really nice.
> 
> Allan
> Sent from my iPhone
> 
>> On Oct 11, 2016, at 5:05 PM, Jan Morlock  wrote:
>> 
>> Hi,
>> 
>> currently failed tasks occur during the execution of my Hive/Tez job.
>> However in the end, the overall job succeeds. Is it possible to find out
>> afterwards about the origin of those failed tasks (without using the Tez
>> UI) just by analyzing the output log files?
>> 
>> Best regards
>> Jan



Re: adding local resource to classpath and/or java.library.path

2016-10-05 Thread Hitesh Shah
From the current dir perspective, the classpath could be set up to use 
“$PWD/archivename/*” where PWD in the container’s env would point to the 
container’s working directory. This is a bit more robust as compared to using 
“./”. Both should ideally work in most scenarios but using “./” is dependent on 
YARN implementation behavior. 
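
A minimal sketch of why the $PWD-anchored pattern is more robust (the directory layout and archive name "mylibs" are made up for illustration; a real container's working directory is created by YARN):

```shell
# Simulate a container working directory containing an unpacked archive:
workdir=$(mktemp -d)
mkdir -p "$workdir/mylibs"
touch "$workdir/mylibs/dep-a.jar" "$workdir/mylibs/dep-b.jar"

cd "$workdir"
# Anchoring on $PWD yields an absolute classpath entry, which stays valid
# even if a child process is launched from a different directory; a bare
# "./" prefix only works while the CWD happens to be the container dir.
classpath="$PWD/mylibs/*"
echo "$classpath"
```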

— Hitesh


> On Oct 5, 2016, at 10:06 AM, Madhusudan Ramanna  wrote:
> 
> Seems like with this approach, there is no need to have information on 
> current dir.
> 
> thanks,
> Madhu
> 
> 
> On Tuesday, October 4, 2016 4:44 PM, Hitesh Shah  wrote:
> 
> 
> The env is one approach for augmenting classpath. The other approach which 
> modifies classpath for both the AM and the task containers is to use 
> “tez.cluster.additional.classpath.prefix” by setting it to something like 
> “./<archive name>/*” 
> 
> — Hitesh
> 
> > On Oct 4, 2016, at 4:38 PM, Madhusudan Ramanna  wrote:
> > 
> > Actually, we solved it using
> > 
> > tez.task.launch.env  in tez-site.xml
> > 
> > thanks,
> > Madhu
> > 
> > 
> > On Tuesday, October 4, 2016 11:13 AM, Madhusudan Ramanna 
> >  wrote:
> > 
> > 
> > Please note that we need this in the tez containers
> > 
> > thanks,
> > Madhu
> > 
> > 
> > On Tuesday, October 4, 2016 11:11 AM, Madhusudan Ramanna 
> >  wrote:
> > 
> > 
> > Hello Folks,
> > 
> > We have an archive local resource that is being expanded to 
> > 
> > PWD//*.*
> > 
> > How do we add PWD/ to classpath and java.library.path ?
> > 
> > thanks,
> > Madhu
> > 
> > 
> > 
> > 
> 
> 



Re: Debugging M/R job with tez

2016-10-05 Thread Hitesh Shah
Thanks for filing the issues, Manuel. 

I took a quick look at trying to run the MR job in tez local mode. A native tez 
job running in local mode ( i.e. by running something like "hadoop jar 
./tez/tez-examples-0.9.0-SNAPSHOT.jar wordcount -Dtez.local.mode=true …” ) 
works but local mode when trying to run an MR job via the tez framework does 
not. I don’t believe that this has really worked at all since the initial 
implementation of local mode was committed. There are some quirks of the MR to 
Tez translation layer which are still pending from an MR local mode 
perspective. If you can file a JIRA for the local mode issue, I can provide a 
small patch that allowed me to make some minor headway before I ended up 
hitting other ones.  

thanks
— Hitesh


> On Oct 5, 2016, at 5:44 AM, Manuel Godbert  wrote:
> 
> Hello,
> 
> I just opened TEZ-3459, with attached code adressing 3 of the issues I 
> encountered, including the embedded jars one.
> 
> I did not manage yet to provide an example showing the issue I had with 
> multiple outputs. It would definitely help me if I could run my jobs locally 
> with Tez to understand the specificity of these jobs. Would it be possible to 
> get some support to set up my workstation to achieve this?
> 
> Brgds
> 
> Manuel
> 
> On Wed, Sep 28, 2016 at 8:37 PM, Hitesh Shah  wrote:
> Thanks for the context, Manuel.
> 
> Full compat with MR is something that has not really been fully tested with 
> Tez. We believe that it works for the most part but there are probably cases 
> out there which have either not been addressed or some which we are not aware 
> of.
> 
> It is great that you are trying this out. We can definitely help you figure 
> out these issues and get the fixes into Tez to allow more users to seamlessly 
> run MR jobs on Tez. It will be great if you can file a jira for the MR 
> distributed cache handling of archives in Tez. A simple example to reproduce 
> it would help a lot too so as to allow any of the Tez contributors to quickly 
> debug and fix. I am assuming you are passing in archives/fat-jars to the 
> distributed cache which MR implicitly applies ./* + ./lib/* pattern against 
> to add to the runtime classpath? I am guessing this is something we may not 
> have handled correctly in the translation layer.
> 
> thanks
> — Hitesh
> 
> > On Sep 28, 2016, at 9:38 AM, Manuel Godbert  
> > wrote:
> >
> > Hello,
> >
> > In non local mode my M/R jobs generally behave as expected with Tez. 
> > However some still resist, and I am trying to have them running locally to 
> > understand if they can work with some changes (either in my code or in 
> > Tez code, and in that latter case I planned to contribute some way to the 
> > Tez effort). Running the WordCount locally is only a first step.
> >
> > I won't be able to provide source code easily for the real problematic 
> > jobs, as we use a quite big home made framework on top of hadoop and that 
> > is not open source... in a few words most of my issues actually seem to 
> > come from the task attempts IDs management. We have subclassed the output 
> > committers to manage multiple outputs, and when we reach the commit task 
> > step the produced files are not always where expected in the temporary task 
> > attempt paths. It is hard to say what happens exactly, and this is why I 
> > wanted to reproduce the issue locally before sharing it.
> >
> > Besides this, another minor issue we got is that we used to package our 
> > applicative jars with nested dependencies in /lib and these are ignored by 
> > Tez. We could easily work around this expanding these and adapting our 
> > classpath.
> >
> > Regards
> >
> > On Wed, Sep 28, 2016 at 5:46 PM, Hitesh Shah  wrote:
> > Hello Manuel,
> >
> > Thanks for reporting the issue. Let me try and reproduce this locally to 
> > see what is going on.
> >
> > A quick question in general though - are you hitting issues when running in 
> > non-local mode too? Would you mind sharing the details on the issues you 
> > hit?
> >
> > thanks
> > — Hitesh
> >
> >
> > > On Sep 27, 2016, at 9:53 AM, Manuel Godbert  
> > > wrote:
> > >
> > > Hello,
> > >
> > > I have map/reduce jobs that work as expected within YARN, and I want to 
> > > see if Tez can help me improving their performance. Alas, I am 
> > > experiencing issues and I want to understand what happens, to see if I 
> > > can adapt my code or if I can suggest Tez enhancements. For this I need 
> > > to be able to debug jobs from within eclipse, with br

Re: adding local resource to classpath and/or java.library.path

2016-10-04 Thread Hitesh Shah
The env is one approach for augmenting classpath. The other approach which 
modifies classpath for both the AM and the task containers is to use 
“tez.cluster.additional.classpath.prefix” by setting it to something like 
“./<archive name>/*” 
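
For reference, a hedged sketch of what that setting might look like in tez-site.xml (the archive name "mylibs.tgz" is a placeholder; substitute the name of your own local resource):

```xml
<property>
  <!-- Prepended to the classpath of both the AM and task containers;
       "mylibs.tgz" is a placeholder archive name. -->
  <name>tez.cluster.additional.classpath.prefix</name>
  <value>./mylibs.tgz/*</value>
</property>
```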

— Hitesh

> On Oct 4, 2016, at 4:38 PM, Madhusudan Ramanna  wrote:
> 
> Actually, we solved it using
> 
> tez.task.launch.env  in tez-site.xml
> 
> thanks,
> Madhu
> 
> 
> On Tuesday, October 4, 2016 11:13 AM, Madhusudan Ramanna 
>  wrote:
> 
> 
> Please note that we need this in the tez containers
> 
> thanks,
> Madhu
> 
> 
> On Tuesday, October 4, 2016 11:11 AM, Madhusudan Ramanna 
>  wrote:
> 
> 
> Hello Folks,
> 
> We have an archive local resource that is being expanded to 
> 
> PWD//*.*
> 
> How do we add PWD/ to classpath and java.library.path ?
> 
> thanks,
> Madhu
> 
> 
> 
> 



Re: Zip Exception since commit da4098b9

2016-09-28 Thread Hitesh Shah
t;counterValue":1},{"counterName":"INPUT_SPLIT_LENGTH_BYTES","counterValue":14297},{"counterName":"OUTPUT_BYTES_WITH_OVERHEAD","counterValue":6},{"counterName":"OUTPUT_BYTES_PHYSICAL","counterValue":34}]}]},"successfulAttemptId":"attempt_1475091857089_0015_1_00_00_0"}}
> {"entity":"vertex_1475091857089_0015_1_00","entitytype":"TEZ_VERTEX_ID","events":[{"ts":1475098785759,"eventtype":"VERTEX_FINISHED"}],"otherinfo":{"endTime":1475098785759,"timeTaken":12129,"status":"SUCCEEDED","diagnostics":"","counters":{"counterGroups":[{"counterGroupName":"org.apache.tez.common.counters.DAGCounter","counters":[{"counterName":"DATA_LOCAL_TASKS","counterValue":1}]},{"counterGroupName":"org.apache.tez.common.counters.FileSystemCounter","counterGroupDisplayName":"File
>  System 
> Counters","counters":[{"counterName":"FILE_BYTES_WRITTEN","counterValue":42},{"counterName":"HDFS_BYTES_READ","counterValue":14297},{"counterName":"HDFS_READ_OPS","counterValue":2}]},{"counterGroupName":"org.apache.tez.common.counters.TaskCounter","counters":[{"counterName":"GC_TIME_MILLIS","counterValue":265},{"counterName":"CPU_MILLISECONDS","counterValue":7860},{"counterName":"PHYSICAL_MEMORY_BYTES","counterValue":265814016},{"counterName":"VIRTUAL_MEMORY_BYTES","counterValue":5392384000},{"counterName":"COMMITTED_HEAP_BYTES","counterValue":265814016},{"counterName":"INPUT_RECORDS_PROCESSED","counterValue":1},{"counterName":"INPUT_SPLIT_LENGTH_BYTES","counterValue":14297},{"counterName":"OUTPUT_BYTES_WITH_OVERHEAD","counterValue":6},{"counterName":"OUTPUT_BYTES_PHYSICAL","counterValue":34}]}]},"stats":{"firstTaskStartTime":1475098782472,"firstTasksToStart":["task_1475091857089_0015_1_00_00"],"lastTaskFinishTime":1475098785740,"lastTasksToFinish":["task_1475091857089_0015_1_00_00"],"minTaskDuration":3268,"maxTaskDuration":3268,"avgTaskDuration":3268,"shortestDurationTasks":["task_1475091857089_0015_1_00_00"],"longestDurationTasks":["task_1475091857089_0015_1_00_00"]},"numFailedTaskAttempts":0,"numKilledTaskAttempts":0,"numCompletedTasks":1,"numSucceededTasks":1,"numKilledTasks":0,"numFailedTasks":0,"servicePlugin":{"taskSchedulerName":"TezYarn","taskSchedulerClassName":"org.apache.tez.dag.app.rm.YarnTaskSchedulerService","taskCommunicatorName":"TezYarn","taskCommunicatorClassName":"org.apache.tez.dag.app.TezTaskCommunicatorImpl","containerLauncherName":"TezYarn","containerLauncherClassName":"org.apache.tez.dag.app.launcher.TezContainerLauncherImpl"}}}
> {"entity":"task_1475091857089_0015_1_01_00","entitytype":"TEZ_TASK_ID","relatedEntities":[{"entity":"vertex_1475091857089_0015_1_01","entitytype":"TEZ_VERTEX_ID"}],"events":[{"ts":1475098785787,"eventtype":"TASK_STARTED"}],"otherinfo":{"startTime":1475098785787,"scheduledTime":1475098785787}}
> {"entity":"attempt_1475091857089_0015_1_01_00_0","entitytype":"TEZ_TASK_ATTEMPT_ID","relatedEntities":[{"entity":"ip-10-1-2-173.us-west-2.compute.internal:8041","entitytype":"nodeId"},{"entity":"container_1475091857089_0015_01_02","entitytype":"containerId"},{"entity":"task_1475091857089_0015_1_01_00","entitytype":"TEZ_TASK_ID"}],"events":[{"ts":1475098785847,"eventtype":"TASK_ATTEMPT_STARTED"}],"otherinfo":{"inProgressLogsURL":"ip-10-1-2-173.us-west-2.compute.internal:8042\/node\/containerlogs\/container_1475091857089_0015_01_02\/apxqueue","completedLogsURL":"http:\/\/ip-10-1-3-71.us-west-2.compute.internal:19888\/jobhistory\/logs\/\/ip-10-1-2-173.us-west-2.compu

Re: Zip Exception since commit da4098b9

2016-09-28 Thread Hitesh Shah
To pinpoint the issue, one approach would be to change the history logger to 
SimpleHistoryLogger, i.e. comment out the property for 
tez.history.logging.service.class in the configs so that it falls back to the 
default value. This should generate a history log file as part of the 
application logs which should help us understand whether tez itself is not 
generating the data or YARN timeline is somehow losing it. Any exceptions in 
the DAGAppMaster log and/or the yarn timeline logs when this job runs? 
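
The symptom can also be spotted directly in the timeline REST output: a DAG entity carrying a DAG_FINISHED event but an empty "otherinfo" map means the summary events arrived while the detail data did not. A minimal sketch against a fabricated entity (the JSON mimics the shape of the timeline output quoted later in this thread; real data comes from GET /ws/v1/timeline/TEZ_DAG_ID/<dag id>):

```shell
# Fabricated TEZ_DAG_ID entity showing the empty-details symptom:
cat > /tmp/tez_dag_entity.json <<'EOF'
{"events":[{"timestamp":1475094093409,"eventtype":"DAG_FINISHED","eventinfo":{}}],"entitytype":"TEZ_DAG_ID","entity":"dag_1475091857089_0007_1","otherinfo":{}}
EOF

# An empty "otherinfo" alongside DAG_FINISHED means events arrived
# but the detailed entity data was dropped somewhere:
grep -c '"otherinfo":{}' /tmp/tez_dag_entity.json
```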

— Hitesh  



> On Sep 28, 2016, at 1:30 PM, Madhusudan Ramanna  wrote:
> 
> Hitesh,
> 
> Some information like appId is getting through to timeline server, but not 
> all. See attached.
> 
> Here is the output of 
> 
> http://timelinehost:port/ws/v1/timeline/TEZ_DAG_ID/
> {"entities":[{"events":[{"timestamp":1475094093409,"eventtype":"DAG_FINISHED","eventinfo":{}},{"timestamp":1475094062692,"eventtype":"DAG_STARTED","eventinfo":{}},{"timestamp":1475094062688,"eventtype":"DAG_INITIALIZED","eventinfo":{}},{"timestamp":1475094062055,"eventtype":"DAG_SUBMITTED","eventinfo":{}}],"entitytype":"TEZ_DAG_ID","entity":"dag_1475091857089_0007_1","starttime":1475094062055,"domain":"DEFAULT","relatedentities":{},"primaryfilters":{},"otherinfo":{}}]}
> 
> http://host:8188/ws/v1/timeline/TEZ_DAG_ID/dag_1475091857089_0007_1
> 
> {"events":[{"timestamp":1475094093409,"eventtype":"DAG_FINISHED","eventinfo":{}},{"timestamp":1475094062692,"eventtype":"DAG_STARTED","eventinfo":{}},{"timestamp":1475094062688,"eventtype":"DAG_INITIALIZED","eventinfo":{}},{"timestamp":1475094062055,"eventtype":"DAG_SUBMITTED","eventinfo":{}}],"entitytype":"TEZ_DAG_ID","entity":"dag_1475091857089_0007_1","starttime":1475094062055,"domain":"DEFAULT","relatedentities":{},"primaryfilters":{},"otherinfo":{}}
> 
> 
> 
> On Wednesday, September 28, 2016 8:44 AM, Hitesh Shah  
> wrote:
> 
> 
> Hello Madhusudan, 
> 
> Thanks for the patience. Let us take this to a jira where once you attach 
> more logs, we can root cause the issue.
> 
> A few things to attach to the jira:
>   - yarn-site.xml
>   - tez-site.xml
>   - hadoop version
>   - timeline server log for the time period in question
>   - application logs for any tez app which fails to display
>   - output of http://timelinehost:port/ws/v1/timeline/TEZ_DAG_ID/<dag id> ( 
> e.g. dag_1475014682883_0027_1 )
> 
> thanks
> — Hitesh
> 
> > On Sep 27, 2016, at 10:42 PM, Madhusudan Ramanna  
> > wrote:
> > 
> > So I downloaded Tez commit 91a397b0ba and built the dist package.  We're 
> > not seeing the zip exception anymore.
> > 
> > However, now Tez UI is completely broken. Not at all sure what is happening 
> > here. Please see attached screenshots.
> > 
> > 
> > 2016-09-28 05:11:40,903 [INFO] [main] |web.WebUIService|: Tez UI History 
> > URL: http://dev-cv2.aws:8080/tez-ui/#/tez-app/application_1475014682883_0027
> > 2016-09-28 05:11:40,908 [INFO] [main] |history.HistoryEventHandler|: 
> > Initializing HistoryEventHandler withrecoveryEnabled=true, 
> > historyServiceClassName=org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService
> > 2016-09-28 05:11:41,474 [INFO] [main] |impl.TimelineClientImpl|: Timeline 
> > service address: http://ts-ip.aws:8188/ws/v1/timeline/
> > 2016-09-28 05:11:41,474 [INFO] [main] |ats.ATSHistoryLoggingService|: 
> > Initializing ATSHistoryLoggingService with maxEventsPerBatch=5, 
> > maxPollingTime(ms)=10, waitTimeForShutdown(ms)=-1, 
> > TimelineACLManagerClass=org.apache.tez.dag.history.ats.acls.ATSHistoryACLPolicyManager
> > 2016-09-28 05:11:41,644 [INFO] [main] |impl.TimelineClientImpl|: Timeline 
> > service address: http://ts-ip.aws:8188/ws/v1/timeline/
> > 
> > 
> > >>> DAG Execution
> > 
> > 2016-09-28 05:11:52,779 [INFO] [IPC Server handler 0 on 44039] 
> > |history.HistoryEventHandler|: 
> > [HISTORY][DAG:dag_1475014682883_0027_1][Event:DAG_SUBMITTED]: 
> > dagID=dag_1475014682883_0027_1, submitTime=1475039511185
> > 
> > 
> > Timeline server is up and running. Tez UI is however not able to display 
> > DAG and other details 
> > 
> > thanks,
> > Madhu
> > 
> 

Re: Debugging M/R job with tez

2016-09-28 Thread Hitesh Shah
Thanks for the context, Manuel. 

Full compat with MR is something that has not really been fully tested with 
Tez. We believe that it works for the most part but there are probably cases 
out there which have either not been addressed or some which we are not aware 
of. 

It is great that you are trying this out. We can definitely help you figure out 
these issues and get the fixes into Tez to allow more users to seamlessly run 
MR jobs on Tez. It will be great if you can file a jira for the MR distributed 
cache handling of archives in Tez. A simple example to reproduce it would help 
a lot too so as to allow any of the Tez contributors to quickly debug and fix. 
I am assuming you are passing in archives/fat-jars to the distributed cache 
which MR implicitly applies ./* + ./lib/* pattern against to add to the runtime 
classpath? I am guessing this is something we may not have handled correctly in 
the translation layer. 

thanks
— Hitesh

> On Sep 28, 2016, at 9:38 AM, Manuel Godbert  wrote:
> 
> Hello,
> 
> In non local mode my M/R jobs generally behave as expected with Tez. However 
> some still resist, and I am trying to have them running locally to understand 
> if they can work with some changes (either in my code or in Tez code, and 
> in that latter case I planned to contribute some way to the Tez effort). 
> Running the WordCount locally is only a first step.
> 
> I won't be able to provide source code easily for the real problematic jobs, 
> as we use a quite big home made framework on top of hadoop and that is not 
> open source... in a few words most of my issues actually seem to come from 
> the task attempts IDs management. We have subclassed the output committers to 
> manage multiple outputs, and when we reach the commit task step the produced 
> files are not always where expected in the temporary task attempt paths. It 
> is hard to say what happens exactly, and this is why I wanted to reproduce 
> the issue locally before sharing it.
> 
> Besides this, another minor issue we got is that we used to package our 
> applicative jars with nested dependencies in /lib and these are ignored by 
> Tez. We could easily work around this expanding these and adapting our 
> classpath.
> 
> Regards
> 
> On Wed, Sep 28, 2016 at 5:46 PM, Hitesh Shah  wrote:
> Hello Manuel,
> 
> Thanks for reporting the issue. Let me try and reproduce this locally to see 
> what is going on.
> 
> A quick question in general though - are you hitting issues when running in 
> non-local mode too? Would you mind sharing the details on the issues you hit?
> 
> thanks
> — Hitesh
> 
> 
> > On Sep 27, 2016, at 9:53 AM, Manuel Godbert  
> > wrote:
> >
> > Hello,
> >
> > I have map/reduce jobs that work as expected within YARN, and I want to see 
> > if Tez can help me improving their performance. Alas, I am experiencing 
> > issues and I want to understand what happens, to see if I can adapt my code 
> > or if I can suggest Tez enhancements. For this I need to be able to debug 
> > jobs from within eclipse, with breakpoints in Tez source code etc.
> >
> > I am working on a linux (ubuntu) platform
> > I use the latest Tez version I found, i.e. 0.9.0-SNAPSHOT (also tried with 
> > 0.7.0)
> > I have set up the hortonworks mini dev cluster 
> > https://github.com/hortonworks/mini-dev-cluster
> > I am trying to run the basic WordCount2 code found here 
> > https://hadoop.apache.org/docs/r2.7.2/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html#Example:_WordCount_v2.0
> > I added the following code to have tez running locally:
> > conf.set("mapreduce.framework.name", "yarn-tez");
> > conf.setBoolean("tez.local.mode", true);
> > conf.set("fs.default.name", "file:///");
> > conf.setBoolean("tez.runtime.optimize.local.fetch", true);
> >
> > And I am getting the following error:
> >
> > 2016-09-27 18:32:34 Running Dag: dag_1474992804027_0003_1
> > 2016-09-27 18:32:34 Running Dag: dag_1474992804027_0003_1
> > Exception in thread "main" java.lang.NullPointerException
> >   at 
> > org.apache.tez.client.LocalClient.getApplicationReport(LocalClient.java:153)
> >   at 
> > org.apache.tez.dag.api.client.rpc.DAGClientRPCImpl.getAppReport(DAGClientRPCImpl.java:231)
> >   at 
> > org.apache.tez.dag.api.client.rpc.DAGClientRPCImpl.createAMProxyIfNeeded(DAGClientRPCImpl.java:251)
> >   at 
> > org.apache.tez.dag.api.client.rpc.DAGClientRPCImpl.getDAGStatus(DAGClientRPCImpl.java:96)
> >   at 
> > org.apache.tez.dag.api.client.DAGClientImpl.getDAGStatusViaAM(DAGCl

Re: Debugging M/R job with tez

2016-09-28 Thread Hitesh Shah
Hello Manuel, 

Thanks for reporting the issue. Let me try and reproduce this locally to see 
what is going on. 

A quick question in general though - are you hitting issues when running in 
non-local mode too? Would you mind sharing the details on the issues you hit?

thanks
— Hitesh


> On Sep 27, 2016, at 9:53 AM, Manuel Godbert  wrote:
> 
> Hello,
> 
> I have map/reduce jobs that work as expected within YARN, and I want to see 
> if Tez can help me improving their performance. Alas, I am experiencing 
> issues and I want to understand what happens, to see if I can adapt my code 
> or if I can suggest Tez enhancements. For this I need to be able to debug 
> jobs from within eclipse, with breakpoints in Tez source code etc.
> 
> I am working on a linux (ubuntu) platform
> I use the latest Tez version I found, i.e. 0.9.0-SNAPSHOT (also tried with 
> 0.7.0)
> I have set up the hortonworks mini dev cluster 
> https://github.com/hortonworks/mini-dev-cluster
> I am trying to run the basic WordCount2 code found here 
> https://hadoop.apache.org/docs/r2.7.2/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html#Example:_WordCount_v2.0
> I added the following code to have tez running locally:
> conf.set("mapreduce.framework.name", "yarn-tez");
> conf.setBoolean("tez.local.mode", true);
> conf.set("fs.default.name", "file:///");
> conf.setBoolean("tez.runtime.optimize.local.fetch", true);
> 
> And I am getting the following error:
> 
> 2016-09-27 18:32:34 Running Dag: dag_1474992804027_0003_1
> 2016-09-27 18:32:34 Running Dag: dag_1474992804027_0003_1
> Exception in thread "main" java.lang.NullPointerException
>   at 
> org.apache.tez.client.LocalClient.getApplicationReport(LocalClient.java:153)
>   at 
> org.apache.tez.dag.api.client.rpc.DAGClientRPCImpl.getAppReport(DAGClientRPCImpl.java:231)
>   at 
> org.apache.tez.dag.api.client.rpc.DAGClientRPCImpl.createAMProxyIfNeeded(DAGClientRPCImpl.java:251)
>   at 
> org.apache.tez.dag.api.client.rpc.DAGClientRPCImpl.getDAGStatus(DAGClientRPCImpl.java:96)
>   at 
> org.apache.tez.dag.api.client.DAGClientImpl.getDAGStatusViaAM(DAGClientImpl.java:360)
>   at 
> org.apache.tez.dag.api.client.DAGClientImpl.getDAGStatusInternal(DAGClientImpl.java:220)
>   at 
> org.apache.tez.dag.api.client.DAGClientImpl.getDAGStatus(DAGClientImpl.java:268)
>   at 
> org.apache.tez.dag.api.client.MRDAGClient.getDAGStatus(MRDAGClient.java:58)
>   at 
> org.apache.tez.mapreduce.client.YARNRunner.getJobStatus(YARNRunner.java:710)
>   at 
> org.apache.tez.mapreduce.client.YARNRunner.submitJob(YARNRunner.java:650)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:240)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
>   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
>   at WordCount2.main(WordCount2.java:136)
> 
> Please help me understanding what I am doing wrong!
> 
> Regards



Re: Zip Exception since commit da4098b9

2016-09-28 Thread Hitesh Shah
Hello Madhusudan, 

Thanks for the patience. Let us take this to a jira where once you attach more 
logs, we can root cause the issue.

A few things to attach to the jira:
   - yarn-site.xml
   - tez-site.xml
   - hadoop version
   - timeline server log for the time period in question
   - application logs for any tez app which fails to display
   - output of http://timelinehost:port/ws/v1/timeline/TEZ_DAG_ID/<dag id> ( 
e.g. dag_1475014682883_0027_1 )

thanks
— Hitesh

> On Sep 27, 2016, at 10:42 PM, Madhusudan Ramanna  wrote:
> 
> So I downloaded Tez commit 91a397b0ba and built the dist package.  We're not 
> seeing the zip exception anymore.
> 
> However, now Tez UI is completely broken. Not at all sure what is happening 
> here. Please see attached screenshots.
> 
> 
> 2016-09-28 05:11:40,903 [INFO] [main] |web.WebUIService|: Tez UI History URL: 
> http://dev-cv2.aws:8080/tez-ui/#/tez-app/application_1475014682883_0027
> 2016-09-28 05:11:40,908 [INFO] [main] |history.HistoryEventHandler|: 
> Initializing HistoryEventHandler withrecoveryEnabled=true, 
> historyServiceClassName=org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService
> 2016-09-28 05:11:41,474 [INFO] [main] |impl.TimelineClientImpl|: Timeline 
> service address: http://ts-ip.aws:8188/ws/v1/timeline/
> 2016-09-28 05:11:41,474 [INFO] [main] |ats.ATSHistoryLoggingService|: 
> Initializing ATSHistoryLoggingService with maxEventsPerBatch=5, 
> maxPollingTime(ms)=10, waitTimeForShutdown(ms)=-1, 
> TimelineACLManagerClass=org.apache.tez.dag.history.ats.acls.ATSHistoryACLPolicyManager
> 2016-09-28 05:11:41,644 [INFO] [main] |impl.TimelineClientImpl|: Timeline 
> service address: http://ts-ip.aws:8188/ws/v1/timeline/
> 
> 
> >>> DAG Execution
> 
> 2016-09-28 05:11:52,779 [INFO] [IPC Server handler 0 on 44039] 
> |history.HistoryEventHandler|: 
> [HISTORY][DAG:dag_1475014682883_0027_1][Event:DAG_SUBMITTED]: 
> dagID=dag_1475014682883_0027_1, submitTime=1475039511185
> 
> 
> Timeline server is up and running. Tez UI is however not able to display DAG 
> and other details 
> 
> thanks,
> Madhu
> 
> 
> 
> On Saturday, September 24, 2016 12:01 PM, Hitesh Shah  
> wrote:
> 
> 
> tez-dist tar balls are not published to maven today - only the module 
> specific jars are. But yes, you could just try a local build to see if you 
> can reproduce the issue with the commit in question. 
> 
> — Hitesh
> 
> 
> > On Sep 23, 2016, at 6:23 PM, Madhusudan Ramanna  wrote:
> > 
> > Hitesh and Zhiyuan,
> > 
> > Apache snapshots doesn't seem to have tez-dist 
> > 
> > http://repository.apache.org/content/groups/snapshots/org/apache/tez/tez-dist/
> > 
> > The last one seems to be 0.2.0-SNAPSHOT
> > 
> > Should I just download based on the commit and recompile ? 
> > 
> > thanks,
> > Madhu
> > 
> > 
> > On Friday, September 23, 2016 5:19 PM, Hitesh Shah  
> > wrote:
> > 
> > 
> > Hello Madhusudan,
> > 
> > If you look at the MANIFEST.MF inside any of the tez jars, it will provide 
> > the commit hash via the SCM-Revision field.
> > 
> > The tez client and the DAGAppMaster also log this info at runtime.
> > 
> > — Hitesh 
> > 
> > > On Sep 23, 2016, at 4:08 PM, Madhusudan Ramanna  
> > > wrote:
> > > 
> > > Zhiyuan,
> > > 
> > > We just pulled down the latest snapshot from the Apache repository. 
> > > Question: how can I figure out the branch and commit information from the 
> > > snapshot artifact?
> > > 
> > > thanks,
> > > Madhu
> > > 
> > > 
> > > On Friday, September 23, 2016 10:38 AM, zhiyuan yang  
> > > wrote:
> > > 
> > > 
> > > Hi Madhu,
> > > 
> > > It looks like an Inflater/Deflater mismatch to me. From the stack traces, 
> > > I see you cherry-picked this patch instead of using the master branch.
> > > Would you mind double-checking whether the patch was correctly cherry-picked?
> > > 
> > > Thanks!
> > > Zhiyuan
> > > 
> > >> On Sep 23, 2016, at 10:21 AM, Madhusudan Ramanna  
> > >> wrote:
> > >> 
> > >> Hello,
> > >> 
> > >> We're using the Apache snapshot repository to pull latest tez snapshots. 
> > >> 
> > >> We've started seeing this exception:
> > >> 
> > >> org.apache.tez.dag.api.TezUncheckedException: 
> > >> java.util.zip.ZipException: incorrect header check
> > >> at 
> > >>

Re: Zip Exception since commit da4098b9

2016-09-24 Thread Hitesh Shah
tez-dist tarballs are not published to Maven today - only the module-specific 
jars are. But yes, you could just try a local build to see if you can reproduce 
the issue with the commit in question. 

— Hitesh


> On Sep 23, 2016, at 6:23 PM, Madhusudan Ramanna  wrote:
> 
> Hitesh and Zhiyuan,
> 
> The Apache snapshots repository doesn't seem to have tez-dist: 
> 
> http://repository.apache.org/content/groups/snapshots/org/apache/tez/tez-dist/
> 
> The last one seems to be 0.2.0-SNAPSHOT
> 
> Should I just download based on the commit and recompile? 
> 
> thanks,
> Madhu
> 
> 
> On Friday, September 23, 2016 5:19 PM, Hitesh Shah  wrote:
> 
> 
> Hello Madhusudan,
> 
> If you look at the MANIFEST.MF inside any of the tez jars, it will provide 
> the commit hash via the SCM-Revision field.
> 
> The tez client and the DAGAppMaster also log this info at runtime.
> 
> — Hitesh 
> 
> > On Sep 23, 2016, at 4:08 PM, Madhusudan Ramanna  wrote:
> > 
> > Zhiyuan,
> > 
> > We just pulled down the latest snapshot from the Apache repository. 
> > Question: how can I figure out the branch and commit information from the 
> > snapshot artifact?
> > 
> > thanks,
> > Madhu
> > 
> > 
> > On Friday, September 23, 2016 10:38 AM, zhiyuan yang  
> > wrote:
> > 
> > 
> > Hi Madhu,
> > 
> > It looks like an Inflater/Deflater mismatch to me. From the stack traces, I 
> > see you cherry-picked this patch instead of using the master branch.
> > Would you mind double-checking whether the patch was correctly cherry-picked?
> > 
> > Thanks!
> > Zhiyuan
> > 
> >> On Sep 23, 2016, at 10:21 AM, Madhusudan Ramanna  
> >> wrote:
> >> 
> >> Hello,
> >> 
> >> We're using the Apache snapshot repository to pull latest tez snapshots. 
> >> 
> >> We've started seeing this exception:
> >> 
> >> org.apache.tez.dag.api.TezUncheckedException: java.util.zip.ZipException: 
> >> incorrect header check
> >> at 
> >> org.apache.tez.dag.library.vertexmanager.ShuffleVertexManager.handleVertexManagerEvent(ShuffleVertexManager.java:622)
> >> at 
> >> org.apache.tez.dag.library.vertexmanager.ShuffleVertexManager.onVertexManagerEventReceived(ShuffleVertexManager.java:579)
> >> at 
> >> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEventReceived.invoke(VertexManager.java:606)
> >> at 
> >> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:647)
> >> at 
> >> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:642)
> >> at java.security.AccessController.doPrivileged(Native Method)
> >> at javax.security.auth.Subject.doAs(Subject.java:422)
> >> at 
> >> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> >> at 
> >> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent.call(VertexManager.java:642)
> >> at 
> >> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent.call(VertexManager.java:631)
> >> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> >> at 
> >> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> >> at 
> >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> >> at java.lang.Thread.run(Thread.java:745)
> >> Caused by: java.util.zip.ZipException: incorrect header check
> >> at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:164)
> >> at java.io.FilterInputStream.read(FilterInputStream.java:107)
> >> at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1792)
> >> at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1769)
> >> at org.apache.commons.io.IOUtils.copy(IOUtils.java:1744)
> >> at org.apache.commons.io.IOUtils.toByteArray(IOUtils.java:462)
> >> 
> >> 
> >> since this commit
> >> 
> >> https://github.com/apache/tez/commit/da4098b9d6f72e6d4aacc1623622a0875408d2ba
> >> 
> >> 
> >> Wanted to bring this to your attention. For now we've locked the snapshot 
> >> version down.
> >> 
> >> thanks,
> >> Madhu
> > 
> > 
> > 
> 
> 



Re: Zip Exception since commit da4098b9

2016-09-23 Thread Hitesh Shah
Hello Madhusudan,

If you look at the MANIFEST.MF inside any of the tez jars, it will provide the 
commit hash via the SCM-Revision field.

The tez client and the DAGAppMaster also log this info at runtime.
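As a sketch of that lookup (this is not Tez tooling; the `demo.jar` built inline stands in for a real Tez jar such as `tez-api-*.jar`, whose exact name and location depend on your install):

```python
# Sketch: read the SCM-Revision field from a jar's MANIFEST.MF.
import zipfile

def scm_revision(jar_path):
    """Return the SCM-Revision value from the jar's manifest, or None."""
    with zipfile.ZipFile(jar_path) as jar:
        manifest = jar.read("META-INF/MANIFEST.MF").decode("utf-8")
    for line in manifest.splitlines():
        if line.startswith("SCM-Revision:"):
            return line.split(":", 1)[1].strip()
    return None

# Build a toy jar so the sketch is self-contained; against a real install
# you would point scm_revision() at a tez jar instead.
with zipfile.ZipFile("demo.jar", "w") as jar:
    jar.writestr("META-INF/MANIFEST.MF",
                 "Manifest-Version: 1.0\nSCM-Revision: da4098b9\n")

print(scm_revision("demo.jar"))  # prints da4098b9
```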

— Hitesh 

> On Sep 23, 2016, at 4:08 PM, Madhusudan Ramanna  wrote:
> 
> Zhiyuan,
> 
> We just pulled down the latest snapshot from the Apache repository. 
> Question: how can I figure out the branch and commit information from the 
> snapshot artifact?
> 
> thanks,
> Madhu
> 
> 
> On Friday, September 23, 2016 10:38 AM, zhiyuan yang  
> wrote:
> 
> 
> Hi Madhu,
> 
> It looks like an Inflater/Deflater mismatch to me. From the stack traces, I 
> see you cherry-picked this patch instead of using the master branch.
> Would you mind double-checking whether the patch was correctly cherry-picked?
> 
> Thanks!
> Zhiyuan
> 
>> On Sep 23, 2016, at 10:21 AM, Madhusudan Ramanna  wrote:
>> 
>> Hello,
>> 
>> We're using the Apache snapshot repository to pull latest tez snapshots. 
>> 
>> We've started seeing this exception:
>> 
>> org.apache.tez.dag.api.TezUncheckedException: java.util.zip.ZipException: 
>> incorrect header check
>> at 
>> org.apache.tez.dag.library.vertexmanager.ShuffleVertexManager.handleVertexManagerEvent(ShuffleVertexManager.java:622)
>> at 
>> org.apache.tez.dag.library.vertexmanager.ShuffleVertexManager.onVertexManagerEventReceived(ShuffleVertexManager.java:579)
>> at 
>> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEventReceived.invoke(VertexManager.java:606)
>> at 
>> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:647)
>> at 
>> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:642)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:422)
>> at 
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>> at 
>> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent.call(VertexManager.java:642)
>> at 
>> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent.call(VertexManager.java:631)
>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> at java.lang.Thread.run(Thread.java:745)
>> Caused by: java.util.zip.ZipException: incorrect header check
>> at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:164)
>> at java.io.FilterInputStream.read(FilterInputStream.java:107)
>> at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1792)
>> at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1769)
>> at org.apache.commons.io.IOUtils.copy(IOUtils.java:1744)
>> at org.apache.commons.io.IOUtils.toByteArray(IOUtils.java:462)
>> 
>> 
>> since this commit
>> 
>> https://github.com/apache/tez/commit/da4098b9d6f72e6d4aacc1623622a0875408d2ba
>> 
>> 
>> Wanted to bring this to your attention. For now we've locked the snapshot 
>> version down.
>> 
>> thanks,
>> Madhu
> 
> 
> 



Re: Zip Exception since commit da4098b9

2016-09-23 Thread Hitesh Shah
Hello Madhusudan, 

Thanks for reporting the issue. Would you mind filing a bug at 
https://issues.apache.org/jira/browse/tez with the application logs and tez 
configs attached? If you have a simple dag/job example that reproduces the 
behavior, that would be great too.

thanks
— Hitesh

> On Sep 23, 2016, at 10:38 AM, zhiyuan yang  wrote:
> 
> Hi Madhu,
> 
> It looks like an Inflater/Deflater mismatch to me. From the stack traces, I 
> see you cherry-picked this patch instead of using the master branch.
> Would you mind double-checking whether the patch was correctly cherry-picked?
> 
> Thanks!
> Zhiyuan
> 
>> On Sep 23, 2016, at 10:21 AM, Madhusudan Ramanna  wrote:
>> 
>> Hello,
>> 
>> We're using the Apache snapshot repository to pull latest tez snapshots. 
>> 
>> We've started seeing this exception:
>> 
>> org.apache.tez.dag.api.TezUncheckedException: java.util.zip.ZipException: 
>> incorrect header check
>> at 
>> org.apache.tez.dag.library.vertexmanager.ShuffleVertexManager.handleVertexManagerEvent(ShuffleVertexManager.java:622)
>> at 
>> org.apache.tez.dag.library.vertexmanager.ShuffleVertexManager.onVertexManagerEventReceived(ShuffleVertexManager.java:579)
>> at 
>> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEventReceived.invoke(VertexManager.java:606)
>> at 
>> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:647)
>> at 
>> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:642)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:422)
>> at 
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>> at 
>> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent.call(VertexManager.java:642)
>> at 
>> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent.call(VertexManager.java:631)
>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> at java.lang.Thread.run(Thread.java:745)
>> Caused by: java.util.zip.ZipException: incorrect header check
>> at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:164)
>> at java.io.FilterInputStream.read(FilterInputStream.java:107)
>> at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1792)
>> at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1769)
>> at org.apache.commons.io.IOUtils.copy(IOUtils.java:1744)
>> at org.apache.commons.io.IOUtils.toByteArray(IOUtils.java:462)
>> 
>> 
>> since this commit
>> 
>> https://github.com/apache/tez/commit/da4098b9d6f72e6d4aacc1623622a0875408d2ba
>> 
>> 
>> Wanted to bring this to your attention. For now we've locked the snapshot 
>> version down.
>> 
>> thanks,
>> Madhu
> 



Re: Parallel queries to HS2/Tez

2016-08-29 Thread Hitesh Shah
I think there are some thread-pool-related settings in HiveServer2 that could 
be used to throttle the number of concurrent queries down to 1. One quick search 
led me to https://issues.apache.org/jira/browse/HIVE-5229, but you may wish to 
ask the same question on the Hive mailing lists for a definitive answer. 

thanks
— Hitesh 


> On Aug 27, 2016, at 1:02 AM, Chitragar, Uday (KMLWG) 
>  wrote:
> 
> Hi Hitesh,
> 
> Thank you for the advice. While I get dev help on TEZ-3420, are there any 
> recommendations in terms of configuring Hive/HS2 to run the dags 
> sequentially? Interestingly, this is not a problem with the HDP deployment, 
> which obviously has a 'fuller' setup. Local mode really helps with testing.
> 
> Thank you,
> Uday
> From: Hitesh Shah 
> Sent: 25 August 2016 20:06:30
> To: user@tez.apache.org
> Subject: Re: Parallel queries to HS2/Tez
>  
> Hello Uday,
> 
> I don’t believe anyone has tried running 2 dags in parallel in local mode 
> within the same TezClient ( and definitely not for HiveServer2 ). If this is 
> with 2 instances of Tez client, this could likely be a bug in terms of either 
> how Hive is setting up the TezClient for local mode with the same directories 
> or a bug somewhere in Tez where clashing directories for intermediate data 
> might be causing an issue. FWIW, the Tez AM does not support running 2 dags 
> in parallel and quite a bit of this code path is used with local mode. 
> 
> It would be great if you could file a JIRA for this with more detailed logs 
> and then take help of the dev community to come up with a patch that 
> addresses the issue in your environment.
> 
> thanks
> — Hitesh 
> 
> 
> 
>  
> 
> > On Aug 25, 2016, at 8:34 AM, Chitragar, Uday (KMLWG) 
> >  wrote:
> > 
> > Hello,
> >  
> > When running parallel queries (simultaneous connections by two beeline 
> > clients to HS2), I get the following exception (full debug attached), 
> > interestingly running the queries one after the other completes without any 
> > problem. 
> >  
> > The setup is Hive (1.2.1) and Tez (0.8.4) running in local mode.
> > Apologies in advance if this forum is not the right place for this 
> > question, thank you.
> >  
> > 2016-08-25 15:45:41,333 DEBUG 
> > [TezTaskEventRouter{attempt_1472136335089_0001_1_01_00_0}]: 
> > impl.ShuffleInputEventHandlerImpl 
> > (ShuffleInputEventHandlerImpl.java:processDataMovementEvent(127)) - DME 
> > srcIdx: 0, targetIndex: 9, attemptNum
> > : 0, payload: [hasEmptyPartitions: true, host: , port: 0, pathComponent: , 
> > runDuration: 0]
> > 2016-08-25 15:45:41,557 ERROR [TezChild]: tez.MapRecordSource 
> > (MapRecordSource.java:processRow(90)) - java.lang.IllegalStateException: 
> > Invalid input path 
> > file:/acorn/QC/OraExtract/20160131/Devices/Devices_extract_20160229T080613_3
> > at 
> > org.apache.hadoop.hive.ql.exec.MapOperator.getNominalPath(MapOperator.java:415)
> > at 
> > org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp(MapOperator.java:457)
> > at 
> > org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1069)
> > at 
> > org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:501)
> >  
> >  
> >  
> > 2016-08-25 15:45:41,817 INFO  [TezChild]: io.HiveContextAwareRecordReader 
> > (HiveContextAwareRecordReader.java:doNext(326)) –
> > Cannot get partition description from 
> > file:/acorn/QC/reportlib/VM_ValEdit.24656because cannot find dir = file:/ac
> > orn/QC/reportlib/VM_ValEdit.24656 in pathToPartitionInfo: 
> > [file:/acorn/QC/OraExtract/20160131/Devices]
> >  
> >  
> >  
> > Regards,
> > Uday
> >  
> >  
> > 
> > 
> > Kantar Disclaimer 



Re: Node unable to start vertex

2016-08-25 Thread Hitesh Shah
Created https://cwiki.apache.org/confluence/display/TEZ/FAQ which might be a 
better fit for such content and other related questions down the line. 

> On Aug 25, 2016, at 1:16 PM, Hitesh Shah  wrote:
> 
> +1. Would you like to contribute the content? You should be able to add an 
> article under 
> https://cwiki.apache.org/confluence/display/TEZ/Troubleshooting+articles.
> 
> If you hit any permission issues, feel free to reply back with your 
> confluence id. 
> 
> thanks
> — Hitesh 
> 
> 
>> On Aug 25, 2016, at 12:59 PM, Madhusudan Ramanna  wrote:
>> 
>> Thanks, #2 worked !
>> 
>> Might be a good idea to add to confluence ?
>> 
>> Madhu
>> 
>> 
>> On Thursday, August 25, 2016 12:00 PM, Hitesh Shah  wrote:
>> 
>> 
>> Hello Madhu, 
>> 
>> There are 2 approaches for this:
>> 
>> 1) Programmatically, for user code running in tasks, you would need to use 
>> either DAG::addTaskLocalFiles() or Vertex::addTaskLocalFiles() - former if 
>> the same jars are needed in all tasks of the DAG.  
>> TezClient::addAppMasterLocalFiles only impacts the ApplicationMaster. 
>> 
>> 2) Configure tez.aux.uris. This will ensure that all files specified here 
>> will be available in the AM and all tasks.  
>> 
>> thanks
>> — Hitesh
>> 
>>> On Aug 25, 2016, at 11:46 AM, Madhusudan Ramanna  
>>> wrote:
>>> 
>>> Hello,
>>> 
>>> I'm trying to extend TezExamplesBase and get a dag running on yarn (pseudo 
>>> cluster mode on my host).
>>> 
>>> For some reason, I'm running into class not found exception on the node
>>> 
>>> Vertex failed, vertexName=v1, vertexId=vertex_1471907702278_0030_1_00, 
>>> diagnostics=[Task failed, taskId=task_1471907702278_0030_1_00_00, 
>>> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
>>> failure ) : 
>>> attempt_1471907702278_0030_1_00_00_0:org.apache.tez.dag.api.TezReflectionException:
>>>  Unable to load class: sample.sampletez.OnetoOne$V1Processor
>>> at org.apache.tez.common.ReflectionUtils.getClazz(ReflectionUtils.java:46)
>>> at 
>>> org.apache.tez.common.ReflectionUtils.createClazzInstance(ReflectionUtils.java:87)
>>> at 
>>> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.createProcessor(LogicalIOProcessorRuntimeTask.java:668)
>>> 
>>> 
>>> Should I be adding my application jar somewhere so that it can get 
>>> distributed ? I tried adding my jar via  tezClient.addAppMasterLocalFiles() 
>>> but it didn't help.
>>> 
>>> What am I not doing ? 
>>> 
>>> thanks!
>>> Madhu
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>> 
> 



Re: Node unable to start vertex

2016-08-25 Thread Hitesh Shah
+1. Would you like to contribute the content? You should be able to add an 
article under 
https://cwiki.apache.org/confluence/display/TEZ/Troubleshooting+articles.

If you hit any permission issues, feel free to reply back with your confluence 
id. 

thanks
— Hitesh 


> On Aug 25, 2016, at 12:59 PM, Madhusudan Ramanna  wrote:
> 
> Thanks, #2 worked !
> 
> Might be a good idea to add to confluence ?
> 
> Madhu
> 
> 
> On Thursday, August 25, 2016 12:00 PM, Hitesh Shah  wrote:
> 
> 
> Hello Madhu, 
> 
> There are 2 approaches for this:
> 
> 1) Programmatically, for user code running in tasks, you would need to use 
> either DAG::addTaskLocalFiles() or Vertex::addTaskLocalFiles() - former if 
> the same jars are needed in all tasks of the DAG.  
> TezClient::addAppMasterLocalFiles only impacts the ApplicationMaster. 
> 
> 2) Configure tez.aux.uris. This will ensure that all files specified here 
> will be available in the AM and all tasks.  
> 
> thanks
> — Hitesh
> 
> > On Aug 25, 2016, at 11:46 AM, Madhusudan Ramanna  
> > wrote:
> > 
> > Hello,
> > 
> > I'm trying to extend TezExamplesBase and get a dag running on yarn (pseudo 
> > cluster mode on my host).
> > 
> > For some reason, I'm running into class not found exception on the node
> > 
> > Vertex failed, vertexName=v1, vertexId=vertex_1471907702278_0030_1_00, 
> > diagnostics=[Task failed, taskId=task_1471907702278_0030_1_00_00, 
> > diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> > failure ) : 
> > attempt_1471907702278_0030_1_00_00_0:org.apache.tez.dag.api.TezReflectionException:
> >  Unable to load class: sample.sampletez.OnetoOne$V1Processor
> > at org.apache.tez.common.ReflectionUtils.getClazz(ReflectionUtils.java:46)
> > at 
> > org.apache.tez.common.ReflectionUtils.createClazzInstance(ReflectionUtils.java:87)
> > at 
> > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.createProcessor(LogicalIOProcessorRuntimeTask.java:668)
> > 
> > 
> > Should I be adding my application jar somewhere so that it can get 
> > distributed ? I tried adding my jar via  tezClient.addAppMasterLocalFiles() 
> > but it didn't help.
> > 
> > What am I not doing ? 
> > 
> > thanks!
> > Madhu
> > 
> > 
> > 
> > 
> > 
> 
> 



Re: Parallel queries to HS2/Tez

2016-08-25 Thread Hitesh Shah
Hello Uday,

I don’t believe anyone has tried running 2 dags in parallel in local mode 
within the same TezClient ( and definitely not for HiveServer2 ). If this is 
with 2 instances of Tez client, this could likely be a bug in terms of either 
how Hive is setting up the TezClient for local mode with the same directories 
or a bug somewhere in Tez where clashing directories for intermediate data 
might be causing an issue. FWIW, the Tez AM does not support running 2 dags in 
parallel and quite a bit of this code path is used with local mode. 

It would be great if you could file a JIRA for this with more detailed logs and 
then take help of the dev community to come up with a patch that addresses the 
issue in your environment.

thanks
— Hitesh 



 

> On Aug 25, 2016, at 8:34 AM, Chitragar, Uday (KMLWG) 
>  wrote:
> 
> Hello,
>  
> When running parallel queries (simultaneous connections by two beeline 
> clients to HS2), I get the following exception (full debug attached), 
> interestingly running the queries one after the other completes without any 
> problem. 
>  
> The setup is Hive (1.2.1) and Tez (0.8.4) running in local mode.
> Apologies in advance if this forum is not the right place for this question, 
> thank you.
>  
> 2016-08-25 15:45:41,333 DEBUG 
> [TezTaskEventRouter{attempt_1472136335089_0001_1_01_00_0}]: 
> impl.ShuffleInputEventHandlerImpl 
> (ShuffleInputEventHandlerImpl.java:processDataMovementEvent(127)) - DME 
> srcIdx: 0, targetIndex: 9, attemptNum
> : 0, payload: [hasEmptyPartitions: true, host: , port: 0, pathComponent: , 
> runDuration: 0]
> 2016-08-25 15:45:41,557 ERROR [TezChild]: tez.MapRecordSource 
> (MapRecordSource.java:processRow(90)) - java.lang.IllegalStateException: 
> Invalid input path 
> file:/acorn/QC/OraExtract/20160131/Devices/Devices_extract_20160229T080613_3
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator.getNominalPath(MapOperator.java:415)
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp(MapOperator.java:457)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1069)
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:501)
>  
>  
>  
> 2016-08-25 15:45:41,817 INFO  [TezChild]: io.HiveContextAwareRecordReader 
> (HiveContextAwareRecordReader.java:doNext(326)) –
> Cannot get partition description from 
> file:/acorn/QC/reportlib/VM_ValEdit.24656because cannot find dir = file:/ac
> orn/QC/reportlib/VM_ValEdit.24656 in pathToPartitionInfo: 
> [file:/acorn/QC/OraExtract/20160131/Devices]
>  
>  
>  
> Regards,
> Uday
>  
>  
> 
> 



Re: Node unable to start vertex

2016-08-25 Thread Hitesh Shah
Hello Madhu, 

There are 2 approaches for this:

1) Programmatically, for user code running in tasks, you would need to use 
either DAG::addTaskLocalFiles() or Vertex::addTaskLocalFiles() - former if the 
same jars are needed in all tasks of the DAG.  
TezClient::addAppMasterLocalFiles only impacts the ApplicationMaster. 

2) Configure tez.aux.uris. This will ensure that all files specified here will 
be available in the AM and all tasks.  
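For the second approach, a minimal tez-site.xml fragment might look like the following; the property name is from the thread, but the HDFS path is purely an example - point it at wherever your jar(s) actually live:

```xml
<!-- tez-site.xml: localize these files into the AM and every task container. -->
<!-- The path below is illustrative, not a real location. -->
<property>
  <name>tez.aux.uris</name>
  <value>hdfs:///apps/myapp/lib/myapp.jar</value>
</property>
```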

thanks
— Hitesh

> On Aug 25, 2016, at 11:46 AM, Madhusudan Ramanna  wrote:
> 
> Hello,
> 
> I'm trying to extend TezExamplesBase and get a dag running on yarn (pseudo 
> cluster mode on my host).
> 
> For some reason, I'm running into class not found exception on the node
> 
> Vertex failed, vertexName=v1, vertexId=vertex_1471907702278_0030_1_00, 
> diagnostics=[Task failed, taskId=task_1471907702278_0030_1_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1471907702278_0030_1_00_00_0:org.apache.tez.dag.api.TezReflectionException:
>  Unable to load class: sample.sampletez.OnetoOne$V1Processor
> at org.apache.tez.common.ReflectionUtils.getClazz(ReflectionUtils.java:46)
> at 
> org.apache.tez.common.ReflectionUtils.createClazzInstance(ReflectionUtils.java:87)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.createProcessor(LogicalIOProcessorRuntimeTask.java:668)
> 
> 
> Should I be adding my application jar somewhere so that it can get 
> distributed ? I tried adding my jar via  tezClient.addAppMasterLocalFiles() 
> but it didn't help.
> 
> What am I not doing ? 
> 
> thanks!
> Madhu
> 
> 
> 
> 
> 



Re: Extra JAR files in the minimal distribution package?

2016-08-16 Thread Hitesh Shah
Hello Nathaniel, 

You are probably right that they should not be included, as long as the cluster 
classpath used contains the MR jars. I believe these jars were retained as a result of 
using yarn.application.classpath for augmenting the runtime classpath when 
using the classpath from the cluster instead of the full tarball approach. The 
yarn.application.classpath config by default brings in only common, hdfs and 
yarn jars and not necessarily the MR jars. 
 
@Jon Eagles, @Jason Lowe - do you have any additional comments on how this is 
deployed at Yahoo? I believe you use a combination of the minimal tarball and 
the mapreduce/hadoop tarball - in this case, have you removed the MR jars from 
the minimal tarball? 

thanks
— Hitesh


> On Aug 16, 2016, at 4:43 AM, Nathaniel Braun  wrote:
> 
> Hello,
> 
> I'm building the 0.8.3 version of Tez to test on my cluster.
> 
> Looking at the minimal distribution package, I can see the following two JAR 
> files inside:
>- hadoop-mapreduce-client-common
>- hadoop-mapreduce-client-core
> 
> Aren't these supposed to be excluded from the build, the same way other 
> Hadoop libraries are?
> 
> Thanks!
> 
> Regards,
> Nathaniel



Re: Questions about Tez

2016-08-12 Thread Hitesh Shah
When comparing just a simple MR job to a Tez dag with 2 vertices, the perf 
improvements are limited (as the plan is pretty much the same and data is 
transferred via a shuffle edge):
   - container re-use
    - pipelined sorter vs the MR sorter ( your mileage may vary here depending 
on the kind of workload )
   - auto-reduce parallelism
   - dynamic splits grouping for the map vertex ( assuming splits are 
calculated in the client )
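For reference, the last two items above are controlled by Tez configuration knobs. The property names below are believed correct for Tez 0.8.x, and the values are purely illustrative, not recommendations:

```xml
<!-- tez-site.xml: illustrative values only; tune per workload. -->
<property>
  <name>tez.shuffle-vertex-manager.enable.auto-parallel</name>
  <value>true</value> <!-- auto-reduce parallelism -->
</property>
<property>
  <name>tez.grouping.min-size</name>
  <value>52428800</value> <!-- 50 MB lower bound per grouped split -->
</property>
<property>
  <name>tez.grouping.max-size</name>
  <value>1073741824</value> <!-- 1 GB upper bound per grouped split -->
</property>
```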

For the second question: the dag plan/structure and the processor are both 
user code, and therefore which output to write to is driven by user/business 
logic. If you write a tee processor, it could write to all outputs if needed. A 
processor which re-partitions data to different downstream vertices could be 
coded to write different data to each output if needed. The MapProcessor and 
ReduceProcessor assume MR semantics, which means they always expect one input 
and one output (likewise for the WordCount example). 

thanks
— Hitesh

> On Aug 12, 2016, at 9:54 AM, Madhusudan Ramanna  wrote:
> 
> Hello all,
> 
> I've just started looking at Tez. I've set up Tez locally and have run a 
> sample MapReduce job with Tez as a replacement for YARN MRv2. I plan to use 
> Tez independently (no Hive or Pig). I have the following questions:
> 
> 1.  Are there performance gains to using Tez for MapReduce jobs apart from 
> container reuse? I want to make sure I'm not missing anything.
> 
> 2.  More technically, what should be the behavior of a processor when 
> generating outputs? Should it write to all KeyValueWriters returned from 
> getOutputs()? The WordCount example retrieves an output by name and writes 
> to it.  
> 
> thanks,
> Madhu



Re: Some resource about tez architecture and design document

2016-08-10 Thread Hitesh Shah
If you want to understand Tez in terms of how it is used by Hive or Pig, it 
might be best to send emails to the dev lists of Hive/Pig as needed. 

If you have any tez specific questions, you can take a look at the links I 
mentioned and example code under tez-examples. 

thanks
— Hitesh 


> On Aug 10, 2016, at 9:56 AM, darion.yaphet  wrote:
> 
> Hi Hitesh, thank you for your reply. AFAIK Tez is the execution engine 
> for Hive (it is also a set of APIs to build DAGs). So when I want 
> to understand Tez, do I first need a good understanding of Hive's execution 
> plan and AST generation?
> On 2016-08-10 01:02:31, "Hitesh Shah" wrote:
>> The following 2 links should help you get started. It might be best to start 
>> with the SIGMOD paper and one of the earlier videos. 
>> 
>> https://cwiki.apache.org/confluence/display/TEZ/How+to+Contribute+to+Tez
>> https://cwiki.apache.org/confluence/display/TEZ/Presentations%2C+publications%2C+and+articles+about+Tez
>> 
>> thanks
>> — Hitesh
>> 
>>> On Aug 9, 2016, at 8:03 AM, darion.yaphet  wrote:
>>> 
>>> Hi team :
>>> I'm a beginner learning the Tez source code. Are there any resources on the 
>>> Tez architecture, or a design document, that introduce it? 
>>> thanks 
>>> 
>>> 
>>> 
>> 



Re: Some resource about tez architecture and design document

2016-08-09 Thread Hitesh Shah
The following 2 links should help you get started. It might be best to start 
with the SIGMOD paper and one of the earlier videos. 

https://cwiki.apache.org/confluence/display/TEZ/How+to+Contribute+to+Tez
https://cwiki.apache.org/confluence/display/TEZ/Presentations%2C+publications%2C+and+articles+about+Tez

thanks
— Hitesh

> On Aug 9, 2016, at 8:03 AM, darion.yaphet  wrote:
> 
> Hi team :
> I'm a beginner learning the Tez source code. Are there any resources on the 
> Tez architecture, or a design document, that introduce it? 
> thanks 
> 
> 
>  



Re: Word Count examples run failed with Tez 0.8.4

2016-08-04 Thread Hitesh Shah
Hello 

I am assuming that this is the same issue as the one reported in TEZ-3396?

Based on the logs in the jira: 

2016-08-03 10:55:33,856 [INFO] [Thread-2] |app.DAGAppMaster|: 
DAGAppMasterShutdownHook invoked
2016-08-03 10:55:33,856 [INFO] [Thread-2] |app.DAGAppMaster|: DAGAppMaster 
received a signal. Signaling TaskScheduler

It seems like the AM is getting killed. 

Can you provide the configs being used for:
  - tez.am.resource.memory.mb
  - tez.am.launch.cmd-opts

You should also check the NodeManager logs for 
container_1470148111230_0011_01_01. It might shed light on whether the NM 
killed the AM for exceeding memory limits. 
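If the NM does turn out to be killing the AM for exceeding its limit, the usual remedy is to keep the AM's JVM heap comfortably inside the container size. An illustrative tez-site.xml fragment follows; the values are examples, and the roughly-80% heap-to-container ratio is a common rule of thumb, not a Tez requirement:

```xml
<property>
  <name>tez.am.resource.memory.mb</name>
  <value>1536</value> <!-- AM container size requested from YARN -->
</property>
<property>
  <name>tez.am.launch.cmd-opts</name>
  <value>-Xmx1228m</value> <!-- ~80% of the container size above -->
</property>
```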

thanks
— Hitesh 

> On Aug 3, 2016, at 8:50 PM, HuXi  wrote:
> 
> Default configuration was used with yarn.resourcemanager.hostname  set to 
> 0.0.0.0 and yarn.resourcemanager.address set to 0.0.0.0:8032.
> 
> If what you mentioned is really the reason, please tell me what I should do 
> to fix it. 
> 
> 
> > Date: Wed, 3 Aug 2016 20:41:31 -0700
> > Subject: Re: Word Count examples run failed with Tez 0.8.4
> > From: gop...@apache.org
> > To: user@tez.apache.org
> > 
> > 
> > > 16/08/04 09:36:00 INFO client.TezClient: The url to track the Tez AM:
> > >http://iZ25f2qedc7Z:8088/proxy/application_1470148111230_0014/
> > > 16/08/04 09:36:05 INFO client.RMProxy: Connecting to ResourceManager at
> > >/0.0.0.0:8032
> > 
> > That sounds very strange - is the resource manager really running on
> > localhost, but that resolves back to that strange hostname?
> > 
> > Cheers,
> > Gopal
> > 
> > 
> > 
> > 
> > 
> > 



Re: hung AM due to timeline timeout

2016-08-03 Thread Hitesh Shah
It might be worth filing a YARN JIRA to get it backported to 2.6.x and 2.7.x. At 
the very least, it will simplify rebuilding the timeline-server jar against the 
CDH version that you are running. 

— Hitesh
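The flush-timeout workaround discussed further down in this thread can be expressed as a tez-site.xml fragment; the property name is from the discussion, and the 1-minute value is just the example used there:

```xml
<property>
  <name>tez.yarn.ats.event.flush.timeout.millis</name>
  <value>60000</value> <!-- give up flushing ATS events after 1 min on shutdown -->
</property>
```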

> On Aug 3, 2016, at 4:42 PM, Slava Markeyev  wrote:
> 
> Thanks for the info, Hitesh. Unfortunately it seems that RollingLevelDB is 
> only in trunk; I may have to backport it to 2.6.2 (the version I use). I did 
> notice that the LevelDB store grows to tens of GB, which may be an indication 
> of pruning not happening often enough (or at all?). I also need to fix the 
> logging, as the logs for the Timeline Server don't seem to be very active 
> beyond startup.
> 
> For the job I posted before here is the associated eventQueueBacklog log line.
> 2016-08-03 19:23:27,932 [INFO] [AMShutdownThread] 
> |ats.ATSHistoryLoggingService|: Stopping ATSService, eventQueueBacklog=17553
> I'll look into lowering tez.yarn.ats.event.flush.timeout.millis while trying 
> to look into the timelineserver.
> 
> Thanks for your help,
> Slava
> 
> On Wed, Aug 3, 2016 at 2:45 PM, Hitesh Shah  wrote:
> Hello Slava,
> 
> Can you check for a log line along the lines of "Stopping ATSService, 
> eventQueueBacklog=" to see how backed up the event queue to YARN timeline is?
> 
> I have noticed this in quite a few installs with YARN Timeline where YARN 
> Timeline is using the simple LevelDB impl and not the RollingLevelDB storage 
> class. YARN Timeline ends up hitting some bottlenecks around the time 
> when the data purging happens ( it takes a global lock on the level DB ). The 
> RollingLevelDB storage impl solved this problem by using separate level DBs 
> for different time intervals and just throwing out a whole level DB instead of 
> trying to do a full scan+purge.
> 
> Another workaround, though not a great one, is to set 
> "tez.yarn.ats.event.flush.timeout.millis" to a value of, say, 60000, i.e. 1 min. 
> This implies that the Tez AM will try for at most 1 min to flush the queue to 
> YARN timeline before giving up and shutting down the Tez AM.
> 
> A longer term option is the YARN Timeline version 1.5 work currently slated 
> to be released in hadoop 2.8.0 which uses HDFS for writes instead of the 
> current web service based approach. This has a far better perf throughput for 
> writes albeit with a delay on the read path as the Timeline server scans HDFS 
> for new updates. The tez changes for this are already available in the source 
> code under the hadoop28 profile though the documentation for this is still 
> pending.
> 
> thanks
> — Hitesh
> 
> 
> 
> 
> 
> > On Aug 3, 2016, at 2:02 PM, Slava Markeyev  
> > wrote:
> >
> > I'm running into an issue that occurs fairly often (but not consistently 
> > reproducible) where yarn reports a negative value for memory allocation eg 
> > (-2048) and a 0 vcore allocation despite the AM actually running. For 
> > example the AM reports a runtime of 1hrs, 29mins, 40sec while the dag only 
> > took 880 seconds.
> >
> > After some investigating I've noticed that the AM has repeated issues 
> > contacting the timeline server after the dag is complete (error trace 
> > below). This seems to be delaying the shutdown sequence. It seems to retry 
> > every minute before either giving up or succeeding but I'm not sure which. 
> > What's the best way to debug why this would be happening and potentially 
> > shortening the timeout retry period as I'm more concerned with job 
> > completion than logging it to the timeline server. This doesn't seem to be 
> > happening consistently to all tez jobs only some.
> >
> > I'm using hive 1.1.0 and tez 0.7.1 on cdh5.4.10 (hadoop 2.6).
> >
> > 2016-08-03 19:18:22,881 [INFO] [ContainerLauncher #112] 
> > |impl.ContainerManagementProtocolProxy|: Opening proxy : node:45454
> > 2016-08-03 19:18:23,292 [WARN] [HistoryEventHandlingThread] 
> > |security.UserGroupInformation|: PriviledgedActionException as:x 
> > (auth:SIMPLE) cause:java.net.SocketTimeoutException: Read timed out
> > 2016-08-03 19:18:23,292 [ERROR] [HistoryEventHandlingThread] 
> > |impl.TimelineClientImpl|: Failed to get the response from the timeline 
> > server.
> > com.sun.jersey.api.client.ClientHandlerException: 
> > java.net.SocketTimeoutException: Read timed out
> > at 
> > com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149)
> > at 
> > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter$1.run(TimelineClientImpl.java:226)
> > at 
> > org.apache.hadoop.yarn.client.api.impl.Timel

Re: hung AM due to timeline timeout

2016-08-03 Thread Hitesh Shah
Hello Slava, 

Can you check for a log line along the lines of "Stopping ATSService, 
eventQueueBacklog=" to see how backed up the event queue to YARN timeline is? 
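[Editor's note: as a small sketch, the backlog value can be pulled out of such a log line programmatically. The sample line mirrors the one reported elsewhere in this thread; a large backlog means the AM still has that many history events to flush to YARN Timeline before it can exit.]

```python
import re

# Sample Tez AM shutdown log line, as quoted earlier in this thread.
LINE = ("2016-08-03 19:23:27,932 [INFO] [AMShutdownThread] "
        "|ats.ATSHistoryLoggingService|: Stopping ATSService, eventQueueBacklog=17553")

def backlog(line):
    """Return the eventQueueBacklog value from a log line, or None if absent."""
    m = re.search(r"eventQueueBacklog=(\d+)", line)
    return int(m.group(1)) if m else None

print(backlog(LINE))  # → 17553
```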

I have noticed this in quite a few installs with YARN Timeline where YARN 
Timeline is using the simple LevelDB impl and not the RollingLevelDB storage 
class. YARN Timeline ends up hitting some bottlenecks around the time when 
the data purging happens ( it takes a global lock on the level DB ). The 
RollingLevelDB storage impl solved this problem by using separate level DBs 
for different time intervals and just throwing out a whole level DB instead of 
trying to do a full scan+purge.

Another workaround, though not a great one, is to set 
"tez.yarn.ats.event.flush.timeout.millis" to a value of, say, 60000, i.e. 1 min. 
This implies that the Tez AM will try for at most 1 min to flush the queue to 
YARN timeline before giving up and shutting down the Tez AM. 
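[Editor's note: a sketch of the workaround above as a tez-site.xml fragment; the value is in milliseconds, so 60000 = 1 min.]

```xml
<!-- tez-site.xml: cap how long the Tez AM keeps retrying to flush queued
     history events to YARN Timeline before shutting down anyway. -->
<property>
  <name>tez.yarn.ats.event.flush.timeout.millis</name>
  <value>60000</value>
</property>
```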

A longer term option is the YARN Timeline version 1.5 work currently slated to 
be released in hadoop 2.8.0 which uses HDFS for writes instead of the current 
web service based approach. This has a far better perf throughput for writes 
albeit with a delay on the read path as the Timeline server scans HDFS for new 
updates. The tez changes for this are already available in the source code 
under the hadoop28 profile though the documentation for this is still pending. 

thanks
— Hitesh





> On Aug 3, 2016, at 2:02 PM, Slava Markeyev  wrote:
> 
> I'm running into an issue that occurs fairly often (but not consistently 
> reproducible) where yarn reports a negative value for memory allocation eg 
> (-2048) and a 0 vcore allocation despite the AM actually running. For example 
> the AM reports a runtime of 1hrs, 29mins, 40sec while the dag only took 880 
> seconds.
> 
> After some investigating I've noticed that the AM has repeated issues 
> contacting the timeline server after the dag is complete (error trace below). 
> This seems to be delaying the shutdown sequence. It seems to retry every 
> minute before either giving up or succeeding but I'm not sure which. What's 
> the best way to debug why this would be happening and potentially shortening 
> the timeout retry period as I'm more concerned with job completion than 
> logging it to the timeline server. This doesn't seem to be happening 
> consistently to all tez jobs only some.
> 
> I'm using hive 1.1.0 and tez 0.7.1 on cdh5.4.10 (hadoop 2.6).
> 
> 2016-08-03 19:18:22,881 [INFO] [ContainerLauncher #112] 
> |impl.ContainerManagementProtocolProxy|: Opening proxy : node:45454
> 2016-08-03 19:18:23,292 [WARN] [HistoryEventHandlingThread] 
> |security.UserGroupInformation|: PriviledgedActionException as:x 
> (auth:SIMPLE) cause:java.net.SocketTimeoutException: Read timed out
> 2016-08-03 19:18:23,292 [ERROR] [HistoryEventHandlingThread] 
> |impl.TimelineClientImpl|: Failed to get the response from the timeline 
> server.
> com.sun.jersey.api.client.ClientHandlerException: 
> java.net.SocketTimeoutException: Read timed out
> at 
> com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149)
> at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter$1.run(TimelineClientImpl.java:226)
> at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:162)
> at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter.handle(TimelineClientImpl.java:237)
> at com.sun.jersey.api.client.Client.handle(Client.java:648)
> at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
> at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
> at 
> com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
> at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingObject(TimelineClientImpl.java:472)
> at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.java:321)
> at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:301)
> at 
> org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.handleEvents(ATSHistoryLoggingService.java:349)
> at 
> org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.access$700(ATSHistoryLoggingService.java:53)
> at 
> org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService$1.run(ATSHistoryLoggingService.java:190)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.read(SocketInputStream.java:152)
> at java.net.SocketInputStream.read(SocketInputStream.java:122)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
> at java.io.BufferedInputStream.read1(BufferedInputStr

Re: Guide to write map-reduce code using Tez API

2016-08-01 Thread Hitesh Shah
Please check Step 7 on http://tez.apache.org/install.html

thanks
— Hitesh
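[Editor's note: for convenience, the referenced step boils down to a mapred-site.xml change along these lines, assuming the Tez tarball and tez-site.xml have already been set up per the earlier steps of the install guide.]

```xml
<!-- mapred-site.xml: route MapReduce jobs through Tez instead of the
     classic YARN MapReduce runtime. -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn-tez</value>
</property>
```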

> On Aug 1, 2016, at 10:25 AM, zhiyuan yang  wrote:
> 
> The nice thing about Tez is that it’s compatible with the MapReduce API. So 
> if you just want to run MapReduce on Tez, you just learn how to write standard 
> MapReduce and change the execution engine to Tez.
> 
> To change the execution engine of MapReduce, please change the configuration 
> mapreduce.framework.name. (Not 100% sure about this; correct me if 
> I’m wrong)
> 
> Thanks!
> Zhiyuan
> 
>> On Aug 1, 2016, at 10:19 AM, Sudhir.Kumar  wrote:
>> 
>> Hello All,
>> 
>> I have just started to read about Tez.
>> 
>> Is there a document to understand the Tez Java APIs which can be used to 
>> write map-reduce code?
>> 
>> Thanks,
>> 
>> Sudhir
> 



Re: Tez error

2016-07-30 Thread Hitesh Shah
If your cluster has a running YARN timeline server, setting up the UI is quite 
straightforward: http://tez.apache.org/tez-ui.html

— Hitesh

> On Jul 30, 2016, at 10:21 AM, Sandeep Khurana  wrote:
> 
> Hitesh
> 
> Tons of thanks. I am able to run Tez job from a hive query on mapR cluster. 
> I.e. hive query now fires a tez job on mapR. 
> 
> As you pointed out, I made 2 mistakes. First, not configuring the mapR 
> repository in the tez pom.xml - somehow I thought I had done it. Second, I was 
> not using the minimal tarball but the full tar; now using the minimal tar. 
> 
> With these 2 changes, things worked. Also, I ran tez-examples.jar with hadoop 
> jar command, this also worked and executed a tez job. 
> 
> Now I will have to read tez docs to find out if there is a way to see the tez 
> graph in a tez view on mapR  like I see in Ambari on HDP?
> 
> 
> 
> 
> On Sat, Jul 30, 2016 at 10:25 PM, Sandeep Khurana  
> wrote:
> Sorry, by mistake sent half written message.
> 
> I will try both now and update this thread.
> 
> About trying tez bundle, I tried with the steps at 
> http://doc.mapr.com/display/MapR41/Installing+and+Configuring+Tez+0.5.3 . But 
> the hive job gave a NumberFormatError, and I found out by googling that there 
> is a version mismatch between the tez and hadoop libs.
> 
> On Sat, Jul 30, 2016 at 10:22 PM, Sandeep Khurana  
> wrote:
> Hitesh
> 
> Both of things 
> 
> On Sat, Jul 30, 2016 at 10:20 PM, Hitesh Shah  wrote:
> Hello Sandeep,
> 
> 2 things to check:
>- When compiling Tez, is the hadoop.version in the top-level pom ( and 
> addition of mapr’s maven repo ) being used to compile against MapR’s hadoop 
> distribution and not the std. apache release? The Tez AM cannot seem to do a 
> handshake with the YARN RM. If MapR changed anything in that code path, this 
> issue could crop up.
>- For tez.lib.uris, are you using the minimal tez tarball and not the full 
> one? The full one has hadoop jars in it so it should not be used in your 
> deployment setup as you are using MapR specific jars. You may want to check 
> for jar clashes in terms of the content of the tarball vs the jars from the 
> cluster. The only ones we do bundle in the minimal one are some mapreduce 
> jars I believe.
> 
> FWIW, I think MapR does provide a Tez bundle for their clusters. You may wish 
> to check on their user lists too to see if they made some changes. If you do 
> find out anything, please do let us know in case this requires some fixes in 
> Tez for better compatibility across all distros.
> 
> thanks
> — Hitesh
> 
> 
> 
> > On Jul 30, 2016, at 8:55 AM, Sandeep Khurana  wrote:
> >
> > Hello
> >
> >
> > On mapR single node cluster, I compiled Tez as per the documentation on 
> > apache tez site.
> >
> > The only thing I changed from documentation was in tez-site.xml, where I 
> > gave the value of tez.use.cluster.hadoop-libs as true. I did it because 
> > mapR has its own jars for hadoop components. And without these tez might 
> > not work.
> >
> > When I run any MR job or hive query I get this below error in the job on RM 
> > link. (DIGEST-MD5: digest response format violation. Mismatched URI: 
> > default/; expecting: null/default)
> >
> > I am using mapr 4.1 and tez 0.7.0. mapR 4.1 uses hive 1.2.0.
> >
> > Any pointer or suggestion what might be causing this issue or where should 
> > I look?
> >
> > 2016-07-30 09:58:38,737 INFO [main] ipc.Server: Stopping server on 45868
> > 2016-07-30 09:58:38,737 INFO [IPC Server Responder] ipc.Server: Stopping 
> > IPC Server Responder
> > 2016-07-30 09:58:38,739 INFO [IPC Server listener on 45868] ipc.Server: 
> > Stopping IPC Server listener on 45868
> > 2016-07-30 09:58:38,739 INFO [IPC Server Responder] ipc.Server: Stopping 
> > IPC Server Responder
> > 2016-07-30 09:58:38,739 ERROR [main] app.DAGAppMaster: Error starting 
> > DAGAppMaster
> > org.apache.tez.dag.api.TezUncheckedException: 
> > javax.security.sasl.SaslException: DIGEST-MD5: digest response format 
> > violation. Mismatched URI: default/; expecting: null/default [Caused by 
> > org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): 
> > DIGEST-MD5: digest response format violation. Mismatched URI: default/; 
> > expecting: null/default]
> >   at 
> > org.apache.tez.dag.app.rm.YarnTaskSchedulerService.serviceStart(YarnTaskSchedulerService.java:384)
> >   at 
> > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> >   at 
> > org.apache.tez.dag.app.rm.TaskSchedulerEventHandler.serviceStart(TaskSch

Re: Tez error

2016-07-30 Thread Hitesh Shah
Hello Sandeep,

2 things to check: 
   - When compiling Tez, is the hadoop.version in the top-level pom ( and 
addition of mapr’s maven repo ) being used to compile against MapR’s hadoop 
distribution and not the std. apache release? The Tez AM cannot seem to do a 
handshake with the YARN RM. If MapR changed anything in that code path, this 
issue could crop up. 
   - For tez.lib.uris, are you using the minimal tez tarball and not the full 
one? The full one has hadoop jars in it so it should not be used in your 
deployment setup as you are using MapR specific jars. You may want to check for 
jar clashes in terms of the content of the tarball vs the jars from the 
cluster. The only ones we do bundle in the minimal one are some mapreduce jars 
I believe.

FWIW, I think MapR does provide a Tez bundle for their clusters. You may wish 
to check on their user lists too to see if they made some changes. If you do 
find out anything, please do let us know in case this requires some fixes in 
Tez for better compatibility across all distros.

thanks
— Hitesh
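[Editor's note: as a rough sketch, the first check above amounts to adding MapR's Maven repository to the top-level pom and overriding hadoop.version at build time. Both the repository URL and the version string are illustrative placeholders to adapt, not values verified against MapR's docs.]

```xml
<!-- Top-level pom.xml (sketch): let the Tez build resolve MapR's Hadoop
     artifacts. URL is a placeholder - check MapR's documentation for the
     repository matching your cluster version. -->
<repositories>
  <repository>
    <id>mapr-releases</id>
    <url>https://repository.mapr.com/maven/</url>
  </repository>
</repositories>
```

Then build against the matching Hadoop, e.g. `mvn clean package -DskipTests -Dhadoop.version=<mapr-hadoop-version>`, and deploy the resulting minimal tarball (not the full one) via tez.lib.uris.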
 


> On Jul 30, 2016, at 8:55 AM, Sandeep Khurana  wrote:
> 
> Hello
> 
> 
> On mapR single node cluster, I compiled Tez as per the documentation on 
> apache tez site.
> 
> The only thing I changed from documentation was in tez-site.xml, where I gave 
> the value of tez.use.cluster.hadoop-libs as true. I did it because mapR has 
> its own jars for hadoop components. And without these tez might not work.
> 
> When I run any MR job or hive query I get this below error in the job on RM 
> link. (DIGEST-MD5: digest response format violation. Mismatched URI: 
> default/; expecting: null/default) 
> 
> I am using mapr 4.1 and tez 0.7.0. mapR 4.1 uses hive 1.2.0. 
> 
> Any pointer or suggestion what might be causing this issue or where should I 
> look?
> 
> 2016-07-30 09:58:38,737 INFO [main] ipc.Server: Stopping server on 45868
> 2016-07-30 09:58:38,737 INFO [IPC Server Responder] ipc.Server: Stopping IPC 
> Server Responder
> 2016-07-30 09:58:38,739 INFO [IPC Server listener on 45868] ipc.Server: 
> Stopping IPC Server listener on 45868
> 2016-07-30 09:58:38,739 INFO [IPC Server Responder] ipc.Server: Stopping IPC 
> Server Responder
> 2016-07-30 09:58:38,739 ERROR [main] app.DAGAppMaster: Error starting 
> DAGAppMaster
> org.apache.tez.dag.api.TezUncheckedException: 
> javax.security.sasl.SaslException: DIGEST-MD5: digest response format 
> violation. Mismatched URI: default/; expecting: null/default [Caused by 
> org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): 
> DIGEST-MD5: digest response format violation. Mismatched URI: default/; 
> expecting: null/default]
>   at 
> org.apache.tez.dag.app.rm.YarnTaskSchedulerService.serviceStart(YarnTaskSchedulerService.java:384)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.tez.dag.app.rm.TaskSchedulerEventHandler.serviceStart(TaskSchedulerEventHandler.java:353)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.tez.dag.app.DAGAppMaster$ServiceWithDependency.start(DAGAppMaster.java:1573)
>   at 
> org.apache.tez.dag.app.DAGAppMaster$ServiceThread.run(DAGAppMaster.java:1591)
> Caused by: javax.security.sasl.SaslException: DIGEST-MD5: digest response 
> format violation. Mismatched URI: default/; expecting: null/default [Caused 
> by org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): 
> DIGEST-MD5: digest response format violation. Mismatched URI: default/; 
> expecting: null/default]
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:104)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:109)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>   at com.sun.proxy.$Proxy24.registerApplicationMaster(Unknown Source)
>   at 
> org.apache.hadoop.yarn.client.ap

Re: Could Tez 0.5.4 Integrate with Hive 2.X

2016-07-29 Thread Hitesh Shah
That is highly unlikely to work as Hive-2.x requires APIs introduced in Tez 
0.8.x.

thanks
— Hitesh 

> On Jul 28, 2016, at 8:56 PM, darion.yaphet  wrote:
> 
> Hi team :
> 
> We are using hadoop 2.5.0 and hive 1.2.1 tez 0.5.4 . Now we want to upgrade 
> to hive 2.X  . Could Tez 0.5.4 support Hive 2.X ?
> 
> thanks ~~
> 
> 
>  



Re: Getting ClosedByInterruptException when DAG w/ edge executes

2016-07-20 Thread Hitesh Shah
Either emails to the dev list or specific JIRAs work well for any usability 
issues that you come across - be it missing or unclear docs, APIs that could 
require cleaning up, bugs, or potential helper libraries to make things easier. 
Pretty much any feedback ( and/or patches ) is welcome :)

thanks
— Hitesh 


> On Jul 1, 2016, at 10:48 AM, Scott McCarty  wrote:
> 
> I'm still in the middle of design and code (learning along the way!) but can 
> certainly provide feedback.  Aside from commits (something I'm definitely not 
> thinking I should be doing), what's a good way for this?



Re: Getting ClosedByInterruptException when DAG w/ edge executes

2016-07-01 Thread Hitesh Shah
Thanks for the update, Scott. 

Given that the APIs have mostly been used by other framework developers, there 
are probably quite a few things that are not easily surfaced in javadocs, usage 
examples ( or the lack thereof ), etc. It would be great if you can provide 
feedback ( and patches ) to help address such shortcomings. 

Also, would you mind providing some more details on how you are using Tez? 

— Hitesh 

> On Jul 1, 2016, at 7:27 AM, Scott McCarty  wrote:
> 
> Thanks for responding.
> 
> After much hair pulling I found and fixed this.  It was due to my not calling 
> setFromConfiguration(tezConf) on OrderedPartitionedKVEdgeConfig (other 
> builders probably require the same call).  The comments in the sample code 
> say that the call is optional (allowing override of the config with command 
> line parameters) but that appears not to be the case, at least for my code :-(
> 
> I also needed to make sure that the TezConfiguration I passed to it had been 
> used in the call UserGroupInformation.setConfiguration(tezConf).  There's a lot 
> of behind-the-scenes stuff I wasn't aware of...
> 
> --Scott
> 
> On Thu, Jun 30, 2016 at 3:48 PM, Siddharth Seth  wrote:
> Scott,
> Do you have logs for the entire job. I haven't seen this error before . The 
> trace may be end result of an earlier failure / decision made to kill the 
> task - which causes the task to be interrupted, and hence the trace.
> 
> Thanks,
> Sid
> 
> On Wed, Jun 29, 2016 at 10:00 AM, Scott McCarty  wrote:
> Hi,
> 
> I am trying to get Tez 0.9.0-SNAPSHOT (latest commit as of this writing, but 
> still fails with earlier 0.9.0 commits) working with vanilla hadoop 2.6.0 but 
> it's failing with the following under certain conditions:
> 
> java.lang.RuntimeException: java.io.IOException: Failed on local exception: 
> java.nio.channels.ClosedByInterruptException; Host Details : local host is: 
> "localhost/127.0.0.1"; destination host is: "localhost":9000; 
>   at 
> org.apache.hadoop.mapreduce.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:209)
>   at 
> org.apache.hadoop.mapreduce.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initialize(TezGroupedSplitsInputFormat.java:156)
>   at 
> org.apache.tez.mapreduce.lib.MRReaderMapReduce.setupNewRecordReader(MRReaderMapReduce.java:157)
>   at 
> org.apache.tez.mapreduce.lib.MRReaderMapReduce.setSplit(MRReaderMapReduce.java:88)
>   at 
> org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:694)
>   at 
> org.apache.tez.mapreduce.input.MRInput.processSplitEvent(MRInput.java:622)
>   at org.apache.tez.mapreduce.input.MRInput.handleEvents(MRInput.java:586)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.handleEvent(LogicalIOProcessorRuntimeTask.java:715)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.access$600(LogicalIOProcessorRuntimeTask.java:105)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$1.runInternal(LogicalIOProcessorRuntimeTask.java:792)
>   at org.apache.tez.common.RunnableWithNdc.run(RunnableWithNdc.java:35)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Failed on local exception: 
> java.nio.channels.ClosedByInterruptException; Host Details : local host is: 
> "localhost/127.0.0.1"; destination host is: "localhost":9000; 
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1472)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1399)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
>   at com.sun.proxy.$Proxy15.getFileInfo(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>   at com.sun.proxy.$Proxy16.getFileInfo(Unknown Source)
>   at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1988)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1118)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFi

Re: Tez Job Counters

2016-06-28 Thread Hitesh Shah
Hello Muhammad,

Did you try any of the calls to YARN timeline as described by Rajesh in his 
earlier reply? 

thanks
— Hitesh
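[Editor's note: the timeline endpoints quoted below return JSON entities whose counters sit under otherInfo -> counters -> counterGroups, as Rajesh described. A sketch of flattening them; the sample payload is illustrative, not captured from a real cluster.]

```python
import json

# Sketch: extract counters from a TEZ_DAG_ID entity as returned by e.g.
#   GET http://<timeline-host>:8188/ws/v1/timeline/TEZ_DAG_ID/<dag_id>
# (hostname and ids are placeholders).

def dag_counters(entity):
    """Flatten a timeline entity's counters into {group/name: value}."""
    flat = {}
    groups = entity.get("otherInfo", {}).get("counters", {}).get("counterGroups", [])
    for group in groups:
        gname = group.get("counterGroupName", "")
        for counter in group.get("counters", []):
            flat[gname + "/" + counter["counterName"]] = counter["counterValue"]
    return flat

# Illustrative sample shaped like a Tez DAG timeline entity.
sample = json.loads("""
{
  "entity": "dag_1466689310983_0021_409",
  "otherInfo": {
    "counters": {
      "counterGroups": [
        {"counterGroupName": "org.apache.tez.common.counters.DAGCounter",
         "counters": [{"counterName": "NUM_SUCCEEDED_TASKS",
                       "counterValue": 42}]}
      ]
    }
  }
}
""")

print(dag_counters(sample))
# → {'org.apache.tez.common.counters.DAGCounter/NUM_SUCCEEDED_TASKS': 42}
```

The same helper works for vertex, task, and task-attempt entities, since they share the counter layout.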

> On Jun 28, 2016, at 1:20 PM, Muhammad Haris 
>  wrote:
> 
> Hi,
> Could anybody please guide me how to get all task level counters? Thanks a lot
> 
> 
> 
> Regards
> 
> On Mon, Jun 27, 2016 at 5:43 PM, Muhammad Haris 
>  wrote:
> Hi Rajesh,
> Thank you for quick help
> 
> 
> Regards
> 
> On Mon, Jun 27, 2016 at 4:56 PM, Rajesh Balamohan  
> wrote:
> You can refer to Tez-UI which makes use of these APIs extensively.  
> https://tez.apache.org/tez-ui.html provides 
> details on setting this up.
> 
> AppID:
> http://atsmachine:8088/ws/v1/cluster/apps/application_1466689310983_0024
> 
> Tez DAG:
> http://atsmachine:8188/ws/v1/timeline/TEZ_DAG_ID?limit=10
> http://atsmachine:8188/ws/v1/timeline/TEZ_DAG_ID/dag_1466689310983_0021_409 
> 
> This should have the counters in the otherInfo --> counters --> counterGroups
> 
> Tez vertex:
> http://atsmachine:8188/ws/v1/timeline/TEZ_VERTEX_ID?limit=10&primaryFilter=TEZ_DAG_ID:dag_1466689310983_0023_408
> 
> Tez task:
> http://atsmachine:8188/ws/v1/timeline/TEZ_TASK_ID?limit=1000&primaryFilter=TEZ_DAG_ID:dag_1466689310983_0023_408
> 
> Tez task attempt:
> http://atsmachine:8188/ws/v1/timeline/TEZ_TASK_ATTEMPT_ID?limit=1000&primaryFilter=TEZ_DAG_ID:dag_1466689310983_0023_408
> 
> ~Rajesh.B
> 
> On Mon, Jun 27, 2016 at 3:50 AM, Muhammad Haris 
>  wrote:
> Hi,
> I have implemented an application that runs Hive jobs and pulls its job 
> counters using Job History server REST API, Job History server REST API that 
> i am using is: 
> http://<address:port>/ws/v1/history/mapreduce/jobs/{jobid}/counters 
> 
> I have been looking for similar REST API to get job counters for Hive on Tez 
> jobs, i understand Tez jobs history is pushed into YARN timeline server but i 
> failed to find any API in YARN timeline server through which i can pull the 
> job counters.
> 
> Any help/suggestions will be highly appreciated
> Regards
> Haris Akhtar
> 
> 
> 
> 



Re: Tez Job fails - waiting for AM container to be allocated

2016-06-18 Thread Hitesh Shah
Hi Ananda,

Yes - looks like the RM assigned a container for the Tez AM. Next up would be 
to search for “container_e54_1466115469995_0142_01_01” in the nodemanager 
logs on host  usw2stdpwo12.glassdoor.local. 

Also, did the app logs of application_1466115469995_0142 shed any light ( 
obtained via bin/yarn logs -applicationId application_1466115469995_0142 )?

— Hitesh

> On Jun 17, 2016, at 11:31 PM, Anandha L Ranganathan  
> wrote:
> 
> Hitesh,
> 
> This is the information, I see in the RM logs.  There are enough resources 
> available on that NM. 
> 
> 
> 2016-06-17 19:04:50,406 INFO  scheduler.SchedulerNode 
> (SchedulerNode.java:allocateContainer(154)) - Assigned container 
> container_e54_1466115469995_0142_01_01 of capacity  vCores:1> on host usw2stdpwo12.glassdoor.local:45454, which has 1 containers, 
>  used and  available after 
> allocation
> 2016-06-17 19:04:50,406 INFO  capacity.LeafQueue 
> (LeafQueue.java:assignContainer(1633)) - assignedContainer application 
> attempt=appattempt_1466115469995_0142_01 container=Container: 
> [ContainerId: container_e54_1466115469995_0142_01_01, NodeId: 
> usw2stdpwo12.glassdoor.local:45454, NodeHttpAddress: 
> usw2stdpwo12.glassdoor.local:8042, Resource: , 
> Priority: 0, Token: null, ] queue=default: capacity=0.2, 
> absoluteCapacity=0.2, usedResources=, 
> usedCapacity=0.61731374, absoluteUsedCapacity=0.12345679, numApps=3, 
> numContainers=2 clusterResource= type=OFF_SWITCH
> 2016-06-17 19:04:50,407 INFO  security.NMTokenSecretManagerInRM 
> (NMTokenSecretManagerInRM.java:createAndGetNMToken(200)) - Sending NMToken 
> for nodeId : usw2stdpwo12.glassdoor.local:45454 for container
> 
> On Fri, Jun 17, 2016 at 6:38 PM, Hitesh Shah  wrote:
> -dev@tez for now.
> 
> Hello Anandha,
> 
> The usual issue with this is a lack of resources. e.g. no cluster capacity to 
> launch the AM, queue configs not allowing another AM to launch, the memory 
> size configured for the AM is too large such that it cannot be scheduled on 
> any existing node, etc.
> 
> Can you search for this string “1466115469995_0142” within the 
> ResourceManager logs? That should shed some more light on what is going on.
> 
> thanks
> — Hitesh
> 
> 
> > On Jun 17, 2016, at 6:30 PM, Anandha L Ranganathan  
> > wrote:
> >
> > Yes.  sufficient resources  are available for that job.  No other jobs are 
> > running and only this job is running.
> >
> >
> >
> > On Fri, Jun 17, 2016 at 5:16 PM, Jeff Zhang  wrote:
> > Please check RM UI whether you have sufficient resources for your app
> >
> >
> > On Sat, Jun 18, 2016 at 7:35 AM, Anandha L Ranganathan 
> >  wrote:
> > I am upgrading one of our cluster from HDP 2.2 to HDP 2.4.0. version.
> >
> >
> >
> > The status I see in the Application monitoring URL is
> >
> > YARN Application Status: ACCEPTED: waiting for AM container to be
> > allocated, launched and register with RM.  But when we submit the MR job,
> > then it is running fine.
> >
> > It waits in that state for sometime(300 seconds) and dies and the service
> > check is failed.  All nodes are live and Active status.
> >
> >
> >
> > We try to run the job manually , and the job stops at this point.
> >
> > hadoop --config /usr/hdp/2.4.0.0-169/hadoop/conf jar
> > /usr/hdp/current/tez-client/tez-examples*.jar orderedwordcount
> > /tmp/tezsmokeinput/sample-tez-test /tmp/tezsmokeoutput1/
> > WARNING: Use "yarn jar" to launch YARN applications.
> > 16/06/17 19:04:47 INFO client.TezClient: Tez Client Version: [
> > component=tez-api, version=0.7.0.2.4.0.0-169,
> > revision=3c1431f45faaca982ecc8dad13a107787b834696,
> > SCM-URL=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git,
> > buildTime=20160210-0711 ]
> > 16/06/17 19:04:47 INFO impl.TimelineClientImpl: Timeline service
> > address: http://usw2stdpma03.glassdoor.local:8188/ws/v1/timeline/
> > 16/06/17
> > 19:04:48 INFO client.RMProxy: Connecting to ResourceManager at
> > usw2stdpma03.glassdoor.local/172.17.212.107:8050
> > 16/06/17 19:04:48 INFO client.TezClient: Using
> > org.apache.tez.dag.history.ats.acls.ATSHistoryACLPolicyManager to
> > manage Timeline ACLs
> > 16/06/17 19:04:48 INFO impl.TimelineClientImpl: Timeline service
> > address: http://usw2stdpma03.glassdoor.local:8188/ws/v1/timeline/
> > 16/06/17
> > 19:04:49 INFO examples.OrderedWordCount: Running OrderedWordCount
> > 16/06/17 19:04:49 INFO client.TezC

Re: Tez Job fails - waiting for AM container to be allocated

2016-06-17 Thread Hitesh Shah
-dev@tez for now.

Hello Anandha, 

The usual cause here is a lack of resources, e.g. no cluster capacity to launch 
the AM, queue configs that do not allow another AM to launch, or an AM memory 
size configured so large that it cannot be scheduled on any existing node. 
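As one concrete illustration of the queue-config case: on clusters using the Capacity Scheduler, the share of queue capacity that AMs may consume is capped by a stock Hadoop property. A minimal fragment (the 0.5 value is only an example of raising the cap, not a recommendation):

```xml
<!-- capacity-scheduler.xml: raise the fraction of cluster resources
     usable by ApplicationMasters (default is 0.1 in stock Hadoop;
     0.5 here is an illustrative value) -->
<property>
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <value>0.5</value>
</property>
```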

Can you search for this string “1466115469995_0142” within the ResourceManager 
logs? That should shed some more light on what is going on. 
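The search itself is a plain grep; the RM log path below is an assumption (it varies by install), so the live command is shown as a comment and the pattern is demonstrated against a sample log line:

```shell
# Hypothetical RM log path -- adjust for your installation.
RM_LOG=/var/log/hadoop-yarn/yarn-yarn-resourcemanager.log
APP_ID=1466115469995_0142
# On the ResourceManager host you would run:
#   grep "${APP_ID}" "${RM_LOG}"
# Demonstration of the pattern against a sample log line:
echo "2016-06-17 19:04:50 INFO ... application_${APP_ID} State change from SUBMITTED to ACCEPTED" \
  | grep "${APP_ID}"
```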

thanks
— Hitesh 


> On Jun 17, 2016, at 6:30 PM, Anandha L Ranganathan  
> wrote:
> 
> Yes, sufficient resources are available for that job. No other jobs are 
> running; only this one.
>  
> 
> 
> On Fri, Jun 17, 2016 at 5:16 PM, Jeff Zhang  wrote:
> Please check RM UI whether you have sufficient resources for your app
> 
> 
> On Sat, Jun 18, 2016 at 7:35 AM, Anandha L Ranganathan 
>  wrote:
> I am upgrading one of our clusters from HDP 2.2 to HDP 2.4.0.
> 
> 
> 
> The status I see in the Application monitoring URL is
> 
> YARN Application Status: ACCEPTED: waiting for AM container to be
> allocated, launched and registered with RM. But when we submit an MR job,
> it runs fine.
> 
> It waits in that state for some time (300 seconds) and dies, and the service
> check fails. All nodes are live and in Active status.
> 
> 
> 
> We tried to run the job manually, and the job stops at this point.
> 
> hadoop --config /usr/hdp/2.4.0.0-169/hadoop/conf jar
> /usr/hdp/current/tez-client/tez-examples*.jar orderedwordcount
> /tmp/tezsmokeinput/sample-tez-test /tmp/tezsmokeoutput1/
> WARNING: Use "yarn jar" to launch YARN applications.
> 16/06/17 19:04:47 INFO client.TezClient: Tez Client Version: [
> component=tez-api, version=0.7.0.2.4.0.0-169,
> revision=3c1431f45faaca982ecc8dad13a107787b834696,
> SCM-URL=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git,
> buildTime=20160210-0711 ]
> 16/06/17 19:04:47 INFO impl.TimelineClientImpl: Timeline service
> address: http://usw2stdpma03.glassdoor.local:8188/ws/v1/timeline/
> 16/06/17 19:04:48 INFO client.RMProxy: Connecting to ResourceManager at
> usw2stdpma03.glassdoor.local/172.17.212.107:8050
> 16/06/17 19:04:48 INFO client.TezClient: Using
> org.apache.tez.dag.history.ats.acls.ATSHistoryACLPolicyManager to
> manage Timeline ACLs
> 16/06/17 19:04:48 INFO impl.TimelineClientImpl: Timeline service
> address: http://usw2stdpma03.glassdoor.local:8188/ws/v1/timeline/
> 16/06/17 19:04:49 INFO examples.OrderedWordCount: Running OrderedWordCount
> 16/06/17 19:04:49 INFO client.TezClient: Submitting DAG application
> with id: application_1466115469995_0142
> 16/06/17 19:04:49 INFO client.TezClientUtils: Using tez.lib.uris value
> from configuration: /hdp/apps/2.4.0.0-169/tez/tez.tar.gz
> 16/06/17 19:04:49 INFO client.TezClient: Stage directory
> /tmp/root/staging doesn't exist and is created
> 16/06/17 19:04:49 INFO client.TezClient: Tez system stage directory
> hdfs://dfs-nameservices/tmp/root/staging/.tez/application_1466115469995_0142
> doesn't exist and is created
> 16/06/17 19:04:49 INFO acls.ATSHistoryACLPolicyManager: Created
> Timeline Domain for History ACLs,
> domainId=Tez_ATS_application_1466115469995_0142
> 16/06/17 19:04:50 INFO client.TezClient: Submitting DAG to YARN,
> applicationId=application_1466115469995_0142,
> dagName=OrderedWordCount, callerContext={ context=TezExamples,
> callerType=null, callerId=null }
> 16/06/17 19:04:50 INFO impl.YarnClientImpl: Submitted application
> application_1466115469995_0142
> 16/06/17 19:04:50 INFO client.TezClient: The url to track the Tez AM:
> http://usw2stdpma03.glassdoor.local:8088/proxy/application_1466115469995_0142/
> 16/06/17 19:04:50 INFO impl.TimelineClientImpl: Timeline service address:
> http://usw2stdpma03.glassdoor.local:8188/ws/v1/timeline/
> 16/06/17 19:04:50 INFO client.RMProxy: Connecting to ResourceManager at
> usw2stdpma03.glassdoor.local/172.17.212.107:8050
> 16/06/17 19:04:51 INFO client.DAGClientImpl: Waiting for DAG to start running
> 
> 
> 
> How do I fix this problem?
> 
> Thanks
> Anand
> 
> 
> 
> -- 
> Best Regards
> 
> Jeff Zhang
> 



Re: Tez 0.8.3 on EMR hanging with Hive task

2016-06-15 Thread Hitesh Shah
If log aggregation is not enabled, the next best thing would be to download the 
application master logs from the RM UI for the apps in question. Those would 
provide a good starting point for figuring out what is going on. 
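Since the "Log aggregation has not completed or is not enabled" message appears later in this thread, the aggregation switch itself is worth checking; a minimal yarn-site.xml fragment (property name from stock Hadoop) would be:

```xml
<!-- yarn-site.xml: enable YARN log aggregation so that
     "yarn logs -applicationId <appId>" works after an app finishes -->
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
```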

thanks
— Hitesh 


> On Jun 15, 2016, at 8:29 AM, Jose Rozanec  
> wrote:
> 
> Hello, 
> 
> An update: it seems we misunderstood something. Hive returned an error for 
> the query while the Tez job kept running without reporting progress. We did 
> not cancel it, since it seemed to have hung. After two hours it was reported 
> as finished on the UI, while it still showed a running state when listed 
> from YARN for some time longer, and then finally finished.
> We have log aggregation enabled, but after the job finished we still get the 
> same message as reported in the previous email.
> 
> We will now research why Hive detached from Tez while it was still running, 
> and whether we can improve query acceptance times, since it takes a while to 
> start executing complex queries.
> 
> Thanks, 
> 
> 
> 
> 
> 2016-06-15 12:09 GMT-03:00 Jose Rozanec :
> Hello, 
> 
> I ran the command, and got the following message:
> 16/06/15 15:07:35 INFO impl.TimelineClientImpl: Timeline service address: 
> http://ip-10-64-23-215.ec2.internal:8188/ws/v1/timeline/
> 16/06/15 15:07:35 INFO client.RMProxy: Connecting to ResourceManager at 
> ip-10-64-23-215.ec2.internal/10.64.23.215:8032
> /var/log/hadoop-yarn/apps/hadoop/logs/application_1465996511770_0001 does not 
> exist.
> Log aggregation has not completed or is not enabled.
> 
> Are we perhaps missing some configuration that would give us more insight?
> 
> Thanks!
> 
> Joze.
> 
> 2016-06-15 12:03 GMT-03:00 Hitesh Shah :
> Hello Joze,
> 
> Would it be possible for you to provide the YARN application logs obtained 
> via “bin/yarn logs -applicationId ” for both of the cases you have 
> seen? Feel free to file JIRAs and attach logs to each of them.
> 
> thanks
> — Hitesh
> 
> > On Jun 15, 2016, at 7:38 AM, Jose Rozanec  
> > wrote:
> >
> > Hello,
> >
> > We are experiencing some issues with Tez 0.8.3 when we issue heavy queries 
> > from Hive. It seems some jobs hang on Tez and never return. Those jobs show 
> > up in the DAG web UI, but no progress is reported on the UI or in the Hive 
> > logs. Any ideas why this could happen? We notice it happens with certain 
> > memory configurations; without them, the job dies soon (we guess due to OOM).
> >
> > Most probably not related to this, at some point we also got the following 
> > error: "org.apache.tez.dag.api.SessionNotRunning: TezSession has already 
> > shutdown. Application x failed 2 times due to AM Container". We are not 
> > sure whether it can be related to TEZ-2663, which should be solved from 
> > version 0.7.1 onwards.
> >
> > Thanks in advance,
> >
> > Joze.
> 
> 
> 



Re: Tez 0.8.3 on EMR hanging with Hive task

2016-06-15 Thread Hitesh Shah
Hello Joze, 

Would it be possible for you to provide the YARN application logs obtained via 
“bin/yarn logs -applicationId ” for both of the cases you have seen? 
Feel free to file JIRAs and attach logs to each of them.

thanks
— Hitesh

> On Jun 15, 2016, at 7:38 AM, Jose Rozanec  
> wrote:
> 
> Hello, 
> 
> We are experiencing some issues with Tez 0.8.3 when we issue heavy queries 
> from Hive. It seems some jobs hang on Tez and never return. Those jobs show 
> up in the DAG web UI, but no progress is reported on the UI or in the Hive 
> logs. Any ideas why this could happen? We notice it happens with certain 
> memory configurations; without them, the job dies soon (we guess due to OOM).
> 
> Most probably not related to this, at some point we also got the following 
> error: "org.apache.tez.dag.api.SessionNotRunning: TezSession has already 
> shutdown. Application x failed 2 times due to AM Container". We are not 
> sure whether it can be related to TEZ-2663, which should be solved from 
> version 0.7.1 onwards.
> 
> Thanks in advance, 
> 
> Joze.



Re: My first TEZ job fails

2016-05-20 Thread Hitesh Shah
To clarify, 

if you do the following: 

hdfs dfs -mkdir /tez/
hdfs dfs -put $TEZ_HOME/share/tez-0.7.1.tar.gz /tez/

Then tez.lib.uris should be “/tez/tez-0.7.1.tar.gz” 
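Expressed as a tez-site.xml fragment (paths taken from the commands above; a sketch for this particular layout, not the only valid one):

```xml
<configuration>
  <property>
    <name>tez.lib.uris</name>
    <!-- absolute HDFS path of the full Tez tarball uploaded above -->
    <value>/tez/tez-0.7.1.tar.gz</value>
  </property>
  <property>
    <name>tez.version</name>
    <value>0.7.1</value>
  </property>
</configuration>
```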

— Hitesh 


> On May 20, 2016, at 5:11 PM, Mich Talebzadeh  
> wrote:
> 
> I think the instructions can be simplified as follows
> 
> wherever you have your tarball say $TEZ_HOME/share/tez-0.7.1.tar.gz
> 
> you create that absolute directory in your HDFS and put your tarball there
> 
> Also your uris should refer to the said absolute path (a single directory 
> structure)
> 
>   tez.lib.uris
> /usr/lib/apache-tez-0.7.1-bin/share
>  
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  
> http://talebzadehmich.wordpress.com
>  
> 
> On 21 May 2016 at 01:06, Mich Talebzadeh  wrote:
> Thanks Hitesh,
> 
> Here
> 
> tez.lib.uris should be a single value and the value should be the absolute 
> path pointing the tez tarball on HDFS
> 
> OK
> 
> are we talking about the one under $TEZ_HOME/share?
> 
> hduser@rhes564: /usr/lib/apache-tez-0.7.1-bin/share> ltr
> -rw-r--r-- 1 hduser hadoop 39694439 May  4 18:47 tez-0.7.1.tar.gz
> 
> 
> 
> And my uris would be the abosulte path to the above tarball under 
> /usr/lib/apache-tez-0.7.1-bin/share
> 
> cat tez-site.xml
> 
>   
> tez.version
> 0.7.1
>   
>   
> tez.lib.uris
> /usr/lib/apache-tez-0.7.1-bin/share
>   
> 
> hduser@rhes564: /home/hduser/hadoop-2.6.0/etc/Hadoop>
> 
> 
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  
> http://talebzadehmich.wordpress.com
>  
> 
> On 21 May 2016 at 00:58, Hitesh Shah  wrote:
> Hello Mich,
> 
> Yes those are the instructions.
> 
> tez.lib.uris should be a single value and the value should be the absolute 
> path pointing the tez tarball on HDFS. The tarball to be uploaded to HDFS can 
> be found at apache-tez-{x.y.z}/share/tez.tar.gz ( or 
> tez-dist/target/tez-x.y.x.tar.gz if you are using your local build ).
> 
> thanks
> — Hitesh
> 
> > On May 20, 2016, at 4:39 PM, Mich Talebzadeh  
> > wrote:
> >
> > This is the instruction?
> >
> > Created by Hitesh Shah, last modified on May 02, 2016 Go to start of 
> > metadata
> > Making use of the Tez Binary Release tarball
> >
> >   • If the binary tarball's name does not include anything referring to 
> > a hadoop version, then this implies that the tarball was compiled against 
> > the hadoop version that the Tez release compiles against by default. For 
> > example, for 0.7.0 and 0.8.0, the default hadoop version used is 2.6.0 ( 
> > this can be found by looking for the hadoop.version property in the 
> > top-level pom.xml in the source tarball for the release).
> >
> >   • The tarball structure is as follows:
> >
> >
> > ?
> > apache-tez-{x.y.z}/
> >   /tez*.jar
> >   /lib/*.jar
> >   /conf/tez*.xml.template
> >   /share/tez.tar.gz
> >   • Set up Tez by following INSTALL.txt and use 
> > apache-tez-{x.y.z}/share/tez.tar.gz as the full tarball to be uploaded to 
> > HDFS.
> >   • Use the config templates under apache-tez-{x.y.z}/conf/ to create 
> > the tez-site.xml as needed in an appropriate conf directory. If you end up 
> > using apache-tez-{x.y.z}/conf/, then do an export 
> > TEZ_CONF_DIR="apache-tez-{x.y.z}/conf/"
> >   • Add "apache-tez-{x.y.z}/*:apache-tez-{x.y.z}/lib/*:${TEZ_CONF_DIR}" 
> > to HADOOP_CLASSPATH so as to get the tez client jars onto the classpath 
> > invoked when using the "bin/hadoop jar" command to run an example job.
> >
> >
> > Dr Mich Talebzadeh
> >
> > LinkedIn  
> > https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> >
> > http://talebzadehmich.wordpress.com
> >
> >
> > On 21 May 2016 at 00:37, Mich Talebzadeh  wrote:
> > Thanks both
> >
> > so this is the file that needs to go in hdfs correct?
> >
> > hduser@rhes564: /usr/lib/apache-tez-0.7.1-bin/share> ltr tez-0.7.1.tar.gz
> > -rw-r--r-- 1 hduser hadoop 39694439 May  4 18:47 tez-0.7.1.tar.gz
> >
> >
> > In hdfs I have now
> >
> > hduser@rhes564: /usr/lib/apache-tez-0.7.1-bin/share> hdfs dfs -ls 
> > /usr/lib/apache-tez-0.7.1-bin
> >
> > -rw-r--r--   2 hduser supergroup   39694439 2016-05-21 00:31 
> > /usr/

Re: My first TEZ job fails

2016-05-20 Thread Hitesh Shah
Sorry - I forgot to call that out in the instructions; the config templates 
are generated only from 0.8.x onwards. Will update the wiki.

— Hitesh 

> On May 20, 2016, at 4:51 PM, Mich Talebzadeh  
> wrote:
> 
> With reference to
> 
> ... then do an export TEZ_CONF_DIR="apache-tez-{x.y.z}/conf/"
> 
> By the way, there is no conf directory under apache-tez-0.7.1-bin! There are 
> only lib and share.
> 
> hduser@rhes564: /usr/lib/apache-tez-0.7.1-bin> ltr|grep '^d'
> drwxr-xr-x  2 hduser hadoop4096 May 20 22:58 share
> drwxr-xr-x  2 hduser hadoop4096 May 20 22:58 lib
> drwxr-xr-x 85 root   root 40960 May 20 22:58 ..
> drwxr-xr-x  4 hduser hadoop4096 May 20 23:18 .
> hduser@rhes564: /usr/lib/apache-tez-0.7.1-bin>
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  
> http://talebzadehmich.wordpress.com
>  
> 
> On 21 May 2016 at 00:42, Mich Talebzadeh  wrote:
> Please bear in mind that all I want is to test Hive with the Tez engine. I 
> already have Hive working OK with the Spark 1.3.1 engine, which I compiled 
> from source code, so hopefully I can use Tez as the engine as well.
> 
> 
> thanks
> 
> 
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  
> http://talebzadehmich.wordpress.com
>  
> 
> On 21 May 2016 at 00:39, Mich Talebzadeh  wrote:
> This is the instruction?
> 
> Created by Hitesh Shah, last modified on May 02, 2016 Go to start of metadata
> Making use of the Tez Binary Release tarball
> 
>   • If the binary tarball's name does not include anything referring to a 
> hadoop version, then this implies that the tarball was compiled against the 
> hadoop version that the Tez release compiles against by default. For example, 
> for 0.7.0 and 0.8.0, the default hadoop version used is 2.6.0 ( this can be 
> found by looking for the hadoop.version property in the top-level pom.xml in 
> the source tarball for the release).
> 
>   • The tarball structure is as follows:
> 
> 
> ?
> apache-tez-{x.y.z}/
>   /tez*.jar
>   /lib/*.jar
>   /conf/tez*.xml.template
>   /share/tez.tar.gz
>   • Set up Tez by following INSTALL.txt and use 
> apache-tez-{x.y.z}/share/tez.tar.gz as the full tarball to be uploaded to 
> HDFS.
>   • Use the config templates under apache-tez-{x.y.z}/conf/ to create the 
> tez-site.xml as needed in an appropriate conf directory. If you end up using 
> apache-tez-{x.y.z}/conf/, then do an export 
> TEZ_CONF_DIR="apache-tez-{x.y.z}/conf/"
>   • Add "apache-tez-{x.y.z}/*:apache-tez-{x.y.z}/lib/*:${TEZ_CONF_DIR}" 
> to HADOOP_CLASSPATH so as to get the tez client jars onto the classpath 
> invoked when using the "bin/hadoop jar" command to run an example job.
>  
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  
> http://talebzadehmich.wordpress.com
>  
> 
> On 21 May 2016 at 00:37, Mich Talebzadeh  wrote:
> Thanks both
> 
> so this is the file that needs to go in hdfs correct?
> 
> hduser@rhes564: /usr/lib/apache-tez-0.7.1-bin/share> ltr tez-0.7.1.tar.gz
> -rw-r--r-- 1 hduser hadoop 39694439 May  4 18:47 tez-0.7.1.tar.gz
> 
> 
> In hdfs I have now
> 
> hduser@rhes564: /usr/lib/apache-tez-0.7.1-bin/share> hdfs dfs -ls 
> /usr/lib/apache-tez-0.7.1-bin
> 
> -rw-r--r--   2 hduser supergroup   39694439 2016-05-21 00:31 
> /usr/lib/apache-tez-0.7.1-bin/tez-0.7.1.tar.gz
> 
> 
> Now I only installed tez under
> 
> /usr/lib/apache-tez-0.7.1-bin
> 
> MY Hadoop is installed in
> 
> echo $HADOOP_HOME
> /home/hduser/hadoop-2.6.0
> 
> and my site XML files are in
> 
> $HADOOP_HOME/etc/hadoop
> 
> OK, and this is my tez-site.xml content:
> 
> hduser@rhes564: /home/hduser/hadoop-2.6.0/etc/hadoop> cat tez-site.xml
> 
>   
> tez.version
> 0.7.1
>   
>   
> tez.lib.uris
> 
> /usr/lib/apache-tez-0.7.1-bin,/usr/lib/apache-tez-0.7.1-bin/lib
>   
> 
> 
> Is the part in red (the tez.lib.uris value) correct, please?
> 
> 
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  
> http://talebzadehmich.wordpress.com
>  
> 
> On 21 May 2016 at 00:24, Bikas Saha  wrote:
> >> tez.lib.uris assumes that paths are based on the default fs and therefore 
> >> if your setup is using HDFS as default, the paths /usr/lib would be invalid
> 

Re: My first TEZ job fails

2016-05-20 Thread Hitesh Shah
Hello Mich, 

Yes those are the instructions. 

tez.lib.uris should be a single value and the value should be the absolute path 
pointing the tez tarball on HDFS. The tarball to be uploaded to HDFS can be 
found at apache-tez-{x.y.z}/share/tez.tar.gz ( or 
tez-dist/target/tez-x.y.x.tar.gz if you are using your local build ).

thanks
— Hitesh

> On May 20, 2016, at 4:39 PM, Mich Talebzadeh  
> wrote:
> 
> This is the instruction?
> 
> Created by Hitesh Shah, last modified on May 02, 2016 Go to start of metadata
> Making use of the Tez Binary Release tarball
> 
>   • If the binary tarball's name does not include anything referring to a 
> hadoop version, then this implies that the tarball was compiled against the 
> hadoop version that the Tez release compiles against by default. For example, 
> for 0.7.0 and 0.8.0, the default hadoop version used is 2.6.0 ( this can be 
> found by looking for the hadoop.version property in the top-level pom.xml in 
> the source tarball for the release).
> 
>   • The tarball structure is as follows:
> 
> 
> ?
> apache-tez-{x.y.z}/
>   /tez*.jar
>   /lib/*.jar
>   /conf/tez*.xml.template
>   /share/tez.tar.gz
>   • Set up Tez by following INSTALL.txt and use 
> apache-tez-{x.y.z}/share/tez.tar.gz as the full tarball to be uploaded to 
> HDFS.
>   • Use the config templates under apache-tez-{x.y.z}/conf/ to create the 
> tez-site.xml as needed in an appropriate conf directory. If you end up using 
> apache-tez-{x.y.z}/conf/, then do an export 
> TEZ_CONF_DIR="apache-tez-{x.y.z}/conf/"
>   • Add "apache-tez-{x.y.z}/*:apache-tez-{x.y.z}/lib/*:${TEZ_CONF_DIR}" 
> to HADOOP_CLASSPATH so as to get the tez client jars onto the classpath 
> invoked when using the "bin/hadoop jar" command to run an example job.
>  
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  
> http://talebzadehmich.wordpress.com
>  
> 
> On 21 May 2016 at 00:37, Mich Talebzadeh  wrote:
> Thanks both
> 
> so this is the file that needs to go in hdfs correct?
> 
> hduser@rhes564: /usr/lib/apache-tez-0.7.1-bin/share> ltr tez-0.7.1.tar.gz
> -rw-r--r-- 1 hduser hadoop 39694439 May  4 18:47 tez-0.7.1.tar.gz
> 
> 
> In hdfs I have now
> 
> hduser@rhes564: /usr/lib/apache-tez-0.7.1-bin/share> hdfs dfs -ls 
> /usr/lib/apache-tez-0.7.1-bin
> 
> -rw-r--r--   2 hduser supergroup   39694439 2016-05-21 00:31 
> /usr/lib/apache-tez-0.7.1-bin/tez-0.7.1.tar.gz
> 
> 
> Now I only installed tez under
> 
> /usr/lib/apache-tez-0.7.1-bin
> 
> MY Hadoop is installed in
> 
> echo $HADOOP_HOME
> /home/hduser/hadoop-2.6.0
> 
> and my xml-site in in
> 
> $HADOOP_HOME/etc/hadoop
> 
> OK and this is my sml-site content
> 
> hduser@rhes564: /home/hduser/hadoop-2.6.0/etc/hadoop> cat tez-site.xml
> 
>   
> tez.version
> 0.7.1
>   
>   
> tez.lib.uris
> 
> /usr/lib/apache-tez-0.7.1-bin,/usr/lib/apache-tez-0.7.1-bin/lib
>   
> 
> 
> Is the part in red (the tez.lib.uris value) correct, please?
> 
> 
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  
> http://talebzadehmich.wordpress.com
>  
> 
> On 21 May 2016 at 00:24, Bikas Saha  wrote:
> >> tez.lib.uris assumes that paths are based on the default fs and therefore 
> >> if your setup is using HDFS as default, the paths /usr/lib would be invalid
> 
> Are you sure? The paths below look right to me except that the contents of 
> the directories are wrong.
> 
> tez.lib.uris
> /usr/lib/apache-tez-0.7.1-bin,/usr/lib/apache-tez-0.7.1-bin/lib
> 
> hduser@rhes564: /usr/lib/apache-tez-0.7.1-bin> hdfs dfs -ls 
> /usr/lib/apache-tez-0.7.1-bin Found 2 items
> -rw-r--r--   2 hduser supergroup   53092828 2016-05-20 23:15 
> /usr/lib/apache-tez-0.7.1-bin/apache-tez-0.7.1-bin.tar.gz
> drwxr-xr-x   - hduser supergroup  0 2016-05-20 23:27 
> /usr/lib/apache-tez-0.7.1-bin/lib
> 
> 
> -Original Message-
> From: Hitesh Shah [mailto:hit...@apache.org]
> Sent: Friday, May 20, 2016 4:18 PM
> To: user@tez.apache.org
> Subject: Re: My first TEZ job fails
> 
> Can you try the instructions mentioned at 
> https://cwiki.apache.org/confluence/display/TEZ/Tez+Release+FAQ ?
> 
> tez.lib.uris assumes that paths are based on the default fs and therefore if 
> your setup is using HDFS as default, the paths /usr/lib would be invalid.
> 
> — Hitesh
> 
> > On May 20, 2016, at 3:39 PM, Mich Talebzadeh  
> > wrote:
> >
> > 

Re: My first TEZ job fails

2016-05-20 Thread Hitesh Shah
Sorry - missed the fact that the /usr/lib paths were also being used on HDFS.  

In any case, /usr/lib/apache-tez-0.7.1-bin and /usr/lib/apache-tez-0.7.1-bin/lib 
will only contain the contents of the minimal tarball and therefore will not 
work out of the box, as the relevant hadoop-yarn jars would be missing. The 
Release FAQ wiki has the correct instructions on how to use the binary release. 

— Hitesh


> On May 20, 2016, at 4:24 PM, Bikas Saha  wrote:
> 
>>> tez.lib.uris assumes that paths are based on the default fs and therefore 
>>> if your setup is using HDFS as default, the paths /usr/lib would be invalid
> 
> Are you sure? The below paths looks right to me except that the contents of 
> the directories are wrong.
> 
> tez.lib.uris
> /usr/lib/apache-tez-0.7.1-bin,/usr/lib/apache-tez-0.7.1-bin/lib
> 
> hduser@rhes564: /usr/lib/apache-tez-0.7.1-bin> hdfs dfs -ls 
> /usr/lib/apache-tez-0.7.1-bin Found 2 items
> -rw-r--r--   2 hduser supergroup   53092828 2016-05-20 23:15 
> /usr/lib/apache-tez-0.7.1-bin/apache-tez-0.7.1-bin.tar.gz
> drwxr-xr-x   - hduser supergroup  0 2016-05-20 23:27 
> /usr/lib/apache-tez-0.7.1-bin/lib
> 
> 
> -Original Message-
> From: Hitesh Shah [mailto:hit...@apache.org] 
> Sent: Friday, May 20, 2016 4:18 PM
> To: user@tez.apache.org
> Subject: Re: My first TEZ job fails
> 
> Can you try the instructions mentioned at 
> https://cwiki.apache.org/confluence/display/TEZ/Tez+Release+FAQ ?
> 
> tez.lib.uris assumes that paths are based on the default fs and therefore if 
> your setup is using HDFS as default, the paths /usr/lib would be invalid.
> 
> — Hitesh
> 
>> On May 20, 2016, at 3:39 PM, Mich Talebzadeh  
>> wrote:
>> 
>> Still failing with /apache-tez-0.7.1-bin I am afraid.
>> 
>> OK this is my tez-site.xml
>> 
>> hduser@rhes564: /home/hduser/hadoop-2.6.0/etc/hadoop> cat tez-site.xml 
>> 
>>  
>>tez.version
>>0.7.1
>>  
>>  
>>tez.lib.uris
>>
>> /usr/lib/apache-tez-0.7.1-bin,/usr/lib/apache-tez-0.7.1-bin/lib
>>  
>> 
>> 
>> This is what I have put in hdfs directory
>> 
>> hduser@rhes564: /usr/lib/apache-tez-0.7.1-bin> hdfs dfs -ls 
>> /usr/lib/apache-tez-0.7.1-bin Found 2 items
>> -rw-r--r--   2 hduser supergroup   53092828 2016-05-20 23:15 
>> /usr/lib/apache-tez-0.7.1-bin/apache-tez-0.7.1-bin.tar.gz
>> drwxr-xr-x   - hduser supergroup  0 2016-05-20 23:27 
>> /usr/lib/apache-tez-0.7.1-bin/lib
>> 
>> Also I put all /usr/lib/apache-tez-0.7.1-bin/lib/*.jar in 
>> /usr/lib/apache-tez-0.7.1-bin/lib
>> 
>> hduser@rhes564: /usr/lib/apache-tez-0.7.1-bin> hdfs dfs -ls 
>> /usr/lib/apache-tez-0.7.1-bin/lib Found 22 items
>> -rw-r--r--   2 hduser supergroup 124846 2016-05-20 23:27 
>> /usr/lib/apache-tez-0.7.1-bin/lib/RoaringBitmap-0.4.9.jar
>> -rw-r--r--   2 hduser supergroup  41123 2016-05-20 23:27 
>> /usr/lib/apache-tez-0.7.1-bin/lib/commons-cli-1.2.jar
>> -rw-r--r--   2 hduser supergroup  58160 2016-05-20 23:27 
>> /usr/lib/apache-tez-0.7.1-bin/lib/commons-codec-1.4.jar
>> -rw-r--r--   2 hduser supergroup 588337 2016-05-20 23:27 
>> /usr/lib/apache-tez-0.7.1-bin/lib/commons-collections-3.2.2.jar
>> -rw-r--r--   2 hduser supergroup 751238 2016-05-20 23:27 
>> /usr/lib/apache-tez-0.7.1-bin/lib/commons-collections4-4.1.jar
>> -rw-r--r--   2 hduser supergroup 185140 2016-05-20 23:27 
>> /usr/lib/apache-tez-0.7.1-bin/lib/commons-io-2.4.jar
>> -rw-r--r--   2 hduser supergroup 284220 2016-05-20 23:27 
>> /usr/lib/apache-tez-0.7.1-bin/lib/commons-lang-2.6.jar
>> -rw-r--r--   2 hduser supergroup1599627 2016-05-20 23:27 
>> /usr/lib/apache-tez-0.7.1-bin/lib/commons-math3-3.1.1.jar
>> -rw-r--r--   2 hduser supergroup1648200 2016-05-20 23:27 
>> /usr/lib/apache-tez-0.7.1-bin/lib/guava-11.0.2.jar
>> -rw-r--r--   2 hduser supergroup 664918 2016-05-20 23:27 
>> /usr/lib/apache-tez-0.7.1-bin/lib/hadoop-mapreduce-client-common-2.6.0.jar
>> -rw-r--r--   2 hduser supergroup1509399 2016-05-20 23:27 
>> /usr/lib/apache-tez-0.7.1-bin/lib/hadoop-mapreduce-client-core-2.6.0.jar
>> -rw-r--r--   2 hduser supergroup 130458 2016-05-20 23:27 
>> /usr/lib/apache-tez-0.7.1-bin/lib/jersey-client-1.9.jar
>> -rw-r--r--   2 hduser supergroup 147952 2016-05-20 23:27 
>> /usr/lib/apache-tez-0.7.1-bin/lib/jersey-json-1.9.jar
>> -rw-r--r--   2 hduser supergroup  81743 2016-05-20 23:27 
>> /usr/lib/apache-tez-0.7.1-bin/lib/jettison-1.3.4.jar
>> -rw-r--r--   2 hduser supergroup 539912 2016-05-20 23:27 
>> /usr/lib/apache-tez-0.

Re: My first TEZ job fails

2016-05-20 Thread Hitesh Shah
(Subject.java:720)
> 
> at 
> org.apache.hadoop.security.UserGroupInformation.getTokenIdentifiers(UserGroupInformation.java:1400)
> 
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.selectNMTokenIdentifier(ContainerManagerImpl.java:618)
> 
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:699)
> 
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagementProtocolPBServiceImpl.startContainers(ContainerManagementProtocolPBServiceImpl.java:60)
> 
> at 
> org.apache.hadoop.yarn.proto.ContainerManagementProtocol$ContainerManagementProtocolService$2.callBlockingMethod(ContainerManagementProtocol.java:95)
> 
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
> 
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
> 
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
> 
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
> 
> at java.security.AccessController.doPrivileged(Native Method)
> 
> at javax.security.auth.Subject.doAs(Subject.java:422)
> 
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> 
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
> 
> at org.apache.hadoop.ipc.Client.call(Client.java:1468)
> 
> at org.apache.hadoop.ipc.Client.call(Client.java:1399)
> 
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
> 
> at com.sun.proxy.$Proxy80.startContainers(Unknown Source)
> 
> at 
> org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:96)
> 
> at 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:119)
> 
> at 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:254)
> 
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 
> at java.lang.Thread.run(Thread.java:745)
> 
> . Failing the application.
> 
> Thanks
> 
>  
> 
>  
> 
>  
> 
>  
> 
> 
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  
> http://talebzadehmich.wordpress.com
>  
>  
> 
> On 20 May 2016 at 19:06, Hitesh Shah  wrote:
> 
> Logs from `bin/yarn logs -applicationId application_1463758195355_0002` would 
> be more useful to debug your setup issue. The RM logs usually do not shed 
> much light on why an application failed.
> Can you confirm that you configured tez.lib.uris correctly to point to the 
> tez tarball on HDFS (tez tar should be the one obtained from 
> tez-dist/target/tez-0.8.3.tar.gz) ?
> 
> — Hitesh
> 
> 
> > On May 20, 2016, at 10:24 AM, Mich Talebzadeh  
> > wrote:
> >
> > Hi,
> >
> > I have just compiled and installed TEZ, trying to do a test with
> >
> > hadoop jar ./tez-examples-0.8.3.jar orderedwordcount /tmp/input/test.txt 
> > /tmp/out
> >
> > The job fails as follows. This is from yarn log
> >
> > 2016-05-20 18:19:26,945 INFO SecurityLogger.org.apache.hadoop.ipc.Server: 
> > Auth successful for appattempt_1463758195355_0002_01 (auth:SIMPLE)
> > 2016-05-20 18:19:26,950 WARN org.apache.hadoop.ipc.Server: IPC Server 
> > handler 0 on 59093, call 
> > org.apache.hadoop.yarn.api.ContainerManagementProtocolPB.startContainers 
> > from 50.140.197.217:46784 Call#2 Retry#0
> > java.lang.NoSuchMethodError: 
> > org.apache.hadoop.yarn.proto.YarnProtos$ApplicationIdProto.hashLong(J)I
> > at 
> > org.apache.hadoop.yarn.proto.YarnProtos$ApplicationIdProto.hashCode(YarnProtos.java:2616)
> > at 
> > org.apache.hadoop.yarn.proto.YarnProtos$ApplicationAttemptIdProto.hashCode(YarnProtos.java:3154)
> > at 
> > org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$NMTokenIdentifierProto.hashCode(YarnSecurityTokenProtos.java:410)
> > at 
> > org.apache.hadoop.yarn.security.NMTokenIdentifier.hashCode(NMTokenIdentifier.java:126)
> > at java.util.HashMap.hash(HashMap.java:338)
> > at java.util.HashMap.put(HashMap.java:611)
> > at java.util.HashSet.add(HashSet.java:219)
> > at 
> > javax.security.auth.Subject$ClassSet.populateSet(Subject.java:1409)
> > at javax.security.aut

Re: My first TEZ job fails

2016-05-20 Thread Hitesh Shah
Logs from `bin/yarn logs -applicationId application_1463758195355_0002` would 
be more useful to debug your setup issue. The RM logs usually do not shed much 
light on why an application failed.
Can you confirm that you configured tez.lib.uris correctly to point to the tez 
tarball on HDFS (tez tar should be the one obtained from 
tez-dist/target/tez-0.8.3.tar.gz) ? 
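For reference, a sketch of deriving a consistent tez.lib.uris value for a locally built tarball. The HDFS directory /apps/tez is an assumption, and the hdfs commands are shown as comments since they need a live cluster:

```shell
# Assumed layout for a local Tez build; adjust version and HDFS dir as needed.
TEZ_VERSION=0.8.3
LOCAL_TAR="tez-dist/target/tez-${TEZ_VERSION}.tar.gz"
HDFS_DIR=/apps/tez
# Against a live cluster you would run:
#   hdfs dfs -mkdir -p "${HDFS_DIR}"
#   hdfs dfs -put "${LOCAL_TAR}" "${HDFS_DIR}/"
# The matching tez-site.xml value is then:
echo "tez.lib.uris=${HDFS_DIR}/tez-${TEZ_VERSION}.tar.gz"
```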

— Hitesh

> On May 20, 2016, at 10:24 AM, Mich Talebzadeh  
> wrote:
> 
> Hi,
> 
> I have just compiled and installed TEZ, trying to do a test with
> 
> hadoop jar ./tez-examples-0.8.3.jar orderedwordcount /tmp/input/test.txt 
> /tmp/out
> 
> The job fails as follows. This is from yarn log
> 
> 2016-05-20 18:19:26,945 INFO SecurityLogger.org.apache.hadoop.ipc.Server: 
> Auth successful for appattempt_1463758195355_0002_01 (auth:SIMPLE)
> 2016-05-20 18:19:26,950 WARN org.apache.hadoop.ipc.Server: IPC Server handler 
> 0 on 59093, call 
> org.apache.hadoop.yarn.api.ContainerManagementProtocolPB.startContainers from 
> 50.140.197.217:46784 Call#2 Retry#0
> java.lang.NoSuchMethodError: 
> org.apache.hadoop.yarn.proto.YarnProtos$ApplicationIdProto.hashLong(J)I
> at 
> org.apache.hadoop.yarn.proto.YarnProtos$ApplicationIdProto.hashCode(YarnProtos.java:2616)
> at 
> org.apache.hadoop.yarn.proto.YarnProtos$ApplicationAttemptIdProto.hashCode(YarnProtos.java:3154)
> at 
> org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$NMTokenIdentifierProto.hashCode(YarnSecurityTokenProtos.java:410)
> at 
> org.apache.hadoop.yarn.security.NMTokenIdentifier.hashCode(NMTokenIdentifier.java:126)
> at java.util.HashMap.hash(HashMap.java:338)
> at java.util.HashMap.put(HashMap.java:611)
> at java.util.HashSet.add(HashSet.java:219)
> at javax.security.auth.Subject$ClassSet.populateSet(Subject.java:1409)
> at javax.security.auth.Subject$ClassSet.(Subject.java:1369)
> at javax.security.auth.Subject.getPublicCredentials(Subject.java:720)
> at 
> org.apache.hadoop.security.UserGroupInformation.getTokenIdentifiers(UserGroupInformation.java:1400)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.selectNMTokenIdentifier(ContainerManagerImpl.java:618)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:699)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagementProtocolPBServiceImpl.startContainers(ContainerManagementProtocolPBServiceImpl.java:60)
> at 
> org.apache.hadoop.yarn.proto.ContainerManagementProtocol$ContainerManagementProtocolService$2.callBlockingMethod(ContainerManagementProtocol.java:95)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
> 2016-05-20 18:19:27,929 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Event EventType: KILL_CONTAINER sent to absent container 
> container_1463758195355_0002_01_01
> 2016-05-20 18:19:27,944 INFO SecurityLogger.org.apache.hadoop.ipc.Server: 
> Auth successful for appattempt_1463758195355_0002_02 (auth:SIMPLE)
> 2016-05-20 18:19:27,949 WARN org.apache.hadoop.ipc.Server: IPC Server handler 
> 0 on 59093, call 
> org.apache.hadoop.yarn.api.ContainerManagementProtocolPB.startContainers from 
> 50.140.197.217:46785 Call#3 Retry#0
> java.lang.NoSuchMethodError: 
> org.apache.hadoop.yarn.proto.YarnProtos$ApplicationIdProto.hashLong(J)I
> at 
> org.apache.hadoop.yarn.proto.YarnProtos$ApplicationIdProto.hashCode(YarnProtos.java:2616)
> at 
> org.apache.hadoop.yarn.proto.YarnProtos$ApplicationAttemptIdProto.hashCode(YarnProtos.java:3154)
> at 
> org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$NMTokenIdentifierProto.hashCode(YarnSecurityTokenProtos.java:410)
> at 
> org.apache.hadoop.yarn.security.NMTokenIdentifier.hashCode(NMTokenIdentifier.java:126)
> at java.util.HashMap.hash(HashMap.java:338)
> at java.util.HashMap.put(HashMap.java:611)
> at java.util.HashSet.add(HashSet.java:219)
> at javax.security.auth.Subject$ClassSet.populateSet(Subject.java:1409)
> at javax.security.auth.Subject$ClassSet.(Subject.java:1369)
> at javax.security.auth.Subject.getPublicCredentials(Subject.java:720)
> at 
> org.apache.hadoop.security.UserGroupInformation.getTokenIdentifiers(UserGr

Re: data discrepancies related to parallelism

2016-05-05 Thread Hitesh Shah
Thanks for the info, Kurt. You may wish to post this question to the Pig lists 
too to see if anyone has seen this.

— Hitesh 


> On May 5, 2016, at 11:05 AM, Kurt Muehlner  wrote:
> 
> Hi Hitesh,
> 
> We are using Pig 0.15.0 and Tez 0.8.2.
> 
> Thanks,
> Kurt
> 
> 
> 
> On 5/5/16, 11:00 AM, "Hitesh Shah"  wrote:
> 
>> What version are you running with? 
>> 
>> thanks
>> — Hitesh



Re: data discrepancies related to parallelism

2016-05-05 Thread Hitesh Shah
What version are you running with? 

thanks
— Hitesh 

> On May 5, 2016, at 10:31 AM, Kurt Muehlner  wrote:
> 
> Hello,
> 
> We have a Pig/Tez application which is exhibiting a strange problem.  This 
> application was recently migrated from Pig/MR to Pig/Tez.  We carefully 
> vetted during QA that both MR and Tez versions produced identical results.  
> However, after deploying to production, we noticed that occasionally, results 
> are not the same (either as compared to MR results, or results of Tez 
> processing the same data on a QA cluster).
> 
> We’re still looking into the root cause, but I’d like to reach out to the 
> user group in case anyone has seen anything similar, or has suggestions on 
> what might be wrong/what to investigate.
> 
> *** What we know so far ***
> Results discrepancy occurs ONLY when the number of containers given to the 
> application by YARN is less than the number requested (we have disabled 
> auto-parallelism, and are using SET_DEFAULT_PARALLEL=50 in all pig scripts).  
> When this occurs, we also see a corresponding discrepancy in the file 
> system counters HDFS_READ_OPS and HDFS_BYTES_READ (lower when number of 
> containers is low), despite the fact that in all cases number of records 
> processed is identical.
> 
> Thus, when the production cluster is very busy, we get invalid results.  We 
> have kept a separate instance of the Pig/Tez application running on another 
> cluster where it never competes for resources, so we have been able to 
> compare results for each run of the application, which has allowed us to 
> diagnose the problem this far.  By comparing results on these two clusters, 
> we also know that the ratio (actual HDFS_READ_OPS)/(expected HDFS_READ_OPS) 
> correlates with the ratio (actual containers)/(requested containers).  
> Likewise, we see the same correlation between hdfs ops ratio and container 
> ratio.
> 
> Below are some relevant counters.  For each counter, the first line is the 
> value from the production cluster showing the problem, and the second line is 
> the value from the QA cluster running on the same data.
> 
> Any hints/suggestions/questions are most welcome.
> 
> Thanks,
> Kurt
> 
> org.apache.tez.common.counters.DAGCounter
> 
>  NUM_SUCCEEDED_TASKS=950
>  NUM_SUCCEEDED_TASKS=950
> 
>  TOTAL_LAUNCHED_TASKS=950
>  TOTAL_LAUNCHED_TASKS=950
> 
> File System Counters
> 
>  FILE_BYTES_READ=7745801982
>  FILE_BYTES_READ=8003771938
> 
>  FILE_BYTES_WRITTEN=9725468612
>  FILE_BYTES_WRITTEN=9675253887
> 
>  *HDFS_BYTES_READ=9487600888  (when number of containers equals the number 
> requested, this counter is the same between the two clusters)
>  *HDFS_BYTES_READ=17996466110
> 
>  *HDFS_READ_OPS=3080  (when number of containers equals the number requested, 
> this counter is the same between the two clusters)
>  *HDFS_READ_OPS=3600
> 
>  HDFS_WRITE_OPS=900
>  HDFS_WRITE_OPS=900
> 
> org.apache.tez.common.counters.TaskCounter
>  INPUT_RECORDS_PROCESSED=28729671
>  INPUT_RECORDS_PROCESSED=28729671
> 
> 
>  OUTPUT_RECORDS=33655895
>  OUTPUT_RECORDS=33655895
> 
>  OUTPUT_BYTES=28290888628
>  OUTPUT_BYTES=28294000270
> 
> Input(s):
> Successfully read 2254733 records (1632743360 bytes) from: "input1"
> Successfully read 2254733 records (1632743360 bytes) from: "input1"
> 
> 
> Output(s):
> Successfully stored 0 records in: "output1"
> Successfully stored 0 records in: "output1"
> 
> Successfully stored 56019 records (10437069 bytes) in: "output2"
> Successfully stored 56019 records (10437069 bytes) in: "output2"
> 
> Successfully stored 2254733 records (1651936175 bytes) in: "output3"
> Successfully stored 2254733 records (1651936175 bytes) in: "output3"
> 
> Successfully stored 1160599 records (823479742 bytes) in: "output4"
> Successfully stored 1160599 records (823480450 bytes) in: "output4"
> 
> Successfully stored 28605 records (21176320 bytes) in: "output5"
> Successfully stored 28605 records (21177552 bytes) in: "output5"
> 
> Successfully stored 6574 records (4442933 bytes) in: "output6"
> Successfully stored 6574 records (4442933 bytes) in: "output6"
> 
> Successfully stored 111416 records (164375858 bytes) in: "output7"
> Successfully stored 111416 records (164379800 bytes) in: "output7"
> 
> Successfully stored 542 records (387761 bytes) in: "output8"
> Successfully stored 542 records (387762 bytes) in: "output8"
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 



Re: Varying vcores/ram for hive queries running Tez engine

2016-05-04 Thread Hitesh Shah
Bikas’ comment (and mine below) is relevant only for task-specific settings. 
Hive does not override any settings for the Tez AM, so the tez configs for the 
AM memory/vcores will take effect at runtime. 

I believe Hive has a proxy config - hive.tez.cpu.vcores - for (3) which may be 
why your setting for (3) is not taking effect. Additionally, Hive also tends to 
fallback to MR based values if tez specific values are not specified which 
might be something else you may wish to ask on the Hive user list. 
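As an illustration of setting the Hive-side proxy configs per session, a .hql script might contain something like the sketch below (the values shown are arbitrary examples, not recommendations):

```sql
-- Per-session overrides in an .hql script (values are examples only).
-- Hive reads its own proxy configs rather than the raw tez.task.* values.
SET hive.execution.engine=tez;
SET hive.tez.container.size=2048;   -- container memory in MB
SET hive.tez.cpu.vcores=2;          -- proxy for tez.task.resource.cpu.vcores
```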

thanks
— Hitesh


> On May 4, 2016, at 10:14 PM, Bikas Saha  wrote:
> 
> IIRC 1) will override 2) since 2) is the tez config and 1) is the Hive config 
> that is a proxy for 2).
> 
> Bikas
> 
> Date: Mon, 25 Apr 2016 13:57:38 +0530
> Subject: Varying vcores/ram for hive queries running Tez engine
> From: nk94.nitinku...@gmail.com
> To: u...@hive.apache.org; user@tez.apache.org
> 
> I was trying to benchmark some hive queries. I am using the tez execution 
> engine. I varied the values of the following properties:
>   • hive.tez.container.size
>   • tez.task.resource.memory.mb
>   • tez.task.resource.cpu.vcores
> Changes in values for property 1 are reflected properly. However, it seems that 
> hive does not respect changes in values of property 3; it always allocates 
> one vcore per requested container (the RM is configured to use the 
> DominantResourceCalculator). This got me thinking about the precedence of 
> property values in hive and tez.
> I have the following questions with respect to these configurations
>   • Does hive respect the set values for the properties 2 and 3 at all?
>   • If I set property 1 to a value say 2048 MB and property 2 is set to a 
> value of say 1024 MB does this mean that I am wasting about a GB of memory 
> for each spawned container?
>   • Is there a property in hive similar to property 1 that allows me to 
> use the 'set' command in the .hql file to specify the number of vcores to use 
> per container?
>   • Changes in value for the property tez.am.resource.cpu.vcores are 
> reflected at runtime. However I do not observe the same behaviour with 
> property 3. Are there other configurations that take precedence over it?
> Your inputs and suggestions would be highly appreciated.
> 
> Thanks!
> 
> 
> PS: Tests conducted on a 5 node cluster running HDP 2.3.0



Re: Description of tez counters

2016-04-11 Thread Hitesh Shah
Take a look at TaskCounter and DAGCounter under 
https://git-wip-us.apache.org/repos/asf?p=tez.git;a=tree;f=tez-api/src/main/java/org/apache/tez/common/counters;h=df3784e54d1fa6075dcbbca8d1405e309bce1460;hb=HEAD
 and let us know if that is insufficient. 

thanks
— Hitesh 

On Apr 11, 2016, at 4:42 AM, Nitin Kumar  wrote:

> Hi!
> 
> I was going through the counters made available by the TEZ view in HDP 
> distribution (2.3.0). I tried to find descriptions for each of the available 
> counters but could not get a comprehensive list. I got the description for a 
> few of them in "Hadoop - The Definitive Guide 4th edition".
> 
> I would be highly obliged if someone could direct me to a list of the 
> description for tez counters.
> 
> Thanks and regards,
> Nitin 



Re: Unable To Build Apache Tez 0.7.1 on Hadoop 2.7

2016-04-06 Thread Hitesh Shah
We have been seeing some recent failures due to some form of npm registry 
errors even though no code has changed (  
https://issues.apache.org/jira/browse/TEZ-3201 ) - is this what you are seeing? 

BIGTOP-1826 is a different issue. That one is due to the frontend plugin not 
being compatible with certain versions of maven (0.0.22 is not compatible with 
mvn 3.3). This was fixed in Tez by switching the frontend plugin version based 
on the mvn version in use. If you check the top-level pom, it defaults to 
0.0.23 but falls back to 0.0.22 if you have an older version of mvn. 

That said, a simpler approach for your case is to remove tez-ui from the module 
list at the top-level and re-compile ( i.e. not build tez-ui module). The 
tez-ui.war is not really used by the tez runtime but only used as part of the 
Tez UI. For this, you can download the war from 
https://repository.apache.org/content/groups/public/org/apache/tez/tez-ui/ and 
use as needed if you have the UI deployed. 
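The two options above might look like the following sketch (untested; the war version/path is an example, so browse the repository for the version matching your release):

```shell
# Sketch: skip the tez-ui module by removing the <module>tez-ui</module>
# entry from the <modules> list in the top-level pom.xml, then rebuild:
mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true

# If the UI is needed, fetch a prebuilt war instead (example version/path):
wget https://repository.apache.org/content/groups/public/org/apache/tez/tez-ui/0.7.0/tez-ui-0.7.0.war
```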

thanks
— Hitesh 


On Apr 6, 2016, at 2:47 PM, Sam Joe  wrote:

> Thanks for your valuable suggestion. I used the -fae option to skip the errors 
> and proceed with the modules that are working. 
> sudo mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true 
> -Dfrontend-maven-plugin.version=0.0.22 -fae
> 
> Now I'm stuck with only the following 2 errors:
> 
> [ERROR] Failed to execute goal 
> com.github.eirslett:frontend-maven-plugin:0.0.16:npm (npm install) on project 
> tez-ui: Failed to run task: 'npm install --color=false' failed. (error code 
> 34) -> [Help 1]
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-assembly-plugin:2.4:single (package-tez) on 
> project tez-dist: Failed to create assembly: Artifact: 
> org.apache.tez:tez-ui:war:0.7.0 (included by module) does not have an 
> artifact with a file. Please ensure the package phase is run before the 
> assembly is generated. -> [Help 2]
> 
> 
> [INFO] tez ... SUCCESS [3.861s]
> [INFO] tez-api ... SUCCESS [37.493s]
> [INFO] tez-common  SUCCESS [3.843s]
> [INFO] tez-runtime-internals . SUCCESS [5.211s]
> [INFO] tez-runtime-library ... SUCCESS [9.675s]
> [INFO] tez-mapreduce . SUCCESS [11.038s]
> [INFO] tez-examples .. SUCCESS [1.355s]
> [INFO] tez-dag ... SUCCESS [21.797s]
> [INFO] tez-tests . SUCCESS [4.109s]
> [INFO] tez-ui  FAILURE [1:12.348s]
> [INFO] tez-plugins ... SUCCESS [0.057s]
> [INFO] tez-yarn-timeline-history . SUCCESS [1.816s]
> [INFO] tez-yarn-timeline-history-with-acls ... SUCCESS [1.292s]
> [INFO] tez-mbeans-resource-calculator  SUCCESS [0.640s]
> [INFO] tez-tools . SUCCESS [0.148s]
> [INFO] tez-dist .. FAILURE [8.107s]
> [INFO] Tez ... SUCCESS [0.063s]
> 
> For the npm error I see a open JIRA : 
> https://issues.apache.org/jira/browse/BIGTOP-1826
> 
> Do you have any suggestion?
> 
> Thanks.
> 
> 
> On Wed, Apr 6, 2016 at 4:13 PM Hitesh Shah  wrote:
> Not sure why you are hitting those timeouts. Are you running this on a 
> small-sized VM which may be impacting the tests? Running locally on my 
> laptop, I don’t seem to be having any issues doing a mvn test on 0.7.0 
> against hadoop-2.7.0.
> 
> One option would be to just do a fail-at-end and then re-run the failed tests 
> separately to see if they are just flaky on your env.
> 
> thanks
> — Hitesh
> 
> On Apr 6, 2016, at 12:48 PM, Sam Joe  wrote:
> 
> > Hi Hitesh,
> >
> > My environment: Hadoop 2.7.0
> > As suggested by you I'm trying with apache-tez-0.7.0-src
> >
> > I had to use the following command to avoid the npm and -Dfrontend missing errors:
> >
> > sudo mvn clean package -Dfrontend-maven-plugin.version=0.0.22
> >
> > I keep getting timeout error while building Tez as shown in the error log 
> > below:
> >
> > ---
> > Test set: org.apache.tez.client.TestTezClient
> > ---
> > Tests run: 8, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 9.013 sec 
> > <<< FAILURE!
> > testTezclientSession(org.apache.tez.client.TestTezClient)  Time e

Re: Unable To Build Apache Tez 0.7.1 on Hadoop 2.7

2016-04-06 Thread Hitesh Shah
Not sure why you are hitting those timeouts. Are you running this on a 
small-sized VM which may be impacting the tests? Running locally on my laptop, 
I don’t seem to be having any issues doing a mvn test on 0.7.0 against 
hadoop-2.7.0. 

One option would be to just do a fail-at-end and then re-run the failed tests 
separately to see if they are just flaky on your env. 

thanks
— Hitesh

On Apr 6, 2016, at 12:48 PM, Sam Joe  wrote:

> Hi Hitesh,
> 
> My environment: Hadoop 2.7.0
> As suggested by you I'm trying with apache-tez-0.7.0-src
> 
> I had to use the following command to avoid the npm and -Dfrontend missing errors:
> 
> sudo mvn clean package -Dfrontend-maven-plugin.version=0.0.22
> 
> I keep getting timeout error while building Tez as shown in the error log 
> below:
> 
> ---
> Test set: org.apache.tez.client.TestTezClient
> ---
> Tests run: 8, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 9.013 sec <<< 
> FAILURE!
> testTezclientSession(org.apache.tez.client.TestTezClient)  Time elapsed: 
> 5.062 sec  <<< ERROR!
> java.lang.Exception: test timed out after 5000 milliseconds
> 
> (Earlier I was able to build apache-tez-0.5.0-src in the same environment 
> however had to upgrade to 0.7.0 in order to avoid some bugs fixed in 0.7.0 
> version). 
> 
> Please help.
> 
> 
> Thanks,
> Joel
> 
> On Wed, Apr 6, 2016 at 3:01 PM Hitesh Shah  wrote:
> Hello Sam,
> 
> Couple of things to confirm:
> - I assume you are building branch-0.7 of Tez for 0.7.1-SNAPSHOT as there 
> has not yet been a release of 0.7.1?
> - For hadoop, are you using hadoop-2.7.0 or hadoop-2.7.1 ( though this 
> really should not be too relevant here )?
> 
> I took branch-0.7 of the current code base and compiled it against 
> hadoop-2.7.1. I also ran the test that was failing and it did not fail.
> 
> Picked up _JAVA_OPTIONS:  -Djava.awt.headless=true
> Running org.apache.tez.dag.app.TestMockDAGAppMaster
> 2016-04-06 11:31:18.076 java[20695:1903] Unable to load realm info from 
> SCDynamicStore
> Tests run: 17, Failures: 0, Errors: 0, Skipped: 3, Time elapsed: 5.788 sec
> 
> I think this might be a flaky test that we need to fix. I would suggest 
> filing a bug for this with the surefire logs for the test in question so that 
> it will help us debug the flaky failure. Please update the jira with hadoop 
> version too.  If this test is consistently failing for you, let us know ( if 
> not, a quick fix would be to just patch the code with an @Ignore to skip the 
> test )
> 
> thanks
> — Hitesh
> 
> 
> On Apr 6, 2016, at 10:57 AM, Sam Joe  wrote:
> 
> > I am using the following command for building Tez.
> >
> > sudo mvn clean package -Dfrontend-maven-plugin.version=0.0.22
> >
> > Thanks,
> > Joel
> >
> > On Wed, Apr 6, 2016 at 1:51 PM, Sam Joe  wrote:
> > Hi,
> >
> Could anyone help me in resolving the following error, which I am getting while 
> > building Apache Tez 0.7.1 on Hadoop 2.7:
> >
> > testCommitOutputOnDAGSuccess(org.apache.tez.dag.app.TestMockDAGAppMaster)  
> > Time elapsed: 0.453 sec  <<< ERROR!
> > org.apache.tez.dag.api.TezException: App master already running a DAG
> >   at 
> > org.apache.tez.dag.app.DAGAppMaster.submitDAGToAppMaster(DAGAppMaster.java:1256)
> >
> > I can see that there is a Jira found in the following link but not sure how 
> > to use the fix.
> >
> > https://issues.apache.org/jira/browse/TEZ-2307
> >
> > Please help.
> >
> > Thanks,
> > Sam
> >
> 



Re: Unable To Build Apache Tez 0.7.1 on Hadoop 2.7

2016-04-06 Thread Hitesh Shah
Hello Sam, 

Couple of things to confirm: 
- I assume you are building branch-0.7 of Tez for 0.7.1-SNAPSHOT as there 
has not yet been a release of 0.7.1?
- For hadoop, are you using hadoop-2.7.0 or hadoop-2.7.1 ( though this 
really should not be too relevant here )?

I took branch-0.7 of the current code base and compiled it against 
hadoop-2.7.1. I also ran the test that was failing and it did not fail. 

Picked up _JAVA_OPTIONS:  -Djava.awt.headless=true
Running org.apache.tez.dag.app.TestMockDAGAppMaster
2016-04-06 11:31:18.076 java[20695:1903] Unable to load realm info from 
SCDynamicStore
Tests run: 17, Failures: 0, Errors: 0, Skipped: 3, Time elapsed: 5.788 sec

I think this might be a flaky test that we need to fix. I would suggest filing 
a bug for this with the surefire logs for the test in question so that it will 
help us debug the flaky failure. Please update the jira with hadoop version 
too.  If this test is consistently failing for you, let us know ( if not, a 
quick fix would be to just patch the code with an @Ignore to skip the test )
 
thanks
— Hitesh


On Apr 6, 2016, at 10:57 AM, Sam Joe  wrote:

> I am using the following command for building Tez.
> 
> sudo mvn clean package -Dfrontend-maven-plugin.version=0.0.22
> 
> Thanks,
> Joel
> 
> On Wed, Apr 6, 2016 at 1:51 PM, Sam Joe  wrote:
> Hi,
> 
> Could anyone help me in resolving the following error, which I am getting while 
> building Apache Tez 0.7.1 on Hadoop 2.7:
> 
> testCommitOutputOnDAGSuccess(org.apache.tez.dag.app.TestMockDAGAppMaster)  
> Time elapsed: 0.453 sec  <<< ERROR!
> org.apache.tez.dag.api.TezException: App master already running a DAG
>   at 
> org.apache.tez.dag.app.DAGAppMaster.submitDAGToAppMaster(DAGAppMaster.java:1256)
> 
> I can see that there is a Jira found in the following link but not sure how 
> to use the fix.
> 
> https://issues.apache.org/jira/browse/TEZ-2307
> 
> Please help.
> 
> Thanks,
> Sam
> 



Re: Apply patches to Apache Tez

2016-04-06 Thread Hitesh Shah
Apache releases are officially source-only releases. Some projects do provide 
binary jars for convenience but that depends on the project. Additionally, 
these will only be available for “releases” only and not for each and every 
patch applied on a branch. 

In your case, the only option is to download the source code for the release in 
question ( http://tez.apache.org/releases/index.html ). Download the patch file 
from JIRA and apply the patch against the source code. Build the source and 
deploy as explained in my previous email.
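As a rough sketch of those steps (filenames and the JIRA number are placeholders, not real values):

```shell
# Sketch of applying a JIRA patch to a source release (names are examples).
tar xzf apache-tez-0.7.0-src.tar.gz && cd apache-tez-0.7.0-src

# Download the .patch attachment from the JIRA issue, then apply it.
# Most patches are generated at the repo root, so -p0 or -p1 should apply:
patch -p0 < TEZ-XXXX.patch     # or: git apply TEZ-XXXX.patch in a git checkout

# Rebuild the tarball and client jars as described in INSTALL.md:
mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true
```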

If you are willing to live with the downtime, you can re-build the code and 
replace the jars/tarball in place for the current ones ( new locations are 
needed if you want to do a live/rolling upgrade with no downtime - however the 
new location approach will also allow you to do some testing to verify 
correctness of your newly applied patches ). 

Additionally, for cases like this, feel free to ask/push the project community 
for making a new release of 0.7.1 to make your life a bit simpler. 0.7.1 has 
been pending on our plate for quite some time and we have been a bit lax on 
making a release for it. 

thanks
— Hitesh

On Apr 6, 2016, at 11:41 AM, Sam Joe  wrote:

> Hi Hitesh,
> 
> That surely helps!
> 
> However, how do I apply the .patch file to existing releases. For example, 
> Tez 0.7.0 has a bug which has been fixed through a JIRA with a .patch file 
> provided. No new set of jars are provided.
> 
> How do I apply that .patch file to my existing setup of jars?
> 
> Appreciate your help and time.
> 
> Thanks,
> Joel
> 
> On Wed, Apr 6, 2016 at 2:28 PM, Hitesh Shah  wrote:
> Every component has a different approach to how it is deployed/upgraded.
> 
> I can cover how you can go about patching Tez on an existing production 
> system. The steps should be similar to that described in INSTALL.md in the 
> source tree with a few minor gotchas to be aware of:
> 
>- Deploying Tez has 2 aspects:
>  - installing the client jars on the local filesystem which can then be 
> added to the class paths of various components such as Hive/Pig, etc that use 
> Tez. These components need the tez-api, tez-common, tez-mapreduce, 
> tez-runtime-library jars in their classpath for the most part ( this set is 
> bundled as tez-minimal tarball in tez-dist when you build Tez ). The 
> classpath manipulation is usually done by adding the tez jars to 
> HADOOP_CLASSPATH.
>  - installing the tez tarball on HDFS and configuring the configs to 
> point to the location of the tez tarball on HDFS.
> 
> Usually most bugs/patches tend to get applied to tez-dag and 
> tez-runtime-internals so for the most part you will likely only need to patch 
> the tez tarball. If you are moving to a new version, both the client side and 
> HDFS tarball need to be upgraded as there is an in-built check to ensure that 
> both sides are consistent/compatible.
> 
> To upgrade client side jars, it should be a simple option to install the new 
> jars in an appropriate location and modifying HADOOP_CLASSPATH to point to 
> the new location. Likewise for the tez tarball - upload the new tarball to a 
> new location and modify configs to point to the new location. The exact steps 
> would be the following:
>1) Upload new tez tarball to new location on HDFS
>2) Backup tez configs to a new tez config dir and modify tez.lib.uris to 
> point to the new tarball location
>3) Install new tez client side jars.
>4) Update HADOOP_CLASSPATH to contain location of new tez client jars as 
> well as new tez config dir
> 
> What the above does is ensure that existing jobs do not start failing in 
> between while things are being upgraded. As long as the old tarball is not 
> deleted while old jobs are running, existing jobs should not fail. New jobs 
> submitted with the new HADOOP_CLASSPATH will pick up the newly deployed bits.
> 
> Hope that helps
> — Hitesh
> 
> 
> On Apr 6, 2016, at 10:34 AM, Sam Joe  wrote:
> 
> > Hi,
> >
> > How do you apply patches to Tez or any other Hadoop component? For example 
> > if there is a bug in the existing classes used in a Hadoop component and 
> > it's resolved in a Jira, how do you apply that patch to the existing 
> > on-premise Hadoop setup? I think we should use Git but don't know the exact 
> > steps to do that. Please help.
> >
> >
> > Thanks,
> > Sam
> 
> 



Re: Apply patches to Apache Tez

2016-04-06 Thread Hitesh Shah
Every component has a different approach to how it is deployed/upgraded. 

I can cover how you can go about patching Tez on an existing production system. 
The steps should be similar to that described in INSTALL.md in the source tree 
with a few minor gotchas to be aware of:

   - Deploying Tez has 2 aspects: 
 - installing the client jars on the local filesystem which can then be 
added to the class paths of various components such as Hive/Pig, etc that use 
Tez. These components need the tez-api, tez-common, tez-mapreduce, 
tez-runtime-library jars in their classpath for the most part ( this set is 
bundled as tez-minimal tarball in tez-dist when you build Tez ). The classpath 
manipulation is usually done by adding the tez jars to HADOOP_CLASSPATH.
 - installing the tez tarball on HDFS and configuring the configs to point 
to the location of the tez tarball on HDFS. 

Usually most bugs/patches tend to get applied to tez-dag and 
tez-runtime-internals so for the most part you will likely only need to patch 
the tez tarball. If you are moving to a new version, both the client side and 
HDFS tarball need to be upgraded as there is an in-built check to ensure that 
both sides are consistent/compatible.  

To upgrade client side jars, it should be a simple option to install the new 
jars in an appropriate location and modifying HADOOP_CLASSPATH to point to the 
new location. Likewise for the tez tarball - upload the new tarball to a new 
location and modify configs to point to the new location. The exact steps would 
be the following: 
   1) Upload new tez tarball to new location on HDFS
   2) Backup tez configs to a new tez config dir and modify tez.lib.uris to 
point to the new tarball location
   3) Install new tez client side jars.
   4) Update HADOOP_CLASSPATH to contain location of new tez client jars as 
well as new tez config dir

What the above does is ensure that existing jobs do not start failing in 
between while things are being upgraded. As long as the old tarball is not 
deleted while old jobs are running, existing jobs should not fail. New jobs 
submitted with the new HADOOP_CLASSPATH will pick up the newly deployed bits. 
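Steps 1 through 4 above might be sketched as follows (all paths are example locations, not required ones):

```shell
# 1) Upload the new tez tarball to a NEW location on HDFS:
hdfs dfs -mkdir -p /apps/tez-patched
hdfs dfs -put tez-dist/target/tez-0.7.0.tar.gz /apps/tez-patched/

# 2) Back up the tez configs to a new config dir, then edit tez-site.xml
#    there so tez.lib.uris points at hdfs:///apps/tez-patched/tez-0.7.0.tar.gz:
cp -r /etc/tez/conf /etc/tez/conf-patched

# 3) Install the new client-side jars in a new location, e.g. /opt/tez-patched.

# 4) Point HADOOP_CLASSPATH at the new config dir and the new client jars:
export TEZ_JARS=/opt/tez-patched
export HADOOP_CLASSPATH=/etc/tez/conf-patched:${TEZ_JARS}/*:${TEZ_JARS}/lib/*
```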

Hope that helps
— Hitesh


On Apr 6, 2016, at 10:34 AM, Sam Joe  wrote:

> Hi,
> 
> How do you apply patches to Tez or any other Hadoop component? For example if 
> there is a bug in the existing classes used in a Hadoop component and it's 
> resolved in a Jira, how do you apply that patch to the existing on-premise 
> Hadoop setup? I think we should use Git but don't know the exact steps to do 
> that. Please help.
> 
> 
> Thanks,
> Sam



Re: Tez UI in Pig

2016-04-05 Thread Hitesh Shah
Hi Kurt, 

The Tez UI as documented should work with any version beyond 0.5.2 if the 
history logging is configured to use YARN timeline. As for scopes, some bits of 
the vertex description are currently not displayed in the UI though I am not 
sure if Pig has integrated with that API yet. Depending on the version of 
hadoop you are running and the scale at which you are running, there are some 
known issues with the YARN timeline impl from a scalability perspective but the 
Yahoo folks have implemented some fixes/config workarounds to get around those. 
@Jon Eagles, any chance of publishing a wiki for the configs that you recommend 
running with for YARN Timeline with the level db impl? ( and also the HDFS 
based impl, though this is not really available in any hadoop release as of 
now). 
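For reference, wiring Tez history logging to YARN Timeline is typically a tez-site.xml entry like the sketch below (the logging service class is the ATS implementation shipped in the tez-yarn-timeline-history module):

```xml
<!-- tez-site.xml: publish DAG history to YARN Timeline so the Tez UI can read it. -->
<property>
  <name>tez.history.logging.service.class</name>
  <value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
</property>
```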

If you are trying out the UI, it would be good if you also try out tez-ui2 as 
it has some enhancements coming down the pipe such as a vertex swim lane which 
provides a better overall view of the vertices and how they progress/time they 
took. The UI2 version is fairly new so feedback will be highly appreciated. 

@Rohini, has Pig started setting the vertex info? 
@Sreenath, do we have an open jira for the vertex description to be displayed 
in the UI?

thanks
— Hitesh

On Apr 5, 2016, at 11:04 AM, Kurt Muehlner  wrote:

> I have a question about the availability of the Tez web UI in Pig on Tez.  
> The Pig ‘Performance and Efficiency’ doc states, "Tez specific GUI is not 
> available yet, there is no GUI to track task progress. However, log message 
> is available in GUI.”  What does this mean, precisely?  We have not deployed 
> and configured the Tez UI described here:  
> https://tez.apache.org/tez-ui.html.  Will that UI work when running Tez on 
> Pig?  If so, what does ‘Tez specific GUI is not available yet’ mean?
> 
> What I am most specifically concerned about is the ability to see which Pig 
> aliases are being assigned to which Tez vertices, or failing that, which Pig 
> aliases are being processed by a particular Tez DAG.  This is currently not 
> available in logs in pig 0.15.0, although I’m aware it is in master.
> 
> What are best practices for Pig 0.15.0?
> 
> Thanks,
> Kurt



Re: pig on tez hang with connection reset

2016-03-23 Thread Hitesh Shah
For any of the earlier hangs when the app was killed, would you mind attaching 
the full app logs obtained via “bin/yarn logs …” to the jira too. That could be 
something to look at until you get the stack dump. 

thanks
— Hitesh

On Mar 23, 2016, at 1:57 PM, Kurt Muehlner  wrote:

> Hi Hitesh,
> 
> Thanks for the quick response.
> 
> We are using Pig 0.15.0 and Tez 0.8.2.
> 
> I will certainly file that jira.  This application is a batch process which 
> runs hourly, and only hangs on some fraction of instances (~ 10%).  If 
> previous patterns continue, it shouldn’t be too long before I’m able to get 
> the stack dump.
> 
> That host is a node manager.  The log messages I originally posted are from 
> one of the syslog_dag files.  From where I posted until we kill the 
> application, the only messages we see in that log are more ‘Releasing 
> container’ messages, the last one being:
> 
> 2016-03-21 16:39:03,900 [INFO] [DelayedContainerManager] 
> |rm.YarnTaskSchedulerService|: No taskRequests. Container's idle timeout 
> delay expired or is new. Releasing container, 
> containerId=container_e11_1437886552023_169758_01_000974, 
> containerExpiryTime=1458603543753, idleTimeout=5000, taskRequestsCount=0, 
> heldContainers=85, delayedContainers=0, isNew=false
> 
> 
> After that, that file is not appended to.
> 
> Unfortunately, I no longer have stdout/stderr from one of the script 
> invocations that resulted in a hang, but what we see there is a never-ending 
> loop of dag progress messages, with some number of completed tasks, some 
> number of in-progress tasks, and those numbers never changing.  I’m unsure if 
> there are still pending tasks.  I’ll look at that next time it happens.
> 
> Looking carefully in the node manager logs, it seems that particular host has 
> finished every task it starts.  Stdout shows properly paired messages of the 
> form...
> 
> 2016-03-21 16:38:53 Starting to run new task attempt: 
> attempt_1437886552023_169758_3_08_11_0
> 2016-03-21 16:38:55 Completed running task attempt: 
> attempt_1437886552023_169758_3_08_11_0
> 
> 
> . . . for every task.  Syslog looks like this until we kill the application:
> 
> Attempting to fetch new task for container 
> container_e11_1437886552023_169758_01_000960
> 2016-03-21 16:38:53,586 [INFO] [TezChild] |task.ContainerReporter|: Got 
> TaskUpdate for containerId= container_e11_1437886552023_169758_01_000960: 
> 5794 ms after starting to poll. TaskInfo: shouldDie: false, 
> currentTaskAttemptId: attempt_1437886552023_169758_3_08_11_0
> 2016-03-21 16:38:53,587 [INFO] [main] |common.TezUtilsInternal|: Redirecting 
> log file based on addend: attempt_1437886552023_169758_3_08_11_0
> 2016-03-21 16:38:55,078 [INFO] [TezChild] |task.ContainerReporter|: 
> Attempting to fetch new task for container 
> container_e11_1437886552023_169758_01_000960
> 2016-03-21 16:39:24,929 [INFO] [TezChild] |task.ContainerReporter|: Sleeping 
> for 200ms before retrying getTask again. Got null now. Next getTask sleep 
> message after 3ms
> 2016-03-21 16:39:55,069 [INFO] [TezChild] |task.ContainerReporter|: Sleeping 
> for 200ms before retrying getTask again. Got null now. Next getTask sleep 
> message after 3ms
> 2016-03-21 16:40:25,192 [INFO] [TezChild] |task.ContainerReporter|: Sleeping 
> for 200ms before retrying getTask again. Got null now. Next getTask sleep 
> message after 3ms
> 2016-03-21 16:40:55,314 [INFO] [TezChild] |task.ContainerReporter|: Sleeping 
> for 200ms before retrying getTask again. Got null now. Next getTask sleep 
> message after 3ms
> 2016-03-21 16:41:25,437 [INFO] [TezChild] |task.ContainerReporter|: Sleeping 
> for 200ms before retrying getTask again. Got null now. Next getTask sleep 
> message after 3ms
> . . . etc.
> 
> 
> Is there anything else I can provide for now?
> 
> Thanks,
> Kurt
> 
> 
> 
> On 3/23/16, 11:43 AM, "Hitesh Shah"  wrote:
> 
>> Hello Kurt, 
>> 
>> Can you file a jira with a stack dump for the ApplicationMaster process when 
>> it is in this hung state and also include all the application master logs. 
>> Also please mention what version of Pig and Tez you are running. 
>> 
>> The main question is whether the AM is really hung or whether it only looks 
>> stuck because it is waiting on a particular task ( or set of tasks ) to 
>> complete. “No taskRequests” implies that the dag has no pending tasks to run 
>> which means all the tasks it needs to run are either already running on 
>> assigned containers or completed. Is "10.102.173.86" a node manager? If yes, 
>> are there any tasks running on it which seem to be stuck? We can probably 
>> figure s

Re: pig on tez hang with connection reset

2016-03-23 Thread Hitesh Shah
Hello Kurt, 

Can you file a jira with a stack dump for the ApplicationMaster process when it 
is in this hung state and also include all the application master logs. Also 
please mention what version of Pig and Tez you are running. 

The main question is whether the AM is really hung or whether it only looks 
stuck because it is waiting on a particular task ( or set of tasks ) to 
complete. “No taskRequests” implies that the DAG has no pending tasks to run, 
which means all the tasks it needs to run are either already running on 
assigned containers or completed. Is "10.102.173.86" a node manager? If yes, 
are there any tasks running on it which seem to be stuck? We can probably 
figure some of this out from the syslog_dag_* file. 
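
The collection steps above can be sketched in shell. This is a hedged sketch, not an official procedure: the application id is the one from this thread (substitute your own), and the `yarn`/`jps` invocations are guarded so the sketch degrades gracefully when run off-cluster.

```shell
# Application id from this thread; substitute the id shown in your RM UI.
APP_ID=application_1437886552023_169758

# Aggregated application logs (includes the syslog_dag_* files):
if command -v yarn >/dev/null 2>&1; then
  yarn logs -applicationId "$APP_ID" > "${APP_ID}.log"
fi

# On the node hosting the AM, capture a stack dump of the DAGAppMaster JVM:
if command -v jps >/dev/null 2>&1; then
  AM_PID=$(jps -l | awk '/DAGAppMaster/ {print $1}')
  if [ -n "$AM_PID" ]; then
    jstack "$AM_PID" > am-stack.txt
  fi
fi
```

Attaching both the aggregated logs and `am-stack.txt` to the jira gives enough context to tell a truly hung AM apart from one waiting on a stuck task.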

thanks
— Hitesh 

On Mar 23, 2016, at 11:18 AM, Kurt Muehlner  wrote:

> I posted about this issue in the Pig user mailing list as well, but thought 
> I’d try here too.
> 
> I have recently been testing converting an existing Pig M/R application to 
> run on Tez.  I’ve had to work around a few issues, but the performance 
> improvement is significant (~ 25 minutes on M/R, 5 minutes on Tez).
> 
> Currently the problem I’m running into is that occasionally when processing a 
> DAG the application hangs.  When this happens, I find the following in the 
> syslog for that dag:
> 
> 2016-03-21 16:39:01,643 [INFO] [DelayedContainerManager] 
> |rm.YarnTaskSchedulerService|: No taskRequests. Container's idle timeout 
> delay expired or is new. Releasing container, 
> containerId=container_e11_1437886552023_169758_01_000822, 
> containerExpiryTime=1458603541415, idleTimeout=5000, taskRequestsCount=0, 
> heldContainers=112, delayedContainers=27, isNew=false
> 2016-03-21 16:39:01,825 [INFO] [DelayedContainerManager] 
> |rm.YarnTaskSchedulerService|: No taskRequests. Container's idle timeout 
> delay expired or is new. Releasing container, 
> containerId=container_e11_1437886552023_169758_01_000824, 
> containerExpiryTime=1458603541692, idleTimeout=5000, taskRequestsCount=0, 
> heldContainers=111, delayedContainers=26, isNew=false
> 2016-03-21 16:39:01,990 [INFO] [Socket Reader #1 for port 53324] 
> |ipc.Server|: Socket Reader #1 for port 53324: readAndProcess from client 
> 10.102.173.86 threw exception [java.io.IOException: Connection reset by peer]
> java.io.IOException: Connection reset by peer
>at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>at sun.nio.ch.IOUtil.read(IOUtil.java:197)
>at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>at org.apache.hadoop.ipc.Server.channelRead(Server.java:2593)
>at org.apache.hadoop.ipc.Server.access$2800(Server.java:135)
>at 
> org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1471)
>at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:762)
>at 
> org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:636)
>at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:607)
> 2016-03-21 16:39:02,032 [INFO] [DelayedContainerManager] 
> |rm.YarnTaskSchedulerService|: No taskRequests. Container's idle timeout 
> delay expired or is new. Releasing container, 
> containerId=container_e11_1437886552023_169758_01_000811, 
> containerExpiryTime=1458603541828, idleTimeout=5000, taskRequestsCount=0, 
> heldContainers=110, delayedContainers=25, isNew=false
> 2016-03-21 16:39:02,266 [INFO] [DelayedContainerManager] 
> |rm.YarnTaskSchedulerService|: No taskRequests. Container's idle timeout 
> delay expired or is new. Releasing container, 
> containerId=container_e11_1437886552023_169758_01_000963, 
> containerExpiryTime=1458603542166, idleTimeout=5000, taskRequestsCount=0, 
> heldContainers=109, delayedContainers=24, isNew=false
> 2016-03-21 16:39:02,305 [INFO] [DelayedContainerManager] 
> |rm.YarnTaskSchedulerService|: No taskRequests. Container's idle timeout 
> delay expired or is new. Releasing container, 
> containerId=container_e11_1437886552023_169758_01_000881, 
> containerExpiryTime=1458603542119, idleTimeout=5000, taskRequestsCount=0, 
> heldContainers=108, delayedContainers=23, isNew=false
> 
> 
> It will continue logging some number more ‘Releasing container’ messages, and 
> then soon stop all logging, and stop submitting tasks. I also do not see any 
> errors or exceptions in the container logs for the host identified in the 
> IOException.  Is there some other place I should look on that host to find an 
> indication of what’s going wrong?
> 
> Any thoughts on what’s going on here?  Is this a state from which an 
> application should be able to recover?  We do not see the application hang 
> when running on M/R.
> 
> One thing I tried to work around the hang was to enable speculation, on the 
> theory that some task failed to send some state change event to the AM, and 
> that speculat

Re: tez and beeline and hs2

2016-02-25 Thread Hitesh Shah
As Gopal mentioned, it is optional. The log message could probably be set to a 
WARN to not confuse users. 

— Hitesh 

On Feb 25, 2016, at 7:26 AM, Stephen Sprague  wrote:

> hey guys,
> still not getting jobs to the running state via tez->beeline->hs2.
> 
> lemme ask this first:  Is it mandatory that the Tez UI be up and running for 
> this to work?  
> 
> this looks to be a hard error (see below) but given the "local cli" works i 
> don't think it's *mandatory*. But maybe via HS2 it is?
> 
> from yarn logs:
> 2016-02-25 07:07:56,744 [ERROR] [main] |web.WebUIService|: Tez UI History URL 
> is not set
> 
> Thanks,
> Stephen
> PS setting up the web service is just another step and complication i'm 
> hoping to avoid for just testing. :)
> 
> On Mon, Feb 22, 2016 at 10:43 AM, Bikas Saha  wrote:
> Hi Stephen,
> 
>  
> 
> Thanks for bearing with any delays on our side and keeping us updated. In the 
> end, if we figure out that this is less about bugs in Tez/Hive and more about a 
> lack of well-documented best practices, then it would be useful to produce a 
> wiki page about this.
> 
>  
> 
> Thanks
> 
> Bikas
> 
>  
> 
> From: Stephen Sprague [mailto:sprag...@gmail.com] 
> Sent: Monday, February 22, 2016 6:59 AM
> To: user@tez.apache.org
> Subject: Re: tez and beeline and hs2
> 
>  
> 
> just an update.  i haven't given up!  i've just been pulled into other things 
> this weekend and am hoping to pick it up again this week.
> 
>  
> 
> On Fri, Feb 19, 2016 at 9:30 AM, Hitesh Shah  wrote:
> 
> Not exactly. I think the UI bits might be a red herring. Bouncing YARN and 
> HS2 should also not be needed unless you are modifying configs.
> 
> There is likely a bug ( the NPE being logged ) in the shutdown code for 
> org.apache.tez.dag.history.logging.impl.SimpleHistoryLoggingService ( if 
> it was not started properly ), but the fact that it is shutting down means 
> that something else went wrong first ( before the shutdown sequence kicked 
> in ). Feel free to file a bug with the logs attached if you cannot attach 
> them to the mailing list here.
> 
> thanks
> — Hitesh
> 
> 
> 
> On Feb 19, 2016, at 7:16 AM, Stephen Sprague  wrote:
> 
> > Hi Gopal,
> > nice. I should have known that command as you and Hitesh have given that 
> > advice in the past on various threads here! sorry for that.
> >
> > And sure enough the smoking gun reveals itself.  thank you.
> >
> > {quote}
> > 2016-02-18 18:40:19,672 [ERROR] [main] |web.WebUIService|: Tez UI History 
> > URL is not set
> > 2016-02-18 18:40:19,731 [WARN] 
> > [ServiceThread:org.apache.tez.dag.history.HistoryEventHandler] 
> > |service.AbstractService|: When stopping the service 
> > org.apache.tez.dag.history.logging.impl.SimpleHistoryLoggingService : 
> > java.lang.NullPointerException
> > java.lang.NullPointerException
> > {quote}
> >
> > So it looks like this Tez UI app is required for 'jdbc' mode.  Lemme 
> > research that puppy and perhaps bounce Yarn and HS2.
> >
> > Thanks again for shining the light!
> >
> > Cheers,
> > Stephen.
> >
> > On Thu, Feb 18, 2016 at 10:09 PM, Gopal Vijayaraghavan  
> > wrote:
> >
> > >
> > >http://dwrdevnn1.sv2.trulia.com:8088/proxy/application_1455811467110_0307/
> > >
> > ><http://dwrdevnn1.sv2.trulia.com:8088/proxy/application_1455811467110_0307
> > >/>
> > ...
> > > So my question is whatya suppose is causing this?  I'm pretty darn sure
> > >the classpath is legit now.
> >
> > Two steps forward, one step back :)
> >
> > yarn logs -applicationId application_1455811467110_0307 for answers.
> >
> > Cheers,
> > Gopal
> >
> >
> >
> 
>  
> 
> 



Re: tez and beeline and hs2

2016-02-19 Thread Hitesh Shah
Not exactly. I think the UI bits might be a red herring. Bouncing YARN and HS2 
should also not be needed unless you are modifying configs.

There is likely a bug ( the NPE being logged ) in the shutdown code for 
org.apache.tez.dag.history.logging.impl.SimpleHistoryLoggingService ( if it 
was not started properly ), but the fact that it is shutting down means that 
something else went wrong first ( before the shutdown sequence kicked in ). 
Feel free to file a bug with the logs attached if you cannot attach them to 
the mailing list here.

thanks
— Hitesh


On Feb 19, 2016, at 7:16 AM, Stephen Sprague  wrote:

> Hi Gopal,
> nice. I should have known that command as you and Hitesh have given that 
> advice in the past on various threads here! sorry for that.
> 
> And sure enough the smoking gun reveals itself.  thank you.
> 
> {quote}
> 2016-02-18 18:40:19,672 [ERROR] [main] |web.WebUIService|: Tez UI History URL 
> is not set
> 2016-02-18 18:40:19,731 [WARN] 
> [ServiceThread:org.apache.tez.dag.history.HistoryEventHandler] 
> |service.AbstractService|: When stopping the service 
> org.apache.tez.dag.history.logging.impl.SimpleHistoryLoggingService : 
> java.lang.NullPointerException
> java.lang.NullPointerException
> {quote}
> 
> So it looks like this Tez UI app is required for 'jdbc' mode.  Lemme research 
> that puppy and perhaps bounce Yarn and HS2.
> 
> Thanks again for shining the light!
> 
> Cheers,
> Stephen.
> 
> On Thu, Feb 18, 2016 at 10:09 PM, Gopal Vijayaraghavan  
> wrote:
> 
> >
> >http://dwrdevnn1.sv2.trulia.com:8088/proxy/application_1455811467110_0307/
> >
> ...
> > So my question is whatya suppose is causing this?  I'm pretty darn sure
> >the classpath is legit now.
> 
> Two steps forward, one step back :)
> 
> yarn logs -applicationId application_1455811467110_0307 for answers.
> 
> Cheers,
> Gopal
> 
> 
> 



Re: tez and beeline

2016-02-17 Thread Hitesh Shah
Hello Stephen, 

This question should ideally be posted to user@hive as this mainly relates to 
HS2 functionality and not really Tez.

That said, a couple of things to look at/try out: 
  
1) Unrelated point - "set mapreduce.framework.name=yarn-tez;" - this is not 
needed. What this setting does is tell the MR framework to run its jobs using 
the Tez engine ( MR can be considered a 2-vertex DAG, hence it can be 
translated into Tez and therefore run on Tez ).

2) Can you check the RM UI and see if there are any applications in the 
ACCEPTED state? This usually means that HS2 is trying to launch Tez 
Applications but cannot, as there is not enough capacity in your cluster.

3) If you are monitoring the HS2 logs, they should print out the 
applicationId the query is being submitted under. This can help you pinpoint 
the exact app in the YARN RM UI. If there are no apps in the ACCEPTED state, do 
you see any other errors in the HS2 logs? If the app is indeed running, can you 
check the app logs via the RM UI? 
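
For point 3, the applicationId can be pulled out of an HS2 log with a small grep. A minimal sketch; the log line below is synthetic, patterned on the "Status: Running (Executing on YARN cluster with App id ...)" lines that appear elsewhere in these threads:

```shell
# Synthetic HS2 log line for illustration; real HS2 logs contain a line like
# this once the Tez application has been launched.
line='INFO  : Status: Running (Executing on YARN cluster with App id application_1453472707474_6031)'

# Extract the YARN application id so it can be looked up in the RM UI or
# passed to "yarn logs -applicationId":
app_id=$(printf '%s\n' "$line" | grep -oE 'application_[0-9]+_[0-9]+')
echo "$app_id"   # application_1453472707474_6031
```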

thanks
— Hitesh

On Feb 17, 2016, at 10:00 AM, Stephen Sprague  wrote:

> Hey guys,
> 
> sorry to bother you again but i was trying to get tez working with beeline 
> now (it does work great with the local hive client i might add) so i'm sure 
> i'm missing something simple.
> 
> if i use the 'local' beeline client as such 'beeline -u jdbc:hive2://'  tez 
> works fine but that comes with start up costs just like the hive local client 
> does so that's what i'm trying to avoid by going through hiveserver2.
> 
> * so using this paradigm: 
> 
> beeline -u 'jdbc:hive2://dwrdevnn1.sv2.truila.com:10001/default;auth=noSasl 
> sprague nopwd org.apache.hive.jdbc.HiveDriver' <<SQL
> set hive.execution.engine=tez;
> set mapreduce.framework.name=yarn-tez;
> select count(*) from omniture.hit_data where date_key=20160210;
> SQL
> 
> * things just hang at this point.
> 
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> issuing: !connect 
> jdbc:hive2://dwrdevnn1.sv2.trulia.com:10001/default;auth=noSasl sprague nopwd 
> org.apache.hive.jdbc.HiveDriver '' ''
> Connecting to jdbc:hive2://dwrdevnn1.sv2.trulia.com:10001/default;auth=noSasl
> Connected to: Apache Hive (version 1.2.1)
> Driver: Hive JDBC (version 1.2.1)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> Executing command:
>  set hive.execution.engine=tez;
>  set mapreduce.framework.name=yarn-tez;
>  select count(*) from omniture.hit_data where date_key=20160210;
> 
> Getting log thread is interrupted, since query is done!
> No rows affected (0.067 seconds)
> Getting log thread is interrupted, since query is done!
> No rows affected (0.003 seconds)
> 
> 
> Any recommendations that i need to make to get things to work via HS2?  
> (engine=mr works w/o issue)
> 
> Cheers,
> Stephen.



Re: jansi dependendency?

2016-02-15 Thread Hitesh Shah
One option may be to try using HADOOP_USER_CLASSPATH_FIRST with it set to true 
and adding the hive-exec.jar to the front of HADOOP_CLASSPATH. Using this ( and 
verifying by running “hadoop classpath”), you could try to get hive-exec.jar to 
the front of the classpath and see if that makes a difference to the class 
loading order. 
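
A hedged sketch of that suggestion (the /opt/hive/lib path is a hypothetical stand-in for the real Hive lib directory), plus a small helper for the related chore of finding which jar on a classpath actually bundles a given class:

```shell
# Hypothetical location of hive-exec.jar; substitute the real path.
export HADOOP_USER_CLASSPATH_FIRST=true
export HADOOP_CLASSPATH="/opt/hive/lib/hive-exec.jar:${HADOOP_CLASSPATH:-}"

# The first user entry is now hive-exec.jar; on a cluster node, confirm the
# full effective ordering with: hadoop classpath
first=$(printf '%s\n' "$HADOOP_CLASSPATH" | tr ':' '\n' | head -n 1)
echo "$first"   # /opt/hive/lib/hive-exec.jar

# Helper: which jar(s) in a directory bundle a given class file? Useful for
# tracking down a jar that shadows the expected one (e.g. an older jansi Ansi).
find_class_jar() {   # find_class_jar <class-file-path> <lib-dir>
  for j in "$2"/*.jar; do
    [ -e "$j" ] || continue
    if unzip -l "$j" 2>/dev/null | grep -q "$1"; then
      echo "$j"
    fi
  done
}
# e.g.: find_class_jar 'org/fusesource/jansi/Ansi.class' /opt/hive/lib
```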

— Hitesh 

On Feb 15, 2016, at 4:43 PM, Stephen Sprague  wrote:

> Hi Jan,
> ahhh. I see.  my $HIVE_HOME/lib/hive-exec-1.2.1.jar does indeed have 
> TezJobMonitor in it. So this has really nothing to do with the server side 
> Tez jar files i built as part of the install instructions but rather the hive 
> client side code.
> 
> got it.  I'll dig away at that!
> 
> Thank you!
> 
> Cheers,
> Stephen.
> 
> On Mon, Feb 15, 2016 at 2:18 PM, Jan Morlock  
> wrote:
> Hi Stephen,
> 
> The code inside org.apache.hadoop.hive.ql.exec.tez.TezJobMonitor intends
> to use org.fusesource.jansi.Ansi. The project hive-exec (which
> TezJobMonitor is part of) therefore has a dependency on jline (which
> Ansi is part of). However, on your classpath a jar file containing an
> older version of Ansi is found first. This version lacks the bold()
> method, and therefore the NoSuchMethodError is thrown.
> 
> What you have to do is identify the jar file containing the older
> version of Ansi. Afterwards you have to organize the classpath in a way
> that the more up-to-date version is found first.
> 
> I hope this helps.
> Cheers
> Jan
> 
> 
> Am 15.02.2016 um 20:27 schrieb Stephen Sprague:
> > hey guys,
> > I'm looking to run Hive on Tez and have followed the instructions to a
> > tee - but i just can't seem to get around this Jansi error despite
> > everything i've tried.  Now given i'm not exactly a Java programmer what
> > may appear to you as something pretty trivial i'm at an impasse - but
> > not for lack of trying!
> >
> > would anyone here have any tips?
> >
> > thanks,
> > Stephen
> >
> > PS Here's my traceback and logging.
> >
> > $ hive
> > SLF4J: Class path contains multiple SLF4J bindings.
> > SLF4J: Found binding in
> > [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > SLF4J: Found binding in
> > [jar:file:/home/spragues/downloads/apache-tez-0.7.0-src/tez-dist/target/tez-0.7.0-minimal/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> > explanation.
> > SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> > SLF4J: Class path contains multiple SLF4J bindings.
> > SLF4J: Found binding in
> > [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > SLF4J: Found binding in
> > [jar:file:/home/spragues/downloads/apache-tez-0.7.0-src/tez-dist/target/tez-0.7.0-minimal/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> > explanation.
> > SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> >
> > Logging initialized using configuration in
> > jar:file:/usr/lib/hive-1.2.1-standalone/lib/hive-common-1.2.1.jar!/hive-log4j.properties
> > *hive> set hive.execution.engine=tez;*
> >
> > *hive> select count(*) from omniture.hit_data where date_key=20160210;*
> > Query ID = spragues_20160215111912_f4b6bc39-d29d-42bb-b0bc-262f8c99f58c
> > Total jobs = 1
> > Launching Job 1 out of 1
> >
> >
> > Status: Running (Executing on YARN cluster with App id
> > application_1453472707474_6031)
> >
> > 
> > java.lang.NoSuchMethodError:
> > org.fusesource.jansi.Ansi.bold()Lorg/fusesource/jansi/Ansi;
> > at
> > org.apache.hadoop.hive.ql.exec.tez.TezJobMonitor.reprintLineWithColorAsBold(TezJobMonitor.java:205)
> > at
> > org.apache.hadoop.hive.ql.exec.tez.TezJobMonitor.printStatusInPlace(TezJobMonitor.java:611)
> > at
> > org.apache.hadoop.hive.ql.exec.tez.TezJobMonitor.monitorExecution(TezJobMonitor.java:320)
> > at
> > org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:168)
> > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
> > at
> > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
> > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1653)
> > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1412)
> > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
> > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
> > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
> > at
> > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
> > at
> > org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
> > at
> > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
> > 

Re: Failing attemption at org.apache.tez.client.TezClient.waitTillReady

2016-02-12 Thread Hitesh Shah
 groovy-all-2.1.6.jar
> -rw-rw-r--  1 2189117 2015-04-30 03:08 guava-14.0.1.jar
> -rw-rw-r--  1   76643 2014-01-30 07:07 hamcrest-core-1.1.jar
> -rw-rw-r--  1  121403 2015-06-19 18:05 hive-accumulo-handler-1.2.1.jar
> -rw-rw-r--  1   47713 2015-06-19 18:04 hive-ant-1.2.1.jar
> -rw-rw-r--  1  138361 2015-06-19 18:05 hive-beeline-1.2.1.jar
> -rw-rw-r--  1   39019 2015-06-19 18:05 hive-cli-1.2.1.jar
> -rw-rw-r--  1  292290 2015-06-19 18:03 hive-common-1.2.1.jar
> -rw-rw-r--  1  121668 2015-06-19 18:05 hive-contrib-1.2.1.jar
> -rw-rw-r--  120599030 2015-06-19 18:04 hive-exec-1.2.1.jar
> -rw-rw-r--  1  115935 2015-06-19 18:05 hive-hbase-handler-1.2.1.jar
> -rw-rw-r--  1   28091 2015-06-19 18:06 hive-hwi-1.2.1.jar
> -rw-rw-r--  117360142 2015-06-19 18:05 hive-jdbc-1.2.1-standalone.jar
> -rw-rw-r--  1  100580 2015-06-19 18:05 hive-jdbc-1.2.1.jar
> -rw-rw-r--  1 5505100 2015-06-19 18:04 hive-metastore-1.2.1.jar
> -rw-rw-r--  1  916706 2015-06-19 18:03 hive-serde-1.2.1.jar
> -rw-rw-r--  1 1878543 2015-06-19 18:04 hive-service-1.2.1.jar
> -rw-rw-r--  1   32390 2015-06-19 18:03 hive-shims-0.20S-1.2.1.jar
> -rw-rw-r--  1   60070 2015-06-19 18:03 hive-shims-0.23-1.2.1.jar
> -rw-rw-r--  18949 2015-06-19 18:03 hive-shims-1.2.1.jar
> -rw-rw-r--  1  108914 2015-06-19 18:03 hive-shims-common-1.2.1.jar
> -rw-rw-r--  1   13065 2015-06-19 18:03 hive-shims-scheduler-1.2.1.jar
> -rw-rw-r--  1   14530 2015-06-19 18:06 hive-testutils-1.2.1.jar
> -rw-rw-r--  1  719304 2015-04-30 03:08 httpclient-4.4.jar
> -rw-rw-r--  1  321639 2015-04-30 03:08 httpcore-4.4.jar
> -rw-rw-r--  1 1282424 2015-04-30 03:25 ivy-2.4.0.jar
> -rw-rw-r--  1  611863 2015-04-30 03:25 janino-2.7.6.jar
> -rw-rw-r--  1   60527 2015-04-30 03:26 jcommander-1.32.jar
> -rw-rw-r--  1  201124 2014-01-30 07:09 jdo-api-3.0.1.jar
> -rw-rw-r--  1 1681148 2014-05-13 09:25 jetty-all-7.6.0.v20120127.jar
> -rw-rw-r--  1 1683027 2014-01-30 11:30 
> jetty-all-server-7.6.0.v20120127.jar
> -rw-rw-r--  1  213854 2015-04-30 03:08 jline-2.12.jar
> -rw-rw-r--  1  588001 2015-04-30 03:23 joda-time-2.5.jar
> -rw-rw-r--  1   12131 2014-05-13 09:25 jpam-1.1.jar
> -rw-rw-r--  1   45944 2014-01-30 07:10 json-20090211.jar
> -rw-rw-r--  1   33031 2015-04-30 03:23 jsr305-3.0.0.jar
> -rw-rw-r--  1   15071 2014-01-30 07:09 jta-1.1.jar
> -rw-rw-r--  1  245039 2015-04-30 03:08 junit-4.11.jar
> -rw-rw-r--  1  313686 2015-04-30 03:23 libfb303-0.9.2.jar
> -rw-rw-r--  1  227712 2015-04-30 03:08 libthrift-0.9.2.jar
> -rw-rw-r--  1  481535 2014-01-30 07:06 log4j-1.2.16.jar
> -rw-rw-r--  1  447676 2014-01-30 11:30 mail-1.4.1.jar
> -rw-rw-r--  1   94421 2015-04-30 03:26 maven-scm-api-1.4.jar
> -rw-rw-r--  1   40066 2015-04-30 03:26 
> maven-scm-provider-svn-commons-1.4.jar
> -rw-rw-r--  1   69858 2015-04-30 03:26 maven-scm-provider-svnexe-1.4.jar
> -rw-rw-r--  1 1208356 2015-04-30 03:08 netty-3.7.0.Final.jar
> -rw-rw-r--  1   19827 2015-04-30 03:23 opencsv-2.3.jar
> -rw-rw-r--  1   65261 2014-01-30 07:07 oro-2.0.8.jar
> -rw-rw-r--  1   29555 2014-01-30 07:08 paranamer-2.3.jar
> -rw-rw-r--  1 2796935 2015-04-30 03:23 parquet-hadoop-bundle-1.6.0.jar
> -rw-rw-r--  1   48557 2015-04-30 03:25 
> pentaho-aggdesigner-algorithm-5.1.5-jhyde.jar
> drwxrwxr-x  64096 2015-09-10 20:14 php
> -rw-rw-r--  1  250546 2014-01-30 11:29 plexus-utils-1.5.6.jar
> drwxrwxr-x 104096 2015-09-10 20:14 py
> -rw-rw-r--  1   25429 2015-04-30 03:26 regexp-1.3.jar
> -rw-rw-r--  1  105112 2014-01-30 07:08 servlet-api-2.5.jar
> -rw-rw-r--  1 1251514 2014-01-30 07:08 snappy-java-1.0.5.jar
> -rw-r--r--  1   162976273 2015-09-10 20:16 
> spark-assembly-1.4.1-hadoop2.6.0.jar
> -rw-rw-r--  1   26514 2014-01-30 07:08 stax-api-1.0.1.jar
> -rw-rw-r--  1  148627 2014-01-30 07:09 stringtemplate-3.2.1.jar
> -rw-rw-r--  1   93210 2015-04-30 03:26 super-csv-2.2.0.jar
> -rw-rw-r--  1   55953 2014-01-30 11:30 tempus-fugit-1.1.jar
> -rw-rw-r--  1  392124 2014-01-30 07:07 velocity-1.5.jar
> -rw-rw-r--  1   94672 2014-01-30 07:08 xz-1.0.jar
> -rw-rw-r--  1  792964 2015-04-30 03:08 zookeeper-3.4.6.jar
> 
> * and hadoop-common-2.6.0.jar has the addDeprecations(DeprecationDelta[]) method.
> 
> ​
> 
> 
> 2016-02-12 17:29 GMT+09:00 Hitesh Shah :
> It seems to me that the Tez AM classpath somehow has a hadoop-common jar that 
> does not have the Configuration.addDeprecations() api that YARN needs.
> 
> For the Tez AM, the classpath is fully constructed based on the tez tarball ( 
> from HDFS using distributed cache ) and additional jars that Hive ad

Re: Failing attemption at org.apache.tez.client.TezClient.waitTillReady

2016-02-12 Thread Hitesh Shah
It seems to me that the Tez AM classpath somehow has a hadoop-common jar that 
does not have the Configuration.addDeprecations() api that YARN needs.

For the Tez AM, the classpath is fully constructed based on the tez tarball ( 
from HDFS using distributed cache ) and additional jars that Hive adds ( 
hive-exec.jar, etc ). It does not use HADOOP_CLASSPATH or anything else from 
the cluster nodes. HADOOP_CLASSPATH is only used on the client node where the 
hive shell runs. Can you confirm that hive was also compiled against 
hadoop-2.6.0 as that might be pulling in a different version of hadoop-common? 
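
One way to double-check this is to make sure tez.lib.uris on the client points at a Tez tarball built against the same Hadoop version. A minimal sketch; the HDFS path is hypothetical, and the file is written locally here only for illustration:

```shell
# Sketch of the client-side tez-site.xml entry. The AM's classpath is built
# from this tarball (via the distributed cache), not from node-local
# HADOOP_CLASSPATH, so a mismatched tarball yields NoSuchMethodError in the AM.
cat > tez-site.xml <<'EOF'
<configuration>
  <property>
    <name>tez.lib.uris</name>
    <value>hdfs:///apps/tez/tez-0.8.2.tar.gz</value>
  </property>
</configuration>
EOF

# Confirm the property is present in the file we just wrote:
grep -c '<name>tez.lib.uris</name>' tez-site.xml   # 1
```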

thanks
— Hitesh

On Feb 12, 2016, at 12:16 AM, no jihun  wrote:

> Thanks Hitesh Shah.
> 
> It claims 
> 
> 2016-02-12 14:59:07,388 [ERROR] [main] |app.DAGAppMaster|: Error starting 
> DAGAppMaster
> 
> java.lang.NoSuchMethodError: 
> org.apache.hadoop.conf.Configuration.addDeprecations([Lorg/apache/hadoop/conf/Configuration$DeprecationDelta;)V
> at 
> org.apache.hadoop.yarn.conf.YarnConfiguration.addDeprecatedKeys(YarnConfiguration.java:79)
> at 
> org.apache.hadoop.yarn.conf.YarnConfiguration.(YarnConfiguration.java:73)
> at org.apache.tez.dag.app.DAGAppMaster.main(DAGAppMaster.java:2271)
> 
> I am not sure, but according to this 
> thread (http://grokbase.com/t/cloudera/cdh-user/12765svj61/libjars-and-hadoop-jar-command)
> this is perhaps caused by a $HADOOP_CLASSPATH problem.
> 
> But I wonder: should I copy "tez-dist/target/tez-0.8.2" to every cluster node 
> and then export the below?
> export TEZ_JARS=/home1/apps/tez-0.8.2
> export TEZ_CONF_DIR=$TEZ_JARS/conf
> export 
> HADOOP_CLASSPATH=$TEZ_CONF_DIR:$TEZ_JARS/*:$TEZ_JARS/lib/*:$HADOOP_CLASSPATH
> 
> I did this only on name nodes.
> 
> 
> 2016-02-12 16:47 GMT+09:00 Hitesh Shah :
> Run the following command: “bin/yarn logs -applicationId 
> application_1452243782005_0292” . This should give you the logs for 
> container_1452243782005_0292_02_01 which may shed more light on why the 
> Tez ApplicationMaster is failing to launch when triggered via Hive.
> 
> thanks
> — Hitesh
> 
> 
> 
> On Feb 11, 2016, at 10:48 PM, no jihun  wrote:
> 
> > Hi all.
> >
> > When I execute a query on hive I got an error below.(so do in hive cli)
> > no more detailed log found.
> >
> > what should I check?
> > any advice will be appreciated.
> >
> > versions
> > - tez-0.8.2
> > - hadoop 2.6.0
> >
> > ---
> >
> > hive > set hive.execution.engine=tez;
> > hive > select count(*) from contents;
> >
> > WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please 
> > use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties 
> > files.
> >
> > Logging initialized using configuration in 
> > file:/home1/eco/hive/conf/hive-log4j.properties
> > hive> set hive.execution.engine=tez;
> > hive> select count(*) from agg_band_contents;
> > Query ID = irteam_20160212145903_9300f3b2-3942-4423-8586-73d2eaff9e58
> > Total jobs = 1
> > Launching Job 1 out of 1
> > Exception in thread "Thread-10" java.lang.RuntimeException: 
> > org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. 
> > Application application_1452243782005_0292 failed 2 times due to AM 
> > Container for appattempt_1452243782005_0292_02 exited with  exitCode: 1
> > For more detailed output, check application tracking 
> > page:http://xstathn003:8088/proxy/application_1452243782005_0292/Then, 
> > click on links to logs of each attempt.
> > Diagnostics: Exception from container-launch.
> > Container id: container_1452243782005_0292_02_01
> > Exit code: 1
> > Stack trace: ExitCodeException exitCode=1:
> > at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
> > at org.apache.hadoop.util.Shell.run(Shell.java:455)
> > at 
> > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
> > at 
> > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
> > at 
> > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
> > at 
> > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
> > at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> > at 
> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > at 
> > java.util.concurrent.ThreadPoolExecutor$Wo

Re: Failing attemption at org.apache.tez.client.TezClient.waitTillReady

2016-02-11 Thread Hitesh Shah
Run the following command: “bin/yarn logs -applicationId 
application_1452243782005_0292” . This should give you the logs for 
container_1452243782005_0292_02_01 which may shed more light on why the Tez 
ApplicationMaster is failing to launch when triggered via Hive. 

thanks
— Hitesh



On Feb 11, 2016, at 10:48 PM, no jihun  wrote:

> Hi all.
> 
> When I execute a query on hive I got an error below.(so do in hive cli)
> no more detailed log found.
> 
> what should I check?
> any advice will be appreciated.
> 
> versions
> - tez-0.8.2
> - hadoop 2.6.0
> 
> ---
> 
> hive > set hive.execution.engine=tez;
> hive > select count(*) from contents;
> 
> WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use 
> org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
> 
> Logging initialized using configuration in 
> file:/home1/eco/hive/conf/hive-log4j.properties
> hive> set hive.execution.engine=tez;
> hive> select count(*) from agg_band_contents;
> Query ID = irteam_20160212145903_9300f3b2-3942-4423-8586-73d2eaff9e58
> Total jobs = 1
> Launching Job 1 out of 1
> Exception in thread "Thread-10" java.lang.RuntimeException: 
> org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. 
> Application application_1452243782005_0292 failed 2 times due to AM Container 
> for appattempt_1452243782005_0292_02 exited with  exitCode: 1
> For more detailed output, check application tracking 
> page:http://xstathn003:8088/proxy/application_1452243782005_0292/Then, click 
> on links to logs of each attempt.
> Diagnostics: Exception from container-launch.
> Container id: container_1452243782005_0292_02_01
> Exit code: 1
> Stack trace: ExitCodeException exitCode=1: 
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
> at org.apache.hadoop.util.Shell.run(Shell.java:455)
> at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> 
> 
> Container exited with a non-zero exit code 1
> Failing this attempt. Failing the application.
> at 
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:535)
> at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:74)
> Caused by: org.apache.tez.dag.api.SessionNotRunning: TezSession has already 
> shutdown. Application application_1452243782005_0292 failed 2 times due to AM 
> Container for appattempt_1452243782005_0292_02 exited with  exitCode: 1
> For more detailed output, check application tracking 
> page:http://xstathn003:8088/proxy/application_1452243782005_0292/Then, click 
> on links to logs of each attempt.
> Diagnostics: Exception from container-launch.
> Container id: container_1452243782005_0292_02_01
> Exit code: 1
> Stack trace: ExitCodeException exitCode=1: 
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
> at org.apache.hadoop.util.Shell.run(Shell.java:455)
> at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> 
> 
> Container exited with a non-zero exit code 1
> Failing this attempt. Failing the application.
> at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:784)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:205)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:116)
> at 
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:532)
> ... 1 more
> Interrupting... Be pati

Re: Question regarding instability of EdgeProperty DataSourceType

2016-01-31 Thread Hitesh Shah
There are 3 types defined as you have noticed:

persisted_reliable: assumes the vertex output is stored in a reliable store such 
as HDFS. This means that if the node on which the task ran disappears, the 
output is still available. 
persisted: vertex output is stored on the local disk of the node where the task ran. 
ephemeral: vertex output is stored in task memory. 

From a data transmission point of view, all data is always transmitted over the 
network unless the downstream task happens to be running on the same machine as 
the task that generated the output. In that case, it can read from local disk 
if needed.

You are right that the in-memory support is not built out, so a co-located task 
potentially reading from another task’s memory is therefore not supported 
today. The network channel requires a bit more explanation. Generally, all data 
is persisted to disk. This means that data transferred over the network is 
first written locally and then eventually pulled/pushed to a downstream task as 
needed. This does not mean that all the data needs to be generated before 
being sent downstream. Data can still be generated in “chunks” and then sent 
downstream as and when a chunk becomes available. ( This functionality is 
internally called “pipelined shuffle” if you end up searching through the 
code/jiras. ) However, again to clarify, there is no pure streaming support yet 
where data is kept in memory and pushed downstream. 
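A minimal, Tez-independent sketch of the "chunks published as they become available" idea (plain Python; all function and file names are invented for illustration, not Tez APIs):

```python
import os
import tempfile

def produce_chunks(out_dir, records, chunk_size):
    """Upstream task: write output in fixed-size chunks. Each chunk is
    published atomically via rename, so a consumer never sees a half-written
    chunk and can start pulling before the producer has finished."""
    for n, i in enumerate(range(0, len(records), chunk_size)):
        tmp = os.path.join(out_dir, ".chunk_%d.tmp" % n)
        final = os.path.join(out_dir, "chunk_%d" % n)
        with open(tmp, "w") as f:
            f.write("\n".join(records[i:i + chunk_size]))
        os.rename(tmp, final)  # atomic publish of a complete chunk

def consume_chunks(out_dir):
    """Downstream task: read whatever complete chunks exist so far."""
    out = []
    for name in sorted(os.listdir(out_dir)):
        if name.startswith("chunk_"):
            with open(os.path.join(out_dir, name)) as f:
                out.extend(f.read().splitlines())
    return out

d = tempfile.mkdtemp()
produce_chunks(d, ["r%d" % i for i in range(10)], 4)
print(consume_chunks(d))  # all 10 records, read chunk by chunk
```

The real pipelined shuffle layers fault tolerance on top of this: if the producing attempt fails mid-stream, already-consumed chunks from that attempt must be discarded and replayed.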

To add, the in-memory approach would require Tez to apply a different fault 
tolerance model, and it also needs more cluster capacity to ensure that both 
upstream and downstream tasks can run concurrently. Do you see this as a 
requirement for something that you are planning to use Tez for? 

thanks
— Hitesh

On Jan 31, 2016, at 12:30 AM, Gal Vinograd  wrote:

> Hey,
> 
> I read through the tez-api code and noticed that PERSISTED is the only stable data 
> source type. Does that mean that data isn't transmitted between vertices 
> through in-memory or network channels?
> 
> Thanks :)



Re: Classpath Composition

2016-01-28 Thread Hitesh Shah
Assuming you have the guava jar available on all nodes, you can set 
“tez.cluster.additional.classpath.prefix” to point to it, and this classpath 
value will be prepended to the classpath of the tez runtime layers. However, 
please note that this is not guaranteed to work if the guava jar from your own 
code ends up preventing the Tez framework from finding the guava APIs it uses. 
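As a sketch, the tez-site.xml entry might look like the following; the jar path here is an invented example and must point at wherever the shared guava jar actually lives on every node:

```xml
<!-- tez-site.xml: prepend a cluster-local jar to the Tez runtime classpath.
     /opt/shared-libs/guava-18.0.jar is a hypothetical path. -->
<property>
  <name>tez.cluster.additional.classpath.prefix</name>
  <value>/opt/shared-libs/guava-18.0.jar</value>
</property>
```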

Please feel free to add your comments on the approach proposed at 
https://issues.apache.org/jira/browse/TEZ-2164 - this is mainly to hide Tez’s 
use of Guava, but it does not really fix the problems caused by Hadoop, etc. 
requiring an older guava jar. Also, some of the guava APIs used by Tez are not 
available in guava-18, hence some changes were needed in Tez too. Hadoop in 
recent times has ported some of the code from guava into its own codebase to 
avoid the compat issues caused by users using a newer version of guava ( older 
versions did not work seamlessly ). This may also be a point to consider 
depending on what patches CDH has bundled into its hadoop jars.

Also to clarify, the guava jar coming into the Tez runtime will be a result of 
the guava jar bundled into the tez tarball ( configured as part of tez.lib.uris 
). The tez runtime classpath is of the form “…:$PWD/*:$PWD/tezlib/*”, where 
tezlib is the dir into which the tez tarball is uncompressed. 

thanks
— Hitesh


On Jan 28, 2016, at 9:01 AM, Jan Morlock  wrote:

> Using Apache Tez in combination with Hive or Pig we *sometimes* encounter 
> error messages similar to the following ones:
> 
> Caused by: java.lang.NoSuchMethodError: 
> com.google.common.base.Splitter.splitToList(Ljava/lang/CharSequence;)
> Caused by: java.lang.NoSuchMethodError: 
> com.google.common.base.Stopwatch.elapsedTime(Ljava/util/concurrent/TimeUnit;)
> 
> The origin of this problem is that multiple Google Guava jars with different 
> versions exist on the classpath.
> The most ancient version (11.0.2) comes with several Hadoop components (see 
> also https://issues.apache.org/jira/browse/HADOOP-10101). Our own code uses a 
> more up-to-date version.
> 
> When using Apache Tez, the composition of the classpath seems to be random. 
> Therefore sometimes an old Guava version lacking modern methods is found 
> first on the classpath leading to the exception listed above.
> 
> My question is the following one: are there any techniques or Tez options 
> steering the classpath composition, perhaps similar to 
> mapreduce.job.user.classpath.first?
> 
> We are using CDH 5.4.5.
> 
> Thank you very much in advance.



Re: What's the application scenario of Apache TEZ

2016-01-20 Thread Hitesh Shah
Couple of other points to add to Bikas’s email: 

Regarding your question on small data: No - Tez is geared to work well with 
both small data and extremely large data. Hive should likely perform better 
with Tez regardless of data size, unless a bad query plan is created that is 
non-optimal for Tez.

For 3). Hive/Pig/Cascading when used with MR would deconstruct a single hive 
query/pig script into multiple MR jobs. This would end up reading/writing 
from/to HDFS multiple times. Furthermore, with MR, you are stuck with fitting all 
your code into a Mapper and a Reducer ( each with only a single input and output 
) and using Shuffle for data transfer. This introduces additional 
inefficiencies. With Tez, a single hive query can be converted into a single 
DAG. Vertices can run any kind of logic, and the edges between vertices are not 
restricted to “shuffle-like” data transfer, which allows more optimizations at 
the query planning stage. The fact that Tez allows Hive/Pig to use smarter 
ways of processing queries/scripts is usually the biggest win in terms 
of performance. Spark is similarly better than MR as it provides a richer 
operator library in some sense. As for comparing Spark vs Tez, to some extent, 
it is likely comparing apples to oranges as Tez is quite a low-level library. 
Depending on how an application is written to make use of Tez vs Spark, you 
will find different cases where one is faster than the other. 
 
— Hitesh

On Jan 20, 2016, at 8:44 AM, LLBian  wrote:

> 
> Hello,Tez experts:
>   I know that Tez is used for DAG cases.
> Because it can avoid writing intermediate results to disk, and reuses 
> containers, it is more effective than MR at processing small amounts of 
> data. So I would think that Hive on Tez is better than Hive on 
> MR at processing small amounts of data, am I right?
>  Well, now, my questions are:
> (1) Even though the main design themes are described at https://tez.apache.org/ , I am 
> still not very clear about its application scenarios; if there are some 
> real, major enterprise applications, so much the better.
> (2) I am still not very clear what problem it is mainly used to solve. 
> (3) Why is it used for Hive and Pig? How is it better than Spark or MR?
> (4) I looked at your official PPT and the paper “Apache Tez: A Unifying Framework 
> for Modeling and Building Data Processing Applications”, but it is still not very 
> clear to me. 
> How should I understand this: "Don’t solve problems that have already been solved. 
> Or else you will have to solve them again!"? Is there any real example?
> 
>  Apache Tez is a great product, and I hope to learn more about it.
> 
> Any reply are very appreciated.
> 
> Thankyou & Best Regards.
> 
> ---LLBian
> 
>   



Re: when split data in AM lead to "path not exsit or is not a directory" Exception

2016-01-15 Thread Hitesh Shah
l jars.  I can not understand why the yarn container didn't 
> download "conf/hbasetable",why? 
> (2)As mentioned earlier,I also packaged this "conf/hbasetable" to conf.jar, 
> and it was downloaded to the AM container path, why it can not be  parsed or 
> decompressed ?
> 
>   Is there any configuration options can do this?
> 
> best wishes & Thankyou.
> --LLBian
> 
> 
> At 2016-01-14 11:18:55, "Hitesh Shah"  wrote:
>> Hello 
>> 
>> You are right that when hive.compute.splits.in.am is true, the splits are 
>> computed in the cluster in the Tez AM container. 
>> 
>> Now, there are a bunch of options to consider but the general gist is that 
>> if you are familiar with MapReduce Distributed Cache or YARN local 
>> resources, you need to add the files that your custom input format needs to 
>> Tez’s version of the distributed cache. The simplest approach for you may be 
>> to just use “add jar” from Hive which will automatically add these files to 
>> the distributed cache ( this will copy them from local filesystem into HDFS 
>> and also make them available in the Tez AM container ). The other option is 
>> upload all the necessary files to HDFS, say “/tmp/additionalfiles/“ and then 
>> specify “hdfs://nnhost:nnport/tmp/additionalfiles/“ for property 
>> “tez.aux.uris” in tez-site.xml.  This will add all contents of this HDFS dir 
>> as part of the distributed cache. Please note that Tez does not do recursive 
>> searches in the dir but it supports a comma-separate list of files/dirs for 
>> tez.aux.uris 
>> 
>> Next, to debug this, you can do the following:
>>  - set "yarn.nodemanager.delete.debug-delay-sec” in yarn-site.xml to a value 
>> like 1200 to help debugging. This will require NodeManager restarts.
>>  - next, run your query.
>>  - Find the application on the YARN ResourceManager UI. This app page will 
>> also tell you which node the AM is running on or ran on. 
>>  - Go to this node and search for launch_container.sh for the container in 
>> question ( these files will be found in one of the dirs configured for 
>> yarn.local-dirs based on your yarn-site.xml )
>>  - Looking inside launch_container.sh, look for $CWD and see the contents of 
>> the dir pointed to by $CWD. This will give you an idea of the localized 
>> files ( from distributed cache ).
>> 
>> If you have more questions, can you first clarify what information/files are 
>> needed for your plugin to run? 
>> 
>> thanks
>> — Hitesh
>> 
>> 
>> 
>> On Jan 13, 2016, at 7:01 PM, LLBian  wrote:
>> 
>>> And,also the log is in yarn container. 
>>> I try to solve this problem by packaging nbconf/ to a jar file under 
>>> $HIVE_HOME,then put it under $HIVE_HOME/nblib, it was uploaded to  
>>> /tmp/hadoop_yarn/root/_tez_session_dir/,but did not work.
>>> 
>>> Best regards.
>>> LLBian
>>> 
>>> 01-14-2016
>>> 
>>> At 2016-01-14 10:47:18, "LLBian"  wrote:
>>>> Hi,all
>>>>   I'm a green hand in using apache tez. Recently,I met some of the 
>>>> difficulty:
>>>>   our team has developed a plug-in on hive. It is similar to the function 
>>>> of HBaseHandler,but customized code. Now my task is to ensure it can be 
>>>> compatible with tez.  while that is the background.My question is:
>>>> (1)I have a directory named nbconf,it is created under $HIVE_HOME, under 
>>>> it,there is a sub-directory named conf/hbasetable. 
>>>> (2)I also have a directory named nblib,it is  created under 
>>>> $HIVE_HOME,used for Tez JARs.
>>>> (3)when I set  hive.compute.splits.in.am=true,it throws Exception in hive 
>>>> log:
>>>>  ……
>>>> [map1]java.lang.ExceptionInInitializerError:
>>>> ……
>>>> ……
>>>> Caused by java.lang.RuntimeException:[conf/hbasetable/] path not exsit or 
>>>> is not a directory
>>>> ……
>>>> 
>>>> But actually it exists!It is under local $HIVE_HOME/nbconf. When I set  
>>>> hive.compute.splits.in.am=FALSE,it works well. So, I guess,maybe because 
>>>> computing splits in Cluster AM,not in localdisk. Mybe I should load some 
>>>> files or directory(eg.conf/hbasetable)HDFS,If tez wish to do so,where 
>>>> should I put them?:
>>>> the tez session dirctory?
>>>> /tmp/hadoop_yarn/root/_tez_session_dir/? 
>>>> /tmp/hadoop_yarn/root/_tez_session_dir/.tez/?
>>>> /tmp/hadoop_yarn/root/_tez_session_dir/.tez/AppId/?
>>>> I tryed these,but they all didn't work.
>>>> 
>>>> becase it is OK when debugging, so I don`t know how to take up the matter. 
>>>>  I don't know where to put this customed directory "[conf/hbasetable]" on 
>>>> HDFS.
>>>> 
>>>> I am eager to get your guidance. Any help is greatly appreciated .
>>>> (Please forgive my poor English)
>>>> 
>>>> LLBian
>>> 
>>> 
>>> 
>>> 
>> 



Re: when split data in AM lead to "path not exsit or is not a directory" Exception

2016-01-13 Thread Hitesh Shah
Hello 

You are right that when hive.compute.splits.in.am is true, the splits are 
computed in the cluster in the Tez AM container. 

Now, there are a bunch of options to consider, but the general gist is that if 
you are familiar with the MapReduce Distributed Cache or YARN local resources, 
you need to add the files that your custom input format needs to Tez’s version 
of the distributed cache. The simplest approach for you may be to just use “add 
jar” from Hive, which will automatically add these files to the distributed 
cache ( this will copy them from the local filesystem into HDFS and also make 
them available in the Tez AM container ). The other option is to upload all the 
necessary files to HDFS, say “/tmp/additionalfiles/“, and then specify 
“hdfs://nnhost:nnport/tmp/additionalfiles/“ for the property “tez.aux.uris” in 
tez-site.xml.  This will add all contents of this HDFS dir to the 
distributed cache. Please note that Tez does not do recursive searches in the 
dir, but it supports a comma-separated list of files/dirs for tez.aux.uris. 
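As a sketch, with the hypothetical HDFS path and nnhost:nnport placeholders from above, the tez-site.xml entry could look like:

```xml
<!-- tez-site.xml: comma-separated list of HDFS files/dirs (non-recursive)
     to add to the distributed cache for every DAG. Replace nnhost:nnport
     with your NameNode's host and port. -->
<property>
  <name>tez.aux.uris</name>
  <value>hdfs://nnhost:nnport/tmp/additionalfiles/</value>
</property>
```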

Next, to debug this, you can do the following:
   - set "yarn.nodemanager.delete.debug-delay-sec” in yarn-site.xml to a value 
like 1200 to help debugging. This will require NodeManager restarts.
   - next, run your query.
   - Find the application on the YARN ResourceManager UI. This app page will 
also tell you which node the AM is running on or ran on. 
   - Go to this node and search for launch_container.sh for the container in 
question ( these files will be found in one of the dirs configured for 
yarn.local-dirs based on your yarn-site.xml )
   - Looking inside launch_container.sh, look for $CWD and see the contents of 
the dir pointed to by $CWD. This will give you an idea of the localized files ( 
from distributed cache ).
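The yarn-site.xml change from the first step above might look like this (the value is in seconds; 1200 is just the example delay suggested, and NodeManagers must be restarted for it to take effect):

```xml
<!-- yarn-site.xml: keep finished containers' local dirs around for 20
     minutes so launch_container.sh and the localized files under $CWD
     can be inspected after the query fails. -->
<property>
  <name>yarn.nodemanager.delete.debug-delay-sec</name>
  <value>1200</value>
</property>
```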

If you have more questions, can you first clarify what information/files are 
needed for your plugin to run? 

thanks
— Hitesh
 
 

On Jan 13, 2016, at 7:01 PM, LLBian  wrote:

> And also the log is in the yarn container. 
> I tried to solve this problem by packaging nbconf/ into a jar file under 
> $HIVE_HOME, then putting it under $HIVE_HOME/nblib; it was uploaded to 
> /tmp/hadoop_yarn/root/_tez_session_dir/ but did not work.
> 
> Best regards.
>  LLBian
> 
> 01-14-2016
> 
> At 2016-01-14 10:47:18, "LLBian"  wrote:
> >Hi all,
> >I'm new to using Apache Tez. Recently I ran into some 
> > difficulty:
> >our team has developed a plug-in for Hive. It is similar in function 
> > to HBaseHandler, but with customized code. Now my task is to ensure it is 
> > compatible with Tez. That is the background. My question is:
> >(1) I have a directory named nbconf created under $HIVE_HOME; under 
> >it, there is a sub-directory named conf/hbasetable. 
> >(2) I also have a directory named nblib, created under $HIVE_HOME, used 
> >for Tez JARs.
> >(3) When I set hive.compute.splits.in.am=true, it throws an exception in the 
> >hive log:
> >   ……
> >[map1]java.lang.ExceptionInInitializerError:
> >……
> >……
> >Caused by java.lang.RuntimeException:[conf/hbasetable/] path not exsit or is 
> >not a directory
> >……
> >
> >But actually it exists! It is under local $HIVE_HOME/nbconf. When I set 
> >hive.compute.splits.in.am=FALSE, it works well. So I guess it is because the 
> >splits are computed in the cluster AM, not on the local disk. Maybe I should 
> >load some files or directories (e.g. conf/hbasetable) to HDFS. If Tez 
> >expects that, where should I put them?
> >The tez session directory?
> > /tmp/hadoop_yarn/root/_tez_session_dir/? 
> > /tmp/hadoop_yarn/root/_tez_session_dir/.tez/?
> > /tmp/hadoop_yarn/root/_tez_session_dir/.tez/AppId/?
> >I tried these, but they all didn't work.
> >
> >Because it is OK when debugging, I don't know how to tackle the matter. 
> >I don't know where to put this custom directory "[conf/hbasetable]" on 
> >HDFS.
> >
> >I am eager to get your guidance. Any help is greatly appreciated.
> >(Please forgive my poor English)
> >
> >LLBian
> 
> 
> 
>  



Re: Cross platform job submission

2016-01-12 Thread Hitesh Shah
This is probably something that was missed for Tez. Would you mind filing a bug 
for this? The fix is to always use {{VAR}} ( instead of, say, $VAR ), which is 
then automatically changed by YARN to $VAR or %VAR% based on the env where the 
container is being launched. 
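As a toy model (plain Python, not YARN's actual implementation; the function name is invented), the substitution YARN performs at container launch looks roughly like:

```python
import re

def expand_env_vars(cmd, target_os):
    """Rewrite cross-platform {{VAR}} placeholders into the shell syntax of
    the node that launches the container: $VAR on unix, %VAR% on windows."""
    if target_os == "windows":
        return re.sub(r"\{\{(\w+)\}\}", r"%\1%", cmd)
    return re.sub(r"\{\{(\w+)\}\}", r"$\1", cmd)

print(expand_env_vars("{{JAVA_HOME}}/bin/java -cp {{CLASSPATH}}", "unix"))
# -> $JAVA_HOME/bin/java -cp $CLASSPATH
print(expand_env_vars("{{JAVA_HOME}}\\bin\\java.exe", "windows"))
# -> %JAVA_HOME%\bin\java.exe
```

This is why launch commands built with a hard-coded $VAR break when the client and cluster run different operating systems.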

— Hitesh 

On Jan 12, 2016, at 7:26 AM, Johannes Zillmann  wrote:

> Hey there,
> 
> running into the situation where the Tez client is located on a windows 
> machine and the Hadoop cluster is located on Linux machines.
> Job submission fails for both Tez and MapReduce with an exception like this:
> 
> java.lang.RuntimeException: Job job_1448280010118_1679 failed! Failure info: 
> Application application_1448280010118_1679 failed 2 times due to AM Container 
> for appattempt_1448280010118_1679_02 exited with  exitCode: 1
> For more detailed output, check application tracking 
> page:http://hmaster.datameer.lan:8088/proxy/application_1448280010118_1679/Then,
>  click on links to logs of each attempt.
> Diagnostics: Exception from container-launch.
> Container id: container_e13_1448280010118_1679_02_01
> Exit code: 1
> Exception message: /bin/bash: line 0: fg: no job control
> 
> Stack trace: ExitCodeException exitCode=1: /bin/bash: line 0: fg: no job 
> control
> 
>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
>   at org.apache.hadoop.util.Shell.run(Shell.java:455)
>   at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> 
> 
> For MapReduce I was able to fix the problem by setting the property 
> mapreduce.app-submission.cross-platform=true (see 
> https://issues.apache.org/jira/browse/MAPREDUCE-4052)
> Is there anything similar for Tez ?
> 
> Johannes
> 



Re: Writing intermediate data

2015-12-08 Thread Hitesh Shah
To clarify, by information handshake, I meant how to tell the downstream vertex 
tasks where the generating task wrote data to and also when to start reading 
data. If this can be somehow be pre-defined at the plan build time, sure, you 
probably don’t need a lot of info to be sent downstream as it can be driven via 
some config + rules ( i.e. configured base path + appId + vertexId/Name + 
taskId ). However, there are some gotchas involved in terms of when the 
downstream vertex can start reading data. If the downstream task starts reading 
data before the upstream task has finished completely, this should be fine as 
long as the upstream task does not fail and a new attempt is not launched. If a 
new attempt has to be launched, the downstream task would need to revert all 
processing from earlier data and replay the new attempt’s data for correctness. 
A simple answer for this can be a signal trigger say by touching a file saying 
task1 is done which informs downstream tasks that task1 data is ready to be 
read. And obviously ( similar to shuffle data ), if the data location is more 
dynamic or more heavily protected ( say using dynamically generated 
secrets/tokens ), then additional information needs to be sent downstream. 
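A minimal sketch of the config-plus-rules path scheme and the file-touch completion signal described above (plain Python; all names are invented for illustration, and real Tez uses DataMovementEvents rather than this):

```python
import os
import tempfile

def task_output_dir(base, app_id, vertex, task_id):
    """Rule-based output location: derivable by downstream tasks at plan
    build time, so no event is needed to discover the path."""
    return os.path.join(base, app_id, vertex, "task_%d" % task_id)

def write_output(base, app_id, vertex, task_id, data):
    d = task_output_dir(base, app_id, vertex, task_id)
    os.makedirs(d)
    with open(os.path.join(d, "part"), "w") as f:
        f.write(data)
    # Touching _DONE signals downstream readers that this attempt's
    # output is complete and safe to consume.
    open(os.path.join(d, "_DONE"), "w").close()

def read_if_ready(base, app_id, vertex, task_id):
    d = task_output_dir(base, app_id, vertex, task_id)
    if not os.path.exists(os.path.join(d, "_DONE")):
        return None  # upstream attempt still running (or failed); wait
    with open(os.path.join(d, "part")) as f:
        return f.read()

base = tempfile.mkdtemp()
write_output(base, "app_1", "v1", 0, "hello")
print(read_if_ready(base, "app_1", "v1", 0))  # -> hello
```

Note this sketch sidesteps the retry gotcha from the paragraph above: a downstream task that reads before _DONE appears would have to discard and replay data if a new upstream attempt is launched.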

thanks
— Hitesh


On Dec 8, 2015, at 5:50 PM, Raajay  wrote:

> Thanks for the valuable inputs.
> 
> A quick clarification : 
> 
> " - Tez uses DataMovementEvents to inform the downstream vertex on where to 
> pull data from. This information handshake is part of the Input/Output pair 
> implementation."
> 
> If the edges had type PERSISTED_RELIABLE, the information handshake is 
> probably not needed. Is that right ? 
> 
> - Raajay
> 
> On Tue, Dec 8, 2015 at 6:17 PM, Hitesh Shah  wrote:
> The other way to look at this problem is that for a given edge between 2 
> vertices, the data format and transfer mechanism is governed by the Output of 
> the upstream vertex and the Input of the downstream vertex. You can 
> potentially write your own Input and Output pair that work with HDFS or 
> tachyon for intermediate data but there are a few things to be aware of:
>- Depending on the operator, the data is expected to be potentially 
> partitioned and/or sorted. This will drive how you store and read data
>- Tez uses DataMovementEvents to inform the downstream vertex on where to 
> pull data from. This information handshake is part of the Input/Output pair 
> implementation.
>- The failure systems also change depending on the kind of storage. Today, 
> most edges uses type PERSISTED. This means that the data can survive the 
> container going away but not if the node/machine disappears. If using HDFS, 
> that would become type PERSISTED_RELIABLE. This means that the data is always 
> reliably available ( even if the node on which the data was generated 
> disappears ). I don’t believe this is handled yet so that would be a new 
> enhancement to Tez to fix the failure semantics for such an edge.
> 
> If you are using Hive, this would mean making changes to Hive too to change 
> the DAG plan as needed.
> 
> thanks
> — Hitesh
> 
> 
> On Dec 8, 2015, at 3:54 PM, Siddharth Seth  wrote:
> 
> > Using hdfs (or a filesystem other than local) is not supported yet. tmpfs 
> > would be your best bet in that case - we have tested with this before, but 
> > this has capacity limitations, and mixing tmpfs with regular disks does not 
> > provide a deterministic mechanism of selecting memory as the intermediate 
> > storage.
> > Not sure if tachyon has an nfs interface to access it - otherwise that 
> > could have been an option.
> >
> > We have made simple changes in the past to use HDFS for shuffle - primarily 
> > as experiments. None of that is available as patches, but IIRC - the 
> > changes were not very complicated. This would involve changing the fetcher 
> > to skip HTTP and use a pre-determined path on a specified filesystem to 
> > fetch data. Also, the producer to write out to a specific path on a 
> > non-local FileSystem.
> >
> > On Mon, Dec 7, 2015 at 11:57 AM, Raajay  wrote:
> > I wish to setup a Tez data analysis framework, where the data resides in 
> > memory. Currently, I have tez (and also Hive) setup such that it can read 
> > from an in-memory filesystem like Tachyon.
> >
> > However, the intermediate data is still written to disk at the each 
> > processing node. I considered writing to tmpfs, however, such a setup does 
> > not fall back to disk gracefully.
> >
> > Does Tez have an interface to write intermediate data to HDFS like 
> > filesystem ? If yes, what are the settings ?
> >
> > Does setting "yarn.nodemanager.local-dirs" to some HDFS or Tachyon URI 
> > suffice ?
> >
> > Thanks,
> > Raajay
> >
> 
> 



Re: Writing intermediate data

2015-12-08 Thread Hitesh Shah
The other way to look at this problem is that for a given edge between 2 
vertices, the data format and transfer mechanism is governed by the Output of 
the upstream vertex and the Input of the downstream vertex. You can potentially 
write your own Input and Output pair that work with HDFS or tachyon for 
intermediate data but there are a few things to be aware of:
   - Depending on the operator, the data is expected to be potentially 
partitioned and/or sorted. This will drive how you store and read data.
   - Tez uses DataMovementEvents to inform the downstream vertex where to 
pull data from. This information handshake is part of the Input/Output pair 
implementation. 
   - The failure semantics also change depending on the kind of storage. Today, 
most edges use type PERSISTED. This means that the data can survive the 
container going away, but not the node/machine disappearing. If using HDFS, 
that would become type PERSISTED_RELIABLE, meaning the data is always 
reliably available ( even if the node on which the data was generated 
disappears ). I don’t believe this is handled yet, so that would be a new 
enhancement to Tez to fix the failure semantics for such an edge.
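To make "partitioned but not sorted" concrete, here is a tiny Tez-independent sketch (plain Python; the function name is invented) of what a partition-only Output would do before downstream Inputs fetch their buckets:

```python
def partition_records(records, num_partitions):
    """Partition-only (no sort) output, as a scatter-gather Output might do:
    downstream task i later fetches bucket i from every upstream task."""
    buckets = [[] for _ in range(num_partitions)]
    for key, value in records:
        # All records sharing a key land in the same bucket, so one
        # downstream task sees every value for that key.
        buckets[hash(key) % num_partitions].append((key, value))
    return buckets

recs = [("a", 1), ("b", 2), ("a", 3)]
parts = partition_records(recs, 2)
# every ("a", ...) record ends up in the same bucket
```

A sort-based shuffle would additionally sort each bucket by key before publishing it; a custom Input/Output pair gets to choose either behavior.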

If you are using Hive, this would mean making changes to Hive too to change the 
DAG plan as needed. 

thanks
— Hitesh
 

On Dec 8, 2015, at 3:54 PM, Siddharth Seth  wrote:

> Using hdfs (or a filesystem other than local) is not supported yet. tmpfs 
> would be your best bet in that case - we have tested with this before, but 
> this has capacity limitations, and mixing tmpfs with regular disks does not 
> provide a deterministic mechanism of selecting memory as the intermediate 
> storage.
> Not sure if tachyon has an nfs interface to access it - otherwise that could 
> have been an option.
> 
> We have made simple changes in the past to use HDFS for shuffle - primarily 
> as experiments. None of that is available as patches, but IIRC - the changes 
> were not very complicated. This would involve changing the fetcher to skip 
> HTTP and use a pre-determined path on a specified filesystem to fetch data. 
> Also, the producer to write out to a specific path on a non-local FileSystem.
> 
> On Mon, Dec 7, 2015 at 11:57 AM, Raajay  wrote:
> I wish to setup a Tez data analysis framework, where the data resides in 
> memory. Currently, I have tez (and also Hive) setup such that it can read 
> from an in-memory filesystem like Tachyon. 
> 
> However, the intermediate data is still written to disk at the each 
> processing node. I considered writing to tmpfs, however, such a setup does 
> not fall back to disk gracefully.
> 
> Does Tez have an interface to write intermediate data to HDFS like filesystem 
> ? If yes, what are the settings ?
> 
> Does setting "yarn.nodemanager.local-dirs" to some HDFS or Tachyon URI 
> suffice ?
> 
> Thanks,
> Raajay
> 



Re: Issue: Hive with Tez on CDH-5.4.2

2015-12-04 Thread Hitesh Shah
I don’t believe I have seen this error reported before. The error mainly seems 
to be coming from somewhere in the Hive codebase, so the hive mailing list might 
provide a more relevant answer. If you don’t get one, would you mind setting 
“tez.am.log.level” to DEBUG in your tez-site.xml, re-running the query, and 
attaching the yarn logs ( via bin/yarn logs -applicationId ) to a new jira? 
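A sketch of the suggested tez-site.xml change (remember to revert it after debugging, since DEBUG logging is verbose):

```xml
<!-- tez-site.xml: raise the Tez ApplicationMaster log level before
     re-running the failing query, then collect logs with
     bin/yarn logs -applicationId <appId> -->
<property>
  <name>tez.am.log.level</name>
  <value>DEBUG</value>
</property>
```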

thanks
— Hitesh


On Dec 4, 2015, at 12:08 PM, Sudheer NV  wrote:

> Hi All,
> 
> I have installed Tez (0.7.0) on the Cloudera VM CDH 5.4.2 (hadoop-2.6.0, 
> hive-1.1.0).  The demo orderedWordCount example executed 
> successfully, but hive queries on the Tez execution engine are throwing the 
> error below. Any help is appreciated. Thanks!!
> 
> ERROR LOGS:
> 
> hive> set hive.execution.engine=tez;
> hive> select count(*) from sample_07;
> Query ID = cloudera_2015120411_d37e3a42-6924-423b-9523-f43c1cca90e8
> Total jobs = 1
> Launching Job 1 out of 1
> Tez session was closed. Reopening...
> Session re-established.
> 
> 
> Status: Running (Executing on YARN cluster with App id 
> application_1449255750568_0004)
> 
> 
> VERTICES  STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  
> KILLED
> 
> Map 1 FAILED -1  00   -1   0  
>  0
> Reducer 2 KILLED  1  001   0  
>  0
> 
> VERTICES: 00/02  [>>--] 0%ELAPSED TIME: 0.47 s
>  
> 
> Status: Failed
> Vertex failed, vertexName=Map 1, vertexId=vertex_1449255750568_0004_1_00, 
> diagnostics=[Vertex vertex_1449255750568_0004_1_00 [Map 1] killed/failed due 
> to:ROOT_INPUT_INIT_FAILURE, Vertex Input: sample_07 initializer failed, 
> vertex=vertex_1449255750568_0004_1_00 [Map 1], 
> java.lang.IllegalArgumentException: Illegal Capacity: -12185
>   at java.util.ArrayList.<init>(ArrayList.java:142)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:330)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:306)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:408)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:129)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:245)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:239)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:239)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:226)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> ]
> Vertex killed, vertexName=Reducer 2, vertexId=vertex_1449255750568_0004_1_01, 
> diagnostics=[Vertex received Kill in INITED state., Vertex 
> vertex_1449255750568_0004_1_01 [Reducer 2] killed/failed due to:null]
> DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
> FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.tez.TezTask
> 
> 
> Regards,
> Sudheer 



Re: Running Tez with Tachyon

2015-11-16 Thread Hitesh Shah
 at 
> org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2651)
>   at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
>   at 
> org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
>   at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
>   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
>   at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
>   at 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.<init>(FileOutputCommitter.java:105)
>   at 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.<init>(FileOutputCommitter.java:80)
>   at 
> org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.getOutputCommitter(FileOutputFormat.java:309)
>   at 
> org.apache.tez.mapreduce.committer.MROutputCommitter.getOutputCommitter(MROutputCommitter.java:137)
>   ... 24 more
> Caused by: java.lang.ClassNotFoundException: Class tachyon.hadoop.TFS not 
> found
>   at 
> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
>   at 
> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
>   ... 35 more
> , Vertex vertex_1447296197811_0003_1_02 [Sorter] killed/failed due 
> to:INIT_FAILURE], Vertex killed, vertexName=Summation, 
> vertexId=vertex_1447296197811_0003_1_01, diagnostics=[Vertex received Kill in 
> INITED state., Vertex vertex_1447296197811_0003_1_01 [Summation] 
> killed/failed due to:OTHER_VERTEX_FAILURE], DAG did not succeed due to 
> VERTEX_FAILURE. failedVertices:2 killedVertices:1]
> 
> Best,
> 
> --
> Jiří Šimša
> 
> On Thu, Nov 12, 2015 at 8:52 AM, Hitesh Shah  wrote:
> The general approach for add-on jars requires 2 steps:
> 
> 1) On the client host, where the job is submitted, you need to ensure that 
> the add-on jars are in the local classpath. This is usually done by adding 
> them to HADOOP_CLASSPATH. Please do pay attention to adding the jars via 
> "<dir>/*" instead of just "<dir>".
> 2) Next, "tez.aux.uris". This controls additional files/jars needed in the 
> runtime on the cluster. Upload the tachyon jar to HDFS and ensure that you 
> provide the path to either the dir on HDFS or the full path to the file and 
> specify that in tez.aux.uris.
> 
> The last thing to note is that you may need to pull additional transitive 
> dependencies of tachyon if it is not a self-contained jar.
> 
> thanks
> — Hitesh
> 
> On Nov 12, 2015, at 1:06 AM, Bikas Saha  wrote:
> 
> > Can you provide the full stack trace?
> >
> > Are you getting the exception on the client (while submitting the job) or 
> > in the cluster (after the job started to run)?
> >
> > For the client side, the fix would be to add tachyon jars to the client 
> > classpath. Looks like you tried some client side classpath fixes. You could 
> > run ‘hadoop classpath’ to print the classpath being picked up by the 
> > ‘hadoop jar’ command. And scan its output to check if your tachyon jars are 
> > being picked up correctly or not.
> >
> > Bikas
> >
> > From: Jiří Šimša [mailto:jiri.si...@gmail.com]
> > Sent: Wednesday, November 11, 2015 6:54 PM
> > To: user@tez.apache.org
> > Subject: Running Tez with Tachyon
> >
> > Hello,
> >
> > I have followed the Tez installation instructions 
> > (https://tez.apache.org/install.html) and was able to successfully run the 
> > ordered word count example:
> >
> > $ hadoop jar ./tez-examples/target/tez-examples-0.8.2-SNAPSHOT.jar 
> > orderedwordcount /input.txt /output.txt
> >
> > Next, I wanted to see if I can do the same, this time reading from and 
> > writing to Tachyon (http://tachyon-project.org/) using:
> >
> > $ hadoop jar ./tez-examples/target/tez-examples-0.8.2-SNAPSHOT.jar 
> > orderedwordcount tachyon://localhost:19998/input.txt 
> > tachyon://localhost:19998/output.txt
> >
> > Unsurprisingly, this resulted in the "Class tachyon.hadoop.TFS not found" 
> > error because Tez needs the Tachyon client jar that defines the 
> > tachyon.hadoop.TFS class. To that end, I have tried several options (listed 
> > below) to provide this jar to Tez, none of which seems to have worked:
> >
> > 1) Adding the Tachyon client jar to HADOOP_CLASSPATH
> > 2) Specifying the Tachyon client jar with the -libjars flag for the above 
> > command.
> > 3) Copying the Tachyon client jar into the 
> > $HADOOP_HOME/share/hadoop/common/lib directory of my HADOOP installation.
> > 4) Copying the Tachyon client jar into HDFS and specifying a path to it 
> > through the tez.aux.uris property in the tez-site.xml file (in a similar 
> > fashion the tez.lib.uris property specifies the path to the Tez tarball).
> > 5) I modified the source code of the ordered word count example, adding a 
> > call to TezClient#addAppMasterLocalFiles(...), providing a URI for the 
> > Tachyon client jar uploaded to HDFS.
> >
> > Any advice on how to pass the Tachyon client jar to Tez to resolve this 
> > issue would be greatly appreciated. Thank you.
> >
> > Best,
> >
> > --
> > Jiří Šimša
> 
> 
> 



Re: Running Tez with Tachyon

2015-11-12 Thread Hitesh Shah
The general approach for add-on jars requires 2 steps:

1) On the client host, where the job is submitted, you need to ensure that the 
add-on jars are in the local classpath. This is usually done by adding them to 
HADOOP_CLASSPATH. Please do pay attention to adding the jars via "<dir>/*" 
instead of just "<dir>".
2) Next, "tez.aux.uris". This controls additional files/jars needed in the 
runtime on the cluster. Upload the tachyon jar to HDFS and ensure that you 
provide the path to either the dir on HDFS or the full path to the file and 
specify that in tez.aux.uris. 

The last thing to note is that you may need to pull additional transitive 
dependencies of tachyon if it is not a self-contained jar.
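As a concrete sketch of the two steps (the paths, jar name, and namenode address below are assumptions for illustration, not values from this thread):

```shell
# Step 1 (client host): put the add-on jars on the local classpath,
# using "dir/*" rather than just "dir" (path is hypothetical):
#   export HADOOP_CLASSPATH="$HADOOP_CLASSPATH:/opt/tachyon/client/*"
# Step 2 (cluster runtime): upload the jar to HDFS, e.g.:
#   hdfs dfs -mkdir -p /apps/tez/aux-jars
#   hdfs dfs -put /opt/tachyon/client/tachyon-client.jar /apps/tez/aux-jars/
# and point tez.aux.uris in tez-site.xml at the dir or the full file path:
AUX_DIR=/apps/tez/aux-jars
AUX_JAR=tachyon-client.jar
echo "tez.aux.uris=hdfs://namenode:8020${AUX_DIR}/${AUX_JAR}"
```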

thanks
— Hitesh

On Nov 12, 2015, at 1:06 AM, Bikas Saha  wrote:

> Can you provide the full stack trace?
>  
> Are you getting the exception on the client (while submitting the job) or in 
> the cluster (after the job started to run)?
>  
> For the client side, the fix would be to add tachyon jars to the client 
> classpath. Looks like you tried some client side classpath fixes. You could 
> run ‘hadoop classpath’ to print the classpath being picked up by the ‘hadoop 
> jar’ command. And scan its output to check if your tachyon jars are being 
> picked up correctly or not.
>  
> Bikas
>  
> From: Jiří Šimša [mailto:jiri.si...@gmail.com] 
> Sent: Wednesday, November 11, 2015 6:54 PM
> To: user@tez.apache.org
> Subject: Running Tez with Tachyon
>  
> Hello,
>  
> I have followed the Tez installation instructions 
> (https://tez.apache.org/install.html) and was able to successfully run the 
> ordered word count example:
>  
> $ hadoop jar ./tez-examples/target/tez-examples-0.8.2-SNAPSHOT.jar 
> orderedwordcount /input.txt /output.txt
>  
> Next, I wanted to see if I can do the same, this time reading from and 
> writing to Tachyon (http://tachyon-project.org/) using:
>  
> $ hadoop jar ./tez-examples/target/tez-examples-0.8.2-SNAPSHOT.jar 
> orderedwordcount tachyon://localhost:19998/input.txt 
> tachyon://localhost:19998/output.txt
>  
> Unsurprisingly, this resulted in the "Class tachyon.hadoop.TFS not found" 
> error because Tez needs the Tachyon client jar that defines the 
> tachyon.hadoop.TFS class. To that end, I have tried several options (listed 
> below) to provide this jar to Tez, none of which seems to have worked:
>  
> 1) Adding the Tachyon client jar to HADOOP_CLASSPATH
> 2) Specifying the Tachyon client jar with the -libjars flag for the above 
> command.
> 3) Copying the Tachyon client jar into the 
> $HADOOP_HOME/share/hadoop/common/lib directory of my HADOOP installation.
> 4) Copying the Tachyon client jar into HDFS and specifying a path to it 
> through the tez.aux.uris property in the tez-site.xml file (in a similar 
> fashion the tez.lib.uris property specifies the path to the Tez tarball).
> 5) I modified the source code of the ordered word count example, adding a 
> call to TezClient#addAppMasterLocalFiles(...), providing a URI for the 
> Tachyon client jar uploaded to HDFS.
>  
> Any advice on how to pass the Tachyon client jar to Tez to resolve this issue 
> would be greatly appreciated. Thank you.
>  
> Best,
>  
> --
> Jiří Šimša



Re: Constant Full GC making Tez Hive job take almost forever

2015-10-23 Thread Hitesh Shah
Hello Juho 

As you are probably aware, each hive query will largely have different memory 
requirements depending on what kind of plan it ends up executing. For the most 
part, a common container size and general settings work well for most queries.
In this case, you may need additional tuning: either fix the Hive query plan, 
correctly size the Tez container just for this query, or adjust any other Hive 
knobs that may be making wrong assumptions about data stats or available 
memory, causing this query to run very slowly. 
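As a sketch of the container-sizing knobs: hive.tez.container.size and hive.tez.java.opts are the usual Hive-side settings, and a common rule of thumb keeps -Xmx at roughly 80% of the container size. The numbers below are illustrative assumptions, not a recommendation for this particular query:

```shell
# Compute a heap size for a hypothetical 4 GB Tez container.
CONTAINER_MB=4096
XMX_MB=$(( CONTAINER_MB * 80 / 100 ))   # leave ~20% headroom for non-heap use
echo "set hive.tez.container.size=${CONTAINER_MB};"
echo "set hive.tez.java.opts=-Xmx${XMX_MB}m;"
```

These `set` lines can be placed at the top of the Hive script for just the problematic query instead of changing cluster-wide defaults.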

As a first step, it would be good if you can help provide the explain plan for 
the query, hive-site/tez-site for configs being used and the yarn application 
logs for the completed query. If you have the Tez UI available, you can click 
the “Download data” on the dag details page too which can be used to run 
against the various perf analyzers available in Tez to see what the issue is.

thanks
— Hitesh


On Oct 23, 2015, at 1:08 AM, Juho Autio  wrote:

> Hi,
> 
> I'm running a Hive script with tez-0.7.0. The progress is really slow, and 
> in the container logs I'm seeing constant Full GC lines, so there doesn't 
> seem to be any time for the JVM to actually execute anything between the GC 
> pauses.
> 
> When running the same Hive script with mr execution engine, the job goes 
> through normally.
> 
> So there's something specific to Tez's memory usage that causes the Full GC 
> issue.
> 
> Also with similar clusters & configuration other Hive jobs have gone through 
> with Tez just fine. This issue happens when I just add a little more data to 
> be processed by the script. With a smaller workload it goes through also with 
> Tez engine with the expected execution time.
> 
> For example an extract from one of the container logs:
> 
> application_1445328511212_0001/container_1445328511212_0001_01_000292/stdout.gz
> 
> 791.208: [Full GC 
> [PSYoungGen: 58368K->56830K(116736K)] 
> [ParOldGen: 348914K->348909K(349184K)] 
> 407282K->405740K(465920K) 
> [PSPermGen: 43413K->43413K(43520K)], 1.4063790 secs] [Times: user=5.22 
> sys=0.04, real=1.40 secs] 
> Heap
>  PSYoungGen  total 116736K, used 58000K [0xf550, 
> 0x0001, 0x0001)
>   eden space 58368K, 99% used 
> [0xf550,0xf8da41a0,0xf8e0)
>   from space 58368K, 0% used 
> [0xf8e0,0xf8e0,0xfc70)
>   to   space 58368K, 0% used 
> [0xfc70,0xfc70,0x0001)
>  ParOldGen   total 349184K, used 348909K [0xe000, 
> 0xf550, 0xf550)
>   object space 349184K, 99% used 
> [0xe000,0xf54bb4b0,0xf550)
>  PSPermGen   total 43520K, used 43413K [0xd5a0, 
> 0xd848, 0xe000)
>   object space 43520K, 99% used 
> [0xd5a0,0xd84657a8,0xd848)
> 
> If I understand the GC log correctly, it seems like ParOldGen is full and 
> Full GC doesn't manage to free space from there. So maybe Tez has created too 
> many objects that can't be released. It could be a memory leak. Or maybe this 
> is just not big enough minimum heap for Tez in general? I could probably fix 
> the problem by changing configuration somehow to simply have less containers 
> and thus bigger heap size per container? Still, changing to bigger nodes 
> doesn't seem like a solution that would eventually scale, so I would prefer 
> to resolve this properly.
> 
> Please, could you help me with how to troubleshoot & fix this issue?
> 
> Cheers,
> Juho



Re: Tez Code 1 & Tez Code 2

2015-10-22 Thread Hitesh Shah
Hello Dale, 

I think I can guess what is happening. Hue keeps connections open between 
itself and the HiveServer2. Now what happens is that the Tez session times 
itself out if queries are not submitted to it within a certain time window ( to 
stop wasting resources on a YARN cluster ). 

This can be inferred from the following log:
2015-10-13 15:39:43,938 INFO [Timer-1] app.DAGAppMaster: Session timed out, 
lastDAGCompletionTime=1444746866023 ms, sessionTimeoutInterval=30 ms

There are a couple of fixes that were done in Tez and Hive so that such 
timeouts cause the Tez sessions to be re-cycled correctly. Due to a combination 
of Tez not throwing a proper error in earlier versions and Hive not handling 
such submission failures correctly, Hive continues to try to use the “dead” 
sessions causing submitted queries to fail. 

In your case, for the short term, a restart of the HiveServer should address 
the issue but until the HiveServer ( and/or Tez ) is patched, you will see such 
issues crop up from time to time. 

thanks
— Hitesh


On Oct 19, 2015, at 5:52 AM, Bradman, Dale  wrote:

> Hi Jeff,
> 
> The RM is fairly stable and reliable. As I said, the command works when 
> passing it through Beeline. Just not in Hue.
> 
> Resource manager log snippet:
> 
> 2015-10-19 09:51:51,811 INFO  ipc.Server (Server.java:run(2060)) - IPC Server 
> handler 32 on 8032, call 
> org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport 
> from 10.10.7.223:33554 Call#625674 Retry#0
> org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
> with id 'application_1444742140034_0009' doesn't exist in RM.
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:324)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:170)
> at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:401)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
> 2015-10-19 09:52:46,900 INFO  ipc.Server (Server.java:run(2060)) - IPC Server 
> handler 25 on 8032, call 
> org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport 
> from 10.10.7.223:33599 Call#625743 Retry#0
> org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
> with id 'application_1444742140034_0009' doesn't exist in RM.
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:324)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:170)
> at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:401)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
> 2015-10-19 09:53:45,479 INFO  ipc.Server (Server.java:run(2060)) - IPC Server 
> handler 10 on 8032, call 
> org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport 
> from 10.10.7.223:33651 Call#625830 Retry#0
> 
> 
> Thanks,
> Dale
> 
>> On 19 Oct 2015, at 13:46, Jianfeng (Jeff) Zhang  
>> wrote:
>> 
>> 
>> Hi Dale,
>> 
>> Does it happen frequently? Does the RM work normally (can it still accept 
>> new jobs) when this happens? 
>> From the logs, it seems the AM met errors when heartbeating with the RM, 
>> and it switched between the 2 RMs for a long time. It might be an RM issue; 
>> could you check the RM logs?
>> 
>>  
>> Best Regard,
>> Jeff Zhang
>> 
>> 
>> From: , Dale 
>> Reply-To: "user@tez.apache.org" 
>> Date: Monday, October 19, 2015 at 8:35 PM
>> To: "

Re: Tez View in Ambari

2015-10-15 Thread Hitesh Shah
Previous releases of the Tez view did not have the integration in place for 
the Hive query info. 

There is one approach that you can try:
 
Ambari has something called standalone mode. You can deploy a new ambari-server 
version 2.1 ( on a different host ) and just instantiate a Tez view on it. This 
ambari instance does not need to be used to manage/install a cluster but the 
Tez view can be configured to point to the YARN ResourceManager and YARN 
timeline server from your original cluster. This will likely get you the new 
set of features ( some of the hive-tez integrations came in newer versions of 
Hive/Tez so some features might not be available depending on your deployed 
cluster versions). 

There is another approach which would involve compiling a new version of the 
Tez view against the old ambari-server code and loading it into your existing 
ambari-server instance. I am not too sure whether the view from ambari-server 
2.1 will work on ambari-server 2.0 (it might be worth a try, but the Ambari 
view APIs may not be fully compatible across both releases).

thanks
— Hitesh

On Oct 15, 2015, at 3:04 AM, Bradman, Dale  wrote:

> Thanks. I am using Ambari 2.0. I am guessing that you are referring to Ambari 
> 2.1?
> 
> 
> 
>> On 15 Oct 2015, at 10:59, Rajesh Balamohan  wrote:
>> 
>> In recent versions, hive queries are directly available in DAG details 
>> (which has the app and DAG id).  Example screenshot is given below.
>> 
>> 
>> 
>> 
>> Which versions of Hive/Tez are you using?
>> 
>> ~Rajesh.B
>> ​
>> 
>> On Thu, Oct 15, 2015 at 3:08 PM, Bradman, Dale  
>> wrote:
>> Hi,
>> 
>> I am heavily using the Tez UI within Ambari and it's superb - really helps 
>> tune your queries. 
>> 
>> Anyway, I was wondering if there is an easier way to determine which 
>> applicationID belongs to which query. I’m running 100’s of queries in 
>> parallel and need a better way to determine what query has been run rather 
>> than trying to second guess the UI.
>> 
>> Is there a way (either via the UI or otherwise) to do this?
>> 
>> Thanks,
>> Dale
>> 
>> 
>> 
>> Capgemini is a trading name used by the Capgemini Group of companies which 
>> includes Capgemini UK plc, a company registered in England and Wales (number 
>> 943935) whose registered office is at No. 1, Forge End, Woking, Surrey, GU21 
>> 6DB.
>> This message contains information that may be privileged or confidential and 
>> is the property of the Capgemini Group. It is intended only for the person 
>> to whom it is addressed. If you are not the intended recipient, you are not 
>> authorized to read, print, retain, copy, disseminate, distribute, or use 
>> this message or any part thereof. If you receive this message in error, 
>> please notify the sender immediately and delete all copies of this message.
>> 
> 
> 
> 



Re: Getting dot files for DAGs

2015-10-01 Thread Hitesh Shah
I don’t believe the binary should need changing at all unless you need 
enhancements from recent commits. It should just be setting up the UI and 
configuring Tez for using YARN Timeline.

The instructions that you can follow:
  http://tez.apache.org/tez-ui.html 
  http://tez.apache.org/tez_yarn_timeline.html

thanks
— Hitesh

On Oct 1, 2015, at 11:07 AM, James Pirz  wrote:

> Thanks for the suggestion; I had never used the Tez UI before, and learned 
> about it yesterday.
> I am trying to find out how I can enable/use it. Apparently it needs some 
> changes in the binary that I am using (I had built the binary for tez 0.7 
> almost 2 months ago).
> 
> 
> 
> 
> On Wed, Sep 30, 2015 at 10:27 PM, Jörn Franke  wrote:
> Why not use tez ui?
> 
> On Thu, Oct 1, 2015 at 2:29 AM, James Pirz  wrote:
> I am using Tez 0.7.0 on Hadoop 2.6 to run Hive queries.
> I am interested in checking DAGs for my queries visually, and I realized that 
> I can do that by graphviz once I can get "dot" files of my DAGs. My issue is 
> I can not find those files, they are not in the log directory of Yarn or 
> Hadoop or under /tmp .
> 
> Any hint as where I can find those files would be great. Do I need to add any 
> settings to my tez-site.xml in-order to enable generating them ?
> 
> Thanks. 
> 



Re: Getting dot files for DAGs

2015-10-01 Thread Hitesh Shah
Hello Andre,

For ATS, in the TEZ_DAG_ID entity, the dagPlan is already serialized and 
available in the otherInfo section. It is not in the .dot format but it is used 
by the Tez UI to come up with the graphical view of the dag plan.

thanks
— Hitesh


On Oct 1, 2015, at 1:28 AM, Andre Kelpe  wrote:

> Maybe it would be a good idea to send the dot file to the ATS along with the 
> other information you are sending. I too wanted to look at a dot file the 
> other day and had trouble finding it.
> 
> - André
> 
> On Thu, Oct 1, 2015 at 4:00 AM, Hitesh Shah  wrote:
> The .dot file is generated into the Tez Application Master’s container log 
> dir. Firstly, you need to figure out the yarn application in which the 
> query/Tez DAG ran. Once you have the applicationId, you can use one of these 
> 2 approaches:
> 
> 1) Go to the YARN ResourceManager UI, find the application and click through 
> to the Application Master logs. The .dot file for the dag should be visible 
> there.
> 2) Using the application Id ( if the application has completed), get the yarn 
> logs using "bin/yarn logs -applicationId <appId>" - once you have the logs, 
> you will be able to find the contents of the .dot file within them. This 
> approach only works if you have YARN log aggregation enabled.
> 
> thanks
> — Hitesh
> 
> 
> On Sep 30, 2015, at 5:29 PM, James Pirz  wrote:
> 
> > I am using Tez 0.7.0 on Hadoop 2.6 to run Hive queries.
> > I am interested in checking DAGs for my queries visually, and I realized 
> > that I can do that by graphviz once I can get "dot" files of my DAGs. My 
> > issue is I can not find those files, they are not in the log directory of 
> > Yarn or Hadoop or under /tmp .
> >
> > Any hint as where I can find those files would be great. Do I need to add 
> > any settings to my tez-site.xml in-order to enable generating them ?
> >
> > Thanks.
> 
> 
> 
> 
> -- 
> André Kelpe
> an...@concurrentinc.com
> http://concurrentinc.com



Re: Getting dot files for DAGs

2015-09-30 Thread Hitesh Shah
The .dot file is generated into the Tez Application Master’s container log dir. 
Firstly, you need to figure out the yarn application in which the query/Tez DAG 
ran. Once you have the applicationId, you can use one of these 2 approaches: 

1) Go to the YARN ResourceManager UI, find the application and click through to 
the Application Master logs. The .dot file for the dag should be visible there.
2) Using the application Id ( if the application has completed), get the yarn 
logs using "bin/yarn logs -applicationId <appId>" - once you have the logs, you 
will be able to find the contents of the .dot file within them. This approach 
only works if you have YARN log aggregation enabled.
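Once the aggregated log is on local disk, the .dot plan can be cut out with a small filter. A sketch (the application id and file names below are hypothetical):

```shell
# On a real cluster you would first run:
#   yarn logs -applicationId application_1443600000000_0001 > am.log
# The DAG plan inside the log is an ordinary Graphviz digraph, so a simple
# awk range filter recovers it for rendering with `dot -Tpng`:
extract_dot() { awk '/^digraph/{p=1} p; /^}/{p=0}' "$1"; }

# Demonstrate on a stand-in log file:
printf 'other log lines\ndigraph dag_1 {\n  "Map 1" -> "Reducer 2"\n}\ntrailer\n' > am.log
extract_dot am.log
```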

thanks
— Hitesh


On Sep 30, 2015, at 5:29 PM, James Pirz  wrote:

> I am using Tez 0.7.0 on Hadoop 2.6 to run Hive queries.
> I am interested in checking DAGs for my queries visually, and I realized that 
> I can do that by graphviz once I can get "dot" files of my DAGs. My issue is 
> I can not find those files, they are not in the log directory of Yarn or 
> Hadoop or under /tmp .
> 
> Any hint as where I can find those files would be great. Do I need to add any 
> settings to my tez-site.xml in-order to enable generating them ?
> 
> Thanks. 



Re: Getting DAG Id from Hive on tez

2015-09-15 Thread Hitesh Shah
This is a question that is probably meant for the Hive mailing list. I believe 
either the Hive query Id or the information from the query plan ( as used in 
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/hooks/ATSHook.java
 ) should probably be able to give you that info but the Hive devs 
(d...@hive.apache.org) will likely have a better answer for this. 

— Hitesh 


On Sep 15, 2015, at 1:56 AM, Dharmesh Kakadia 
 wrote:

> Thanks Hitesh.
> I am able to filter out the particular dag, now the only problem is how to 
> get the DAG name. I see from a previous mail on the list[1] that Tez uses 
> Hive queryId + counter as the dag name. I have access to the hive query Id, 
> but how do I know the name from it? Is there a way to get the name/Id of the 
> DAG directly ? Just trying out counter=1..N will be pretty bad for me in 
> pre-hook.
> [1] 
> http://mail-archives.apache.org/mod_mbox/hive-user/201408.mbox/%3c1409015953.23241.yahoomail...@web161603.mail.bf1.yahoo.com%3E
> 
> Thanks,
> Dharmesh
> 
> The Hive query id maps to the Tez dag name. You can try the following call 
> against timeline:
> 
> 
> /ws/v1/timeline/TEZ_DAG_ID?primaryFilter=dagName:{tezDagName}
> 
> thanks
> — Hitesh
> 
> On Sep 14, 2015, at 10:45 PM, Dharmesh Kakadia <dharmesh.kaka...@research.iiit.ac.in> wrote:
> 
> > Hi,
> > 
> > I am running Hive on Tez, with the timeline server. We have a pre-hook in 
> > Hive to maintain statistics of what jobs ran, by whom, and how much 
> > resource they took, which we had been using with Hive-MR. Now I am trying 
> > to port this pre-hook to work with Hive on Tez. 
> > 
> > I plan to use the timeline server for querying the stats, but I am not 
> > able to get the DAG ID in the hook. How do I get the DAG ID from Tez? Any 
> > help will be great.
> > 
> > Thanks,
> > Dharmesh
> 
> 



Re: Getting DAG Id from Hive on tez

2015-09-14 Thread Hitesh Shah
The Hive query id maps to the Tez dag name. You can try the following call 
against timeline: 

/ws/v1/timeline/TEZ_DAG_ID?primaryFilter=dagName:{tezDagName}
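For example, the request URL can be assembled like this (the host, port, and DAG name below are hypothetical placeholders, not values from this thread):

```shell
TIMELINE="http://timeline-host:8188"    # assumed YARN timeline server address
DAG_NAME="hive_20150914224512_a1b2c3"   # the Hive query id is the DAG name
URL="${TIMELINE}/ws/v1/timeline/TEZ_DAG_ID?primaryFilter=dagName:${DAG_NAME}"
echo "$URL"
# on a live cluster:  curl -s "$URL"
```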

thanks
— Hitesh

On Sep 14, 2015, at 10:45 PM, Dharmesh Kakadia 
 wrote:

> Hi,
> 
> I am running Hive on Tez, with the timeline server. We have a pre-hook in 
> Hive to maintain statistics of what jobs ran, by whom, and how much resource 
> they took, which we had been using with Hive-MR. Now I am trying to port 
> this pre-hook to work with Hive on Tez. 
> 
> I plan to use the timeline server for querying the stats, but I am not able 
> to get the DAG ID in the hook. How do I get the DAG ID from Tez? Any help 
> will be great.
> 
> Thanks,
> Dharmesh



Re: Over writing files

2015-09-11 Thread Hitesh Shah
This is probably a question for the Hive dev mailing list on how the 
staging/output directory name is determined. i.e. 
".hive-staging_hive_2015-09-11_00-07-40_043_6365145769624003668-1”. You may 
need to change this value in the config being used to configure the output of 
the vertex that is doing the write to HDFS.

— Hitesh


On Sep 11, 2015, at 1:09 PM, Raajay  wrote:

> I am running DAGs generated by Hive using my custom Tez Client. So I 
> serialize a DAG, load it back and submit it later. Everything works great the 
> first time; however, on second runs the I get a RunTime exception (snippet 
> below)
> 
> My guess is that since the same DAG is run again, the output tables have the 
> same id, and that prevents the overwrite. 
> 
> Where should I introduce randomness in the file name? Should I change some 
> name field in FileSinkDescriptor every time I re-run the DAG? 
> 
> Thanks
> Raajay
> 
> 
>  Vertex failed, vertexName=Reducer 3, 
> vertexId=vertex_1441949856963_0011_1_04, diagnostics=[Task failed, 
> taskId=task_1441949856963_0011_1_04_00, diagnostics=[TaskAttempt 0 
> failed, info=[Error: Failure while running task:java.lang.RuntimeException: 
> java.lang.RuntimeException: Hive Runtime Error while closing operators: 
> Unable to rename output from: 
> hdfs://10.10.1.2:8020/apps/hive/output_tab/.hive-staging_hive_2015-09-11_00-07-40_043_6365145769624003668-1/_task_tmp.-ext-1/_tmp.00_0
>  to: 
> hdfs://10.10.1.2:8020/apps/hive/output_tab/.hive-staging_hive_2015-09-11_00-07-40_043_6365145769624003668-1/_tmp.-ext-1/00_0
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:345)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171



Re: Missing libraries.

2015-09-11 Thread Hitesh Shah
“tez.aux.uris” supports a comma separated list of files and directories on HDFS 
or any distributed filesystem ( no tar balls, archives, etc and no support for 
file:// ). When Hive submits a query to Tez, it adds its hive-exec.jar to the 
tez runtime ( similar to MR distributed cache ). If you are hitting a class not 
found again, you are missing some jars from the classpath. If you are using 
your own InputFormat, try doing an “add jar “ in your hive script or add that 
custom jar to tez.aux.uris after uploading it to HDFS.
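Both options can be sketched as follows (the jar name and HDFS paths are hypothetical examples):

```shell
JAR_URI="hdfs:///apps/tez/aux-jars/my-inputformat.jar"
# Option 1: per-script, prepend an ADD JAR statement to the Hive script:
printf 'ADD JAR %s;\n' "$JAR_URI"
# Option 2: cluster-wide, upload the jar and list it (comma separated, no
# tarballs or file:// URIs) in tez.aux.uris:
#   hdfs dfs -put my-inputformat.jar /apps/tez/aux-jars/
#   tez.aux.uris=hdfs:///apps/tez/aux-jars/my-inputformat.jar
```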

If you are modifying the hive-exec.jar, you may need to ensure that Hive is 
using that jar and not overriding your custom hive-exec.jar.

— Hitesh

On Sep 11, 2015, at 12:56 AM, Raajay  wrote:

> Yeah. I added the hive-exec.jar that contains HiveSplitGenerator to HDFS. I 
> still hit the exception.
> 
> On Fri, Sep 11, 2015 at 2:43 AM, Jianfeng (Jeff) Zhang 
>  wrote:
> 
> Have you tried using a jar rather than a tar.gz?
> 
> 
> Best Regard,
> Jeff Zhang
> 
> 
> From: Raajay 
> Reply-To: "user@tez.apache.org" 
> Date: Friday, September 11, 2015 at 3:15 PM
> To: "user@tez.apache.org" 
> Subject: Missing libraries.
> 
> I am running DAGs generated by Hive for Tez in offline mode; as in I store 
> the DAGs to disk and then run them later using my own Tez Client.
> 
> I have been able to get this setup going in local mode. However, while 
> running on the cluster, I hit a Processor class not found exception (snippet 
> below). I figure this is because custom processor classes defined in Hive 
> (e.g., HiveSplitGenerator) are not visible while executing a mapper.
> 
> I have uploaded the hive-exec jar (apache-hive-2.0.0-SNAPSHOT-bin.tar.gz) to 
> HDFS and pointed ${tez.aux.uris} to that location. Not sure what more is 
> needed to make Hive classes visible to Tez tasks? Does "tar.gz" not work?
> 
> 
> 2015-09-11 00:59:02,973 INFO [Dispatcher thread: Central] impl.VertexImpl: 
> Recovered Vertex State, vertexId=vertex_1441949856963_0006_1_02 [Map 1], 
> state=NEW, numInitedSourceVertices=0, numStartedSourceVertices=0, 
> numRecoveredSourceVertices=0, recoveredEvents=0, tasksIsNull=false, numTasks=0
> 2015-09-11 00:59:02,974 INFO [Dispatcher thread: Central] impl.VertexImpl: 
> Root Inputs exist for Vertex: Map 4 : {a={InputName=a}, 
> {Descriptor=ClassName=org.apache.tez.mapreduce.input.MRInputLegacy, 
> hasPayload=true}, 
> {ControllerDescriptor=ClassName=org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator,
>  hasPayload=false}}
> 2015-09-11 00:59:02,974 INFO [Dispatcher thread: Central] impl.VertexImpl: 
> Starting root input initializer for input: a, with class: 
> [org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator]
> 2015-09-11 00:59:02,974 INFO [Dispatcher thread: Central] impl.VertexImpl: 
> Setting vertexManager to RootInputVertexManager for 
> vertex_1441949856963_0006_1_00 [Map 4]
> 2015-09-11 00:59:02,979 INFO [Dispatcher thread: Central] impl.VertexImpl: 
> Num tasks is -1. Expecting VertexManager/InputInitializers/1-1 split to set 
> #tasks for the vertex vertex_1441949856963_0006_1_00 [Map 4]
> 2015-09-11 00:59:02,979 INFO [Dispatcher thread: Central] impl.VertexImpl: 
> Vertex will initialize from input initializer. vertex_1441949856963_0006_1_00 
> [Map 4]
> 2015-09-11 00:59:02,980 INFO [Dispatcher thread: Central] impl.VertexImpl: 
> Vertex will initialize via inputInitializers vertex_1441949856963_0006_1_00 
> [Map 4]. Starting root input initializers: 1
> 2015-09-11 00:59:02,981 ERROR [Dispatcher thread: Central] 
> common.AsyncDispatcher: Error in dispatcher thread
> org.apache.tez.dag.api.TezUncheckedException: Unable to load class: 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator
>   at 
> org.apache.tez.common.ReflectionUtils.getClazz(ReflectionUtils.java:45)
>   at 
> org.apache.tez.common.ReflectionUtils.createClazzInstance(ReflectionUtils.java:96)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager.createInitializer(RootInputInitializerManager.java:137)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager.runInputInitializers(RootInputInitializerManager.java:114)
> 
> 
> 



Re: Creating TaskLocationHints

2015-09-11 Thread Hitesh Shah
Tez tries to obey them but as you call out, it also depends on YARN. 

Tez follows a simple heuristic. It makes a best effort to do data-local 
allocation. After a certain delay expires, it allows a task to be assigned to 
either a data-local or rack-local container, and after another timeout it picks 
any available container. Both the fallbacks ( i.e., whether to allow them ) and 
the time delays are configurable. There is also some additional priority given 
to already-launched containers as compared to new allocations from YARN.

Search for FALLBACK in TezConfiguration.java or check the attachments in 
https://issues.apache.org/jira/browse/TEZ-2294 for documentation. 
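For reference, the locality-fallback knobs look roughly like this in 
tez-site.xml (property names as found in TezConfiguration in recent Tez 
releases; the values shown are the usual defaults — verify against your 
version):

```
<!-- allow falling back from node-local to rack-local containers -->
<property>
  <name>tez.am.container.reuse.rack-fallback.enabled</name>
  <value>true</value>
</property>
<!-- allow falling back to any container, regardless of locality -->
<property>
  <name>tez.am.container.reuse.non-local-fallback.enabled</name>
  <value>false</value>
</property>
<!-- how long to hold out for a local assignment before falling back (ms) -->
<property>
  <name>tez.am.container.reuse.locality.delay-allocation-millis</name>
  <value>250</value>
</property>
```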

— Hitesh 


On Sep 11, 2015, at 12:05 AM, Raajay  wrote:

> I was able to get it working with "hostnames". thanks!
> 
> To dig deeper, how much does Tez obey the hints provided? How are Vertex 
> Location Hints handled ? What if YARN is not able to provide containers in 
> requested locations ?
> 
> Raajay
> 
> On Thu, Sep 10, 2015 at 10:19 AM, Hitesh Shah  wrote:
> In almost all cases, this is usually hostnames. The general flow is: find the 
> block locations for the data source, extract the hostnames from there, and 
> provide them to YARN so that it can provide a container on the same host as 
> the datanode holding the data. As long as YARN is using hostnames, the 
> container locality matching should work correctly. I will need to go and 
> check the YARN codebase to see if it does some additional reverse DNS lookups 
> so that IPs also work, but to be safe, use hostnames.
> 
> I don’t believe Tez has yet introduced support for working with 
> application-level YARN node labels.
> 
> thanks
> — Hitesh
> 
> On Sep 10, 2015, at 12:43 AM, Raajay  wrote:
> 
> > While creating TaskLocationHints, using the static function
> >
> > TaskLocationHint.createTaskLocationHint(Set<String> nodes, Set<String> 
> > racks)
> >
> > what should the Strings be ? IP address of the nodes ? Node labels ? Or 
> > hostnames ?
> >
> > Thanks
> > Raajay
> 
> 



Re: how to allocate more containers?

2015-09-11 Thread Hitesh Shah
Most of the parallelism ( no. of containers ) is controlled by the upper-layer 
application. There is some minor control that Tez exercises when it groups 
splits together, but for the most part the upper layer decides how many 
containers to run. 

You should look at Hive configs to see how to control the no. of splits 
generated by the HiveInputFormat ( or ORC, etc. ). As for Tez, you can look at 
the grouping parameters ( check the source of TezMapReduceSplitsGrouper for the 
config knobs ). By limiting the max size of a grouped split, you can allow more 
splits to run in parallel. However, if the no. of splits given to Tez by Hive 
itself is too low, then Tez cannot increase parallelism further.

https://cwiki.apache.org/confluence/display/TEZ/How+initial+task+parallelism+works
 has some details on how parallelism is determined, but it does not call out 
which configs are useful for tuning.
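As an illustration, the grouping knobs can be set per query from the Hive 
session (property names per TezMapReduceSplitsGrouper; the values below are 
examples, not recommendations):

```
-- More waves / a smaller max group size => more grouped splits => more tasks,
-- provided Hive hands Tez enough raw splits in the first place.
SET tez.grouping.split-waves=1.7;
SET tez.grouping.min-size=16777216;   -- 16 MB lower bound per grouped split
SET tez.grouping.max-size=134217728;  -- 128 MB upper bound per grouped split
```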

thanks
— Hitesh 

On Sep 10, 2015, at 9:38 PM, Xiaoyong Zhu  wrote:

> Hi
>  
> I am wondering if there is a configuration I can change to allocate more 
> containers for a certain Tez application? I am using Hive on Tez.
>  
> Thanks!
>  
> Xiaoyong



Re: Creating TaskLocationHints

2015-09-10 Thread Hitesh Shah
In almost all cases, this is usually hostnames. The general flow is: find the 
block locations for the data source, extract the hostnames from there, and 
provide them to YARN so that it can provide a container on the same host as the 
datanode holding the data. As long as YARN is using hostnames, the container 
locality matching should work correctly. I will need to go and check the YARN 
codebase to see if it does some additional reverse DNS lookups so that IPs also 
work, but to be safe, use hostnames.
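A minimal Java-flavored sketch (tez-api assumed on the classpath; the hostnames 
and rack name are made up, and this is not compiled here):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

import org.apache.tez.dag.api.TaskLocationHint;

// Hostnames should match what the NodeManagers register with YARN.
Set<String> hosts = new HashSet<>(
    Arrays.asList("datanode-01.example.com", "datanode-02.example.com"));
Set<String> racks = Collections.singleton("/default-rack");
TaskLocationHint hint = TaskLocationHint.createTaskLocationHint(hosts, racks);
```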

I don’t believe Tez has yet introduced support for working with 
application-level YARN node labels. 

thanks
— Hitesh 

On Sep 10, 2015, at 12:43 AM, Raajay  wrote:

> While creating TaskLocationHints, using the static function
> 
> TaskLocationHint.createTaskLocationHint(Set<String> nodes, Set<String> racks)
> 
> what should the Strings be ? IP address of the nodes ? Node labels ? Or 
> hostnames ?
> 
> Thanks
> Raajay



Re: Error of setting vertex location hints

2015-09-10 Thread Hitesh Shah
There are 2 aspects to using Vertex Location Hints and parallelism. All of this 
depends on how you define the work that needs to be done by a particular task. 

I will take the MR approach and compare it to the more dynamic approach that 
Jeff has been explaining. 

For MR, all the work was decided upfront on the client side, i.e., how many 
tasks are needed and which task will process which split. From a Tez point of 
view, this means you can configure the vertex with a fixed parallelism ( i.e., 
not -1 ) and set up the vertex location hints as needed. This also implies that 
you need to configure the Input for that vertex, via its user payload, with all 
the necessary information on what work it needs to do.

tez-tests/src/main/java/org/apache/tez/mapreduce/examples/FilterLinesByWord.java
 has an option to generate the splits on the client. You can follow this code 
path to see how the DAG is setup. The same approach is also used for running 
any MapReduce job via Tez using the yarn-tez config knob ( MR always generates 
splits on the client ). 
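A rough Java-flavored sketch of that static setup (tez-api assumed; `splits`, 
`processorDescriptor`, and the split-to-host lookup are placeholders, not taken 
from FilterLinesByWord, and this is not compiled here):

```java
// Parallelism and placement decided up front on the client, MR-style.
int numTasks = splits.size();
Vertex mapVertex = Vertex.create("map", processorDescriptor, numTasks);

List<TaskLocationHint> hints = new ArrayList<>();
for (InputSplit split : splits) {
  // Hint each task toward the hosts holding its split's blocks.
  hints.add(TaskLocationHint.createTaskLocationHint(
      new HashSet<>(Arrays.asList(split.getLocations())), null));
}
mapVertex.setLocationHint(VertexLocationHint.create(hints));
// The Input's user payload must still carry the per-task split assignment.
```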

The dynamic approach that Tez follows is that vertices which take input from 
HDFS ( or any other source, for that matter ) have their parallelism set to -1 
( and no location hints defined at DAG plan creation time ). The Input has an 
Initializer attached to it which runs in the ApplicationMaster, looks at the 
data to be processed, figures out how many tasks to run, where to run the 
tasks, and what shard/partition of work to assign to each task. There are 
multiple facets to this which have been mostly covered by Jeff in his earlier 
replies. 

thanks
— Hitesh


On Sep 10, 2015, at 1:15 AM, Jianfeng (Jeff) Zhang  
wrote:

> >>> I am trying to create a scenario where the mappers (root tasks) are 
> >>> necessarily not executed at the data location
> Not sure your purpose. Usually data locality can improve performance.
> 
> 
> >>> Can the number of tasks for the tokenizer be a value *NOT* equal to the 
> >>> number of HDFS blocks of the file ?
> Yes, it can. There are two ways:
> *  MRInput internally uses an InputFormat to determine how to split, so all 
> the InputFormat knobs apply to MRInput too, 
>like mapreduce.input.fileinputformat.split.minsize & 
> mapreduce.input.fileinputformat.split.maxsize
> 
> * Another way is to use TezGroupedSplitsInputFormat, which is provided by 
> Tez. This InputFormat groups several splits together into a new split to be 
> consumed by one mapper.
>   You can use the following parameters to tune that, and please refer 
> MRInputConfigBuilder.groupSplits
>   • tez.grouping.split-waves 
>   • tez.grouping.max-size
>   • tez.grouping.min-size
> 
> >>>  Can a mapper be scheduled at a location different than the location of 
> >>> its input block ? If yes, how ? 
> Yes, it is possible. Tez will always use the split info; there’s no 
> option to disable it. If you really want to, you need to create a new 
> InputInitializer. I think you just need to make small changes to 
> MRInputAMSplitGenerator
>
> https://github.com/zjffdu/tez/blob/a3a7700dea0a315ad613aa2d8a7223eb73878cb5/tez-mapreduce/src/main/java/org/apache/tez/mapreduce/common/MRInputAMSplitGenerator.java
> 
> 
> You just need to make small changes to the following code snippet 
> 
> InputConfigureVertexTasksEvent configureVertexEvent = 
> InputConfigureVertexTasksEvent.create(
> inputSplitInfo.getNumTasks(),
> VertexLocationHint.create(inputSplitInfo.getTaskLocationHints()), 
>  // make code changes here 
> InputSpecUpdate.getDefaultSinglePhysicalInputSpecUpdate());
> events.add(configureVertexEvent);
> 
> 
> 
> Best Regard,
> Jeff Zhang
> 
> 
> From: Raajay 
> Reply-To: "user@tez.apache.org" 
> Date: Thursday, September 10, 2015 at 2:07 PM
> To: "user@tez.apache.org" 
> Subject: Re: Error of setting vertex location hints
> 
> The input is a hdfs file. I am trying to create a scenario where the mappers 
> (root tasks) are necessarily not executed at the data location. So for now, I 
> chose the Location Hint for the tasks in a random fashion. I figured by 
> populating VertexLocation hint, with address of random nodes, I could achieve 
> it.
> 
> This requires setting parallelism to be the number of elements in 
> VertexLocation hint; which led to the errors.
> 
> Summarizing, for the word count example,
> 
> 1. Can the number of tasks for the tokenizer be a value *NOT* equal to the 
> number of HDFS blocks of the file ?
> 
> 2. Can a mapper be scheduled at a location different than the location of its 
> input block ? If yes, how ? 
> 
> Raajay
> 
> 
> 
> 
> On Thu, Sep 10, 2015 at 12:30 AM, Jianfeng (Jeff) Zhang 
>  wrote:
> >>> In the WordCount example, while creating the Tokenizer Vertex, neither 
> >>> the parallelism or VertexLocation hints is specified. My guess is that at 
> >>> runtime, based on InputInitializer, these values are populated.
> Correct, the parallelism and Verte

Re: Pig(0.14.0) on Tez(0.7.0)

2015-09-02 Thread Hitesh Shah
Pig 0.14 was released around the time when Tez 0.5 was the stable release. Tez 
0.7 is compatible with Tez 0.5, so Pig 0.14 should work with it. This is a 
question you should also ask on the Pig mailing lists ( I don’t believe anyone 
from the Pig community has raised any bugs in this area ). 

thanks
— Hitesh

On Sep 2, 2015, at 9:48 PM, Sandeep Kumar  wrote:

> As you correctly pointed out, there was an issue with the guava library. In 
> my code there were some UDFs which were using guava-0.16.0.jar. 
> 
> I've removed it and now there are no exceptions. 
> 
> Just for curiosity. Can i use tez-0.7.0 with latest PIG-0.14.0? Is it tested 
> earlier?
> 
> Regards,
> Sandeep
> 
> On Wed, Sep 2, 2015 at 8:44 PM, Hitesh Shah  wrote:
> Based on the stack trace, the following issue seems to be the cause:
> 
> Caused by: java.lang.NoSuchMethodError: 
> com.google.common.base.Stopwatch.elapsedTime(Ljava/util/concurrent/TimeUnit;)J
> at 
> org.apache.tez.runtime.library.common.shuffle.HttpConnection.validate(HttpConnection.java:221)
> at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.setupConnection(FetcherOrderedGrouped.java:328)
> at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.copyFromHost(FetcherOrderedGrouped.java:245)
> at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.run(FetcherOrderedGrouped.java:167)
> 
> This means that a different guava library version is being picked up at 
> runtime.
> 
> A quick test is to run say orderedwordcount from tez-examples to verify 
> standalone Tez has no issues. Also, you may wish to check the contents of 
> “tez.lib.uris” to verify that it has guava-11.0.2.
> 
> If you are familiar with using/debugging YARN, set 
> "yarn.nodemanager.delete.debug-delay-sec” to a value such as 1200. Then pick 
> a host on which a Tez container ran a failed task.
> 
> Using the directories specified in "yarn.nodemanager.local-dirs” in 
> yarn-site.xml, search for a “launch_container.sh” under the 
> container-specific directory mapping to the Tez container above. Its contents 
> will tell you which guava library is being symlinked into the container space 
> and used in the classpath. Having two guava jars is also a problem, as either 
> could be picked.
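A hedged illustration of that inspection (the path and script contents below 
are fabricated stand-ins for what you would find under 
yarn.nodemanager.local-dirs):

```shell
# On a real cluster the script lives at something like:
#   <local-dir>/usercache/<user>/appcache/<app-id>/<container-id>/launch_container.sh
# Create a stand-in copy here just to show what to grep for.
cat > /tmp/launch_container.sh <<'EOF'
ln -sf "/data/yarn/filecache/12/guava-11.0.2.jar" "guava-11.0.2.jar"
ln -sf "/data/yarn/filecache/13/hive-exec.jar" "hive-exec.jar"
EOF

# List the guava jar(s) the container would see on its classpath.
grep -o 'guava-[0-9.]*jar' /tmp/launch_container.sh | sort -u
```

If more than one guava version shows up here, that confirms the conflict.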
> 
> thanks
> — Hitesh
> 
> 
> On Sep 2, 2015, at 5:18 AM, Sandeep Kumar  wrote:
> 
> > Thanks for your responses. I was mistaken that there is any compatibility 
> > issue. Its the same error when i run PIG-0.14.0 over Tez-0.5.2.
> >
> > HadoopVersion: 2.6.0-cdh5.4.4
> > PigVersion: 0.14.0
> > TezVersion: 0.5.2
> >
> > PFA the exception stack trace.
> >
> >
> > On Wed, Sep 2, 2015 at 3:11 PM, Jianfeng (Jeff) Zhang 
> >  wrote:
> > >>> I could not use tez-0.5.2 because it was not compatible with 
> > >>> Hadoop-2.6.0.
> >
> > What incompatible do you see ?
> >
> >
> > Best Regard,
> > Jeff Zhang
> >
> >
> > From: Sandeep Kumar 
> > Reply-To: "user@tez.apache.org" 
> > Date: Wednesday, September 2, 2015 at 5:18 PM
> >
> > To: "user@tez.apache.org" 
> > Subject: Re: Pig(0.14.0) on Tez(0.7.0)
> >
> > Yes i did change PIG/ivy/libraries.propeties to compile it with tez-0.7.0 
> > and also changed pig to compile with Hadoop-core-2.6.0.
> >
> > I could not use tez-0.5.2 because it was not compatible with Hadoop-2.6.0.
> >
> > I'm compiling my code of PIG using same command: ant clean jar 
> > -Dhadoopversion=23
> >
> >
> >
> > On Wed, Sep 2, 2015 at 2:36 PM, Jianfeng (Jeff) Zhang 
> >  wrote:
> >
> > Not sure how did you compile pig with tez 0.7.0, did you change the tez 
> > version in PIG/ivy/libraries.propeties ?
> >
> > And make sure you build pig with hadoop version, by default, pig build with 
> > hadoop-1.x.  Use the following command to build pig with hadoop-2.x
> >
> > >> ant clean jar -Dhadoopversion=23
> >
> >
> >
> > Best Regard,
> > Jeff Zhang
> >
> >
> > From: Sandeep Kumar 
> > Reply-To: "user@tez.apache.org" 
> > Date: Wednesday, September 2, 2015 at 4:27 PM
> > To: "user@tez.apache.org" 
> > Subject: Re: Pig(0.14.0) on Tez(0.7.0)
> >
> > Hi Jeff,
> >
> > The cloudera Hadoop is using guava-11.0.2.jar.
> > I've also exported one environment variable before running pig:
> >
> > export HADOOP_USER_CLASSPATH_FIRST=true
> >
>
