Re: clarification regarding Tez DAGs

2016-11-28 Thread Hitesh Shah
Hello Robert, Some of the questions may be better answered on the Hive list but I will take a first crack at some of them. From a Tez perspective, let's use vertices and ignore Maps and Reducers for now. Hive uses this as a convention to indicate that a vertex is either reading data from HDFS

Re: Bad Log URL

2016-11-14 Thread Hitesh Shah
For the logs to a container in the NM, the NM’s http address is obtained from YARN APIs. Is this the only page in which the “:” is missing or is it missing in other rows’ links within the task attempts table? Can you confirm that the links to the NodeManagers work correctly from the ResourceMana

Re: Issue with the job progress in the UI

2016-11-09 Thread Hitesh Shah
Hello Premal, This is likely a combination of a lag in publishing the history events to YARN timeline which is consumed by the UI and also related to the UI relying more on YARN Timeline for data as compared to reading the information directly from the Tez AM. The Hive client is directly gettin

Re: Hive+Tez staging dir and scratch dir

2016-11-01 Thread Hitesh Shah
Hello Dharmesh, The tez staging dir is where scratch data is kept for the lifetime of the Tez session. i.e. data which can be deleted once the application completes. Staging data includes the following: - recovery logs used by the Tez AM for checkpointing state - Configs and/or dag plan pa

Re: Vertex Parallelism

2016-10-31 Thread Hitesh Shah
I suggest writing a custom InputFormat or modifying your existing InputFormat to generate more splits and at the same time, disable splits grouping for the vertex in question to ensure that you get the high level of parallelism that you want to achieve. The log snippet is just indicating that v

Re: Tez containers and input splits

2016-10-28 Thread Hitesh Shah
> thanks, > Madhu > > > On Thursday, October 27, 2016 11:19 PM, Hitesh Shah wrote: > > > Hello Madhusudan, > > I will start with how container allocations work and make my way back to > explaining splits. > > At the lowest level, each vertex wi

Re: Tez containers and input splits

2016-10-27 Thread Hitesh Shah
Hello Madhusudan, I will start with how container allocations work and make my way back to explaining splits. At the lowest level, each vertex will have decided to run a number of tasks. At a high level, when a task is ready to run, it tells the global DAG scheduler about its requirements ( i

Re: Tez Sessions

2016-10-20 Thread Hitesh Shah
, we were thinking this could save on the cost of AM, and > container initialization. > > We haven't looked into tez recovery as well. Durability is one of our big > concerns as well. > > > On Thursday, October 20, 2016 12:44 PM, Hitesh Shah wrote: > > &

Re: Tez Sessions

2016-10-20 Thread Hitesh Shah
Not supported as of now. There are multiple aspects to supporting this properly. One of the most important issues to address would be to do proper QoS across various DAGs i.e. what kind of policies would need to be built out to run multiple DAGs to completion within a limited amount of resources

Re: Container settings at vertex level

2016-10-20 Thread Hitesh Shah
Hello Madhu, If you are using Tez via Hive, then this would need a fix in Hive. I don’t believe Hive supports different settings for each vertex in a given query today. However, for native jobs, Tez already supports different specs for each vertex: Vertex::setTaskResource() ( configuring yarn r

Re: Tez UI

2016-10-19 Thread Hitesh Shah
jersey jars look to be in sync but the jackson ones look a step behind. > > what do you think? should i force the 1.9's into the ATS CLASSPATH? can't > hurt would be my guess. lemme try. > > Cheers, > Stephen. > > > On Mon, Oct 17, 2016 at 2:44 PM, Steph

Re: Tez UI

2016-10-17 Thread Hitesh Shah
oryLoggingService.java: >> 53) >> at org.apache.tez.dag.history. >> logging.ats.ATSHistoryLoggingService$1.run(ATSHistoryLoggingService. >> java:190) >> at java.lang.Thread.run(Thread. >> java:745) >> >> >> i'm running the hive cl

Re: Tez UI

2016-10-16 Thread Hitesh Shah
LISTEN > 31168/java > tcp0 0 172.19.103.136:8188 0.0.0.0:* LISTEN > 31168/java > > > might there be a debug log level i can set on impl.TimelineClientImpl to see > what is happening on the connection event? > >

Re: Tez UI

2016-10-16 Thread Hitesh Shah
Hello Stephen, yarn-site.xml needs to be updated wherever the Tez client is used. i.e if you are using Hive, then wherever you launch the Hive CLI and also where the HiveServer2 is installed ( HS2 will need a restart ). To see if the connection to timeline is/was an issue, please check the yar

Re: Origin of failed tasks

2016-10-12 Thread Hitesh Shah
If you have the logs for the application master, you can try the following: grep -F "[HISTORY]" | grep "TASK_ATTEMPT_FINISHED" This will give you info on any failed task attempts. The AM logs have history events being published to them. You can do grep -F "[HISTORY]" | grep "_" where entity type is
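The two-stage filter described above can also be sketched in Python, which avoids shell quoting issues (an unescaped `[HISTORY]` is a bracket expression in grep, so it needs `-F` or escaping). This is only an illustration: the function name is hypothetical, and it assumes nothing about the AM log layout beyond the two marker strings quoted in the message.

```python
from typing import Iterable, List

# Marker strings taken from the message above; everything else is an assumption.
HISTORY_MARKER = "[HISTORY]"
EVENT_NAME = "TASK_ATTEMPT_FINISHED"


def finished_attempt_lines(lines: Iterable[str]) -> List[str]:
    """Keep only lines that contain both the history marker and the event name.

    Plain substring matching is used, so the square brackets in the marker
    are treated literally (unlike an unescaped grep pattern).
    """
    return [
        line.rstrip("\n")
        for line in lines
        if HISTORY_MARKER in line and EVENT_NAME in line
    ]
```

To use it on a real log, pass an open file object, for example `finished_attempt_lines(open(path))`, where `path` points at an application master log you have fetched.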

Re: adding local resource to classpath and/or java.library.path

2016-10-05 Thread Hitesh Shah
dependent on YARN implementation behavior. — Hitesh > On Oct 5, 2016, at 10:06 AM, Madhusudan Ramanna wrote: > > Seems like with this approach, there is no need to have information on > current dir. > > thanks, > Madhu > > > On Tuesday, October 4, 2016 4:44 PM, H

Re: Debugging M/R job with tez

2016-10-05 Thread Hitesh Shah
these jobs. Would it be possible to > get some support to set up my workstation to achieve this? > > Brgds > > Manuel > > On Wed, Sep 28, 2016 at 8:37 PM, Hitesh Shah wrote: > Thanks for the context, Manuel. > > Full compat with MR is something that has not really bee

Re: adding local resource to classpath and/or java.library.path

2016-10-04 Thread Hitesh Shah
The env is one approach for augmenting classpath. The other approach which modifies classpath for both the AM and the task containers is to use “tez.cluster.additional.classpath.prefix” by setting it to something like “./archive name/*” — Hitesh > On Oct 4, 2016, at 4:38 PM, Madhusudan Ramann

Re: Zip Exception since commit da4098b9

2016-09-28 Thread Hitesh Shah
gs\/container_1475091857089_0015_01_02\/apxqueue","completedLogsURL":"http:\/\/ip-10-1-3-71.us-west-2.compute.internal:19888\/jobhistory\/logs\/\/ip-10-1-2-173.us-west-2.compute.internal:8041\/container_1475091857089_0015_01_02\/v_pager_attempt_14

Re: Zip Exception since commit da4098b9

2016-09-28 Thread Hitesh Shah
}},{"timestamp":1475094062692,"eventtype":"DAG_STARTED","eventinfo":{}},{"timestamp":1475094062688,"eventtype":"DAG_INITIALIZED","eventinfo":{}},{"timestamp":1475094062055,"eventtype":"DAG_SUBMITT

Re: Debugging M/R job with tez

2016-09-28 Thread Hitesh Shah
ue we got is that we used to package our > applicative jars with nested dependencies in /lib and these are ignored by > Tez. We could easily work around this expanding these and adapting our > classpath. > > Regards > > On Wed, Sep 28, 2016 at 5:46 PM, Hitesh Shah wrote: > He

Re: Debugging M/R job with tez

2016-09-28 Thread Hitesh Shah
Hello Manuel, Thanks for reporting the issue. Let me try and reproduce this locally to see what is going on. A quick question in general though - are you hitting issues when running in non-local mode too? Would you mind sharing the details of the issues you hit? thanks — Hitesh > On Sep 2

Re: Zip Exception since commit da4098b9

2016-09-28 Thread Hitesh Shah
ne server is up and running. Tez UI is however not able to display DAG > and other details > > thanks, > Madhu > > > > On Saturday, September 24, 2016 12:01 PM, Hitesh Shah > wrote: > > > tez-dist tar balls are not published to maven today - only the module

Re: Zip Exception since commit da4098b9

2016-09-24 Thread Hitesh Shah
nks, > Madhu > > > On Friday, September 23, 2016 5:19 PM, Hitesh Shah wrote: > > > Hello Madhusudan, > > If you look at the MANIFEST.MF inside any of the tez jars, it will provide > the commit hash via the SCM-Revision field. > > The tez client and the DAGAp

Re: Zip Exception since commit da4098b9

2016-09-23 Thread Hitesh Shah
Hello Madhusudan, If you look at the MANIFEST.MF inside any of the tez jars, it will provide the commit hash via the SCM-Revision field. The tez client and the DAGAppMaster also log this info at runtime. — Hitesh > On Sep 23, 2016, at 4:08 PM, Madhusudan Ramanna wrote: > > Zhiyuan, > > We
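As a rough sketch of the manifest check described above, the following Python reads `META-INF/MANIFEST.MF` from a jar and returns the `SCM-Revision` value. The field name comes from the message; the function name is hypothetical, and the code assumes the jar is an ordinary zip archive with a standard manifest.

```python
import zipfile
from typing import BinaryIO, Optional, Union

MANIFEST_PATH = "META-INF/MANIFEST.MF"


def scm_revision(jar: Union[str, BinaryIO]) -> Optional[str]:
    """Return the SCM-Revision manifest attribute of a jar, or None if absent."""
    with zipfile.ZipFile(jar) as zf:
        text = zf.read(MANIFEST_PATH).decode("utf-8")
    # Long manifest values are folded onto continuation lines that start
    # with a single space; join them back before parsing.
    unfolded = text.replace("\r\n", "\n").replace("\n ", "")
    for line in unfolded.split("\n"):
        name, sep, value = line.partition(":")
        if sep and name.strip() == "SCM-Revision":
            return value.strip()
    return None
```

Calling `scm_revision("/path/to/some-tez.jar")` (a placeholder path) on the client jar and on the jar deployed to the cluster gives a quick way to compare the two commit hashes.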

Re: Zip Exception since commit da4098b9

2016-09-23 Thread Hitesh Shah
Hello Madhusudan Thanks for reporting the issue. Would you mind filing a bug at https://issues.apache.org/jira/browse/tez with the application logs and tez configs attached? If you have a simple dag/job example that reproduces the behavior that would be great too. thanks — Hitesh > On Sep 23

Re: Parallel queries to HS2/Tez

2016-08-29 Thread Hitesh Shah
ally? Interestingly this is not a problem with HDP deployment which > obviously has a 'fuller' setup. Local mode really helps to test. > > Thank you, > Uday > From: Hitesh Shah > Sent: 25 August 2016 20:06:30 > To: user@tez.apache.org > Subject: Re: Parallel

Re: Node unable to start vertex

2016-08-25 Thread Hitesh Shah
Created https://cwiki.apache.org/confluence/display/TEZ/FAQ which might be a better fit for such content and other related questions down the line. > On Aug 25, 2016, at 1:16 PM, Hitesh Shah wrote: > > +1. Would you like to contribute the content? You should be able to add an > a

Re: Node unable to start vertex

2016-08-25 Thread Hitesh Shah
:59 PM, Madhusudan Ramanna wrote: > > Thanks, #2 worked ! > > Might be a good idea to add to confluence ? > > Madhu > > > On Thursday, August 25, 2016 12:00 PM, Hitesh Shah wrote: > > > Hello Madhu, > > There are 2 approaches for this: > >

Re: Parallel queries to HS2/Tez

2016-08-25 Thread Hitesh Shah
Hello Uday, I don’t believe anyone has tried running 2 dags in parallel in local mode within the same TezClient ( and definitely not for HiveServer2 ). If this is with 2 instances of Tez client, this could likely be a bug in terms of either how Hive is setting up the TezClient for local mode wi

Re: Node unable to start vertex

2016-08-25 Thread Hitesh Shah
Hello Madhu, There are 2 approaches for this: 1) Programmatically, for user code running in tasks, you would need to use either DAG::addTaskLocalFiles() or Vertex::addTaskLocalFiles() - former if the same jars are needed in all tasks of the DAG. TezClient::addAppMasterLocalFiles only impacts

Re: Extra JAR files in the minimal distribution package?

2016-08-16 Thread Hitesh Shah
Hello Nathaniel, You are probably right that they should not be as long as the cluster classpath used contains the MR jars. I believe these jars were retained as a result of using yarn.application.classpath for augmenting the runtime classpath when using the classpath from the cluster instead

Re: Questions about Tez

2016-08-12 Thread Hitesh Shah
When comparing just a simple MR job to a Tez dag with 2 vertices, the perf improvements are limited (as the plan is pretty much the same and data is transferred via a shuffle edge): - container re-use - pipelined sorter vs the MR sorter ( your mileage may vary here depending on the kind of

Re: Some resource about tez architecture and design document

2016-08-10 Thread Hitesh Shah
ion ? > 在 2016-08-10 01:02:31,"Hitesh Shah" 写道: >> The following 2 links should help you get started. Might be best to start >> with the sigmod paper and one of the earlier videos. >> >> https://cwiki.apache.org/confluence/display/TEZ/How+to+Contribu

Re: Some resource about tez architecture and design document

2016-08-09 Thread Hitesh Shah
The following 2 links should help you get started. Might be best to start with the sigmod paper and one of the earlier videos. https://cwiki.apache.org/confluence/display/TEZ/How+to+Contribute+to+Tez https://cwiki.apache.org/confluence/display/TEZ/Presentations%2C+publications%2C+and+articles+ab

Re: Word Count examples run failed with Tez 0.8.4

2016-08-04 Thread Hitesh Shah
Hello I am assuming that this is the same issue as the one reported in TEZ-3396? Based on the logs in the jira: 2016-08-03 10:55:33,856 [INFO] [Thread-2] |app.DAGAppMaster|: DAGAppMasterShutdownHook invoked 2016-08-03 10:55:33,856 [INFO] [Thread-2] |app.DAGAppMaster|: DAGAppMaster received a

Re: hung AM due to timeline timeout

2016-08-03 Thread Hitesh Shah
TSService, eventQueueBacklog=17553 > I'll look into lowering tez.yarn.ats.event.flush.timeout.millis while trying > to look into the timelineserver. > > Thanks for your help, > Slava > > On Wed, Aug 3, 2016 at 2:45 PM, Hitesh Shah wrote: > Hello Slava, >

Re: hung AM due to timeline timeout

2016-08-03 Thread Hitesh Shah
Hello Slava, Can you check for a log line along the lines of "Stopping ATSService, eventQueueBacklog=" to see how backed up the event queue to YARN timeline is? I have noticed this in quite a few installs with YARN Timeline where YARN Timeline is using the simple Level DB impl and not the Rol
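A small Python sketch for pulling the backlog number out of the AM log is shown below. It assumes the log line contains exactly the text quoted above followed by an integer, which matches the `eventQueueBacklog=17553` fragment quoted in the follow-up; the real line may differ, so treat the pattern as an assumption.

```python
import re
from typing import Iterable, Optional

# Pattern based on the log text quoted in the thread; the exact wording in a
# given Tez version is an assumption.
BACKLOG_RE = re.compile(r"Stopping ATSService, eventQueueBacklog=(\d+)")


def event_queue_backlog(lines: Iterable[str]) -> Optional[int]:
    """Return the backlog from the last matching log line, or None if no line matches."""
    backlog = None
    for line in lines:
        match = BACKLOG_RE.search(line)
        if match:
            backlog = int(match.group(1))
    return backlog
```

A large value at shutdown suggests the AM was still waiting to flush history events to YARN Timeline when it stopped.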

Re: Guide to write map-reduce code using Tez API

2016-08-01 Thread Hitesh Shah
Please check Step 7 on http://tez.apache.org/install.html thanks — Hitesh > On Aug 1, 2016, at 10:25 AM, zhiyuan yang wrote: > > The nice thing of Tez is it’s compatible with MapReduce API. So if you just > want to run MapReduce on Tez, you just learn how to write standard MapReduce > and cha

Re: Tez error

2016-07-30 Thread Hitesh Shah
r.com/display/MapR41/Installing+and+Configuring+Tez+0.5.3 .But > hive job gave some NumberFormatError and found out by googling that there is > version mismatch between tez and hadoop libs. > > On Sat, Jul 30, 2016 at 10:22 PM, Sandeep Khurana > wrote: > Hitesh > > Both o

Re: Tez error

2016-07-30 Thread Hitesh Shah
Hello Sandeep, 2 things to check: - When compiling Tez, is the hadoop.version in the top-level pom ( and addition of mapr’s maven repo ) being used to compile against MapR’s hadoop distribution and not the std. apache release? The Tez AM cannot seem to do a handshake with the YARN RM. If Ma

Re: Could Tez 0.5.4 Integrate with Hive 2.X

2016-07-29 Thread Hitesh Shah
That is highly unlikely to work as Hive-2.x requires APIs introduced in Tez 0.8.x. thanks — Hitesh > On Jul 28, 2016, at 8:56 PM, darion.yaphet wrote: > > Hi team : > > We are using hadoop 2.5.0 and hive 1.2.1 tez 0.5.4 . Now we want to upgrade > to hive 2.X . Could Tez 0.5.4 support Hive

Re: Getting ClosedByInterruptException when DAG w/ edge executes

2016-07-20 Thread Hitesh Shah
Either emails to the dev list or specific JIRAs on any usability issues that you come across - be it missing/unclear docs, APIs that could require cleaning up, bugs or potential helper libraries to make things easier. Pretty much any feedback ( and/or patches ) are welcome :) thanks — Hitesh

Re: Getting ClosedByInterruptException when DAG w/ edge executes

2016-07-01 Thread Hitesh Shah
Thanks for the update, Scott. Given that the APIs have mostly been used by other framework developers, there are probably quite a few things which may not be easily surfaced in javadocs, usage examples ( and their lack of ), etc. It would be great if you can provide feedback ( and patches ) to

Re: Tez Job Counters

2016-06-28 Thread Hitesh Shah
Hello Muhammad, Did you try any of the calls to YARN timeline as described by Rajesh in his earlier reply? thanks — Hitesh > On Jun 28, 2016, at 1:20 PM, Muhammad Haris > wrote: > > Hi, > Could anybody please guide me how to get all task level counters? Thanks a lot > > > > Regards > >

Re: Tez Job fails - waiting for AM container to be allocated

2016-06-18 Thread Hitesh Shah
ontainers=2 clusterResource= type=OFF_SWITCH > 2016-06-17 19:04:50,407 INFO security.NMTokenSecretManagerInRM > (NMTokenSecretManagerInRM.java:createAndGetNMToken(200)) - Sending NMToken > for nodeId : usw2stdpwo12.glassdoor.local:45454 for container > > On Fri, Jun 17, 2016 at

Re: Tez Job fails - waiting for AM container to be allocated

2016-06-17 Thread Hitesh Shah
-dev@tez for now. Hello Anandha, The usual issue with this is a lack of resources. e.g. no cluster capacity to launch the AM, queue configs not allowing another AM to launch, the memory size configured for the AM is too large such that it cannot be scheduled on any existing node, etc. Can y

Re: Tez 0.8.3 on EMR hanging with Hive task

2016-06-15 Thread Hitesh Shah
yarn/apps/hadoop/logs/application_1465996511770_0001 does not > exist. > Log aggregation has not completed or is not enabled. > > I think we are missing some configuration that would help us get more insight? > > Thanks! > > Joze. > > 2016-06-15 12:03 GMT-03:00 H

Re: Tez 0.8.3 on EMR hanging with Hive task

2016-06-15 Thread Hitesh Shah
Hello Joze, Would it be possible for you to provide the YARN application logs obtained via “bin/yarn logs -applicationId ” for both of the cases you have seen? Feel free to file JIRAs and attach logs to each of them. thanks — Hitesh > On Jun 15, 2016, at 7:38 AM, Jose Rozanec > wrote: > >

Re: My first TEZ job fails

2016-05-20 Thread Hitesh Shah
1 > > > tez.lib.uris > /usr/lib/apache-tez-0.7.1-bin/share > > > hduser@rhes564: /home/hduser/hadoop-2.6.0/etc/Hadoop> > > > > Dr Mich Talebzadeh > > LinkedIn > https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPC

Re: My first TEZ job fails

2016-05-20 Thread Hitesh Shah
k 1.3.1 engine and I compiled it spark > from source code. so hopefully I can use TEZ as Spark engine as well. > > > thanks > > > > Dr Mich Talebzadeh > > LinkedIn > https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw > >

Re: My first TEZ job fails

2016-05-20 Thread Hitesh Shah
are using your local build ). thanks — Hitesh > On May 20, 2016, at 4:39 PM, Mich Talebzadeh > wrote: > > This is the instruction? > > Created by Hitesh Shah, last modified on May 02, 2016 Go to start of metadata > Making use of the Tez Binary Release tarball >

Re: My first TEZ job fails

2016-05-20 Thread Hitesh Shah
apache-tez-0.7.1-bin/lib > > > -Original Message- > From: Hitesh Shah [mailto:hit...@apache.org] > Sent: Friday, May 20, 2016 4:18 PM > To: user@tez.apache.org > Subject: Re: My first TEZ job fails > > Can you try the instructions mentioned at > https://cwiki.ap

Re: My first TEZ job fails

2016-05-20 Thread Hitesh Shah
$Handler.run(Server.java:2033) > > at org.apache.hadoop.ipc.Client.call(Client.java:1468) > > at org.apache.hadoop.ipc.Client.call(Client.java:1399) > > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) > > at com.sun.proxy.$Prox

Re: My first TEZ job fails

2016-05-20 Thread Hitesh Shah
Logs from `bin/yarn logs -applicationId application_1463758195355_0002` would be more useful to debug your setup issue. The RM logs usually do not shed much light on why an application failed. Can you confirm that you configured tez.lib.uris correctly to point to the tez tarball on HDFS (tez tar

Re: data discrepancies related to parallelism

2016-05-05 Thread Hitesh Shah
> > > On 5/5/16, 11:00 AM, "Hitesh Shah" wrote: > >> What version are you running with? >> >> thanks >> — Hitesh

Re: data discrepancies related to parallelism

2016-05-05 Thread Hitesh Shah
What version are you running with? thanks — Hitesh > On May 5, 2016, at 10:31 AM, Kurt Muehlner wrote: > > Hello, > > We have a Pig/Tez application which is exhibiting a strange problem. This > application was recently migrated from Pig/MR to Pig/Tez. We carefully > vetted during QA that

Re: Varying vcores/ram for hive queries running Tez engine

2016-05-04 Thread Hitesh Shah
Bikas’ comment ( and mine below ) is relevant only for task specific settings. Hive does not override any settings for the Tez AM so the tez configs for the AM memory/vcores will reflect at runtime. I believe Hive has a proxy config - hive.tez.cpu.vcores - for (3) which may be why your setting

Re: Description of tez counters

2016-04-11 Thread Hitesh Shah
Take a look at TaskCounter and DAGCounter under https://git-wip-us.apache.org/repos/asf?p=tez.git;a=tree;f=tez-api/src/main/java/org/apache/tez/common/counters;h=df3784e54d1fa6075dcbbca8d1405e309bce1460;hb=HEAD and let us know if that is insufficient. thanks — Hitesh On Apr 11, 2016, at 4:42

Re: Unable To Build Apache Tez 0.7.1 on Hadoop 2.7

2016-04-06 Thread Hitesh Shah
URE [8.107s] > [INFO] Tez ... SUCCESS [0.063s] > > For the npm error I see a open JIRA : > https://issues.apache.org/jira/browse/BIGTOP-1826 > > Do you have any suggestion? > > Thanks. > > > On Wed, Apr 6, 2016 at 4

Re: Unable To Build Apache Tez 0.7.1 on Hadoop 2.7

2016-04-06 Thread Hitesh Shah
0.7.0 > version). > > Please help. > > > Thanks, > Joel > > On Wed, Apr 6, 2016 at 3:01 PM Hitesh Shah wrote: > Hello Sam, > > Couple of things to confirm: > - I assume you are building branch-0.7 of Tez for 0.7.1-SNAPSHOT as there > has not yet

Re: Unable To Build Apache Tez 0.7.1 on Hadoop 2.7

2016-04-06 Thread Hitesh Shah
Hello Sam, Couple of things to confirm: - I assume you are building branch-0.7 of Tez for 0.7.1-SNAPSHOT as there has not yet been a release of 0.7.1? - For hadoop, are you using hadoop-2.7.0 or hadoop-2.7.1 ( though this really should not be too relevant here )? I took branch-0.7 of

Re: Apply patches to Apache Tez

2016-04-06 Thread Hitesh Shah
do I apply that .patch file to my existing setup of jars? > > Appreciate your help and time. > > Thanks, > Joel > > On Wed, Apr 6, 2016 at 2:28 PM, Hitesh Shah wrote: > Every component has a different approach to how it is deployed/upgraded. > > I can cover ho

Re: Apply patches to Apache Tez

2016-04-06 Thread Hitesh Shah
Every component has a different approach to how it is deployed/upgraded. I can cover how you can go about patching Tez on an existing production system. The steps should be similar to that described in INSTALL.md in the source tree with a few minor gotchas to be aware of: - Deploying Tez ha

Re: Tez UI in Pig

2016-04-05 Thread Hitesh Shah
Hi Kurt, The Tez UI as documented should work with any version beyond 0.5.2 if the history logging is configured to use YARN timeline. As for scopes, some bits of the vertex description are currently not displayed in the UI though I am not sure if Pig has integrated with that API yet. Dependin

Re: pig on tez hang with connection reset

2016-03-23 Thread Hitesh Shah
00ms before retrying getTask again. Got null now. Next getTask sleep > message after 3ms > . . . etc. > > > Is there anything else I can provide for now? > > Thanks, > Kurt > > > > On 3/23/16, 11:43 AM, "Hitesh Shah" wrote: > >> Hel

Re: pig on tez hang with connection reset

2016-03-23 Thread Hitesh Shah
Hello Kurt, Can you file a jira with a stack dump for the ApplicationMaster process when it is in this hung state and also include all the application master logs. Also please mention what version of Pig and Tez you are running. The main question would be whether the AM is really hung or does

Re: tez and beeline and hs2

2016-02-25 Thread Hitesh Shah
t this. > > > > Thanks > > Bikas > > > > From: Stephen Sprague [mailto:sprag...@gmail.com] > Sent: Monday, February 22, 2016 6:59 AM > To: user@tez.apache.org > Subject: Re: tez and beeline and hs2 > > > > just an update. i haven't

Re: tez and beeline and hs2

2016-02-19 Thread Hitesh Shah
Not exactly. I think the UI bits might be a red herring. Bouncing YARN and HS2 should also not be needed unless you are modifying configs. There is likely a bug ( the NPE being logged ) in the shutting down code for the org.apache.tez.dag.history.logging.impl.SimpleHistoryLoggingService (

Re: tez and beeline

2016-02-17 Thread Hitesh Shah
Hello Stephen, This question should ideally be posted to user@hive as this mainly relates to HS2 functionality and not really Tez. That said, a couple of things to look at/try out: 1) Unrelated point - "set mapreduce.framework.name=yarn-tez;” - this is not needed. What this setting does is

Re: jansi dependendency?

2016-02-15 Thread Hitesh Shah
One option may be to try using HADOOP_USER_CLASSPATH_FIRST with it set to true and adding the hive-exec.jar to the front of HADOOP_CLASSPATH. Using this ( and verifying by running “hadoop classpath”), you could try to get hive-exec.jar to the front of the classpath and see if that makes a differ

Re: Failing attemption at org.apache.tez.client.TezClient.waitTillReady

2016-02-12 Thread Hitesh Shah
-rw-rw-r-- 1 105112 2014-01-30 07:08 servlet-api-2.5.jar > -rw-rw-r-- 1 1251514 2014-01-30 07:08 snappy-java-1.0.5.jar > -rw-r--r-- 1 162976273 2015-09-10 20:16 > spark-assembly-1.4.1-hadoop2.6.0.jar > -rw-rw-r-- 1 26514 2014-01-30 07:08 stax-api-1.0.1.jar > -rw-rw

Re: Failing attemption at org.apache.tez.client.TezClient.waitTillReady

2016-02-12 Thread Hitesh Shah
hadoop-common? thanks — Hitesh On Feb 12, 2016, at 12:16 AM, no jihun wrote: > Thanks Hitesh Shah. > > It claims > > 2016-02-12 14:59:07,388 [ERROR] [main] |app.DAGAppMaster|: Error starting > DAGAppMaster > > jav

Re: Failing attemption at org.apache.tez.client.TezClient.waitTillReady

2016-02-11 Thread Hitesh Shah
Run the following command: “bin/yarn logs -applicationId application_1452243782005_0292” . This should give you the logs for container_1452243782005_0292_02_01 which may shed more light on why the Tez ApplicationMaster is failing to launch when triggered via Hive. thanks — Hitesh On Feb
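When only a container id is at hand, the application id needed by `yarn logs` can be derived from it. The sketch below assumes the usual YARN naming scheme, `container_[e<epoch>_]<clusterTimestamp>_<appSequence>_<attempt>_<container>`; the function names are hypothetical, and the command is only assembled, not executed.

```python
import re
from typing import List

# Assumed YARN container id layout; the optional "e<epoch>_" part appears on
# clusters with work-preserving ResourceManager restart.
CONTAINER_RE = re.compile(r"^container_(?:e\d+_)?(\d+)_(\d+)_\d+_\d+$")


def application_id(container_id: str) -> str:
    """Derive the YARN application id from a container id."""
    match = CONTAINER_RE.match(container_id)
    if match is None:
        raise ValueError(f"not a recognised container id: {container_id!r}")
    cluster_ts, app_seq = match.groups()
    return f"application_{cluster_ts}_{app_seq}"


def yarn_logs_command(container_id: str) -> List[str]:
    """Build (but do not run) the log-fetch command quoted in the message."""
    return ["bin/yarn", "logs", "-applicationId", application_id(container_id)]
```

For the container id quoted above, `application_id("container_1452243782005_0292_02_01")` yields `application_1452243782005_0292`, matching the id in the command.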

Re: Question regarding instability of EdgeProperty DataSourceType

2016-01-31 Thread Hitesh Shah
There are 3 types defined as you have noticed: persisted_reliable: assumes a vertex output is stored in a reliable store like HDFS. This states that if the node on which the task ran disappears, the output is still available. persisted: vertex output stored on local disk where the task ran. ep

Re: Classpath Composition

2016-01-28 Thread Hitesh Shah
Assuming you have the guava jar available on all nodes, you can set “tez.cluster.additional.classpath.prefix” to point to it and this classpath value will be prepended to the classpath of the tez runtime layers. However, please note that this is not a guarantee to work if the guava jar from your

Re: What's the application scenario of Apache TEZ

2016-01-20 Thread Hitesh Shah
Couple of other points to add to Bikas’s email: Regarding your question on small data: No - Tez is geared to work in both small data and extremely large data cases. Hive should likely perform better with Tez regardless of data size unless there is a bad query plan created that is non-optimal f

Re: when split data in AM lead to "path not exsit or is not a directory" Execption

2016-01-15 Thread Hitesh Shah
e",why? > (2)As mentioned earlier,I also packaged this "conf/hbasetable" to conf.jar, > and it was downloaded to the AM container path, why it can not be parsed or > decompressed ? > > Is there any configuration options can do this? > > best wishes &

Re: when split data in AM lead to "path not exsit or is not a directory" Execption

2016-01-13 Thread Hitesh Shah
Hello You are right that when hive.compute.splits.in.am is true, the splits are computed in the cluster in the Tez AM container. Now, there are a bunch of options to consider but the general gist is that if you are familiar with MapReduce Distributed Cache or YARN local resources, you need t

Re: Cross platform job submission

2016-01-12 Thread Hitesh Shah
This is probably something that was missed for Tez. Would you mind filing a bug for this? The fix is to always use {{VAR}} ( instead of say $VAR ) which is then automatically changed by YARN to $VAR or %VAR% based on the env where the container is being launched. — Hitesh On Jan 12, 2016, at
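The substitution described above can be illustrated with a few lines of Python. This is not YARN's implementation, only a sketch of the rule stated in the message: `{{VAR}}` becomes `$VAR` on a Unix-like node and `%VAR%` on Windows. The function name and the `windows` flag are hypothetical.

```python
import re

# Matches the platform-neutral {{VAR}} form mentioned in the message.
_PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def expand_placeholders(template: str, windows: bool) -> str:
    """Rewrite {{VAR}} into the variable syntax of the launching platform."""
    fmt = "%{}%" if windows else "${}"
    return _PLACEHOLDER.sub(lambda m: fmt.format(m.group(1)), template)
```

For example, `expand_placeholders("{{JAVA_HOME}}/bin/java", windows=False)` gives `$JAVA_HOME/bin/java`, while `windows=True` gives `%JAVA_HOME%/bin/java`. A hard-coded `$VAR` in the client would pass through unchanged, which is why it breaks when the submitting and launching platforms differ.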

Re: Writing intermediate data

2015-12-08 Thread Hitesh Shah
from. This information handshake is part of the Input/Output pair > implementation." > > If the edges had type PERSISTED_RELIABLE, the information handshake is > probably not needed. Is that right ? > > - Raajay > > On Tue, Dec 8, 2015 at 6:17 PM, Hitesh Shah wrote: &

Re: Writing intermediate data

2015-12-08 Thread Hitesh Shah
The other way to look at this problem is that for a given edge between 2 vertices, the data format and transfer mechanism is governed by the Output of the upstream vertex and the Input of the downstream vertex. You can potentially write your own Input and Output pair that work with HDFS or tachy

Re: Issue: Hive with Tez on CDH-5.4.2

2015-12-04 Thread Hitesh Shah
I don’t believe I have seen this error reported before. The error mainly seems to be coming from somewhere in the Hive codebase so the hive mailing list might provide a more relevant answer. If you don’t get one, would you mind setting “tez.am.log.level” to DEBUG in your tez-site.xml, re-run the

Re: Running Tez with Tachyon

2015-11-16 Thread Hitesh Shah
g.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.(FileOutputCommitter.java:80) > at > org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.getOutputCommitter(FileOutputFormat.java:309) > at > org.apache.tez.mapreduce.committer.MROutputCommitter.getOutputCommitter

Re: Running Tez with Tachyon

2015-11-12 Thread Hitesh Shah
The general approach for add-on jars requires 2 steps: 1) On the client host, where the job is submitted, you need to ensure that the add-on jars are in the local classpath. This is usually done by adding them to HADOOP_CLASSPATH. Please do pay attention to adding the jars via "/*” instead of j

Re: Constant Full GC making Tez Hive job take almost forever

2015-10-23 Thread Hitesh Shah
Hello Juho As you are probably aware, each hive query will largely have different memory requirements depending on what kind of plan it ends up executing. For the most part, a common container size and general settings work well for most queries. In this case, this might need additional tuning

Re: Tez Code 1 & Tez Code 2

2015-10-22 Thread Hitesh Shah
Hello Dale, I think I can guess what is happening. Hue keeps connections open between itself and the HiveServer2. Now what happens is that the Tez session times itself out if queries are not submitted to it within a certain time window ( to stop wasting resources on a YARN cluster ). This ca

Re: Tez View in Ambari

2015-10-15 Thread Hitesh Shah
In previous releases of the Tez view, it did not have the integration in place for the Hive query info. There is one approach that you can try: Ambari has something called standalone mode. You can deploy a new ambari-server version 2.1 ( on a different host ) and just instantiate a Tez view o

Re: Getting dot files for DAGs

2015-10-01 Thread Hitesh Shah
I don’t believe the binary should need changing at all unless you need enhancements from recent commits. It should just be setting up the UI and configuring Tez for using YARN Timeline. The instructions that you can follow: http://tez.apache.org/tez-ui.html http://tez.apache.org/tez_yarn_ti

Re: Getting dot files for DAGs

2015-10-01 Thread Hitesh Shah
wrote: > Maybe it would be a good idea to send the dot file to the ATS along with the > other information you are sending. I too wanted to look at a dot file the > other day and had problem finding it back. > > - André > > On Thu, Oct 1, 2015 at 4:00 AM, Hitesh Shah wrot

Re: Getting dot files for DAGs

2015-09-30 Thread Hitesh Shah
The .dot file is generated into the Tez Application Master’s container log dir. Firstly, you need to figure out the yarn application in which the query/Tez DAG ran. Once you have the applicationId, you can use one of these 2 approaches: 1) Go to the YARN ResourceManager UI, find the application

Re: Getting DAG Id from Hive on tez

2015-09-15 Thread Hitesh Shah
This is a question that is probably meant for the Hive mailing list. I believe either the Hive query Id or the information from the query plan ( as used in https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/hooks/ATSHook.java ) should probably be able to give you th

Re: Getting DAG Id from Hive on tez

2015-09-14 Thread Hitesh Shah
The Hive query id maps to the Tez dag name. You can try the following call against timeline: /ws/v1/timeline/TEZ_DAG_ID?primaryFilter=dagName:{tezDagName} thanks — Hitesh On Sep 14, 2015, at 10:45 PM, Dharmesh Kakadia wrote: > Hi, > > I am running Hive on Tez, with timeline server. We have
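A minimal Python sketch for building that timeline query is shown below. The REST path and the `dagName` primary filter come from the message; the base address in the usage note is a placeholder, and the function name is hypothetical.

```python
from urllib.parse import quote

TIMELINE_PATH = "/ws/v1/timeline/TEZ_DAG_ID"


def dag_lookup_url(timeline_base: str, dag_name: str) -> str:
    """Build the timeline URL that filters TEZ_DAG_ID entities by DAG name."""
    base = timeline_base.rstrip("/")
    # Percent-encode the DAG name so spaces and other reserved characters
    # survive in the query string.
    encoded_name = quote(dag_name, safe="")
    return f"{base}{TIMELINE_PATH}?primaryFilter=dagName:{encoded_name}"
```

With a timeline server at, say, `http://timeline-host:8188` (an assumed address), `dag_lookup_url("http://timeline-host:8188", hive_query_id)` produces a URL that can be fetched with any HTTP client; the entities in the response carry the Tez DAG id.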

Re: Over writing files

2015-09-11 Thread Hitesh Shah
This is probably a question for the Hive dev mailing list on how the staging/output directory name is determined. i.e. ".hive-staging_hive_2015-09-11_00-07-40_043_6365145769624003668-1”. You may need to change this value in the config being used to configure the output of the vertex that is doi

Re: Missing libraries.

2015-09-11 Thread Hitesh Shah
“tez.aux.uris” supports a comma separated list of files and directories on HDFS or any distributed filesystem ( no tar balls, archives, etc and no support for file:// ). When Hive submits a query to Tez, it adds its hive-exec.jar to the tez runtime ( similar to MR distributed cache ). If you are

Re: Creating TaskLocationHints

2015-09-11 Thread Hitesh Shah
ded? How are Vertex > Location Hints handled ? What if YARN is not able to provide containers in > requested locations ? > > Raajay > > On Thu, Sep 10, 2015 at 10:19 AM, Hitesh Shah wrote: > In almost all cases, this is usually hostnames. The general flow is find the > b

Re: how to allocate more containers?

2015-09-11 Thread Hitesh Shah
Most of the parallelism ( no. of containers ) is controlled by the upper layer application. There is some minor control that Tez does when it groups splits together but for the most part, the upper layer decides how many containers to run. You should look at Hive configs to see how to control

Re: Creating TaskLocationHints

2015-09-10 Thread Hitesh Shah
In almost all cases, this is usually hostnames. The general flow is to find the block locations for the data source, extract the hostname from there and provide it to YARN so that it can provide a container on the same host as the datanode having the data. As long as YARN is using hostnames, the co

Re: Error of setting vertex location hints

2015-09-10 Thread Hitesh Shah
There are 2 aspects to using Vertex Location Hints and parallelism. All of this depends on how you define the work that needs to be done by a particular task. I will take the MR approach and compare it to the more dynamic approach that Jeff has been explaining. For MR, all the work was decide

Re: Pig(0.14.0) on Tez(0.7.0)

2015-09-02 Thread Hitesh Shah
st for curiosity. Can i use tez-0.7.0 with latest PIG-0.14.0? Is it tested > earlier? > > Regards, > Sandeep > > On Wed, Sep 2, 2015 at 8:44 PM, Hitesh Shah wrote: > Based on the stack trace, the following issue seems to be the cause:
