Thanks! Hopefully I'm getting the correct logs here:
It seems the same application master keeps handling the requests. Both runs get the same application ID:

application_1403285786962_0002
<http://127.0.0.1:8088/cluster/app/application_1403285786962_0002>

dag_1403285786962_0004_1.dot : Total file length is 2179 bytes.
<http://localhost:8042/node/containerlogs/container_1403285786962_0004_01_000001/root/dag_1403285786962_0004_1.dot/?start=-4096>
dag_1403285786962_0004_2.dot : Total file length is 2179 bytes.
<http://localhost:8042/node/containerlogs/container_1403285786962_0004_01_000001/root/dag_1403285786962_0004_2.dot/?start=-4096>
dag_1403285786962_0004_3.dot : Total file length is 2179 bytes.
<http://localhost:8042/node/containerlogs/container_1403285786962_0004_01_000001/root/dag_1403285786962_0004_3.dot/?start=-4096>
dag_1403285786962_0004_4.dot : Total file length is 2179 bytes.
<http://localhost:8042/node/containerlogs/container_1403285786962_0004_01_000001/root/dag_1403285786962_0004_4.dot/?start=-4096>
stderr : Total file length is 0 bytes.
<http://localhost:8042/node/containerlogs/container_1403285786962_0004_01_000001/root/stderr/?start=-4096>
stderr_dag_1403285786962_0004_1 : Total file length is 0 bytes.
<http://localhost:8042/node/containerlogs/container_1403285786962_0004_01_000001/root/stderr_dag_1403285786962_0004_1/?start=-4096>
stderr_dag_1403285786962_0004_1_post : Total file length is 0 bytes.
<http://localhost:8042/node/containerlogs/container_1403285786962_0004_01_000001/root/stderr_dag_1403285786962_0004_1_post/?start=-4096>
stderr_dag_1403285786962_0004_2 : Total file length is 0 bytes.
<http://localhost:8042/node/containerlogs/container_1403285786962_0004_01_000001/root/stderr_dag_1403285786962_0004_2/?start=-4096>
stderr_dag_1403285786962_0004_2_post : Total file length is 0 bytes.
<http://localhost:8042/node/containerlogs/container_1403285786962_0004_01_000001/root/stderr_dag_1403285786962_0004_2_post/?start=-4096>
stderr_dag_1403285786962_0004_3 : Total file length is 0 bytes.
<http://localhost:8042/node/containerlogs/container_1403285786962_0004_01_000001/root/stderr_dag_1403285786962_0004_3/?start=-4096>
stderr_dag_1403285786962_0004_3_post : Total file length is 0 bytes.
<http://localhost:8042/node/containerlogs/container_1403285786962_0004_01_000001/root/stderr_dag_1403285786962_0004_3_post/?start=-4096>
stderr_dag_1403285786962_0004_4 : Total file length is 0 bytes.
<http://localhost:8042/node/containerlogs/container_1403285786962_0004_01_000001/root/stderr_dag_1403285786962_0004_4/?start=-4096>
stderr_dag_1403285786962_0004_4_post : Total file length is 0 bytes.
<http://localhost:8042/node/containerlogs/container_1403285786962_0004_01_000001/root/stderr_dag_1403285786962_0004_4_post/?start=-4096>
stdout : Total file length is 0 bytes.
<http://localhost:8042/node/containerlogs/container_1403285786962_0004_01_000001/root/stdout/?start=-4096>
stdout_dag_1403285786962_0004_1 : Total file length is 0 bytes.
<http://localhost:8042/node/containerlogs/container_1403285786962_0004_01_000001/root/stdout_dag_1403285786962_0004_1/?start=-4096>
stdout_dag_1403285786962_0004_1_post : Total file length is 0 bytes.
<http://localhost:8042/node/containerlogs/container_1403285786962_0004_01_000001/root/stdout_dag_1403285786962_0004_1_post/?start=-4096>
stdout_dag_1403285786962_0004_2 : Total file length is 0 bytes.
<http://localhost:8042/node/containerlogs/container_1403285786962_0004_01_000001/root/stdout_dag_1403285786962_0004_2/?start=-4096>
stdout_dag_1403285786962_0004_2_post : Total file length is 0 bytes.
<http://localhost:8042/node/containerlogs/container_1403285786962_0004_01_000001/root/stdout_dag_1403285786962_0004_2_post/?start=-4096>
stdout_dag_1403285786962_0004_3 : Total file length is 0 bytes.
<http://localhost:8042/node/containerlogs/container_1403285786962_0004_01_000001/root/stdout_dag_1403285786962_0004_3/?start=-4096>
stdout_dag_1403285786962_0004_3_post : Total file length is 0 bytes.
<http://localhost:8042/node/containerlogs/container_1403285786962_0004_01_000001/root/stdout_dag_1403285786962_0004_3_post/?start=-4096>
stdout_dag_1403285786962_0004_4 : Total file length is 0 bytes.
<http://localhost:8042/node/containerlogs/container_1403285786962_0004_01_000001/root/stdout_dag_1403285786962_0004_4/?start=-4096>
stdout_dag_1403285786962_0004_4_post : Total file length is 0 bytes.
<http://localhost:8042/node/containerlogs/container_1403285786962_0004_01_000001/root/stdout_dag_1403285786962_0004_4_post/?start=-4096>
syslog : Total file length is 7577 bytes.
<http://localhost:8042/node/containerlogs/container_1403285786962_0004_01_000001/root/syslog/?start=-4096>
syslog_dag_1403285786962_0004_1 : Total file length is 57034 bytes.
<http://localhost:8042/node/containerlogs/container_1403285786962_0004_01_000001/root/syslog_dag_1403285786962_0004_1/?start=-4096>
syslog_dag_1403285786962_0004_1_post : Total file length is 4775 bytes.
<http://localhost:8042/node/containerlogs/container_1403285786962_0004_01_000001/root/syslog_dag_1403285786962_0004_1_post/?start=-4096>
syslog_dag_1403285786962_0004_2 : Total file length is 56104 bytes.
<http://localhost:8042/node/containerlogs/container_1403285786962_0004_01_000001/root/syslog_dag_1403285786962_0004_2/?start=-4096>
syslog_dag_1403285786962_0004_2_post : Total file length is 707 bytes.
<http://localhost:8042/node/containerlogs/container_1403285786962_0004_01_000001/root/syslog_dag_1403285786962_0004_2_post/?start=-4096>
syslog_dag_1403285786962_0004_3 : Total file length is 53187 bytes.
<http://localhost:8042/node/containerlogs/container_1403285786962_0004_01_000001/root/syslog_dag_1403285786962_0004_3/?start=-4096>
syslog_dag_1403285786962_0004_3_post : Total file length is 5003 bytes.
<http://localhost:8042/node/containerlogs/container_1403285786962_0004_01_000001/root/syslog_dag_1403285786962_0004_3_post/?start=-4096>
syslog_dag_1403285786962_0004_4 : Total file length is 56111 bytes.
<http://localhost:8042/node/containerlogs/container_1403285786962_0004_01_000001/root/syslog_dag_1403285786962_0004_4/?start=-4096>
syslog_dag_1403285786962_0004_4_post : Total file length is 4204 bytes.
<http://localhost:8042/node/containerlogs/container_1403285786962_0004_01_000001/root/syslog_dag_1403285786962_0004_4_post/?start=-4096>

fast run
  Map 1        1734 Bytes   438 Bytes   639 ms
  Map 2        1245 KB      478 Bytes   1.34 secs
  Reducer 3    1446 Bytes   557 Bytes   3.63 secs

slow run
  Map 1        1734 Bytes   438 Bytes   12.62 secs
  Map 2        1245 KB      478 Bytes   14.37 secs
  Reducer 3    1446 Bytes   557 Bytes   15.67 secs


On Fri, Jun 20, 2014 at 10:31 AM, Hitesh Shah <[email protected]> wrote:

> Hello Lars,
>
> Just to be very clear - there is no caching of results/data across
> queries except for some minimal meta-data caching for ORC. If you can
> send across the logs generated by “yarn logs -applicationId <appId>”,
> we can try and help you get a better understanding of where the speed
> difference is stemming from.
>
> -- Hitesh
>
> On Jun 20, 2014, at 10:13 AM, Bikas Saha <[email protected]> wrote:
>
> > Hi,
> >
> > Thanks for your interest in trying out Hive on Tez. There are
> > multiple reasons for the observations you see below.
> > 1) Containers are warmed up the longer they get used. So if you
> > repeatedly run queries then the JVM has all classes loaded and ready
> > and may have JIT-ed the frequently run code path. As it learns more
> > about your execution pattern, the JIT can do a better job. This will
> > help you across different queries.
> > 2) As you frequently access the same data from the OS it will
> > increase the chances of your finding that data in the OS buffer
> > cache. So you get the benefits of in-memory data. This will help
> > repeated runs of queries on the same data.
> > 3) Hive is smart about explicitly caching de-serialized data (Java
> > objects) within a query in order to reduce re-computation of work
> > that has already been done. This will help within a query.
> > 4) If you are using the ORC file format then Hive will try to cache
> > ORC file metadata like locations/sizes etc. and this helps different
> > queries that access the same data.
> > 5) If your Tez query session has been idle for some time, then the
> > system starts pro-actively releasing resources back to the cluster
> > so that they may be used by other applications (good for
> > multi-tenancy). So if you fire a query after some delay then a
> > slowdown will be observed in case we need to reclaim some of the
> > released resources. This delay is configurable.
> >
> > Hope this helps and you have a positive experience experimenting
> > with Hive on Tez.
> > Please let us know how we can help!
> > Bikas
> >
> > From: Lars Selsaas [mailto:[email protected]]
> > Sent: Friday, June 20, 2014 8:50 AM
> > To: user
> > Subject: Tez performance on Hive
> >
> > Hi,
> >
> > So when you set Tez as the execution engine for Hive it takes about
> > half the time to finish a query the second time you run it, going
> > from say 24 seconds to 12 seconds. But if I keep re-running it, it
> > gets down to about 2 seconds on that same query. The time goes back
> > up to 12 seconds if I wait too long before the next rerun or if I
> > make large enough adjustments to the query.
> >
> > So I'm working on a blog post about Tez and need to find out why
> > this is happening. The first reduction seems to be mainly because of
> > hot containers that store the information about where to find your
> > data, while the second reduction, down to about 2 seconds, seems to
> > be some in-memory storage of the data. Does it store the results in
> > memory and keep them ready for next time?
> >
> > --
> > Lars Selsaas
> > Data Engineer
> > Think Big Analytics
> > [email protected]
> > 650-537-5321

-- 
Lars Selsaas
Data Engineer
Think Big Analytics <http://thinkbiganalytics.com>
[email protected]
650-537-5321
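
For reference, the fast and slow runs listed in this message can be compared per vertex with a few lines of Python. This is only a minimal sketch: it assumes the last value on each row is the vertex duration, converts everything to seconds by hand, and uses illustrative names (`fast_run`, `slow_run`, `compare_runs`) that do not come from Hive or Tez.

```python
# Per-vertex durations in seconds, copied by hand from the "fast run" and
# "slow run" rows in this message (assumption: the last column is a duration).
fast_run = {"Map 1": 0.639, "Map 2": 1.34, "Reducer 3": 3.63}
slow_run = {"Map 1": 12.62, "Map 2": 14.37, "Reducer 3": 15.67}


def compare_runs(fast, slow):
    """Return {vertex: (extra_seconds, slowdown_factor)} for each vertex."""
    return {
        vertex: (
            round(slow[vertex] - fast[vertex], 2),  # absolute gap in seconds
            round(slow[vertex] / fast[vertex], 1),  # relative slowdown
        )
        for vertex in fast
    }


if __name__ == "__main__":
    for vertex, (extra, factor) in compare_runs(fast_run, slow_run).items():
        print(f"{vertex}: +{extra:.2f} s ({factor:.1f}x slower)")
```

With these numbers the slow run adds roughly 12 to 13 seconds to every vertex, so the gap is close to constant rather than proportional to the fast-run time.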
