They were in the YARN logs instead. There was no exception, just a WARN-level message, which made it a bit harder to find.
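The kill shows up only as a WARN line from ContainersMonitorImpl in the NodeManager log, not as an exception in the AM logs. A plain text filter over that log is one way to surface it. Below is a minimal Java sketch, not a DataTorrent or Hadoop tool. The class name `VmemKillScanner` is hypothetical, and the regular expression assumes the message wording in the log excerpt quoted in this thread; other Hadoop versions may phrase it differently. Pass the path of a NodeManager log as the first argument (where that log lives depends on the installation). Without an argument, the sketch scans two lines copied from the thread.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VmemKillScanner {
    // Matches the NodeManager kill message as worded in this thread's log excerpt.
    // Assumption: other Hadoop versions may word this message differently.
    private static final Pattern KILL = Pattern.compile(
        "WARN .*ContainersMonitorImpl: Container \\[pid=(\\d+),containerID=([^\\]]+)\\]"
            + " is running beyond (virtual|physical) memory limits");

    /** Returns the container ID followed by "virtual" or "physical" for each kill message. */
    public static List<String> findKilledContainers(List<String> logLines) {
        List<String> hits = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = KILL.matcher(line);
            if (m.find()) {
                hits.add(m.group(2) + " " + m.group(3));
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // With a path argument, scan that log file; otherwise use two sample lines
        // copied from the log excerpt in this thread.
        List<String> lines = args.length > 0
            ? Files.readAllLines(Paths.get(args[0]))
            : List.of(
                "2016-11-21 11:44:34,876 WARN org.apache.hadoop.yarn.server.nodemanager."
                    + "containermanager.monitor.ContainersMonitorImpl: Process tree for container: "
                    + "container_1479728463466_0001_02_000001 has processes older than 1 iteration "
                    + "running over the configured limit. Limit=2254857728, current usage = 2822131712",
                "2016-11-21 11:44:34,876 WARN org.apache.hadoop.yarn.server.nodemanager."
                    + "containermanager.monitor.ContainersMonitorImpl: Container "
                    + "[pid=26632,containerID=container_1479728463466_0001_02_000001] is running "
                    + "beyond virtual memory limits. Current usage: 532.4 MB of 1 GB physical "
                    + "memory used; 2.6 GB of 2.1 GB virtual memory used. Killing container.");
        for (String hit : findKilledContainers(lines)) {
            System.out.println(hit);
        }
    }
}
```

With no argument it prints `container_1479728463466_0001_02_000001 virtual`. The first sample line is a WARN too, but it does not contain the kill wording, so it is skipped.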
On Mon, Nov 21, 2016 at 5:27 PM, Ashwin Chandra Putta <[email protected]> wrote:

> Cool. I did not see these errors in the AM logs. Were these in the node manager logs?
>
> Regards,
> Ashwin.
>
> On Mon, Nov 21, 2016 at 4:20 AM, Max Bridgewater <[email protected]> wrote:
>
>> The issue turned out to be memory allocation. Here is the relevant YARN error message:
>>
>> 2016-11-21 11:44:30,020 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1479728463466_0001_02_000001 transitioned from LOCALIZED to RUNNING
>> 2016-11-21 11:44:31,858 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Starting resource-monitoring for container_1479728463466_0001_02_000001
>> 2016-11-21 11:44:31,858 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Stopping resource-monitoring for container_1479728463466_0001_01_000001
>> 2016-11-21 11:44:31,867 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 26632 for container-id container_1479728463466_0001_02_000001: 194.5 MB of 1 GB physical memory used; 2.5 GB of 2.1 GB virtual memory used
>> 2016-11-21 11:44:34,875 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 26632 for container-id container_1479728463466_0001_02_000001: 532.4 MB of 1 GB physical memory used; 2.6 GB of 2.1 GB virtual memory used
>> 2016-11-21 11:44:34,876 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Process tree for container: container_1479728463466_0001_02_000001 has processes older than 1 iteration running over the configured limit. Limit=2254857728, current usage = 2822131712
>> 2016-11-21 11:44:34,876 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=26632,containerID=container_1479728463466_0001_02_000001] is running beyond virtual memory limits. Current usage: 532.4 MB of 1 GB physical memory used; 2.6 GB of 2.1 GB virtual memory used. Killing container.
>>
>> I solved it by adding this to yarn-site.xml:
>>
>> <property>
>>   <name>yarn.scheduler.minimum-allocation-mb</name>
>>   <value>1000</value>
>> </property>
>>
>> Thanks,
>> Max.
>>
>> On Sat, Nov 19, 2016 at 10:30 PM, Ashwin Chandra Putta <[email protected]> wrote:
>>
>>> Max,
>>>
>>> The app failure does not depend on the gateway. The gateway is a daemon that launches Apex apps on YARN and collects metrics for them from YARN and from each app's AM, so it does not affect app execution once YARN accepts the application. For some reason the AM itself is failing, and I cannot determine the cause from the logs. It is possible that the app packages for these apps include Hadoop dependencies, which is one of the most common causes of AM failure.
>>>
>>> Regards,
>>> Ashwin.
>>>
>>> On Sat, Nov 19, 2016 at 3:08 PM, Max Bridgewater <[email protected]> wrote:
>>>
>>>> Please find AppMaster.stderr and dt.log attached. AppMaster.stdout is empty. I am still wondering whether another port is needed or whether the UI uses WebSockets.
>>>>
>>>> On Sat, Nov 19, 2016 at 5:40 PM, Ashwin Chandra Putta <[email protected]> wrote:
>>>>
>>>>> Max,
>>>>>
>>>>> Can you share the app master logs of the failed application?
>>>>>
>>>>> Regards,
>>>>> Ashwin.
>>>>>
>>>>> On Sat, Nov 19, 2016 at 4:45 AM, Max Bridgewater <[email protected]> wrote:
>>>>>
>>>>>> Hi Ashwin,
>>>>>>
>>>>>> Thanks for the feedback. I created a completely new instance, trying to follow the instructions more precisely, and attached the logs again. As you can see, they are very clean. Despite this, PIDemo still fails without any meaningful error message. The same thing happens with WordCountDemo: after launching, it stays in ACCEPTED status for 10 to 15 seconds and then switches to FAILED.
>>>>>>
>>>>>> Max.
>>>>>>
>>>>>> On Fri, Nov 18, 2016 at 2:30 PM, Ashwin Chandra Putta <[email protected]> wrote:
>>>>>>
>>>>>>> Also, there are write permission errors on /user/dtadmin/datatorrent in HDFS. Please make the dtadmin user the owner of /user/dtadmin/.
>>>>>>>
>>>>>>> Permission denied: user=dtadmin, access=WRITE, inode="/user/dtadmin/datatorrent":hduser:supergroup:drwxr-xr-x
>>>>>>>
>>>>>>> Regards,
>>>>>>> Ashwin.
>>>>>>>
>>>>>>> On Fri, Nov 18, 2016 at 11:27 AM, Ashwin Chandra Putta <[email protected]> wrote:
>>>>>>>
>>>>>>>> The closing </property> tag is missing between lines 30 and 31. It belongs to the property dt.attr.DEBUG.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Ashwin.
>>>>>>>>
>>>>>>>> On Fri, Nov 18, 2016 at 10:16 AM, Max Bridgewater <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Here is the log folder. Note that it refers to a malformed properties.xml. I am attaching that properties file as well.
>>>>>>>>>
>>>>>>>>> On Fri, Nov 18, 2016 at 1:08 PM, Ashwin Chandra Putta <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Max,
>>>>>>>>>>
>>>>>>>>>> Can you share the gateway logs?
>>>>>>>>>>
>>>>>>>>>> You will find them under /var/log/datatorrent for a global install, or under ~/.dt/logs for a local install.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Ashwin.
>>>>>>>>>>
>>>>>>>>>> On Nov 18, 2016 9:41 AM, "Max Bridgewater" <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Folks,
>>>>>>>>>>>
>>>>>>>>>>> I am playing with Apex (DataTorrent RTS Enterprise). A local deployment on an Ubuntu 16 box works fine.
>>>>>>>>>>>
>>>>>>>>>>> However, when I deploy on a remote host, I am not able to launch the demo applications. I suspect this is because I have to open an SSH tunnel to access the gateway. All activities other than launching the apps seem to work fine.
>>>>>>>>>>>
>>>>>>>>>>> My question: is there another port I need to open? Is anybody aware of issues running or accessing Apex behind a proxy or firewall?
>>>>>>>>>>>
>>>>>>>>>>> Unfortunately, the UI does not provide much information. I am attaching some screenshots.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Max.
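The figures in the NodeManager WARN lines can be reproduced by hand. YARN caps a container's virtual memory at its physical allocation multiplied by `yarn.nodemanager.vmem-pmem-ratio`, whose default is 2.1. A 1 GB container therefore gets a ceiling of about 2.1 GB, while the AM's process tree reached about 2.6 GB. The Java sketch below checks that arithmetic. The class name `VmemLimit` is hypothetical, and applying the ratio as a Java `float` is an assumption. It is, however, what makes the result match the logged `Limit=2254857728` exactly; double arithmetic would give 2254857830.

```java
public class VmemLimit {
    /**
     * Virtual-memory ceiling in bytes for a container with the given physical
     * allocation: pmem * yarn.nodemanager.vmem-pmem-ratio. The ratio is applied
     * as a float here (assumption), which matches the logged Limit= value.
     */
    public static long vmemLimitBytes(long pmemMb, float vmemPmemRatio) {
        long pmemBytes = pmemMb * 1024L * 1024L;
        // long * float is evaluated in float arithmetic, then truncated to long.
        return (long) (pmemBytes * vmemPmemRatio);
    }

    public static void main(String[] args) {
        final double gib = 1L << 30;
        long limit = vmemLimitBytes(1024, 2.1f); // 1 GB container, default ratio 2.1
        long usage = 2822131712L;                // "current usage" from the WARN line
        System.out.println("limit = " + limit);
        System.out.println("usage exceeds limit: " + (usage > limit));
        System.out.printf("usage = %.1f GiB, limit = %.1f GiB%n", usage / gib, limit / gib);
    }
}
```

Running it prints `limit = 2254857728` and `usage exceeds limit: true`. The rounded figures match the "2.6 GB of 2.1 GB virtual memory used" reported before the container was killed.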
