They were in the YARN logs instead. There was no exception, just a
WARN-level message, which made it a bit harder to find.
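
For anyone hitting the same thing: the Limit value in the WARN line quoted
below (2254857728) looks like the container's 1 GB physical allocation
multiplied by yarn.nodemanager.vmem-pmem-ratio, which defaults to 2.1. Here is
a rough Python sketch of that arithmetic. It assumes the ratio is held as a
32-bit float, which is my guess but does reproduce the logged number exactly.
The function names are made up for illustration and are not part of YARN.

```python
import struct


def as_float32(x: float) -> float:
    """Round a Python float to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def vmem_limit_bytes(pmem_mb: int, ratio: float = 2.1) -> int:
    """Virtual memory limit for a container with pmem_mb of physical memory.

    Assumption: limit = physical bytes * ratio, with the ratio stored as a
    32-bit float and the result truncated to an integer.
    """
    pmem_bytes = pmem_mb * 1024 * 1024
    return int(pmem_bytes * as_float32(ratio))


limit = vmem_limit_bytes(1024)   # 1 GB container, default ratio of 2.1
usage = 2822131712               # "current usage" taken from the WARN line

print(limit)          # 2254857728, the same as "Limit=" in the log
print(usage > limit)  # True, which is why the container was killed
```

If that reading is right, raising yarn.nodemanager.vmem-pmem-ratio or turning
off yarn.nodemanager.vmem-check-enabled would be other knobs to try, although
I have only tested the yarn-site.xml change quoted below.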

On Mon, Nov 21, 2016 at 5:27 PM, Ashwin Chandra Putta <
[email protected]> wrote:

> Cool. I did not see these errors in the AM logs. Were they in the node
> manager logs?
>
> Regards,
> Ashwin.
>
> On Mon, Nov 21, 2016 at 4:20 AM, Max Bridgewater <
> [email protected]> wrote:
>
>> The issue turned out to be memory allocation. Here is the relevant YARN
>> error message:
>>
>> 2016-11-21 11:44:30,020 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1479728463466_0001_02_000001 transitioned from LOCALIZED to RUNNING
>> 2016-11-21 11:44:31,858 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Starting resource-monitoring for container_1479728463466_0001_02_000001
>> 2016-11-21 11:44:31,858 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Stopping resource-monitoring for container_1479728463466_0001_01_000001
>> 2016-11-21 11:44:31,867 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 26632 for container-id container_1479728463466_0001_02_000001: 194.5 MB of 1 GB physical memory used; 2.5 GB of 2.1 GB virtual memory used
>> 2016-11-21 11:44:34,875 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 26632 for container-id container_1479728463466_0001_02_000001: 532.4 MB of 1 GB physical memory used; 2.6 GB of 2.1 GB virtual memory used
>> 2016-11-21 11:44:34,876 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Process tree for container: container_1479728463466_0001_02_000001 has processes older than 1 iteration running over the configured limit. Limit=2254857728, current usage = 2822131712
>> 2016-11-21 11:44:34,876 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=26632,containerID=container_1479728463466_0001_02_000001] is running beyond virtual memory limits. Current usage: 532.4 MB of 1 GB physical memory used; 2.6 GB of 2.1 GB virtual memory used. Killing container.
>>
>>
>> I solved it by adding this to yarn-site.xml:
>>
>> <property>
>>         <name>yarn.scheduler.minimum-allocation-mb</name>
>>         <value>1000</value>
>> </property>
>>
>>
>> Thanks,
>> Max.
>>
>>
>>
>>
>> On Sat, Nov 19, 2016 at 10:30 PM, Ashwin Chandra Putta <
>> [email protected]> wrote:
>>
>>> Max,
>>>
>>> The app failure does not depend on the gateway. The gateway is a daemon
>>> that launches Apex apps on YARN and collects metrics for them from YARN
>>> and from each app's AM, so it won't affect app execution once YARN accepts
>>> the application. For some reason the AM itself is failing. I cannot figure
>>> out the cause from the logs. It is possible that the app packages for these
>>> apps have Hadoop dependencies packaged in them, which is one of the most
>>> common causes of AM failure.
>>>
>>> Regards,
>>> Ashwin.
>>>
>>> On Sat, Nov 19, 2016 at 3:08 PM, Max Bridgewater <
>>> [email protected]> wrote:
>>>
>>>> Please find the AppMaster.stderr attached as well as dt.log.
>>>> AppMaster.stdout is empty. I am still wondering if there is another port
>>>> that is needed or if the UI is using websocket.
>>>>
>>>> On Sat, Nov 19, 2016 at 5:40 PM, Ashwin Chandra Putta <
>>>> [email protected]> wrote:
>>>>
>>>>> Max,
>>>>>
>>>>> Can you share the app master logs of the failed application?
>>>>>
>>>>> Regards,
>>>>> Ashwin.
>>>>>
>>>>> On Sat, Nov 19, 2016 at 4:45 AM, Max Bridgewater <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi Ashwin,
>>>>>>
>>>>>> Thanks for the feedback. I created a completely new instance, trying
>>>>>> to follow the instructions more precisely. I attached the logs again. As
>>>>>> you can see, they are very clean. Despite this, PIDemo is still failing
>>>>>> without any meaningful error message. The same thing happens with
>>>>>> WordCountDemo. After launching, it stays in ACCEPTED status for 10 to 15
>>>>>> seconds and then switches to FAILED.
>>>>>>
>>>>>> Max.
>>>>>>
>>>>>> On Fri, Nov 18, 2016 at 2:30 PM, Ashwin Chandra Putta <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Also, there are write permission errors on /user/dtadmin/datatorrent
>>>>>>> in HDFS. Please make the dtadmin user the owner of /user/dtadmin/.
>>>>>>>
>>>>>>> Permission denied: user=dtadmin, access=WRITE,
>>>>>>> inode="/user/dtadmin/datatorrent":hduser:supergroup:drwxr-xr-x
>>>>>>>
>>>>>>> Regards,
>>>>>>> Ashwin.
>>>>>>>
>>>>>>> On Fri, Nov 18, 2016 at 11:27 AM, Ashwin Chandra Putta <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> The closing </property> tag is missing between lines 30 and 31. It
>>>>>>>> belongs to the property dt.attr.DEBUG.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Ashwin.
>>>>>>>>
>>>>>>>> On Fri, Nov 18, 2016 at 10:16 AM, Max Bridgewater <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Here is the log folder. Note that it refers to a malformed
>>>>>>>>> properties.xml. I am attaching that properties file as well.
>>>>>>>>>
>>>>>>>>> On Fri, Nov 18, 2016 at 1:08 PM, Ashwin Chandra Putta <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Max,
>>>>>>>>>>
>>>>>>>>>> Can you share the gateway logs?
>>>>>>>>>>
>>>>>>>>>> You will find them under /var/log/datatorrent for global install,
>>>>>>>>>> or under ~/.dt/logs for local install.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Ashwin.
>>>>>>>>>>
>>>>>>>>>> On Nov 18, 2016 9:41 AM, "Max Bridgewater" <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Folks,
>>>>>>>>>>>
>>>>>>>>>>> I am playing with Apex (DataTorrent RTS Enterprise). Local
>>>>>>>>>>> deployment on an Ubuntu 16 box works fine.
>>>>>>>>>>>
>>>>>>>>>>> However, when I deploy on a remote host, I am not able to
>>>>>>>>>>> launch the demo applications. My suspicion is that this is due to
>>>>>>>>>>> having to open an SSH tunnel to access the gateway. All activities
>>>>>>>>>>> other than launching the apps seem to work fine.
>>>>>>>>>>>
>>>>>>>>>>> My question: is there another port I need to open? Is anybody
>>>>>>>>>>> aware of issues running or accessing Apex behind a proxy or firewall?
>>>>>>>>>>>
>>>>>>>>>>> Unfortunately the UI does not provide much information. I am
>>>>>>>>>>> attaching some screenshots.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Max.
>>>>>>>>>>>
