Since the diagnostic is about exceeding virtual memory limits, please see:
    http://docs.datatorrent.com/configuration/#yarn-vmem-pmem-ratio-tuning
for an alternative solution.
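
For reference, here is a minimal sketch of what that tuning might look like in
yarn-site.xml. It assumes the stock NodeManager property
yarn.nodemanager.vmem-pmem-ratio, whose default of 2.1 matches the
"2.1 GB virtual memory" limit reported for the 1 GB container in your log. The
value 4 below is only an illustrative assumption, not a figure taken from the
linked page, so pick a ratio that fits your workload:

```xml
<!-- Sketch only: raise the virtual-to-physical memory ratio that the
     NodeManager enforces per container. The default is 2.1; the value 4
     is an assumed example, not a recommendation. -->
<property>
        <name>yarn.nodemanager.vmem-pmem-ratio</name>
        <value>4</value>
</property>
```

The NodeManagers most likely need a restart before a change like this takes
effect.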

Ram

On Mon, Nov 21, 2016 at 4:20 AM, Max Bridgewater <[email protected]>
wrote:

> The issue turned out to be memory allocation. Here is the relevant YARN
> error message:
>
> 2016-11-21 11:44:30,020 INFO org.apache.hadoop.yarn.server.
> nodemanager.containermanager.container.ContainerImpl: Container
> container_1479728463466_0001_02_000001 transitioned from LOCALIZED to
> RUNNING
> 2016-11-21 11:44:31,858 INFO org.apache.hadoop.yarn.server.
> nodemanager.containermanager.monitor.ContainersMonitorImpl: Starting
> resource-monitoring for container_1479728463466_0001_02_000001
> 2016-11-21 11:44:31,858 INFO org.apache.hadoop.yarn.server.
> nodemanager.containermanager.monitor.ContainersMonitorImpl: Stopping
> resource-monitoring for container_1479728463466_0001_01_000001
> 2016-11-21 11:44:31,867 INFO org.apache.hadoop.yarn.server.
> nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage
> of ProcessTree 26632 for container-id container_1479728463466_0001_02_000001:
> 194.5 MB of 1 GB physical memory used; 2.5 GB of 2.1 GB virtual memory used
> 2016-11-21 11:44:34,875 INFO org.apache.hadoop.yarn.server.
> nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage
> of ProcessTree 26632 for container-id container_1479728463466_0001_02_000001:
> 532.4 MB of 1 GB physical memory used; 2.6 GB of 2.1 GB virtual memory used
> 2016-11-21 11:44:34,876 WARN org.apache.hadoop.yarn.server.
> nodemanager.containermanager.monitor.ContainersMonitorImpl: Process tree
> for container: container_1479728463466_0001_02_000001 has processes older
> than 1 iteration running over the configured limit. Limit=2254857728,
> current usage = 2822131712
> 2016-11-21 11:44:34,876 WARN org.apache.hadoop.yarn.server.
> nodemanager.containermanager.monitor.ContainersMonitorImpl: Container
> [pid=26632,containerID=container_1479728463466_0001_02_000001] is running
> beyond virtual memory limits. Current usage: 532.4 MB of 1 GB physical
> memory used; 2.6 GB of 2.1 GB virtual memory used. Killing container.
>
>
> I solved it by adding this to yarn-site.xml:
>
> <property>
>         <name>yarn.scheduler.minimum-allocation-mb</name>
>         <value>1000</value>
> </property>
>
>
> Thanks,
> Max.
>
>
>
> On Sat, Nov 19, 2016 at 10:30 PM, Ashwin Chandra Putta <
> [email protected]> wrote:
>
>> Max,
>>
>> The app failure does not depend on the gateway. The gateway is a daemon
>> that launches Apex apps on YARN and collects metrics for them from YARN
>> and from each app's AM, so it won't affect app execution once YARN accepts
>> the application. For some reason the AM itself is failing. I cannot figure
>> out the cause from the logs. It is possible that the app packages for these
>> apps have Hadoop dependencies packaged; that is one of the most common
>> causes of AM failure.
>>
>> Regards,
>> Ashwin.
>>
>> On Sat, Nov 19, 2016 at 3:08 PM, Max Bridgewater <
>> [email protected]> wrote:
>>
>>> Please find the AppMaster.stderr attached as well as dt.log.
>>> AppMaster.stdout is empty. I am still wondering if there is another port
>>> that is needed or if the UI is using websocket.
>>>
>>> On Sat, Nov 19, 2016 at 5:40 PM, Ashwin Chandra Putta <
>>> [email protected]> wrote:
>>>
>>>> Max,
>>>>
>>>> Can you share the app master logs of the failed application?
>>>>
>>>> Regards,
>>>> Ashwin.
>>>>
>>>> On Sat, Nov 19, 2016 at 4:45 AM, Max Bridgewater <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi Ashwin,
>>>>>
>>>>> Thanks for the feedback. I created a completely new instance, trying
>>>>> to follow the instructions more precisely. I attached the logs again. As
>>>>> you can see, they are very clean. Despite this, PIDemo is still failing
>>>>> without any meaningful error message. The same thing happens with
>>>>> WorldCountDemo. After launching, it stays in ACCEPTED status for 10 to 15
>>>>> seconds and then switches to FAILED.
>>>>>
>>>>> Max.
>>>>>
>>>>> On Fri, Nov 18, 2016 at 2:30 PM, Ashwin Chandra Putta <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Also, there are write permission errors on /user/dtadmin/datatorrent
>>>>>> in HDFS. Please make the dtadmin user the owner of /user/dtadmin/.
>>>>>>
>>>>>> Permission denied: user=dtadmin, access=WRITE,
>>>>>> inode="/user/dtadmin/datatorrent":hduser:supergroup:drwxr-xr-x
>>>>>>
>>>>>> Regards,
>>>>>> Ashwin.
>>>>>>
>>>>>> On Fri, Nov 18, 2016 at 11:27 AM, Ashwin Chandra Putta <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> The closing </property> tag is missing between lines 30 and 31. It
>>>>>>> belongs to the property dt.attr.DEBUG.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Ashwin.
>>>>>>>
>>>>>>> On Fri, Nov 18, 2016 at 10:16 AM, Max Bridgewater <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Here is the log folder. Note that it refers to a malformed
>>>>>>>> properties.xml. I am attaching that properties file as well.
>>>>>>>>
>>>>>>>> On Fri, Nov 18, 2016 at 1:08 PM, Ashwin Chandra Putta <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Max,
>>>>>>>>>
>>>>>>>>> Can you share the gateway logs?
>>>>>>>>>
>>>>>>>>> You will find them under /var/log/datatorrent for a global install,
>>>>>>>>> or under ~/.dt/logs for a local install.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Ashwin.
>>>>>>>>>
>>>>>>>>> On Nov 18, 2016 9:41 AM, "Max Bridgewater" <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Folks,
>>>>>>>>>>
>>>>>>>>>> I am playing with Apex (DataTorrent RTS Enterprise). Local
>>>>>>>>>> deployment on an Ubuntu 16 box works fine.
>>>>>>>>>>
>>>>>>>>>> However, when I deploy on a remote host, I am not able to launch
>>>>>>>>>> demo applications. My suspicion is that this is due to having to
>>>>>>>>>> open an SSH tunnel to access the gateway. All activities other than
>>>>>>>>>> launching the apps seem to work fine.
>>>>>>>>>>
>>>>>>>>>> My question: is there another port I need to open? Is anybody
>>>>>>>>>> aware of issues running or accessing Apex behind a proxy or firewall?
>>>>>>>>>>
>>>>>>>>>> Unfortunately the UI does not provide much information. I am
>>>>>>>>>> attaching some screenshots.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Max.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> Regards,
>>>>>>> Ashwin.
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Regards,
>>>>>> Ashwin.
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Regards,
>>>> Ashwin.
>>>>
>>>
>>>
>>
>>
>> --
>>
>> Regards,
>> Ashwin.
>>
>
>
