What does execution of

    pgrep -o -u flume -f ^/usr/jdk64/jdk1.7.0_67.*agent1.*

generate as output? If you have a single agent you can also use ps aux and grep for flume.

In my case, for example, I see

    [root@smb201-1 ~]# pgrep -o -u flume -f ^/usr/jdk64/jdk1.7.0_67.*a1.*
    16794

Is it possible that the configuration for the flume agent "agent1" may have some issue?

________________________________
From: Marco <[email protected]>
Sent: Wednesday, July 01, 2015 7:27 AM
To: [email protected]
Subject: Re: Restart of flume-agents bug

error:
<<<<
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 214, in execute
    method(env)
  File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
    return fn(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/common-services/FLUME/1.4.0.2.0/package/scripts/flume_handler.py", line 56, in start
    flume(action='start')
  File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
    return fn(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/common-services/FLUME/1.4.0.2.0/package/scripts/flume.py", line 161, in flume
    try_sleep=10)
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 148, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 152, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 118, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 274, in action_run
    raise ex
Fail: Execution of 'pgrep -o -u flume -f ^/usr/jdk64/jdk1.7.0_67.*agent1.* > /var/run/flume/agent1.pid' returned 1.
>>>>

output:
<<<
2015-07-01 14:08:03,131 - u'Execute[\'ambari-sudo.sh su flume -l -s /bin/bash -c \'export PATH=\'"\'"\'/usr/sbin:/sbin:/usr/lib/ambari-server/*:/sbin:/usr/sbin:/bin:/usr/bin:/var/lib/ambari-agent\'"\'"\' JAVA_HOME=/usr/jdk64/jdk1.7.0_67 ; /usr/hdp/current/flume-server/bin/flume-ng agent --name agent1 --conf /etc/flume/conf/agent1 --conf-file /etc/flume/conf/agent1/flume.conf -Dflume.monitoring.type=org.apache.hadoop.metrics2.sink.flume.FlumeTimelineMetricsSink -Dflume.monitoring.node=hostname:6188 > /var/log/flume/agent1.out 2>&1\' &\']' {'environment': {'JAVA_HOME': u'/usr/jdk64/jdk1.7.0_67'}, 'wait_for_finish': False}
2015-07-01 14:08:03,136 - u"Execute['pgrep -o -u flume -f ^/usr/jdk64/jdk1.7.0_67.*agent1.* > /var/run/flume/agent1.pid']" {'logoutput': True, 'tries': 20, 'try_sleep': 10}
2015-07-01 14:08:03,179 - Retrying after 10 seconds. Reason: Execution of 'pgrep -o -u flume -f ^/usr/jdk64/jdk1.7.0_67.*agent1.* > /var/run/flume/agent1.pid' returned 1.
2015-07-01 14:08:13,233 - Retrying after 10 seconds. Reason: Execution of 'pgrep -o -u flume -f ^/usr/jdk64/jdk1.7.0_67.*agent1.* > /var/run/flume/agent1.pid' returned 1.
2015-07-01 14:08:23,280 - Retrying after 10 seconds. Reason: Execution of 'pgrep -o -u flume -f ^/usr/jdk64/jdk1.7.0_67.*agent1.* > /var/run/flume/agent1.pid' returned 1.
2015-07-01 14:08:33,334 - Retrying after 10 seconds. Reason: Execution of 'pgrep -o -u flume -f ^/usr/jdk64/jdk1.7.0_67.*agent1.* > /var/run/flume/agent1.pid' returned 1.
2015-07-01 14:08:43,389 - Retrying after 10 seconds. Reason: Execution of 'pgrep -o -u flume -f ^/usr/jdk64/jdk1.7.0_67.*agent1.* > /var/run/flume/agent1.pid' returned 1.
2015-07-01 14:08:53,440 - Retrying after 10 seconds. Reason: Execution of 'pgrep -o -u flume -f ^/usr/jdk64/jdk1.7.0_67.*agent1.* > /var/run/flume/agent1.pid' returned 1.
2015-07-01 14:09:03,511 - Retrying after 10 seconds. Reason: Execution of 'pgrep -o -u flume -f ^/usr/jdk64/jdk1.7.0_67.*agent1.* > /var/run/flume/agent1.pid' returned 1.
2015-07-01 14:09:13,565 - Retrying after 10 seconds. Reason: Execution of 'pgrep -o -u flume -f ^/usr/jdk64/jdk1.7.0_67.*agent1.* > /var/run/flume/agent1.pid' returned 1.
2015-07-01 14:09:23,619 - Retrying after 10 seconds. Reason: Execution of 'pgrep -o -u flume -f ^/usr/jdk64/jdk1.7.0_67.*agent1.* > /var/run/flume/agent1.pid' returned 1.
2015-07-01 14:09:33,673 - Retrying after 10 seconds. Reason: Execution of 'pgrep -o -u flume -f ^/usr/jdk64/jdk1.7.0_67.*agent1.* > /var/run/flume/agent1.pid' returned 1.
2015-07-01 14:09:43,722 - Retrying after 10 seconds. Reason: Execution of 'pgrep -o -u flume -f ^/usr/jdk64/jdk1.7.0_67.*agent1.* > /var/run/flume/agent1.pid' returned 1.
2015-07-01 14:09:53,772 - Retrying after 10 seconds. Reason: Execution of 'pgrep -o -u flume -f ^/usr/jdk64/jdk1.7.0_67.*agent1.* > /var/run/flume/agent1.pid' returned 1.
2015-07-01 14:10:03,826 - Retrying after 10 seconds. Reason: Execution of 'pgrep -o -u flume -f ^/usr/jdk64/jdk1.7.0_67.*agent1.* > /var/run/flume/agent1.pid' returned 1.
2015-07-01 14:10:13,880 - Retrying after 10 seconds. Reason: Execution of 'pgrep -o -u flume -f ^/usr/jdk64/jdk1.7.0_67.*agent1.* > /var/run/flume/agent1.pid' returned 1.
2015-07-01 14:10:23,928 - Retrying after 10 seconds. Reason: Execution of 'pgrep -o -u flume -f ^/usr/jdk64/jdk1.7.0_67.*agent1.* > /var/run/flume/agent1.pid' returned 1.
2015-07-01 14:10:33,982 - Retrying after 10 seconds. Reason: Execution of 'pgrep -o -u flume -f ^/usr/jdk64/jdk1.7.0_67.*agent1.* > /var/run/flume/agent1.pid' returned 1.
2015-07-01 14:10:44,037 - Retrying after 10 seconds. Reason: Execution of 'pgrep -o -u flume -f ^/usr/jdk64/jdk1.7.0_67.*agent1.* > /var/run/flume/agent1.pid' returned 1.
2015-07-01 14:10:54,083 - Retrying after 10 seconds. Reason: Execution of 'pgrep -o -u flume -f ^/usr/jdk64/jdk1.7.0_67.*agent1.* > /var/run/flume/agent1.pid' returned 1.
2015-07-01 14:11:04,137 - Retrying after 10 seconds. Reason: Execution of 'pgrep -o -u flume -f ^/usr/jdk64/jdk1.7.0_67.*agent1.* > /var/run/flume/agent1.pid' returned 1.
2015-07-01 14:11:14,190 - Error while executing command 'start':
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 214, in execute
    method(env)
>>>

Thanks,
Marco

2015-07-01 16:18 GMT+02:00 Sumit Mohanty <[email protected]>:

When you start Flume using Ambari, the /var/lib/ambari-agent/data folder on the host will contain the corresponding command outputs, errors, etc. Can you share those? Feel free to send a direct email, as I think Apache email will not allow attachments.

________________________________
From: Marco <[email protected]>
Sent: Wednesday, July 01, 2015 7:14 AM
To: [email protected]
Subject: Re: Restart of flume-agents bug

I've tried this but did not find any related processes. I searched via

    pgrep -fl flume
    pgrep -fl agent1

I've also restarted the corresponding server. If I try to restart the Flume agent, I get the same issue :(
I've also tried deleting /var/run/flume and creating it again, also with no effect.

BR
Marco

2015-07-01 15:57 GMT+02:00 Sumit Mohanty <[email protected]>:

If Flume agents are running, then you need to kill those processes as well, along with deleting the pid files.

________________________________
From: Marco <[email protected]>
Sent: Wednesday, July 01, 2015 6:42 AM
To: [email protected]
Subject: Restart of flume-agents bug

Hi,

I have trouble restarting Flume agents with Ambari. I've found this JIRA entry, https://issues.apache.org/jira/browse/AMBARI-10657, which describes my problem (var/run/flume/a2.pid' returned 1).
Since I am using the Hortonworks distribution (Ambari 2.0.0), I cannot just upgrade or patch. Is there any workaround for this issue? I've tried deleting the pid file, but with no effect.

Thanks,
Marco

--
Best regards,
Marco
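
A note on the failing check in the log: pgrep exits with status 1 when no process matches, and with -f the pattern is matched against the full command line. The sketch below imitates that match with Python's re module, which is not the same regex dialect as pgrep's but should agree for this particular pattern. The pattern is the one from the log; the three command lines are hypothetical, made up for illustration, and not taken from the thread. It only shows when the pattern can match at all; it does not tell whether the agent process was running.

```python
import re

# Pattern from the failing Ambari check; pgrep -f applies it to the full command line.
PATTERN = r"^/usr/jdk64/jdk1.7.0_67.*agent1.*"


def would_match(cmdline, pattern=PATTERN):
    """Return True if a pgrep -f style search for `pattern` would hit `cmdline`."""
    return re.search(pattern, cmdline) is not None


# Hypothetical command lines (assumptions for illustration only).
examples = {
    "expected JDK, agent1": (
        "/usr/jdk64/jdk1.7.0_67/bin/java -cp /etc/flume/conf/agent1 "
        "org.apache.flume.node.Application --name agent1"
    ),
    "different java path": (
        "/usr/bin/java -cp /etc/flume/conf/agent1 "
        "org.apache.flume.node.Application --name agent1"
    ),
    "different agent name": (
        "/usr/jdk64/jdk1.7.0_67/bin/java -cp /etc/flume/conf/a1 "
        "org.apache.flume.node.Application --name a1"
    ),
}

for label, cmdline in examples.items():
    print(label, "->", would_match(cmdline))
```

Under these assumptions only the first command line matches: the pattern is anchored with ^ to the JDK path and also requires the agent name later in the command line. Since pgrep -fl flume found nothing either, the agent may simply have exited right after launch, in which case /var/log/flume/agent1.out (the redirect target in the log) is the place to look.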
