[graylog2] Re: 'did not find meta info for this node' error, but not timesync related?

2016-05-17 Thread Jeff McCombs
Turns out this was a resource issue.. My 3 nodes were running under VMWare 
- only had 2 cores/4Gb and I was trying to throw about 150K log messages at 
it per second. :)

Increasing memory/cpu allocations, and tweaking the graylog mem values 
(orig 1G -> 4G) and doing the same on the shared elasticsearch configs (up 
to 18G instead of 2G) cleared the issue. 
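For anyone hitting the same wall: the heap bumps above amount to two environment-file changes. A sketch (paths and variable names assumed from the RHEL6 packages of that era, not copied from the actual hosts):

```
# /etc/sysconfig/graylog-server -- raise the Graylog JVM heap from 1G to 4G
GRAYLOG_SERVER_JAVA_OPTS="-Xms4g -Xmx4g"

# /etc/sysconfig/elasticsearch -- raise the Elasticsearch heap (ES 1.x/2.x knob)
ES_HEAP_SIZE=18g
```

Restart graylog-server and elasticsearch after changing these for the new heap sizes to take effect.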


On Thursday, May 12, 2016 at 1:17:23 PM UTC-7, Jeff McCombs wrote:
>
> Hi gang,
>
>
>   I'm running into a strange problem where my graylog nodes are 
> complaining about not being able to find their meta info:
>
>
> 2016-05-12T11:50:09.691-07:00 WARN  [NodePingThread] Did not find meta 
> info of this node. Re-registering.
>
> 2016-05-12T11:50:12.878-07:00 WARN  [NodePingThread] Did not find meta 
> info of this node. Re-registering.
>
> 2016-05-12T11:50:13.417-07:00 WARN  [ProxiedResource] Node 
> <00ac0ad1-b96f-46c0-a2bc-bc9e7a90777f> not found while trying to call 
> org.graylog2.shared.rest.resources.system.RemoteMetricsResource on it.
>
> 2016-05-12T11:50:15.808-07:00 WARN  [NodePingThread] Did not find meta 
> info of this node. Re-registering.
>
> 2016-05-12T11:50:19.175-07:00 WARN  [NodePingThread] Did not find meta 
> info of this node. Re-registering.
>
> 2016-05-12T11:50:24.767-07:00 WARN  [NodePingThread] Did not find meta 
> info of this node. Re-registering.
>
> 2016-05-12T11:50:28.020-07:00 WARN  [NodePingThread] Did not find meta 
> info of this node. Re-registering.
>
> 2016-05-12T11:50:37.849-07:00 WARN  [NodePingThread] Did not find meta 
> info of this node. Re-registering.
>
> 2016-05-12T11:50:40.978-07:00 WARN  [NodePingThread] Did not find meta 
> info of this node. Re-registering.
>
> 2016-05-12T11:50:41.904-07:00 WARN  [ProxiedResource] Node 
> <00ac0ad1-b96f-46c0-a2bc-bc9e7a90777f> not found while trying to call 
> org.graylog2.shared.rest.resources.system.RemoteMetricsResource on it.
>
> 2016-05-12T11:50:47.400-07:00 WARN  [NodePingThread] Did not find meta 
> info of this node. Re-registering.
>
> 2016-05-12T11:50:50.670-07:00 WARN  [NodePingThread] Did not find meta 
> info of this node. Re-registering.
>
>
> In addition to the log entries above, I see occasional timeouts and errors 
> in the web UI about master nodes no longer being available, or the web-UI 
> just disappears for a few seconds and comes back.. I've also seen nodes 
> drop in/out of the webUI.. I'm assuming these are related.
>
>
> Doing some basic Google searches on this, the only explanation I've found 
> for these log entries is that the nodes' clocks may be out of sync. I've 
> checked this, and that's not the case here. All three nodes are running NTP 
> and chiming off the local ntp server on the network:
>
>
> [root@gray00 /data]# ntpdate -q ntp0
>
> server 10.201.136.38, stratum 3, offset -0.000653, delay 0.02576
>
> 12 May 12:29:33 ntpdate[317]: adjust time server 10.201.136.38 offset 
> -0.000653 sec
>
> [root@gray00 /data]# date
>
> Thu May 12 12:30:31 PDT 2016
>
>
> [root@gray01 graylog]# ntpdate -q ntp0
>
> server 10.201.136.38, stratum 3, offset -0.000568, delay 0.02576
>
> 12 May 12:29:22 ntpdate[31508]: adjust time server 10.201.136.38 offset 
> -0.000568 sec
>
> [root@gray01 graylog]# date
>
> Thu May 12 12:30:31 PDT 2016
>
>
> [root@gray02 /data]# ntpdate -q ntp0
>
> server 10.201.136.38, stratum 3, offset -0.55, delay 0.02580
>
> 12 May 12:29:21 ntpdate[535]: adjust time server 10.201.136.38 offset 
> -0.55 sec
>
> [root@gray02 /data]# date
>
> Thu May 12 12:30:32 PDT 2016
>
>
> So what am I doing wrong here? Is there some additional troubleshooting I 
> can perform to try and pinpoint the issue? Strangely, everything is fine 
> for about 5-10 minutes after I restart the graylog instances; then these 
> log entries start popping back up.
>
>
> Here's some deets on how I have things configured:
>
>
> 3x nodes - RHEL6 x64 (gray00, gray01, gray02). Installation via the repos 
> for mongo, elasticsearch, and graylog.
>
>
> all three nodes run:
>
>    elasticsearch
>
>    mongo
>
>    graylog
>
>
> In front is an F5 LTM; the Virtual IP on the F5 is known as "graylog". 
> It services ports 9000 and 12900. Sticky sessions are enabled on both.
>
>
> Configuration data for graylog below. All nodes have the same core config 
> except for "is_master=false" and IP address changes:
>
> is_master = true
>
> node_id_file = /etc/graylog/server/node-id
>
> password_secret = 
> WQBdx6xgWTTykN9LHJhEGxfiSJbeYdaZhHhKEwbvAKQEWkVrl8lgTLvDDkfUtwhe7jgdFDFCBqpmVvY4aea1GyrbQ791UOCv
>
> root_password_sha2 = 
> e3ed009797ada49a3fd

Re: [graylog2] Re: 'did not find meta info for this node' error, but not timesync related?

2016-05-17 Thread Jeff McCombs
OK, so I'm back to the original issue.

As soon as there are multiple nodes in the configuration, I start to see 
"Did not find meta info of this node. Re-registering." in the logs. I see 
this for all 3 nodes. I've checked the replica set configuration in Mongo 
(looks good), and dropped down to a single Mongo node (no change).. the 
mongo collection has the hosts in there:

graylog:PRIMARY> db.nodes.find()
{ "_id" : ObjectId("573b620b89479f0ccf046c34"), "is_master" : false, 
"hostname" : "gray02.somewhere.com", "last_seen" : 1463512536, 
"transport_address" : "http://10.201.137.210:12900/;, "type" : "SERVER", 
"node_id" : "8536ee95-b9c7-4553-9022-d997da315755" }
{ "_id" : ObjectId("573b6dbf05ee161654fa2122"), "is_master" : true, 
"hostname" : "gray01.somewhere.com", "last_seen" : 1463512536, 
"transport_address" : "http://10.201.137.209:12900/;, "type" : "SERVER", 
"node_id" : "00ac0ad1-b96f-46c0-a2bc-bc9e7a90777f" }
{ "_id" : ObjectId("573b6dbfb2d64909218a56b8"), "is_master" : false, 
"hostname" : "gray00.somewhere.com", "last_seen" : 1463512537, 
"transport_address" : "http://10.201.137.208:12900/;, "type" : "SERVER", 
"node_id" : "3116ac6b-604f-4436-955c-1458cb489415" }

last_seen is getting updated.. times are in-sync on all 3 nodes (as well as 
the F5 and any web clients).. REST calls are configured to be sticky, as 
are the webUI calls.. what am I missing?
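Since last_seen is what the debugging in this thread keys off, a quick staleness check over the values above can rule it in or out. A small sketch (the 10-second threshold is hypothetical, not Graylog's actual cutoff):

```python
# last_seen values (epoch seconds) from the db.nodes output above
last_seen = {
    "gray00.somewhere.com": 1463512537,
    "gray01.somewhere.com": 1463512536,
    "gray02.somewhere.com": 1463512536,
}

def stale_nodes(last_seen, now, max_age_seconds=10):
    """Hosts whose registration is older than max_age_seconds at time `now`.
    The threshold here is illustrative only."""
    return sorted(host for host, seen in last_seen.items()
                  if now - seen > max_age_seconds)

# A snapshot taken ~30 seconds later would make every node look stale:
print(stale_nodes(last_seen, now=1463512566))
```

If all nodes turn up stale even while they're running, the registrations are being written but not refreshed fast enough, which points away from Mongo and toward the nodes themselves.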



On Tuesday, May 17, 2016 at 11:04:22 AM UTC-7, Jeff McCombs wrote:
>
> You can't tell, but I'm blushing right now.
>
> Thanks Jochen. :) 
>
>
> On Tuesday, May 17, 2016 at 10:13:36 AM UTC-7, Jochen Schalanda wrote:
>>
>> Hi Jeff,
>>
>> you're probably looking for the web_endpoint_uri configuration setting 
>> (see 
>> http://docs.graylog.org/en/2.0/pages/configuring_webif.html#configuration-options).
>>  
>> The rest_listen_uri (or rest_transport_uri) should always be an address 
>> which the Graylog nodes in a given cluster can access.
>>
>> Cheers,
>> Jochen
>>
>> On Tuesday, 17 May 2016 17:49:54 UTC+2, Jeff McCombs wrote:
>>>
>>> Hi Jochen,
>>>
>>>   Yes, that's actually intentional. Though it could just be that I'm 
>>> misunderstanding the option.. 
>>>
>>>   Consider the scenario below:
>>>
>>>         +-------------------+
>>>         |       User        |
>>>         |  (192.168.1.200)  |
>>>         +-------------------+
>>>                   |
>>>                   | graylog.somewhere.com
>>>                   | (192.168.1.100)
>>>         +-------------------+
>>>         |        F5         |
>>>         +-------------------+
>>>            |      |      |
>>> +-------------+ +-------------+ +-------------+
>>> |   gray00    | |   gray01    | |   gray02    |
>>> | 10.201.5.1  | | 10.201.5.2  | | 10.201.5.3  |
>>> +-------------+ +-------------+ +-------------+
>>>
>>> (hope that shows up OK, if not, convert it to fixed width font)
>>>
>>>  When the JavaScript running in the browser for a WebUI call needs to 
>>> reach the individual nodes via the REST interface, the only way for that 
>>> call to happen is to go through the F5 and be load balanced.. right? But if 
>>> the nodes need to communicate with one anoth

Re: [graylog2] Re: 'did not find meta info for this node' error, but not timesync related?

2016-05-17 Thread Jeff McCombs
You can't tell, but I'm blushing right now.

Thanks Jochen. :) 


On Tuesday, May 17, 2016 at 10:13:36 AM UTC-7, Jochen Schalanda wrote:
>
> Hi Jeff,
>
> you're probably looking for the web_endpoint_uri configuration setting 
> (see 
> http://docs.graylog.org/en/2.0/pages/configuring_webif.html#configuration-options).
>  
> The rest_listen_uri (or rest_transport_uri) should always be an address 
> which the Graylog nodes in a given cluster can access.
>
> Cheers,
> Jochen
>
> On Tuesday, 17 May 2016 17:49:54 UTC+2, Jeff McCombs wrote:
>>
>> Hi Jochen,
>>
>>   Yes, that's actually intentional. Though it could just be that I'm 
>> misunderstanding the option.. 
>>
>>   Consider the scenario below:
>>
>>         +-------------------+
>>         |       User        |
>>         |  (192.168.1.200)  |
>>         +-------------------+
>>                   |
>>                   | graylog.somewhere.com
>>                   | (192.168.1.100)
>>         +-------------------+
>>         |        F5         |
>>         +-------------------+
>>            |      |      |
>> +-------------+ +-------------+ +-------------+
>> |   gray00    | |   gray01    | |   gray02    |
>> | 10.201.5.1  | | 10.201.5.2  | | 10.201.5.3  |
>> +-------------+ +-------------+ +-------------+
>>
>> (hope that shows up OK, if not, convert it to fixed width font)
>>
>>  When the JavaScript running in the browser for a WebUI call needs to 
>> reach the individual nodes via the REST interface, the only way for that 
>> call to happen is to go through the F5 and be load balanced.. right? But if 
>> the nodes need to communicate with one another via the REST interface as 
>> well..then yeah I could see why the nodes would be complaining. They try 
>> and reach the 192 address, the request gets balanced, and winds up on the 
>> wrong node... 
>>
>> So am I just misunderstanding the REST transport URI option? WITHOUT 
>> setting that configuration to the same address, the WebUI doesn't function 
>> properly because there's no direct communication between the individual 
>> graylog nodes, and the end user.
>>
>> Is there a WebUI/REST URI option somewhere I just don't know about?
>>
>>
>> On Tue, May 17, 2016 at 5:07 AM, Jochen Schalanda wrote:
>>
>>> Hi Jeff,
>>>
>>> you're using the same transport address for the Graylog REST API on all 
>>> 3 Graylog nodes. Is this intentional? I'm asking because that won't work in 
>>> the long run as Graylog nodes need to be able to communicate with each 
>>> other via the Graylog REST API and the announced transport address.
>>>
>>> Cheers,
>>> Jochen
>>>
>>>
>>> On Friday, 13 May 2016 22:53:07 UTC+2, Jeff McCombs wrote:
>>>>
>>>> So here's a question.. looking at the node output from tokred vs mine..
>>>>
>>>> When you have a cluster of Graylog servers behind a load balancer.. do 
>>>> you configure the API transport address to the cluster IP, or the 
>>>> individual nodes? Could this be the cause of the following errors I'm also 
>>>> seeing?
>>>>
>>>> 2016-05-13T13:43:27.749-07:00 WARN  [ProxiedResource] Node 
>>>> <3116ac6b-604f-4436-955c-1458cb489415> not found while trying to call 
>>>> org.graylog2.shared.rest.resources.system.RemoteMetricsResource on it.
>>>> 2016-05-13T13:46:58.766-07:00 WARN  [ProxiedResource] Node 
>>>> <3116ac6b-604f-4436-955c-1458cb489415> not found while

Re: [graylog2] Re: 'did not find meta info for this node' error, but not timesync related?

2016-05-17 Thread Jeff McCombs
Hi Jochen,

  Yes, that's actually intentional. Though it could just be that I'm
misunderstanding the option..

  Consider the scenario below:

        +-------------------+
        |       User        |
        |  (192.168.1.200)  |
        +-------------------+
                  |
                  | graylog.somewhere.com
                  | (192.168.1.100)
        +-------------------+
        |        F5         |
        +-------------------+
           |      |      |
+-------------+ +-------------+ +-------------+
|   gray00    | |   gray01    | |   gray02    |
| 10.201.5.1  | | 10.201.5.2  | | 10.201.5.3  |
+-------------+ +-------------+ +-------------+

(hope that shows up OK, if not, convert it to fixed width font)

 When the JavaScript running in the browser for a WebUI call needs to reach
the individual nodes via the REST interface, the only way for that call to
happen is to go through the F5 and be load balanced.. right? But if the
nodes need to communicate with one another via the REST interface as
well..then yeah I could see why the nodes would be complaining. They try
and reach the 192 address, the request gets balanced, and winds up on the
wrong node...
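That mis-routing theory can be sketched as a toy simulation (hypothetical names; this is not Graylog code, just a model of a round-robin balancer in front of nodes that each only answer proxied REST calls for their own node_id):

```python
from itertools import cycle

# Three nodes behind a round-robin load balancer; each node can only
# serve a proxied call addressed to its own node_id.
nodes = ["gray00-id", "gray01-id", "gray02-id"]

def call_via_lb(balancer, target_node_id):
    """Route one proxied call through the balancer; it succeeds only if it
    happens to land on the node that owns target_node_id."""
    handled_by = next(balancer)
    return handled_by == target_node_id

balancer = cycle(nodes)
results = [call_via_lb(balancer, "gray01-id") for _ in range(6)]
print(results)  # [False, True, False, False, True, False] -- most calls miss
```

Two of every three proxied calls land on the wrong node, which lines up with the intermittent "Node not found" warnings rather than a hard failure.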

So am I just misunderstanding the REST transport URI option? WITHOUT
setting that configuration to the same address, the WebUI doesn't function
properly because there's no direct communication between the individual
graylog nodes, and the end user.

Is there a WebUI/REST URI option somewhere I just don't know about?
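Jochen's web_endpoint_uri suggestion elsewhere in the thread is the split being asked about here. A per-node sketch (addresses illustrative, assuming Graylog 2.0's configuration options):

```
# gray00's /etc/graylog/server/server.conf -- sketch; repeat per node with its own IP
rest_listen_uri    = http://10.201.5.1:12900/
rest_transport_uri = http://10.201.5.1:12900/   # the node's own address, NOT the F5 VIP
web_listen_uri     = http://10.201.5.1:9000/
# Browsers still reach the REST API through the load balancer:
web_endpoint_uri   = http://graylog.somewhere.com:12900/
```

With this split, node-to-node REST traffic goes direct while browser traffic keeps flowing through the F5.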


On Tue, May 17, 2016 at 5:07 AM, Jochen Schalanda <joc...@graylog.com>
wrote:

> Hi Jeff,
>
> you're using the same transport address for the Graylog REST API on all 3
> Graylog nodes. Is this intentional? I'm asking because that won't work in
> the long run as Graylog nodes need to be able to communicate with each
> other via the Graylog REST API and the announced transport address.
>
> Cheers,
> Jochen
>
>
> On Friday, 13 May 2016 22:53:07 UTC+2, Jeff McCombs wrote:
>>
>> So here's a question.. looking at the node output from tokred vs mine..
>>
>> When you have a cluster of Graylog servers behind a load balancer.. do
>> you configure the API transport address to the cluster IP, or the
>> individual nodes? Could this be the cause of the following errors I'm also
>> seeing?
>>
>> 2016-05-13T13:43:27.749-07:00 WARN  [ProxiedResource] Node
>> <3116ac6b-604f-4436-955c-1458cb489415> not found while trying to call
>> org.graylog2.shared.rest.resources.system.RemoteMetricsResource on it.
>> 2016-05-13T13:46:58.766-07:00 WARN  [ProxiedResource] Node
>> <3116ac6b-604f-4436-955c-1458cb489415> not found while trying to call
>> org.graylog2.shared.rest.resources.system.RemoteMetricsResource on it.
>> 2016-05-13T13:49:14.735-07:00 WARN  [ProxiedResource] Node
>> <3116ac6b-604f-4436-955c-1458cb489415> not found while trying to call
>> org.graylog2.shared.rest.resources.system.RemoteMetricsResource on it.
>>
>> On Friday, May 13, 2016 at 1:50:17 PM UTC-7, Jeff McCombs wrote:
>>>
>>> Hi Jochen,
>>>
>>>   I see the records for the nodes:
>>>
>>> graylog:PRIMARY> db.nodes.find()
>>> { "_id" : ObjectId("57363bab05ee16689e192953"), "is_master" : false,
>>> "hostname" : "gray01somewhere.com", "last_seen" : 1463172221,
>>> "transport_address" : "http://graylog.somewhere.com:12900/", "type" :
>>> "SERVER", "node_id" : "00ac0ad1-b96f-46c0-a2bc-bc9e7a90777f" }
>>> { "_id" : ObjectId("57363c0889479f6906e17de9"), "is_master" : false,
>>> "hostname" : "gray02.somewhere.com", "last_seen" : 1463172221,
>>> "transport_address" : "http://graylog.somewhere.com:12900/", "t

[graylog2] Re: 'did not find meta info for this node' error, but not timesync related?

2016-05-13 Thread Jeff McCombs
So here's a question.. looking at the node output from tokred vs mine..

When you have a cluster of Graylog servers behind a load balancer.. do you 
configure the API transport address to the cluster IP, or the individual 
nodes? Could this be the cause of the following errors I'm also seeing?

2016-05-13T13:43:27.749-07:00 WARN  [ProxiedResource] Node 
<3116ac6b-604f-4436-955c-1458cb489415> not found while trying to call 
org.graylog2.shared.rest.resources.system.RemoteMetricsResource on it.
2016-05-13T13:46:58.766-07:00 WARN  [ProxiedResource] Node 
<3116ac6b-604f-4436-955c-1458cb489415> not found while trying to call 
org.graylog2.shared.rest.resources.system.RemoteMetricsResource on it.
2016-05-13T13:49:14.735-07:00 WARN  [ProxiedResource] Node 
<3116ac6b-604f-4436-955c-1458cb489415> not found while trying to call 
org.graylog2.shared.rest.resources.system.RemoteMetricsResource on it.

On Friday, May 13, 2016 at 1:50:17 PM UTC-7, Jeff McCombs wrote:
>
> Hi Jochen,
>
>   I see the records for the nodes:
>
> graylog:PRIMARY> db.nodes.find()
> { "_id" : ObjectId("57363bab05ee16689e192953"), "is_master" : false, 
> "hostname" : "gray01somewhere.com", "last_seen" : 1463172221, 
> "transport_address" : "http://graylog.somewhere.com:12900/;, "type" : 
> "SERVER", "node_id" : "00ac0ad1-b96f-46c0-a2bc-bc9e7a90777f" }
> { "_id" : ObjectId("57363c0889479f6906e17de9"), "is_master" : false, 
> "hostname" : "gray02.somewhere.com", "last_seen" : 1463172221, 
> "transport_address" : "http://graylog.somewhere.com:12900/;, "type" : 
> "SERVER", "node_id" : "8536ee95-b9c7-4553-9022-d997da315755" }
> { "_id" : ObjectId("57363c79b2d6491223d87222"), "is_master" : true, 
> "hostname" : "gray00.somewhere.com", "last_seen" : 1463172220, 
> "transport_address" : "http://graylog.somewhere.com:12900/;, "type" : 
> "SERVER", "node_id" : "3116ac6b-604f-4436-955c-1458cb489415" }
>
> Interestingly, when I shut down all but the master, it continues to spit 
> errors:
>
> /var/log/graylog-server/server.log
> 2016-05-13T13:47:06.662-07:00 WARN  [NodePingThread] Did not find meta 
> info of this node. Re-registering.
> 2016-05-13T13:47:32.639-07:00 WARN  [NodePingThread] Did not find meta 
> info of this node. Re-registering.
>
> mongo nodes query:
> graylog:PRIMARY> db.nodes.find()
> { "_id" : ObjectId("57363d64b2d6491223d87339"), "is_master" : true, 
> "hostname" : "gray00.somewhere.com", "last_seen" : 1463172464, 
> "transport_address" : "http://graylog.somewhere.com:12900/;, "type" : 
> "SERVER", "node_id" : "3116ac6b-604f-4436-955c-1458cb489415" }
>
> Thoughts?
>
> On Friday, May 13, 2016 at 2:16:17 AM UTC-7, Jochen Schalanda wrote:
>>
>> Hi Jeff,
>>
>> please check the "nodes" collection in MongoDB and that it contains valid 
>> node descriptors while Graylog is running.
>>
>> Cheers,
>> Jochen
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Graylog Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to graylog2+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/graylog2/e45ab56c-8690-42bf-992d-74e229d754f7%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[graylog2] Re: 'did not find meta info for this node' error, but not timesync related?

2016-05-13 Thread Jeff McCombs
Hi Jochen,

  I see the records for the nodes:

graylog:PRIMARY> db.nodes.find()
{ "_id" : ObjectId("57363bab05ee16689e192953"), "is_master" : false, 
"hostname" : "gray01somewhere.com", "last_seen" : 1463172221, 
"transport_address" : "http://graylog.somewhere.com:12900/;, "type" : 
"SERVER", "node_id" : "00ac0ad1-b96f-46c0-a2bc-bc9e7a90777f" }
{ "_id" : ObjectId("57363c0889479f6906e17de9"), "is_master" : false, 
"hostname" : "gray02.somewhere.com", "last_seen" : 1463172221, 
"transport_address" : "http://graylog.somewhere.com:12900/;, "type" : 
"SERVER", "node_id" : "8536ee95-b9c7-4553-9022-d997da315755" }
{ "_id" : ObjectId("57363c79b2d6491223d87222"), "is_master" : true, 
"hostname" : "gray00.somewhere.com", "last_seen" : 1463172220, 
"transport_address" : "http://graylog.somewhere.com:12900/;, "type" : 
"SERVER", "node_id" : "3116ac6b-604f-4436-955c-1458cb489415" }

Interestingly, when I shut down all but the master, it continues to spit 
errors:

/var/log/graylog-server/server.log
2016-05-13T13:47:06.662-07:00 WARN  [NodePingThread] Did not find meta info 
of this node. Re-registering.
2016-05-13T13:47:32.639-07:00 WARN  [NodePingThread] Did not find meta info 
of this node. Re-registering.

mongo nodes query:
graylog:PRIMARY> db.nodes.find()
{ "_id" : ObjectId("57363d64b2d6491223d87339"), "is_master" : true, 
"hostname" : "gray00.somewhere.com", "last_seen" : 1463172464, 
"transport_address" : "http://graylog.somewhere.com:12900/;, "type" : 
"SERVER", "node_id" : "3116ac6b-604f-4436-955c-1458cb489415" }

Thoughts?
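The tell in the db.nodes output above is that all three node_ids announce the same transport_address. A quick sanity check (hypothetical helper, using the values quoted above):

```python
from collections import Counter

# node_id -> announced transport_address, copied from the db.nodes output above
registered = {
    "00ac0ad1-b96f-46c0-a2bc-bc9e7a90777f": "http://graylog.somewhere.com:12900/",
    "8536ee95-b9c7-4553-9022-d997da315755": "http://graylog.somewhere.com:12900/",
    "3116ac6b-604f-4436-955c-1458cb489415": "http://graylog.somewhere.com:12900/",
}

def shared_transport_addresses(registered):
    """Transport addresses announced by more than one node -- these force
    node-to-node REST calls through the load balancer."""
    counts = Counter(registered.values())
    return {addr for addr, n in counts.items() if n > 1}

print(shared_transport_addresses(registered))
```

If this returns anything, inter-node REST calls are being load balanced, which is exactly what Jochen flags next in the thread.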

On Friday, May 13, 2016 at 2:16:17 AM UTC-7, Jochen Schalanda wrote:
>
> Hi Jeff,
>
> please check the "nodes" collection in MongoDB and that it contains valid 
> node descriptors while Graylog is running.
>
> Cheers,
> Jochen
>



[graylog2] 3-node cluster, strangeness "did not find meta info" and webUI oddities

2016-05-13 Thread Jeff McCombs
Hi gang,

  I'm running into a strange problem where my graylog nodes are complaining 
about not being able to find their meta info:

2016-05-12T11:50:09.691-07:00 WARN  [NodePingThread] Did not find meta info 
of this node. Re-registering.
2016-05-12T11:50:12.878-07:00 WARN  [NodePingThread] Did not find meta info 
of this node. Re-registering.
2016-05-12T11:50:13.417-07:00 WARN  [ProxiedResource] Node 
<00ac0ad1-b96f-46c0-a2bc-bc9e7a90777f> not found while trying to call 
org.graylog2.shared.rest.resources.system.RemoteMetricsResource on it.
2016-05-12T11:50:15.808-07:00 WARN  [NodePingThread] Did not find meta info 
of this node. Re-registering.
2016-05-12T11:50:19.175-07:00 WARN  [NodePingThread] Did not find meta info 
of this node. Re-registering.
2016-05-12T11:50:24.767-07:00 WARN  [NodePingThread] Did not find meta info 
of this node. Re-registering.
2016-05-12T11:50:28.020-07:00 WARN  [NodePingThread] Did not find meta info 
of this node. Re-registering.
2016-05-12T11:50:37.849-07:00 WARN  [NodePingThread] Did not find meta info 
of this node. Re-registering.
2016-05-12T11:50:40.978-07:00 WARN  [NodePingThread] Did not find meta info 
of this node. Re-registering.
2016-05-12T11:50:41.904-07:00 WARN  [ProxiedResource] Node 
<00ac0ad1-b96f-46c0-a2bc-bc9e7a90777f> not found while trying to call 
org.graylog2.shared.rest.resources.system.RemoteMetricsResource on it.
2016-05-12T11:50:47.400-07:00 WARN  [NodePingThread] Did not find meta info 
of this node. Re-registering.
2016-05-12T11:50:50.670-07:00 WARN  [NodePingThread] Did not find meta info 
of this node. Re-registering.

In addition to the log entries above, I see occasional timeouts and errors 
about master nodes no longer being available, or the web-UI just disappears 
for a few seconds and comes back.

Doing some basic Google searches on this, the only explanation I've found 
for these log entries is that the nodes' clocks may be out of sync. I've 
checked this, and that's not the case here. All three nodes are running NTP 
and chiming off the local ntp server on the network:

[root@gray00 /data]# ntpdate -q ntp0
server 10.201.136.38, stratum 3, offset -0.000653, delay 0.02576
12 May 12:29:33 ntpdate[317]: adjust time server 10.201.136.38 offset 
-0.000653 sec
[root@gray00 /data]# date
Thu May 12 12:30:31 PDT 2016

[root@gray01 graylog]# ntpdate -q ntp0
server 10.201.136.38, stratum 3, offset -0.000568, delay 0.02576
12 May 12:29:22 ntpdate[31508]: adjust time server 10.201.136.38 offset 
-0.000568 sec
[root@gray01 graylog]# date
Thu May 12 12:30:31 PDT 2016

[root@gray02 /data]# ntpdate -q ntp0
server 10.201.136.38, stratum 3, offset -0.55, delay 0.02580
12 May 12:29:21 ntpdate[535]: adjust time server 10.201.136.38 offset 
-0.55 sec
[root@gray02 /data]# date
Thu May 12 12:30:32 PDT 2016

So what am I doing wrong here? Is there some additional troubleshooting I 
can perform to try and pinpoint the issue? Strangely, everything is fine 
for about 5-10 minutes after I restart the graylog instances; then these 
log entries start popping back up.

Here's some deets on how I have things configured:

3x nodes - RHEL6 x64 (gray00, gray01, gray02)

all three nodes run:
   elasticsearch
   mongo
   graylog

In front is an F5 LTM; the VIP is known as "graylog". It services ports 
9000 and 12900. Sticky sessions are enabled on both.

Configuration data for graylog below. All nodes have the same core config 
except for "is_master=false" and IP address changes:
is_master = true
node_id_file = /etc/graylog/server/node-id
password_secret = 
WQBdx6xgWTTykN9LHJhEGxfiSJbeYdaZhHhKEwbvAKQEWkVrl8lgTLvDDkfUtwhe7jgdFDFCBqpmVvY4aea1GyrbQ791UOCv
root_password_sha2 = 
e3ed009797ada49a3fd38a04069b13d5a7f62001a153ed4d9a3da22fa7a75c7b
plugin_dir = /usr/share/graylog-server/plugin
rest_listen_uri = http://10.201.137.208:12900/
rest_transport_uri = http://graylog.somewhere.com:12900/
rest_enable_gzip = true
web_listen_uri = http://10.201.137.208:9000/
web_enable_gzip = true
rotation_strategy = count
elasticsearch_max_docs_per_index = 2000
elasticsearch_max_number_of_indices = 20
retention_strategy = delete
elasticsearch_shards = 4
elasticsearch_replicas = 1
elasticsearch_index_prefix = graylog
allow_leading_wildcard_searches = false
allow_highlighting = false
elasticsearch_cluster_name = graylog
elasticsearch_node_name_prefix = graylog-
elasticsearch_discovery_zen_ping_unicast_hosts = gray00.somewhere.com:9300, 
gray01.somewhere.com:9300, gray02.somewhere.com:9300, 
gray00.somewhere.com:9350, gray01.somewhere.com:9350, 
gray02.somewhere.com:9350
elasticsearch_transport_tcp_port = 9350
elasticsearch_discovery_zen_ping_multicast_enabled = false
elasticsearch_network_host = gray00.somewhere.com
elasticsearch_network_bind_host = gray00.somewhere.com
elasticsearch_network_publish_host = gray00.somewhere.com
elasticsearch_analyzer = standard
output_batch_size = 500
output_flush_interval = 1
output_fault_count_threshold = 5
output_fault_penalty_seconds = 30
processbuffer_processors = 

[graylog2] "did not find meta info of this node." error

2016-05-13 Thread Jeff McCombs
I'd be grateful if anyone could help point me in the right direction for 
this... Here's the issue:

I've got a 3-node setup sitting behind an F5 as a POC... 

After about 5-10 minutes of all three nodes up and running, I start to see 
occasional "blips" in the web UI and the following entries in the logs of 
all three systems:

2016-05-12T10:25:24.514-07:00 WARN  [ProxiedResource] Node 
<3116ac6b-604f-4436-955c-1458cb489415> not found while trying to call 
org.graylog2.rest.resources.system.jobs.RemoteSystemJobResource on it.
2016-05-12T10:25:27.603-07:00 WARN  [NodePingThread] Did not find meta info 
of this node. Re-registering.
2016-05-12T10:25:30.910-07:00 WARN  [NodePingThread] Did not find meta info 
of this node. Re-registering.
2016-05-12T10:25:37.556-07:00 WARN  [NodePingThread] Did not find meta info 
of this node. Re-registering.
2016-05-12T10:25:40.808-07:00 WARN  [NodePingThread] Did not find meta info 
of this node. Re-registering.
2016-05-12T10:25:44.663-07:00 WARN  [NodePingThread] Did not find meta info 
of this node. Re-registering.
2016-05-12T10:25:47.922-07:00 WARN  [NodePingThread] Did not find meta info 
of this node. Re-registering.
2016-05-12T10:25:57.609-07:00 WARN  [NodePingThread] Did not find meta info 
of this node. Re-registering.
2016-05-12T10:26:00.731-07:00 WARN  [NodePingThread] Did not find meta info 
of this node. Re-registering.
2016-05-12T10:26:04.334-07:00 WARN  [NodePingThread] Did not find meta info 
of this node. Re-registering.
2016-05-12T10:26:07.561-07:00 WARN  [NodePingThread] Did not find meta info 
of this node. Re-registering.
2016-05-12T10:26:10.761-07:00 WARN  [NodePingThread] Did not find meta info 
of this node. Re-registering.
2016-05-12T10:26:26.575-07:00 WARN  [NodePingThread] Did not find meta info 
of this node. Re-registering.
2016-05-12T10:26:30.348-07:00 WARN  [NodePingThread] Did not find meta info 
of this node. Re-registering.
2016-05-12T10:26:33.645-07:00 WARN  [NodePingThread] Did not find meta info 
of this node. Re-registering.
2016-05-12T10:26:37.474-07:00 WARN  [NodePingThread] Did not find meta info 
of this node. Re-registering.
2016-05-12T10:26:40.659-07:00 WARN  [NodePingThread] Did not find meta info 
of this node. Re-registering.
2016-05-12T10:26:43.920-07:00 WARN  [NodePingThread] Did not find meta info 
of this node. Re-registering.
2016-05-12T10:26:50.739-07:00 WARN  [ProxiedResource] Node 
<3116ac6b-604f-4436-955c-1458cb489415> not found while trying to call 
org.graylog2.shared.rest.resources.system.RemoteMetricsResource on it.

I did some googling, and the only thing I'm finding is that the nodes may 
be out of sync time-wise. I checked the system times, and those all look 
correct:

root@gray00 /data]# ntpdate -q ntp0
server 10.201.136.38, stratum 3, offset -0.003738, delay 0.02574
12 May 10:31:40 ntpdate[30786]: adjust time server 10.201.136.38 offset 
-0.003738 sec

[root@gray01 graylog]# ntpdate -q ntp0
server 10.201.136.38, stratum 3, offset -0.000990, delay 0.02576
12 May 10:31:41 ntpdate[28966]: adjust time server 10.201.136.38 offset 
-0.000990 sec

[root@gray02 /data]# ntpdate -q ntp0
server 10.201.136.38, stratum 3, offset 0.001198, delay 0.02582
12 May 10:31:41 ntpdate[30441]: adjust time server 10.201.136.38 offset 
0.001198 sec

Anyone have any ideas?

Here's my setup:

3 nodes: gray00, gray01, gray02 -
each node runs one instance of 
  elastic search, standard ports.
  mongodb, standard ports
  graylog-server - elastic search ports are on 9350.

The F5 provides a frontend as "graylog.somewhere.com", with ports 12900 & 
9000 load balanced across the 3 nodes. Sticky sessions are enabled on 
both... so what am I missing here?

Graylog configuration settings (same for all 3 nodes, just the listen IPs 
are different and gray01/gray02 'is_master = false')

is_master = true
node_id_file = /etc/graylog/server/node-id
password_secret = 
WQBdx6xgWTTykN9LHJhEGxfiSJbeYdaZhHhKEwbvAKQEWkVrl8lgTLvDDkfUtwhe7jgdFDFCBqpmVvY4aea1GyrbQ791UOCv
root_password_sha2 = 
e3ed009797ada49a3fd38a04069b13d5a7f62001a153ed4d9a3da22fa7a75c7b
plugin_dir = /usr/share/graylog-server/plugin
rest_listen_uri = http://10.201.137.208:12900/
rest_transport_uri = http://graylog.somewhere.com:12900/
rest_enable_gzip = true
web_listen_uri = http://10.201.137.208:9000/
web_enable_gzip = true
rotation_strategy = count
elasticsearch_max_docs_per_index = 2000
elasticsearch_max_number_of_indices = 20
retention_strategy = delete
elasticsearch_shards = 4
elasticsearch_replicas = 1
elasticsearch_index_prefix = graylog
allow_leading_wildcard_searches = false
allow_highlighting = false
elasticsearch_cluster_name = graylog
elasticsearch_node_name_prefix = graylog-
elasticsearch_discovery_zen_ping_unicast_hosts = gray00:9300, gray01:9300, 
gray02:9300, gray00:9350, gray01:9350, gray02:9350
elasticsearch_transport_tcp_port = 9350
elasticsearch_discovery_zen_ping_multicast_enabled = false
elasticsearch_network_host = gray00
elasticsearch_network_bind_host = gray00

[graylog2] 'did not find meta info for this node' error, but not timesync related?

2016-05-12 Thread Jeff McCombs


Hi gang,


  I'm running into a strange problem where my graylog nodes are complaining 
about not being able to find their meta info:


2016-05-12T11:50:09.691-07:00 WARN  [NodePingThread] Did not find meta info 
of this node. Re-registering.

2016-05-12T11:50:12.878-07:00 WARN  [NodePingThread] Did not find meta info 
of this node. Re-registering.

2016-05-12T11:50:13.417-07:00 WARN  [ProxiedResource] Node 
<00ac0ad1-b96f-46c0-a2bc-bc9e7a90777f> not found while trying to call 
org.graylog2.shared.rest.resources.system.RemoteMetricsResource on it.

2016-05-12T11:50:15.808-07:00 WARN  [NodePingThread] Did not find meta info 
of this node. Re-registering.

2016-05-12T11:50:19.175-07:00 WARN  [NodePingThread] Did not find meta info 
of this node. Re-registering.

2016-05-12T11:50:24.767-07:00 WARN  [NodePingThread] Did not find meta info 
of this node. Re-registering.

2016-05-12T11:50:28.020-07:00 WARN  [NodePingThread] Did not find meta info 
of this node. Re-registering.

2016-05-12T11:50:37.849-07:00 WARN  [NodePingThread] Did not find meta info 
of this node. Re-registering.

2016-05-12T11:50:40.978-07:00 WARN  [NodePingThread] Did not find meta info 
of this node. Re-registering.

2016-05-12T11:50:41.904-07:00 WARN  [ProxiedResource] Node 
<00ac0ad1-b96f-46c0-a2bc-bc9e7a90777f> not found while trying to call 
org.graylog2.shared.rest.resources.system.RemoteMetricsResource on it.

2016-05-12T11:50:47.400-07:00 WARN  [NodePingThread] Did not find meta info 
of this node. Re-registering.

2016-05-12T11:50:50.670-07:00 WARN  [NodePingThread] Did not find meta info 
of this node. Re-registering.


In addition to the log entries above, I see occasional timeouts and errors 
in the web UI about master nodes no longer being available, or the web-UI 
just disappears for a few seconds and comes back.. I've also seen nodes 
drop in/out of the webUI.. I'm assuming these are related.


Doing some basic Google searches on this, the only explanation I've found 
for these log entries is that the nodes' clocks may be out of sync. I've 
checked this, and that's not the case here. All three nodes are running NTP 
and chiming off the local ntp server on the network:


[root@gray00 /data]# ntpdate -q ntp0

server 10.201.136.38, stratum 3, offset -0.000653, delay 0.02576

12 May 12:29:33 ntpdate[317]: adjust time server 10.201.136.38 offset 
-0.000653 sec

[root@gray00 /data]# date

Thu May 12 12:30:31 PDT 2016


[root@gray01 graylog]# ntpdate -q ntp0

server 10.201.136.38, stratum 3, offset -0.000568, delay 0.02576

12 May 12:29:22 ntpdate[31508]: adjust time server 10.201.136.38 offset 
-0.000568 sec

[root@gray01 graylog]# date

Thu May 12 12:30:31 PDT 2016


[root@gray02 /data]# ntpdate -q ntp0

server 10.201.136.38, stratum 3, offset -0.55, delay 0.02580

12 May 12:29:21 ntpdate[535]: adjust time server 10.201.136.38 offset 
-0.55 sec

[root@gray02 /data]# date

Thu May 12 12:30:32 PDT 2016


So what am I doing wrong here? Is there some additional troubleshooting I 
can perform to try and pinpoint the issue? Strangely, everything is fine 
for about 5-10 minutes after I restart the graylog instances; then these 
log entries start popping back up.


Here's some deets on how I have things configured:


3x nodes - RHEL6 x64 (gray00, gray01, gray02). Installation via the repos 
for mongo, elasticsearch, and graylog.


all three nodes run:

   elasticsearch

   mongo

   graylog


In front is an F5 LTM; the Virtual IP on the F5 is known as "graylog". It 
services ports 9000 and 12900. Sticky sessions are enabled on both.


Configuration data for graylog below. All nodes have the same core config 
except for "is_master=false" and IP address changes:

is_master = true

node_id_file = /etc/graylog/server/node-id

password_secret = 
WQBdx6xgWTTykN9LHJhEGxfiSJbeYdaZhHhKEwbvAKQEWkVrl8lgTLvDDkfUtwhe7jgdFDFCBqpmVvY4aea1GyrbQ791UOCv

root_password_sha2 = 
e3ed009797ada49a3fd38a04069b13d5a7f62001a153ed4d9a3da22fa7a75c7b

plugin_dir = /usr/share/graylog-server/plugin

rest_listen_uri = http://10.201.137.208:12900/

rest_transport_uri = http://graylog.somewhere.com:12900/

rest_enable_gzip = true

web_listen_uri = http://10.201.137.208:9000/

web_enable_gzip = true

rotation_strategy = count

elasticsearch_max_docs_per_index = 2000

elasticsearch_max_number_of_indices = 20

retention_strategy = delete

elasticsearch_shards = 4

elasticsearch_replicas = 1

elasticsearch_index_prefix = graylog

allow_leading_wildcard_searches = false

allow_highlighting = false

elasticsearch_cluster_name = graylog

elasticsearch_node_name_prefix = graylog-

elasticsearch_discovery_zen_ping_unicast_hosts = gray00.somewhere.com:9300, 
gray01.somewhere.com:9300, gray02.somewhere.com:9300, 
gray00.somewhere.com:9350, gray01.somewhere.com:9350, 
gray02.somewhere.com:9350

elasticsearch_transport_tcp_port = 9350

elasticsearch_discovery_zen_ping_multicast_enabled = false

elasticsearch_network_host = gray00.somewhere.com

elasticsearch_network_bind_host =