Re: [foreman-users] Lots of "Mysql2::Error: Deadlock found when trying to get lock" under increased load

2017-10-27 Thread Lukas Zapletal
Try to change to :validate => true but beware there might be dragons. I do
not remember why we set this.
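
A minimal sketch of that change (untested, hence the dragons; the save call is the
one quoted further down this thread, from app/models/host/discovered.rb in
foreman_discovery):

# before - validations are skipped on the first save of a discovered host:
host.save(:validate => false) if host.new_record?
# after - let the host validations run, so a conflicting record fails with a
# validation error instead of sneaking a second row into the DB:
host.save(:validate => true) if host.new_record?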

LZ

On Thu, Oct 26, 2017 at 6:20 PM, 'Konstantin Orekhov' via Foreman users <
foreman-users@googlegroups.com> wrote:

> Ok. Is there anything I could do now to work around this? The only thing that
> worked for me so far was to periodically go through discovered hosts and
> remove duplicate entries.
>
> Thanks!
> Konstantin.
>
> On Oct 26, 2017, at 07:18, Lukas Zapletal  wrote:
>
> Ok this confirms it. http://projects.theforeman.org/issues/21479 we will
> fix later.
>
> We don't have a unique index at the DB level, just at the Rails level, and a
> second NIC with the same MAC can sneak in. The relevant code in core is:
>
> validate :mac_uniqueness, :if => Proc.new { |nic| nic.managed? &&
> nic.host && nic.host.managed? && !nic.host.compute? && !nic.virtual? &&
> nic.mac.present? }
>
> which will not trigger for Discovery at all (the host is not managed). In
> discovery we try to search for an existing host and, if not found, we
> create a new discovered host. This does not work correctly; we have turned
> off the validator for some reason:
>
> host.save(:validate => false) if host.new_record?
>
> So the validation for uniqueness won't hit.
>
>
> On Wed, Oct 25, 2017 at 6:50 PM, 'Konstantin Orekhov' via Foreman users <
> foreman-users@googlegroups.com> wrote:
>
>>
>>
>>> Please use foreman-rake (I assume this is a packaged .deb install).
>>>
>>>
>> This is CentOS7 install and foreman-rake did work. Here's the result:
>>
>> [root@spc01 ~]# cd ~foreman
>> [root@spc01 foreman]# foreman-rake console
>> Successfully encrypted field for Setting::Auth oauth_consumer_key
>> Successfully decrypted field for Setting::Auth oauth_consumer_key
>> Successfully decrypted field for Setting::Auth oauth_consumer_key
>> Successfully decrypted field for Setting::Auth oauth_consumer_key
>> Successfully decrypted field for Setting::Auth oauth_consumer_key
>> Successfully encrypted field for Setting::Auth oauth_consumer_secret
>> Successfully decrypted field for Setting::Auth oauth_consumer_secret
>> Successfully decrypted field for Setting::Auth oauth_consumer_secret
>> Successfully decrypted field for Setting::Auth oauth_consumer_secret
>> Successfully decrypted field for Setting::Auth oauth_consumer_secret
>> /usr/share/foreman/lib/tasks/console.rake:6: warning: already
>> initialized constant ARGV
>> For some operations a user must be set, try User.current = User.first
>>
>> Loading production environment (Rails 4.2.5.1)
>> Failed to load console gems, starting anyway
>> irb(main):001:0> ::Nic::Managed.where(:mac => "b4:99:ba:aa:4b:64",
>> :primary => true)
>> => #<ActiveRecord::Relation [#<Nic::Managed id: ..., mac: "b4:99:ba:aa:4b:64", ip: "10.8.161.191", type: "Nic::Managed", name:
>> "macb499baaa4b64", host_id: 458555, subnet_id: nil, domain_id: nil, attrs:
>> {"netmask"=>"255.255.255.0", "mtu"=>"1500", "network"=>"10.8.161.0",
>> "speed"=>"1000", "duplex"=>"full", "port"=>"Twisted Pair",
>> "auto_negotiation"=>"true", "wol"=>true}, created_at: "2017-10-20
>> 03:44:00", updated_at: "2017-10-20 03:44:02", provider: nil, username: nil,
>> password: nil, virtual: false, link: true, identifier: "eth0", tag: "",
>> attached_to: "", managed: true, mode: "balance-rr", attached_devices: "",
>> bond_options: "", primary: true, provision: true, compute_attributes: {},
>> execution: true, ip6: nil, subnet6_id: nil>]>
>> irb(main):002:0>
>>
>> However, just as in my previous example, DB has 2 different IDs with that
>> MAC:
>>
>> [root@spc01 ~]# mysql -u foreman -p$DB_PASS foreman -e "SELECT * FROM
>> hosts WHERE type = 'Host::Discovered' and NAME = 'macb499baaa4b64'\G;"
>> *** 1. row ***
>>   id: 430926
>> name: macb499baaa4b64
>> last_compile: NULL
>>  last_report: 2017-09-30 06:56:07
>>   updated_at: 2017-09-30 06:56:09
>>   created_at: 2017-03-17 14:09:15
>>root_pass: NULL
>>  architecture_id: NULL
>>   operatingsystem_id: NULL
>>   environment_id: NULL
>>ptable_id: NULL
>>medium_id: NULL
>>build: 0
>>  comment: NULL
>> disk: NULL
>> installed_at: NULL
>> model_id: 7
>>
>> hostgroup_id: NULL
>> owner_id: 10
>>   owner_type: User
>>  enabled: 1
>>   puppet_ca_proxy_id: NULL
>>  managed: 0
>>use_image: NULL
>>   image_file: NULL
>> uuid: NULL
>>  compute_resource_id: NULL
>>  puppet_proxy_id: NULL
>> certname: NULL
>> image_id: NULL
>>  organization_id: NULL
>>  location_id: NULL
>> type: Host::Discovered
>>  otp: NULL
>> realm_id: NULL
>>   compute_profile_id: NULL
>> provision_method: NULL
>>grub_pass:
>>global_status: 0
>> lookup_value_matcher: NULL
>>

Re: [foreman-users] Lots of "Mysql2::Error: Deadlock found when trying to get lock" under increased load

2017-10-26 Thread 'Konstantin Orekhov' via Foreman users
Ok. Is there anything I could do now to work around this? The only thing that 
worked for me so far was to periodically go through discovered hosts and remove 
duplicate entries. 
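
For reference, a hedged foreman-rake console helper (not from this thread, just
plain ActiveRecord) to list the duplicates before removing them by hand:

# Names that have more than one Host::Discovered row:
Host::Discovered.group(:name).having('COUNT(*) > 1').count
# => {"macb499baaa4b64"=>2, ...}
# Inspect the rows for one name, oldest first, before destroying the stale one:
Host::Discovered.where(:name => 'macb499baaa4b64').order(:created_at)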

Thanks!
Konstantin.

> On Oct 26, 2017, at 07:18, Lukas Zapletal  wrote:
> 
> Ok this confirms it. http://projects.theforeman.org/issues/21479 we will fix 
> later.
> 
> We don't have a unique index at the DB level, just at the Rails level, and a second 
> NIC with the same MAC can sneak in. The relevant code in core is:
> 
> validate :mac_uniqueness, :if => Proc.new { |nic| nic.managed? && 
> nic.host && nic.host.managed? && !nic.host.compute? && !nic.virtual? && 
> nic.mac.present? }
> 
> which will not trigger for Discovery at all (the host is not managed). In 
> discovery we try to search for an existing host and, if not found, we create 
> a new discovered host. This does not work correctly; we have turned off 
> the validator for some reason:
> 
> host.save(:validate => false) if host.new_record?
> 
> So the validation for uniqueness won't hit.
> 
> 
>> On Wed, Oct 25, 2017 at 6:50 PM, 'Konstantin Orekhov' via Foreman users 
>>  wrote:
>> 
>>> 
>>> Please use foreman-rake (I assume this is a packaged .deb install). 
>>> 
>> 
>> This is CentOS7 install and foreman-rake did work. Here's the result:
>> 
>> [root@spc01 ~]# cd ~foreman
>> [root@spc01 foreman]# foreman-rake console
>> Successfully encrypted field for Setting::Auth oauth_consumer_key
>> Successfully decrypted field for Setting::Auth oauth_consumer_key
>> Successfully decrypted field for Setting::Auth oauth_consumer_key
>> Successfully decrypted field for Setting::Auth oauth_consumer_key
>> Successfully decrypted field for Setting::Auth oauth_consumer_key
>> Successfully encrypted field for Setting::Auth oauth_consumer_secret
>> Successfully decrypted field for Setting::Auth oauth_consumer_secret
>> Successfully decrypted field for Setting::Auth oauth_consumer_secret
>> Successfully decrypted field for Setting::Auth oauth_consumer_secret
>> Successfully decrypted field for Setting::Auth oauth_consumer_secret
>> /usr/share/foreman/lib/tasks/console.rake:6: warning: already initialized 
>> constant ARGV
>> For some operations a user must be set, try User.current = User.first
>> Loading production environment (Rails 4.2.5.1)
>> Failed to load console gems, starting anyway
>> irb(main):001:0> ::Nic::Managed.where(:mac => "b4:99:ba:aa:4b:64", :primary 
>> => true)
>> => #<ActiveRecord::Relation [#<Nic::Managed id: ..., mac: "b4:99:ba:aa:4b:64", ip: "10.8.161.191", type: "Nic::Managed", name: 
>> "macb499baaa4b64", host_id: 458555, subnet_id: nil, domain_id: nil, attrs: 
>> {"netmask"=>"255..255.255.0", "mtu"=>"1500", "network"=>"10.8.161.0", 
>> "speed"=>"1000", "duplex"=>"full", "port"=>"Twisted Pair", 
>> "auto_negotiation"=>"true", "wol"=>true}, created_at: "2017-10-20 03:44:00", 
>> updated_at: "2017-10-20 03:44:02", provider: nil, username: nil, password: 
>> nil, virtual: false, link: true, identifier: "eth0", tag: "", attached_to: 
>> "", managed: true, mode: "balance-rr", attached_devices: "", bond_options: 
>> "", primary: true, provision: true, compute_attributes: {}, execution: true, 
>> ip6: nil, subnet6_id: nil>]>
>> irb(main):002:0>
>> 
>> However, just as in my previous example, DB has 2 different IDs with that 
>> MAC:
>> 
>> [root@spc01 ~]# mysql -u foreman -p$DB_PASS foreman -e "SELECT * FROM hosts 
>> WHERE type = 'Host::Discovered' and NAME = 'macb499baaa4b64'\G;"
>> *** 1. row ***
>>   id: 430926
>> name: macb499baaa4b64
>> last_compile: NULL
>>  last_report: 2017-09-30 06:56:07
>>   updated_at: 2017-09-30 06:56:09
>>   created_at: 2017-03-17 14:09:15
>>root_pass: NULL
>>  architecture_id: NULL
>>   operatingsystem_id: NULL
>>   environment_id: NULL
>>ptable_id: NULL
>>medium_id: NULL
>>build: 0
>>  comment: NULL
>> disk: NULL
>> installed_at: NULL
>> model_id: 7
>> 
>> hostgroup_id: NULL
>> owner_id: 10
>>   owner_type: User
>>  enabled: 1
>>   puppet_ca_proxy_id: NULL
>>  managed: 0
>>use_image: NULL
>>   image_file: NULL
>> uuid: NULL
>>  compute_resource_id: NULL
>>  puppet_proxy_id: NULL
>> certname: NULL
>> image_id: NULL
>>  organization_id: NULL
>>  location_id: NULL
>> type: Host::Discovered
>>  otp: NULL
>> realm_id: NULL
>>   compute_profile_id: NULL
>> provision_method: NULL
>>grub_pass:
>>global_status: 0
>> lookup_value_matcher: NULL
>>discovery_rule_id: NULL
>>salt_proxy_id: NULL
>>  salt_environment_id: NULL
>>   pxe_loader: NULL
>> *** 2. row ***
>>   id: 

Re: [foreman-users] Lots of "Mysql2::Error: Deadlock found when trying to get lock" under increased load

2017-10-26 Thread Lukas Zapletal
Ok this confirms it. http://projects.theforeman.org/issues/21479 we will
fix later.

We don't have a unique index at the DB level, just at the Rails level, and a second
NIC with the same MAC can sneak in. The relevant code in core is:

validate :mac_uniqueness, :if => Proc.new { |nic| nic.managed? &&
nic.host && nic.host.managed? && !nic.host.compute? && !nic.virtual? &&
nic.mac.present? }

which will not trigger for Discovery at all (the host is not managed). In
discovery we try to search for an existing host and, if not found, we
create a new discovered host. This does not work correctly; we have turned
off the validator for some reason:

host.save(:validate => false) if host.new_record?

So the validation for uniqueness won't hit.
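
For illustration only (this is not the fix that eventually shipped for #21479,
and a blanket unique index would be too strict for Foreman, since bonded, virtual
or unmanaged NICs may legitimately share a MAC), a DB-level unique index would
be an ordinary Rails migration along these lines:

class AddUniqueIndexToNicsMac < ActiveRecord::Migration
  def change
    # makes the database itself reject a second NIC row with the same MAC
    add_index :nics, :mac, :unique => true, :name => 'index_nics_on_mac_unique'
  end
end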


On Wed, Oct 25, 2017 at 6:50 PM, 'Konstantin Orekhov' via Foreman users <
foreman-users@googlegroups.com> wrote:

>
>
>> Please use foreman-rake (I assume this is a packaged .deb install).
>>
>>
> This is CentOS7 install and foreman-rake did work. Here's the result:
>
> [root@spc01 ~]# cd ~foreman
> [root@spc01 foreman]# foreman-rake console
> Successfully encrypted field for Setting::Auth oauth_consumer_key
> Successfully decrypted field for Setting::Auth oauth_consumer_key
> Successfully decrypted field for Setting::Auth oauth_consumer_key
> Successfully decrypted field for Setting::Auth oauth_consumer_key
> Successfully decrypted field for Setting::Auth oauth_consumer_key
> Successfully encrypted field for Setting::Auth oauth_consumer_secret
> Successfully decrypted field for Setting::Auth oauth_consumer_secret
> Successfully decrypted field for Setting::Auth oauth_consumer_secret
> Successfully decrypted field for Setting::Auth oauth_consumer_secret
> Successfully decrypted field for Setting::Auth oauth_consumer_secret
> /usr/share/foreman/lib/tasks/console.rake:6: warning: already initialized
> constant ARGV
> For some operations a user must be set, try User.current = User.first
> Loading production environment (Rails 4.2.5.1)
> Failed to load console gems, starting anyway
> irb(main):001:0> ::Nic::Managed.where(:mac => "b4:99:ba:aa:4b:64",
> :primary => true)
> => #<ActiveRecord::Relation [#<Nic::Managed id: ..., mac: "b4:99:ba:aa:4b:64", ip: "10.8.161.191", type: "Nic::Managed", name:
> "macb499baaa4b64", host_id: 458555, subnet_id: nil, domain_id: nil, attrs:
> {"netmask"=>"255.255.255.0", "mtu"=>"1500", "network"=>"10.8.161.0",
> "speed"=>"1000", "duplex"=>"full", "port"=>"Twisted Pair",
> "auto_negotiation"=>"true", "wol"=>true}, created_at: "2017-10-20
> 03:44:00", updated_at: "2017-10-20 03:44:02", provider: nil, username: nil,
> password: nil, virtual: false, link: true, identifier: "eth0", tag: "",
> attached_to: "", managed: true, mode: "balance-rr", attached_devices: "",
> bond_options: "", primary: true, provision: true, compute_attributes: {},
> execution: true, ip6: nil, subnet6_id: nil>]>
> irb(main):002:0>
>
> However, just as in my previous example, DB has 2 different IDs with that
> MAC:
>
> [root@spc01 ~]# mysql -u foreman -p$DB_PASS foreman -e "SELECT * FROM
> hosts WHERE type = 'Host::Discovered' and NAME = 'macb499baaa4b64'\G;"
> *** 1. row ***
>   id: 430926
> name: macb499baaa4b64
> last_compile: NULL
>  last_report: 2017-09-30 06:56:07
>   updated_at: 2017-09-30 06:56:09
>   created_at: 2017-03-17 14:09:15
>root_pass: NULL
>  architecture_id: NULL
>   operatingsystem_id: NULL
>   environment_id: NULL
>ptable_id: NULL
>medium_id: NULL
>build: 0
>  comment: NULL
> disk: NULL
> installed_at: NULL
> model_id: 7
>
> hostgroup_id: NULL
> owner_id: 10
>   owner_type: User
>  enabled: 1
>   puppet_ca_proxy_id: NULL
>  managed: 0
>use_image: NULL
>   image_file: NULL
> uuid: NULL
>  compute_resource_id: NULL
>  puppet_proxy_id: NULL
> certname: NULL
> image_id: NULL
>  organization_id: NULL
>  location_id: NULL
> type: Host::Discovered
>  otp: NULL
> realm_id: NULL
>   compute_profile_id: NULL
> provision_method: NULL
>grub_pass:
>global_status: 0
> lookup_value_matcher: NULL
>discovery_rule_id: NULL
>salt_proxy_id: NULL
>  salt_environment_id: NULL
>   pxe_loader: NULL
> *** 2. row ***
>   id: 458555
> name: macb499baaa4b64
> last_compile: NULL
>  last_report: 2017-10-25 16:47:08
>   updated_at: 2017-10-25 16:47:09
>   created_at: 2017-10-20 03:44:00
>
>root_pass: NULL
>  architecture_id: NULL
>   operatingsystem_id: NULL
>   environment_id: NULL
>ptable_id: NULL
>medium_id: NULL
>build: 0
>  comment: NULL
>  

Re: [foreman-users] Lots of "Mysql2::Error: Deadlock found when trying to get lock" under increased load

2017-10-25 Thread 'Konstantin Orekhov' via Foreman users


>
> Please use foreman-rake (I assume this is a packaged .deb install). 
>
>
This is CentOS7 install and foreman-rake did work. Here's the result:

[root@spc01 ~]# cd ~foreman
[root@spc01 foreman]# foreman-rake console
Successfully encrypted field for Setting::Auth oauth_consumer_key
Successfully decrypted field for Setting::Auth oauth_consumer_key
Successfully decrypted field for Setting::Auth oauth_consumer_key
Successfully decrypted field for Setting::Auth oauth_consumer_key
Successfully decrypted field for Setting::Auth oauth_consumer_key
Successfully encrypted field for Setting::Auth oauth_consumer_secret
Successfully decrypted field for Setting::Auth oauth_consumer_secret
Successfully decrypted field for Setting::Auth oauth_consumer_secret
Successfully decrypted field for Setting::Auth oauth_consumer_secret
Successfully decrypted field for Setting::Auth oauth_consumer_secret
/usr/share/foreman/lib/tasks/console.rake:6: warning: already initialized 
constant ARGV
For some operations a user must be set, try User.current = User.first
Loading production environment (Rails 4.2.5.1)
Failed to load console gems, starting anyway
irb(main):001:0> ::Nic::Managed.where(:mac => "b4:99:ba:aa:4b:64", :primary 
=> true)
=> #<ActiveRecord::Relation [#<Nic::Managed id: ..., mac: "b4:99:ba:aa:4b:64", 
ip: "10.8.161.191", type: "Nic::Managed", name: "macb499baaa4b64", 
host_id: 458555, subnet_id: nil, domain_id: nil, attrs: 
{"netmask"=>"255.255.255.0", "mtu"=>"1500", "network"=>"10.8.161.0", 
"speed"=>"1000", "duplex"=>"full", "port"=>"Twisted Pair", 
"auto_negotiation"=>"true", "wol"=>true}, created_at: "2017-10-20 
03:44:00", updated_at: "2017-10-20 03:44:02", provider: nil, username: nil, 
password: nil, virtual: false, link: true, identifier: "eth0", tag: "", 
attached_to: "", managed: true, mode: "balance-rr", attached_devices: "", 
bond_options: "", primary: true, provision: true, compute_attributes: {}, 
execution: true, ip6: nil, subnet6_id: nil>]>
irb(main):002:0>

However, just as in my previous example, DB has 2 different IDs with that 
MAC:

[root@spc01 ~]# mysql -u foreman -p$DB_PASS foreman -e "SELECT * FROM hosts 
WHERE type = 'Host::Discovered' and NAME = 'macb499baaa4b64'\G;"
*** 1. row ***
  id: 430926
name: macb499baaa4b64
last_compile: NULL
 last_report: 2017-09-30 06:56:07
  updated_at: 2017-09-30 06:56:09
  created_at: 2017-03-17 14:09:15
   root_pass: NULL
 architecture_id: NULL
  operatingsystem_id: NULL
  environment_id: NULL
   ptable_id: NULL
   medium_id: NULL
   build: 0
 comment: NULL
disk: NULL
installed_at: NULL
model_id: 7
hostgroup_id: NULL
owner_id: 10
  owner_type: User
 enabled: 1
  puppet_ca_proxy_id: NULL
 managed: 0
   use_image: NULL
  image_file: NULL
uuid: NULL
 compute_resource_id: NULL
 puppet_proxy_id: NULL
certname: NULL
image_id: NULL
 organization_id: NULL
 location_id: NULL
type: Host::Discovered
 otp: NULL
realm_id: NULL
  compute_profile_id: NULL
provision_method: NULL
   grub_pass:
   global_status: 0
lookup_value_matcher: NULL
   discovery_rule_id: NULL
   salt_proxy_id: NULL
 salt_environment_id: NULL
  pxe_loader: NULL
*** 2. row ***
  id: 458555
name: macb499baaa4b64
last_compile: NULL
 last_report: 2017-10-25 16:47:08
  updated_at: 2017-10-25 16:47:09
  created_at: 2017-10-20 03:44:00
   root_pass: NULL
 architecture_id: NULL
  operatingsystem_id: NULL
  environment_id: NULL
   ptable_id: NULL
   medium_id: NULL
   build: 0
 comment: NULL
disk: NULL
installed_at: NULL
model_id: NULL
hostgroup_id: NULL
owner_id: NULL
  owner_type: NULL
 enabled: 1
  puppet_ca_proxy_id: NULL
 managed: 0
   use_image: NULL
  image_file: NULL
uuid: NULL
 compute_resource_id: NULL
 puppet_proxy_id: NULL
certname: NULL
image_id: NULL
 organization_id: NULL
 location_id: NULL
type: Host::Discovered
 otp: NULL
realm_id: NULL
  compute_profile_id: NULL
provision_method: NULL
   grub_pass:
   global_status: 0
lookup_value_matcher: NULL
   discovery_rule_id: NULL
   salt_proxy_id: NULL
 salt_environment_id: NULL
  pxe_loader: NULL
[root@spc01 ~]#




Re: [foreman-users] Lots of "Mysql2::Error: Deadlock found when trying to get lock" under increased load

2017-10-25 Thread Michael Moll
On Mon, Oct 23, 2017 at 05:50:52PM -0700, 'Konstantin Orekhov' via Foreman 
users wrote:
> [root@spc02 foreman]# rake --trace console

Please use foreman-rake (I assume this is a packaged .deb install).

-- 
Michael Moll



Re: [foreman-users] Lots of "Mysql2::Error: Deadlock found when trying to get lock" under increased load

2017-10-25 Thread Lukas Zapletal
Hmm is this Debian? Should work.

LZ

On Tue, Oct 24, 2017 at 2:50 AM, 'Konstantin Orekhov' via Foreman
users  wrote:
>
>> OK, thanks, Lukas! As soon as I get that duplicate entries show up again,
>> I'll run above and provide a result here. After patching for MySQL query
>> issue, I don't see this happening very often (which is a good thing).
>
>
> Hmm, I've got 2 duplicates, but I can't seem to run what you asked for:
>
> [root@spc02 foreman]# pwd
> /usr/share/foreman
>
> [root@spc02 foreman]# rake --trace console
> rake aborted!
> cannot load such file -- apipie/middleware/checksum_in_headers
> /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:55:in `require'
> /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:55:in `require'
> /usr/share/foreman/config/application.rb:5:in `<top (required)>'
> /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:55:in `require'
> /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:55:in `require'
> /usr/share/foreman/Rakefile:1:in `<top (required)>'
> /usr/share/gems/gems/rake-0.9.6/lib/rake/rake_module.rb:25:in `load'
> /usr/share/gems/gems/rake-0.9.6/lib/rake/rake_module.rb:25:in
> `load_rakefile'
> /usr/share/gems/gems/rake-0.9.6/lib/rake/application.rb:604:in
> `raw_load_rakefile'
> /usr/share/gems/gems/rake-0.9.6/lib/rake/application.rb:89:in `block in
> load_rakefile'
> /usr/share/gems/gems/rake-0.9.6/lib/rake/application.rb:160:in
> `standard_exception_handling'
> /usr/share/gems/gems/rake-0.9.6/lib/rake/application.rb:88:in
> `load_rakefile'
> /usr/share/gems/gems/rake-0.9.6/lib/rake/application.rb:72:in `block in run'
> /usr/share/gems/gems/rake-0.9.6/lib/rake/application.rb:160:in
> `standard_exception_handling'
> /usr/share/gems/gems/rake-0.9.6/lib/rake/application.rb:70:in `run'
> /usr/bin/rake:37:in `<main>'
> [root@spc02 foreman]#
>
> What am I missing here?
>



-- 
Later,
  Lukas @lzap Zapletal



Re: [foreman-users] Lots of "Mysql2::Error: Deadlock found when trying to get lock" under increased load

2017-10-18 Thread 'Konstantin Orekhov' via Foreman users


In the console with both records present do something like:
>
> ::Nic::Managed.where(:mac => "MA:CA:DDRESS::", :primary => true)
>
>
OK, thanks, Lukas! As soon as those duplicate entries show up again, 
I'll run the above and provide the result here. After patching for the MySQL query 
issue, I don't see this happening very often (which is a good thing).



Re: [foreman-users] Lots of "Mysql2::Error: Deadlock found when trying to get lock" under increased load

2017-10-13 Thread Lukas Zapletal
Ideally I would like to see DEBUG and SQL logs for those transactions, but
that's flooding your production server.

In the console with both records present do something like:

::Nic::Managed.where(:mac => "MA:CA:DDRESS::", :primary => true)

On Wed, Oct 11, 2017 at 9:08 PM, 'Konstantin Orekhov' via Foreman users <
foreman-users@googlegroups.com> wrote:

>
>
>> Theoretically if you changed configuration of bootif fact in settings,
>> this could happen. But I assume you haven't. Was it also the same FDI
>> version, just in case facter changed facts?
>>
>
> Yes, the same system with the same discovery proxy. I'm not using official
> FDI, but a netbooted Ubuntu14.04 (Casper-based) with old foreman-proxy
> version 1.9.2, smart_proxy_discovery_image-1.0.5.gem and discover-host
> from around the same time. I had trouble bringing in later versions of
> foreman-proxy to u14.04 because of dependency on ruby 2.x. If the fact that
> I run such old versions on a client side is an issue in your opinion, I can
> start the work on migrating to u16.04. But so far I have not seen an
> indication that this is an issue - duplicate entries happen not too often
> and most of the time it is working fine (and I have several thousands of
> hosts going through this), especially now with that MySQL patch you gave me
> earlier.
>
>
>> If you visit the original discovery host in the UI, do you see
>> Interfaces list? What MAC address is there? Is it detected as the
>> primary interface there?
>>
>
> Yes, all of the data is in place for both duplicate entries and look
> absolutely identical to me:
>
>
> 
>
>
> 
>
>
>
>> Do you still have both records in the DB? Can you run rake console and
>> try these statements? Or insert Rails.logger.info statements there to
>> see the code flow.
>>
>
> I do still have those in a DB, yes. But I don't have that host running a
> discover proxy anymore :(
> But if I find more duplicate entries, I don't quite understand what you
> want me to do - try which statements? Or insert logger statements where
> exactly?
>
>
>



-- 
Later,
  Lukas @lzap Zapletal



Re: [foreman-users] Lots of "Mysql2::Error: Deadlock found when trying to get lock" under increased load

2017-10-11 Thread Lukas Zapletal
Oh now I understand, looks like a new bug. I am not able to reproduce
with FDI 3.4.1 and develop Foreman. The code responsible for finding
existing record is here:

https://github.com/theforeman/foreman_discovery/blob/develop/app/models/host/discovered.rb#L53-L62

Theoretically, if you changed the configuration of the bootif fact in Settings,
this could happen. But I assume you haven't. Was it also the same FDI
version, just in case facter changed facts?

If you visit the original discovery host in the UI, do you see
Interfaces list? What MAC address is there? Is it detected as the
primary interface there?

Do you still have both records in the DB? Can you run rake console and
try these statements? Or insert Rails.logger.info statements there to
see the code flow.
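
For example, a hedged illustration of what such statements could look like (the
fact name and surrounding variables are assumptions; the real method in
app/models/host/discovered.rb differs):

mac = facts['discovery_bootif']
Rails.logger.info "discovery import: bootif fact = #{mac.inspect}"
existing = ::Nic::Managed.where(:mac => mac, :primary => true).first
Rails.logger.info "discovery import: matched host_id = #{existing.try(:host_id).inspect}"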

On Tue, Oct 10, 2017 at 8:01 PM, 'Konstantin Orekhov' via Foreman
users  wrote:
>
>>
>> Name has already been taken - this usually means that host (either
>> discovered or managed or unmanaged) of that name "macX" already
> exists. Same mac address? You can easily change how discovered hosts are
>> being named, by default it is "mac" + MAC address, you can change that to
>> random number or any different fact you want. See settings and our
>> documentation. Try to add a random number at the end if that helps.
>
>
> Well, no, that's the same host. It was discovered at some point back (8 days
> ago as you can see in the screenshot above). Then it got rebooted/crashed,
> PXE-booted again and started sending its discovery payload again, but gets
> 422. Usually, if it is the same host, Foreman just updates existing entry
> with new facts, or at least, the time of last report. But in the case above,
> that did not happen as for some reason Foreman created another record in a
> DB (with a different ID) for the same host. Unless I remove both of the
> records, 422s will continue to happen. I was hoping that a gist I provided
> gives you enough info on what could have caused that double-record situation
> for the same host.
>
> Here are my DB entries for the above host:
>
> [root@spc03 ~]# mysql -u foreman -p$DB_PASS foreman -e "SELECT * from hosts
> where type = 'Host::Discovered' and NAME = 'mac90e2baea5d58'\G;"
> *** 1. row ***
>   id: 446735
> name: mac90e2baea5d58
> last_compile: NULL
>  last_report: 2017-10-02 05:49:35
>   updated_at: 2017-10-02 05:49:46
>   created_at: 2017-09-15 22:44:42
>root_pass: NULL
>  architecture_id: NULL
>   operatingsystem_id: NULL
>   environment_id: NULL
>ptable_id: NULL
>medium_id: NULL
>build: 0
>  comment: NULL
> disk: NULL
> installed_at: NULL
> model_id: 6
> hostgroup_id: NULL
> owner_id: 10
>   owner_type: User
>  enabled: 1
>   puppet_ca_proxy_id: NULL
>  managed: 0
>use_image: NULL
>   image_file: NULL
> uuid: NULL
>  compute_resource_id: NULL
>  puppet_proxy_id: NULL
> certname: NULL
> image_id: NULL
>  organization_id: NULL
>  location_id: NULL
> type: Host::Discovered
>  otp: NULL
> realm_id: NULL
>   compute_profile_id: NULL
> provision_method: NULL
>grub_pass:
>global_status: 0
> lookup_value_matcher: NULL
>discovery_rule_id: NULL
>salt_proxy_id: NULL
>  salt_environment_id: NULL
>   pxe_loader: NULL
> *** 2. row ***
>   id: 456978
> name: mac90e2baea5d58
> last_compile: NULL
>  last_report: 2017-10-10 16:04:20
>   updated_at: 2017-10-10 16:04:20
>   created_at: 2017-10-07 07:13:19
>root_pass: NULL
>  architecture_id: NULL
>   operatingsystem_id: NULL
>   environment_id: NULL
>ptable_id: NULL
>medium_id: NULL
>build: 0
>  comment: NULL
> disk: NULL
> installed_at: NULL
> model_id: NULL
> hostgroup_id: NULL
> owner_id: NULL
>   owner_type: NULL
>  enabled: 1
>   puppet_ca_proxy_id: NULL
>  managed: 0
>use_image: NULL
>   image_file: NULL
> uuid: NULL
>  compute_resource_id: NULL
>  puppet_proxy_id: NULL
> certname: NULL
> image_id: NULL
>  organization_id: NULL
>  location_id: NULL
> type: Host::Discovered
>  otp: NULL
> realm_id: NULL
>   compute_profile_id: NULL
> provision_method: NULL
>grub_pass:
>global_status: 0
> lookup_value_matcher: NULL
>discovery_rule_id: NULL
>salt_proxy_id: NULL
>  salt_environment_id: NULL
>

Re: [foreman-users] Lots of "Mysql2::Error: Deadlock found when trying to get lock" under increased load

2017-10-10 Thread 'Konstantin Orekhov' via Foreman users


>
> Name has already been taken - this usually means that host (either 
> discovered or managed or unmanaged) of that name "macX" already 
> exists. Same mac address? You can easily change how discovered hosts are 
> being named, by default it is "mac" + MAC address, you can change that to 
> random number or any different fact you want. See settings and our 
> documentation. Try to add a random number at the end if that helps.
>

Well, no, that's the same host. It was discovered at some point back (8 
days ago as you can see in the screenshot above). Then it got 
rebooted/crashed, PXE-booted again and started sending its discovery 
payload again, but gets 422. Usually, if it is the same host, Foreman just 
updates existing entry with new facts, or at least, the time of last 
report. But in the case above, that did not happen as for some reason 
Foreman created another record in a DB (with a different ID) for the same 
host. Unless I remove both of the records, 422s will continue to happen. I 
was hoping that a gist I provided gives you enough info on what could have 
caused that double-record situation for the same host.

Here are my DB entries for the above host:

[root@spc03 ~]# mysql -u foreman -p$DB_PASS foreman -e "SELECT * from hosts 
where type = 'Host::Discovered' and NAME = 'mac90e2baea5d58'\G;"
*** 1. row ***
  id: 446735
name: mac90e2baea5d58
last_compile: NULL
 last_report: 2017-10-02 05:49:35
  updated_at: 2017-10-02 05:49:46
  created_at: 2017-09-15 22:44:42
   root_pass: NULL
 architecture_id: NULL
  operatingsystem_id: NULL
  environment_id: NULL
   ptable_id: NULL
   medium_id: NULL
   build: 0
 comment: NULL
disk: NULL
installed_at: NULL
model_id: 6
hostgroup_id: NULL
owner_id: 10
  owner_type: User
 enabled: 1
  puppet_ca_proxy_id: NULL
 managed: 0
   use_image: NULL
  image_file: NULL
uuid: NULL
 compute_resource_id: NULL
 puppet_proxy_id: NULL
certname: NULL
image_id: NULL
 organization_id: NULL
 location_id: NULL
type: Host::Discovered
 otp: NULL
realm_id: NULL
  compute_profile_id: NULL
provision_method: NULL
   grub_pass:
   global_status: 0
lookup_value_matcher: NULL
   discovery_rule_id: NULL
   salt_proxy_id: NULL
 salt_environment_id: NULL
  pxe_loader: NULL
*** 2. row ***
  id: 456978
name: mac90e2baea5d58
last_compile: NULL
 last_report: 2017-10-10 16:04:20
  updated_at: 2017-10-10 16:04:20
  created_at: 2017-10-07 07:13:19
   root_pass: NULL
 architecture_id: NULL
  operatingsystem_id: NULL
  environment_id: NULL
   ptable_id: NULL
   medium_id: NULL
   build: 0
 comment: NULL
disk: NULL
installed_at: NULL
model_id: NULL
hostgroup_id: NULL
owner_id: NULL
  owner_type: NULL
 enabled: 1
  puppet_ca_proxy_id: NULL
 managed: 0
   use_image: NULL
  image_file: NULL
uuid: NULL
 compute_resource_id: NULL
 puppet_proxy_id: NULL
certname: NULL
image_id: NULL
 organization_id: NULL
 location_id: NULL
type: Host::Discovered
 otp: NULL
realm_id: NULL
  compute_profile_id: NULL
provision_method: NULL
   grub_pass:
   global_status: 0
lookup_value_matcher: NULL
   discovery_rule_id: NULL
   salt_proxy_id: NULL
 salt_environment_id: NULL
  pxe_loader: NULL



Re: [foreman-users] Lots of "Mysql2::Error: Deadlock found when trying to get lock" under increased load

2017-10-10 Thread Lukas Zapletal
Hey

> I did not see any 422 error before this transaction so I think this is it.
> Although I did not see any long MySQL queries, the whole transaction still
> took ~11 seconds to complete for some reason:
>

Name has already been taken - this usually means that host (either
discovered or managed or unmanaged) of that name "macX" already
exists. Same MAC address? You can easily change how discovered hosts are
named: by default it is "mac" + the MAC address, and you can change that to
a random number or any different fact you want. See Settings and our
documentation. Try to add a random number at the end if that helps.
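
Roughly, the default name is put together like this (a sketch assuming the
discovery_prefix / discovery_hostname settings of the discovery plugin; the real
code also normalizes invalid characters and handles collisions):

prefix = Setting[:discovery_prefix]             # "mac" by default
fact   = facts[Setting[:discovery_hostname]]    # the bootif MAC by default
name   = (prefix + fact.to_s.delete(':')).downcase   # => "macb499baaa4b64"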


>
>
>> For smart proxy, there was a patch done by Dmitri who redesigned DHCP
>> parser, it's much more capable and faster now. I think this landed in
>> 1.16 RC1, yeah: http://projects.theforeman.org/issues/19441
>> 
>> (https://github.com/theforeman/smart-proxy/commit/21813c6cde
>> 0d2be10747682f1a001a7c0bd3ffb9)
>>
>
> From my side, any performance improvements for DHCP SmP are always a
> welcome change :)
>
>
>> I did not hear about unresponsive smart-proxy processes, can you check
>> system limits (open handles etc)? SELinux? Firewall. Any proxy plugins
>> enable? Then file a redmine bug, haven't seen that.
>>
>
> That's the problem - no smoking gun that I could find. No system resource
> shortages logged, system itself is a rather beefy VM that does not even
> sweat, no firewalls, selinux set to permissive mode. I only run 3 SmP -
> bmc, dhcp and tftp.
> On top of that, since I can't replicate this at will, I have to wait until
> this issue manifests itself naturally.
>
> And just to make it clear - it is not that the SmP process becomes completely
> unresponsive, but only the API-facing part. That's why I'm wondering if
> moving away from WEBrick to Apache or Nginx with Passenger is a possibility.
>

Proxy is a regular Sinatra app, so any Rack server should do the trick
(Puma perhaps). I'd try that to see if it helps. Might be a bug in WEBrick,
try to downgrade or upgrade it.
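
As a rough illustration of the "any Rack server" point (this is not how
smart-proxy is actually launched, just a generic Sinatra-under-Puma sketch with
placeholder paths):

# config.ru
require 'sinatra/base'

class ExampleApi < Sinatra::Base
  # trivial endpoint standing in for a module such as /bmc
  get '/bmc' do
    'ok'
  end
end

run ExampleApi

# started with e.g.:
#   puma config.ru -b 'ssl://0.0.0.0:8443?key=...&cert=...'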

> Another question along the same lines - is it possible to run each of the
> smart-proxies as separate processes (listening on different ports)
> instead of one process with several proxies on a single port? For example, in this
> particular situation even if one SmP was having an issue, it would not
> affect the other 2, and it would also pinpoint the troubled proxy,
> simplifying troubleshooting efforts.
>

We don't support that, unfortunately.



Re: [foreman-users] Lots of "Mysql2::Error: Deadlock found when trying to get lock" under increased load

2017-10-09 Thread 'Konstantin Orekhov' via Foreman users


> please use git to find out which branches it landed in, I believe the 
> MySQL facter patch is 1.15+ only. 
>

Yes, I already found that and am planning on an upgrade in our lab instance.
BTW, even after applying a patch (on 1.14), which helped tremendously, from 
time to time I still get some duplicate entries caused by an already 
discovered system trying to send its discovery payload. For whatever reason 
though, Foreman discovery plugin does not recognize it as the same host and 
creates a new entry in a DB with different ID:



The host in question keeps on retrying, of course, and gets 422 "Name 
already taken" over and over again. My question though is why a duplicate 
was created instead of updating the existing host? It seems to me that this 
host was not recognized properly. A complete debug log of an operation that 
I believe resulted in the above duplicate entry is here - 
https://gist.github.com/anonymous/a9073629454074c67aa4799597fe23d5
I did not see any 422 error before this transaction so I think this is it. 
Although I did not see any long MySQL queries, the whole transaction still 
took ~11 seconds to complete for some reason:

2017-10-07 01:26:16 e3da2f90 [app] [I] Completed 422 Unprocessable Entity 
in 11291ms (Views: 0.4ms | ActiveRecord: 172.9ms)
 

> For smart proxy, there was a patch done by Dmitri who redesigned DHCP 
> parser, it's much more capable and faster now. I think this landed in 
> 1.16 RC1, yeah: http://projects.theforeman.org/issues/19441 
> 
>  
> (
> https://github.com/theforeman/smart-proxy/commit/21813c6cde0d2be10747682f1a001a7c0bd3ffb9)
>  
>
>

From my side, any performance improvements for DHCP SmP are always a 
welcome change :)
 

> I did not hear about unresponsive smart-proxy processes, can you check 
> system limits (open handles etc)? SELinux? Firewall. Any proxy plugins 
> enable? Then file a redmine bug, haven't seen that. 
>

That's the problem - no smoking gun that I could find. No system resource 
shortages logged, system itself is a rather beefy VM that does not even 
sweat, no firewalls, selinux set to permissive mode. I only run 3 SmP - 
bmc, dhcp and tftp.
On top of that, since I can't replicate this at will, I have to wait until 
this issue manifests itself naturally.

And just to make it clear - it is not that the SmP process becomes completely 
unresponsive, but only the API-facing part. That's why I'm wondering if 
moving away from WEBrick to Apache or Nginx with Passenger is a possibility.
Another question along the same lines - is it possible to run each of the 
smart-proxies as separate processes (listening on different ports) 
instead of one process with several proxies on a single port? For example, in this 
particular situation even if one SmP was having an issue, it would not 
affect the other 2, and it would also pinpoint the troubled proxy, 
simplifying troubleshooting efforts.
 
Thanks!
Konstantin.



Re: [foreman-users] Lots of "Mysql2::Error: Deadlock found when trying to get lock" under increased load

2017-10-09 Thread Lukas Zapletal
Hello,

please use git to find out which branches it landed in, I believe the
MySQL facter patch is 1.15+ only.

For 1.15.5 you need to talk with release engineer of this version
which is Daniel, if the changes are small enough I see no reason not
to include them. I think it's too late for 1.15.5 tho, maybe .6.

For smart proxy, there was a patch done by Dmitri who redesigned DHCP
parser, it's much more capable and faster now. I think this landed in
1.16 RC1, yeah: http://projects.theforeman.org/issues/19441
(https://github.com/theforeman/smart-proxy/commit/21813c6cde0d2be10747682f1a001a7c0bd3ffb9)

I did not hear about unresponsive smart-proxy processes, can you check
system limits (open handles etc)? SELinux? Firewall? Any proxy plugins
enabled? Then file a Redmine bug, I haven't seen that.

On Fri, Oct 6, 2017 at 3:35 AM, 'Konstantin Orekhov' via Foreman users
 wrote:
>
>> Let us know next week if this helped. I highly suggest upgrade to
>> 1.15, it is a very solid release.
>
>
> Are there any performance improvements for Smart-Proxy in 1.15, BTW?
>
> Lately, in one of my busiest locations, we've started seeing a strange issue
> when SmP stops responding on 8443 for API calls. The process itself is
> running, log messages are logged, just no response from it:
>
> [root@spc01 ~]# systemctl start foreman-proxy
>
> [root@spc01 ~]# date; curl --connect-timeout 30 -kSs
> https://localhost:8443/bmc; date
> Thu Oct  5 17:53:36 MST 2017
> curl: (7) Failed connect to localhost:8443; Connection refused
> Thu Oct  5 17:53:36 MST 2017
>
> It does take ~30 seconds to start up in our env because of large DHCP
> dataset, during which the connection would be refused.
>
>
> [root@spc01 ~]# date; curl --connect-timeout 30 -kSs
> https://localhost:8443/bmc; date
> Thu Oct  5 17:53:49 MST 2017
> curl: (28) NSS: client certificate not found (nickname not specified)
> Thu Oct  5 17:54:19 MST 2017
>
> Then it starts working for a very short period of time (above) and then
> stops (below).
>
> [root@spc01 ~]# date; curl --connect-timeout 30 -kSs
> https://localhost:8443/bmc; date
> Thu Oct  5 17:54:24 MST 2017
> curl: (28) Operation timed out after 30001 milliseconds with 0 out of 0
> bytes received
> Thu Oct  5 17:54:54 MST 2017
>
> So far there's nothing in proxy.log that helps me identify the issue. I
> can't replicate it at will no matter what I do - had a bunch of clients hitting
> different APIs for a couple of days, nothing.
> Then today the above happens and the only thing that helped me is to move
> SmP from one node to another (I really wish DHCP SmP would allow for
> active/active horizontal scaling instead of just being limited to a single
> node).
> Strace is useless as it only gives this when tracing the "ruby foreman-proxy"
> process:
>
> [root@spc03 ~]# strace -p 12526
> strace: Process 12526 attached
> futex(0x184e634, FUTEX_WAIT_PRIVATE, 1, NULL^Cstrace: Process 12526 detached
>  
>
> I tried https://github.com/tmm1/rbtrace, but it is so heavy that it actually
> pretty much kills SmP by itself.
>
> Do you have any suggestions on ways to troubleshoot this? I have DEBUG
> enabled with these values:
>
> :log_buffer: 4000
> :log_buffer_errors: 2000
>
> Also, is there a way to move SmP from WEBrick to Apache/Passenger if that makes
> sense at all? If so, any docs? Any other ways to increase the performance as
> it does feel like a performance issue to me.
>
> Thanks!
>



-- 
Later,
  Lukas @lzap Zapletal



Re: [foreman-users] Lots of "Mysql2::Error: Deadlock found when trying to get lock" under increased load

2017-10-06 Thread Greg Sutcliffe
I've not seen that, no - I've CC'd someone who might know ;)

Greg

On Thu, 2017-10-05 at 18:35 -0700, 'Konstantin Orekhov' via Foreman
users wrote:
> > Let us know next week if this helped. I highly suggest upgrade to 
> > 1.15, it is a very solid release. 
> 
>  
> Are there any performance improvements for Smart-Proxy in 1.15, BTW?
> 
> Lately, in one of my busiest locations, we've started seeing a
> strange issue when SmP stops responding on 8443 for API calls. The
> process itself is running, log messages are logged, just no response
> from it:
> 
> [root@spc01 ~]# systemctl start foreman-proxy
> 
> [root@spc01 ~]# date; curl --connect-timeout 30 -kSs https://localhost:8443/bmc; date
> Thu Oct  5 17:53:36 MST 2017
> curl: (7) Failed connect to localhost:8443; Connection refused
> Thu Oct  5 17:53:36 MST 2017
> 
> It does take ~30 seconds to start up in our env because of large DHCP
> dataset, during which the connection would be refused.
> 
> 
> [root@spc01 ~]# date; curl --connect-timeout 30 -kSs https://localhost:8443/bmc; date
> Thu Oct  5 17:53:49 MST 2017
> curl: (28) NSS: client certificate not found (nickname not specified)
> Thu Oct  5 17:54:19 MST 2017
> 
> Then it starts working for a very short period of time (above) and
> then stops (below).
> 
> [root@spc01 ~]# date; curl --connect-timeout 30 -kSs https://localhost:8443/bmc; date
> Thu Oct  5 17:54:24 MST 2017
> curl: (28) Operation timed out after 30001 milliseconds with 0 out of
> 0 bytes received
> Thu Oct  5 17:54:54 MST 2017
> 
> So far there's nothing in proxy.log that helps me identify the issue.
> I can't replicate it at will no matter what I do - had a bunch of
> clients hitting different APIs for a couple of days, nothing.
> Then today the above happens and the only thing that helped me is to
> move SmP from one node to another (I really wish DHCP SmP would allow
> for active/active horizontal scaling instead of just being limited to
> a single node).
> Strace is useless as it only gives this when tracing the "ruby foreman-
> proxy" process:
> 
> [root@spc03 ~]# strace -p 12526
> strace: Process 12526 attached
> futex(0x184e634, FUTEX_WAIT_PRIVATE, 1, NULL^Cstrace: Process 12526
> detached
>  
> 
> I tried https://github.com/tmm1/rbtrace, but it is so heavy that it
> actually pretty much kills SmP by itself.
> 
> Do you have any suggestions on ways to troubleshoot this? I have
> DEBUG enabled with these values:
> 
> :log_buffer: 4000
> :log_buffer_errors: 2000
> 
> Also, is there a way to move SmP from WEBrick to Apache/Passenger if that
> makes sense at all? If so, any docs? Any other ways to increase the
> performance as it does feel like a performance issue to me.
> 
> Thanks!

-- 
IRC / Twitter: @gwmngilfen
Diaspora: gwmngil...@joindiaspora.com



Re: [foreman-users] Lots of "Mysql2::Error: Deadlock found when trying to get lock" under increased load

2017-10-05 Thread 'Konstantin Orekhov' via Foreman users


> Let us know next week if this helped. I highly suggest upgrade to 
> 1.15, it is a very solid release. 
>
 
Are there any performance improvements for Smart-Proxy in 1.15, BTW?

Lately, in one of my busiest locations, we've started seeing a strange 
issue when SmP stops responding on 8443 for API calls. The process itself 
is running, log messages are logged, just no response from it:

[root@spc01 ~]# systemctl start foreman-proxy

[root@spc01 ~]# date; curl --connect-timeout 30 -kSs 
https://localhost:8443/bmc; date
Thu Oct  5 17:53:36 MST 2017
curl: (7) Failed connect to localhost:8443; Connection refused
Thu Oct  5 17:53:36 MST 2017

It does take ~30 seconds to start up in our env because of large DHCP 
dataset, during which the connection would be refused.


[root@spc01 ~]# date; curl --connect-timeout 30 -kSs 
https://localhost:8443/bmc; date
Thu Oct  5 17:53:49 MST 2017
curl: (28) NSS: client certificate not found (nickname not specified)
Thu Oct  5 17:54:19 MST 2017

Then it starts working for a very short period of time (above) and then 
stops (below).

[root@spc01 ~]# date; curl --connect-timeout 30 -kSs 
https://localhost:8443/bmc; date
Thu Oct  5 17:54:24 MST 2017
curl: (28) Operation timed out after 30001 milliseconds with 0 out of 0 
bytes received
Thu Oct  5 17:54:54 MST 2017

So far there's nothing in proxy.log that helps me identify the issue. I 
can't replicate it at will no matter what I do - had a bunch of clients 
hitting different APIs for a couple of days, nothing.
Then today the above happens and the only thing that helped me is to move 
SmP from one node to another (I really wish DHCP SmP would allow for 
active/active horizontal scaling instead of just being limited to a single 
node).
Strace is useless as it only gives this when tracing the "ruby foreman-proxy" 
process:

[root@spc03 ~]# strace -p 12526
strace: Process 12526 attached
futex(0x184e634, FUTEX_WAIT_PRIVATE, 1, NULL^Cstrace: Process 12526 detached
 

I tried https://github.com/tmm1/rbtrace, but it is so heavy that it 
actually pretty much kills SmP by itself.

Do you have any suggestions on ways to troubleshoot this? I have DEBUG 
enabled with these values:

:log_buffer: 4000
:log_buffer_errors: 2000

Also, is there a way to move SmP from WEBrick to Apache/Passenger if that makes 
sense at all? If so, any docs? Any other ways to increase the performance 
as it does feel like a performance issue to me.

Thanks!



Re: [foreman-users] Lots of "Mysql2::Error: Deadlock found when trying to get lock" under increased load

2017-10-05 Thread 'Konstantin Orekhov' via Foreman users


>
> Agreed, a proper place to hook it would be ideal, I'm just throwing 
> ideas out that might help in the short term. 
>

Sure, would be a nice thing to have to start from. 



Re: [foreman-users] Lots of "Mysql2::Error: Deadlock found when trying to get lock" under increased load

2017-10-05 Thread 'Konstantin Orekhov' via Foreman users


> Let us know next week if this helped. I highly suggest upgrade to 
> 1.15, it is a very solid release. 
>
>
Is this patch for MySQL a part of 1.15? As you suggested, I've taken it 
from develop branch, so assumed it is not released yet.

Plus, there are other 2 things that worry me:

- user-reported bug http://projects.theforeman.org/issues/21120
- your own "heads-up" - 
https://groups.google.com/forum/#!topic/foreman-users/M_DcyFMZwxM (only one 
left as far as I can see)

Do you think all of the above make it to 1.15.5?




Re: [foreman-users] Lots of "Mysql2::Error: Deadlock found when trying to get lock" under increased load

2017-10-05 Thread Lukas Zapletal
Let us know next week if this helped. I highly suggest upgrade to
1.15, it is a very solid release.

LZ

On Wed, Oct 4, 2017 at 11:04 PM, 'Konstantin Orekhov' via Foreman
users  wrote:
>
>> See the comment there, do you have this in your instance? If not git
>> blame the commit and apply it. You have some older version I assume.
>
>
> Yes, I'm running several 1.14.1 and 1.14.3 instances/clusters. Both had the
> same issue with deadlocks. I've updated 2 of them with the above patch and was
> lucky enough to immediately observe registrations of at least 62 systems
> go through w/o a single error.
> I'll monitor things more, but so far this is a huge step forward.
>
> Thanks!
>



-- 
Later,
  Lukas @lzap Zapletal



Re: [foreman-users] Lots of "Mysql2::Error: Deadlock found when trying to get lock" under increased load

2017-10-05 Thread Greg Sutcliffe
On Tue, 2017-10-03 at 16:12 -0700, 'Konstantin Orekhov' via Foreman
users wrote:
> > As Lukas says, a full refactor may well happen, and we'd love input
> > on that as we go forward. 
> 
> Any of you, guys, going to PuppetConf this year? If so, can we meet
> and have a discussion on this maybe?

I certainly won't be, sadly. I'll ask around and see if anyone is
heading down.

> > I think I agree - the hosts should keep retrying until they get a 
> > response from Foreman, but then actions can be taken. I'd probably
> > be in favour of keeping the retry (so that, say, if the offending
> > MAC is removed in Foreman, the host can register on the next
> > retry), but perhaps split the process into two calls. The first is
> > a light "am I registered?" call that returns true/false, and only
> > if false would the heavier registration call be made. Does that
> > work? 
> 
> Yes, this would definitely work. This is also is one of the states of
> a system in the state machine we talked about above.

Agreed, a proper place to hook it would be ideal, I'm just throwing
ideas out that might help in the short term. Sounds like Lukas has you
covered on the DB locking issues anyway though :P

Greg



Re: [foreman-users] Lots of "Mysql2::Error: Deadlock found when trying to get lock" under increased load

2017-10-04 Thread 'Konstantin Orekhov' via Foreman users


> See the comment there, do you have this in your instance? If not git 
> blame the commit and apply it. You have some older version I assume. 
>

Yes, I'm running several 1.14.1 and 1.14.3 instances/clusters. Both versions had 
the same issue with deadlocks. I've updated 2 of them with the above patch and 
was lucky enough to immediately observe registrations of at least 62 systems go 
through without a single error.
I'll monitor things more, but so far this is a huge step forward.

Thanks!



Re: [foreman-users] Lots of "Mysql2::Error: Deadlock found when trying to get lock" under increased load

2017-10-04 Thread Lukas Zapletal
OK, I can see there is a subselect; these are sometimes painful,
particularly for MySQL. We fixed that already - see fact_importer.rb
(this is the develop branch):

  def delete_removed_facts
    ActiveSupport::Notifications.instrument "fact_importer_deleted.foreman",
        :host_id => host.id, :host_name => host.name, :facts => facts, :deleted => [] do |payload|
      delete_query = FactValue.joins(:fact_name).where(:host => host,
        'fact_names.type' => fact_name_class.name).where.not('fact_names.name' => facts.keys)
      if ActiveRecord::Base.connection.adapter_name.downcase.starts_with? 'mysql'
        # MySQL does not handle delete with inner query correctly (slow) so we will do two queries on purpose
        payload[:count] = @counters[:deleted] = FactValue.where(:id => delete_query.pluck(:id)).delete_all
      else
        # deletes all facts using a single SQL query with inner query otherwise
        payload[:count] = @counters[:deleted] = delete_query.delete_all
      end
    end
  end

See the comment there - do you have this in your instance? If not, git
blame the commit and apply it. I assume you have some older version.
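
A quick way to check from a foreman-rake console (just a sketch; it assumes
the method still lives in core's FactImporter exactly as shown above):

  # locate the running copy of delete_removed_facts and see whether the MySQL
  # two-query workaround (the pluck(:id) form) is present in it
  src_file, = FactImporter.instance_method(:delete_removed_facts).source_location
  puts src_file
  puts File.read(src_file).include?('pluck(:id)') ? 'workaround present' : 'workaround missing'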

On Wed, Oct 4, 2017 at 1:38 AM, 'Konstantin Orekhov' via Foreman users
 wrote:
>
>> One more idea - we have seen similar (but different tables) deadlocks
>> when a background (cron) job we ship by default attempts to delete old
>> reports. Can you check if there is any cronjob or any other process
>> doing some management of facts? Even deleting lot of data can block
>> all updates for a long time (minutes to hours). Perhaps try to disable
>> all foreman jobs and re-test.
>
>
> I have tried this to no avail. However, I think the culprit of the problem is
> a very slow DELETE MySQL query, which apparently happens even for
> absolutely new, freshly-discovered systems as well as already-discovered
> ones.
>
> 2017-09-28 13:09:49 c75f5c40 [sql] [D]   SQL (50843.2ms)  DELETE FROM
> `fact_values` WHERE `fact_values`.`id` IN
>
> Please see these gists I've recorded with SQL debug enabled. I have a ton of
> hosts doing exactly the same thing - they try to register, the MySQL delete
> expires (it takes up to 50 sec, as you can see), some rollback happens, and it
> expires again. And so on and so forth until the systems register one by one.
> This results in many empty or duplicate entries even for a small batch of
> systems coming online at the same time.
>
> https://gist.github.com/anonymous/a721e220d82f5160450e483b8776489d
>
> The above examples are taken from a single Foreman instance running against
> a regular (non-Galera) MySQL DB, so at least I can say that the fact that I
> had several Foreman instances behind a load-balancer talking to
> Galera-replicated MySQL has nothing to do with this behavior. The only
> difference is that with a Galera-enabled DB, the expiration errors are replaced
> with deadlock errors, which makes total sense - if a delete operation takes
> almost a minute, no wonder it results in some rows being locked. As load
> increases (more systems register at the same time), more and more such errors
> happen, so I believe the proper way to deal with this is to optimize the MySQL
> query first and then go from there. Would you agree?
>
>



-- 
Later,
  Lukas @lzap Zapletal



Re: [foreman-users] Lots of "Mysql2::Error: Deadlock found when trying to get lock" under increased load

2017-10-03 Thread 'Konstantin Orekhov' via Foreman users


> This is absolutely true. We had, at one time, considered adding a state 
> machine (or similar) to Foreman, so that such things (as well as boot 
> loops in Kickstart, and so forth) could be detected, but it was never 
> completed. 
>

A state machine would be nice, as it allows more actions to be taken for a 
machine in different states. For example, in some other threads I was 
asking about the ability to use RemoteExec for discovered hosts, not just 
managed hosts as it is now.
Proper hooks for systems entering/leaving any of those states also open up 
a lot of opportunities.
 

> As Lukas says, a full refactor may well happen, and we'd love input on 
> that as we go forward. 


Are any of you guys going to PuppetConf this year? If so, can we meet and 
have a discussion on this maybe?

I think I agree - the hosts should keep retrying until they get a 
> response from Foreman, but then actions can be taken. I'd probably be 
> in favour of keeping the retry (so that, say, if the offending MAC is 
> removed in Foreman, the host can register on the next retry), but 
> perhaps split the process into two calls. The first is a light "am I 
> registered?" call that returns true/false, and only if false would the 
> heavier registration call be made. Does that work? 
>

Yes, this would definitely work. This is also one of the states of a 
system in the state machine we talked about above.



Re: [foreman-users] Lots of "Mysql2::Error: Deadlock found when trying to get lock" under increased load

2017-09-26 Thread Greg Sutcliffe
A few extra thoughts on this, since a lot of it is still based on my
design from nearly 5 years ago ;)

On Wed, 2017-09-20 at 17:27 -0700, 'Konstantin Orekhov' via Foreman
users wrote:
> 
> Hmm, one generic question on this - according to above logic, if my
> managed host had crashed, say because it lost its HW RAID controller,
> for example, so it can't boot off the disk anymore thus resulting in
> PXE boot (given that BIOS boot order is set that way), correct?

> Now, by default, Foreman default pxeconfig file makes a system to
> boot off its disk, which in this particular situation will result in
> endless loop until some external (to Foreman) monitoring detects a
> system failure, then a human gets on a console and real
> troubleshooting starts only then.

This is absolutely true. We had, at one time, considered adding a state
machine (or similar) to Foreman, so that such things (as well as boot
loops in Kickstart, and so forth) could be detected, but it was never
completed.

> Now, with that in mind, I was thinking of moving actual OS
> provisioning tasks to Foreman as well. However, if crashed system
> would never be allowed to re-register (get discovered) because it is
> already managed by Foreman, the above flow is just not going to work
> anymore and I'd have re-think all flows. Are there specific reasons
> why this in place? I understand that this is how it is implemented
> now, but is there a bigger idea behind that? If so, what is it?

There were two goals - to prevent duplicates (if unprovisioned hosts
are rebooted, for example), and to allow recycling (delete a host from
Foreman, reboot it, and it'll be back in the discovered hosts list to
be re-used). Neither of these would be insurmountable to handle some other
way, but this was the easiest.

> Also, if you take my example of flows stitching for a complete system
> lifecycle management, what would you suggest we could do differently
> to allow Foreman to be a system that we use for both discovery and OS
> provisioning?

As Lukas says, a full refactor may well happen, and we'd love input on
that as we go forward. For a workaround today, I'd probably lean
towards a secondary plugin that sits on top of Discovery and interacts
with the registration process - given your example, you could add a
check whether the registration matches a host that's already provisioned, and
take further action if so. That might also be a good way to proof-of-concept
some ideas before merging the code back into Discovery.

> Another thing (not as generic as above, but actually very applicable
> to my current issue) - if a client system is not allowed to register
> and given 422 error, for example, it keeps trying to register
> resulting in huge amount of work. This is also a gap, IMHO -
> discovery plug-in needs to do this differently somehow so rejected
> systems do not take away Foreman resources (see below for actual
> numbers of such attempts in one of my cluster).

I think I agree - the hosts should keep retrying until they get a
response from Foreman, but then actions can be taken. I'd probably be
in favour of keeping the retry (so that, say, if the offending MAC is
removed in Foreman, the host can register on the next retry), but
perhaps split the process into two calls. The first is a light "am I
registered?" call that returns true/false, and only if false would the
heavier registration call be made. Does that work?
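
Purely as a sketch of what I mean - the light check endpoint below does not
exist today, its path and response shape are invented for illustration, and
only the facts endpoint is the real one - the client side could look roughly
like this:

  require 'net/http'
  require 'json'
  require 'uri'

  foreman = URI('https://foreman.example.com')   # placeholder URL
  mac     = 'aa:bb:cc:dd:ee:ff'                  # placeholder MAC

  # 1) cheap "am I registered?" probe - hypothetical endpoint, safe to retry often
  check = Net::HTTP.get_response(foreman + "/api/v2/discovered_hosts/registered?mac=#{mac}")
  registered = check.is_a?(Net::HTTPSuccess) && JSON.parse(check.body)['registered']

  unless registered
    # 2) only now pay for the heavy fact upload (this endpoint exists today)
    req = Net::HTTP::Post.new(foreman + '/api/v2/discovered_hosts/facts',
                              'Content-Type' => 'application/json')
    req.body = { 'facts' => { 'discovery_bootif' => mac } }.to_json  # real clients send the full Facter fact set
    Net::HTTP.start(foreman.host, foreman.port, :use_ssl => true) { |http| http.request(req) }
  end

The light call would only be a cheap lookup on the MAC, so hosts hammering it
while they wait would cost far less than re-running a full fact import each
time.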

Thanks!
Greg



Re: [foreman-users] Lots of "Mysql2::Error: Deadlock found when trying to get lock" under increased load

2017-09-22 Thread Lukas Zapletal
Hey, you are absolutely right that this is a huge design gap in
discovery. We are tracking a refactor ticket to redesign how
discovered hosts are stored, but this is a complete change of how
discovered hosts are provisioned (you would not be able to use the
New Hosts screen, for example). I think this change will happen as soon
as we redesign the new host form to be a session-full wizard.

A workaround could be a setting that would attempt to delete the existing
host when a new one is discovered, but this would be a very dangerous
thing (security related); I am not sure that is feasible even as an
opt-in.

In the past, we have seen these deadlocks (on fact_name or fact_value)
because these are very busy tables - discovery, facter/ENC and other
plugins (katello rhsm, openscap, ansible...) are all writing there or
changing data. I am unable to tell from the info you provided what is
going on - you need to dig deeper.

One more idea - we have seen similar deadlocks (but on different tables)
when a background (cron) job we ship by default attempts to delete old
reports. Can you check if there is any cronjob or any other process
doing some management of facts? Even deleting a lot of data can block
all updates for a long time (minutes to hours). Perhaps try to disable
all foreman jobs and re-test.

LZ

On Thu, Sep 21, 2017 at 2:27 AM, 'Konstantin Orekhov' via Foreman
users  wrote:
>
> On Wednesday, September 20, 2017 at 3:55:43 AM UTC-7, Lukas Zapletal wrote:
>>
>> A MAC address can only exist once, if you already have a
>> (managed/unmanaged) host and you try to discover a host with same MAC,
>> you will get error. Depending on Foreman discovery it is either 422 or
>> "Host already exists":
>>
>> https://github.com/theforeman/foreman_discovery/commit/210f143bc85c58caeb67e8bf9a5cc2edbe764683
>
>
> Hmm, one generic question on this - according to above logic, if my managed
> host had crashed, say because it lost its HW RAID controller, for example,
> so it can't boot off the disk anymore thus resulting in PXE boot (given that
> BIOS boot order is set that way), correct?
> Now, by default, Foreman default pxeconfig file makes a system to boot off
> its disk, which in this particular situation will result in endless loop
> until some external (to Foreman) monitoring detects a system failure, then a
> human gets on a console and real troubleshooting starts only then.
> That does not scale beyond a 100 systems or so. For this reason in our
> current setup where we *don't* use Foreman for OS provisioning but only for
> system discovery, I've updated the default pxeconfig to always load a
> discovery OS. This covers both a new systems and a crashed system scenario I
> described above. Each of discovered hosts is reported to a higher layer of
> orchestration on a after_commit event and that orchestration handles OS
> provisioning on its own so the discovered system never ends up in managed
> hosts in Foreman. Once OS provisioning is done, higher layer comes and
> deletes a host it just provisioned from discovered hosts. If orchestration
> detects that a hook call from Foreman reports a system that was previously
> provisioned, such system is automatically marked "maintenance" and HW
> diagnostics auto-started. Based on the result of that, orchestration will
> start either a HW replacement flow or a new problem troubleshooting starts.
> As you can see, humans are only involved very late in a process and only if
> auto-remediation is not possible (HW component failed, unknown signature
> detected). Otherwise, at large scale environments it is just impossible to
> attend to each of failed system individually. Such automation flow is
> allowing us to save hundreds of man-hours, as you can imagine.
> Now, with that in mind, I was thinking of moving actual OS provisioning
> tasks to Foreman as well. However, if crashed system would never be allowed
> to re-register (get discovered) because it is already managed by Foreman,
> the above flow is just not going to work anymore and I'd have re-think all
> flows. Are there specific reasons why this in place? I understand that this
> is how it is implemented now, but is there a bigger idea behind that? If so,
> what is it? Also, if you take my example of flows stitching for a complete
> system lifecycle management, what would you suggest we could do differently
> to allow Foreman to be a system that we use for both discovery and OS
> provisioning?
>
> Another thing (not as generic as above, but actually very applicable to my
> current issue) - if a client system is not allowed to register and given 422
> error, for example, it keeps trying to register resulting in huge amount of
> work. This is also a gap, IMHO - discovery plug-in needs to do this
> differently somehow so rejected systems do not take away Foreman resources
> (see below for actual numbers of such attempts in one of my cluster).
>
>>
>> Anyway you wrote you have deadlocks, but in the log snippet I do see
>> that you 

Re: [foreman-users] Lots of "Mysql2::Error: Deadlock found when trying to get lock" under increased load

2017-09-20 Thread 'Konstantin Orekhov' via Foreman users

On Wednesday, September 20, 2017 at 3:55:43 AM UTC-7, Lukas Zapletal wrote:
>
> A MAC address can only exist once, if you already have a 
> (managed/unmanaged) host and you try to discover a host with same MAC, 
> you will get error. Depending on Foreman discovery it is either 422 or 
> "Host already exists": 
>
> https://github.com/theforeman/foreman_discovery/commit/210f143bc85c58caeb67e8bf9a5cc2edbe764683
>  
>

Hmm, one generic question on this - according to the above logic, if my managed 
host crashed, say because it lost its HW RAID controller, so that it can't boot 
off the disk anymore, it would end up PXE booting (given that the BIOS boot 
order is set that way), correct?
Now, by default, Foreman's default pxeconfig file makes a system boot off its 
disk, which in this particular situation will result in an endless loop until 
some external (to Foreman) monitoring detects the system failure, a human gets 
on a console, and only then does real troubleshooting start.
That does not scale beyond 100 systems or so. For this reason, in our 
current setup where we *don't* use Foreman for OS provisioning but only for 
system discovery, I've updated the default pxeconfig to always load a 
discovery OS. This covers both the new-system and the crashed-system scenarios 
I described above. Each discovered host is reported to a higher layer of 
orchestration on an after_commit event, and that orchestration handles OS 
provisioning on its own, so the discovered system never ends up in managed 
hosts in Foreman. Once OS provisioning is done, the higher layer comes back and 
deletes the host it just provisioned from the discovered hosts. If orchestration 
detects that a hook call from Foreman reports a system that was previously 
provisioned, such a system is automatically marked "maintenance" and HW 
diagnostics are auto-started. Based on the result of that, orchestration will 
start either a HW replacement flow or troubleshooting of a new problem. 
As you can see, humans are only involved very late in the process and only if 
auto-remediation is not possible (HW component failed, unknown signature 
detected). Otherwise, in large-scale environments it is just impossible to 
attend to each failed system individually. Such an automation flow saves us 
hundreds of man-hours, as you can imagine.
Now, with that in mind, I was thinking of moving actual OS provisioning 
tasks to Foreman as well. However, if a crashed system is never allowed 
to re-register (get discovered) because it is already managed by Foreman, 
the above flow is just not going to work anymore and I'd have to re-think all 
flows. Are there specific reasons why this is in place? I understand that this 
is how it is implemented now, but is there a bigger idea behind it? If 
so, what is it? Also, if you take my example of stitching flows together for 
complete system lifecycle management, what would you suggest we could do 
differently to allow Foreman to be the system we use for both discovery 
and OS provisioning?

Another thing (not as generic as above, but actually very applicable to my 
current issue) - if a client system is not allowed to register and is given a 
422 error, for example, it keeps trying to register, resulting in a huge 
amount of work. This is also a gap, IMHO - the discovery plug-in needs to do 
this differently somehow so rejected systems do not take away Foreman 
resources (see below for the actual numbers of such attempts in one of my 
clusters).
 

> Anyway you wrote you have deadlocks, but in the log snippet I do see 
> that you have host discovery at rate 1-2 imports per minute. This 
> cannot block anything, this is quite slow rate. I don't understand, 
> can you pastebin log snippet from the peak time when you have these 
> deadlocks? 
>

After more digging I've done since this issue was reported to me, it does 
not look load-related to me. Even with a low number of registrations, I 
see a high rate of deadlocks. I took another Foreman cluster (3 active 
nodes as well) and see the following activity as it pertains to system 
discovery (since 3:30am this morning):

[root@spc01 ~]# grep "/api/v2/discovered_hosts/facts" 
/var/log/foreman/production.log | wc -l
282

[root@spc02 ~]# grep "/api/v2/discovered_hosts/facts" 
/var/log/foreman/production.log | wc -l
2278

[root@spc03 ~]# grep "/api/v2/discovered_hosts/facts" 
/var/log/foreman/production.log | wc -l
143

These are the numbers of attempts rejected (all of them are 422s):

[root@spc01 ~]# grep Entity /var/log/foreman/production.log | wc -l
110

[root@spc02 ~]# grep Entity /var/log/foreman/production.log | wc -l
2182

[root@spc03 ~]# grep Entity /var/log/foreman/production.log | wc -l
57

A number of deadlocks:

[root@spc01 ~]# grep -i deadlock /var/log/foreman/production.log | wc -l
59

[root@spc02 ~]# grep -i deadlock /var/log/foreman/production.log | wc -l
31

[root@spc03 ~]# grep -i deadlock /var/log/foreman/production.log | wc -l
30

Actual deadlock messages are here - 

Re: [foreman-users] Lots of "Mysql2::Error: Deadlock found when trying to get lock" under increased load

2017-09-20 Thread Lukas Zapletal
A MAC address can only exist once; if you already have a
(managed/unmanaged) host and you try to discover a host with the same MAC,
you will get an error. Depending on the Foreman discovery version it is
either a 422 or "Host already exists":
https://github.com/theforeman/foreman_discovery/commit/210f143bc85c58caeb67e8bf9a5cc2edbe764683

Anyway, you wrote that you have deadlocks, but in the log snippet I see
host discovery at a rate of 1-2 imports per minute. This cannot block
anything; this is quite a slow rate. I don't understand - can you pastebin
a log snippet from the peak time when you have these deadlocks?



On Tue, Sep 19, 2017 at 10:03 PM, 'Konstantin Orekhov' via Foreman
users  wrote:
> After I got the debug output, I've deleted this host from Foreman and on its
> next attempt it got registered perfectly fine - no issues with interfaces or
> anything anymore:
>
> (on a client side)
>
> Discovered by URL: https://spc.vip
> Registering host with Foreman (https://spc.vip)
> Response from Foreman 201: {"id":447371,"name":"mac3cfdfe52252c" 
>
> (on Foreman side):
> 2017-09-19 12:39:34 7ca37aca [app] [I] Started DELETE
> "/discovered_hosts/mac3cfdfe52252c" for 10.102.141.20 at 2017-09-19 12:39:34
> -0700
> 2017-09-19 12:39:34 7ca37aca [app] [I]   Parameters:
> {"authenticity_token"=>"", "id"=>"mac3cfdfe52252c"}
>
> 2017-09-19 12:40:04 1a346c39 [app] [I] Started POST
> "/api/v2/discovered_hosts/facts" for 10.102.141.20 at 2017-09-19 12:40:04
> -0700
> 2017-09-19 12:40:04 1a346c39 [app] [I] Processing by
> Api::V2::DiscoveredHostsController#facts as JSON
> 2017-09-19 12:40:04 1a346c39 [app] [I]   Parameters: {"facts"=>"[FILTERED]",
> "apiv"=>"v2", "discovered_host"=>{"facts"=>"[FILTERED]"}}
> 2017-09-19 12:40:06 1a346c39 [audit] [I] [mac3cfdfe52252c] deleted 0
> (1694.6ms)
> 2017-09-19 12:40:06 1a346c39 [audit] [I] [mac3cfdfe52252c] updated 0 (2.6ms)
> 2017-09-19 12:40:07 1a346c39 [audit] [I] [mac3cfdfe52252c] added 385
> (1637.5ms)
> 2017-09-19 12:40:07 1a346c39 [app] [I] Import facts for 'mac3cfdfe52252c'
> completed. Added: 385, Updated: 0, Deleted 0 facts
>
> It would be nice to figure out what's causing this in a first place - I do
> see a lot of those "Unprocessable Entity" messages logged.
> Thanks!
>



-- 
Later,
  Lukas @lzap Zapletal



Re: [foreman-users] Lots of "Mysql2::Error: Deadlock found when trying to get lock" under increased load

2017-09-20 Thread Lukas Zapletal
Well no, the biggest update was for 1.14 there:

http://projects.theforeman.org/issues/9016

That focused on memory consumption though; there was a little speedup but
nothing big.

On Tue, Sep 19, 2017 at 10:05 PM, 'Konstantin Orekhov' via Foreman
users  wrote:
> BTW, Lukas, you mentioned that some improvements were made in 1.14. I am
> running 1.14.1 and 1.14.3.
> Did you mean 1.15 maybe? Should I even consider an upgrade to help resolve
> this situation?
>



-- 
Later,
  Lukas @lzap Zapletal



Re: [foreman-users] Lots of "Mysql2::Error: Deadlock found when trying to get lock" under increased load

2017-09-19 Thread 'Konstantin Orekhov' via Foreman users
BTW, Lukas, you mentioned that some improvements were made in 1.14. I am 
running 1.14.1 and 1.14.3.
Did you mean 1.15 maybe? Should I even consider an upgrade to help resolve 
this situation?



Re: [foreman-users] Lots of "Mysql2::Error: Deadlock found when trying to get lock" under increased load

2017-09-19 Thread 'Konstantin Orekhov' via Foreman users
After I got the debug output, I deleted this host from Foreman, and on 
its next attempt it registered perfectly fine - no issues with 
interfaces or anything anymore:

(on a client side)

Discovered by URL: https://spc.vip
Registering host with Foreman (https://spc.vip)
Response from Foreman 201: {"id":447371,"name":"mac3cfdfe52252c" 

(on Foreman side):
2017-09-19 12:39:34 7ca37aca [app] [I] Started DELETE 
"/discovered_hosts/mac3cfdfe52252c" for 10.102.141.20 at 2017-09-19 12:39:34 
-0700
2017-09-19 12:39:34 7ca37aca [app] [I]   Parameters: 
{"authenticity_token"=>"", "id"=>"mac3cfdfe52252c"}

2017-09-19 12:40:04 1a346c39 [app] [I] Started POST 
"/api/v2/discovered_hosts/facts" for 10.102.141.20 at 2017-09-19 12:40:04 -0700
2017-09-19 12:40:04 1a346c39 [app] [I] Processing by 
Api::V2::DiscoveredHostsController#facts as JSON
2017-09-19 12:40:04 1a346c39 [app] [I]   Parameters: {"facts"=>"[FILTERED]", 
"apiv"=>"v2", "discovered_host"=>{"facts"=>"[FILTERED]"}}
2017-09-19 12:40:06 1a346c39 [audit] [I] [mac3cfdfe52252c] deleted 0 (1694.6ms)
2017-09-19 12:40:06 1a346c39 [audit] [I] [mac3cfdfe52252c] updated 0 (2.6ms)
2017-09-19 12:40:07 1a346c39 [audit] [I] [mac3cfdfe52252c] added 385 (1637.5ms)
2017-09-19 12:40:07 1a346c39 [app] [I] Import facts for 'mac3cfdfe52252c' 
completed. Added: 385, Updated: 0, Deleted 0 facts

It would be nice to figure out what's causing this in the first place - I do see 
a lot of those "Unprocessable Entity" messages logged.
Thanks!



Re: [foreman-users] Lots of "Mysql2::Error: Deadlock found when trying to get lock" under increased load

2017-09-19 Thread 'Konstantin Orekhov' via Foreman users
Here you go, Lukas (just one host that can't register and keeps on 
retrying):

2017-09-19 11:45:55 5de80a14 [audit] [I] [mac3cfdfe52252c] deleted 0 
(1898.9ms)
2017-09-19 11:45:56 5de80a14 [audit] [I] [mac3cfdfe52252c] updated 0 
(575.8ms)
2017-09-19 11:45:56 5de80a14 [audit] [I] [mac3cfdfe52252c] added 0 (3.2ms)
2017-09-19 11:45:56 5de80a14 [app] [I] Import facts for 'mac3cfdfe52252c' 
completed. Added: 0, Updated: 0, Deleted 0 facts
2017-09-19 11:46:28 1c0d90c8 [audit] [I] [mac3cfdfe52252c] deleted 0 
(1787.2ms)
2017-09-19 11:46:29 1c0d90c8 [audit] [I] [mac3cfdfe52252c] updated 0 
(495.7ms)
2017-09-19 11:46:29 1c0d90c8 [audit] [I] [mac3cfdfe52252c] added 0 (2.9ms)
2017-09-19 11:46:29 1c0d90c8 [app] [I] Import facts for 'mac3cfdfe52252c' 
completed. Added: 0, Updated: 0, Deleted 0 facts
2017-09-19 11:47:02 7de14afb [audit] [I] [mac3cfdfe52252c] deleted 0 
(1705.5ms)
2017-09-19 11:47:02 7de14afb [audit] [I] [mac3cfdfe52252c] updated 0 
(612.4ms)
2017-09-19 11:47:02 7de14afb [audit] [I] [mac3cfdfe52252c] added 0 (4.2ms)
2017-09-19 11:47:02 7de14afb [app] [I] Import facts for 'mac3cfdfe52252c' 
completed. Added: 0, Updated: 0, Deleted 0 facts
2017-09-19 11:47:35 51585eea [audit] [I] [mac3cfdfe52252c] deleted 0 
(1755.6ms)
2017-09-19 11:47:36 51585eea [audit] [I] [mac3cfdfe52252c] updated 0 
(1187.4ms)
2017-09-19 11:47:36 51585eea [audit] [I] [mac3cfdfe52252c] added 0 (5.3ms)
2017-09-19 11:47:36 51585eea [app] [I] Import facts for 'mac3cfdfe52252c' 
completed. Added: 0, Updated: 0, Deleted 0 facts
2017-09-19 11:48:09 4b643a56 [audit] [I] [mac3cfdfe52252c] deleted 0 
(1895.9ms)
2017-09-19 11:48:10 4b643a56 [audit] [I] [mac3cfdfe52252c] updated 0 
(536.7ms)
2017-09-19 11:48:10 4b643a56 [audit] [I] [mac3cfdfe52252c] added 0 (4.4ms)
2017-09-19 11:48:10 4b643a56 [app] [I] Import facts for 'mac3cfdfe52252c' 
completed. Added: 0, Updated: 0, Deleted 0 facts
2017-09-19 11:48:48 8bc4666b [audit] [I] [mac3cfdfe52252c] deleted 0 
(1653.8ms)
2017-09-19 11:48:48 8bc4666b [audit] [I] [mac3cfdfe52252c] updated 0 
(708.9ms)
2017-09-19 11:48:48 8bc4666b [audit] [I] [mac3cfdfe52252c] added 0 (4.1ms)
2017-09-19 11:48:48 8bc4666b [app] [I] Import facts for 'mac3cfdfe52252c' 
completed. Added: 0, Updated: 0, Deleted 0 facts
2017-09-19 11:49:21 c58afee7 [audit] [I] [mac3cfdfe52252c] deleted 0 
(1739.9ms)
2017-09-19 11:49:22 c58afee7 [audit] [I] [mac3cfdfe52252c] updated 0 
(862.5ms)
2017-09-19 11:49:22 c58afee7 [audit] [I] [mac3cfdfe52252c] added 0 (3.1ms)
2017-09-19 11:49:22 c58afee7 [app] [I] Import facts for 'mac3cfdfe52252c' 
completed. Added: 0, Updated: 0, Deleted 0 facts

On the client side (mac3cfdfe52252c), in the foreman-discovery log, I see these 
messages:

Discovered by URL: https://
Registering host with Foreman (https://)
Response from Foreman 422: {"message":"Validation failed: Interfaces some 
interfaces are invalid"}

Over and over again.

On Foreman side:

# grep 323e78f4 /var/log/foreman/production.log
2017-09-19 11:51:00 323e78f4 [app] [I] Started POST 
"/api/v2/discovered_hosts/facts" for 10.102.141.20 at 2017-09-19 11:51:00 -0700
2017-09-19 11:51:00 323e78f4 [app] [I] Processing by 
Api::V2::DiscoveredHostsController#facts as JSON
2017-09-19 11:51:00 323e78f4 [app] [I]   Parameters: {"facts"=>"[FILTERED]", 
"apiv"=>"v2", "discovered_host"=>{"facts"=>"[FILTERED]"}}
2017-09-19 11:51:02 323e78f4 [audit] [I] [mac3cfdfe52252c] deleted 0 (1718.3ms)
2017-09-19 11:51:03 323e78f4 [audit] [I] [mac3cfdfe52252c] updated 0 (613.2ms)
2017-09-19 11:51:03 323e78f4 [audit] [I] [mac3cfdfe52252c] added 0 (3.6ms)
2017-09-19 11:51:03 323e78f4 [app] [I] Import facts for 'mac3cfdfe52252c' 
completed. Added: 0, Updated: 0, Deleted 0 facts
2017-09-19 11:51:03 323e78f4 [app] [W] Subnet could not be detected for 
10.212.36.110
2017-09-19 11:51:03 323e78f4 [app] [W] Host discovery failed, facts: 

2017-09-19 11:51:03 323e78f4 [app] [I] Completed 422 Unprocessable Entity in 
2823ms (Views: 0.7ms | ActiveRecord: 1923.9ms)

I'm going to enable debug-level logging to see if I can get more data on why some 
interfaces are considered invalid here. If you know, please let me know.
Also, if you need to see all the facts sent from a client system, please let me 
know too.

Thanks!



Re: [foreman-users] Lots of "Mysql2::Error: Deadlock found when trying to get lock" under increased load

2017-09-19 Thread Lukas Zapletal
I would rather fix the importing code to be faster than go async; that
is the last resort.

Konstantin, thanks for the analysis. Our import code is indeed slow; we
improved it a bit in 1.14. Note we mostly test this on PostgreSQL. For
each import, there is a log line at INFO level about how much time was
spent in each phase of the import (delete, add, update). Can you share the
numbers there?

What happens *I think* is that by default every node tries to update
facts every 5 minutes. At that scale, you need to increase this to a
more reasonable value.

When Foreman is busy, these requests can stack up. We are not using
transactions, so imports can fail, leaving incorrect records.

LZ

On Tue, Sep 19, 2017 at 8:12 AM, Ohad Levy  wrote:
>
>
> On Fri, Sep 15, 2017 at 8:56 PM, 'Konstantin Orekhov' via Foreman users
>  wrote:
>>
>>
>>>
>>> what kind of load do you have? Puppet? Facter? Is that ENC? Something
>>> else?
>>>
>>> Can you tell which requests are slow from logs or monitoring?
>>>
>>
>> Yes, I should have mentioned that - there's very little puppet and ENC
>> work done by this cluster at this point (more is coming soon though). Host
>> discovery is by far the largest workload - 7600 discovered systems at this
>> point. The last spike that I saw the impact to overall flows was when
>> 300-400 systems were trying to register at the same time. Because of the
>> deadlocks, about 200-300 systems could not register repeatedly and had to
>> keep retrying for a rather long time.
>> Rather often these registration attempts would end up creating either
>> duplicate entries with the same "mac" but different IDs in a DB or an
>> "empty" discovery host entry. Both of these would prevent a system
>> successfully register unless they are removed (I had to write a little
>> script that runs from the cron to do so). Here are the examples of an
>> "empty" record and duplicate ones (as they get deleted):
>>
>
> Lukas - how about we change discovery to be async? e.g. import all new
> discovered systems into active job and than process than one / multiple at a
> time? I assume this would require a image change too (so it knows when the
> discovery"job" is done)
>>
>> [... JSON examples of the empty and duplicate records snipped; they appear in full in Konstantin's original message further down the thread ...]

Re: [foreman-users] Lots of "Mysql2::Error: Deadlock found when trying to get lock" under increased load

2017-09-19 Thread Ohad Levy
On Fri, Sep 15, 2017 at 8:56 PM, 'Konstantin Orekhov' via Foreman users <
foreman-users@googlegroups.com> wrote:

>
>
>> what kind of load do you have? Puppet? Facter? Is that ENC? Something
>> else?
>>
>> Can you tell which requests are slow from logs or monitoring?
>>
>>
> Yes, I should have mentioned that - there's very little puppet and ENC
> work done by this cluster at this point (more is coming soon though). Host
> discovery is by far the largest workload - 7600 discovered systems at this
> point. The last spike that I saw the impact to overall flows was when
> 300-400 systems were trying to register at the same time. Because of the
> deadlocks, about 200-300 systems could not register repeatedly and had to
> keep retrying for a rather long time.
> Rather often these registration attempts would end up creating either
> duplicate entries with the same "mac" but different IDs in a DB or an
> "empty" discovery host entry. Both of these would prevent a system
> successfully register unless they are removed (I had to write a little
> script that runs from the cron to do so). Here are the examples of an
> "empty" record and duplicate ones (as they get deleted):
>
>
Lukas - how about we change discovery to be async? e.g. import all newly
discovered systems into an active job and then process them one / multiple at
a time? I assume this would require an image change too (so it knows when
the discovery "job" is done)
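
Roughly something like this is what I have in mind - only a sketch: Foreman
does not ship a job backend for this today, and the import entry point name
below is an assumption (it differs between discovery versions):

  class ImportDiscoveredHostJob < ActiveJob::Base
    queue_as :discovery

    def perform(facts)
      # run the same import the facts API does today, just one payload at a time,
      # so bursts of registrations are serialized instead of deadlocking each other;
      # Host::Discovered.import_host is assumed here
      Host::Discovered.import_host(facts)
    end
  end

  # in the API controller, instead of importing inline:
  #   ImportDiscoveredHostJob.perform_later(facts)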

> [... JSON examples of the empty and duplicate records snipped; they appear in full in the original message below ...]

Re: [foreman-users] Lots of "Mysql2::Error: Deadlock found when trying to get lock" under increased load

2017-09-18 Thread 'Konstantin Orekhov' via Foreman users
This is how it looks in the WebUI:

Empties: [screenshot not included in the archive]

And most of those empties would also have duplicates in this form: [screenshot not included in the archive]



Re: [foreman-users] Lots of "Mysql2::Error: Deadlock found when trying to get lock" under increased load

2017-09-15 Thread 'Konstantin Orekhov' via Foreman users


>
> what kind of load do you have? Puppet? Facter? Is that ENC? Something 
> else? 
>
> Can you tell which requests are slow from logs or monitoring? 
>
>
Yes, I should have mentioned that - there's very little puppet and ENC work 
done by this cluster at this point (more is coming soon though). Host 
discovery is by far the largest workload - 7600 discovered systems at this 
point. The last spike where I saw an impact to the overall flows was when 
300-400 systems were trying to register at the same time. Because of the 
deadlocks, about 200-300 systems could not register repeatedly and had to 
keep retrying for a rather long time.
Rather often these registration attempts would end up creating either 
duplicate entries with the same "mac" but different IDs in the DB, or an 
"empty" discovered host entry. Both of these would prevent a system from 
successfully registering unless they are removed (I had to write a little 
script that runs from cron to do so - a rough sketch of that kind of cleanup 
follows after the examples below). Here are the examples of an 
"empty" record and duplicate ones (as they get deleted):

{
"id": 437923,
"name": "mac90e2bae6cc70",
"last_compile": null,
"last_report": null,
"updated_at": "2017-08-22T07:08:54.000Z",
"created_at": "2017-08-22T07:08:54.000Z",
"root_pass": "",
"architecture_id": null,
"operatingsystem_id": null,
"environment_id": null,
"ptable_id": null,
"medium_id": null,
"build": false,
"comment": null,
"disk": null,
"installed_at": null,
"model_id": null,
"hostgroup_id": null,
"owner_id": null,
"owner_type": null,
"enabled": true,
"puppet_ca_proxy_id": null,
"managed": false,
"use_image": null,
"image_file": null,
"uuid": null,
"compute_resource_id": null,
"puppet_proxy_id": null,
"certname": null,
"image_id": null,
"organization_id": null,
"location_id": null,
"otp": null,
"realm_id": null,
"compute_profile_id": null,
"provision_method": null,
"grub_pass": "",
"global_status": 0,
"lookup_value_matcher": null,
"discovery_rule_id": null,
"salt_proxy_id": null,
"salt_environment_id": null,
"pxe_loader": null
}

Duplicates (usually the later duplicate would be an empty one as well, but 
not all the time):

{
  "id": 430090,
  "name": "mac3417ebe3f8f1",
  "last_compile": null,
  "last_report": "2017-09-14T19:47:55.000Z",
  "updated_at": "2017-09-14T19:47:57.000Z",
  "created_at": "2017-03-08T20:24:05.000Z",
  "root_pass": "",
  "architecture_id": null,
  "operatingsystem_id": null,
  "environment_id": null,
  "ptable_id": null,
  "medium_id": null,
  "build": false,
  "comment": null,
  "disk": null,
  "installed_at": null,
  "model_id": 3,
  "hostgroup_id": null,
  "owner_id": 10,
  "owner_type": "User",
  "enabled": true,
  "puppet_ca_proxy_id": null,
  "managed": false,
  "use_image": null,
  "image_file": null,
  "uuid": null,
  "compute_resource_id": null,
  "puppet_proxy_id": null,
  "certname": null,
  "image_id": null,
  "organization_id": null,
  "location_id": null,
  "otp": null,
  "realm_id": null,
  "compute_profile_id": null,
  "provision_method": null,
  "grub_pass": "",
  "global_status": 0,
  "lookup_value_matcher": null,
  "discovery_rule_id": null,
  "salt_proxy_id": null,
  "salt_environment_id": null,
  "pxe_loader": null
}
{
  "id": 438146,
  "name": "mac3417ebe3f8f1",
  "last_compile": null,
  "last_report": "2017-09-11T08:58:05.000Z",
  "updated_at": "2017-09-11T08:58:07.000Z",
  "created_at": "2017-08-24T19:44:23.000Z",
  "root_pass": "",
  "architecture_id": null,
  "operatingsystem_id": null,
  "environment_id": null,
  "ptable_id": null,
  "medium_id": null,
  "build": false,
  "comment": null,
  "disk": null,
  "installed_at": null,
  "model_id": null,
  "hostgroup_id": null,
  "owner_id": null,
  "owner_type": null,
  "enabled": true,
  "puppet_ca_proxy_id": null,
  "managed": false,
  "use_image": null,
  "image_file": null,
  "uuid": null,
  "compute_resource_id": null,
  "puppet_proxy_id": null,
  "certname": null,
  "image_id": null,
  "organization_id": null,
  "location_id": null,
  "otp": null,
  "realm_id": null,
  "compute_profile_id": null,
  "provision_method": null,
  "grub_pass": "",
  "global_status": 0,
  "lookup_value_matcher": null,
  "discovery_rule_id": null,
  "salt_proxy_id": null,
  "salt_environment_id": null,
  "pxe_loader": null
}
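
As mentioned above, the cleanup boils down to something like this - just a
sketch, not the exact script - meant to be run from a foreman-rake console or
wrapped in a cron job. The model and column names follow the JSON dumps above,
and the definition of "empty" (no facts reported, no model) may need adjusting:

  # sketch only: remove "empty" discovered hosts and keep the newest of any duplicates
  User.current = User.anonymous_admin   # assumed available; some destroy paths expect a current user

  # "empty" entries: never reported facts and carry no hardware model
  Host::Discovered.where(:last_report => nil, :model_id => nil).find_each do |h|
    puts "removing empty discovered host #{h.name} (id=#{h.id})"
    h.destroy
  end

  # duplicates: the same name registered more than once - drop all but the newest record
  Host::Discovered.group(:name).having('COUNT(*) > 1').pluck(:name).each do |name|
    Host::Discovered.where(:name => name).order(:created_at => :desc).offset(1).each(&:destroy)
  end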

I can't tell if any queries are slow - can you remind me how to do that?
Thanks!



Re: [foreman-users] Lots of "Mysql2::Error: Deadlock found when trying to get lock" under increased load

2017-09-15 Thread Lukas Zapletal
Hey,

what kind of load do you have? Puppet? Facter? Is that ENC? Something else?

Can you tell which requests are slow from logs or monitoring?
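
If you have nothing in place, one quick-and-dirty way (just a sketch) is to
echo every SQL statement with its timing from a foreman-rake console; the
MariaDB slow query log would give you the same picture server-wide:

  require 'logger'
  require 'benchmark'

  # every statement ActiveRecord issues from this console is now printed with its duration in ms
  ActiveRecord::Base.logger = Logger.new(STDOUT)

  # reproduce a suspect query by hand and time it, e.g. for the host id from the deadlock message you pasted
  puts Benchmark.measure { FactValue.where(:host_id => 446074).count }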

LZ

On Fri, Sep 15, 2017 at 3:35 AM, 'Konstantin Orekhov' via Foreman
users  wrote:
>
> Hi, all!
>
> Under increased load (which comes in spikes), I noticed lots of mysql
> deadlock errors resulting in failed transactions and incorrectly discovered
> systems (duplicate and/or empty entries in discovered_hosts I reported in
> this group some time ago, just can't find those posts for some reason).
>
> Anyway, these are the type of messages I receive:
>
> 2017-09-14 15:01:13 173c1d40 [app] [E] Fact processor37 could not be
> imported because of Mysql2::Error: Deadlock found when trying to get lock;
> try restarting transaction: SELECT  1 AS one FROM `fact_values` WHERE
> (`fact_values`.`fact_name_id` = BINARY 248 AND `fact_values`.`host_id` =
> 446074) LIMIT 1
> 2017-09-14 15:01:14 173c1d40 [audit] [I] [mac90e2bae93da0] added 353
> (2693.0ms)
> 2017-09-14 15:01:14 173c1d40 [app] [W] Error during fact import for
> mac90e2bae93da0
>  | ActiveRecord::StatementInvalid: Mysql2::Error: Deadlock found when trying
> to get lock; try restarting transaction: SELECT  1 AS one FROM `fact_values`
> WHERE (`fact_values`.`fact_name_id` = BINARY 248 AND `fact_values`.`host_id`
> = 446074) LIMIT 1
>  |
> /opt/theforeman/tfm/root/usr/share/gems/gems/mysql2-0.4.5/lib/mysql2/client.rb:120:in
> `_query'
>  |
> /opt/theforeman/tfm/root/usr/share/gems/gems/mysql2-0.4.5/lib/mysql2/client.rb:120:in
> `block in query'
>  |
> /opt/theforeman/tfm/root/usr/share/gems/gems/mysql2-0.4.5/lib/mysql2/client.rb:119:in
> `handle_interrupt'
>  |
> /opt/theforeman/tfm/root/usr/share/gems/gems/mysql2-0.4.5/lib/mysql2/client.rb:119:in
> `query'
> 
>
> I run an active/active cluster of 3 1.14.x Foreman VMs with a replicated
> MariaDB MySQL backend.
>
> I saw a couple of people ask the same questions in the IRC chat, but I
> could not find any responses to them over there.
>
> Does anyone have any suggestions/recommendations? Is anything like
> https://github.com/qertoip/transaction_retry planned to be used instead
> of failing transactions in Foreman?
>
> Thanks!
>



-- 
Later,
  Lukas @lzap Zapletal
