Hello group:
We eventually performed the test John suggested and caught the "thief":
-> virtual.rb
We didn't even try to analyze why it hangs the machine. Since this fact 
is not used in our recipes, we simply dropped it.
Thanks for your help.
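
For anyone who wants to repeat the per-fact test John suggested, here is a 
rough sketch of how it could be scripted. It is not the exact procedure we 
ran. The function names (parse_fact_names, stress_fact) and the run count 
are made up for illustration, and it assumes facter prints each fact as 
"name => value" on a single line, so multi-line values would need extra 
handling.

```python
import subprocess
import sys


def parse_fact_names(facter_output):
    """Extract fact names from 'name => value' lines (assumed output format)."""
    names = []
    for line in facter_output.splitlines():
        name, sep, _value = line.partition(" => ")
        if sep and name.strip():
            names.append(name.strip())
    return names


def stress_fact(name, runs, run_cmd=subprocess.call, log=sys.stderr):
    """Evaluate one fact repeatedly and return how many runs exited non-zero.

    The fact name is logged and flushed *before* each run, so that if the
    machine hangs, the last logged line names the fact under test.
    """
    failures = 0
    for i in range(1, runs + 1):
        log.write("testing %s (run %d of %d)\n" % (name, i, runs))
        log.flush()
        if run_cmd(["facter", "-p", name]) != 0:
            failures += 1
    return failures


# Example use on an affected host (hypothetical run count of 50):
#   listing = subprocess.check_output(["facter", "-p"]).decode("utf-8", "replace")
#   for fact in parse_fact_names(listing):
#       stress_fact(fact, runs=50)
```

Because the failure here is a hang of the whole machine rather than a 
non-zero exit code, the useful signal is the last "testing ..." line that 
was written. It is probably safest to watch that output from another host 
(for example over SSH) rather than rely on a local file surviving the crash.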


On Wednesday, November 28, 2012 5:41:00 PM UTC+1, Montse Seisdedos wrote:
>
> Hello John,
> Your assumption is correct.
> I cannot run the facter loop because we are in a production environment.
> Every time I run puppet on these machines, I make sure I can reach the 
> IPMI interface so that I can reboot the machine within a few minutes.
> Thanks for your help.
> Regards.
>
>
> 2012/11/28 jcbollinger <john.bo...@stjude.org>
>
>>
>>
>> On Wednesday, November 28, 2012 4:49:13 AM UTC-6, Mon wrote:
>>>
>>> Hello John,
>>> Thanks for your answer. I have opened an issue with my hardware 
>>> manufacturer, and I will do the same with my OS vendor.
>>> Anyway, I am pasting the strace listings in case someone can shed light 
>>> on it:
>>>
>>> server1:
>>>
>>> BIOS: American Megatrends Inc. 1.2       
>>> SYS: Supermicro X8SIE
>>> CPU: Intel(R) Core(TM) i3 CPU 550 @ 3.20GHz [4 cores]
>>> MEM:
>>>   SLOT0  2048 MB
>>>   SLOT1  2048 MB
>>>
>>>
>>> open("/usr/lib/ruby/1.8/facter/osfamily.rb", O_RDONLY|O_LARGEFILE) = 3
>>> close(3) = 0
>>> open("/usr/lib/ruby/1.8/facter/osfamily.rb", O_RDONLY|O_LARGEFILE) = 3
>>> fstat64(3, {st_mode=S_IFREG|0644, st_size=800, ...}) = 0
>>> mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7297000
>>> read(3, "# Fact: osfamily\n#\n# Purpose: Re"..., 4096) = 800
>>> ......CRASH 
>>>
>>>
>>> server2:
>>>
>>> BIOS: American Megatrends Inc. 1.2       
>>> SYS: Supermicro X8SIE
>>> CPU: Intel(R) Core(TM) i3 CPU 560 @ 3.33GHz [4 cores]
>>> MEM:
>>>   SLOT0  2048 MB
>>>   SLOT1  2048 MB
>>>
>>>
>>>
>>> stat64("/usr/sbin/dmidecode", {st_mode=S_IFREG|0755, st_size=48408, ...}) = 0
>>> pipe([3, 4]) = 0
>>> clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb74e5ba8) = 8709
>>> close(4) = 0
>>> fcntl64(3, F_GETFL) = 0 (flags O_RDONLY)
>>> fstat64(3, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
>>> mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb725e000
>>> _llseek(3, 0, 0xbf900930, SEEK_CUR) = -1 ESPIPE (Illegal seek)
>>> fstat64(3, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
>>> read(3, "# dmidecode 2.9\nSMBIOS 2.6 prese"..., 1024) = 1024
>>> read(3, "oot is supported\n\t\tBIOS boot spe"..., 1024) = 1024
>>> read(3, "tate: Safe\n\tThermal State: Safe\n"..., 1024) = 1024
>>> read(3, "Maximum Size: 128 KB\n\tSupported "..., 1024) = 1024
>>> read(3, "e 5, 28 bytes\nMemory Controller "..., 1024) = 1024
>>> read(3, " Installed\n\tError Status: OK\n\nHa"..., 1024) = 1024
>>> read(3, " type 8, 9 bytes\nPort Connector "..., 1024) = 1024
>>> read(3, "ternal Reference Designator: LPT"..., 1024) = 1024
>>> read(3, "nal Reference Designator: Not Sp"..., 1024) = 1024
>>> read(3, "nator: Not Specified\n\tExternal C"..., 1024) = 1024
>>> read(3, "or Type: None\n\tPort Type: Other\n"..., 1024) = 1024
>>> read(3, "ector Information\n\tInternal Refe"..., 1024) = 1024
>>> read(3, "\tLength: Short\n\tID: 1\n\tCharacter"..., 1024) = 1024
>>> read(3, "escriptor 5: POST error\n\tData Fo"..., 1024) = 1024
>>> read(3, "ype 19, 15 bytes\nMemory Array Ma"..., 1024) = 1024
>>> read(3, " Width: Unknown\n\tSize: No Module"..., 1024) = 1024
>>> read(3, "ry Device Mapped Address\n\tStarti"..., 1024) = 1024
>>> read(3, "on Handle: Not Provided\n\tTotal W"..., 1024) = 1024
>>> --- SIGCHLD (Child exited) @ 0 (0) ---
>>> read(3, "\n\nHandle 0x0039, DMI type 20, 19"..., 1024) = 1024
>>> read(3, "on-recoverable Threshold: 6\n\nHan"..., 1024) = 1024
>>> read(3, "UT OF SPEC>\n\tCooling Unit Group:"..., 1024) = 1024
>>> read(3, "ed: Yes\n\tHot Replaceable: No\n\tCo"..., 1024) = 669
>>> read(3, "", 1024) = 0
>>> close(3) = 0
>>> munmap(0xb725e000, 4096) = 0
>>> rt_sigaction(SIGHUP, {SIG_IGN}, {0xb77388f0, [HUP], SA_RESTART}, 8) = 0
>>> rt_sigaction(SIGQUIT, {SIG_IGN}, {0xb77388f0, [QUIT], SA_RESTART}, 8) = 0
>>> rt_sigaction(SIGINT, {SIG_IGN}, {0xb77388f0, [INT], SA_RESTART}, 8) = 0
>>> waitpid(8709, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0) = 8709
>>> rt_sigaction(SIGHUP, {0xb77388f0, [HUP], SA_RESTART}, {SIG_IGN}, 8) = 0
>>> rt_sigaction(SIGQUIT, {0xb77388f0, [QUIT], SA_RESTART}, {SIG_IGN}, 8) = 0
>>> rt_sigaction(SIGINT, {0xb77388f0, [INT], SA_RESTART}, {SIG_IGN}, 8) = 0
>>> ............
>>> sigprocmask(SIG_SETMASK, [], NULL) = 0
>>> sigprocmask(SIG_BLOCK, NULL, []) = 0
>>> sigprocmask(SIG_BLOCK, NULL, []) = 0
>>> sigprocmask(SIG_BLOCK, NULL, []) = 0
>>> sigprocmask(SIG_SETMASK, [], NULL) = 0
>>> sigprocmask(SIG_BLOCK, NULL, []) = 0
>>> sigprocmask(SIG_BLOCK, NULL, []) = 0
>>> sigprocmask(SIG_BLOCK, NULL, []) = 0
>>> .............
>>> sigprocmask(SIG_BLOCK, NULL, []) = 0
>>> sigprocmask(SIG_BLOCK, NULL, []) = 0
>>> sigprocmask(SIG_BLOCK, NULL, []) = 0
>>> sigprocmask(SIG_BLOCK, NULL, []) = 0
>>> sigprocmask(SIG_BLOCK, NULL, []) = 0
>>> sigprocmask(SIG_BLOCK, NULL, []) = 0
>>> sigprocmask(SIG_SETMASK, [], NULL) = 0
>>> sigprocmask(SIG_BLOCK, NULL, []) = 0
>>> sigprocmask(SIG_BLOCK, NULL, []) = 0
>>> .........
>>> sigprocmask(SIG_BLOCK, NULL, []) = 0
>>> sigprocmask(SIG_BLOCK, NULL, []) = 0
>>> .......CRASH
>>>
>>>
>> I'm assuming that ".......CRASH" means "more of the same syscall, with 
>> similar results, until the trace ends on account of a system crash".
>>
>> The second trace says nothing useful, as far as I can tell.  The last 
>> thing it shows before all the signal mask handling is the successful 
>> completion of a fact evaluation.
>>
>> The first trace is not much more helpful.  The last thing it shows is 
>> Facter reading the Ruby code for the 'osfamily' fact.  That might indicate 
>> that it is during evaluation of that fact that the system crashed, but it's 
>> too far removed from fact evaluation for me to have any confidence in that.
>>
>> My bet would be that the crash cuts off communication before its cause is 
>> reported in the trace, as I warned might be the case.
>>
>> Here's another thing you could try: since facter doesn't always crash the 
>> system (if I understand correctly), you should be able to get a list of all 
>> the facts it is evaluating (and their values) by running "facter -p" from 
>> the command line.  Take that list, and use it to stress test facter on each 
>> fact individually (i.e. run facter -p <factname> many times in a loop), in 
>> a way that lets you be sure you always know which fact is currently under 
>> test.  In this way you may be able to identify one or more facts whose 
>> evaluation sometimes crashes the machine.
>>
>> Note: don't neglect the "or more" above.  It is conceivable that your 
>> problem is deeper than just one fact.
>>
>> Once you know the facts with which the problem is associated, we can 
>> investigate the commands facter is running, and thereby narrow down the 
>> cause of the crash.
>>
>>
>> John
>>
>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Puppet Users" group.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msg/puppet-users/-/B7AKDJ-7U40J.
>>
>> To post to this group, send email to puppet...@googlegroups.com.
>> To unsubscribe from this group, send email to 
>> puppet-users...@googlegroups.com.
>> For more options, visit this group at 
>> http://groups.google.com/group/puppet-users?hl=en.
>>
>
>
