Re: Please help me diagnose this crazy VMWare/FreeBSD 8.x crash

2012-03-29 Thread Doug Barton
On 3/28/2012 1:59 PM, Mark Felder wrote:
 FreeBSD 8-STABLE, 8.3, and 9.0 are untested

As much as I'm sensitive to your production requirements, realistically
it's not likely that you'll get a helpful result without testing a newer
version. 8.2 came out over a year ago, many many things have changed
since then.

Doug
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to freebsd-questions-unsubscr...@freebsd.org


Re: Please help me diagnose this crazy VMWare/FreeBSD 8.x crash

2012-03-29 Thread Michael Powell
Mark Felder wrote:

 Alright guys, I'm at the end of my rope here. For those that haven't seen
 my previous emails here's the (not so) quick breakdown:
 
 Overview:
 
 FreeBSD ?? - 7.4 never crash
 FreeBSD 8.0 - 8.2 crashes
 FreeBSD 8-STABLE, 8.3, and 9.0 are untested (Sorry, not possible in our
 production at this time, and we were hoping we could base some stuff on
 8.3 for long term stability...)
 ESXi: Confirmed ESXi 4.0 - 5.0 has this problem. Haven't tested on others.
 
[snip]
 
 
 I think we've finally found enough data that this is definitely something
 in the FreeBSD world. I'm going to begin prepping some of the known crashy
 servers with more debugging. Any suggestions on what I should build the
 kernel with? They never do a proper panic, but I definitely want to at
 least *try* to get into the debugger the next time it crashes. And when it
 crashes, what the heck should I be running? I've never played with the KDB
 before...
 
 
 Thank you for any suggestions and help you can give me
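
For the kernel question above, a minimal sketch of a debugger-enabled
config, assuming the stock GENERIC is the starting point. The config name
DEBUG and the helper write_debug_kernconf are made up for illustration;
KDB, DDB and BREAK_TO_DEBUGGER are standard kernel options, and some of
them may already be present in your GENERIC, in which case drop the
duplicates.

```shell
# Hypothetical helper: write a kernel config that layers debugger
# options on top of GENERIC. $1 is the path of the config file to create.
write_debug_kernconf() {
    cat > "$1" <<'EOF'
include GENERIC
ident   DEBUG
options KDB                 # kernel debugger framework
options DDB                 # interactive in-kernel debugger
options BREAK_TO_DEBUGGER   # BREAK on the console enters the debugger
EOF
}
```

Something like `write_debug_kernconf /usr/src/sys/amd64/conf/DEBUG`
followed by `make buildkernel KERNCONF=DEBUG` and
`make installkernel KERNCONF=DEBUG` in /usr/src should then give a kernel
with the debugger compiled in (use i386 instead of amd64 on 32-bit
guests). Per ddb(4), the debugger is usually reachable from the console
with Ctrl+Alt+Esc, which matters here since the box never panics on its
own.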

I am definitely out of my league here and this is way over my head, to be 
sure. Just a couple of shots in the dark for possibly covering a couple more 
data points for your research. And I am a tad fuzzy on both as I have never 
needed to dig into either because I've not had any trouble with either.

IIRC there are three different timer subsystems one may choose from. You may 
want to look into experimenting with each of the three, just to see if 
this changes any observed behavior, or to possibly rule it out.
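
If the timer subsystems meant here are the timecounter sources (that is
an assumption), the kernel lists the available ones in the
kern.timecounter.choice sysctl and the active one in
kern.timecounter.hardware. A rough sketch that turns the choice list into
the commands to try one by one; timecounter_cmds is a made-up helper name
and the sample names in the comment are only illustrative:

```shell
# Hypothetical helper: given the value of kern.timecounter.choice,
# e.g. "TSC(800) ACPI-fast(900) i8254(0) dummy(-1000000)",
# print one sysctl command per real timecounter source.
timecounter_cmds() {
    for entry in $1; do
        name=${entry%%"("*}              # strip the "(quality)" suffix
        [ "$name" = "dummy" ] && continue  # placeholder, not a real source
        echo "sysctl kern.timecounter.hardware=$name"
    done
}
```

On the guest, `timecounter_cmds "$(sysctl -n kern.timecounter.choice)"`
prints the commands; run one, let the box soak, and note whether the hang
pattern changes. A setting can be made persistent by putting
kern.timecounter.hardware=NAME in /etc/sysctl.conf.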

Your situation sounds like a candidate for reverse logic - if I can't get 
any handle on what's wrong I start at the opposite end and try to make a 
list of what is right in an attempt to leave a smaller subset to probe.

I also think this most likely has nothing to do with what's happening, but 
for some reason it just pops into my head. Try disabling msi in 
/boot/loader.conf like this:

hw.pci.enable_msi=0
hw.pci.enable_msix=0

At least if it makes no difference maybe this will exclude it from being a 
'possible'. Developers who are more in-depth aware of what the differences 
are between 7.x and 8.x/9.x in the development timeline can probably provide 
a better picture so as to narrow the field of what to look at. This is way 
over my head, just wish I could help - I know and have experienced the kind 
of quandary you have here (I feel for you).   :-)

-Mike
 



Re: need info builing ports properly

2012-03-29 Thread Julian H. Stacey
Mark Felder wrote:
 My second suggestion is to please never ever ever mess with CFLAGS on  
 FreeBSD. You can get away with it on some Linux distros, but FreeBSD  
 strongly discourages it.

Not true. 

eg I've set various CFLAGS for years.

What FreeBSD requires is that if one sets either CFLAGS or env vars and
then experiences problems, one should unset them and try again before
reporting bugs.

Cheers,
Julian
-- 
Julian Stacey, BSD Unix Linux C Sys Eng Consultants Munich http://berklix.com
 Reply below not above, cumulative like a play script,  indent with  .
 Format: Plain text. Not HTML, multipart/alternative, base64, quoted-printable.
Mail from @yahoo dumped @berklix.  http://berklix.org/yahoo/


How to suppress PAM/sshd root login warnings?

2012-03-29 Thread Duckbreath
My system has root login via sshd disabled, and it is going to stay disabled.

I don't care if the whole of the entire internet tries to login as root, 
because:

Root login is disabled.

However, syslog likes to print little warnings on my console, and in my 
auth.log, every time some bot tries.
I would like to disable the displaying and logging of these messages.  Login 
attempts on non-root accounts are of interest to me, so I don't want to disable 
those messages; I only want to disable messages in which the attempt is the 
root user, because:

Root login is disabled.

There is no way I am going to ban all the bots, forget that.  I'm not getting 
pam_abl or some other auto-black list solution.  Not going there.  I'm OK with 
them trying, I just want to stop seeing the messages.

Can anyone help me out with this?


Re: Please help me diagnose this crazy VMWare/FreeBSD 8.x crash

2012-03-29 Thread Mark Felder
On Wed, 28 Mar 2012 18:31:38 -0500, Adrian Chadd adr...@freebsd.org  
wrote:



> * have you filed a PR?

No

> * is the crash easily reproducible?

Unfortunately not. It's totally random. Some servers will get the bug
and crash daily, some will crash weekly, some might seem to be fine but 3
months later hit this crash.

> * are you able to boot some ramdisk-only FreeBSD-8.2 images (eg create
> a ramdisk image using nanobsd?) and do some stress testing inside
> that?

That's a plan I'd like to execute but my free time for building that
environment is rather short at the moment :(

> I'm not that
> cluey on ESXi, but there may be some PIC/APIC/ACPI change between 7.x
> and 8.0 which has caused this to surface.

Was there a setting to revert ACPI behavior from 8.x to 7.x? I thought I
read about that at one point or perhaps this was something available
back in the dev cycle when 8 was -CURRENT. *shrug* I know 9.0 and onward
has even more ACPI changes so assuming it truly is an ACPI bug I guess we
could cross our fingers and hope that the bug has mysteriously vanished?



Re: Please help me diagnose this crazy VMWare/FreeBSD 8.x crash

2012-03-29 Thread Mark Felder

On Thu, 29 Mar 2012 02:36:49 -0500, Doug Barton do...@freebsd.org wrote:


> As much as I'm sensitive to your production requirements, realistically
> it's not likely that you'll get a helpful result without testing a newer
> version. 8.2 came out over a year ago, many many things have changed
> since then.


The sad part is that VMWare's supported FreeBSD versions are a joke, and  
we've been trying to keep VMWare happy by only running supported  
versions. I honestly don't think they even test. It's so stupid.



Re: Please help me diagnose this crazy VMWare/FreeBSD 8.x crash

2012-03-29 Thread Mark Felder
Thank you for the suggestion. We'll put it in our toolbox and see if it  
helps!



Re: Please help me diagnose this crazy VMWare/FreeBSD 8.x crash

2012-03-29 Thread Mark Felder
Alright, new data. It happened to crash about 10 minutes after I came in  
this morning and I ran some stuff in the DDB. I have no idea what  
information is useful, but perhaps someone will see something out of the  
ordinary?



http://feld.me/freebsd/esx_crash/


Thanks...


Re: Vivaldi Tablet

2012-03-29 Thread Polytropon
On Wed, 28 Mar 2012 16:24:51 -0700, Gary Kline wrote:
   i dont have a clue what a chording keybd is;

This kind of keyboard uses combinations of its fewer keys
to generate characters (or even syllables or words). The
name "chorded" is borrowed from instruments like
the guitar, where you use one hand to hold down certain
strings in a defined manner to play a chord
like A major or D minor.

There's an introductory article about it on Wikipedia:

http://en.wikipedia.org/wiki/Chorded_keyboard

This kind of keyboard is typically used by court recorders
in the US. They are trained to record whole conversations
in real time directly onto paper. By pressing three, four
or more keys at a time, a specific output is generated by
the device. It's often called stenotype, because it's like
typing in shorthand, where a phonetic code is in the
foreground.

http://en.wikipedia.org/wiki/Stenotype

http://upload.wikimedia.org/wikipedia/commons/4/40/Stenkeys.gif

http://upload.wikimedia.org/wikipedia/en/c/cf/Steno-example.gif

Typewriters for blind persons also use this approach. The
model Erika Picht Portable (paper format DIN A5, I think)
is still well known to me. There's also a regular (DIN A4)
model, produced by Schreibmaschinenwerke Dresden (typewriter
works Dresden), part of the Kombinat Robotron.
Those machines are _still_ produced in Dresden.

http://www.aph.org/museum/images/braillewriters/30.jpg

http://petitmuseedubraille.free.fr/_machines-braille/images/_m15a.jpg

http://www.gfai-sachsen.de/images/Erika-Picht_MultiTech-E511_800.jpg

Input devices with comparable key layouts are also available
for the PC, but instead of stenotype, they generate regular
characters.



   i v much like this vivaldi 7 tablet, just as-is.  i wonder
   if a future 7inch model could have more memory Along with a
   slide-in kybd.  slide out and work: edit, use ffox,
   konsole or xterms, then slide back in place. this tablet
   could replace the ipad, nook, asus.  

Interesting thought. Maybe it wouldn't target home commodity
users in the first place, but a sliding keyboard could be
a benefit for professional users who want to do more than
just watching movies on such a thing. It would also help
to bring the concept of separating input and output to the
device in a physical manner (because it might be useful
in certain conditions when your fingers aren't located
at places where you are supposed to read something), and
STILL keeping the regular touch interface (no real separation)
available, intact and unbroken.



-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...


Re: Please help me diagnose this crazy VMWare/FreeBSD 8.x crash

2012-03-29 Thread Joe Greco
 Hi,
 
 * have you filed a PR?
 * is the crash easily reproducible?
 * are you able to boot some ramdisk-only FreeBSD-8.2 images (eg create
 a ramdisk image using nanobsd?) and do some stress testing inside
 that?
 
 It sounds like you've established it's a storage issue, or at least
 interrupt handling for storage issue. So I'd definitely try the
 ramdisk-only boot and thrash it using lighttpd/httperf or something.
 If that survives fine, I'd look at trying to establish whether there's
 something wrong in the disk driver(s) freebsd is using. I'm not that
 cluey on ESXi, but there may be some PIC/APIC/ACPI change between 7.x
 and 8.0 which has caused this to surface.

We've seen this.  Or something that seems really like it.

We run dozens of FreeBSD VM's, many of which are 8.mumble.  We have a
scripted build environment dating back many years, so generally servers
come out in a fairly reproducible form.

After several months of smooth running, we had need to shuffle some
things around, and migrated some servers to a different datastore.
Suddenly, one particular VM, our corp Jabber server, started randomly 
disconnecting people every morning.  Some inspection showed that the
machine was running, but disk I/O in the VM was freezing up.  
Subsequent inspection suggested that it was happening during the 
periodic daily, though we never managed to get it to happen by manually 
forcing periodic daily, so that's only a theory.  Given that several 
times it appeared that one of the find commands was running, I was 
guessing that something in the thin provisioned disk image for the 
system had gone bad, but reading the entire disk with dd didn't cause 
a hang, running the periodic daily by hand didn't cause a hang, etc.

Migrating the VM to a different host and datastore did not fix the
issue.  Migrating the VM from an Opteron to a Xeon host with all the
latest ESXi 4 patches also didn't make any difference.  Migrating the
disk image from thin to full seemed to fix it, but I only gave it a
day or two, then decided there were other good reasons to reload the
VM, so I nuked the VM, which, of course, fixed it.

In the meantime, a dozen other similar VM's alongside it run just
fine.  My conclusion was that it was something specific that had gone
awry in the virtual machine, probably in the disk image, but I could
not identify it without significant digging that I had no particular
reason or inclination to do; since it appeared to be a VMware problem,
the reload it and be done with it seemed the quickest path to 
resolution.

That having been said, if anyone has any brilliant ideas about what 
would constitute useful further steps to isolate this, I can look at
recovering the faulty VM from backup and seeing if it still exhibits
the problem.

... JG
-- 
Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net
We call it the 'one bite at the apple' rule. Give me one chance [and] then I
won't contact you again. - Direct Marketing Ass'n position on e-mail spam(CNN)
With 24 million small businesses in the US alone, that's way too many apples.


Re: Please help me diagnose this crazy VMWare/FreeBSD 8.x crash

2012-03-29 Thread Hans Petter Selasky
On Thursday 29 March 2012 15:42:42 Joe Greco wrote:
  Hi,

Do both 32- and 64-bit versions of FreeBSD crash?

--HPS


Re: Please help me diagnose this crazy VMWare/FreeBSD 8.x crash

2012-03-29 Thread Joe Greco
 On 3/28/2012 1:59 PM, Mark Felder wrote:
  FreeBSD 8-STABLE, 8.3, and 9.0 are untested
 
 As much as I'm sensitive to your production requirements, realistically
 it's not likely that you'll get a helpful result without testing a newer
 version. 8.2 came out over a year ago, many many things have changed
 since then.
 
 Doug

So you're saying that he should have been using 8.3-RELEASE, then.

If you'll kindly go over to http://www.freebsd.org and look under
Latest Releases, please note that 8.2 is a production release.
If you don't want it to be a production release, then find a way
to make it so, but please don't snipe at people who are using the
code that the FreeBSD project has indicated is a current production
offering.

There are many good reasons not to run arbitrary snapshots on your
production gear.  It's unrealistic to expect people to run non-RELEASE,
non-production code on their production gear.  We can have that
discussion if you don't understand that; drop me a note off-list and
I'll be happy to explain it.

Otherwise, you've told him to run a newer version, of which NONE
IS AVAILABLE, unless you're thinking 9.0, but FreeBSD has a rather
catastrophic history of point zero releases, and most clueful
admins won't run those in production without carefully measuring
the risks and benefits.  So you've basically told him to run a
newer version without any such version being realistically 
available.

WTF? 

You want people not to use releases that came out over a year 
ago?  The generally sensible solution to that is to release 
RELEASEs more than once every fourteen or fifteen months.

... JG


Re: Please help me diagnose this crazy VMWare/FreeBSD 8.x crash

2012-03-29 Thread Mark Felder
On Thu, 29 Mar 2012 09:58:16 -0500, Hans Petter Selasky hsela...@c2i.net  
wrote:



> Do both 32- and 64-bit versions of FreeBSD crash?


Correct, we see both i386 and amd64 flavors crash in the same way.


Re: Please help me diagnose this crazy VMWare/FreeBSD 8.x crash

2012-03-29 Thread Eduardo Morras

At 16:03 29/03/2012, you wrote:

> Alright, new data. It happened to crash about 10 minutes after I came in
> this morning and I ran some stuff in the DDB. I have no idea what
> information is useful, but perhaps someone will see something out of the
> ordinary?
>
> http://feld.me/freebsd/esx_crash/

I don't know about ESXi, but on other VM managers I can change the
chipset emulation from ICH10 to ICH4. Can you change it to an older
chipset too?

> Thanks...





Re: Please help me diagnose this crazy VMWare/FreeBSD 8.x crash

2012-03-29 Thread Joe Greco
 On Thursday 29 March 2012 15:42:42 Joe Greco wrote:
   Hi,
 
 Do both 32- and 64-bit versions of FreeBSD crash?

We've only seen it happen on one virtual machine.  That was a 32-bit
version.  And it's not so much a crash as it is a disk I/O hang.

The fact that it was happening regularly to that one VM, while a
bunch of other similar VM's were running alongside it without any
incident, along with the problem moving with the VM as it is moved
from host to host and from Opteron to Xeon, strongly points at 
something being wrong with the VM itself.  Our systems are built
mostly by script; I rebuilt the VM a few months ago and the
problem vanished.  The rebuilt system should have been virtually
identical to the original.  I never actually compared them though.

My working theory was that something bad had happened to the VM
during a migration from one datastore to another.  We have a really
slow-writing iSCSI server that it had been migrated onto for a little
bit, which was where the problem first appeared, I believe.  At
first I thought it was the nightly cron jobs just exceeding the iSCSI
server's capacity to cope, so we migrated the VM onto a host with
local datastores, and it remained broken thereafter.

So my conclusion was that it seemed likely that somehow VMware's 
thin provisioned disk image had gotten fouled up, and under some
unknown use case, it could be teased into locking up further I/O
on the VM.  I wasn't able to prove it.  I tried a read-dd of the
entire disk - passed, flying.  I tried several things to duplicate
the nightly periodic tasks where it seemed so prone to locking up.
They all ran fine.  But if I let the machine run, it'd do it
again eventually.

I explained it at the time to one of my VMware friends:

 But here's where it gets weird.  Three times, now, one VM - our Jabber
 server - has gone wonky in the wee early AM hours.  Disk I/O on the VM
 just locks up.  You can type at the console until it does I/O, so you
 can put in root at the login: prompt but never get a pw prompt.  My
 systems all run top from /etc/ttys and I can see that a whole bunch
 of processes are stopped in getblk.  It's like the iSCSI disk has gone
 away, except it hasn't, since the other VM's are all happily churning
 away, on the same datastore, on the same VMware host.

http://www.sol.net/tmp/freebsd/freebsd-esxi-lockup.gif

 Now it's *possible* that the problem actually happens after the 3AM cron
 run (note slight CPU/memory drop) but the Jabber implosion actually
 happens around 0530, see drop in memory%.  But the root problem at the
 VM level seems to be that disk I/O has frozen.  I can't tell for sure when
 that happens.  All three instances are similar to this.
 
 I can't explain this or figure out how to debug it.  Since it's locked up
 right now, thought I'd ping you for ideas before resetting it.

Now that was actually before we migrated it back to local datastore,
but when we did, the problem remained, suggesting that whatever has
happened to the VM, it is contained within the VM's vmdk or other
files.

... JG


Re: Please help me diagnose this crazy VMWare/FreeBSD 8.x crash

2012-03-29 Thread Hans Petter Selasky
On Thursday 29 March 2012 17:49:30 Joe Greco wrote:
  On Thursday 29 March 2012 15:42:42 Joe Greco wrote:
Hi,
  
  Do both 32- and 64-bit versions of FreeBSD crash?
 
 We've only seen it happen on one virtual machine.  That was a 32-bit
 version.  And it's not so much a crash as it is a disk I/O hang.

It almost sounds like the lost interrupt issue I've seen with USB EHCI 
devices, though disk I/O should have a retry timeout?

What does vmstat -i output?

--HPS


FreeBSD Stable Image

2012-03-29 Thread Mike Barnard
Hi,

Any one know where I can get a FreeBSD-9.0-STABLE ISO/IMG image?

ftp.freebsd.org/pub/FreeBSD/FreeBSD-stable/

That path does not seem to have it.

-- 
Mike

Of course, you might discount this possibility, but remember that one in a
million chances happen 99% of the time.



Re: Please help me diagnose this crazy VMWare/FreeBSD 8.x crash

2012-03-29 Thread Mark Felder
On Thu, 29 Mar 2012 10:31:24 -0500, Eduardo Morras nec...@retena.com  
wrote:




> Don't know about ESXi but on other VM managers I can change the chipset
> emulation from ICH10 to ICH4. Can you change it to an older chipset too?


Unfortunately there's no setting in the GUI for that but I'll keep looking  
to see if there's a hidden option -- perhaps in the VM's config file.



Re: FreeBSD Stable Image

2012-03-29 Thread Matthew Seaman
On 29/03/2012 17:10, Mike Barnard wrote:
 Hi,
 
 Any one know where I can get a FreeBSD-9.0-STABLE ISO/IMG image?
 
 ftp.freebsd.org/pub/FreeBSD/FreeBSD-stable/
 
 That path does not seem to have it.
 

ftp://ftp.freebsd.org/pub/FreeBSD/releases/amd64/amd64/ISO-IMAGES/9.0/

Cheers,

Matthew

-- 
Dr Matthew J Seaman MA, D.Phil.
PGP: http://www.infracaninophile.co.uk/pgpkey






Re: FreeBSD Stable Image

2012-03-29 Thread Matthew Seaman
On 29/03/2012 17:18, Matthew Seaman wrote:
 On 29/03/2012 17:10, Mike Barnard wrote:
 Hi,

 Any one know where I can get a FreeBSD-9.0-STABLE ISO/IMG image?

 ftp.freebsd.org/pub/FreeBSD/FreeBSD-stable/

 That path does not seem to have it.

 
 ftp://ftp.freebsd.org/pub/FreeBSD/releases/amd64/amd64/ISO-IMAGES/9.0/

Errr... except of course that is -RELEASE and you asked for -STABLE.  I
don't believe there's a 9.0-STABLE snapshot available at freebsd.org
right now.  Instead, try one from here:

ftp://ftp.allbsd.org/pub/FreeBSD-snapshots/amd64-amd64/9.0-RELENG_9-20120329-JPSNAP/

There's a new snapshot available there pretty much daily.

Cheers,

Matthew

-- 
Dr Matthew J Seaman MA, D.Phil.
PGP: http://www.infracaninophile.co.uk/pgpkey






Re: Please help me diagnose this crazy VMWare/FreeBSD 8.x crash

2012-03-29 Thread Mark Felder
On Thu, 29 Mar 2012 10:55:36 -0500, Hans Petter Selasky hsela...@c2i.net  
wrote:


> It almost sounds like the lost interrupt issue I've seen with USB EHCI
> devices, though disk I/O should have a retry timeout?
>
> What does vmstat -i output?
>
> --HPS



Here's a server that has a week uptime and is due for a crash any hour now:

root@server:/# vmstat -i
interrupt                          total       rate
irq1: atkbd0                          34          0
irq6: fdc0                             9          0
irq15: ata1                           34          0
irq16: em1                        778061          1
irq17: mpt0                     19217711         31
irq18: em0                     283674769        460
cpu0: timer                    246571507        400
Total                          550242125        892
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to freebsd-questions-unsubscr...@freebsd.org


Re: Please help me diagnose this crazy VMWare/FreeBSD 8.x crash

2012-03-29 Thread Mark Felder

On Thu, 29 Mar 2012 10:49:30 -0500, Joe Greco jgr...@ns.sol.net wrote:


> I explained it at the time to one of my VMware friends:



This is 100% identical to what we see, Joe! And we're so unlucky that we  
have this happen on probably a dozen servers, but a handful are the really  
bad ones. We've rebuilt them from scratch many times with no improvement.



Per-file, per-process disk access statistics

2012-03-29 Thread Vitaly Magerya
Hi, folks. I want to diagnose which programs trigger disk writes,
so I'm wondering if there's a way to determine which processes write
to which files on what disks with what speed?

I know there are gstat(8) and iostat(8) which show how busy each
disk is, but they do not show which files are being written and
which processes are doing it.


Re: Please help me diagnose this crazy VMWare/FreeBSD 8.x crash

2012-03-29 Thread Jim Bryant
This sounds just like a race condition that happens under Windows 7 on 
this laptop.  The race condition, as far as I can tell involves heavy 
disk access and heavy network access, and usually leaves the drive light 
on, while all activity monitors (alldisk, allcpu, allnetwork) are still 
active, although on this laptop disk takes priority, and network slows 
to a crawl.  Occasionally, the mouse will stop working, along with 
everything else, but usually not.  The keyboard is lower priority, and 
doesn't do anything.


You might want to check with mickeysoft, this might just be their 
problem.  This sounds so freaking similar to the issue I get, and I 
think it's a race condition (shared interrupts??).


This laptop is a Compaq Presario C300 series, with the 945GM chipset and 
a T7600 Core2 Duo CPU, with 3G of RAM.


Mark Felder wrote:
Alright guys, I'm at the end of my rope here. For those that haven't 
seen my previous emails here's the (not so) quick breakdown:


Overview:

FreeBSD ?? - 7.4 never crash
FreeBSD 8.0 - 8.2 crashes
FreeBSD 8-STABLE, 8.3, and 9.0 are untested (Sorry, not possible in 
our production at this time, and we were hoping we could base some 
stuff on 8.3 for long term stability...)
ESXi: Confirmed ESXi 4.0 - 5.0 has this problem. Haven't tested on 
others.



History:

Over the course of the last 2 years we've been banging our heads on 
the wall. VMWare is done debugging this. They claim it's not a VMWare 
issue. They can't identify what the heck happens. We had a glimmer of 
hope with ESXi 5.0 fixing it because we never saw any crashes in the 
handful of deployments, but our dreams were crushed today -- two days 
before an outage to begin migration to ESXi 5.0 -- when a customer's 
ESXi 5.0 server and FreeBSD 8.2 guest crashed.



Crash Details:

The keyboard/mouse usually stops responding for input on the console; 
normally we can't type in a username or password. However, we can 
switch VTs.


If there's a shell on the console and we can type, we can only run 
things in memory. Any time we try to access the disk it will hang 
indefinitely.


The server still has network access. We can ping it without issue. SSH 
of course kicks you out because it can't do any I/O.


If we were to serve a lightweight http server off a memory backed 
filesystem I'm confident it would run just fine as long as it wasn't 
logging or anything.


On ESXi you see that there is a CPU spike of 100% that goes on 
indefinitely. No idea what the FreeBSD OS itself thinks it is doing 
because we can't run top during the crash.


This crash can affect a server and happen multiple times a week. It 
can also not show up for 180 days or more. But it does happen. The 
server can be 100% idle and crash. We have servers that do more I/O 
than the ones that crash could ever attempt to do and these don't 
crash at all. Completely inexplicable.



Things we've looked into:

Nothing about the installed software matters. We've tried cross 
referencing the crashed servers by the programs they run but the base 
OS is the only common denominator due to the wide variety of servers 
it has affected.


Storage doesn't matter. We've tried different iSCSI SANs, we've tried 
different switches, we've tried local datastores on the ESXi servers 
themselves.


HP servers, Dell servers -- doesn't seem to matter either. (All with 
latest firmwares, BIOSes, etc)


VMWare gave us a ton of debugging tasks, and we've given them 
gigabytes of debugging info and data; they can't find anything.


VMWare tools -- with, without, using open-vm-tools makes no 
difference. I think we've done a fair job ruling out VMWare.



I think we've finally found enough data that this is definitely 
something in the FreeBSD world. I'm going to begin prepping some of 
the known crashy servers with more debugging. Any suggestions on what 
I should build the kernel with? They never do a proper panic, but I 
definitely want to at least *try* to get into the debugger the next 
time it crashes. And when it crashes, what the heck should I be 
running? I've never played with the KDB before...



Thank you for any suggestions and help you can give me
___
freebsd-hack...@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to 
freebsd-hackers-unsubscr...@freebsd.org





Re: Please help me diagnose this crazy VMWare/FreeBSD 8.x crash

2012-03-29 Thread Alan Cox
On Thu, Mar 29, 2012 at 11:27 AM, Mark Felder f...@feld.me wrote:

 On Thu, 29 Mar 2012 10:55:36 -0500, Hans Petter Selasky hsela...@c2i.net
 wrote:


 It almost sounds like the lost interrupt issue I've seen with USB EHCI
 devices, though disk I/O should have a retry timeout?

 What does vmstat -i output?

 --HPS



 Here's a server that has a week uptime and is due for a crash any hour now:

 root@server:/# vmstat -i
 interrupt                          total       rate
 irq1: atkbd0                          34          0
 irq6: fdc0                             9          0
 irq15: ata1                           34          0
 irq16: em1                        778061          1
 irq17: mpt0                     19217711         31
 irq18: em0                     283674769        460
 cpu0: timer                    246571507        400
 Total                          550242125        892



Not so long ago, VMware implemented a clever scheme for reducing the
overhead of virtualized interrupts that must be delivered by at least some
(if not all) of their emulated storage controllers:

http://static.usenix.org/events/atc11/tech/techAbstracts.html#Ahmad

Perhaps, there is a bad interaction between this scheme and FreeBSD's mpt
driver.

Alan


Re: Please help me diagnose this crazy VMWare/FreeBSD 8.x crash

2012-03-29 Thread Mark Atkinson
On 03/29/2012 07:03, Mark Felder wrote:
 Alright, new data. It happened to crash about 10 minutes after I
 came in this morning and I ran some stuff in the DDB. I have no
 idea what information is useful, but perhaps someone will see
 something out of the ordinary?
 
 
 http://feld.me/freebsd/esx_crash/

If this is an interrupt problem with disk i/o, then you might want to
look into (DDB(4))

show intr
show intrcount

maybe

show allrman


Re: Please help me diagnose this crazy VMWare/FreeBSD 8.x crash

2012-03-29 Thread Jerry
On Thu, 29 Mar 2012 11:43:45 -0500
Jim Bryant articulated:

 Mark Felder wrote:
  Alright guys, I'm at the end of my rope here. For those that
  haven't seen my previous emails here's the (not so) quick breakdown:
 
  Overview:
 
  FreeBSD ?? - 7.4 never crash
  FreeBSD 8.0 - 8.2 crashes
  FreeBSD 8-STABLE, 8.3, and 9.0 are untested (Sorry, not possible in 
  our production at this time, and we were hoping we could base some 
  stuff on 8.3 for long term stability...)
  ESXi: Confirmed ESXi 4.0 - 5.0 has this problem. Haven't tested on 
  others.
 
 
  History:
 
  Over the course of the last 2 years we've been banging our heads on 
  the wall. VMWare is done debugging this. They claim it's not a
  VMWare issue. They can't identify what the heck happens. We had a
  glimmer of hope with ESXi 5.0 fixing it because we never saw any
  crashes in the handful of deployments, but our dreams were crushed
  today -- two days before an outage to begin migration to ESXi 5.0
  -- when a customer's ESXi 5.0 server and FreeBSD 8.2 guest crashed.
 
 
  Crash Details:
 
  The keyboard/mouse usually stops responding for input on the
  console; normally we can't type in a username or password. However,
  we can switch VTs.
 
  If there's a shell on the console and we can type, we can only run 
  things in memory. Any time we try to access the disk it will hang 
  indefinitely.
 
  The server still has network access. We can ping it without issue.
  SSH of course kicks you out because it can't do any I/O.
 
  If we were to serve a lightweight http server off a memory backed 
  filesystem I'm confident it would run just fine as long as it
  wasn't logging or anything.
 
  On ESXi you see that there is a CPU spike of 100% that goes on 
  indefinitely. No idea what the FreeBSD OS itself thinks it is doing 
  because we can't run top during the crash.
 
  This crash can affect a server and happen multiple times a week. It 
  can also not show up for 180 days or more. But it does happen. The 
  server can be 100% idle and crash. We have servers that do more I/O 
  than the ones that crash could ever attempt to do and these don't 
  crash at all. Completely inexplicable.
 
 
  Things we've looked into:
 
  Nothing about the installed software matters. We've tried cross 
  referencing the crashed servers by the programs they run but the
  base OS is the only common denominator due to the wide variety of
  servers it has affected.
 
  Storage doesn't matter. We've tried different iSCSI SANs, we've
  tried different switches, we've tried local datastores on the ESXi
  servers themselves.
 
  HP servers, Dell servers -- doesn't seem to matter either. (All
  with latest firmwares, BIOSes, etc)
 
  VMWare gave us a ton of debugging tasks, and we've given them 
  gigabytes of debugging info and data; they can't find anything.
 
  VMWare tools -- with, without, using open-vm-tools makes no 
  difference. I think we've done a fair job ruling out VMWare.
 
 
  I think we've finally found enough data that this is definitely 
  something in the FreeBSD world. I'm going to begin prepping some of 
  the known crashy servers with more debugging. Any suggestions on
  what I should build the kernel with? They never do a proper panic,
  but I definitely want to at least *try* to get into the debugger
  the next time it crashes. And when it crashes, what the heck should
  I be running? I've never played with the KDB before...
 
 
  Thank you for any suggestions and help you can give me
 
 This sounds just like a race condition that happens under Windows 7
 on this laptop.  The race condition, as far as I can tell involves
 heavy disk access and heavy network access, and usually leaves the
 drive light on, while all activity monitors (alldisk, allcpu,
 allnetwork) are still active, although on this laptop disk takes
 priority, and network slows to a crawl.  Occasionally, the mouse will
 stop working, along with everything else, but usually not.  The keyboard
 is lower priority, and doesn't do anything.
 
 You might want to check with mickeysoft, this might just be their 
 problem.  This sounds so freaking similar to the issue I get, and I 
 think it's a race condition (shared interrupts??).
 
 This laptop is a Compaq Presario C300 series, with the 945GM chipset
 and a T7600 Core2 Duo CPU, with 3G of RAM.

{TOP POSTING CORRECTED}

I just started reading this thread, but I am wondering if I missed
something here. What does this have to do with Windows 7?

-- 
Jerry ♔

Disclaimer: off-list followups get on-list replies or get ignored.
Please do not ignore the Reply-To header.


Re: Please help me diagnose this crazy VMWare/FreeBSD 8.x crash

2012-03-29 Thread Mark Felder
On Thu, 29 Mar 2012 12:05:30 -0500, Mark Atkinson atkin...@gmail.com  
wrote:




 If this is an interrupt problem with disk i/o, then you might want to
 look into (DDB(4))
 show intr
 show intrcount
 maybe
 show allrman



Thank you! I really don't know what things we should be running in DDB to  
diagnose this and we will try this upon the next crash.



Re: Please help me diagnose this crazy VMWare/FreeBSD 8.x crash

2012-03-29 Thread Mark Felder

On Thu, 29 Mar 2012 12:24:30 -0500, je...@seibercom.net wrote:



 I just started reading this thread, but I am wondering if I missed
 something here. What does this have to do with Windows 7?


I emailed him off-list but I'm guessing he thought this was on VMWare  
Workstation or another product that would virtualize FreeBSD on top of  
Windows as the host OS.



Re: Please help me diagnose this crazy VMWare/FreeBSD 8.x crash

2012-03-29 Thread Mark Felder

On Thu, 29 Mar 2012 11:53:02 -0500, Alan Cox alan.l@gmail.com wrote:



 Not so long ago, VMware implemented a clever scheme for reducing the
 overhead of virtualized interrupts that must be delivered by at least
 some (if not all) of their emulated storage controllers:

 http://static.usenix.org/events/atc11/tech/techAbstracts.html#Ahmad

 Perhaps, there is a bad interaction between this scheme and FreeBSD's mpt
 driver.

 Alan


If we assume mpt is the culprit how can I go about diagnosing this more  
accurately? Is there something I should be looking for in vmstat -i? Too  
many interrupts? Not enough? Rate too high or too low? Or is this  
something that is much harder to track down because we're dealing with  
emulated hardware?
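
One concrete thing to look for, offered as a rough sketch rather than a verified diagnostic: take two `vmstat -i` samples a few seconds apart and check whether the mpt0 counter has stopped advancing while I/O is outstanding. The Python 3 helper below does that comparison. The function names and the 10-second interval are illustrative assumptions, not anything from this thread, and the parser assumes each data row ends in the two numeric columns (total, rate) that `vmstat -i` prints. Idle sources such as atkbd0 will also be reported as stalled, so only a storage controller that stops counting during a hang would be interesting.

```python
import subprocess
import time


def parse_vmstat_i(text):
    """Map each interrupt source in `vmstat -i` output to its total count."""
    totals = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows end in two integers (total, rate); skip headers and prompts.
        if len(fields) < 3 or not (fields[-1].isdigit() and fields[-2].isdigit()):
            continue
        name = " ".join(fields[:-2])
        if name == "Total":
            continue
        totals[name] = int(fields[-2])
    return totals


def stalled_sources(before, after):
    """Return sources whose counter did not advance between two samples."""
    return sorted(name for name, count in after.items()
                  if name in before and count == before[name])


def sample(interval=10):
    """Run `vmstat -i` twice, `interval` seconds apart (hypothetical default)."""
    first = parse_vmstat_i(subprocess.check_output(["vmstat", "-i"], text=True))
    time.sleep(interval)
    second = parse_vmstat_i(subprocess.check_output(["vmstat", "-i"], text=True))
    return stalled_sources(first, second)
```

Calling `sample()` returns the list of sources that raised no interrupts during the interval. It would need to be running (or at least have vmstat cached in memory) before the hang, since starting anything from disk at that point would block.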


If any BSD devs are interested in access to our environment I think we  
could comply. I might even be able to get authorization to give you an  
account on the most crash-prone server which doesn't have any sensitive  
customer data on it. I think at this point we'd even be willing to pay  
someone to look at a server in this state just so we (and hopefully  
others) can benefit and hopefully we end up with a more reliable  
FreeBSD-on-VMWare for everyone.


I know Doug mentioned running newer OS versions and that is definitely
tempting, but because it's not 100% reproducible on demand it's hard to
prove it fixes it without waiting 6 months. We're fighting internally here
with "trust 9.0 fixes it" vs "jump back to 7.4 because we KNOW it doesn't
happen there". Having someone look at this and say "oh, yes, that's a
deficiency in mpt that appears to be fixed in the newer driver that was
MFC'd to 8-STABLE and you'll find in 8.3-RELEASE and 9.0-RELEASE" would be
more comforting.


Thanks to everyone for their time on this!


Re: Please help me diagnose this crazy VMWare/FreeBSD 8.x crash

2012-03-29 Thread Joe Greco
 On Thursday 29 March 2012 17:49:30 Joe Greco wrote:
   On Thursday 29 March 2012 15:42:42 Joe Greco wrote:
 Hi,
   
   Do both 32- and 64-bit versions of FreeBSD crash?
  
  We've only seen it happen on one virtual machine.  That was a 32-bit
  version.  And it's not so much a crash as it is a disk I/O hang.
 
 It almost sounds like the lost interrupt issue I've seen with USB EHCI 
 devices, though disk I/O should have a retry timeout?

That doesn't seem to fit.  Why would a perfectly functional VM suddenly
develop this problem when given a slow underlying datastore (fits so far)
but then the problem *remains* when returned to a fast local datastore,
even on a different host and architecture?  And why wouldn't the other
VM's running alongside develop the same problem?

 What does vmstat -i output?

No idea, we reloaded the VM months ago.

... JG
-- 
Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net
We call it the 'one bite at the apple' rule. Give me one chance [and] then I
won't contact you again. - Direct Marketing Ass'n position on e-mail spam(CNN)
With 24 million small businesses in the US alone, that's way too many apples.


Re: Please help me diagnose this crazy VMWare/FreeBSD 8.x crash

2012-03-29 Thread Adam Vande More
On Thu, Mar 29, 2012 at 1:22 PM, Mark Felder f...@feld.me wrote:


 If we assume mpt is the culprit


Doesn't VMWare offer different types of emulated disk controllers?  If so,
that might be the easiest way to narrow the field.  Another thing maybe to
try would be to backport the mpt

Also, it's not VMWare's place to claim "not our problem" when you are
paying for support.  If this doesn't happen on bare metal, it's a VMWare
issue, or they need to demonstrate it's not their issue.  At least that
would be the expectation I have.

There is also a comment on this post indicating someone else with the issue
and who has received unofficial vmware feedback.

http://www.hailang.me/tech/virtual/freebsd-vmware-esx-a-weird-error-with-san-storage/

And then there is this one with similar symptoms and a workaround:

http://forums.freebsd.org/showthread.php?t=27899

-- 
Adam Vande More


Re: Please help me diagnose this crazy VMWare/FreeBSD 8.x crash

2012-03-29 Thread Mark Felder
On Thu, 29 Mar 2012 15:53:52 -0500, Adam Vande More  
amvandem...@gmail.com wrote:




 Doesn't VMWare offer different types of emulated disk controllers?  If so,
 that might be the easiest way to narrow the field.  Another thing maybe to
 try would be to backport the mpt


Yes, they offer Paravirtual (not applicable for FreeBSD), LSI Parallel  
(default option), LSI SAS, and Buslogic (not available for 64bit).


Both LSI SAS and LSI Parallel use the mpt driver.



 Also, it's not VMWare's place to claim "not our problem" when you are
 paying for support.  If this doesn't happen on bare metal, it's a VMWare
 issue, or they need to demonstrate it's not their issue.  At least that
 would be the expectation I have.


You're right, but we've thrown a ton of money at their support and had
direct phone access to their engineers. The best we can get out of them is
"no indication this is a VMWare problem". It's easy for them to blow
people off when they're as big as they've grown to be.


 There is also a comment on this post indicating someone else with the
 issue and who has received unofficial vmware feedback.

 http://www.hailang.me/tech/virtual/freebsd-vmware-esx-a-weird-error-with-san-storage/


I found that post ages ago and that's me, mf, as the only person to  
comment on it. Unfortunately our problem does not align with what he's  
describing.




 And then there is this one with similar symptoms and a workaround:

 http://forums.freebsd.org/showthread.php?t=27899



I'm now investigating those loader.conf options. I have my crashy machine  
set to use them on next boot so we'll see if it crashes now that I'm using  
LSI SAS emulated controller. If it still crashes, we'll see what happens  
after that with those loader.conf options enabled.





Re: Please help me diagnose this crazy VMWare/FreeBSD 8.x crash

2012-03-29 Thread Doug Barton
On 3/29/2012 7:01 AM, Joe Greco wrote:
 On 3/28/2012 1:59 PM, Mark Felder wrote:
 FreeBSD 8-STABLE, 8.3, and 9.0 are untested

 As much as I'm sensitive to your production requirements, realistically
 it's not likely that you'll get a helpful result without testing a newer
 version. 8.2 came out over a year ago, many many things have changed
 since then.

 Doug
 
 So you're saying that he should have been using 8.3-RELEASE, then.

That isn't what I said at all, sorry if I wasn't clear. The OP mentioned
9.0-RELEASE, and in the context of his message (which I snipped) he
mentioned 8-stable. That's what I was referring to.

 If you'll kindly go over to http://www.freebsd.org and look under
 Latest Releases, please note that 8.2 is a production release.
 If you don't want it to be a production release, then find a way
 to make it so, but please don't snipe at people who are using the
 code that the FreeBSD project has indicated is a current production
 offering.
 
 There are many good reasons not to run arbitrary snapshots on your
 production gear.  It's unrealistic to expect people to run non-
 RELEASE non-production code on their production gear.  We can have
 that discussion if you don't understand that, drop me a note off-
 list and I'll be happy to explain it.

I can see that you're upset about something, sorry if my message caused
you additional stress. I actually understand the realities of production
environments quite well, and believe it or not I agree with some of your
frustration about how we handle support for our supported releases.
We've had various public threads about these issues, which have sparked
some quite-lively private discussions amongst our committers, and I'm
hoping that once the long-overdue 8.3-RELEASE is out we'll be able to
buckle down and start putting some of those ideas into action.

Meanwhile, this is still a volunteer project, and as a result sometimes
the best way to get attention to a problem is to verify that it hasn't
already been fixed. You've been around more than long enough to
understand this, Joe. We can spend time arguing about what *should* be
(actually we can't ...) but my point was in trying to help the OP get
the most/best help the fastest way possible.

Doug


Re: Please help me diagnose this crazy VMWare/FreeBSD 8.x crash

2012-03-29 Thread Joe Greco
  And then there is this one with similar symptoms and a workaround:
 
  http://forums.freebsd.org/showthread.php?t=27899
 
 I'm now investigating those loader.conf options. I have my crashy machine
 set to use them on next boot so we'll see if it crashes now that I'm using
 LSI SAS emulated controller. If it still crashes, we'll see what happens 
 after that with those loader.conf options enabled.

Um, if I may, that's something completely different.

VMDirectPath, or PCIe passthru, is making a hardware device on a VMware
host available directly to a guest.  It'll take your LSI controller, in
the example cited, and make it unavailable to VMware ESXi, and present
it instead inside the guest environment.  You do this when you have an
app whose performance would suffer greatly when made to operate through
the indirection that a VM naturally lives in; for example, it is quite
common for FreeNAS users to pass a disk controller through to a VM guest
in order to allow a virtualized FreeNAS instance to directly manage the
physical disks.

In that case, there are some issues with ESXi and interrupt delivery to
the guest VM; virtualization doesn't actually get rid of the possibility
of ESXi problems, since the hypervisor is still ultimately involved.  It
is certainly possible that there's some common issue involving interrupt
delivery somehow, but I wouldn't get my hopes up.

It also doesn't explain the experience here, where one VM basically
crapped out but only after a migration - and then stayed crapped out.
It would be interesting to hear about your datastore, how busy it is,
what technology, whether you're using thin, etc.  I just have this real
strong feeling that it's some sort of corruption with the vmfs3 and thin
provisioned disk format, but it'd be interesting to know if that's 
totally off-track.

... JG
-- 
Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net
We call it the 'one bite at the apple' rule. Give me one chance [and] then I
won't contact you again. - Direct Marketing Ass'n position on e-mail spam(CNN)
With 24 million small businesses in the US alone, that's way too many apples.


Re: Please help me diagnose this crazy VMWare/FreeBSD 8.x crash

2012-03-29 Thread Mark Felder

On Thu, 29 Mar 2012 19:27:31 -0500, Joe Greco jgr...@ns.sol.net wrote:


 It also doesn't explain the experience here, where one VM basically
 crapped out but only after a migration - and then stayed crapped out.
 It would be interesting to hear about your datastore, how busy it is,
 what technology, whether you're using thin, etc.  I just have this real
 strong feeling that it's some sort of corruption with the vmfs3 and thin
 provisioned disk format, but it'd be interesting to know if that's
 totally off-track.


We've ruled out SAN, but we haven't ruled out VMFS. Even FreeBSD Guests on  
standalone ESXi servers with no SAN exhibit this crash.


For the record, we only use thick provisioning and if it was corruption  
I'm not sure what layer the corruption could be at. The crashy servers  
show no abnormalities when I run either `freebsd-update IDS` or  
`pkg_libchk` to confirm checksums of all installed programs. Now the other  
data on there... it's not exactly verified, but our backups via rsnapshot  
seem to prove there is no issue there or we'd have lots of new files each  
run.
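
For the data that neither tool covers, one option is a checksum manifest taken while the server is healthy and compared again later. A minimal Python 3 sketch follows; `build_manifest` and `changed_files` are hypothetical helper names, and the approach only makes sense for trees that are supposed to be static, since any legitimate write shows up as a change too.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each regular file under root (by relative path) to its SHA-256."""
    manifest = {}
    for directory, _subdirs, files in os.walk(root):
        for name in files:
            full = os.path.join(directory, name)
            # Skip symlinks and special files; only hash regular files.
            if os.path.isfile(full) and not os.path.islink(full):
                manifest[os.path.relpath(full, root)] = sha256_of(full)
    return manifest


def changed_files(old, new):
    """Return paths present in both manifests whose digest differs."""
    return sorted(path for path in old if path in new and old[path] != new[path])
```

A non-empty result from `changed_files` on data nobody wrote to would point at corruption somewhere below the filesystem; an empty result does not rule it out, because reads may be served from cache.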



Re: Please help me diagnose this crazy VMWare/FreeBSD 8.x crash

2012-03-29 Thread Jerome Herman

On 28/03/2012 22:59, Mark Felder wrote:
[snip]

Sorry, coming a bit late to the party.

I have seen similar behavior on a few VMs, all of them either Debian or
FreeBSD. Even though CPU indications are not necessarily relevant in a
VM, vi launched through crontab -e would take an insane amount of CPU (up
to 84%) and Apache was hovering around 350%-400% (quad-CPU VM).
The thing is that taking a VM snapshot and deploying the snapshot a
while later, or on a different (far less loaded) VMWare platform, would
basically make it perfectly usable again.
Shutting down the VM and starting it again with only one CPU would also
basically solve the problem. Debian seemed able to survive the crisis,
but disk I/O had latencies of many seconds, sometimes minutes. This
would happen only on heavily loaded VMWare hosts. Similarly, older
versions of Debian never showed the problem.


Can you test whether you see similar behavior on your platform?


Off-Topic: Computing for the Blind

2012-03-29 Thread Barbara La Scala
Many thanks to everyone who contacted me, either directly or through the list.
I now have plenty of places and ideas to check out to help get my stepfather
online.  At the moment, I'm leaning towards getting him a Mac (since it has a
real operating system under the hood) and a suite of text/keyboard friendly
apps.

Since I've started looking into this I've come to realise how much having good
eyesight is taken for granted, what with context sensitive menus, touch
screens and the (ab)use of Flash.  Be kind to your retinas and corneas.  They
are more useful than you might realise.

Thanks again for all the help.
Barbara



Re: Please help me diagnose this crazy VMWare/FreeBSD 8.x crash

2012-03-29 Thread Adrian Chadd
Again, it's starting to sound like an interrupt handling issue which
may or may not be limited to the storage device.

You'll have to engage someone who knows those device drivers and
likely have them add some debugging to the driver which can be easily
flipped on (via binaries in a ramdisk - very important if you can't
run sysctl because your disk IO has locked up!) to see what the
current state of things is.
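
Because the network stays up while disk access hangs, a small probe that is already running before the hang could at least timestamp when I/O stops completing. The following is a sketch under that assumption, in Python 3; `disk_write_completes` and `watch` are made-up names, and the timeout and interval values are arbitrary.

```python
import os
import threading
import time


def disk_write_completes(path, timeout=5.0):
    """Return True if a small write+fsync to `path` finishes within `timeout` seconds.

    Returns False on timeout, and also if the write fails with an error.
    """
    done = threading.Event()

    def attempt():
        with open(path, "wb") as handle:
            handle.write(b"probe\n")
            handle.flush()
            os.fsync(handle.fileno())
        done.set()

    # Daemon thread: a write that never returns should not be waited on at exit.
    threading.Thread(target=attempt, daemon=True).start()
    return done.wait(timeout)


def watch(path, interval=10.0, report=print):
    """Probe repeatedly; call `report` once, with a timestamp, when a probe stalls."""
    while disk_write_completes(path):
        time.sleep(interval)
    report("disk write stalled at %s" % time.strftime("%Y-%m-%d %H:%M:%S"))
```

`watch("/var/tmp/probe.tmp")` (a hypothetical path) would print one line when the first probe stalls. Reporting to an already-open console or over a socket avoids touching the disk; the stuck writer thread itself may never return.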

It's likely that the BSD mpt(4) and other storage drivers, and/or our
interrupt handling code, is just slightly different enough to confuse
the snot out of VMWare. I'd first look at the obvious - (eg, if you've
just stopped receiving interrupts, even if new IO is scheduled). I'd
also ask VMware if they have any tools that they can run on a VM to
get the state of the internal emulated driver. For example, register
dumps of the device to see if it's in a hung state, register dumps of
the PIC/APIC to see what state they're in, etc.

Maybe pull in someone like ixsystems and see if they can help debug
this kind of stuff? If you're paying vmware for support, you could
pull them into things with ixsystems and see if the two of them can
help you sort this out?

Thanks,



Adrian