Re: Regression in 8.2-STABLE bge code (from 7.4-STABLE)

2012-03-11 Thread Michael L. Squires

A patch which allows FreeBSD 8.3-STABLE to use the Broadcom GigE bge ports
on the Tyan S4882 quad Opteron motherboard (and almost certainly on the 
S4881 motherboard, which had the same problem with 7.4-STABLE) has been 
developed by YongHyeon Pyun.


The problem involves a bug in the PCI bridge which connects the Broadcom
bge Ethernet ports.

He will shortly be committing the patch to HEAD.

Thank you!

Mike Squires
mikes at siralan.org
UN*X at home
since 1986
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: Issue with hast replication

2012-03-11 Thread Phil Regnauld
Mikolaj Golub (trociny) writes:
> 
> 
>  PR> Mar 11 02:02:30 h1 hastd[2282]: [hvol] (primary) Disconnected from tcp4://192.168.1.200.
>  PR> Mar 11 02:02:30 h1 hastd[2282]: [hvol] (primary) Unable to write synchronization data: Cannot allocate memory.
>  PR> Mar 11 02:02:41 h1 hastd[2282]: [hvol] (primary) Unable to send request (Cannot allocate memory): WRITE(31642091520, 131072).
> 
> 31642091520 looks like a rather large offset for a 10G volume...

Sorry, that should have been 100G - I typed from memory instead of 
copy-pasting.
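Converting the offset from the log makes the mismatch concrete. A small Python sketch, assuming the first argument of hastd's WRITE(offset, length) is a byte offset and that "10G"/"100G" are binary sizes, as zfs create -V interprets them:

```python
GIB = 2 ** 30  # zfs size suffixes such as 10G are binary: 1G = 2**30 bytes

def offset_fits(offset: int, length: int, volume_gib: int) -> bool:
    """True if the byte range [offset, offset + length) lies inside the volume.
    Assumption: WRITE(offset, length) in the hastd log is measured in bytes."""
    return offset + length <= volume_gib * GIB

# Values from the hastd log line WRITE(31642091520, 131072).
offset, length = 31642091520, 131072
print(f"offset = {offset / GIB:.2f} GiB")
for size in (10, 100):
    print(f"fits in {size}G volume: {offset_fits(offset, length, size)}")
```

The offset works out to about 29.47 GiB: impossible for a 10G volume, unremarkable for a 100G one.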

> Just to be more confident that this is a HAST issue could you please try the
> following experiment?
> 
> 1) Stop hastd on h2.
> 
> 2) On h1 run something like below:
> 
>   dd if=/dev/zvol/zfs/hvol bs=131072 | ssh h2 dd bs=131072 of=/dev/zvol/zfs/hvol
> 
> (copy hvol from h1 to h2 without hastd to see if it will succeed).
> 
> Note: you will need to recreate HAST provider on secondary after this.

OK, this is interesting.

(For debugging purposes I've renamed the target zvol to "junk"; you'll see
why below.)

1) As you suggested:

h1# dd if=/dev/zvol/zfs/hvol bs=131072 | ssh h2 dd bs=131072 of=/dev/zvol/zfs/junk
dd: /dev/zvol/zfs/junk: Invalid argument
0+6 records in
0+5 records out
131072 bytes transferred in 0.002344 secs (55920640 bytes/sec)

To be certain which dd was complaining, I renamed the target zvol.

2) I tried repeatedly; sometimes the number of bytes is a bit different:

0+7 records in
0+6 records out
147456 bytes transferred in 0.002448 secs (60233277 bytes/sec)

And yes, hastd is stopped on h2.

3) I tried dd'ing zeros to the zvol locally on h2:

h2# dd if=/dev/zero of=/dev/zvol/zfs/junk bs=131072
^C1817+0 records in
1816+0 records out
238026752 bytes transferred in 1.582006 secs (150458820 bytes/sec)

That works, until I ^C it.

4) I tried redirecting the output of the dd | ssh to a file on the h2 side:

h1# dd if=/dev/zvol/zfs/hvol bs=131072 | ssh h2 dd bs=131072 of=/tmp/x
^C653+0 records in
652+0 records out
85458944 bytes transferred in 2.408074 secs (35488506 bytes/sec)

That works too, until I ^C it.

5) Things get even weirder - if I then go over to h2 and dd the
"/tmp/x" test file over to the zvol:

h2# dd if=x bs=131072 of=/dev/zvol/zfs/junk 
dd: /dev/zvol/zfs/junk: Invalid argument
652+1 records in
652+0 records out
85458944 bytes transferred in 0.444571 secs (192227879 bytes/sec)

Note that the file /tmp/x is 86917120 bytes long.

6) I copied more data into /tmp/x; it's now 291946496 bytes (~280 MB):

h2# dd if=x bs=131072 of=/dev/zvol/zfs/junk
2227+1 records in
2227+1 records out
291946496 bytes transferred in 3.564129 secs (81912441 bytes/sec)

No more "invalid argument"...

7) ktrace on the destination dd:

[...]
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
\0"
  5807 dd   RET   read 17992/0x4648
  5807 dd   CALL  write(0x3,0x800c09000,0x4648)
  5807 dd   RET   write -1 errno 22 Invalid argument
  5807 dd   CALL  write(0x2,0x7fffd300,0x4)
  5807 dd   GIO   fd 2 wrote 4 bytes
 "dd: "
  5807 dd   RET   write 4
  5807 dd   CALL  write(0x2,0x7fffd3e0,0x12)
  5807 dd   GIO   fd 2 wrote 18 bytes
   "/dev/zvol/zfs/junk"

truss is a bit more informative:

fstat(0,{ mode=p- ,inode=5,size=16384,blksize=4096 }) = 0 (0x0)
lseek(0,0x0,SEEK_CUR)ERR#29 'Illegal seek'

Illegal seek, eh? Any clues?
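The ERR#29 above is what lseek() returns on any pipe: pipes are not seekable, and the receiving dd's standard input is the ssh pipe (fstat reports mode=p-). Since dd keeps reading and writing after it (see the ktrace), this error is probably just a seekability probe rather than the failure itself. A minimal Python sketch reproducing it on a POSIX system:

```python
import errno
import os

def lseek_on_pipe_errno() -> int:
    """Return the errno from lseek() on a fresh pipe.
    Pipes are not seekable, so this is ESPIPE ("Illegal seek"),
    errno 29 on FreeBSD (and on Linux)."""
    r, w = os.pipe()
    try:
        os.lseek(r, 0, os.SEEK_CUR)  # same call as lseek(0,0x0,SEEK_CUR) in truss
    except OSError as exc:
        return exc.errno
    finally:
        os.close(r)
        os.close(w)
    return 0  # not reached on a POSIX system

err = lseek_on_pipe_errno()
print(err, errno.errorcode[err], os.strerror(err))
```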

The boxes are identical (HP DL380 G6), though the RAM config is different.

Summary:

- ssh works fine
- h1 zvol to h2 zvol over ssh fails
- h1 zvol to h2 /tmp/x over ssh is fine
- h2 /dev/zero locally to h2 zvol is fine
- h2 /tmp/x locally to h2 zvol fails at first, but works afterwards...
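One possible explanation, offered as a hypothesis rather than a diagnosis: the failing write in the ktrace is 0x4648 = 17992 bytes, which is not a multiple of 512 (17992 = 35 x 512 + 72), and a disk-like device may reject such a write with EINVAL. A pipe hands dd short reads whose sizes depend on timing, and dd(1) documents that with bs= each input block is copied to the output as a single block without aggregating short blocks, whereas separate ibs= and obs= operands make dd repack data into full output blocks. That would fit the pattern above: /dev/zero never fails, the pipe fails at varying points, and the 291946496-byte file ends in a 49152-byte tail (a multiple of 512) that was written fine. The Python sketch below models this; the 512-byte sector size and the read sizes are assumptions, and `StrictDevice` is a hypothetical stand-in for the zvol.

```python
import errno

SECTOR = 512   # assumed sector size of the zvol (not confirmed in the thread)
BS = 131072    # the bs=131072 used in the thread

class StrictDevice:
    """Hypothetical stand-in for the zvol: rejects any write whose length
    is not a multiple of SECTOR, as a raw disk device may do with EINVAL."""

    def __init__(self) -> None:
        self.written = 0

    def write(self, buf: bytes) -> None:
        if len(buf) % SECTOR:
            raise OSError(errno.EINVAL, "Invalid argument")
        self.written += len(buf)

def copy_bs(reads, dev):
    """Model `dd bs=N`: each input block, short or not, becomes one write."""
    for chunk in reads:
        dev.write(chunk)

def copy_ibs_obs(reads, dev, obs=BS):
    """Model `dd ibs=N obs=N`: input is repacked into full obs-sized writes;
    only a final leftover block can be short."""
    pending = b""
    for chunk in reads:
        pending += chunk
        while len(pending) >= obs:
            dev.write(pending[:obs])
            pending = pending[obs:]
    if pending:
        dev.write(pending)

# Illustrative short reads as a pipe might deliver them (made-up sizes summing
# to 131072); 17992 (0x4648) is the length of the write that failed in the ktrace.
reads = [bytes(n) for n in (16384, 16384, 17992, 16384, 63928)]

for name, copier in (("bs", copy_bs), ("ibs/obs", copy_ibs_obs)):
    dev = StrictDevice()
    try:
        copier(reads, dev)
        print(f"{name}: ok, {dev.written} bytes written")
    except OSError as exc:
        print(f"{name}: failed after {dev.written} bytes: {exc.strerror}")
```

Run as-is, the bs model fails after 32768 bytes with "Invalid argument" while the ibs/obs model writes all 131072 bytes. If the hypothesis holds, the analogous thing to try would be `ibs=131072 obs=131072` instead of `bs=131072` on the receiving dd; that has not been tested in this thread.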




Re: Issue with hast replication

2012-03-11 Thread Mikolaj Golub

On Sun, 11 Mar 2012 19:54:57 +0100 Phil Regnauld wrote:

 PR> Hi,

 PR> I've got a fairly simple setup: two hosts running 9.0-R (will upgrade to stable
 PR> if told to, but want to check here first), ZFS and HAST. HAST is configured to
 PR> run on top of zvols configured on each host, as illustrated:

 PR>    FS                    FS
 PR> +------+              +------+
 PR> | hvol | <- hastd ->  | hvol |
 PR> +------+              +------+
 PR> | zvol |              | zvol |
 PR> +------+              +------+
 PR> | zfs  |              | zfs  |
 PR> +------+              +------+
 PR>    h1                    h2

 PR> Connection is gigabit to the same switch. No issues with large TCP
 PR> transfers such as SCP/FTP.

 PR> Config is vanilla:

 PR> # zfs create -V 10G zfs/hvol

 PR> hast.conf:

 PR> resource hvol {
 PR>     on h1 {
 PR>         local /dev/zvol/zfs/hvol
 PR>         remote tcp4://192.168.1.100
 PR>     }
 PR>     on h2 {
 PR>         local /dev/zvol/zfs/hvol
 PR>         remote tcp4://192.168.1.200
 PR>     }
 PR> }


 PR> h1 is behaving fine as primary, either with h2 turned off or in init -
 PR> but as soon as I set the role to secondary for h2, the receiver
 PR> repeatedly crashes and restarts - see the traces below.

 PR> Primary:

 PR> Mar 11 02:02:30 h1 hastd[2282]: [hvol] (primary) Disconnected from tcp4://192.168.1.200.
 PR> Mar 11 02:02:30 h1 hastd[2282]: [hvol] (primary) Unable to write synchronization data: Cannot allocate memory.
 PR> Mar 11 02:02:41 h1 hastd[2282]: [hvol] (primary) Unable to send request (Cannot allocate memory): WRITE(31642091520, 131072).

31642091520 looks like a rather large offset for a 10G volume...

Just to be more confident that this is a HAST issue could you please try the
following experiment?

1) Stop hastd on h2.

2) On h1 run something like below:

  dd if=/dev/zvol/zfs/hvol bs=131072 | ssh h2 dd bs=131072 of=/dev/zvol/zfs/hvol

(copy hvol from h1 to h2 without hastd to see if it will succeed).

Note: you will need to recreate HAST provider on secondary after this.

-- 
Mikolaj Golub


Re: What ZFS version will be in 8.3?

2012-03-11 Thread Schaich Alonso
ZFS v28. It was merged into RELENG_8 from -CURRENT in May last year.

Alonso


Re: What ZFS version will be in 8.3?

2012-03-11 Thread Rainer Duffner

On 11.03.2012 at 20:43, Steven Hartland wrote:

> Hi guys, which version of ZFS will be included in 8.3?


V28, AFAIK.
It has been available as a backport for 8.2 for some time.

Hopefully, it's stable. ;-)


Rainer


What ZFS version will be in 8.3?

2012-03-11 Thread Steven Hartland

Hi guys, which version of ZFS will be included in 8.3?

   Regards
   Steve


This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. 


In the event of misdirection, illegible or incomplete transmission please 
telephone +44 845 868 1337
or return the E.mail to postmas...@multiplay.co.uk.



Re: Trouble with SSD

2012-03-11 Thread Steven Hartland


- Original Message - 
From: "Willem Jan Withagen" 



> Just as a followup.
> 
> I reported the above problem
> 
> Today it occurred again. But this time I was able to find a firmware
> upgrade for the Corsair Force GT from 1.2 to 1.3.3
> (Need Win7 to be able to upgrade)
> 
> Hopefully that helps, and it no longer disconnects every 4 weeks or so.


SandForce-based SSDs are known for this issue; the later firmware updates
do indeed help with the problem. We've found Corsair's 1.3.3 firmware on the
non-GT versions to be nice and stable :)

   Regards
   Steve




Issue with hast replication

2012-03-11 Thread Phil Regnauld
Hi,

I've got a fairly simple setup: two hosts running 9.0-R (I will upgrade to -STABLE
if told to, but wanted to check here first) with ZFS and HAST. HAST is configured to
run on top of zvols configured on each host, as illustrated:

      FS                    FS
   +------+              +------+
   | hvol | <- hastd ->  | hvol |
   +------+              +------+
   | zvol |              | zvol |
   +------+              +------+
   | zfs  |              | zfs  |
   +------+              +------+
      h1                    h2

Connection is gigabit to the same switch. No issues with large TCP
transfers such as SCP/FTP.

Config is vanilla:

# zfs create -V 10G zfs/hvol

hast.conf:

resource hvol {
    on h1 {
        local /dev/zvol/zfs/hvol
        remote tcp4://192.168.1.100
    }
    on h2 {
        local /dev/zvol/zfs/hvol
        remote tcp4://192.168.1.200
    }
}


h1 is behaving fine as primary, either with h2 turned off or in init -
but as soon as I set the role to secondary for h2, the receiver
repeatedly crashes and restarts - see the traces below.

I've seen 

http://lists.freebsd.org/pipermail/freebsd-current/2011-May/024871.html
http://unix.derkeiler.com/Mailing-Lists/FreeBSD/stable/2012-01/msg00510.html

... but in the first case the fix has been in 9 since last year, and the second
refers to async replication; I'm using the default (fullsync).

hastctl status on the primary shows the dirty size diminishing slowly,
but obviously this isn't optimal (it also freezes I/O to the primary
hvol, which causes all kinds of issues for the consumers of the hvol).

Any ideas? Am I doing something wrong?


Primary:

Mar 11 02:02:30 h1 hastd[2282]: [hvol] (primary) Disconnected from tcp4://192.168.1.200.
Mar 11 02:02:30 h1 hastd[2282]: [hvol] (primary) Unable to write synchronization data: Cannot allocate memory.
Mar 11 02:02:41 h1 hastd[2282]: [hvol] (primary) Unable to send request (Cannot allocate memory): WRITE(31642091520, 131072).
Mar 11 02:02:41 h1 hastd[2282]: [hvol] (primary) Disconnected from tcp4://192.168.1.200.
Mar 11 02:02:41 h1 hastd[2282]: [hvol] (primary) Unable to write synchronization data: Cannot allocate memory.
Mar 11 02:02:48 h1 hastd[2282]: [hvol] (primary) Unable to send request (Cannot allocate memory): WRITE(31649693696, 131072).
Mar 11 02:02:48 h1 hastd[2282]: [hvol] (primary) Disconnected from tcp4://192.168.1.200.
Mar 11 02:02:48 h1 hastd[2282]: [hvol] (primary) Unable to write synchronization data: Cannot allocate memory.
Mar 11 02:02:59 h1 hastd[2282]: [hvol] (primary) Unable to send request (Cannot allocate memory): WRITE(31691243520, 131072).
Mar 11 02:02:59 h1 hastd[2282]: [hvol] (primary) Disconnected from tcp4://192.168.1.200.
Mar 11 02:02:59 h1 hastd[2282]: [hvol] (primary) Unable to write synchronization data: Cannot allocate memory.
Mar 11 02:03:13 h1 hastd[2282]: [hvol] (primary) Unable to send request (Cannot allocate memory): WRITE(31783256064, 131072).
Mar 11 02:03:13 h1 hastd[2282]: [hvol] (primary) Disconnected from tcp4://192.168.1.200.
Mar 11 02:03:13 h1 hastd[2282]: [hvol] (primary) Unable to write synchronization data: Cannot allocate memory.
Mar 11 02:03:18 h1 hastd[2282]: [hvol] (primary) Unable to send request (Cannot allocate memory): WRITE(31782731776, 131072).
Mar 11 02:03:18 h1 hastd[2282]: [hvol] (primary) Disconnected from tcp4://192.168.1.200.
Mar 11 02:03:18 h1 hastd[2282]: [hvol] (primary) Unable to write synchronization data: Cannot allocate memory.
Mar 11 02:03:28 h1 hastd[2282]: [hvol] (primary) Unable to send request (Cannot allocate memory): WRITE(31803441152, 131072).
Mar 11 02:03:28 h1 hastd[2282]: [hvol] (primary) Disconnected from tcp4://192.168.1.200.
Mar 11 02:03:28 h1 hastd[2282]: [hvol] (primary) Unable to write synchronization data: Cannot allocate memory.
Mar 11 02:03:42 h1 hastd[2282]: [hvol] (primary) Unable to send request (Cannot allocate memory): WRITE(31881953280, 131072).
Mar 11 02:03:42 h1 hastd[2282]: [hvol] (primary) Disconnected from tcp4://192.168.1.200.
Mar 11 02:03:42 h1 hastd[2282]: [hvol] (primary) Unable to write synchronization data: Cannot allocate memory.


Secondary:

Mar 11 01:01:30 h2 hastd[2506]: [hvol] (secondary) Worker process exited ungracefully (pid=2874, exitcode=75).
Mar 11 01:01:38 h2 hastd[2875]: [hvol] (secondary) Unable to receive request header: Socket is not connected.
Mar 11 01:01:44 h2 hastd[2506]: [hvol] (secondary) Worker process exited ungracefully (pid=2875, exitcode=75).
Mar 11 01:01:45 h2 hastd[2876]: [hvol] (secondary) Unable to receive request header: Socket is not connected.
Mar 11 01:01:50 h2 hastd[2506]: [hvol] (secondary) Worker process exited ungracefully (pid=2876, exitcode=75).
Mar 11 01:01:56 h2 hastd[2877]: [hvol] (secondary) Unable to receive request header: Socket is not connected.
Mar 11 01:02:01 h2 hastd[2506]: [hvol] (secondary) Worker p

Re: devd(8) based AUTOMOUNTER (version 1.3)

2012-03-11 Thread army.of.root
Thanks for sharing!

Reads awesome.
Even with exFAT integration, my new go-to "external disk fs" :)

best regards

On Sun, Mar 4, 2012 at 10:49 AM, vermaden  wrote:
> Already at 1.3.1 ...
>
> Fixed the 'detach' section (s/PREFIX/MNTPREFIX/g).
> Fixed removing directories of manually (properly) unmounted filesystems.
>
> "vermaden" wrote:
>> Hi,
>>
>> after some 'fun' with MP3 players I have made some modifications and fixes.
>>
>> Here is a list of what's changed:
>>
>> Fixed a bug with improper exFAT detection; it now mounts fine.
>> Fixed a bug that created mount dirs for all attached devices whether
>> needed or not.
>> Revised 'detach' section; it now removes only the directory that is
>> unmounted (if enabled, of course).
>> Simplified FAT/NTFS sections; removed an additional check as it broke
>> automounting of some MP3 players' default filesystems.
>>
>> The latest 1.3 version can be found here as usual:
>> https://github.com/vermaden/automount/
>>
>> Regards,
>> vermaden
>


Re: Trouble with SSD

2012-03-11 Thread Willem Jan Withagen
On 2012-02-01 14:40, Willem Jan Withagen wrote:
> Hi,
> 
> I have had this ZFS server up for about 27 days, and it turns out that about
> 3 weeks ago (I was not really paying attention) it lost the SSD that I'm using
> for log and cache. There is also a poor and lonely memory stick for log,
> so the box did not really suffer any file loss.
> 
> The system is running:
> FreeBSD zfs.digiware.nl 8.2-STABLE FreeBSD 8.2-STABLE #58: Thu Nov 17 09:43:46 CET 2011 r...@zfs.digiware.nl:/home/obj/usr/src/src8/src/sys/ZFS  amd64
> 
> more info like dmesg, pciconf, kernconf, zpool iostat at:
>   http://www.tegenbosch28.nl/FreeBSD/systems/ZFS/
> 
> But it is weird to just lose an SSD from the bus. And it has happened
> before. And you can see that AHCI really banged on the front door...
> 
> The device is a Corsair 60GB Force GT. And thus far I have not found any
> suggestions that this series of devices is prone to doing this.
> 
> It was a really dead device; the only way to get it back was:
>   power-cycle the device by pulling it out and sticking it back in,
>   then camcontrol rescan
> 
> I've now upgraded it to a 120GB Corsair, to see if that has the same problem.
> 
> Have other FreeBSD-ers had similar problems?
> 
> Regards,
> --WjW
> 
> 
> Jan  7 10:04:24 zfs kernel: ahcich3: Timeout on slot 27 port 0
> Jan  7 10:04:24 zfs kernel: ahcich3: is  cs 2000 ss 3800 rs 3800 tfd c0 serr  cmd 0004dd17
> Jan  7 10:04:56 zfs kernel: ahcich3: AHCI reset: device not ready after 31000ms (tfd = 0080)
> Jan  7 10:05:26 zfs kernel: ahcich3: Timeout on slot 29 port 0
> Jan  7 10:05:26 zfs kernel: ahcich3: is  cs 2000 ss  rs 2000 tfd 80 serr  cmd 0004dd17
> Jan  7 10:05:57 zfs kernel: ahcich3: AHCI reset: device not ready after 31000ms (tfd = 0080)
> Jan  7 10:06:27 zfs kernel: ahcich3: Timeout on slot 29 port 0
> Jan  7 10:06:27 zfs kernel: ahcich3: is  cs 2000 ss  rs 2000 tfd 80 serr  cmd 0004dd17
> Jan  7 10:06:27 zfs kernel: (ada2:ahcich3:0:0:0): lost device
> Jan  7 10:06:58 zfs kernel: ahcich3: AHCI reset: device not ready after 31000ms (tfd = 0080)
> Jan  7 10:07:28 zfs kernel: ahcich3: Timeout on slot 29 port 0
> Jan  7 10:07:28 zfs kernel: ahcich3: is  cs e000 ss e000 rs e000 tfd 80 serr  cmd 0004dd17
> Jan  7 10:08:16 zfs kernel: ahcich3: AHCI reset: device not ready after 31000ms (tfd = 0080)
> Jan  7 10:08:16 zfs kernel: ahcich3: Poll timeout on slot 31 port 0
> Jan  7 10:08:16 zfs kernel: ahcich3: is  cs 8000 ss  rs 8000 tfd 80 serr  cmd 0004df17
> Jan  7 10:08:46 zfs kernel: ahcich3: Timeout on slot 31 port 0
> Jan  7 10:08:46 zfs kernel: ahcich3: is  cs 8000 ss  rs 8000 tfd 80 serr  cmd 0004df17
> Jan  7 10:08:48 zfs kernel: (ada2:ahcich3:0:0:0): removing device entry
> Jan  7 10:09:33 zfs kernel: ahcich3: AHCI reset: device not ready after 31000ms (tfd = 0080)
> Jan  7 10:09:33 zfs kernel: ahcich3: Poll timeout on slot 31 port 0
> Jan  7 10:09:33 zfs kernel: ahcich3: is  cs 8000 ss  rs 8000 tfd 80 serr  cmd 0004df17

Just as a followup.

I reported the above problem

Today it occurred again. But this time I was able to find a firmware
upgrade for the Corsair Force GT from 1.2 to 1.3.3
(Need Win7 to be able to upgrade)

Hopefully that helps, and it no longer disconnects every 4 weeks or so.

Ciao,
--WjW





Re: Time Clock Stops in FreeBSD 9.0 guest running under ESXi 5.0

2012-03-11 Thread Ian Lepore
On Sat, 2012-03-10 at 15:07 +0700, Adam Strohl wrote:
> I've now seen this on two different VMs on two different ESXi servers 
> (Xeon based hosts but different hardware otherwise and at different 
> facilities):
> 
> Everything runs fine for weeks then (seemingly) suddenly/randomly the 
> clock STOPS.  In the first case I saw a jump backwards of about 15 
> minutes (and then a 'freeze' of the clock).  The second time just 'time 
> standing still' with no backwards jump.  Logging accuracy is of course 
> questionable given the nature of the issue, but nothing really jumps out 
> (i.e., I don't see NTPd adjusting the time just before this happens or 
> anything like that).
> 
> Naturally the clock stopping causes major issues, but the machine does 
> technically stay running.  My open sessions respond, but anything that 
> relies on time moving forward hangs.  I can't even gracefully reboot it 
> because shutdown/etc all rely on time moving forward (heh).
> 
> So I'm not sure if this is a VMWare/ESXi issue or a FreeBSD issue, or 
> some kind of interaction between the two.   I manage lots of VMWare 
> based FreeBSD VMs, but these are the only ESXi 5.0 servers and the only 
> FreeBSD 9.0 VMs.  I have never seen anything quite like this before, and 
> last night as I mentioned above I had it happen for the second time on a 
> different VM + ESXi server combo so I'm not thinking it's a fluke 
> anymore.  I've looked for other reports of this both in VMWare and 
> FreeBSD contexts and am not seeing anything.
> 
> What is interesting is that the 2 servers that have shown this issue 
> perform similar tasks, which are different from the other VMs which have 
> not shown this issue (yet).  This is 2 VMs out of a dozen VMs spread 
> over two ESXi servers on different coasts.  This might be a coincidence 
> but seems suspicious. These two VMs run these services (whereas the 
> other VMs don't):
> 
> - BIND
> - CouchDB
> - MySQL
> - NFS server
> - Dovecot 2.x
> 
> I would also say that these two VMs probably are the most active, have 
> the most RAM and consume the most CPU because of what they do (vs. the 
> others).
> 
> I have disabled NTPd since I am running the OpenVM Tools (which I 
> believe should be keeping the time in sync with the ESXi host, which 
> itself uses NTP); my only guess is that maybe there is some kind of collision 
> where NTPd and OpenVMTools were adjusting the time at the same time.  
> I'm playing the waiting game now to see what this brings (again though I 
> am running NTPd and OpenVMTools on all the other VMs which have yet to 
> show this issue).
> 
> Anyone seen anything like this?  Ring any bells?
> 

I've run into the "time standing still" problem, but only on bringing up
FreeBSD on new hardware (usually industrial single-board computers).  In
those cases time never advances beyond the time obtained from the RTC
hardware at boot.  I've never seen it happen that time runs normally for
a while then stops advancing, but I have almost no experience with
FreeBSD as a VM guest OS.

When I have seen the problem, it's always been due to interrupt
problems, such as the timer tick handler getting hung or the selected
timer hardware not generating interrupts.  

It seems unlikely to me that ntpd and the vm tools would be fighting in
a way that caused this symptom.  The way ntpd affects timing is to step
the clock (which gets logged), or to numerically steer the kernel's
timekeeping routines.  The steering is clamped at 500 ppm; to make the
clock appear to stop it would have to steer at 1e6 ppm.  I've always
assumed that VM guest services daemons that handle timekeeping use the
same ntp_adjtime() interface to the kernel timekeeping that ntpd itself
uses, so the same steering limits would apply.
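These figures translate directly into clock rates: a frequency offset of f ppm scales the clock's rate by 1 + f/10^6, so steering clamped at 500 ppm can slow the clock by at most 0.05%, about 43 seconds per day, whereas a clock that stands still corresponds to -10^6 ppm. A Python sketch of that arithmetic; it models only the rate factor, not the kernel's actual ntp_adjtime() PLL/FLL logic:

```python
SECONDS_PER_DAY = 86_400
MAX_STEER_PPM = 500  # clamp on ntpd's frequency steering, as quoted above

def clock_seconds(real_seconds: float, ppm: float) -> float:
    """Seconds the system clock advances during `real_seconds` of real time
    when its rate is offset by `ppm` parts per million (negative = slower)."""
    return real_seconds * (1 + ppm / 1e6)

slowest = clock_seconds(SECONDS_PER_DAY, -MAX_STEER_PPM)
print(f"max slowdown per day at {MAX_STEER_PPM} ppm: {SECONDS_PER_DAY - slowest:.1f} s")
print(f"clock advance per day at -1e6 ppm: {clock_seconds(SECONDS_PER_DAY, -1e6)} s")
```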

If it happens again, interesting data might be found in the output of:

  sysctl kern.timecounter
  sysctl kern.eventtimer
  vmstat -i
  ntpdc -c kerninfo
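Since the cases described above came down to the timer not generating interrupts, one way to use `vmstat -i` is to take two snapshots a few seconds apart, by hand (a script that sleeps between them may itself hang if time has stopped), and look for counters that did not move. The Python sketch below assumes the usual `interrupt total rate` layout with the last two fields numeric; the helper names are hypothetical and the sample snapshots are made up for illustration, not output from the affected VMs. Idle devices can legitimately stay flat too; what matters is whether the timer entries advance.

```python
from typing import Dict, List

def parse_vmstat_i(text: str) -> Dict[str, int]:
    """Map each interrupt source to its total count from `vmstat -i` output.

    Assumes a header line followed by rows whose last two fields are the
    total and the rate; the source name (e.g. "cpu0: timer") may contain
    spaces. The "Total" summary row is skipped."""
    counts: Dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or not (fields[-1].isdigit() and fields[-2].isdigit()):
            continue  # header or blank line
        name = " ".join(fields[:-2])
        if name != "Total":
            counts[name] = int(fields[-2])
    return counts

def stalled(before: str, after: str) -> List[str]:
    """Sources whose total did not increase between two snapshots."""
    old, new = parse_vmstat_i(before), parse_vmstat_i(after)
    return sorted(name for name in old if new.get(name, 0) <= old[name])

# Made-up snapshots for illustration only.
before = """interrupt                          total       rate
cpu0: timer                      2014213       1999
irq256: em0                         5021          4
Total                            2019234       2003
"""
after = """interrupt                          total       rate
cpu0: timer                      2014213       1990
irq256: em0                         5107          4
Total                            2019320       1994
"""
print(stalled(before, after))
```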
  

-- Ian

