Re: PAM modules; pthreads not reliably dispatching background threads :(.

2020-02-16 Thread Dr Josef Karthauser
> On 16 Feb 2020, at 19:14, Jan Bramkamp  wrote:
> 
> On 09.02.20 12:25, Dr Josef Karthauser wrote:
>> Hi Folks,
>> 
>> Has anyone got any experience with PAM and pthreads?
>> 
> Is the "host" process multithreaded or at least built with Pthread support?

The pam library (libyubikey) links to libcurl, which uses pthreads. I’ve added 
debug statements and compiled, and it’s definitely not running code in the 
pthread that was dispatched.
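
Jan's question about the host process can be checked from the shell by looking at what the host binary links against. A minimal sketch, assuming the host is sshd at /usr/sbin/sshd (the thread does not say which service loads the module); has_thread_lib and HOST_BIN are names made up for this illustration:

```shell
#!/bin/sh
# has_thread_lib: read `ldd` output on stdin and succeed if it lists a
# threading library (libthr on FreeBSD, libpthread on some other systems).
# Hypothetical helper, not part of any FreeBSD tool.
has_thread_lib() {
    grep -Eq 'lib(thr|pthread)\.so'
}

# Assumed host binary; substitute whichever program loads the PAM module.
host=${HOST_BIN:-/usr/sbin/sshd}

if ldd "$host" 2>/dev/null | has_thread_lib; then
    echo "$host: linked against a threading library"
else
    echo "$host: no threading library found in ldd output"
fi
```

A missing threading library would not by itself prove this is the cause; it only narrows down where to look.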

Cheers,
Joe


___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


PAM modules; pthreads not reliably dispatching background threads :(.

2020-02-09 Thread Dr Josef Karthauser
Hi Folks,

Has anyone got any experience with PAM and pthreads?

We’re using a 2FA module (pam_yubico, 
https://www.freebsd.org/cgi/man.cgi?query=pam_yubico=8).

It’s proving unreliable. Digging deeper, it uses libcurl to communicate with an 
HTTP endpoint, and libcurl uses pthreads to make asynchronous DNS lookup 
requests.

It seems that the pthreads are not being handled reliably within the PAM 
runtime context - the background threads do not complete or dispatch. It’s very 
strange.

The problem doesn’t happen if we run the same code from an ordinary userland 
process. The pthreads are reliable in that context.

Is there a known issue with PAM and pthreads? Hints are that there is 
(https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=214540), but that problem 
was reported 3 years ago!

HELP!

If you know something I’d really appreciate a steer!

Many thanks,
Joe
— 
Dr Josef Karthauser
Chief Technical Officer 
(01225) 300371 / (07703) 596893 
www.truespeed.com <http://www.truespeed.com/>


Re: 10.4 release - is the binary update corrupt?

2017-10-08 Thread Dr Josef Karthauser
> On 8 Oct 2017, at 12:04, Matthew Seaman <matt...@freebsd.org> wrote:
> 
>> On Sunday, 8 October 2017, Dr Josef Karthauser wrote:
>>> Hi,
>>> 
>>> I’m having trouble upgrading a 10.3 machine to 10.4: looks like something 
>>> is corrupt:
>>> 
>>> Fetching metadata signature for 10.4-RELEASE from update4.freebsd.org... 
>>> done.
>>> Fetching metadata index... done.
>>> Fetching 1 metadata patches. done.
>>> Applying metadata patches... done.
>>> Fetching 1 metadata files... done.
>>> Inspecting system... done.
>>> Fetching files from 10.3-RELEASE for merging... done.
>>> Preparing to download files... done.
>>> Fetching 38573 
>>> patches.102030405060708090100110120130140…
>>> [cut]
>>> Applying patches... done.
>>> Fetching 9266 files...
>>> gunzip: (stdin): unexpected end of file
>>> efb4027db1ae440353955aa1bcfc9c69d1cafbdb53b4bfc6584d64b1e1bfd209 has 
>>> incorrect hash.
>>> 
>>> Has anyone else also seen this?
>>> 
> 
> With luck you've only got a problem with that one patch file, but if the
> corruption is much wider, then moving aside your existing
> /var/db/freebsd-update and starting again from scratch is probably a
> good idea.
> 
> If you consistently get broken patch files from whichever of the update
> servers you get directed to, that probably means that update server
> needs some TLC.  Please do report that to clusteradm@... While waiting
> for them to sort out the problems, you can play with the 'ServerName'
> parameter in /etc/freebsd-update.conf to point yourself towards some
> other server.

I’ve upgraded from 10.1 to 10.2 and then to 10.3 today, and only had this
problem moving to 10.4. I can’t see how a file could have become corrupt in
transit - TCP protects against that kind of thing. :)

I’ll try moving the damaged file aside and see if that fixes anything and
report back.
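
The two error lines in the log can be reproduced by hand on a single file. The sketch below assumes, as the log suggests, that each fetched file is a gzip stream stored as <sha256>.gz, where the name is the SHA-256 of the decompressed contents; check_update_file and sha256_of_stdin are hypothetical helper names, not part of freebsd-update:

```shell
#!/bin/sh
# sha256_of_stdin: print the SHA-256 of stdin as bare hex.
sha256_of_stdin() {
    if command -v sha256 >/dev/null 2>&1; then
        sha256 -q                      # FreeBSD
    else
        sha256sum | cut -d' ' -f1      # systems that ship sha256sum instead
    fi
}

# check_update_file: decompress FILE and compare the hash of the result
# with the hash embedded in the file name (assumed layout: <sha256>.gz).
check_update_file() {
    f=$1
    want=$(basename "$f" .gz)
    got=$(gunzip -c < "$f" 2>/dev/null | sha256_of_stdin)
    if [ "$got" = "$want" ]; then
        echo "$want ok"
    else
        echo "$want has incorrect hash"
        return 1
    fi
}
```

A truncated download fails this check because gunzip stops early, so the hash is computed over incomplete data. To scan the working directory, something like `for f in /var/db/freebsd-update/files/*.gz; do check_update_file "$f"; done` should do, assuming the fetched files live in that files/ subdirectory.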

Cheers,
Joe

10.4 release - is the binary update corrupt?

2017-10-08 Thread Dr Josef Karthauser
Hi,

I’m having trouble upgrading a 10.3 machine to 10.4: looks like something is 
corrupt:

Fetching metadata signature for 10.4-RELEASE from update4.freebsd.org... done.
Fetching metadata index... done.
Fetching 1 metadata patches. done.
Applying metadata patches... done.
Fetching 1 metadata files... done.
Inspecting system... done.
Fetching files from 10.3-RELEASE for merging... done.
Preparing to download files... done.
Fetching 38573 
patches.102030405060708090100110120130140…
[cut]
Applying patches... done.
Fetching 9266 files...
gunzip: (stdin): unexpected end of file
efb4027db1ae440353955aa1bcfc9c69d1cafbdb53b4bfc6584d64b1e1bfd209 has incorrect 
hash.

Has anyone else also seen this?

Cheers,
Joe

IPFW with NAT (breakage with vlanhwtag enabled) Re: IPFW with NAT : Problems with duplicate packets on FreeBSD 10.3-RC3

2016-04-09 Thread Dr Josef Karthauser

> On 8 Apr 2016, at 10:03, Dr Josef Karthauser <j...@truespeed.com> wrote:
> 
>> On 8 Apr 2016, at 06:51, Ian Smith <smi...@nimnet.asn.au> wrote:
>> 
>> On Thu, 7 Apr 2016 17:08:38 +0100, Dr Josef Karthauser wrote:
>> 
>>> Looks like the first packet is being retransmitted, which means that 
>>> the nat is probably misconfigured and the TCP connection is broken in
>>> some strange way.
>> 
>>> Does anyone have a clue as to where to look? The ipfw rules are
>>> simple enough - what have I missed?
>> 
>> Do you have TSO enabled on that NIC?  If so, see ipfw(8) BUGS, third 
>> last para.  If not, no idea ..

So, disabling TSO did partially fix the problem; at least the “duplicate data” 
issue.

However, I’ve now added an https service in the jails (an haproxy), and that 
fails a TLS handshake from some hosts.

Bizarrely that problem goes away when I disable hw vlan tag processing 
(-vlanhwtag); that seems weird, and perhaps another bug.

The configuration of my machine is as follows:

  vlan10 (on igb0) [public address] <-- [ipfw nat] --> igb1 [private address 
in a jail on the host, also bound to a physical network]

Is there any obvious reason why hardware vlan tagging should get in the way of 
a NAT session? I can’t think why that would be, but disabling it definitely 
fixes the problem.
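
For reference, both workarounds in this thread are per-interface ifconfig flags. A minimal sketch, assuming igb0 is the interface carrying vlan10 as in the layout above; disable_offload is a hypothetical wrapper, and it only prints the commands unless APPLY=1 is set:

```shell
#!/bin/sh
# disable_offload: turn off TSO and hardware VLAN tagging on each named
# interface. Dry run by default; run as root with APPLY=1 to apply.
disable_offload() {
    for nic in "$@"; do
        for flag in -tso -vlanhwtag; do
            if [ "${APPLY:-0}" = 1 ]; then
                ifconfig "$nic" "$flag"
            else
                echo "ifconfig $nic $flag"
            fi
        done
    done
}

disable_offload igb0
```

Flags set this way do not survive a reboot; to keep them, the same flags would normally be added to the interface's ifconfig_igb0 line in /etc/rc.conf.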

Joe


Re: IPFW with NAT : Problems with duplicate packets on FreeBSD 10.3-RC3

2016-04-08 Thread Dr Josef Karthauser
> On 8 Apr 2016, at 06:51, Ian Smith <smi...@nimnet.asn.au> wrote:
> 
> On Thu, 7 Apr 2016 17:08:38 +0100, Dr Josef Karthauser wrote:
> 
> [ AppleMail msgs fail to quote properly in pine, so a partial quote: ]
> 
>> Looks like the first packet is being retransmitted, which means that 
>> the nat is probably misconfigured and the TCP connection is broken in
>> some strange way.
> 
>> Does anyone have a clue as to where to look? The ipfw rules are
>> simple enough - what have I missed?
> 
> Do you have TSO enabled on that NIC?  If so, see ipfw(8) BUGS, third 
> last para.  If not, no idea ..
> 

Thanks Ian,

It was exactly that issue! I wish I had remembered that I’d seen that in the 
man page; would have saved hours of debugging :)

Joe


Re: IPFW with NAT : Problems with duplicate packets on FreeBSD 10.3-RC3

2016-04-07 Thread Dr Josef Karthauser

> On 8 Apr 2016, at 00:11, Dr Josef Karthauser <j...@truespeed.com> wrote:
> 
>> On 7 Apr 2016, at 17:08, Dr Josef Karthauser <j...@truespeed.com> wrote:
>> 
>> Looks like the first packet is being retransmitted, which means that the nat 
>> is probably misconfigured and the TCP connection is broken in some strange 
>> way.
>> 
>> Does anyone have a clue as to where to look? The ipfw rules are simple 
>> enough - what have I missed?
> 
> Ok, the packet definitely isn’t being retransmitted. I’ve done a tcpdump/pcap 
> capture and taken a look and I get a packet that I’ve included below.
> 
> It’s got an 'HTTP/1.1 200 OK' inserted mid-flow, right in the middle of an HTTP 
> response. Looking at this I’d be inclined to think it’s a bug in the 
> webserver/tomcat; however, what’s strange is that if I 'curl' the jailed web 
> server directly from the host machine on the private IP address (bypassing 
> the NAT), the HTTP response received is perfectly fine. It’s only when I do 
> an HTTP request to the public IP address and go through the NAT that I 
> experience the problem.
> 
> How could this happen? Is it a buggy packet reassembly in the kernel perhaps?
> 

Adding "ipfw add reass all from any to any" to the beginning of the ipfw rule 
set doesn’t make any difference to the behaviour. 

Joe


Re: IPFW with NAT : Problems with duplicate packets on FreeBSD 10.3-RC3

2016-04-07 Thread Dr Josef Karthauser

> On 7 Apr 2016, at 17:08, Dr Josef Karthauser <j...@truespeed.com> wrote:
> 
> Looks like the first packet is being retransmitted, which means that the nat 
> is probably misconfigured and the TCP connection is broken in some strange 
> way.
> 
> Does anyone have a clue as to where to look? The ipfw rules are simple enough 
> - what have I missed?

Ok, the packet definitely isn’t being retransmitted. I’ve done a tcpdump/pcap 
capture and taken a look and I get a packet that I’ve included below.

It’s got an 'HTTP/1.1 200 OK' inserted mid-flow, right in the middle of an HTTP 
response. Looking at this I’d be inclined to think it’s a bug in the 
webserver/tomcat; however, what’s strange is that if I 'curl' the jailed web 
server directly from the host machine on the private IP address (bypassing the 
NAT), the HTTP response received is perfectly fine. It’s only when I do an 
HTTP request to the public IP address and go through the NAT that I experience 
the problem.

How could this happen? Is it a buggy packet reassembly in the kernel perhaps?

Joe

P.S. Here’s the strange packet with an HTTP response injected in the middle of 
an HTML stream:


23:01:07.204016 IP (tos 0x0, ttl 64, id 4190, offset 0, flags [DF], proto TCP 
(6), length 1500)
31.210.26.216.8080 > infiniverse.karthauser.co.uk.62475: Flags [.], cksum 
0xda1c (incorrect -> 0x7ff7), seq 8689:10137, ack 86, win 1040, options 
[nop,nop,TS val 124159447 ecr 1737359970], length 1448
.g.).
.f..g..b   Other Documentation

http://tomcat.apache.org/connectors-doc/;>Tomcat Connectors
http://tomcat.apache.org/connectors-doc/;>mod_jk Documentation
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Content-Type: text/html;charset=ISO-8859-1
Transfer-Encoding: chunked
Date: Thu, 07 Apr 2016 23:01:05 GMT

2000






Apache Tomcat/7.0.68








http://tomcat.apache.org/;>Home
Documentation
Configuration
Examples
http://wiki.apache.org/tomcat/FrontPage;>Wiki
http://tomcat.apache.org/lists.html;>Mailing Lists

IPFW with NAT : Problems with duplicate packets on FreeBSD 10.3-RC3

2016-04-07 Thread Dr Josef Karthauser
rontPage;>Wiki
[CUT]


Other Documentation

http://tomcat.apache.org/connectors-doc/;>Tomcat Connectors
http://tomcat.apache.org/connectors-doc/;>mod_jk Documentation
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Content-Type: text/html;charset=ISO-8859-1
Transfer-Encoding: chunked
Date: Thu, 07 Apr 2016 16:02:02 GMT

2000






Apache Tomcat/7.0.68








[CUT]




Server Status
* Malformed encoding found in chunked-encoding
* Closing connection 0
curl: (56) Malformed encoding found in chunked-encoding
 phoenix:~ joe$ 


Looks like the first packet is being retransmitted, which means that the nat is 
probably misconfigured and the TCP connection is broken in some strange way.

Does anyone have a clue as to where to look? The ipfw rules are simple enough - 
what have I missed?
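
curl's error is consistent with the output above: in chunked transfer encoding every chunk starts with a line holding its size in hexadecimal, so a status line arriving where a size is expected cannot be parsed. An illustrative check follows; it is not curl's actual parser, and is_chunk_size is a hypothetical helper:

```shell
#!/bin/sh
# is_chunk_size: succeed if the argument looks like a chunk-size line,
# i.e. hex digits optionally followed by ";extension".
# On success the bare hex digits are left in $size.
is_chunk_size() {
    # Drop any chunk extension and a trailing carriage return.
    size=$(printf '%s' "${1%%;*}" | tr -d '\r')
    case $size in
        ''|*[!0-9A-Fa-f]*) return 1 ;;
        *) return 0 ;;
    esac
}

for line in "2000" "HTTP/1.1 200 OK"; do
    if is_chunk_size "$line"; then
        echo "'$line': chunk size of $((0x$size)) bytes"
    else
        echo "'$line': not a chunk size"
    fi
done
```

The "2000" seen after each set of headers is such a size line (hex 2000 is 8192 bytes), whereas the second "HTTP/1.1 200 OK" lands in the body where the next size line should be.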

Thanks,
Joe

p.s.

I also have one_pass disabled:

# sysctl net.inet.ip.fw.one_pass
net.inet.ip.fw.one_pass: 0
 

Drive failures with ada on FreeBSD-9.1, driver bug or wiring issue?

2013-07-18 Thread Dr Josef Karthauser
Hi there,

I'm scratching my head. I've just migrated to a Supermicro chassis and at the 
same time gone from FreeBSD 9.0 to 9.1-RELEASE.

The machine in question is running a ZFS mirror configuration on two ada 
devices (with an 8 GB gmirror carved out for swap).

Since doing so I've been having strange dropouts on the drives; they just 
disappear from the bus like so:

(ada2:ahcich2:0:0:0): removing device entry
(aprobe0:ahcich2:0:0:0): NOP. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
(aprobe0:ahcich2:0:0:0): CAM status: ATA Status Error
(aprobe0:ahcich2:0:0:0): ATA status: d1 (BSY DRDY SERV ERR), error: 04 (ABRT )
(aprobe0:ahcich2:0:0:0): RES: d1 04 ff ff ff ff ff ff ff ff ff
(aprobe0:ahcich2:0:0:0): Error 5, Retries exhausted
(aprobe0:ahcich2:0:0:0): NOP. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
(aprobe0:ahcich2:0:0:0): CAM status: ATA Status Error
(aprobe0:ahcich2:0:0:0): ATA status: d1 (BSY DRDY SERV ERR), error: 04 (ABRT )
(aprobe0:ahcich2:0:0:0): RES: d1 04 ff ff ff ff ff ff ff ff ff
(aprobe0:ahcich2:0:0:0): Error 5, Retries exhausted


At first I thought it was a failing drive - one of the drives did this, and I 
limped on a single drive for a week until I could get someone up to the rack to 
plug a third drive in.  We resilvered the zpool onto the new device and ran 
with the failed drive still plugged in (but not responding to a reset on the 
ada bus with camcontrol) for a week or so.

Then, the new drive dropped out in exactly the same way, followed in short 
order by the remaining original drive!!!

After rebooting the machine, and observing all three drives probing and 
available, I resilvered the gmirror and zpool again onto the two devices 
that I thought were reliable, but before the resilvering had completed the new 
drive dropped out again.

I'm scratching my head now. I can't imagine that it's a wiring problem, as they 
are all on individual SATA buses and individually cabled.

SMART isn't reporting any drive issues either... :/

So, I'm wondering: is it a driver issue with 9.1-RELEASE? If I upgrade to 
9-RELENG, would I expect that to resolve the problem?  (Have there been any 
ada bus issues reported since last December?)

The hardware in question is:

ahci0: Intel Cougar Point AHCI SATA controller port 
0xf050-0xf057,0xf040-0xf043,0xf030-0xf037,0xf020-0xf023,0xf000-0xf01f mem 
0xdfb02000-0xdfb027ff irq 19 at device 31.2 on pci0
ahci0: AHCI v1.30 with 6 3Gbps ports, Port Multiplier not supported
ahcich0: AHCI channel at channel 0 on ahci0
ahcich1: AHCI channel at channel 1 on ahci0
ahcich2: AHCI channel at channel 2 on ahci0
ahcich3: AHCI channel at channel 3 on ahci0
ahcich4: AHCI channel at channel 4 on ahci0
ahcich5: AHCI channel at channel 5 on ahci0
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: WDC WD1000FYPS-01ZKB0 02.01B01 ATA-8 SATA 2.x device
ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
ada0: Previously was known as ad4
ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
ada1: WDC WD1000FYPS-01ZKB0 02.01B01 ATA-8 SATA 2.x device
ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
ada1: Previously was known as ad6
ada2 at ahcich2 bus 0 scbus2 target 0 lun 0
ada2: WDC WD1000FYPS-01ZKB0 02.01B01 ATA-8 SATA 2.x device
ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada2: Command Queueing enabled
ada2: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
ada2: Previously was known as ad8


Any ideas would be greatly welcomed.

Thanks,
Joe



Re: Drive failures with ada on FreeBSD-9.1, driver bug or wiring issue?

2013-07-18 Thread Dr Josef Karthauser

On 18 Jul 2013, at 13:07, Andrea Venturoli m...@netfence.it wrote:

> Perhaps they are WD Green drives?

They're WD RE2-GP 1 TB drives (model WD1000FYPS), not sure if that's green or 
not.

> In that case, other than quoting Bob's suggestion about avoiding them, 
> there's something you can do:
> a) turn off the drives' power-saving features (this is done through a DOS 
> utility you can download);
> b) try different controllers and/or different OS releases.

I'm committed to FreeBSD, as the machine is already rolled out and in a data 
centre ;).

> You'll find a lot on this problem if you search the web.
> There's also a report of mine you can search on this ML, regarding FreeBSD 
> specifically.

I'll see if I can find it. Thanks.

Joe



Re: Drive failures with ada on FreeBSD-9.1, driver bug or wiring issue?

2013-07-18 Thread Dr Josef Karthauser
On 18 Jul 2013, at 08:33, Steven Hartland kill...@multiplay.co.uk wrote:

> What chassis is this?

Hey Steven,

It's a Supermicro CSE-813MTQ-350CB.

Cheers,
Joe



Re: Drive failures with ada on FreeBSD-9.1, driver bug or wiring issue?

2013-07-18 Thread Dr Josef Karthauser

On 18 Jul 2013, at 20:31, Charles Swiger cswi...@mac.com wrote:

> Hi--
> 
> On Jul 18, 2013, at 12:13 PM, Dr Josef Karthauser j...@karthauser.co.uk 
> wrote:
>> On 18 Jul 2013, at 13:07, Andrea Venturoli m...@netfence.it wrote:
>> 
>>> Perhaps they are WD Green drives?
>> 
>> They're WD RE2-GP 1 TB drives (model WD1000FYPS), not sure if that's green 
>> or not.
> 
> Yes, those are WDC's Green drives, although they are also the higher grade 
> version as compared to standard desktop drives which are supposed to have 
> firmware which plays nice with RAID (TLER, time-limited error recovery).
> 
> Updating the firmware and increasing the timeout before these spin down 
> automagically is likely to help, but as Andrea noted, such drives do have 
> quite a history of timeout problems due to excessive head parking and their 
> power conservation attempts.

We also wondered whether it was the motherboard, and so we've replaced it! Hope 
that that works!

But, from what's being said here, it looks like that might not be the case. :/ 
Although, we've been up for 5 days now with no recurrences of the previous 
issue.

Joe


Re: Drive failures with ada on FreeBSD-9.1, driver bug or wiring issue?

2013-07-18 Thread Dr Josef Karthauser
On 18 Jul 2013, at 20:31, Charles Swiger cswi...@mac.com wrote:
> On Jul 18, 2013, at 12:13 PM, Dr Josef Karthauser j...@karthauser.co.uk 
> wrote:
>> On 18 Jul 2013, at 13:07, Andrea Venturoli m...@netfence.it wrote:
>> 
>>> Perhaps they are WD Green drives?
>> 
>> They're WD RE2-GP 1 TB drives (model WD1000FYPS), not sure if that's green 
>> or not.
> 
> Yes, those are WDC's Green drives, although they are also the higher grade 
> version as compared to standard desktop drives which are supposed to have 
> firmware which plays nice with RAID (TLER, time-limited error recovery).
> 
> Updating the firmware and increasing the timeout before these spin down 
> automagically is likely to help, but as Andrea noted, such drives do have 
> quite a history of timeout problems due to excessive head parking and their 
> power conservation attempts.

They're currently on firmware 02.01B01, btw. Not sure if that's the latest or 
not.

Joe



Help! :( ZFS panic on boot, importing pool after server crash.

2013-06-14 Thread Dr Josef Karthauser
Hi, I'm a bit at the end of my tether.

We had a ZFS panic last night on a machine that hosts all my mail and web; it 
was rebooted and it now panics mounting the ZFS root filesystem.

The call stack info is:

solaris assert: ss == NULL, file: 
/usr/src/sys/modules/zfs/../../cddl/contrib/opensource/uts/common/fs/zfs/space_map.c,
 line: 109

kdb_backtrace
panic
space_map_add
space_map_load
metaslab_activate
metaslab_allocate
zio_dva_allocate
zio_execute
taskqueue_run_locked
taskqueue_thread_loop
fork_exit
fork_trampoline

I can boot from the live DVD filesystem, but I can only mount the pool 
read-only without getting the same kernel panic.  This is with FreeBSD 9.0.

The machine is remote, and I don't have access other than through a DRAC 
console port (so I can't cut and paste; sorry for the poor stack trace).

Is anyone here in a position to advise me how I might proceed to get this 
machine mounting and running again in multi-user mode?

Thanks so much.
Joe

p.s. the config, btw, is a ZFS mirror on two ad devices. It's got a ZFS root 
file system.


Re: Help! :( ZFS panic on boot, importing pool after server crash.

2013-06-14 Thread Dr Josef Karthauser
On 14 Jun 2013, at 12:00, Volodymyr Kostyrko c.kw...@gmail.com wrote:

> 14.06.2013 12:55, Dr Josef Karthauser:
>> Hi, I'm a bit at the end of my tether.
>> 
>> p.s. the config, btw, is a ZFS mirror on two ad devices. It's got a ZFS root 
>> file system.
> 
> If you are fairly sure about your devices you can:
> 
> 1. Remove second disk from pool or create another pool on top of it.
> 
> 2. Recreate all FS structure on the second disk. You can dump all your FS with 
> something like:
> 

Great. Thanks for that.

Have you got a hint as to how I can get access to the root file system? It's 
currently set to have a legacy mount point.  Which means that when I import the 
pool:

# zpool import -o readonly=on -o altroot=/tmp/zfs -f poolname

the root filesystem is missing.  Then if I try and set the mount point:

# zfs set mountpoint=/tmp/zfs2 poolname

it just sits there; probably because the command is blocking on the R/O pool, 
or something.

How do I temporarily remount the root filesystem so that I can get access to 
the files?
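
A dataset whose mountpoint property is "legacy" is mounted with mount(8) rather than by changing the property, which avoids writing to the read-only pool. A sketch under that assumption, reusing the placeholder names from above; mount_legacy_root is a hypothetical helper, and it only prints the commands unless APPLY=1 is set:

```shell
#!/bin/sh
# mount_legacy_root: mount a legacy-mountpoint ZFS dataset read-only on DIR.
# Dry run by default; run as root with APPLY=1 to execute the commands.
# Assumes POOL and DIR contain no whitespace.
mount_legacy_root() {
    pool=$1
    dir=$2
    for cmd in "mkdir -p $dir" "mount -t zfs -o ro $pool $dir"; do
        if [ "${APPLY:-0}" = 1 ]; then
            $cmd
        else
            echo "$cmd"
        fi
    done
}

mount_legacy_root poolname /tmp/zfs2
```

A separate directory (/tmp/zfs2 here) is used so that the root dataset does not cover anything already mounted under the altroot.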

Thanks,
Joe



Re: Checksum errors across ZFS array

2012-07-20 Thread Dr Josef Karthauser
On 19 Jul 2012, at 18:15, James Snow wrote:

> On Thu, Jul 19, 2012 at 06:05:32PM +0100, Dr Joe Karthauser wrote:
>> 
>> Hi James,
>> 
>> It's almost definitely a memory problem. I'd change it ASAP if I were
>> you.
>> 
>> I lost about 70mb from my zfs pool for this very reason just a few
>> weeks ago. Luckily I had enough snapshots from before the rot set in
>> to recover most of what I lost.
> 
> Thanks for the input. I will run a memory test against it.
> 
> If I may, why almost definitely a memory problem and not an issue with
> the controller? (Or did you mean the controller memory?)

Hey Snow,

Ok, it's not definitely. Of course, it could be anything. But, memory is where 
I'd look first.

Take care though: my system had been working fine for about a year when I 
noticed the ZFS rot (all of which appears to be recent). I ran memcheck+ 
on it for 8 hours or so, and it showed no errors at all. However, when I 
replaced the memory with modules from a different vendor the problems went 
away. (Reboots and power off/on restarts hadn't fixed the problem before!)

So take care: even if the memory test doesn't report any failures, the memory 
might still be faulty.

Joe

p.s. It was my fault that I wasn't running ECC memory on the system! :/.
 


RE: Status of ZFS in -stable?

2008-06-01 Thread Dr Josef Karthauser
 -Original Message-
 From: [EMAIL PROTECTED] [mailto:owner-freebsd-
 [EMAIL PROTECTED] On Behalf Of Zaphod Beeblebrox
 Sent: 21 May 2008 01:48
 To: Freddie Cash
 Cc: freebsd-stable@freebsd.org
 Subject: Re: Status of ZFS in -stable?
 
> Correct.  If I don't use verify ... the backup proceeds normally, but it's
> corrupt.  If I turn on verify, the backup stops when it detects the
> corruption (somewhere about halfway through).  This is with XP using a
> Samba-shared ZFS filesystem.  When XP uses a samba shared UFS filesystem,
> all is good.  Additionally, I tried telling samba _not_ to use mmap()
> (there's an option), but this didn't fix things.

My guess is that it is still using mmap() somehow; there are definitely
corruption problems with zfs and mmap().

Joe



RE: cvsup.uk.FreeBSD.org

2008-05-15 Thread Dr Josef Karthauser
 -Original Message-
 From: [EMAIL PROTECTED] [mailto:owner-freebsd-
 [EMAIL PROTECTED] On Behalf Of Tony Finch
 Sent: 12 May 2008 17:06
 To: Dr Joe Karthauser
 Cc: [EMAIL PROTECTED]; freebsd-stable@freebsd.org
 Subject: Re: cvsup.uk.FreeBSD.org
 
> On Sun, 11 May 2008, Dr Joe Karthauser wrote:
> 
>> I have reclassified this faulty mirror as cvsup1 and made cvsup a cname to
>> cvsup3, which is the most recent addition and best hardware available. In
>> the future we will always point to the most available machine in this way.
> 
> Looks like I'm getting a bit more traffic than before - peaking at over
> 100 logins an hour.

As a matter of interest, do you know what the peak bandwidth usage is?

Joe 

