Re: poweroff (shutdown -p) is broken

2013-04-03 Thread Jaakko Heinonen
On 2013-04-03, Andriy Gapon wrote:
 on 03/04/2013 02:15 deeptech71 said the following:
  As of r248872, my system, when ordered to power off, stalls at the Uptime:
  [...] message. Before that revision, the Uptime message would be followed by
  several additional messages -- something related to usb controllers -- before
  powering off.
 
 You need to break into the ddb and examine where exactly the shutdown thread is
 stuck.

I can confirm the problem. It hangs while trying to spin down an ada(4)
disk.

Tracing pid 1 tid 12 td 0xc72bbc00
sched_switch(c72bbc00,0,104,1b5,c6946188,...) at sched_switch+0x456/frame 0xc6ec4a98
mi_switch(104,0,c0f36d00,1f5,5c,...) at mi_switch+0x20b/frame 0xc6ec4ac8
sleepq_switch(c72bbc00,0,c0f36d00,26b,0,...) at sleepq_switch+0x1a5/frame 0xc6ec4af0
sleepq_wait(c72b573c,5c,c0d92504,0,0,...) at sleepq_wait+0x6b/frame 0xc6ec4b0c
_sleep(c72b573c,c732c974,5c,c0d92504,0,...) at _sleep+0x3f0/frame 0xc6ec4b64
cam_periph_getccb(c72b5700,480,c0d94762,c0,ea,...) at cam_periph_getccb+0xd7/frame 0xc6ec4b9c
adaspindown(c6ec4c2c,c0a130b6,0,4008,c0f306e7,...) at adaspindown+0xf1/frame 0xc6ec4bcc
adashutdown(0,4008,c0f306e7,1bc,c09e261a,...) at adashutdown+0x29/frame 0xc6ec4bd4
kern_reboot(4008,0,c0f306e7,be,bfbfd9a0,...) at kern_reboot+0x706/frame 0xc6ec4c2c
sys_reboot(c72bbc00,c6ec4ccc,c72bbcb4,c12164a0,202,...) at sys_reboot+0x6c/frame 0xc6ec4c4c
syscall(c6ec4d08) at syscall+0x2ab/frame 0xc6ec4cfc
Xint0x80_syscall() at Xint0x80_syscall+0x21/frame 0xc6ec4cfc
--- syscall (55, FreeBSD ELF32, sys_reboot), eip = 0x805c0a7, esp = 0xbfbfd86c, ebp = 0xbfbfd948 ---

-- 
Jaakko
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Any objections/comments on axing out old ATA stack?

2013-04-03 Thread Alexander Motin

On 02.04.2013 21:39, Matthias Andree wrote:

On 31.03.2013 23:02, Scott Long wrote:


So what I hear you and Matthias saying, I believe, is that it should be easier
to force disks to fall back to non-NCQ mode, and/or have a more responsive
black-list for problematic controllers.  Would this help the situation?  It's
hard to justify holding back overall forward progress because of some bad
controllers; we do several Tbps off of AHCI controllers with NCQ enabled on
FreeBSD 9.x, enough to make up a sizable percentage of the internet's traffic,
and we see no problems.  How can we move forward but also take care of you guys
with problematic hardware?


Well, I am running the driver fine off of my WD Caviar RE3 disk, and the
problematic drive also works just fine with Windows and Linux, so it
must be something between the problematic drive and the FreeBSD driver.

I would like to see any of this, in decreasing order of precedence:

- debugged driver

- assistance/instructions on how to debug the driver/trace NCQ
stuff/...  (as in Jeremy Chadwick's followup in this same thread - this
helps, I will attempt to procure the required information; back then,
reducing the number of tags to 31 was ineffective: it produced an error
message, and reading the setting back still returned 32)


Unfortunately, I don't know how to debug that. Command timeouts reported
on the lists before are the kind of errors that are most difficult to
diagnose, since the controller gives no information to work with. We just
see that sent commands are no longer completing. Maybe it is some
incompatibility between specific drive and HBA firmwares, triggered by some
innocent specifics of our ATA stack, GEOM or filesystem implementation.
All I can propose is to try to identify such cases and add some quirks
to work around them, like disabling NCQ or limiting the number of tags. I am
not sure what else we can do about it without a controlled lab
environment with the affected hardware and a SATA analyzer.



- user-space contingency features, such as letting camcontrol limit
the number of open NCQ tags, or disable NCQ, either on a per-drive basis


I merged support for that into 8/9-STABLE about 9 months ago:
`camcontrol tags ada0 -v -N X` should change the number of simultaneously
used tags,
`camcontrol negotiate ada0 -T (en|dis)able` should enable/disable the use of
NCQ.
I just did some tests on HEAD and these commands seem to work. If
you can reproduce the problem, it would be nice to collect information
on how these changes affect it.
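
For anyone scripting such a test, here is a minimal sketch that only assembles the two command lines quoted above. The helper name ncq_tuning_commands, its parameters and the 1..32 tag range check are assumptions made for illustration; nothing is executed, and the camcontrol arguments are exactly those given in this message.

```python
import shlex

def ncq_tuning_commands(device, tags=None, ncq=None):
    """Build (but do not run) camcontrol command lines for one device.

    Hypothetical helper: only the camcontrol arguments come from the
    message above; the function itself is an illustration.
    """
    commands = []
    if tags is not None:
        # NCQ allows at most 32 outstanding commands per device.
        if not 1 <= tags <= 32:
            raise ValueError("tag count must be between 1 and 32")
        commands.append(["camcontrol", "tags", device, "-v", "-N", str(tags)])
    if ncq is not None:
        state = "enable" if ncq else "disable"
        commands.append(["camcontrol", "negotiate", device, "-T", state])
    return commands

# Print the commands one would try for ada0: 31 tags, NCQ switched off.
for argv in ncq_tuning_commands("ada0", tags=31, ncq=False):
    print(shlex.join(argv))
```

Keeping the commands as argument lists makes it easy to hand them to subprocess.run() later, once the exact settings to test are known.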



I am capable of debugging C - mostly with gdb command-line, and
graphical Windows IDEs - but am unfamiliar with FreeBSD kernel
debugging. If necessary, I can pull up a second console, but the PC that
is affected is legacy-free, so serial port only works through a
serial/USB converter.



--
Alexander Motin


Re: poweroff (shutdown -p) is broken

2013-04-03 Thread Alexander Motin

On 03.04.2013 02:15, deeptech71 wrote:

As of r248872, my system, when ordered to power off, stalls at the
Uptime: [...] message. Before that revision, the Uptime message
would be followed by several additional messages -- something related to
usb controllers -- before powering off.


Could you give any more information about your system and the problem? 
What disks and controllers do you have and which drivers do you use? 
Full verbose kernel messages from boot up to the hang (if you can set up 
serial console) could be interesting.


--
Alexander Motin


RE: rebooting nvidia + keyboard issues

2013-04-03 Thread Thomas Sparrevohn
I have had the same problem - it seems like a sysctl call provokes an
overrun in a strlen call. It is not reproducible with a GENERIC kernel with
debugging in my case. However a simple workaround is to compile
nvidia-driver with gcc.  

-----Original Message-----
From: owner-freebsd-curr...@freebsd.org
[mailto:owner-freebsd-curr...@freebsd.org] On Behalf Of Waitman Gobble
Sent: 31 March 2013 08:25
To: curr...@freebsd.org
Subject: rebooting nvidia + keyboard issues

Hi,

After updating my machine tonight with 
 uname -a
FreeBSD dx.burplex.com 10.0-CURRENT FreeBSD 10.0-CURRENT #0 r248937: Sat Mar
30 21:53:14 PDT 2013 r...@dx.burplex.com:/usr/obj/usr/src/sys/FURAHA
amd64

I noticed the machine rebooting randomly every 20 seconds to 5 minutes.
Disabling the nvidia driver seems to fix the problem, and I was able to
update after applying ports/177459 patch. The updated nvidia driver seems to
have solved the rebooting issue. (it could (also?) be related to linux.ko?)
If people are using the nvidia driver and are experiencing a constant reboot
issue, it might be good to pop in that patch ASAP.

The problem I am noticing now is keyboard related. Booting to single user
mode, I cannot type anything at the login prompt with an attached USB
keyboard. However in single user mode a PS2 keyboard will allow me to log in.
I would not say it's not working.. the keyboard functions fine until hitting
the login prompt. There are no errors and everything appears to be working
fine, however I cannot log in. Numlock key lights on and off when pressed.

Also, if I boot into multi user mode, I cannot type anything at the login
prompt when a PS2 keyboard is attached. But the USB keyboard will work in
multi user mode.

Thank you,

-- 
Waitman Gobble
San Jose California USA



Re: poweroff (shutdown -p) is broken

2013-04-03 Thread Alexander Motin

On 03.04.2013 12:32, Alexander Motin wrote:

On 03.04.2013 02:15, deeptech71 wrote:

As of r248872, my system, when ordered to power off, stalls at the
Uptime: [...] message. Before that revision, the Uptime message
would be followed by several additional messages -- something related to
usb controllers -- before powering off.


Could you give any more information about your system and the problem?
What disks and controllers do you have and which drivers do you use?
Full verbose kernel messages from boot up to the hang (if you can set up
serial console) could be interesting.


I was able to reproduce the problem on a legacy-mode SATA channel shared
by two disks. I think the recent commit just triggered an existing bug. I
will start further debugging immediately to fix it ASAP.


--
Alexander Motin


Re: poweroff (shutdown -p) is broken

2013-04-03 Thread Alexander Motin

On 03.04.2013 14:21, Alexander Motin wrote:

On 03.04.2013 12:32, Alexander Motin wrote:

On 03.04.2013 02:15, deeptech71 wrote:

As of r248872, my system, when ordered to power off, stalls at the
Uptime: [...] message. Before that revision, the Uptime message
would be followed by several additional messages -- something related to
usb controllers -- before powering off.


Could you give any more information about your system and the problem?
What disks and controllers do you have and which drivers do you use?
Full verbose kernel messages from boot up to the hang (if you can set up
serial console) could be interesting.


I was able to reproduce the problem on legacy mode SATA channel, shared
by two disks. I think recent commit just triggered some existing bug. I
will start further debugging immediately to fix it ASAP.


I'm sorry, it was my fault. Legacy channels turned out to be more sensitive to
the cause of the issue, while the more modern SATA and SAS controllers I
tested hid it. r249048 should fix the problem.


--
Alexander Motin


Re: panic at serial boot

2013-04-03 Thread Konstantin Belousov
On Tue, Apr 02, 2013 at 11:06:06PM +0200, m...@kernel32.de wrote:
 Hi again,
 
 On 2013-04-02 21:52, Konstantin Belousov wrote:
  On Tue, Apr 02, 2013 at 08:23:20PM +0200, m...@kernel32.de wrote:
 
  Try breaking into the debugger and see where it progresses. To do 
  this,
  you would need to boot with the 'boot -d' command from the loader 
  prompt,
  then do 'w kbd_break_to_debugger 1', then ctrl-alt-esc when you want 
  to
  activate the debugger. In the debugger, start with the 'ps' command.
 
 
 After the beastie menu I went to the loader prompt and did boot -d.
 I was sent to the debugger and did w kbd_break_to_debugger 1.
 
 This is the output I get.
 
 INT 13 08: Success, count = 1, BPT = :  
 GDB: no debug ports present
 KDB: debugger backends: ddbhift limit: 0x82
 KDB: current backend: ddbt15 = f000f859  int1e = f000ef6d
 KDB: enter: Boot flags requested debuggerint1e = f000ef6d
 [ thread pid 0 tid 0 ] booting...
 Stopped at  kdb_enter+0x3e: movq    $0,kdb_why
 db> w kbd_break_to_debugger 1
 Symbol not found
It is kdb_break_to_debugger.
Sorry.




Re: RE: rebooting nvidia + keyboard issues

2013-04-03 Thread Waitman Gobble
On Wed, 3 Apr 2013 10:35:47 +0100, Thomas Sparrevohn 
thomas.sparrev...@btinternet.com wrote: 

I have had the same problem - it seems like a sysctl call provokes an
overrun in a strlen call. It is not reproducible with a GENERIC kernel with
debugging in my case. However a simple workaround is to compile
nvidia-driver with gcc.  



Thank you for the information, I'll give it a try and see what happens. I 
noticed two other ports that don't even seem to build with clang, but generally 
everything else seems to be working.

I noticed a previous message from a day or two ago regarding gcc and nvidia, 
but I lost the message. I'm in the middle of moving my mail system to a 'more' 
horizontally based system, so it's kind of like building a boat around you 
while floating down a river.
 
Anyhow, I read an article this morning about AMD opening up drivers, that 
sounds cool, going to check it out.

--
Waitman Gobble
San Jose California USA
+1.5108307875




Re: Patch ath3kfw to accept other device ids; blacklist ath bluetooth devices

2013-04-03 Thread Adrian Chadd
Hi,

Here you go:

http://people.freebsd.org/~adrian/ath/20130401-ath-bluetooth.diff

The ath3k driver in linux does a fair bit more than ath3kfw:

* if it's a subset of chips that needs firmware, it squirts ath3k-1.fw onto it
* there's a subset of chips that get ROM/RAM patches; so if it's one
of those, squeeze the patch on;
* it also will configure the AR3012 NIC between config and normal
mode when it's loading in those patches.

So it's likely that for full support, we're going to have to port over
this functionality and throw those firmware / patch / config files
somewhere to be uploaded when the relevant device is attached.

Thanks,



Adrian


Re: RE: rebooting nvidia + keyboard issues

2013-04-03 Thread Adrian Chadd
Hi,

can you guys please ensure a PR is filed with all the information
you've just included?

the clang team would likely love to have this much information in a bug report.

Thanks!



adrian




Re: Patch ath3kfw to accept other device ids; blacklist ath bluetooth devices

2013-04-03 Thread maksim yevmenkin
Hi Adrian !

Thank you for your work. I briefly looked at it and it seems fine to me.  I'm 
not able to give it a proper review as I'm traveling internationally currently. 
Having said all that, I think it would be reasonable to commit it as is into 
head. 

Thanks,
Max


On Apr 3, 2013, at 10:52 AM, Adrian Chadd adrian.ch...@gmail.com wrote:

 Hi,
 
 Here you go:
 
 http://people.freebsd.org/~adrian/ath/20130401-ath-bluetooth.diff
 
 The ath3k driver in linux does a fair bit more than ath3kfw:
 
 * if it's a subset of chips that needs firmware, it squirts ath3k-1.fw onto it
 * there's a subset of chips that get ROM/RAM patches; so if it's one
 of those, squeeze the patch on;
 * it also will configure the AR3012 NIC between config and normal
 mode when it's loading in those patches.
 
 So it's likely that for full support, we're going to have to port over
 this functionality and throw those firmware / patch / config files
 somewhere to be uploaded when the relevant device is attached.
 
 Thanks,
 
 
 
 Adrian


Re: Patch ath3kfw to accept other device ids; blacklist ath bluetooth devices

2013-04-03 Thread Adrian Chadd
On 3 April 2013 11:16, maksim yevmenkin maksim.yevmen...@gmail.com wrote:
 Hi Adrian !

 Thank you for your work. I briefly looked at it and it seems fine to me.  I'm 
 not able to give it a proper review as I'm traveling internationally 
 currently. Having said all that, I think it would be reasonable to commit it 
 as is into head.



Thanks!



Adrian


Re: Any objections/comments on axing out old ATA stack?

2013-04-03 Thread Matthias Andree
I have just sent more information to the PR at
http://www.freebsd.org/cgi/query-pr.cgi?pr=157397

The short summary (more info in the PR) is:

- limiting tags to 31 does not help

- disabling NCQ appears to help in initial testing, but warrants more
testing

- error happens during WRITE_FPDMA_QUEUED,

- File system in question is SU+J UFS2 mounted on /usr, and I can for
instance rm -rf /usr/obj or just log into GNOME and try to open a
gnome-terminal to trigger stalls;

- Linux uses 31 tags (for different reason) and has no drive quirks, but
a controller quirk;

For Jeremy's topic #6, regarding the ATI/AMD SB7x0 that I am using, it
might be worthwhile investigating the AHCI_HFLAG_IGN_SERR_INTERNAL flag
- it gets set by Linux on the SB700 that my computer is using, see
ahci_error_intr() in libahci.c - I am not going to interpret that for
lack of expertise, but it does affect error handling and appears to
ignore a certain condition.

Why only my Samsung HDD drive triggers this but not the WD drive, I do
not know yet.

Hope that helps a bit.






Re: Any objections/comments on axing out old ATA stack?

2013-04-03 Thread Matthias Andree
On 04.04.2013 01:38, Jeremy Chadwick wrote:

...

 While skimming Linux libata code and commits in the past, the only
 glaringly obvious bug/issue I see is with SB600/SB700 chipsets (the
 hardware revision apparently matters) and port multiplier (PMP) support
 and soft resets.
 
 Are you using a port multiplier?  I doubt it, but I have to ask.

I am not using a PMP as far as I know (unless one is buried on my Asus
M4A78T-E main board). It would seem the drives are directly attached to
the south bridge's SATA ports.

 Why only my Samsung HDD drive triggers this but not the WD drive, I do
 not know yet.
 
 Please provide gpart show -p ada1 output, both here and in the PR,
 if you could.

=>        63  1953525105  ada1  MBR  (931G)
          63   209714337  ada1s1  freebsd  [active]  (100G)
   209714400         800          - free -  (400k)
   209715200    71680000  ada1s2  ntfs  (34G)
   281395200       15405          - free -  (7.5M)
   281410605   488263545  ada1s3  linux-data  (232G)
   769674150  1183851018          - free -  (564G)

HTH

Best regards
Matthias





Re: Any objections/comments on axing out old ATA stack?

2013-04-03 Thread Jeremy Chadwick
On Thu, Apr 04, 2013 at 12:15:32AM +0200, Matthias Andree wrote:
 I have just sent more information to the PR at
 http://www.freebsd.org/cgi/query-pr.cgi?pr=157397
 
 The short summary (more info in the PR) is:
 
 - limiting tags to 31 does not help
 
 - disabling NCQ appears to help in initial testing, but warrants more
 testing
 
 - error happens during WRITE_FPDMA_QUEUED,

This is an NCQ-based write LBA request.  There are many non-NCQ
equivalents of this, ATA-protocol-wise (too many to list here), but the
most likely non-NCQ ATA command you'd see is WRITE_DMA48.
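
As a rough illustration of that pairing, the sketch below maps a few ATA command opcodes to names and to their non-NCQ counterparts. The opcode values are the standard ATA ones as far as I know them; the table is deliberately incomplete, and the names COMMANDS, NON_NCQ_EQUIVALENT and describe are invented for this example.

```python
# opcode -> (name, uses NCQ?)   Deliberately partial table, for illustration.
COMMANDS = {
    0xC8: ("READ_DMA", False),
    0xCA: ("WRITE_DMA", False),
    0x25: ("READ_DMA48", False),
    0x35: ("WRITE_DMA48", False),
    0x60: ("READ_FPDMA_QUEUED", True),
    0x61: ("WRITE_FPDMA_QUEUED", True),
}

# The 48-bit DMA command most likely issued instead when NCQ is disabled.
NON_NCQ_EQUIVALENT = {0x60: 0x25, 0x61: 0x35}

def describe(opcode):
    """Return (name, is_ncq, name of the non-NCQ equivalent or None)."""
    name, is_ncq = COMMANDS[opcode]
    fallback = NON_NCQ_EQUIVALENT.get(opcode)
    fallback_name = COMMANDS[fallback][0] if fallback is not None else None
    return name, is_ncq, fallback_name

print(describe(0x61))
```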

 - File system in question is SU+J UFS2 mounted on /usr, and I can for
 instance rm -rf /usr/obj or just log into GNOME and try to open a
 gnome-terminal to trigger stalls;
 
 - Linux uses 31 tags (for different reason) and has no drive quirks, but
 a controller quirk;
 
 for Jeremy's topic #6, regarding the ATI/AMD SB7x0 that I am using, it
 might be worthwhile investigating the AHCI_HFLAG_IGN_SERR_INTERNAL flag
 - it gets set by Linux on the SB700 that my computer is using, see
 ahci_error_intr() in libahci.h - I am not going to interpret that for
 lack of expertise, but it does affect error handling and appears to
 ignore a certain condition.

Alexander could expand on this, but the name of the flag implies that
there are certain conditions where the SATA-level SERR condition gets
ignored (IGN).

While skimming Linux libata code and commits in the past, the only
glaringly obvious bug/issue I see is with SB600/SB700 chipsets (the
hardware revision apparently matters) and port multiplier (PMP) support
and soft resets.

Are you using a port multiplier?  I doubt it, but I have to ask.

 Why only my Samsung HDD drive triggers this but not the WD drive, I do
 not know yet.

Please provide gpart show -p ada1 output, both here and in the PR,
if you could.

I have a gut feeling I know what the issue is (and if it is what I think
it is, it's actually happening all the time, just that NCQ exacerbates
it given how command queueing works), but I won't know for sure until I
see the output.

Thanks.

-- 
| Jeremy Chadwick                                  j...@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Mountain View, CA, US                                            |
| Making life hard for others since 1977.             PGP 4BD6C0CB |


Re: Any objections/comments on axing out old ATA stack?

2013-04-03 Thread Jeremy Chadwick
On Thu, Apr 04, 2013 at 02:19:16AM +0200, Matthias Andree wrote:
 On 04.04.2013 01:38, Jeremy Chadwick wrote:
 
 ...
 
  While skimming Linux libata code and commits in the past, the only
  glaringly obvious bug/issue I see is with SB600/SB700 chipsets (the
  hardware revision apparently matters) and port multiplier (PMP) support
  and soft resets.
  
  Are you using a port multiplier?  I doubt it, but I have to ask.
 
 I am not using a PMP as far as I know (unless one is buried on my Asus
 M4A78T-E main board). It would seem the drives are directly attached to
 the south bridge's SATA ports.

Then the answer is nope, you're not using a PM.  Details:

http://www.serialata.org/technology/port_multipliers.asp
http://en.wikipedia.org/wiki/Port_multiplier

  Why only my Samsung HDD drive triggers this but not the WD drive, I do
  not know yet.
  
  Please provide gpart show -p ada1 output, both here and in the PR,
  if you could.
 
 =>        63  1953525105  ada1  MBR  (931G)
           63   209714337  ada1s1  freebsd  [active]  (100G)
    209714400         800          - free -  (400k)
    209715200    71680000  ada1s2  ntfs  (34G)
    281395200       15405          - free -  (7.5M)
    281410605   488263545  ada1s3  linux-data  (232G)
    769674150  1183851018          - free -  (564G)

This is what I was worried about.  Referring to your camcontrol
identify output:

 device model SAMSUNG HD103SI
 sector size logical 512, physical 512, offset 0

Hear me out entirely on this one.

My theory is that your hard disk actually uses 4096-byte sectors but is
too old to provide ATA IDENTIFY semantics to delineate between logical
vs. physical sector size.  In other words, only logical is provided,
thus logical=physical in the eyes of all software; smartctl will show
you the exact same thing too.

There are drives like this in the wild, both SSDs as well as MHDDs.
For example, the Intel 320-series SSD behaves this way too (providing
only logical size).
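
To make the logical-vs-physical point concrete, here is a small sketch of how software typically derives the two sizes from IDENTIFY DEVICE word 106, as I understand the ATA command set. The function name sector_sizes and its parameters are made up for the example. The key observation is that a drive which leaves word 106 empty, or reports one logical sector per physical sector, looks like a plain 512-byte drive no matter what it does internally.

```python
def sector_sizes(word106, logical_size_words=None):
    """Return (logical_bytes, physical_bytes) from IDENTIFY word 106.

    logical_size_words stands in for words 117-118 (logical sector size
    in 16-bit words); it is only consulted when bit 12 is set.
    """
    # The word is valid only if bit 14 is set and bit 15 is clear.
    if (word106 & 0xC000) != 0x4000:
        return 512, 512          # no information: software assumes 512/512
    logical = 512
    if word106 & 0x1000 and logical_size_words is not None:
        logical = logical_size_words * 2
    physical = logical
    if word106 & 0x2000:
        # Bits 3:0 hold log2(logical sectors per physical sector).
        physical = logical << (word106 & 0x000F)
    return logical, physical

print(sector_sizes(0x0000))  # older drive, field not populated
print(sector_sizes(0x6003))  # 8 logical sectors per physical sector
```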

Do not let the capacity/size of the drive be the deciding factor; your
drive is 1TB, but I also have many 1TB MHDDs that use 4096-byte sectors.

Seagate/Samsung's specification** for the HD103SI states, and I quote:
"Byte per Sensor: 512 bytes".  Yes, it says "Sensor".  Whether or not
this documentation is correct/accurate is unknown, and when vendors have
typos in their own specification docs, I cannot help but to honour the
possibility of the information being wrong.  So I'm unsure if this drive
uses 512-byte sectors or 4096-byte sectors.

That said: in your gpart show ada1 output, neither the FreeBSD nor the
Linux partition appears to be aligned to a 4096-byte boundary (the NTFS
partition, starting at sector 209715200, is).
Ideally you'd want to have these aligned to 1MByte or 2MByte boundaries in
case you ever move to an SSD.  You're also using the MBR scheme,
which does not tend to play well with alignment.
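
The alignment check is simple arithmetic, sketched below. It assumes 512-byte logical sectors and takes the start sectors from the gpart output as 63, 209715200 and 281410605; the ada1s2 value is inferred from the preceding free region (209714400 + 800). The helper name is_aligned is invented for the example.

```python
SECTOR_BYTES = 512  # logical sector size reported by the drive

def is_aligned(start_sector, boundary_bytes=4096, sector_bytes=SECTOR_BYTES):
    """True if the partition's first byte falls on a boundary_bytes boundary."""
    return (start_sector * sector_bytes) % boundary_bytes == 0

# Start sectors as read from the gpart output (assumed values, see above).
starts = {"ada1s1": 63, "ada1s2": 209715200, "ada1s3": 281410605}
alignment = {name: is_aligned(start) for name, start in starts.items()}

for name, ok in alignment.items():
    print(name, "4K-aligned" if ok else "not 4K-aligned")
```

By this arithmetic the FreeBSD and Linux slices are misaligned, while the NTFS slice happens to start on a 4096-byte boundary.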

Comparatively, your WD5002ABYS drive **does** use 512-byte sectors (I
know this for a fact).

The problem here is that I cannot guarantee you that alignment is
the problem.  The performance impact of writes to partitions which are
non-aligned is quite high, and NCQ just exacerbates this problem.  I
would love to tell you to switch to GPT and follow Warren Block's
document***, but if your NTFS partition holds Windows, and it is a version
older than Windows 7, GPT is not supported.

One piece of evidence that refutes my theory is that if Windows and/or
Linux partition are something you boot into and use often, I would
imagine NCQ would be used in both of those environments and would suffer
from the same issue.  Although Windows tends to hide all sorts of
transient errors from the user (sigh), Linux tends to be like FreeBSD
with regards to such issues (on the console anyway; you wouldn't see
such messages normally inside of X).

If you have the time and want to put forth the effort, I would recommend
backing up all your data on ada1, zeroing the first and last 1MByte of the
drive, and then following Warren Block's guide.  I'd just recommend
doing this:

gpart create -s gpt ada1
gpart add -t freebsd-ufs -b 2m ada1
newfs -U -j /dev/ada1p1   (or remove -j if you don't want to use SUJ)

I picked an alignment value of 2MBytes since it's both 4K-aligned and is
generally safe for things like newer SSDs that have larger NAND erase
block size (I am not going to get into a discussion about that here, so
please stay focused.  :-) )
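
For reference, a quick sketch of what a 2MByte start offset works out to in sectors, assuming 512-byte logical sectors and that "2m" means 2 MiB. The helper align_up is hypothetical and only illustrates rounding a start sector up to the next boundary.

```python
TWO_MIB = 2 * 1024 * 1024

def align_up(sector, alignment_bytes, sector_bytes=512):
    """Round a start sector up to the next multiple of alignment_bytes."""
    per_boundary = alignment_bytes // sector_bytes
    return -(-sector // per_boundary) * per_boundary  # ceiling division

# The legacy MBR start at sector 63 rounds up to the first 2 MiB boundary.
first_start = align_up(63, TWO_MIB)
print(first_start, first_start * 512 % 4096)
```

Any sector that is a multiple of 4096 under these assumptions is also a multiple of 8, which is all that 4096-byte physical sectors require.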

If the problem is gone after that (it should be easy to induce by
writing tons and tons of data to the drive), then we can safely say that
the drive uses 4096-byte sectors and that we need to add it to the quirks
list in ata_da.c.

If the problem remains after that, then further investigation is needed,
and we can safely rule out alignment.  Welcome to all the pain/effort
one has to go through when troubleshooting things like this.  :-)

Another thing: in your PR you state:

 - I am running with kern.cam.ada.default_timeout=5 which makes the
 computer recover faster

I can definitely imagine cases where