Kernel Failure - 3.4.24 Similar USB MO To 3.4.89 Kernel Failure

2014-05-16 Thread John L. Males
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,

Please CC me in on replies as I am not part of the LKML.

As the prior round of discussion about this ongoing USB Kernel
problem was with Sebastian, I have CC'ed Sebastian on this
posting as well.  Again, this is because the Linux Kernel
information suggests CCing someone who might be able to
assist in the area of concern.  My hope is that this will
help determine which kernel developer needs to look at these
Kernel failures, and at the crash/oops if need be.

I have a very busy and unpredictable schedule, so I would
ask for patience if a reply from me is needed.

For the last few years I have had about a half dozen Kernel
failures that all appear to be related to USB devices being
plugged in.

The occurrence before today's, a few months ago, actually
caused a kernel crash/oops to the console, leaving me no option
but to power the machine off and back on.  I took a high quality
DSLR image of the screen, which clearly shows that important
information had rolled off because the screen was not large
enough to hold it.  I also searched high and low on my tablet
for a few days to see if I could find out how to secure the
information that rolled off the screen, ideally in a form that
is easy for the Kernel developers to work with.  I have looked
many times since and, as then, can only find references to
connecting a second machine via serial to the machine that had
the Kernel crash/oops and using a debugger.  I do not have the
kernel experience at this point to know how to do this, and my
reading suggested that one or a few Kernel options need to be
enabled for this serial debugging approach to work.  On that
note, if anyone can advise me whether there is a place where a
kernel crash/oops is stored, from which one can collect it and
send it to the Kernel Developers, I would be most appreciative.
I have made and still make efforts to find this information.
It is possible I am not using the correct search terms or do
not know where to look.
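One commonly suggested alternative to a serial cable is the kernel's netconsole facility, which sends kernel log messages as UDP datagrams to a second machine on the network. The following is only a minimal sketch of the receiving side, assuming the failing machine's kernel already has netconsole built and configured to send to this listener. The port number 6666, the log file name, and both function names are assumptions chosen for illustration.

```python
import socket


def collect_kernel_messages(sock, max_packets, timeout=5.0):
    """Read up to max_packets UDP datagrams from sock and return them as text.

    Stops early if no datagram arrives within `timeout` seconds.
    """
    sock.settimeout(timeout)
    messages = []
    for _ in range(max_packets):
        try:
            data, _sender = sock.recvfrom(65535)
        except socket.timeout:
            break
        # Kernel messages are mostly ASCII; never fail on odd bytes.
        messages.append(data.decode("utf-8", errors="replace"))
    return messages


def listen(host="0.0.0.0", port=6666, logfile="netconsole.log"):
    """Append every received datagram to logfile until interrupted."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        with open(logfile, "a", encoding="utf-8") as out:
            while True:
                for text in collect_kernel_messages(sock, 1, timeout=60.0):
                    out.write(text)
                    out.flush()  # keep the file current if this side is stopped
```

Calling `listen()` on the second machine before reproducing the failure would leave whatever the failing kernel managed to send in `netconsole.log`, as plain text that can be pasted into a report.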

About 14:49 EDT my system experienced yet another Linux Kernel
failure.  Again it was related to inserting a basic USB device,
not an MP3 player, just a plain data USB drive.  This followed
my removing a different USB device after issuing a pumount
command that returned as successful.  I have attached a copy of
the kernel failure details.

If there is a desire to see the DSLR screen image of the prior
kernel crash/oops, please let me know.

Please be aware I do not use any drivers other than those in
the stock Linux Kernel.  I do not need any unique drivers for
my machine or the devices I use with my laptop.  Also be aware
that all of the 3.x Linux Kernels I have used are from
Kernel.org, and I compile them myself using the same
configuration file, plus settings for any new config options
added in the next 3.4.x kernel version I compile.  This means
there is no reason for my kernel ever to be tainted.  If my
Linux 3.4.x Kernel is listed as tainted, it is the stock Linux
Kernel that has so decided for some reason.


Regards,

John L. Males
Toronto, Ontario
Canada
16 May 2014 17:30 -0400 EDT




2014-05-16 16:56:58.344920846-0400-EDT Time: 1400273818

16 May 16:56:58 ntpdate[14149]: ntpdate 4.2.6p2@1.2194-o Sun
Oct 17 13:35:14 UTC 2010 (1)

16 May 16:57:12 ntpdate[14154]: step time server 208.80.96.70
offset 0.003026 sec

Linux 3.4.89-kernel.org-jlm-010-amd64 #1 SMP PREEMPT Wed May 7
22:33:10 EDT 2014

Modified Debian GNU/Linux 6.0.3 (squeeze)
(Alternative to Debian determined, work in progress)

cat /proc/cpuinfo (Selected):

model name  : Intel(R) Core(TM)2 CPU T5600  @ 1.83GHz

vmstat -s:

     3452464 K total memory
     3381088 K used memory
     2608984 K active memory
      570068 K inactive memory
       71376 K free memory
        2796 K buffer memory
      106480 K swap cache
     8225244 K total swap
     1875240 K used swap
     6350004 K free swap
    36725845 non-nice user cpu ticks
      692898 nice user cpu ticks
     4757452 system cpu ticks
    78815904 idle cpu ticks
     2909319 IO-wait cpu ticks
        5590 IRQ cpu ticks
     1678486 softirq cpu ticks
           0 stolen cpu ticks
    81758774 pages paged in
    66779328 pages paged out
     6643777 pages swapped in
     5417469 pages swapped out
   431124356 interrupts
   567863734 CPU context switches
  1399647013 boot time
      175501 forks
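
For readers who want to sanity-check the figures above, here is a minimal sketch that parses a few of the `vmstat -s` rows and derives swap usage and uptime. The function name `parse_vmstat_s` is hypothetical, the sample is a subset of the rows above, and the "now" timestamp is the `Time:` value from the header of this footer.

```python
def parse_vmstat_s(text):
    """Map each `vmstat -s` label to its integer value."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts or not parts[0].isdigit():
            continue  # skip blank or non-data lines
        # Memory rows carry a "K" unit column; drop it from the label.
        if len(parts) > 1 and parts[1] == "K":
            label = " ".join(parts[2:])
        else:
            label = " ".join(parts[1:])
        stats[label] = int(parts[0])
    return stats


sample = """
  3452464 K total memory
  3381088 K used memory
    71376 K free memory
  8225244 K total swap
  1875240 K used swap
  6350004 K free swap
1399647013 boot time
"""

stats = parse_vmstat_s(sample)
report_time = 1400273818  # "Time:" value from the report header

swap_used_pct = 100.0 * stats["used swap"] / stats["total swap"]
uptime_days = (report_time - stats["boot time"]) / 86400
print(f"swap in use: {swap_used_pct:.1f}%")
print(f"uptime: {uptime_days:.2f} days")
```

The used and free rows sum to the totals for both memory and swap, which is a quick check that the block was copied without losing a line.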

/proc/vmstat (Selected):

pgpgin 81758774
pgpgout 66779328
pswpin 6643777
pswpout 5417469
pgfree 776294670
pgfault 546643863
pgmajfault 2018217

/proc/meminfo (Selected):

Mlocked:               6604 kB
VmallocTotal:   34359738367 kB
VmallocChunk:   34359322080 kB
HugePages_Total:   0

vmstat --partition /dev

Linux Kernel Commit 025cee7f8fef02af09b03c8e1cd9843cb32adf9b

2013-02-28 Thread John L. Males
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Dave,

With respect to Linux Kernel Commit
025cee7f8fef02af09b03c8e1cd9843cb32adf9b Change Log:

<http://www.kernel.org/pub/linux/kernel/v3.0/ChangeLog-3.4.34>

comment "It has probably been the cause of a number of subtle
bugs over the years, although the conditions to excite them
would have been hard to trigger."

Would the Linux Kernel under "unique" Virtual Memory Subsystem
stress excite some of these "subtle bugs" in Linux Kernels
prior to the change set for Commit
025cee7f8fef02af09b03c8e1cd9843cb32adf9b?  I am not asking
you to identify the subtle bugs.  It is obvious that such bugs
would be difficult to pin down from reported and unreported
cases.  I am only asking for your sense, or first-hand
knowledge, of whether a Linux Kernel under "unique" stress,
including Virtual Memory Subsystem stress, would be a key
element in exciting them.

I am one of those who often seem to stress the kernel of any
OS through very heavy power-user use of Operating Systems,
hence my question.


Regards,

John L. Males
Toronto, Ontario
Canada
28 February 2013 17:11
<mailto:jlma...@gmail.com>


==
2013-02-28 16:57:24.110757989-0500-EST

28 Feb 16:57:24 ntpdate[27847]: ntpdate 4.2.6p2@1.2194-o Sun
Oct 17 13:35:14 UTC 2010 (1)

28 Feb 16:58:59 ntpdate[27852]: step time server 132.246.11.228
offset -0.000138 sec

Linux 3.4.24-kernel.org-jlm-010-amd64 #1 SMP PREEMPT Sun Dec 23
10:06:41 EST 2012

Modified Debian GNU/Linux 6.0.3 (squeeze)
(Evaluating alternatives to Debian)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)

iEYEARECAAYFAlEv1h8ACgkQ+V/XUtB6aBDSzQCgiu+hskZzz2bMfLG5u+Ao9YzJ
hwIAn1IY5jnq0sJjIe0nxFnA+LKrGtAh
=1KDN
-----END PGP SIGNATURE-----
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re[04]: Kernel Failure - 3.4.24

2013-01-30 Thread John L. Males
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Sebastian,

Message replied to:

Date: Tue, 29 Jan 2013 22:32:53 +0100
From: Sebastian Andrzej Siewior <bige...@linutronix.de>
To: jlma...@gmail.com
Cc: linux-kernel@vger.kernel.org
Subject: Re: Kernel Failure - 3.4.24


> On 01/28/2013 08:57 PM, John L. Males wrote:
> > I was not suggesting you are responsible for the bug at
> > all. On
> Okay then :)
>
> > I have no custom patches to the kernel.
> okay.
>
> > I looked at the RedHat bug 468794.  The bug seems to
> > indicate it was never fixed.  The bug was reported against
> > 2.6.27.4-47.rc3.fc10.i686 #1 on 2008-10-27 21:34:04 EDT and
> > was closed 2009-12-18 01:40:33 EST.  The differences are a
> > bug of at least 5 years ago and a 2.6 kernel verses 5 years
> > later and current at time stable kernel 3.4.24 from
> > kernel.org with no patches I applied when this kernel
> > failure I encountered occurred.  If this is the same bug
> > then there is a bug that may have been about for a while or
> > perhaps a regression.  The fact is the RedHat bug 468794
> > was never fixed.
>
> Lets see. According to the backtrace it seems that the kernel
> was not able to write the buffer back to disk. The RH bug
> says that someone unplugged the device without an unmount of
> the disk.

Yes, I read in the RedHat bug log that someone unplugged the
device without unmounting it.

 
> My question are:
> - what were you doing by the time this happened?

I plugged the USB device into my laptop, then removed it.  There
was no user activity on the device.  If some other activity
caused the kernel to need to write a buffer to the USB flash
drive, it was not the result of anything I did to the drive.
Based on your findings in the backtrace, the kernel was not
able to write a buffer to the USB device at the time.  I would
be concerned that the kernel thought there was a buffer to
write when the user, which was me, performed no activity on the
USB device.  The person who owns the USB device knows next to
nothing about computers, let alone Windows or Linux, so I would
be the only one performing any actions on the device.


> - can you reproduce it (reliably)?

No.  I repeated exactly what I did when the kernel failure
happened and sadly could not recreate the issue.  I know how
important that question is and tried a few times to cause the
problem.  I was hoping the kernel failure information would
indicate the cause of the failure.  I often place my system in
hibernate, such that it will go a month or a bit more before I
reboot or boot a newly compiled kernel.  I know that for a
while in the early 3.2.x releases doing so caused the kernel
some issues, and the system would need to be rebooted or would
just reboot on its own.  There are a number of variables to
this.  I do not know if this USB failure is an artifact of my
often necessary practice of placing my system in hibernate
almost daily, sometimes a few times during the day.

As an FYI, I had a full kernel oops a few 3.2.x versions ago.
It was my first one in years.  I was hoping there would be a
file of the information that displayed on the screen.  My
research after the kernel oops suggests one has to write down
the information on the screen from a kernel oops, which I did
not do as I did not think that was needed anymore.  The reason
I mention this is that the kernel oops was with a USB device as
well.  The difference was that it was a USB Wireless BGN device
that I have used many times over the last 12 months with a
number of 3.2.x kernels with no kernel oops/failure, just odd
functional issues that seem to resolve in later kernel
versions.  The kernel oops with this Wireless BGN device
occurred only once, with that exact older 3.2.x kernel version,
and I have no clue why.  I have no information that I know of
about that kernel oops that might help with this kernel
failure, because I did not know I needed to write down the
screen from the oops.  I therefore cannot provide kernel oops
information that might share some common findings with the
kernel failure of this issue.  I suspect there may be nothing
in common, but without the kernel oops information we will not
know for certain.

The USB device was an MP3 player that acts like a USB flash
drive when it is plugged into a computer.  This means one can
copy files to and from it, and rename or delete files on it,
using the command line or any file manager.

> - Is this *new* meaning is there a kernel where did not
> happen?

I am not sure what the "new" you are referring to relates to.
That said, the only time this person's MP3 player/USB flash was
used was with the kernel.org 3.2.24 kernel I noted.

The only other USB problem I had was once with a USB Wireless
BGN device that has seen a lot of activity on my system and had
one oops on a 3.2.x kernel prior to 3.2.24, and again only once
on that kernel version.

 
> Sebastian

I know you know

Re[02]: Kernel Failure - 3.4.24

2013-01-28 Thread John L. Males
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Sebastian,


Message replied to:

Date: Sun, 27 Jan 2013 17:14:14 +0100
From: Sebastian Andrzej Siewior <bige...@linutronix.de>
To: jlma...@gmail.com
Cc: linux-kernel@vger.kernel.org
Subject: Re: Kernel Failure - 3.4.24


> On 01/17/2013 12:42 AM, John L. Males wrote:
> > Hello,
>
> Hi,
>
> > I copied Sebastian in on the post my review of Changelogs
> > suggests Sebastian is the one who will want to know about
> > this kernel failure or will know who should be.
>
> I did what? I reviewed patches since they went in this
> problem occurs?

I was not suggesting you are responsible for the bug at all.
Various links I looked at about reporting kernel bugs suggest
it is best to copy in a kernel person when posting to the LKML,
otherwise the posting will likely not receive a response.  When
I looked through recent ChangeLogs your name was often related
to USB issues.  As I believed this kernel failure was USB
related, I copied you in.  I was expecting that if, upon your
review of the kernel failure, you knew who would be best to
look at it, you could then direct the issue to that person.
I would have been very grateful for that.  I am not a kernel
developer and therefore do not know how to read these failures,
let alone know what code failed.  I also therefore do not know
who this kernel failure should be directed to.  I was hoping
your greater Linux kernel and LKML knowledge would help ensure
this kernel failure was directed to the correct kernel
developer.

 
> > I am not on the LKML so it would be appreciated if I was CC
> > on any related replies to this kernel failure which
> > appeared to occur when a USB MP3 player was inserted or
> > removed from the HP NC6400 laptop.
>
> Next time you post logs try to format them like this (or
> people might ignore it because it is too hard to read):

Thank you for the formatting tip.

 
> > The kernel failure trace information was:
> >
> > Kernel failure message 1: [327619.690505] [ cut
> > here ] [327619.690518] WARNING: at
> > fs/buffer.c:1106 mark_buffer_dirty +0x8b/0xb0()
> > [327619.690523]
> Hardware name: HP Compaq nc6400 (RM100AW#ABA)
> [327619.690526] Modules linked in: nls_utf8
> > nls_cp437 vfat fat usb_storage option usb_wwan usbserial
> > snd_hrtimer ip6table_filter ip6_tables iptable_filter
> > ip_tables kvm_intel kvm ebtable_nat ebtables x_tables
> > cpufreq_userspace cpufreq_stats cpufreq_powersave
> > cpufreq_conservative bridge stp bnep rfcomm bluetooth crc16
> > ppdev lp binfmt_misc i915 drm_kms_helper drm i2c_algo_bit
> > i2c_core uinput fuse loop snd_hda_codec_si3054
> > snd_hda_codec_analog snd_hda_intel snd_hda_codec
> > tpm_infineon arc4 snd_hwdep snd_pcm_oss snd_mixer_oss
> > snd_pcm iwl3945 snd_seq_dummy iwlegacy snd_seq_oss usbhid
> > snd_seq_midi snd_rawmidi coretemp hid snd_seq_midi_event
> > snd_seq mac80211 pcmcia snd_timer irda snd_seq_device
> > microcode snd cfg80211 yenta_socket psmouse joydev
> > parport_pc tifm_7xx1 tpm_tis soundcore pcmcia_rsrc
> > tifm_core pcspkr evdev snd_page_alloc pcmcia_core parport
> > tpm crc_ccitt hp_wmi hp_accel lis3lv02d sparse_keymap
> > serio_raw acpi_cpufreq tpm_bios rng_core rfkill mperf
> > input_polldev wmi battery ac container power_supply
> > processor video button ext2 mbcache dm_mod btrfs
> > zlib_deflate crc32c libcrc32c sg sr_mod cdrom sd_mod
> > crc_t10dif ata_generic pata_acpi uhci_hcd ata_piix libata
> > ehci_hcd sdhci_pci scsi_mod ide_pci_generic tg3 ide_core
> > sdhci mmc_core libphy usbcore thermal usb_common fan
> > thermal_sys
>
> [last unloaded: scsi_wait_scan]
>
> [327619.690737] Pid: 31574, comm: sync Tainted: G  W
> 3.4.24-kernel.org-jlm-010-amd64 #1
>
> This line looks like you have custom patches on your tree.

I have no custom patches to the kernel.

For the last several kernels, over about 12 months, I have downloaded
the kernel source directly from kernel.org.  The first time I did so I
used the kernel configuration GUI and configured the kernel from
scratch.  Since then, as was the case with this kernel, I have run
make oldconfig against the configuration of the last kernel I built
and answered only the new items that make oldconfig identifies.

 
> > [327619.690741] Call Trace:
> > [327619.690751] [8105177f] warn_slowpath_common+0x7f/0xc0
> > [327619.690757] [810517da] warn_slowpath_null+0x1a/0x20
> > [327619.690762] [811b67fb] mark_buffer_dirty+0x8b/0xb0
> > [327619.690774] [a028b734] ext2_sync_super+0x94/0x100 [ext2]
> > [327619.690784] [a028b809] ext2_sync_fs+0x69/0x80 [ext2]
> > [327619.690790] [811b4480] ? __sync_filesystem+0x90/0x90
> > [327619.690795] [811b4453] __sync_filesystem+0x63/0x90
> > [327619.690801] [811b449f] sync_one_sb+0x1f/0x30
> > [327619.690807] [81188c77] iterate_supers+0xb7/0xf0
> > [327619.690812] [811b44fa] sys_sync+0x4a/0x70
> > [327619.690819] [814c51a9] system_call_fastpath+0x16/0x1b
> > [327619.690942] ---[ end trace 7e4761e5ee97ad0c

Kernel Failure - 3.4.24

2013-01-16 Thread John L. Males
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,

I copied Sebastian in on this post because my review of the changelogs
suggests Sebastian is the one who will want to know about this kernel
failure, or will know who should.

I am not on the LKML, so I would appreciate being CC'ed on any replies
related to this kernel failure, which appeared to occur when a USB MP3
player was inserted into or removed from the HP NC6400 laptop.

The kernel failure trace information was:

Kernel failure message 1:
[327619.690505] ------------[ cut here ]------------
[327619.690518] WARNING: at fs/buffer.c:1106 mark_buffer_dirty+0x8b/0xb0()
[327619.690523] Hardware name: HP Compaq nc6400 (RM100AW#ABA)
[327619.690526] Modules linked in: nls_utf8
nls_cp437 vfat fat usb_storage option usb_wwan usbserial
snd_hrtimer ip6table_filter ip6_tables iptable_filter ip_tables
kvm_intel kvm ebtable_nat ebtables x_tables cpufreq_userspace
cpufreq_stats cpufreq_powersave cpufreq_conservative bridge stp
bnep rfcomm bluetooth crc16 ppdev lp binfmt_misc i915
drm_kms_helper drm i2c_algo_bit i2c_core uinput fuse loop
snd_hda_codec_si3054 snd_hda_codec_analog snd_hda_intel
snd_hda_codec tpm_infineon arc4 snd_hwdep snd_pcm_oss
snd_mixer_oss snd_pcm iwl3945 snd_seq_dummy iwlegacy
snd_seq_oss usbhid snd_seq_midi snd_rawmidi coretemp hid
snd_seq_midi_event snd_seq mac80211 pcmcia snd_timer irda
snd_seq_device microcode snd cfg80211 yenta_socket psmouse
joydev parport_pc tifm_7xx1 tpm_tis soundcore pcmcia_rsrc
tifm_core pcspkr evdev snd_page_alloc pcmcia_core parport tpm
crc_ccitt hp_wmi hp_accel lis3lv02d sparse_keymap serio_raw
acpi_cpufreq tpm_bios rng_core rfkill mperf input_polldev wmi
battery ac container power_supply processor video button ext2
mbcache dm_mod btrfs zlib_deflate crc32c libcrc32c sg sr_mod
cdrom sd_mod crc_t10dif ata_generic pata_acpi uhci_hcd ata_piix
libata ehci_hcd sdhci_pci scsi_mod ide_pci_generic tg3 ide_core
sdhci mmc_core libphy usbcore thermal usb_common fan
thermal_sys [last unloaded: scsi_wait_scan]
[327619.690737] Pid: 31574, comm: sync Tainted: G  W 3.4.24-kernel.org-jlm-010-amd64 #1
[327619.690741] Call Trace:
[327619.690751]  [8105177f] warn_slowpath_common+0x7f/0xc0
[327619.690757]  [810517da] warn_slowpath_null+0x1a/0x20
[327619.690762]  [811b67fb] mark_buffer_dirty+0x8b/0xb0
[327619.690774]  [a028b734] ext2_sync_super+0x94/0x100 [ext2]
[327619.690784]  [a028b809] ext2_sync_fs+0x69/0x80 [ext2]
[327619.690790]  [811b4480] ? __sync_filesystem+0x90/0x90
[327619.690795]  [811b4453] __sync_filesystem+0x63/0x90
[327619.690801]  [811b449f] sync_one_sb+0x1f/0x30
[327619.690807]  [81188c77] iterate_supers+0xb7/0xf0
[327619.690812]  [811b44fa] sys_sync+0x4a/0x70
[327619.690819]  [814c51a9] system_call_fastpath+0x16/0x1b
[327619.690942] ---[ end trace 7e4761e5ee97ad0c ]---


If you need additional information please advise.


Regards,

John L. Males
Toronto, Ontario
Canada
16 January 2013 18:42


==
2013-01-16 18:19:55.448036096-0500-EST

16 Jan 18:19:55 ntpdate[17025]: ntpdate 4.2.6p2@1.2194-o Sun
Oct 17 13:35:14 UTC 2010 (1)

16 Jan 18:20:09 ntpdate[17030]: step time server 192.75.12.10
offset -3.181109 sec

Linux 3.4.24-kernel.org-jlm-010-amd64 #1 SMP PREEMPT Sun Dec 23
10:06:41 EST 2012

Modified Debian GNU/Linux 6.0.3 (squeeze)
(Evaluating alternatives to Debian)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)

iEYEARECAAYFAlD3Ot0ACgkQ+V/XUtB6aBBNzwCdFcVnM+qtUoIpArxVlr8fAN/E
Se0Anj/z1hTzMmktfTHMeuDQHNj6GMh1
=auyh
-----END PGP SIGNATURE-----
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Problems with SCSI tape rewind / verify on 2.4.29

2005-03-02 Thread John L. Males
Andrew/Kai,

> List:   linux-kernel
> Subject:Re: Problems with SCSI tape rewind / verify on 2.4.29
> From:   Andrew Morton 
> Date:   2005-03-02 22:17:11
> Message-ID: <20050302141711.00ec7147.akpm () osdl ! org>
> [Download message RAW]
> 
> Kai Makisara <[EMAIL PROTECTED]> wrote:
> >
> > If seek with tape is changed back to returning success, this would
> > enable correct tar --verify at the beginning of the tape. However,
> > I am not sure what happens if we are not at the beginning. I will
> > investigate this and suggest a long term fix to the tar people (a
> > fix that should be compatible with all Unix tape semantics I know)
> > and also suggest possible fixes to st (this may include automatic
> > writing of a filemark when BSF is used after writes).

Kai,

I have a second problem that is perhaps another case of a combined
kernel and tar effect.  I have not had time to test with the 2.6.7 and
2.6.9 Knoppix based kernels to see if they have the same problem as
>= 2.4.26 has.  Can you hold off about 3-4 days, Kai, so I can do the
test and report the issue to Marcelo to screen first?  I have a
feeling that what I experienced while testing the change to st.c that
Marcelo suggested, which caused me to try 2.4.26 again and fail on
this new issue, has some bearing on the tape positioning you want to
check out.

> 
> Yes, please let's get a tar fix in the pipeline.
> 
> GNU tar must run on a lot of operating systems.  It's odd.
> 
> > If you think want to make st return success for seeks even if
> > nothing happens (as it did earlier), I don't have anything against
> > that. It would 

I think it is important that if an error is encountered, a
non-successful return code is returned.  If the requested action needs
no work because the tape is already at the place/state/position being
requested, it is reasonable to return a successful return code.
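
The rule described above can be put as a small decision function.  This is only a sketch of the policy in C, not code from st.c; seek_policy and its arguments are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical policy helper, not taken from the st driver.
   Returns -EIO if the hardware reported an error, 0 if the tape is
   already at the requested block (nothing to do, so success), and 1
   if a real reposition is still required. */
static int seek_policy(long current_block, long requested_block, int hw_error)
{
    assert(current_block >= 0 && requested_block >= 0);

    if (hw_error)
        return -EIO;        /* an error is never reported as success */
    if (current_block == requested_block)
        return 0;           /* no action needed counts as success */
    return 1;               /* caller must actually move the tape */
}
```

The point of the sketch is only the ordering: the error check comes first, so "nothing to do" can never hide a failure.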

> > solve the practical problem several people have reported recently.
> > (My recommendation for the people seeing this problem is to do
> > verification separately with 'tar -d'.)

As an aside, I have tried the tar -d option as well and it worked.  Is
my understanding correct that --verify does a data readback compare of
the files in the tar, whereas the tar -d option only compares the file
names in the tar to the directory?  To me that means a big difference
in confidence that the tar backup is OK, as I look to a readback
verify to increase confidence in the success of the backup.

> 
> Yes, I think we need to grit our teeth and do this.  I'll stick a
> comment in there.


Regards,

John L. Males
Willowdale, Ontario
Canada
02 March 2005 (17:45 -) 18:51

==


"Bmer ... Boom Boom, how are you Boom Boom"
"Meow, meoaa" as Boomer loudly announces
 intent Boomer is coming for attention
Loved to knead arm and lick arm with Boomer's very large
 tongue
Able to catch, or at least hit, almost any object in flight
 within reach of front paws
Boomer 1985 (Born), Adopted 04 September 1991
04 September 1991 - 08 February 2000 18:50

"How are you Mr. Sylvester?"
"... Grunt Grunt" ... quick licks of nose
Rolls over for pet and stomach rub when Dad arrives home
 and grunting
Runs back and forth from study, tilts head as glowing green
 eyes stare for "attention please", grunts and meows,
 repeats run, tilt head and stare few times for good
 measure, grunts and meows
Lays on floor just outside study to guard Dad
Loved to groom Miss Mahogany, and let Mahogany cuddle beside
Sylvester 1989 (estimated Born)
Found in building mail area noon hour 09 February 1992
09 February 1992 - 19 January 2003 23:25

"Hello Miss Chicago 'White Sox', how are you 'Chico'?"
"Grunt" (thank you) ... as put out food for Chicago
"ME" So loud the world stops
A very determined Miss "White Sox"
AKA "Chico" ... Cheryl Crawford used as nickname
Loved to chase kibble slid down hall floor,
 bat about and then eat
Loved to hook paw in dish to toss out a single kibble
 at time, dart at as moved, then eat ... "Crunches"
Chicago "White Sox", "Chico" August 1989 (born),
 adopted 04 February 1991
05 October 2004 06:52 Quiet "Grunts" 
  as lay Chicago on bed for last time
04 February 1991 - 05 October 2004 07:32





Re: Problems with SCSI tape rewind / verify on 2.4.29

2005-03-02 Thread John L. Males
Marcelo,

My couple cents worth:

> On Wed, Mar 02, 2005 at 11:17:19PM +0200, Kai Makisara wrote:
> > On Wed, 2 Mar 2005, Marcelo Tosatti wrote:
> > 
> > > On Wed, Mar 02, 2005 at 11:15:42AM -, Mark Yeatman wrote:
> > > > Hi
> > > > 
> > > > Never had to log a bug before, hope this is correctly done.
> > > > 
> > > > Thanks
> > > > 
> > > > Mark
> > > > 
> > > > Detail
> > > > 
> > > > [1.] One line summary of the problem:
> > > > SCSI tape drive is refusing to rewind after backup to allow
> > > > verify and causing illegal seek error
> > > > 
> > > > [2.] Full description of the problem/report:
> > > > On backup the tape drive is reporting the following error and
> > > > failing it's backups.
> > > > 
> > > > tar: /dev/st0: Warning: Cannot seek: Illegal seek
> > > > 
> > > > I have traced this back to failing at an upgrade of the kernel
> > > > to 2.4.29 on Feb 8th. The backups have not worked since.
> > > > Replacement Drives have been tried and cables to no avail. I
> > > > noticed in the the changelog that a patch by Solar Designer to
> > > > the Scsi tape return code had been made. 
> > 
> > BTW, this "fix" by Solar Designer introduces a bug to 2.4.29: a
> > tape driver is supposed to return ENOMEM in the case that was
> > changed to return EIO ;-(
> 
> Reverted.
> 
> > > v2.6 also contains the same problem BTW.
> > > 
> > > Try this:
> > > 
> > > --- a/drivers/scsi/st.c.orig  2005-03-02 09:02:13.637158144 -0300
> > > +++ b/drivers/scsi/st.c   2005-03-02 09:02:20.208159200 -0300
> > > @@ -3778,7 +3778,6 @@
> > >   read:   st_read,
> > >   write:  st_write,
> > >   ioctl:  st_ioctl,
> > > - llseek: no_llseek,
> > >   open:   st_open,
> > >   flush:  st_flush,
> > >   release:st_release,
> > 
> > This change covers up the problem. The real bug is in tar. The
> > following code is from tar is supposed to reposition the tape to
> > the beginning of the file jus written:
> > 
> > #ifdef MTIOCTOP
> >   {
> >     struct mtop operation;
> >     int status;
> >
> >     operation.mt_op = MTBSF;
> >     operation.mt_count = 1;
> >     if (status = rmtioctl (archive, MTIOCTOP, (char *) &operation),
> >         status < 0)
> >       {
> >         if (errno != EIO
> >             || (status = rmtioctl (archive, MTIOCTOP, (char *) &operation),
> >                 status < 0))
> >           {
> > #endif
> >             if (rmtlseek (archive, (off_t) 0, SEEK_SET) != 0)
> >               {
> >                 /* Lseek failed.  Try a different method.  */
> >                 seek_warn (archive_name_array[0]);
> >                 return;
> >               }
> > #ifdef MTIOCTOP
> >           }
> >       }
> >   }
> > #endif
> > 
> > 
> > Here is output from strace showing what happens with 'tar -c -W'
> > applied at the beginning of the tape (this is using kernel
> > 2.6.11-rc4 but the same probably happens with 2.4.29):
> > ...
> > ioctl(3, MGSL_IOCGPARAMS or MTIOCTOP or SNDCTL_MIDI_MPUMODE, 
> > 0x7fffecd0) = -1 EIO (Input/output error)
> > ioctl(3, MGSL_IOCGPARAMS or MTIOCTOP or SNDCTL_MIDI_MPUMODE, 
> > 0x7fffecd0) = -1 EIO (Input/output error)
> > lseek(3, 0, SEEK_SET)   = -1 ESPIPE (Illegal seek)
> > 
> > So, both tape positioning commands fail and the code falls back to
> > lseek. Earlier it has returned success even though it has not done
> > anything (this was on purpose because it is the way some other
> > Unices behave and with reason). In that case this tar succeeded
> > but it was pure luck. The first BSF did position the tape
> > correctly although it did fail.
> > 
> > The 2.6 st driver does contain this near the beginning of
> > st_open():
> > 
> > nonseekable_open(inode, filp);
> > 
> > This probably makes lseek fail. This code has been in st.c since
> > 2.6.8.
> 
> Thanks for the cluebat Kai, is this problem fixed in newer versions
> of tar? 

My testing over the last week or so has been with the latest tar,
tar-1.15.1-2; tar 1.14 and 1.13 had the same lseek --verify issues.
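
The strace output quoted above shows the final lseek failing with ESPIPE.  A caller can tell that case apart from a genuine I/O failure.  The following is a minimal sketch in C of that distinction only; try_rewind and the rewind_result enum are hypothetical names and this is not tar's code.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

enum rewind_result {
    REWIND_OK,            /* lseek to offset 0 succeeded */
    REWIND_NOT_SEEKABLE,  /* descriptor refuses lseek (ESPIPE) */
    REWIND_FAILED         /* any other error */
};

/* Try to reposition fd to its start with lseek and classify the outcome. */
static enum rewind_result try_rewind(int fd)
{
    assert(fd >= 0);

    if (lseek(fd, (off_t) 0, SEEK_SET) == (off_t) 0)
        return REWIND_OK;
    return errno == ESPIPE ? REWIND_NOT_SEEKABLE : REWIND_FAILED;
}
```

A pipe is a convenient stand-in for a non-seekable descriptor when trying this out, since lseek on a pipe also fails with ESPIPE.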

> 
> I suspect v2.4 should work with older versions of tar, so we should
> keep "lseek"

Re: Problems with SCSI tape rewind / verify on 2.4.29

2005-03-02 Thread John L. Males
Sorry gents,

Let me correct this one more time.


Regards,

John L. Males
Willowdale, Ontario
Canada
02 March 2005 16:26


** Reply Separator **

On (Wed) 2005-03-02 16:15:07 -0500 
John L. Males wrote in Message-ID:
[EMAIL PROTECTED]

To: [EMAIL PROTECTED]
From: John L. Males <[EMAIL PROTECTED]>
Subject: Re: Re[03]: Problems with SCSI tape rewind / verify on 2.4.29
Date: Wed, 2 Mar 2005 16:15:07 -0500

> Marcelo,
> 
> Sorry gents, it seems the LKML used to handle the RE numbering in the
> long past when I last mailed to the LKML, but not now, so I am
> resending this eMail to ensure it goes back to the original thread and
> all the eMail discussion is in one eMail thread.
> 
> My apologies if this caused confusion on the LKML.
> 
> 
> Regards,
> 
> John L. Males
> Willowdale, Ontario
> Canada
> 02 March 2005 (16:06 -) 16:15
> 
> 
> ** Reply Separator **
> 
> On (Wed) 2005-03-02 15:46:26 -0500 
> John L. Males wrote in Message-ID:
> [EMAIL PROTECTED]
> 
> To: Marcelo Tosatti <[EMAIL PROTECTED]>
> From: John L. Males <[EMAIL PROTECTED]>
> Subject: Re[03]: Problems with SCSI tape rewind / verify on 2.4.29
> Date: Wed, 2 Mar 2005 15:46:26 -0500
> 
> > Hi Marcelo,
> > 
> > 
> > ** Reply Separator **
> > 
> > On (Wed) 2005-03-02 11:34:41 -0300 
> > Marcelo Tosatti wrote in Message-ID:
> > [EMAIL PROTECTED]
> > 
> > To: Gene Heskett <[EMAIL PROTECTED]>
> > From: Marcelo Tosatti <[EMAIL PROTECTED]>
> > Subject: Re: Problems with SCSI tape rewind / verify on 2.4.29
> > Date: Wed, 2 Mar 2005 11:34:41 -0300
> > 
> > > 
> > > On Wed, Mar 02, 2005 at 12:08:51PM -0500, Gene Heskett wrote:
> > > > On Wednesday 02 March 2005 07:03, Marcelo Tosatti wrote:
> > > > >On Wed, Mar 02, 2005 at 11:15:42AM -, Mark Yeatman wrote:
> > > > >> Hi
> > > > >>
> > > > >> Never had to log a bug before, hope this is correctly done.
> > > > >>
> > > > >> Thanks
> > > > >>
> > > > >> Mark
> > > > >>
> > > > >> Detail
> > > > >>
> > > > >> [1.] One line summary of the problem:
> > > > >> SCSI tape drive is refusing to rewind after backup to allow
> > > > >> verify and causing illegal seek error
> > 
> > In my experience with this problem, which I am sure is exactly the
> > same issue, the tape does in fact rewind after creating the tar in
> > order to perform the --verify option.  The illegal seek error seems
> > to arise after the rewind, or more correctly after the bsf commands
> > that may be being used (I have not confirmed this yet; it is another
> > test I need to do in a few days).  For sure the tape is positioned
> > back.  I know because I have also done this test with larger
> > directories and have heard the tape take a long time to get back to
> > the start of the tar file to be able to perform the --verify option.
> > That said, I am using a DLT drive.  Perhaps with different drivers
> > or tape drives this issue may have variations on theme and
> > behaviour, with the net result being the same error message and/or
> > root cause.
> > 
> > > > >>
> > > > >> [2.] Full description of the problem/report:
> > > > >> On backup the tape drive is reporting the following error and
> > > > >> failing it's backups.
> > > > >>
> > > > >> tar: /dev/st0: Warning: Cannot seek: Illegal seek
> > > > >>
> > > > >> I have traced this back to failing at an upgrade of the kernel
> > > > >> to 2.4.29 on Feb 8th. The backups have not worked since.
> > > > >> Replacement Drives have been tried and cables to no avail. I
> > > > >> noticed in the the changelog that a patch by Solar Designer to
> > > > >> the Scsi tape return code had been made.
> > 
> > Last kernel to work correctly in 2.4 branch was 2.4.26.  Kernel
> > versions 2.4.27, 2.4.28 and 2.4.29 all fail based on my experience
> > with DLT SCSI based tape.
> > 
> > > > >
> > > > >v2.6 also contains the same problem BTW.
> > > > >
> > > > >Try this:
> > > > >
> > > > >--- a/drivers/scsi/st.c.orig 2005-03-02 09:02:13.637158144 -0300
> > > > >+++ b/drivers/scsi/st.c 2005-03-02 09:02:20.208159200 -0300
> > > > >@@ -3778,7 +3778,6 @@
> > > > >  read:  st_read,
> > > 

Re: Re[03]: Problems with SCSI tape rewind / verify on 2.4.29

2005-03-02 Thread John L. Males
Marcelo,

Sorry gents, it seems the LKML used to handle the RE numbering in the
long past when I last mailed to the LKML, but not now, so I am
resending this eMail to ensure it goes back to the original thread and
all the eMail discussion is in one eMail thread.

My apologies if this caused confusion on the LKML.


Regards,

John L. Males
Willowdale, Ontario
Canada
02 March 2005 (16:06 -) 16:15


** Reply Separator **

On (Wed) 2005-03-02 15:46:26 -0500 
John L. Males wrote in Message-ID:
[EMAIL PROTECTED]

To: Marcelo Tosatti <[EMAIL PROTECTED]>
From: John L. Males <[EMAIL PROTECTED]>
Subject: Re[03]: Problems with SCSI tape rewind / verify on 2.4.29
Date: Wed, 2 Mar 2005 15:46:26 -0500

> Hi Marcelo,
> 
> 
> ** Reply Separator **
> 
> On (Wed) 2005-03-02 11:34:41 -0300 
> Marcelo Tosatti wrote in Message-ID:
> [EMAIL PROTECTED]
> 
> To: Gene Heskett <[EMAIL PROTECTED]>
> From: Marcelo Tosatti <[EMAIL PROTECTED]>
> Subject: Re: Problems with SCSI tape rewind / verify on 2.4.29
> Date: Wed, 2 Mar 2005 11:34:41 -0300
> 
> > 
> > On Wed, Mar 02, 2005 at 12:08:51PM -0500, Gene Heskett wrote:
> > > On Wednesday 02 March 2005 07:03, Marcelo Tosatti wrote:
> > > >On Wed, Mar 02, 2005 at 11:15:42AM -, Mark Yeatman wrote:
> > > >> Hi
> > > >>
> > > >> Never had to log a bug before, hope this is correctly done.
> > > >>
> > > >> Thanks
> > > >>
> > > >> Mark
> > > >>
> > > >> Detail
> > > >>
> > > >> [1.] One line summary of the problem:
> > > >> SCSI tape drive is refusing to rewind after backup to allow
> > > >> verify and causing illegal seek error
> 
> In my experience with this problem, which I am sure is exactly the
> same issue, the tape does in fact rewind after creating the tar in
> order to perform the --verify option.  The illegal seek error seems to
> arise after the rewind, or more correctly after the bsf commands that
> may be being used (I have not confirmed this yet; it is another test I
> need to do in a few days).  For sure the tape is positioned back.  I
> know because I have also done this test with larger directories and
> have heard the tape take a long time to get back to the start of the
> tar file to be able to perform the --verify option.  That said, I am
> using a DLT drive.  Perhaps with different drivers or tape drives this
> issue may have variations on theme and behaviour, with the net result
> being the same error message and/or root cause.
> 
> > > >>
> > > >> [2.] Full description of the problem/report:
> > > >> On backup the tape drive is reporting the following error and
> > > >> failing it's backups.
> > > >>
> > > >> tar: /dev/st0: Warning: Cannot seek: Illegal seek
> > > >>
> > > >> I have traced this back to failing at an upgrade of the kernel
> > > >> to 2.4.29 on Feb 8th. The backups have not worked since.
> > > >> Replacement Drives have been tried and cables to no avail. I
> > > >> noticed in the the changelog that a patch by Solar Designer to
> > > >> the Scsi tape return code had been made.
> 
> Last kernel to work correctly in 2.4 branch was 2.4.26.  Kernel
> versions 2.4.27, 2.4.28 and 2.4.29 all fail based on my experience
> with DLT SCSI based tape.
> 
> > > >
> > > >v2.6 also contains the same problem BTW.
> > > >
> > > >Try this:
> > > >
> > > >--- a/drivers/scsi/st.c.orig 2005-03-02 09:02:13.637158144 -0300
> > > >+++ b/drivers/scsi/st.c 2005-03-02 09:02:20.208159200 -0300
> > > >@@ -3778,7 +3778,6 @@
> > > >  read:  st_read,
> > > >  write:  st_write,
> > > >  ioctl:  st_ioctl,
> > > >- llseek:  no_llseek,
> > > >  open:  st_open,
> > > >  flush:  st_flush,
> > > >  release: st_release,
> > > >-
> > > 
> > > Interesting Marcelo.  How long has this been true in 2.6?
> 
> In the 2.6 tree the tar --verify works with 2.6.7, but fails with
> 2.6.9. I am unable to test 2.6.8, but based on research of the code
> changes of 2.6.8 compared to the changes made in 2.4.27 re llseek I
> would expect 2.6.8 to fail as well with my DLT SCSI tape. 
> 
> > 
> > Actually I just checked and it seems v2.6 is not using
> > "no_llseek".
> > 
> > However John L. Males reports the same problem with v2.6 - John,
> > care to retest with v2.6.10 ?
> 
> My ability to test a 2.6.x kernel is limited to what 2.6.x kerne

Re[03]: Problems with SCSI tape rewind / verify on 2.4.29

2005-03-02 Thread John L. Males
Hi Marcelo,


** Reply Separator **

On (Wed) 2005-03-02 11:34:41 -0300 
Marcelo Tosatti wrote in Message-ID: [EMAIL PROTECTED]

To: Gene Heskett <[EMAIL PROTECTED]>
From: Marcelo Tosatti <[EMAIL PROTECTED]>
Subject: Re: Problems with SCSI tape rewind / verify on 2.4.29
Date: Wed, 2 Mar 2005 11:34:41 -0300

 
> On Wed, Mar 02, 2005 at 12:08:51PM -0500, Gene Heskett wrote:
> > On Wednesday 02 March 2005 07:03, Marcelo Tosatti wrote:
> > >On Wed, Mar 02, 2005 at 11:15:42AM -, Mark Yeatman wrote:
> > >> Hi
> > >>
> > >> Never had to log a bug before, hope this is correctly done.
> > >>
> > >> Thanks
> > >>
> > >> Mark
> > >>
> > >> Detail
> > >>
> > >> [1.] One line summary of the problem:
> > >> SCSI tape drive is refusing to rewind after backup to allow
> > >> verify and causing illegal seek error

In my experience with this problem, which I am sure is exactly the same
issue, the tape does in fact rewind after creating the tar in order to
perform the --verify option.  The illegal seek error seems to arise
after the rewind, or more correctly after the bsf commands that may be
being used (I have not confirmed this yet; it is another test I need to
do in a few days).  For sure the tape is positioned back.  I know
because I have also done this test with larger directories and have
heard the tape take a long time to get back to the start of the tar
file to be able to perform the --verify option.  That said, I am using
a DLT drive.  Perhaps with different drivers or tape drives this issue
may have variations on theme and behaviour, with the net result being
the same error message and/or root cause.
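
For reference, the bsf operation mentioned above is issued from user space through the MTIOCTOP ioctl with the MTBSF opcode.  The following is a minimal sketch in C, assuming a Linux system that provides <sys/mtio.h>; backspace_filemarks is a hypothetical helper name made up for illustration.

```c
#include <assert.h>
#include <sys/ioctl.h>
#include <sys/mtio.h>

/* Hypothetical helper: ask the tape driver to space backward over
   `count` filemarks.  Returns 0 on success, or -1 with errno set,
   exactly as ioctl() reports it. */
static int backspace_filemarks(int fd, int count)
{
    struct mtop op;

    assert(fd >= 0 && count > 0);
    op.mt_op = MTBSF;       /* backward space over filemark(s) */
    op.mt_count = count;
    return ioctl(fd, MTIOCTOP, &op);
}
```

On a descriptor that is not a tape device the call is expected to fail, so checking both the return value and errno is the only way to know whether the tape really moved.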

  
> > >> [2.] Full description of the problem/report:
> > >> On backup the tape drive is reporting the following error and
> > >> failing it's backups.
> > >>
> > >> tar: /dev/st0: Warning: Cannot seek: Illegal seek
> > >>
> > >> I have traced this back to failing at an upgrade of the kernel
> > >> to 2.4.29 on Feb 8th. The backups have not worked since.
> > >> Replacement Drives have been tried and cables to no avail. I
> > >> noticed in the the changelog that a patch by Solar Designer to
> > >> the Scsi tape return code had been made.

The last kernel to work correctly in the 2.4 branch was 2.4.26.  Kernel
versions 2.4.27, 2.4.28 and 2.4.29 all fail, based on my experience
with a DLT SCSI tape drive.

  
  v2.6 also contains the same problem BTW.
  
  Try this:
  
  --- a/drivers/scsi/st.c.orig 2005-03-02 09:02:13.637158144 -0300
  +++ b/drivers/scsi/st.c 2005-03-02 09:02:20.208159200 -0300
  @@ -3778,7 +3778,6 @@
    read:  st_read,
    write:  st_write,
    ioctl:  st_ioctl,
  - llseek:  no_llseek,
    open:  st_open,
    flush:  st_flush,
    release: st_release,
  
  Interesting Marcelo.  How long has this been true in 2.6?

In the 2.6 tree, tar --verify works with 2.6.7 but fails with 2.6.9.
I am unable to test 2.6.8, but based on comparing the code changes in
2.6.8 with the llseek changes made in 2.4.27, I would expect 2.6.8 to
fail as well with my DLT SCSI tape.

 
 Actually I just checked and it seems v2.6 is not using no_llseek.
 
 However John L. Males reports the same problem with v2.6 - John,
 care to retest with v2.6.10 ?

My ability to test a 2.6.x kernel is limited to whatever 2.6.x kernel I
can find on a live CD.  The 2.6.7 and 2.6.9 kernel tests I conducted
used Knoppix 3.6 and 3.7.  I do not have the means at this time, nor
the time, to build up a dedicated drive to test 2.6.x kernels.  If
someone knows of or can build a live CD with a 2.6.10 kernel, I will be
happy to do the test.  That said, I looked at the patch for 2.6.10 and
it seems a lot of changes were made to st.c in 2.6.10.  I did not see
any lseek related change in 2.6.10, but I could have missed it.  Given
the test I ran with the change to st.c that Marcelo suggested, what is
the expectation for this issue with 2.6.10?  I am just asking out of
curiosity.  Again, if someone can point me to a live CD, or can easily
make one, with the 2.6.10 kernel, I can test this issue with 2.6.10.
Perhaps there is someone else with a DLT/SCSI tape drive who could
test this tar --verify issue on 2.6.10?

 
  I thought I had an amanda problem, and eventually went to virtual 
  tapes on disk, largely because of this.  However, I have to say it
  is working better than tapes ever did here.  Unforch, that 200GB
  disk is certainly a single point of failure I don't relish
  thinking about...
 
 :)


Regards,

John L. Males
Willowdale, Ontario
Canada
02 March 2005 (15:00 -) 15:46


==


Bmer ... Boom Boom, how are you Boom Boom
Meow, meoaa as Boomer loudly announces
 intent Boomer is coming for attention
Loved to knead arm and lick arm with Boomer's very large
 tongue
Able to catch, or at least hit, almost any object in flight
 within reach of front paws
Boomer 1985 (Born), Adopted 04 September 1991
04 September 1991 - 08 February 2000 18:50

How are you Mr. Sylvester?
... Grunt Grunt ... quick licks of nose
Rolls over for pet and stomach rub when Dad arrives home

Re: Re[03]: Problems with SCSI tape rewind / verify on 2.4.29

2005-03-02 Thread John L. Males
Marcelo,

Sorry gents, it seems the LKML used to handle the RE numbering long
ago when I last mailed the LKML, but not now, so I am resending this
eMail to ensure it goes back to the original thread and all the eMail
discussion stays in one thread.

My apologies if this caused confusion on the LKML.


Regards,

John L. Males
Willowdale, Ontario
Canada
02 March 2005 (16:06 -) 16:15



Re: Problems with SCSI tape rewind / verify on 2.4.29

2005-03-02 Thread John L. Males
Sorry gents,

Let me correct this one more time.


Regards,

John L. Males
Willowdale, Ontario
Canada
02 March 2005 16:26



Re: Problems with SCSI tape rewind / verify on 2.4.29

2005-03-02 Thread John L. Males
Marcelo,

My couple cents worth:

 On Wed, Mar 02, 2005 at 11:17:19PM +0200, Kai Makisara wrote:
  On Wed, 2 Mar 2005, Marcelo Tosatti wrote:
  
   On Wed, Mar 02, 2005 at 11:15:42AM -, Mark Yeatman wrote:
Hi

Never had to log a bug before, hope this is correctly done.

Thanks

Mark

Detail

[1.] One line summary of the problem:
SCSI tape drive is refusing to rewind after backup to allow
verify and causing illegal seek error

[2.] Full description of the problem/report:
On backup the tape drive is reporting the following error and
failing its backups.

tar: /dev/st0: Warning: Cannot seek: Illegal seek

I have traced this back to failing at an upgrade of the kernel
to 2.4.29 on Feb 8th. The backups have not worked since.
Replacement drives and cables have been tried to no avail. I
noticed in the changelog that a patch by Solar Designer to
the SCSI tape return code had been made.
  
  BTW, this fix by Solar Designer introduces a bug to 2.4.29: a
  tape driver is supposed to return ENOMEM in the case that was
  changed to return EIO ;-(
 
 Reverted.
 
   v2.6 also contains the same problem BTW.
   
   Try this:
   
   --- a/drivers/scsi/st.c.orig  2005-03-02 09:02:13.637158144 -0300
   +++ b/drivers/scsi/st.c   2005-03-02 09:02:20.208159200 -0300
   @@ -3778,7 +3778,6 @@
     read:   st_read,
     write:  st_write,
     ioctl:  st_ioctl,
   - llseek: no_llseek,
     open:   st_open,
     flush:  st_flush,
     release:st_release,
  
  This change covers up the problem. The real bug is in tar. The
  following code from tar is supposed to reposition the tape to
  the beginning of the file just written:
  
  #ifdef MTIOCTOP
    {
      struct mtop operation;
      int status;
  
      operation.mt_op = MTBSF;
      operation.mt_count = 1;
      if (status = rmtioctl (archive, MTIOCTOP, (char *) &operation),
          status < 0)
        {
          if (errno != EIO
              || (status = rmtioctl (archive, MTIOCTOP, (char *) &operation),
                  status < 0))
            {
  #endif
              if (rmtlseek (archive, (off_t) 0, SEEK_SET) != 0)
                {
                  /* Lseek failed.  Try a different method.  */
                  seek_warn (archive_name_array[0]);
                  return;
                }
  #ifdef MTIOCTOP
            }
        }
    }
  #endif
  
  
  Here is output from strace showing what happens with 'tar -c -W'
  applied at the beginning of the tape (this is using kernel
  2.6.11-rc4 but the same probably happens with 2.4.29):
  ...
  ioctl(3, MGSL_IOCGPARAMS or MTIOCTOP or SNDCTL_MIDI_MPUMODE, 0x7fffecd0) = -1 EIO (Input/output error)
  ioctl(3, MGSL_IOCGPARAMS or MTIOCTOP or SNDCTL_MIDI_MPUMODE, 0x7fffecd0) = -1 EIO (Input/output error)
  lseek(3, 0, SEEK_SET)   = -1 ESPIPE (Illegal seek)
  
  So, both tape positioning commands fail and the code falls back to
  lseek. Earlier it has returned success even though it has not done
  anything (this was on purpose because it is the way some other
  Unices behave and with reason). In that case this tar succeeded
  but it was pure luck. The first BSF did position the tape
  correctly although it did fail.
  
  The 2.6 st driver does contain this near the beginning of
  st_open():
  
  nonseekable_open(inode, filp);
  
  This probably makes lseek fail. This code has been in st.c since
  2.6.8.
 
 Thanks for the cluebat Kai, is this problem fixed in newer versions
 of tar? 

My testing over the last week or so has been with the latest tar,
tar-1.15.1-2; tar 1.14 and 1.13 had the same lseek --verify issues.

 
 I suspect v2.4 should work with older versions of tar, so we should
 keep lseek working to make it happy. What is your opinion?


I would agree as far as lseek is concerned.  I feel that if there is
some bad behaviour in tar, it should be reported to the tar folks to be
fixed, so that in the long term things are done correctly and over time
the kernel workarounds can be deprecated.
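
For readers following along, the order of attempts in the tar fragment
quoted above can be sketched as a small standalone C function.  This is
only an illustration under stated assumptions: it calls ioctl() and
lseek() directly instead of tar's rmtioctl/rmtlseek wrappers, and the
name reposition_to_file_start is hypothetical, not something taken from
tar or the kernel.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <sys/mtio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from tar): try to get back to the start of
   the file just written, using the same order of attempts as the quoted
   tar code.  Returns 0 on success, -1 with errno set on failure. */
int reposition_to_file_start(int fd)
{
    struct mtop op;

    op.mt_op = MTBSF;    /* backward space over one filemark */
    op.mt_count = 1;

    /* First attempt: MTBSF via the tape ioctl. */
    if (ioctl(fd, MTIOCTOP, &op) == 0)
        return 0;

    /* The same ioctl is retried once, but only if the error was EIO. */
    if (errno == EIO && ioctl(fd, MTIOCTOP, &op) == 0)
        return 0;

    /* Last resort: a plain lseek to offset 0.  On a descriptor that
       does not support seeking this fails with ESPIPE. */
    if (lseek(fd, (off_t) 0, SEEK_SET) == (off_t) 0)
        return 0;

    return -1;
}
```

On a descriptor where the ioctl fails and lseek returns ESPIPE (the
case shown in the strace output above, or any pipe), the function
returns -1, which corresponds to the point where tar prints the
"Cannot seek: Illegal seek" warning.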


Regards

John L. Males
Willowdale, Ontario
Canada
02 March 2005 (17:04 -) 17:25



Re: Problems with SCSI tape rewind / verify on 2.4.29

2005-03-02 Thread John L. Males
Andrew/Kai,

 List:   linux-kernel
 Subject:Re: Problems with SCSI tape rewind / verify on 2.4.29
 From:   Andrew Morton akpm () osdl ! org
 Date:   2005-03-02 22:17:11
 Message-ID: 20050302141711.00ec7147.akpm () osdl ! org
 
 Kai Makisara [EMAIL PROTECTED] wrote:
 
  If seek with tape is changed back to returning success, this would
  enable correct tar --verify at the beginning of the tape. However,
  I am not sure what happens if we are not at the beginning. I will
  investigate this and suggest a long term fix to the tar people (a
  fix that should be compatible with all Unix tape semantics I know)
  and also suggest possible fixes to st (this may include automatic
  writing of a filemark when BSF is used after writes).

Kai,

I have a second problem that is perhaps another case of a combined
kernel and tar effect.  I have not had time to test with the 2.6.7 and
2.6.9 Knoppix based kernels to see if they have the same problem as
= 2.4.26 has.  Can you hold out about 3-4 days for me to do the test
and report the issue to Marcelo to screen first, Kai?  I have a feeling
that what I experienced in testing the change to st.c Marcelo
suggested, which caused me to try 2.4.26 again and fail on this new
issue, has some bearing on the tape positioning you want to check out.

 
 Yes, please let's get a tar fix in the pipeline.
 
 GNU tar must run on a lot of operating systems.  It's odd.
 
  If you think want to make st return success for seeks even if
  nothing happens (as it did earlier), I don't have anything against
  that. It would 

I think it is important that if an error is encountered, a
non-successful return (code) is returned.  If a request requires no
action because the device is already at the place/state/position being
requested, it is reasonable to return a successful return (code).

  solve the practical problem several people have reported recently.
  (My recommendation for the people seeing this problem is to do
  verification separately with 'tar -d'.)

As an aside, I have tried the tar -d option as well and it worked, but
is my understanding correct that --verify does a data readback compare
of the files in the tar, whereas the tar -d option only compares the
file names in the tar against the directory?  To me that means a big
difference in confidence that the tar backup is OK, as I look to a
readback verify to increase confidence of backup success.

 
 Yes, I think we need to grit our teeth and do this.  I'll stick a
 comment in there.


Regards,

John L. Males
Willowdale, Ontario
Canada
02 March 2005 (17:45 -) 18:51

==


Bmer ... Boom Boom, how are you Boom Boom
Meow, meoaa as Boomer loudly announces
 intent Boomer is coming for attention
Loved to knead arm and lick arm with Boomer's very large
 tongue
Able to catch, or at least hit, almost any object in flight
 within reach of front paws
Boomer 1985 (Born), Adopted 04 September 1991
04 September 1991 - 08 February 2000 18:50

How are you Mr. Sylvester?
... Grunt Grunt ... quick licks of nose
Rolls over for pet and stomach rub when Dad arrives home
 and grunting
Runs back and forth from study, tilts head as glowing green
 eyes stare for attention please, grunts and meows,
 repeats run, tilt head and stare few times for good
 measure, grunts and meows
Lays on floor just outside study to guard Dad
Loved to groom Miss Mahogany, and let Mahogany cuddle beside
Sylvester 1989 (estimated Born)
Found in building mail area noon hour 09 February 1992
09 February 1992 - 19 January 2003 23:25

Hello Miss Chicago 'White Sox', how are you 'Chico'?
Grunt (thank you) ... as put out food for Chicago
ME So loud the world stops
A very determined Miss White Sox
AKA Chico ... Cheryl Crawford used as nickname
Loved to chase kibble slid down hall floor,
 bat about and then eat
Loved to hook paw in dish to toss out a single kibble
 at time, dart at as moved, then eat ... Crunches
Chicago White Sox, Chico August 1989 (born),
 adopted 04 February 1991
05 October 2004 06:52 Quite Grunts 
  as lay Chicago on bed for last time
04 February 1991 - 05 October 2004 07:32





Linux Kernel 2.2.19 Available Memory Bug

2001-07-20 Thread John L. Males

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hello,

Please note I am not on the Kernel Mailing List so do try to copy me
in with any reply, questions, clarifications, confirmations, et al on
this bug report.

Before I forget I am using SuSE 6.4 with most of the updates from the
base applied, most meaning I generally lag a bit behind at times.  I
am NOT using the SuSE kernel due to bugs introduced into the SuSE
kernel with the SuSE patches/enhancements.  I am using the Linus
2.2.19 Kernel with the OpenWall patches.  System is AMD K6-2 500
based system.  The version of gcc was 2.95.2 to compile the kernel.

The bug I am reporting is that when one sets the amount of memory,
e.g. 128M or 256M, at the time of booting the 2.2.19 kernel, the "Total
Memory" as reported by KDE, "free", etc. is short by a significant
amount.  To be more specific, I detail the results of "free" below
against the "mem" value passed to the kernel.  Please note that for
the purposes of this test I always had 256MB of RAM (2x128MB)
installed in my system.  The BIOS reports total system memory as
262144K.

"mem=256m"
***

KDE reports 251.09 MB Total System memory, or 263290880 bytes.

"free -m" indicates "Total Memory" as 251
"free -k" indicates "Total Memory" as 257120
"free -b" indicates "Total Memory" as 263290880

The exact same values as noted above are indicated for "mem=262144k",
and "mem=268435456" (256 x 1024 x 1024).

"mem=128m"
***

"free -m" indicates "Total Memory" as 124
"free -k" indicates "Total Memory" as 127344
"free -b" indicates "Total Memory" as 130400256

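The size of the gap can be worked out directly from the figures above.
The sketch below is a minimal illustration in C; the function names
shortfall_kib and kib_to_bytes are made up for this example, and it
assumes that "m" and "k" in the mem= values mean multiples of 1024.

```c
/* Shortfall in KiB between the amount requested with mem= and the
   total that "free -k" reports.  Illustrative helper only. */
long shortfall_kib(long requested_kib, long reported_kib)
{
    return requested_kib - reported_kib;
}

/* Convert a KiB figure to bytes, to cross-check the byte totals
   against the KiB totals. */
long long kib_to_bytes(long kib)
{
    return (long long) kib * 1024;
}
```

With the figures above, shortfall_kib(256 * 1024, 257120) is 5024 KiB
(about 4.9 MB) for mem=256m, and shortfall_kib(128 * 1024, 127344) is
3728 KiB (about 3.6 MB) for mem=128m.  kib_to_bytes(257120) is
263290880, which matches the byte total reported.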

Regards,

John L. Males
Software I.Q. Consulting
Toronto, Ontario
Canada
20 July 2001 03:47
mailto:[EMAIL PROTECTED]
mailto:[EMAIL PROTECTED]

-BEGIN PGP SIGNATURE-
Version: PGPfreeware 6.5.8 for non-commercial use 
<http://www.pgp.com>

iQA/AwUBO1fwKPLzhJbmoDZ+EQKQowCfcqeGPdpduaFpTQO1P9XaOlJccHEAn20p
v0V59vV7rrFEvMQCLwzXyO2V
=Ezn3
-END PGP SIGNATURE-

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


