Kernel Failure - 3.4.24 Similar USB MO To 3.4.89 Kernel Failure
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,

Please CC me on replies, as I am not subscribed to the LKML.

As the prior round of discussion about this ongoing USB kernel problem was with Sebastian, I have CC'ed Sebastian on this posting as well. Again, this is because the Linux kernel bug-reporting information suggests CCing someone who might be able to assist with the area of concern. My hope is that this will help determine which kernel developer needs to look at these kernel failures and the crash/oops if need be.

I have a very busy and unpredictable schedule, so I would ask for patience if a reply from me is needed.

For the last few years I have had about half a dozen kernel failures that all appear to be related to USB devices being plugged in. The last occurrence before today's, a few months ago, actually caused a kernel crash/oops on the console, and the only option was to power the machine off and back on. I took a high-quality DSLR image of the screen, but important information had clearly scrolled off because the screen was not large enough to hold it. I also searched high and low on my tablet for a few days to find out how I might capture the information that scrolled off the screen, ideally in a form that is easy for the kernel developers to work with. I have looked many times since powering the machine back up after that event and, as then, can only find references to connecting a second machine over serial to the machine that had the kernel crash/oops and using a debugger. I do not have the kernel experience at this point to know how to do this, and my reading suggested that one or more kernel options need to be enabled for this serial debugging approach to work.

On that note, if anyone can advise me whether there is a place where a kernel crash/oops is stored, from which one can collect it and send it to the kernel developers, I would be most appreciative. I have made and still make efforts to find this information. It is possible I am not using the correct search terms or do not know where to look.

At about 14:49 EDT my system experienced yet another Linux kernel failure. Again it was related to inserting a basic USB drive: not an MP3 player, just a plain data USB drive. This followed my removing a different USB drive after issuing a pumount command that returned successfully.

I have attached a copy of the kernel failure details. If there is a desire to see the DSLR screen image of the prior kernel crash/oops, please let me know.

Please be aware that I do not use any drivers other than those in the stock Linux kernel. I do not need any unique drivers for my machine or the devices I use with my laptop. Also be aware that all of the 3.x Linux kernels I have used are from kernel.org. I compile them myself using the same configuration file, plus any config options I set for items that are new in the next 3.4.x kernel version I compile. This means there is no reason for my kernel ever to be tainted. If my Linux 3.4.x kernel is listed as tainted, it is the stock Linux kernel that has so decided for some reason.

Regards,

John L. Males
Toronto, Ontario
Canada
16 May 2014 17:30 -0400 EDT

2014-05-16 16:56:58.344920846-0400-EDT
Time: 1400273818
16 May 16:56:58 ntpdate[14149]: ntpdate 4.2.6p2@1.2194-o Sun Oct 17 13:35:14 UTC 2010 (1)
16 May 16:57:12 ntpdate[14154]: step time server 208.80.96.70 offset 0.003026 sec
Linux 3.4.89-kernel.org-jlm-010-amd64 #1 SMP PREEMPT Wed May 7 22:33:10 EDT 2014
Modified Debian GNU/Linux 6.0.3 (squeeze)
(Alternative to Debian determined, work in progress)

cat /proc/cpuinfo (Selected):
model name : Intel(R) Core(TM)2 CPU T5600 @ 1.83GHz

vmstat -s:
      3452464 K total memory
      3381088 K used memory
      2608984 K active memory
       570068 K inactive memory
        71376 K free memory
         2796 K buffer memory
       106480 K swap cache
      8225244 K total swap
      1875240 K used swap
      6350004 K free swap
     36725845 non-nice user cpu ticks
       692898 nice user cpu ticks
      4757452 system cpu ticks
     78815904 idle cpu ticks
      2909319 IO-wait cpu ticks
         5590 IRQ cpu ticks
      1678486 softirq cpu ticks
            0 stolen cpu ticks
     81758774 pages paged in
     66779328 pages paged out
      6643777 pages swapped in
      5417469 pages swapped out
    431124356 interrupts
    567863734 CPU context switches
   1399647013 boot time
       175501 forks

/proc/vmstat (Selected):
pgpgin 81758774
pgpgout 66779328
pswpin 6643777
pswpout 5417469
pgfree 776294670
pgfault 546643863
pgmajfault 2018217

/proc/meminfo (Selected):
Mlocked: 6604 kB
VmallocTotal: 34359738367 kB
VmallocChunk: 34359322080 kB
HugePages_Total: 0

vmstat --partition /dev
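As a reading aid for the figures above, here is a minimal Python sketch that checks the quoted "vmstat -s" numbers for internal consistency and derives two summary values. The numbers are copied from the report; the function name `summarize` and the dictionary layout are illustrative choices, not part of the original post.

```python
# Figures copied from the "vmstat -s" output in the report (memory values in K).
VMSTAT = {
    "total memory": 3452464,
    "used memory": 3381088,
    "free memory": 71376,
    "total swap": 8225244,
    "used swap": 1875240,
    "free swap": 6350004,
}
BOOT_TIME = 1399647013    # "boot time" line of vmstat -s (Unix seconds)
REPORT_TIME = 1400273818  # "Time:" value in the report header (Unix seconds)


def summarize(v, boot, now):
    """Return (mem_ok, swap_ok, swap_used_percent, days_since_boot)."""
    # used + free should add up to the total for both memory and swap.
    mem_ok = v["used memory"] + v["free memory"] == v["total memory"]
    swap_ok = v["used swap"] + v["free swap"] == v["total swap"]
    swap_pct = 100.0 * v["used swap"] / v["total swap"]
    days = (now - boot) / 86400.0
    return mem_ok, swap_ok, swap_pct, days


mem_ok, swap_ok, swap_pct, days = summarize(VMSTAT, BOOT_TIME, REPORT_TIME)
print("memory figures consistent:", mem_ok)
print("swap figures consistent:", swap_ok)
print("swap in use: %.1f%%" % swap_pct)
print("days since recorded boot time: %.2f" % days)
```

On these figures both sums balance, a little under a quarter of swap is in use, and the report was generated roughly a week after the recorded boot time.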
Linux Kernel Commit 025cee7f8fef02af09b03c8e1cd9843cb32adf9b
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Dave,

With respect to Linux Kernel Commit 025cee7f8fef02af09b03c8e1cd9843cb32adf9b, Change Log:

<http://www.kernel.org/pub/linux/kernel/v3.0/ChangeLog-3.4.34>

the comment reads:

"It has probably been the cause of a number of subtle bugs over the years, although the conditions to excite them would have been hard to trigger."

Would the Linux kernel under "unique" Virtual Memory Subsystem stress excite some of these "subtle bugs" in kernels prior to the change set for Commit 025cee7f8fef02af09b03c8e1cd9843cb32adf9b?

I am not asking you to identify the subtle bugs. It would obviously be difficult to determine such bugs from reported and unreported cases. I am only asking for your sense, or first-hand knowledge, of whether a Linux kernel under "unique" stress, including Virtual Memory Subsystem stress, would be a key element in exciting them.

I am one of those who often seems to put the kernel of any OS under stress due to my very heavy power-user use of operating systems, hence my question.

Regards,

John L. Males
Toronto, Ontario
Canada
28 February 2013 17:11
<mailto:jlma...@gmail.com>

==

2013-02-28 16:57:24.110757989-0500-EST
28 Feb 16:57:24 ntpdate[27847]: ntpdate 4.2.6p2@1.2194-o Sun Oct 17 13:35:14 UTC 2010 (1)
28 Feb 16:58:59 ntpdate[27852]: step time server 132.246.11.228 offset -0.000138 sec
Linux 3.4.24-kernel.org-jlm-010-amd64 #1 SMP PREEMPT Sun Dec 23 10:06:41 EST 2012
Modified Debian GNU/Linux 6.0.3 (squeeze)
(Evaluating alternatives to Debian)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)

iEYEARECAAYFAlEv1h8ACgkQ+V/XUtB6aBDSzQCgiu+hskZzz2bMfLG5u+Ao9YzJ
hwIAn1IY5jnq0sJjIe0nxFnA+LKrGtAh
=1KDN
-----END PGP SIGNATURE-----
Re[04]: Kernel Failure - 3.4.24
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Sebastian,

Message replied to:

Date: Tue, 29 Jan 2013 22:32:53 +0100
From: Sebastian Andrzej Siewior <bige...@linutronix.de>
To: jlma...@gmail.com
Cc: linux-kernel@vger.kernel.org
Subject: Re: Kernel Failure - 3.4.24

> On 01/28/2013 08:57 PM, John L. Males wrote:
> > I was not suggesting you are responsible for the bug at
> > all. On
>
> Okay then :)
>
> > I have no custom patches to the kernel.
>
> okay.
>
> > I looked at the RedHat bug 468794. The bug seems to
> > indicate it was never fixed. The bug was reported against
> > 2.6.27.4-47.rc3.fc10.i686 #1 on 2008-10-27 21:34:04 EDT and
> > was closed 2009-12-18 01:40:33 EST. The differences are a
> > bug of at least 5 years ago and a 2.6 kernel verses 5 years
> > later and current at time stable kernel 3.4.24 from
> > kernel.org with no patches I applied when this kernel
> > failure I encountered occurred. If this is the same bug
> > then there is a bug that may have been about for a while or
> > perhaps a regression. The fact is the RedHat bug 468794
> > was never fixed.
>
> Lets see. According to the backtrace it seems that the kernel
> was not able to write the buffer back to disk. The RH bug
> says that someone unplugged the device without an unmount of
> the disk.

Yes, I read in the RedHat bug log that someone unplugged the device without unmounting it.

> My question are:
> - what were you doing by the time this happened?

I plugged the USB device into my laptop, then removed it. There was no user activity on the device. If there was some non-user activity that required the kernel to write a buffer to the USB flash drive, it was not the result of anything I did to the drive. Based on your findings in the backtrace, the kernel was not able to write a buffer to the USB device at the time. I would be concerned that the kernel thought there was a buffer to write when the user, which was me, performed no activity on the USB device. The person who owns the USB device knows next to nothing about computers, let alone Windows or Linux, so I would be the only one performing any actions on the device.

> - can you reproduce it (reliably)?

No. I repeated exactly what I did when the kernel failure happened and sadly could not recreate the issue. I know how important that question is and tried a few times to cause the problem. I was hoping the kernel failure information would indicate the cause of the failure.

I often place my system in hibernate, so that it runs a month or a bit more before I reboot, either to restart the kernel or to boot a newly compiled one. I know that for a while, in the early 3.2.x releases, doing so caused the kernel some issues and the system would need to be rebooted or would just reboot on its own. There are a number of variables to this. I do not know if this USB failure is an artifact of my often necessary practice of placing my system in hibernate almost daily, sometimes a few times during the day.

As an FYI, I had a full kernel oops a few 3.2.x versions ago. It was my first one in years. I was hoping there would be a file containing the information that was displayed on the screen. My research after the kernel oops suggests one has to write down the information on the screen, which I did not do as I did not think that was necessary anymore. The reason I mention this is that the kernel oops involved a USB device as well. The difference was that it was a USB Wireless BGN device that I have used many times over the last 12 months with a number of 3.2.x kernels with no kernel oops/failure, just odd functional issues that seem to be resolved in later kernel versions. The kernel oops with this Wireless BGN device occurred only once, with that exact older 3.2.x kernel version, and I have no clue why. I know of no information about that kernel oops that might help with this kernel failure. I did not know I needed to write down the screen from the oops. I therefore cannot provide the kernel oops information that might share some common findings with this kernel failure. I suspect there may be nothing in common, but without the kernel oops information we will not know for certain.

The USB device was an MP3 player that acts like a USB flash drive when it is plugged into a computer. This means one can copy files to and from it, and rename or delete files, using the command line or any file manager.

> - Is this *new* meaning is there a kernel where did not
> happen?

I am not sure where the "new" reference you are referring to is from. That said, the only time this person's MP3 player/USB flash was used was with the kernel.org 3.2.24 kernel I noted. The only other USB problem I had was once with a USB Wireless BGN device that has seen a lot of activity on my system and had one oops on a 3.2.x kernel prior to 3.2.24, and again only once on that kernel version.

Sebastian I know you know
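For a non-fatal WARNING like the one discussed in this thread, the report is written to the kernel log, so it can usually be recovered afterwards from saved `dmesg` output or a syslog file instead of being copied from the screen. The following is a minimal Python sketch, assuming the log text has already been saved to a file. It pulls out every block delimited by the "cut here" and "end trace" markers. The function name `extract_reports` is hypothetical, and the sketch does not help when the machine is completely dead, where the log never reaches disk.

```python
import re

# Markers that delimit a kernel warning/oops report in the log. The patterns
# are deliberately loose so they also match logs whose dashes were stripped.
CUT_HERE = re.compile(r"\[ cut here \]")
END_TRACE = re.compile(r"\[ end trace [0-9a-f]+")


def extract_reports(log_text):
    """Return a list of report blocks found in saved kernel log text."""
    reports = []
    current = None
    for line in log_text.splitlines():
        if CUT_HERE.search(line):
            current = [line]              # start a new block
        elif current is not None:
            current.append(line)
            if END_TRACE.search(line):
                reports.append("\n".join(current))
                current = None            # block complete
    if current is not None:
        # Log ended before the end marker: keep the partial block.
        reports.append("\n".join(current))
    return reports
```

A typical use would be `extract_reports(open("saved-dmesg.txt").read())`, where the file name is only an example, with the returned blocks pasted into a bug report one log entry per line.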
Re[02]: Kernel Failure - 3.4.24
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Sebastian,

Message replied to:

Date: Sun, 27 Jan 2013 17:14:14 +0100
From: Sebastian Andrzej Siewior <bige...@linutronix.de>
To: jlma...@gmail.com
Cc: linux-kernel@vger.kernel.org
Subject: Re: Kernel Failure - 3.4.24

> On 01/17/2013 12:42 AM, John L. Males wrote:
> > Hello,
>
> Hi,
>
> > I copied Sebastian in on the post my review of Changelogs
> > suggests Sebastian is the one who will want to know about
> > this kernel failure or will know who should be.
>
> I did what? I reviewed patches since they went in this
> problem occurs?

I was not suggesting you are responsible for the bug at all. Various links I looked at on reporting kernel bugs suggest it is best to copy in a kernel person when posting to the LKML, otherwise the posting will likely not receive a response. When I looked through recent ChangeLogs your name was often related to USB issues. As I believed this kernel failure was USB related, I copied you in. I was expecting that if, upon reviewing the kernel failure, you knew who was best placed to look at it, you could direct the issue to that person. I would have been very grateful for that. I am not a kernel developer and therefore do not know how to read these failures, let alone know what code failed. I therefore also do not know who this kernel failure should be directed to. I was hoping your greater Linux kernel and LKML knowledge would help ensure this kernel failure was directed to the correct kernel developer.

> > I am not on the LKML so it would be appreciated if I was CC
> > on any related replies to this kernel failure which
> > appeared to occur when a USB MP3 player was inserted or
> > removed from the HP NC6400 laptop.
>
> Next time you post logs try to format them like this (or
> people might ignore it because it is too hard to read):

Thank you for the formatting tip.

> > The kernel failure trace information was:
> >
> > Kernel failure message 1:
>
> [327619.690505] ------------[ cut here ]------------
> [327619.690518] WARNING: at fs/buffer.c:1106 mark_buffer_dirty+0x8b/0xb0()
> [327619.690523] Hardware name: HP Compaq nc6400 (RM100AW#ABA)
> [327619.690526] Modules linked in: nls_utf8 nls_cp437 vfat fat
>   usb_storage option usb_wwan usbserial snd_hrtimer ip6table_filter
>   ip6_tables iptable_filter ip_tables kvm_intel kvm ebtable_nat
>   ebtables x_tables cpufreq_userspace cpufreq_stats cpufreq_powersave
>   cpufreq_conservative bridge stp bnep rfcomm bluetooth crc16 ppdev lp
>   binfmt_misc i915 drm_kms_helper drm i2c_algo_bit i2c_core uinput
>   fuse loop snd_hda_codec_si3054 snd_hda_codec_analog snd_hda_intel
>   snd_hda_codec tpm_infineon arc4 snd_hwdep snd_pcm_oss snd_mixer_oss
>   snd_pcm iwl3945 snd_seq_dummy iwlegacy snd_seq_oss usbhid
>   snd_seq_midi snd_rawmidi coretemp hid snd_seq_midi_event snd_seq
>   mac80211 pcmcia snd_timer irda snd_seq_device microcode snd cfg80211
>   yenta_socket psmouse joydev parport_pc tifm_7xx1 tpm_tis soundcore
>   pcmcia_rsrc tifm_core pcspkr evdev snd_page_alloc pcmcia_core
>   parport tpm crc_ccitt hp_wmi hp_accel lis3lv02d sparse_keymap
>   serio_raw acpi_cpufreq tpm_bios rng_core rfkill mperf input_polldev
>   wmi battery ac container power_supply processor video button ext2
>   mbcache dm_mod btrfs zlib_deflate crc32c libcrc32c sg sr_mod cdrom
>   sd_mod crc_t10dif ata_generic pata_acpi uhci_hcd ata_piix libata
>   ehci_hcd sdhci_pci scsi_mod ide_pci_generic tg3 ide_core sdhci
>   mmc_core libphy usbcore thermal usb_common fan thermal_sys
>   [last unloaded: scsi_wait_scan]
> [327619.690737] Pid: 31574, comm: sync Tainted: G W 3.4.24-kernel.org-jlm-010-amd64 #1
>
> This line looks like you have custom patches on your tree.

I have no custom patches to the kernel. For the last several kernels, over about 12 months, I have downloaded the kernel source directly from kernel.org. The first time I downloaded a kernel from kernel.org I used the kernel configuration GUI and configured the kernel from scratch. Thereafter, as was the case with this kernel, I run "make oldconfig" on the last kernel configuration I used and answer the new items, added since the last kernel I compiled, that "make oldconfig" identifies.

> [327619.690741] Call Trace:
> [327619.690751] [8105177f] warn_slowpath_common+0x7f/0xc0
> [327619.690757] [810517da] warn_slowpath_null+0x1a/0x20
> [327619.690762] [811b67fb] mark_buffer_dirty+0x8b/0xb0
> [327619.690774] [a028b734] ext2_sync_super+0x94/0x100 [ext2]
> [327619.690784] [a028b809] ext2_sync_fs+0x69/0x80 [ext2]
> [327619.690790] [811b4480] ? __sync_filesystem+0x90/0x90
> [327619.690795] [811b4453] __sync_filesystem+0x63/0x90
> [327619.690801] [811b449f] sync_one_sb+0x1f/0x30
> [327619.690807] [81188c77] iterate_supers+0xb7/0xf0
> [327619.690812] [811b44fa] sys_sync+0x4a/0x70
> [327619.690819] [814c51a9] system_call_fastpath+0x16/0x1b
> [327619.690942] ---[ end trace 7e4761e5ee97ad0c
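The "Tainted: G W" field in the Pid line quoted in this message is a set of single-letter flags. "G" means no proprietary module is loaded, and "W" is set by the kernel itself whenever it has issued a warning, so this WARNING alone is enough to produce it. A minimal Python sketch for decoding the field follows. The flag table is deliberately partial, and the function names are illustrative, not an existing tool.

```python
# Partial table of taint letters; anything not listed is reported as unknown
# rather than guessed at.
TAINT_FLAGS = {
    "P": "proprietary module loaded",
    "G": "no proprietary module loaded",
    "F": "module was force-loaded",
    "R": "module was force-unloaded",
    "M": "machine check exception occurred",
    "B": "bad page state was detected",
    "D": "kernel has died (oops or BUG)",
    "W": "kernel issued a warning earlier",
    "C": "staging driver loaded",
    "O": "out-of-tree module loaded",
}


def flags_from_line(line):
    """Collect the taint letters that follow 'Tainted:' in a log line."""
    _, found, rest = line.partition("Tainted:")
    flags = []
    if found:
        for token in rest.split():
            # Flags are upper-case letters; the kernel version string
            # that follows them ends the scan.
            if not (token.isalpha() and token.isupper()):
                break
            flags.extend(token)
    return flags


def decode_taint(flags):
    """Map each flag letter to a short description."""
    return [(f, TAINT_FLAGS.get(f, "unknown flag")) for f in flags]


pid_line = ("[327619.690737] Pid: 31574, comm: sync Tainted: G W "
            "3.4.24-kernel.org-jlm-010-amd64 #1")
for flag, meaning in decode_taint(flags_from_line(pid_line)):
    print(flag, "-", meaning)
```

Applied to the Pid line from this report, the sketch lists only "G" and "W", neither of which indicates out-of-tree or proprietary code.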
Kernel Failure - 3.4.24
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Hello, I copied Sebastian in on the post my review of Changelogs suggests Sebastian is the one who will want to know about this kernel failure or will know who should be. I am not on the LKML so it would be appreciated if I was CC on any related replies to this kernel failure which appeared to occur when a USB MP3 player was inserted or removed from the HP NC6400 laptop. The kernel failure trace information was: Kernel failure message 1: [327619.690505] [ cut here ] [327619.690518] WARNING: at fs/buffer.c:1106 mark_buffer_dirty +0x8b/0xb0() [327619.690523] Hardware name: HP Compaq nc6400 (RM100AW#ABA) [327619.690526] Modules linked in: nls_utf8 nls_cp437 vfat fat usb_storage option usb_wwan usbserial snd_hrtimer ip6table_filter ip6_tables iptable_filter ip_tables kvm_intel kvm ebtable_nat ebtables x_tables cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_conservative bridge stp bnep rfcomm bluetooth crc16 ppdev lp binfmt_misc i915 drm_kms_helper drm i2c_algo_bit i2c_core uinput fuse loop snd_hda_codec_si3054 snd_hda_codec_analog snd_hda_intel snd_hda_codec tpm_infineon arc4 snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm iwl3945 snd_seq_dummy iwlegacy snd_seq_oss usbhid snd_seq_midi snd_rawmidi coretemp hid snd_seq_midi_event snd_seq mac80211 pcmcia snd_timer irda snd_seq_device microcode snd cfg80211 yenta_socket psmouse joydev parport_pc tifm_7xx1 tpm_tis soundcore pcmcia_rsrc tifm_core pcspkr evdev snd_page_alloc pcmcia_core parport tpm crc_ccitt hp_wmi hp_accel lis3lv02d sparse_keymap serio_raw acpi_cpufreq tpm_bios rng_core rfkill mperf input_polldev wmi battery ac container power_supply processor video button ext2 mbcache dm_mod btrfs zlib_deflate crc32c libcrc32c sg sr_mod cdrom sd_mod crc_t10dif ata_generic pata_acpi uhci_hcd ata_piix libata ehci_hcd sdhci_pci scsi_mod ide_pci_generic tg3 ide_core sdhci mmc_core libphy usbcore thermal usb_common fan thermal_sys [last unloaded: scsi_wait_scan] [327619.690737] Pid: 
31574, comm: sync Tainted: G W 3.4.24-kernel.org-jlm-010-amd64 #1
[327619.690741] Call Trace:
[327619.690751] [8105177f] warn_slowpath_common+0x7f/0xc0
[327619.690757] [810517da] warn_slowpath_null+0x1a/0x20
[327619.690762] [811b67fb] mark_buffer_dirty+0x8b/0xb0
[327619.690774] [a028b734] ext2_sync_super+0x94/0x100 [ext2]
[327619.690784] [a028b809] ext2_sync_fs+0x69/0x80 [ext2]
[327619.690790] [811b4480] ? __sync_filesystem+0x90/0x90
[327619.690795] [811b4453] __sync_filesystem+0x63/0x90
[327619.690801] [811b449f] sync_one_sb+0x1f/0x30
[327619.690807] [81188c77] iterate_supers+0xb7/0xf0
[327619.690812] [811b44fa] sys_sync+0x4a/0x70
[327619.690819] [814c51a9] system_call_fastpath+0x16/0x1b
[327619.690942] ---[ end trace 7e4761e5ee97ad0c ]---

If you need additional information please advise.

Regards,

John L. Males
Toronto, Ontario
Canada
16 January 2013 18:42

==
2013-01-16 18:19:55.448036096-0500-EST
16 Jan 18:19:55 ntpdate[17025]: ntpdate 4.2.6p2@1.2194-o Sun Oct 17 13:35:14 UTC 2010 (1)
16 Jan 18:20:09 ntpdate[17030]: step time server 192.75.12.10 offset -3.181109 sec
Linux 3.4.24-kernel.org-jlm-010-amd64 #1 SMP PREEMPT Sun Dec 23 10:06:41 EST 2012
Modified Debian GNU/Linux 6.0.3 (squeeze) (Evaluating alternatives to Debian)
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.10 (GNU/Linux)
iEYEARECAAYFAlD3Ot0ACgkQ+V/XUtB6aBBNzwCdFcVnM+qtUoIpArxVlr8fAN/E
Se0Anj/z1hTzMmktfTHMeuDQHNj6GMh1
=auyh
-END PGP SIGNATURE-
Re: Problems with SCSI tape rewind / verify on 2.4.29
Andrew/Kai,

> List: linux-kernel
> Subject: Re: Problems with SCSI tape rewind / verify on 2.4.29
> From: Andrew Morton
> Date: 2005-03-02 22:17:11
> Message-ID: <20050302141711.00ec7147.akpm () osdl ! org>
>
> Kai Makisara <[EMAIL PROTECTED]> wrote:
> >
> > If seek with tape is changed back to returning success, this would
> > enable correct tar --verify at the beginning of the tape. However,
> > I am not sure what happens if we are not at the beginning. I will
> > investigate this and suggest a long term fix to the tar people (a
> > fix that should be compatible with all Unix tape semantics I know)
> > and also suggest possible fixes to st (this may include automatic
> > writing of a filemark when BSF is used after writes).

Kai,

I have a second problem that may be another case of a combined kernel and tar effect. I have not yet had time to test the Knoppix based 2.6.7 and 2.6.9 kernels to see whether they show the same problem that kernels >= 2.4.26 have. Can you hold out about 3-4 days, Kai, so I can do the test and report the issue to Marcelo to screen first? While testing the change to st.c that Marcelo suggested, I tried 2.4.26 again and it failed on this new issue. I have a feeling this has some bearing on the tape positioning you want to check out.

> Yes, please let's get a tar fix in the pipeline.
>
> GNU tar must run on a lot of operating systems. It's odd.
>
> > If you want to make st return success for seeks even if
> > nothing happens (as it did earlier), I don't have anything against
> > that. It would

I think it is important that a non-successful return code is returned whenever an error is encountered. If the requested action needs no work because the device is already at the place/state/position being requested, it is reasonable to return a successful return code.

> > solve the practical problem several people have reported recently.
> > (My recommendation for the people seeing this problem is to do
> > verification separately with 'tar -d'.)
As an aside, I have tried the tar -d option as well and it worked. Is my understanding correct that --verify does a data readback compare of the files in the tar, whereas the tar -d option only compares the file names in the tar against the directory? To me that means a big difference in confidence that the tar backup is OK, as I look to readback verify to increase confidence of backup success.

> Yes, I think we need to grit our teeth and do this. I'll stick a
> comment in there.

Regards,

John L. Males
Willowdale, Ontario
Canada
02 March 2005 (17:45 -) 18:51

==
"Bmer ... Boom Boom, how are you Boom Boom"
"Meow, meoaa" as Boomer loudly announces intent Boomer is coming for attention
Loved to knead arm and lick arm with Boomer's very large tongue
Able to catch, or at least hit, almost any object in flight within reach of front paws
Boomer 1985 (Born), Adopted 04 September 1991
04 September 1991 - 08 February 2000 18:50

"How are you Mr. Sylvester?" "... Grunt Grunt" ... quick licks of nose
Rolls over for pet and stomach rub when Dad arrives home and grunting
Runs back and forth from study, tilts head as glowing green eyes stare for "attention please", grunts and meows, repeats run, tilt head and stare few times for good measure, grunts and meows
Lays on floor just outside study to guard Dad
Loved to groom Miss Mahogany, and let Mahogany cuddle beside
Sylvester 1989 (estimated Born)
Found in building mail area noon hour 09 February 1992
09 February 1992 - 19 January 2003 23:25

"Hello Miss Chicago 'White Sox', how are you 'Chico'?"
"Grunt" (thank you) ... as put out food for Chicago
"ME" So loud the world stops
A very determined Miss "White Sox" AKA "Chico" ... Cheryl Crawford used as nickname
Loved to chase kibble slid down hall floor, bat about and then eat
Loved to hook paw in dish to toss out a single kibble at time, dart at as moved, then eat ... "Crunches"
Chicago "White Sox", "Chico" August 1989 (born), adopted 04 February 1991
05 October 2004 06:52 Quiet "Grunts" as lay Chicago on bed for last time
04 February 1991 - 05 October 2004 07:32
Re: Problems with SCSI tape rewind / verify on 2.4.29
Marcelo,

My couple cents worth:

> On Wed, Mar 02, 2005 at 11:17:19PM +0200, Kai Makisara wrote:
> > On Wed, 2 Mar 2005, Marcelo Tosatti wrote:
> >
> > > On Wed, Mar 02, 2005 at 11:15:42AM -, Mark Yeatman wrote:
> > > > Hi
> > > >
> > > > Never had to log a bug before, hope this is correctly done.
> > > >
> > > > Thanks
> > > >
> > > > Mark
> > > >
> > > > Detail
> > > >
> > > > [1.] One line summary of the problem:
> > > > SCSI tape drive is refusing to rewind after backup to allow
> > > > verify and causing illegal seek error
> > > >
> > > > [2.] Full description of the problem/report:
> > > > On backup the tape drive is reporting the following error and
> > > > failing its backups.
> > > >
> > > > tar: /dev/st0: Warning: Cannot seek: Illegal seek
> > > >
> > > > I have traced this back to failing at an upgrade of the kernel
> > > > to 2.4.29 on Feb 8th. The backups have not worked since.
> > > > Replacement Drives have been tried and cables to no avail. I
> > > > noticed in the changelog that a patch by Solar Designer to
> > > > the Scsi tape return code had been made.
> >
> > BTW, this "fix" by Solar Designer introduces a bug to 2.4.29: a
> > tape driver is supposed to return ENOMEM in the case that was
> > changed to return EIO ;-(
>
> Reverted.
>
> > > v2.6 also contains the same problem BTW.
> > >
> > > Try this:
> > >
> > > --- a/drivers/scsi/st.c.orig 2005-03-02 09:02:13.637158144 -0300
> > > +++ b/drivers/scsi/st.c 2005-03-02 09:02:20.208159200 -0300
> > > @@ -3778,7 +3778,6 @@
> > >   read: st_read,
> > >   write: st_write,
> > >   ioctl: st_ioctl,
> > > - llseek: no_llseek,
> > >   open: st_open,
> > >   flush: st_flush,
> > >   release: st_release,
> >
> > This change covers up the problem. The real bug is in tar. The
> > following code from tar is supposed to reposition the tape to
> > the beginning of the file just written:
> >
> > #ifdef MTIOCTOP
> >   {
> >     struct mtop operation;
> >     int status;
> >
> >     operation.mt_op = MTBSF;
> >     operation.mt_count = 1;
> >     if (status = rmtioctl (archive, MTIOCTOP, (char *) &operation),
> >         status < 0)
> >       {
> >         if (errno != EIO
> >             || (status = rmtioctl (archive, MTIOCTOP, (char *) &operation),
> >                 status < 0))
> >           {
> > #endif
> >             if (rmtlseek (archive, (off_t) 0, SEEK_SET) != 0)
> >               {
> >                 /* Lseek failed. Try a different method. */
> >                 seek_warn (archive_name_array[0]);
> >                 return;
> >               }
> > #ifdef MTIOCTOP
> >           }
> >       }
> >   }
> > #endif
> >
> > Here is output from strace showing what happens with 'tar -c -W'
> > applied at the beginning of the tape (this is using kernel
> > 2.6.11-rc4 but the same probably happens with 2.4.29):
> > ...
> > ioctl(3, MGSL_IOCGPARAMS or MTIOCTOP or SNDCTL_MIDI_MPUMODE, 0x7fffecd0) = -1 EIO (Input/output error)
> > ioctl(3, MGSL_IOCGPARAMS or MTIOCTOP or SNDCTL_MIDI_MPUMODE, 0x7fffecd0) = -1 EIO (Input/output error)
> > lseek(3, 0, SEEK_SET) = -1 ESPIPE (Illegal seek)
> >
> > So, both tape positioning commands fail and the code falls back to
> > lseek. Earlier it has returned success even though it has not done
> > anything (this was on purpose because it is the way some other
> > Unices behave and with reason). In that case this tar succeeded
> > but it was pure luck. The first BSF did position the tape
> > correctly although it did fail.
> >
> > The 2.6 st driver does contain this near the beginning of
> > st_open():
> >
> >     nonseekable_open(inode, filp);
> >
> > This probably makes lseek fail. This code has been in st.c since
> > 2.6.8.
>
> Thanks for the cluebat Kai, is this problem fixed in newer versions
> of tar?

My testing over the last week or so has been with the latest tar, tar-1.15.1-2; tar 1.14 and 1.13 had the same lseek --verify issues.

> I suspect v2.4 should work with older versions of tar, so we should
> keep "lseek"
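The fallback order Kai describes in the quoted tar code can be sketched in a few lines of C. This is a minimal sketch only, assuming the Linux definitions in <sys/mtio.h>. The name reposition_to_file_start() is hypothetical and is not a function in tar, and the sketch calls ioctl() and lseek() directly where tar goes through its rmtioctl/rmtlseek wrappers.

```c
/* Sketch only: mirrors the fallback order of the tar snippet quoted above.
 * reposition_to_file_start() is a hypothetical helper, not part of tar. */
#include <errno.h>
#include <sys/ioctl.h>
#include <sys/mtio.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 0 if the descriptor was repositioned, -1 (errno set) otherwise. */
int reposition_to_file_start(int fd)
{
    struct mtop op;

    op.mt_op = MTBSF;   /* space backwards over filemarks */
    op.mt_count = 1;    /* ... exactly one of them */

    /* First attempt: ask the tape driver to space backwards. */
    if (ioctl(fd, MTIOCTOP, &op) == 0)
        return 0;

    /* Like the quoted code, retry the same ioctl once, but only after EIO. */
    if (errno == EIO && ioctl(fd, MTIOCTOP, &op) == 0)
        return 0;

    /* Last resort: plain lseek(). On a non-seekable descriptor this
     * fails with ESPIPE, which strerror() renders as "Illegal seek". */
    if (lseek(fd, (off_t)0, SEEK_SET) == (off_t)0)
        return 0;

    return -1;
}
```

On a pipe, or any other descriptor that is neither a tape nor seekable, the MTBSF ioctl fails with an errno other than EIO, the code falls through to lseek(), and lseek() fails with ESPIPE. That is the same final step as in the strace output above, where the tape device was opened non-seekable.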
Re: Problems with SCSI tape rewind / verify on 2.4.29
Sorry gents,

Let me correct this one more time.

Regards,

John L. Males
Willowdale, Ontario
Canada
02 March 2005 16:26

** Reply Separator **

On (Wed) 2005-03-02 16:15:07 -0500
John L. Males wrote in Message-ID: [EMAIL PROTECTED]

To: [EMAIL PROTECTED]
From: John L. Males <[EMAIL PROTECTED]>
Subject: Re: Re[03]: Problems with SCSI tape rewind / verify on 2.4.29
Date: Wed, 2 Mar 2005 16:15:07 -0500
Re: Re[03]: Problems with SCSI tape rewind / verify on 2.4.29
Marcelo,

Sorry gents, it seems the LKML used to handle the RE numbering in the long past when I last mailed to LKML, but not now, so I am resending this eMail to ensure it goes back to the original thread and all the eMail discussion is in one eMail thread.

My apologies if this caused confusion on the LKML.

Regards,

John L. Males
Willowdale, Ontario
Canada
02 March 2005 (16:06 -) 16:15

** Reply Separator **

On (Wed) 2005-03-02 15:46:26 -0500
John L. Males wrote in Message-ID: [EMAIL PROTECTED]

To: Marcelo Tosatti <[EMAIL PROTECTED]>
From: John L. Males <[EMAIL PROTECTED]>
Subject: Re[03]: Problems with SCSI tape rewind / verify on 2.4.29
Date: Wed, 2 Mar 2005 15:46:26 -0500
Re[03]: Problems with SCSI tape rewind / verify on 2.4.29
Hi Marcelo,

** Reply Separator **

On (Wed) 2005-03-02 11:34:41 -0300
Marcelo Tosatti wrote in Message-ID: [EMAIL PROTECTED]

To: Gene Heskett <[EMAIL PROTECTED]>
From: Marcelo Tosatti <[EMAIL PROTECTED]>
Subject: Re: Problems with SCSI tape rewind / verify on 2.4.29
Date: Wed, 2 Mar 2005 11:34:41 -0300

> On Wed, Mar 02, 2005 at 12:08:51PM -0500, Gene Heskett wrote:
> > On Wednesday 02 March 2005 07:03, Marcelo Tosatti wrote:
> > >On Wed, Mar 02, 2005 at 11:15:42AM -, Mark Yeatman wrote:
> > >> Hi
> > >>
> > >> Never had to log a bug before, hope this is correctly done.
> > >>
> > >> Thanks
> > >>
> > >> Mark
> > >>
> > >> Detail
> > >>
> > >> [1.] One line summary of the problem:
> > >> SCSI tape drive is refusing to rewind after backup to allow
> > >> verify and causing illegal seek error

In my experience with this problem, which I am sure is exactly the same issue, the tape does in fact rewind after creating the tar in order to perform the --verify option. The illegal seek error seems to arise after the rewind, or more correctly after the bsf commands that may be being used (I have not confirmed this yet; it is another test I need to do in a few days). For sure the tape is positioned back. I know because I have also done this test with larger directories and have heard the tape take a long time to get back to the start of the tar file to be able to perform the --verify option. That said, I am using a DLT drive. Perhaps with different drivers or tape drives this issue may have variations in theme and behaviour, with the net result being the same error message and/or root cause.

> > >>
> > >> [2.] Full description of the problem/report:
> > >> On backup the tape drive is reporting the following error and
> > >> failing its backups.
> > >>
> > >> tar: /dev/st0: Warning: Cannot seek: Illegal seek
> > >>
> > >> I have traced this back to failing at an upgrade of the kernel
> > >> to 2.4.29 on Feb 8th. The backups have not worked since.
> > >> Replacement Drives have been tried and cables to no avail. I
> > >> noticed in the changelog that a patch by Solar Designer to
> > >> the Scsi tape return code had been made.

The last kernel to work correctly in the 2.4 branch was 2.4.26. Kernel versions 2.4.27, 2.4.28 and 2.4.29 all fail, based on my experience with a DLT SCSI tape.

> > >
> > >v2.6 also contains the same problem BTW.
> > >
> > >Try this:
> > >
> > >--- a/drivers/scsi/st.c.orig 2005-03-02 09:02:13.637158144 -0300
> > >+++ b/drivers/scsi/st.c 2005-03-02 09:02:20.208159200 -0300
> > >@@ -3778,7 +3778,6 @@
> > >  read: st_read,
> > >  write: st_write,
> > >  ioctl: st_ioctl,
> > >- llseek: no_llseek,
> > >  open: st_open,
> > >  flush: st_flush,
> > >  release: st_release,
> >
> > Interesting Marcelo. How long has this been true in 2.6?

In the 2.6 tree the tar --verify works with 2.6.7, but fails with 2.6.9. I am unable to test 2.6.8, but based on research of the code changes of 2.6.8 compared to the changes made in 2.4.27 re llseek, I would expect 2.6.8 to fail as well with my DLT SCSI tape.

> Actually I just checked and it seems v2.6 is not using "no_llseek".
>
> However John L. Males reports the same problem with v2.6 - John,
> care to retest with v2.6.10 ?

My ability to test a 2.6.x kernel is limited to whatever 2.6.x kernel I can find on a live CD. The 2.6.7 and 2.6.9 kernel tests I conducted used Knoppix 3.6 and 3.7. I do not have the means at this time, nor the time, to build up a dedicated drive to test 2.6.x kernels. If someone knows of or can build a 2.6.10 kernel on a live CD I will be happy to do the test. That said, I looked at the patch for 2.6.10 and it seems a lot of changes were made to st.c in 2.6.10. I did not see, but could have missed in looking, any lseek related change in 2.6.10. Given the test I ran with the st.c change Marcelo suggested, what is the expected behaviour of this issue with 2.6.10? I am just asking out of curiosity.

Again, if someone can tell me of a live CD, or can easily make a live CD with the 2.6.10 kernel, I can test this issue with 2.6.10. Perhaps there is someone else with a DLT/SCSI tape drive who could test this tar --verify issue on 2.6.10?

> > I thought I had an amanda problem, and eventually went to virtual
> > tapes on disk, largely because of this. However, I have to say it
> > is working better than tapes ever did here. Unforch, that 200GB
> > disk is certainly a single point of failure I don't relish
> > thinking about... :)

Regards,

John L. Males
Willowdale, Ontario
Canada
02 March 2005 (15:00 -) 15:46
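The observation above that the tape really is positioned back could in principle be checked from software instead of by listening to the drive. A minimal sketch, assuming the Linux MTIOCGET ioctl and struct mtget from <sys/mtio.h>. The name tape_position() is a hypothetical helper, and whether mt_fileno and mt_blkno are meaningful depends on the drive and driver (the driver may report -1 when it does not know the position).

```c
/* Sketch only: asks the tape driver where it believes the tape is.
 * tape_position() is a hypothetical helper name. */
#include <sys/ioctl.h>
#include <sys/mtio.h>

/* On success stores the current file and block number and returns 0.
 * On failure returns -1 with errno set and leaves the outputs untouched. */
int tape_position(int fd, long *file_no, long *block_no)
{
    struct mtget status;

    if (ioctl(fd, MTIOCGET, &status) < 0)
        return -1;      /* not a tape device, or the driver refused */

    *file_no = (long)status.mt_fileno;
    *block_no = (long)status.mt_blkno;
    return 0;
}
```

Calling this on /dev/st0 (or the non-rewinding /dev/nst0) before and after the tar run would show whether the file number went back by one, independent of what lseek() returns.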
Re: Re[03]: Problems with SCSI tape rewind / verify on 2.4.29
Marcelo,

Sorry gents, it seems the LKML used to handle the Re numbering in the long past when I last mailed to LKML, but not now, so I am resending this eMail to ensure it goes back to the original thread so all the eMail discussion is in one thread. My apologies if this caused confusion on the LKML.

Regards,

John L. Males
Willowdale, Ontario
Canada
02 March 2005 (16:06 -) 16:15

** Reply Separator **
On (Wed) 2005-03-02 15:46:26 -0500 John L. Males wrote in Message-ID: [EMAIL PROTECTED]

To: Marcelo Tosatti [EMAIL PROTECTED]
From: John L. Males [EMAIL PROTECTED]
Subject: Re[03]: Problems with SCSI tape rewind / verify on 2.4.29
Date: Wed, 2 Mar 2005 15:46:26 -0500

Hi Marcelo,

** Reply Separator **
On (Wed) 2005-03-02 11:34:41 -0300 Marcelo Tosatti wrote in Message-ID: [EMAIL PROTECTED]

To: Gene Heskett [EMAIL PROTECTED]
From: Marcelo Tosatti [EMAIL PROTECTED]
Subject: Re: Problems with SCSI tape rewind / verify on 2.4.29
Date: Wed, 2 Mar 2005 11:34:41 -0300

On Wed, Mar 02, 2005 at 12:08:51PM -0500, Gene Heskett wrote:
On Wednesday 02 March 2005 07:03, Marcelo Tosatti wrote:
On Wed, Mar 02, 2005 at 11:15:42AM -, Mark Yeatman wrote:

Hi

Never had to log a bug before, hope this is correctly done.

Thanks

Mark

Detail

[1.] One line summary of the problem:

SCSI tape drive is refusing to rewind after backup to allow verify and causing illegal seek error

In my experience with this problem, which I am sure is exactly the same issue, the tape does in fact rewind after creating the tar in order to perform the --verify option. The illegal seek error seems to arise after the rewind, or more correctly after the bsf commands that may be in use (I have not confirmed this yet; it is another test I need to do in a few days). For sure the tape is positioned back. I know this because I have also done this test with larger directories and have heard the tape take a long time to get back to the start of the tar file to be able to perform the --verify option. That said, I am using a DLT drive. Perhaps with different drivers or tape drives this issue may show variations in behaviour, with the net result being the same error message and/or root cause.

[2.] Full description of the problem/report:

On backup the tape drive is reporting the following error and failing its backups.

tar: /dev/st0: Warning: Cannot seek: Illegal seek

I have traced this back to failing at an upgrade of the kernel to 2.4.29 on Feb 8th. The backups have not worked since. Replacement drives and cables have been tried to no avail.

I noticed in the changelog that a patch by Solar Designer to the SCSI tape return code had been made.

Last kernel to work correctly in 2.4 branch was 2.4.26.
Re: Problems with SCSI tape rewind / verify on 2.4.29
Sorry gents,

Let me correct this one more time.

Regards,

John L. Males
Willowdale, Ontario
Canada
02 March 2005 16:26
Re: Problems with SCSI tape rewind / verify on 2.4.29
Marcelo,

My couple cents worth:

On Wed, Mar 02, 2005 at 11:17:19PM +0200, Kai Makisara wrote:
On Wed, 2 Mar 2005, Marcelo Tosatti wrote:
On Wed, Mar 02, 2005 at 11:15:42AM -, Mark Yeatman wrote:

Hi

Never had to log a bug before, hope this is correctly done.

Thanks

Mark

Detail

[1.] One line summary of the problem:

SCSI tape drive is refusing to rewind after backup to allow verify and causing illegal seek error

[2.] Full description of the problem/report:

On backup the tape drive is reporting the following error and failing its backups.

tar: /dev/st0: Warning: Cannot seek: Illegal seek

I have traced this back to failing at an upgrade of the kernel to 2.4.29 on Feb 8th. The backups have not worked since. Replacement drives and cables have been tried to no avail.

I noticed in the changelog that a patch by Solar Designer to the SCSI tape return code had been made.

BTW, this fix by Solar Designer introduces a bug to 2.4.29: a tape driver is supposed to return ENOMEM in the case that was changed to return EIO ;-(

Reverted.

v2.6 also contains the same problem BTW. Try this:

    --- a/drivers/scsi/st.c.orig 2005-03-02 09:02:13.637158144 -0300
    +++ b/drivers/scsi/st.c 2005-03-02 09:02:20.208159200 -0300
    @@ -3778,7 +3778,6 @@
      read: st_read,
      write: st_write,
      ioctl: st_ioctl,
    - llseek: no_llseek,
      open: st_open,
      flush: st_flush,
      release: st_release,

This change covers up the problem. The real bug is in tar. The following code from tar is supposed to reposition the tape to the beginning of the file just written:

    #ifdef MTIOCTOP
      {
        struct mtop operation;
        int status;

        operation.mt_op = MTBSF;
        operation.mt_count = 1;
        if (status = rmtioctl (archive, MTIOCTOP, (char *) &operation), status < 0)
          {
            if (errno != EIO
                || (status = rmtioctl (archive, MTIOCTOP, (char *) &operation),
                    status < 0))
              {
    #endif
                if (rmtlseek (archive, (off_t) 0, SEEK_SET) != 0)
                  {
                    /* Lseek failed. Try a different method. */
                    seek_warn (archive_name_array[0]);
                    return;
                  }
    #ifdef MTIOCTOP
              }
          }
      }
    #endif

Here is output from strace showing what happens with 'tar -c -W' applied at the beginning of the tape (this is using kernel 2.6.11-rc4 but the same probably happens with 2.4.29):

    ...
    ioctl(3, MGSL_IOCGPARAMS or MTIOCTOP or SNDCTL_MIDI_MPUMODE, 0x7fffecd0) = -1 EIO (Input/output error)
    ioctl(3, MGSL_IOCGPARAMS or MTIOCTOP or SNDCTL_MIDI_MPUMODE, 0x7fffecd0) = -1 EIO (Input/output error)
    lseek(3, 0, SEEK_SET) = -1 ESPIPE (Illegal seek)

So, both tape positioning commands fail and the code falls back to lseek. Earlier it returned success even though it did not do anything (this was on purpose because it is the way some other Unices behave, and with reason). In that case this tar succeeded, but it was pure luck. The first BSF did position the tape correctly although it did fail.

The 2.6 st driver does contain this near the beginning of st_open():

    nonseekable_open(inode, filp);

This probably makes lseek fail. This code has been in st.c since 2.6.8.

Thanks for the cluebat Kai, is this problem fixed in newer versions of tar?

My testing over the last week or so has been with the latest tar, tar-1.15.1-2; tar 1.14 and 1.13 had the same lseek --verify issues.

I suspect v2.4 should work with older versions of tar, so we should keep lseek working to make it happy. What is your opinion?

I would agree in the lseek sense. I feel that if there is some bad behaviour in tar it should be reported to the tar folks to be fixed, so that in the long term things are done correctly and over time the kernel workarounds can be deprecated.

Regards,

John L. Males
Willowdale, Ontario
Canada
02 March 2005 (17:04 -) 17:25
Re: Problems with SCSI tape rewind / verify on 2.4.29
Andrew/Kai,

List: linux-kernel
Subject: Re: Problems with SCSI tape rewind / verify on 2.4.29
From: Andrew Morton akpm () osdl ! org
Date: 2005-03-02 22:17:11
Message-ID: 20050302141711.00ec7147.akpm () osdl ! org

Kai Makisara [EMAIL PROTECTED] wrote:

If seek with tape is changed back to returning success, this would enable correct tar --verify at the beginning of the tape. However, I am not sure what happens if we are not at the beginning. I will investigate this and suggest a long term fix to the tar people (a fix that should be compatible with all Unix tape semantics I know) and also suggest possible fixes to st (this may include automatic writing of a filemark when BSF is used after writes).

Kai, I have a second problem that is perhaps another case of a combined kernel and tar problem. I have not had time to test with the 2.6.7 and 2.6.9 Knoppix based kernels to see if they have the same problem as = 2.4.26 has. Can you hold out about 3-4 days for me to do the test and report the issue to Marcelo to screen first, Kai? I have a feeling that what I experienced in testing the st.c change Marcelo suggested, which caused me to try 2.4.26 again and fail on this new issue, has some bearing on the tape positioning you want to check out.

Yes, please let's get a tar fix in the pipeline. GNU tar must run on a lot of operating systems. It's odd.

If you want to make st return success for seeks even if nothing happens (as it did earlier), I don't have anything against that. It would solve the practical problem several people have reported recently.

I think it is important that, if an error is encountered, a non-successful return code is returned. If the requested action requires nothing to be done because the device is already at the place/state/position being requested, it is reasonable to return a successful return code.

(My recommendation for the people seeing this problem is to do verification separately with 'tar -d'.)

As an aside, I have tried the tar -d option as well and it worked, but was it not the case that --verify does a data readback compare of the files in the tar, whereas the tar -d option only compares the file names in the tar to the directory? That to me means a big difference in confidence that the tar backup is ok, as I look to the readback verify to increase confidence in backup success.

Yes, I think we need to grit our teeth and do this. I'll stick a comment in there.

Regards,

John L. Males
Willowdale, Ontario
Canada
02 March 2005 (17:45 -) 18:51
Linux Kernel 2.2.19 Available Memory Bug
Hello,

Please note I am not on the Kernel Mailing List, so do try to copy me in on any reply, questions, clarifications, confirmations, et al on this bug report.

Before I forget, I am using SuSE 6.4 with most of the updates from the base applied, "most" meaning I generally lag a bit behind at times. I am NOT using the SuSE kernel due to bugs introduced into the SuSE kernel with the SuSE patches/enhancements. I am using the Linus 2.2.19 Kernel with the OpenWall patches. The system is an AMD K6-2 500 based system. The version of gcc used to compile the kernel was 2.95.2.

The bug I am reporting is that when one sets the amount of memory, e.g. 128M or 256M, at the time of booting the 2.2.19 kernel, the "Total Memory" as reported by KDE, "free", etc. is short by a significant amount. To be more specific, I will detail the results of "free" below against the "mem" value passed to the kernel. Please note that for the purposes of this test I always had 256MB of RAM (2x128MB) installed in my system. The BIOS reports total system memory as 262144K.

"mem=256m"
***
KDE reports 251.09 MB Total System memory, or 263290880 bytes.
"free -m" indicates "Total Memory" as 251
"free -k" indicates "Total Memory" as 257120
"free -b" indicates "Total Memory" as 263290880

The exact same values as noted above are indicated for "mem=262144k" and "mem=268435456" (256 x 1024 x 1024).

"mem=128m"
***
"free -m" indicates "Total Memory" as 124
"free -k" indicates "Total Memory" as 127344
"free -b" indicates "Total Memory" as 130400256

Regards,

John L. Males
Software I.Q. Consulting
Toronto, Ontario
Canada
20 July 2001 03:47
mailto:[EMAIL PROTECTED]

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/