Re: Some linux kernel with KAISER/KPTI patch can't work under qemu + haxm.
FYI, this was fixed by https://www.spinics.net/lists/stable/msg209612.html

On Tue, Jan 9, 2018 at 5:36 PM, lepton <ytht@gmail.com> wrote:
> I did some debugging; it seems to crash right after switching CR3.
>
> I tried two different kernels, so the actual crash points differ, but
> they follow the same pattern: the crash happens when trying to pop %rax
> after the CR3 switch.
>
> For 4.10 with the KAISER patch, it crashed at this point:
>
> Dump of assembler code for function ret_from_fork:
>    0x8190ad00 <+0>:     push   %rbp
>    0x8190ad01 <+1>:     mov    %rsp,%rbp
>    0x8190ad04 <+4>:     mov    %rax,%rdi
>    0x8190ad07 <+7>:     callq  0x81083670
>    0x8190ad0c <+12>:    test   %rbx,%rbx
>    0x8190ad0f <+15>:    jne    0x8190ad32 <ret_from_fork+50>
>    0x8190ad11 <+17>:    lea    0x8(%rsp),%rdi
>    0x8190ad16 <+22>:    callq  0x810027a0
>    0x8190ad1b <+27>:    push   %rax
>    0x8190ad1c <+28>:    mov    %cr3,%rax
>    0x8190ad1f <+31>:    or     $0x1000,%rax
>    0x8190ad25 <+37>:    mov    %rax,%cr3
>    0x8190ad28 <+40>:    pop    %rax        <--- crashed here
>    0x8190ad29 <+41>:    swapgs
>    0x8190ad2c <+44>:    pop    %rbp
>
> For 4.4.110:
>
> Dump of assembler code for function opportunistic_sysret_failed:
>    0x81887165 <+0>:     jmp    0x81887181 <opportunistic_sysret_failed+28>
>    0x81887167 <+2>:     mov    %cr3,%rax
>    0x8188716a <+5>:     or     %gs:0xf0c0,%rax
>    0x81887173 <+14>:    js     0x8188717d <opportunistic_sysret_failed+24>
>    0x81887175 <+16>:    mov    %al,%gs:0xf0c7
>    0x8188717d <+24>:    mov    %rax,%cr3
>    0x81887180 <+27>:    pop    %rax        <--- crashed here
>    0x81887181 <+28>:    swapgs
>    0x81887184 <+31>:    jmpq   0x81887ab0
>    0x81887189 <+36>:    nopl   0x0(%rax)
>
> On Thu, Jan 4, 2018 at 2:13 PM, lepton <ytht@gmail.com> wrote:
>> It seems that, for some reason, some Linux kernels with the KAISER/KPTI
>> patches do not work under QEMU + HAXM.
>> The mainline kernel from Linus is fine, but the patches for 4.4/4.10 do
>> not work.
>>
>> I am not familiar with either HAXM or KPTI, so I am not sure whether
>> this is a QEMU bug, a KPTI bug, or a HAXM bug.
>>
>> The same kernel works fine under QEMU + KVM.
>>
>> This is the way to reproduce it:
>>
>> 1. Download QEMU for Windows and follow the instructions here:
>>
>>    https://www.qemu.org/2017/11/22/haxm-usage-windows/
>>
>> 2. Build a kernel with KAISER/KPTI. I am using the kernel here:
>>
>>    https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/log/?h=linux-4.4.y
>>
>>    Or follow the instructions here to build a 4.10 kernel:
>>
>>    https://github.com/IAIK/KAISER/tree/master/KAISER
>>
>> 3. Build an ext2 image which has a simple init:
>>
>>    dd of=img count=8192 bs=4096 if=/dev/zero
>>    mke2fs img
>>    gcc --static -o init init.c
>>    debugfs -R "write init init" -w img
>>
>>    cat init.c
>>    #include <stdio.h>
>>    #include <time.h>
>>    #include <unistd.h>
>>
>>    int main()
>>    {
>>            while(1) {
>>                    printf("This is init %ld\n", (long)time(NULL));
>>                    sleep(3600);
>>            }
>>    }
>>
>> 4. Copy the kernel and the disk image generated in step 3 to Windows
>>    and run it:
>>
>>    qemu-system-x86_64.exe -kernel bzImage -hda img -append "init=/init root=/dev/sda" -serial stdio -accel hax
>>
>> You will see a kernel panic, or QEMU will report "VCPU shutdown request".
Re: Some linux kernel with KAISER/KPTI patch can't work under qemu + haxm.
I did some debugging; it seems to crash right after switching CR3.

I tried two different kernels, so the actual crash points differ, but
they follow the same pattern: the crash happens when trying to pop %rax
after the CR3 switch.

For 4.10 with the KAISER patch, it crashed at this point:

Dump of assembler code for function ret_from_fork:
   0x8190ad00 <+0>:     push   %rbp
   0x8190ad01 <+1>:     mov    %rsp,%rbp
   0x8190ad04 <+4>:     mov    %rax,%rdi
   0x8190ad07 <+7>:     callq  0x81083670
   0x8190ad0c <+12>:    test   %rbx,%rbx
   0x8190ad0f <+15>:    jne    0x8190ad32 <ret_from_fork+50>
   0x8190ad11 <+17>:    lea    0x8(%rsp),%rdi
   0x8190ad16 <+22>:    callq  0x810027a0
   0x8190ad1b <+27>:    push   %rax
   0x8190ad1c <+28>:    mov    %cr3,%rax
   0x8190ad1f <+31>:    or     $0x1000,%rax
   0x8190ad25 <+37>:    mov    %rax,%cr3
   0x8190ad28 <+40>:    pop    %rax        <--- crashed here
   0x8190ad29 <+41>:    swapgs
   0x8190ad2c <+44>:    pop    %rbp

For 4.4.110:

Dump of assembler code for function opportunistic_sysret_failed:
   0x81887165 <+0>:     jmp    0x81887181 <opportunistic_sysret_failed+28>
   0x81887167 <+2>:     mov    %cr3,%rax
   0x8188716a <+5>:     or     %gs:0xf0c0,%rax
   0x81887173 <+14>:    js     0x8188717d <opportunistic_sysret_failed+24>
   0x81887175 <+16>:    mov    %al,%gs:0xf0c7
   0x8188717d <+24>:    mov    %rax,%cr3
   0x81887180 <+27>:    pop    %rax        <--- crashed here
   0x81887181 <+28>:    swapgs
   0x81887184 <+31>:    jmpq   0x81887ab0
   0x81887189 <+36>:    nopl   0x0(%rax)

On Thu, Jan 4, 2018 at 2:13 PM, lepton <ytht@gmail.com> wrote:
> It seems that, for some reason, some Linux kernels with the KAISER/KPTI
> patches do not work under QEMU + HAXM.
> The mainline kernel from Linus is fine, but the patches for 4.4/4.10 do
> not work.
>
> I am not familiar with either HAXM or KPTI, so I am not sure whether
> this is a QEMU bug, a KPTI bug, or a HAXM bug.
>
> The same kernel works fine under QEMU + KVM.
>
> This is the way to reproduce it:
>
> 1. Download QEMU for Windows and follow the instructions here:
>
>    https://www.qemu.org/2017/11/22/haxm-usage-windows/
>
> 2. Build a kernel with KAISER/KPTI. I am using the kernel here:
>
>    https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/log/?h=linux-4.4.y
>
>    Or follow the instructions here to build a 4.10 kernel:
>
>    https://github.com/IAIK/KAISER/tree/master/KAISER
>
> 3. Build an ext2 image which has a simple init:
>
>    dd of=img count=8192 bs=4096 if=/dev/zero
>    mke2fs img
>    gcc --static -o init init.c
>    debugfs -R "write init init" -w img
>
>    cat init.c
>    #include <stdio.h>
>    #include <time.h>
>    #include <unistd.h>
>
>    int main()
>    {
>            while(1) {
>                    printf("This is init %ld\n", (long)time(NULL));
>                    sleep(3600);
>            }
>    }
>
> 4. Copy the kernel and the disk image generated in step 3 to Windows
>    and run it:
>
>    qemu-system-x86_64.exe -kernel bzImage -hda img -append "init=/init root=/dev/sda" -serial stdio -accel hax
>
> You will see a kernel panic, or QEMU will report "VCPU shutdown request".
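
For readers less familiar with the instruction sequence in the two dumps: in the 4.10 disassembly, "or $0x1000,%rax" followed by "mov %rax,%cr3" sets bit 12 of CR3 just before the return to user space. In the KAISER backports this is understood to select the second (user or "shadow") top-level page table, which is placed 4 KiB after the kernel one. The following is only a user-space sketch of that arithmetic; the helper names and the CR3 value are made up for illustration and do not come from the kernel sources.

```c
#include <stdint.h>

/* Bit 12, the constant from "or $0x1000,%rax" in the 4.10 dump. */
#define SHADOW_PGD_OFFSET 0x1000ULL

/* Hypothetical helper: CR3 value loaded just before returning to user space. */
static uint64_t to_user_cr3(uint64_t cr3)
{
	return cr3 | SHADOW_PGD_OFFSET;
}

/* Hypothetical helper: CR3 value restored when entering the kernel again. */
static uint64_t to_kernel_cr3(uint64_t cr3)
{
	return cr3 & ~SHADOW_PGD_OFFSET;
}
```

With a made-up, 8 KiB-aligned kernel CR3 of 0x1e0a000, to_user_cr3() gives 0x1e0b000 and to_kernel_cr3() maps it back. The "pop %rax" marked in both dumps is the first instruction that touches memory after the new CR3 value takes effect.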
Some linux kernel with KAISER/KPTI patch can't work under qemu + haxm.
It seems that, for some reason, some Linux kernels with the KAISER/KPTI
patches do not work under QEMU + HAXM.
The mainline kernel from Linus is fine, but the patches for 4.4/4.10 do
not work.

I am not familiar with either HAXM or KPTI, so I am not sure whether
this is a QEMU bug, a KPTI bug, or a HAXM bug.

The same kernel works fine under QEMU + KVM.

This is the way to reproduce it:

1. Download QEMU for Windows and follow the instructions here:

   https://www.qemu.org/2017/11/22/haxm-usage-windows/

2. Build a kernel with KAISER/KPTI. I am using the kernel here:

   https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/log/?h=linux-4.4.y

   Or follow the instructions here to build a 4.10 kernel:

   https://github.com/IAIK/KAISER/tree/master/KAISER

3. Build an ext2 image which has a simple init:

   dd of=img count=8192 bs=4096 if=/dev/zero
   mke2fs img
   gcc --static -o init init.c
   debugfs -R "write init init" -w img

   cat init.c
   #include <stdio.h>
   #include <time.h>
   #include <unistd.h>

   int main()
   {
           while(1) {
                   printf("This is init %ld\n", (long)time(NULL));
                   sleep(3600);
           }
   }

4. Copy the kernel and the disk image generated in step 3 to Windows
   and run it:

   qemu-system-x86_64.exe -kernel bzImage -hda img -append "init=/init root=/dev/sda" -serial stdio -accel hax

You will see a kernel panic, or QEMU will report "VCPU shutdown request".
Re: [PATCH v2] mtd: Fix mtdblock for >4GB MTD devices
Looking at the callers, len comes from cache_size in struct mtdblk_dev,
which is currently defined as unsigned int, so it is not 64-bit yet.
BTW, at the other call sites it seems to be just the block size (512).

(Sorry for the earlier email with the same content; I just found out it
was sent as HTML and was rejected by the mailing list.)

On Mon, Feb 27, 2017 at 1:31 AM, Marek Vasut <marek.va...@gmail.com> wrote:
> On 02/22/2017 03:15 AM, Lepton Wu wrote:
>> Change to use loff_t instead of unsigned long in some functions
>> to make sure mtdblock can handle offset bigger than 4G in 32 bits mode.
>>
>> Signed-off-by: Lepton Wu <ytht@gmail.com>
>> ---
>> Changes in v2:
>>  - Make the commit message more clearer and fix some format issues.
>>
>>  drivers/mtd/mtdblock.c    | 35 ++++++++++++++++++-----------------
>>  drivers/mtd/mtdblock_ro.c |  4 ++--
>>  2 files changed, 20 insertions(+), 19 deletions(-)
>>
>> diff --git a/drivers/mtd/mtdblock.c b/drivers/mtd/mtdblock.c
>> index bb4c14f83c75..373c0edca803 100644
>> --- a/drivers/mtd/mtdblock.c
>> +++ b/drivers/mtd/mtdblock.c
>> @@ -61,8 +61,8 @@ static void erase_callback(struct erase_info *done)
>>  	wake_up(wait_q);
>>  }
>>
>> -static int erase_write (struct mtd_info *mtd, unsigned long pos,
>> -			int len, const char *buf)
>> +static int erase_write(struct mtd_info *mtd, loff_t pos, int len,
>> +		       const char *buf)
>
> Can the length be 64bit too now ?
>
> [...]
>
> --
> Best regards,
> Marek Vasut
[PATCH v2] mtd: Fix mtdblock for >4GB MTD devices
Change to use loff_t instead of unsigned long in some functions
to make sure mtdblock can handle offset bigger than 4G in 32 bits mode.

Signed-off-by: Lepton Wu <ytht@gmail.com>
---
Changes in v2:
 - Make the commit message more clearer and fix some format issues.

 drivers/mtd/mtdblock.c    | 35 ++++++++++++++++++-----------------
 drivers/mtd/mtdblock_ro.c |  4 ++--
 2 files changed, 20 insertions(+), 19 deletions(-)

diff --git a/drivers/mtd/mtdblock.c b/drivers/mtd/mtdblock.c
index bb4c14f83c75..373c0edca803 100644
--- a/drivers/mtd/mtdblock.c
+++ b/drivers/mtd/mtdblock.c
@@ -61,8 +61,8 @@ static void erase_callback(struct erase_info *done)
 	wake_up(wait_q);
 }
 
-static int erase_write (struct mtd_info *mtd, unsigned long pos,
-			int len, const char *buf)
+static int erase_write(struct mtd_info *mtd, loff_t pos, int len,
+		       const char *buf)
 {
 	struct erase_info erase;
 	DECLARE_WAITQUEUE(wait, current);
@@ -88,8 +88,7 @@ static int erase_write (struct mtd_info *mtd, unsigned long pos,
 	if (ret) {
 		set_current_state(TASK_RUNNING);
 		remove_wait_queue(&wait_q, &wait);
-		printk (KERN_WARNING "mtdblock: erase of region [0x%lx, 0x%x] "
-				     "on \"%s\" failed\n",
+		pr_warn("mtdblock: erase of region [0x%llx, 0x%x] on \"%s\" failed\n",
 			pos, len, mtd->name);
 		return ret;
 	}
@@ -139,23 +138,24 @@ static int write_cached_data (struct mtdblk_dev *mtdblk)
 }
 
 
-static int do_cached_write (struct mtdblk_dev *mtdblk, unsigned long pos,
-			    int len, const char *buf)
+static int do_cached_write(struct mtdblk_dev *mtdblk, loff_t pos,
+			   int len, const char *buf)
 {
 	struct mtd_info *mtd = mtdblk->mbd.mtd;
 	unsigned int sect_size = mtdblk->cache_size;
 	size_t retlen;
 	int ret;
 
-	pr_debug("mtdblock: write on \"%s\" at 0x%lx, size 0x%x\n",
+	pr_debug("mtdblock: write on \"%s\" at 0x%llx, size 0x%x\n",
 		mtd->name, pos, len);
 
 	if (!sect_size)
 		return mtd_write(mtd, pos, len, &retlen, buf);
 
 	while (len > 0) {
-		unsigned long sect_start = (pos/sect_size)*sect_size;
-		unsigned int offset = pos - sect_start;
+		unsigned int offset;
+		loff_t sect_start =
+			div_u64_rem(pos, sect_size, &offset) * sect_size;
 		unsigned int size = sect_size - offset;
 		if( size > len )
 			size = len;
@@ -209,23 +209,24 @@ static int do_cached_write (struct mtdblk_dev *mtdblk, unsigned long pos,
 }
 
 
-static int do_cached_read (struct mtdblk_dev *mtdblk, unsigned long pos,
-			   int len, char *buf)
+static int do_cached_read(struct mtdblk_dev *mtdblk, loff_t pos,
+			  int len, char *buf)
 {
 	struct mtd_info *mtd = mtdblk->mbd.mtd;
 	unsigned int sect_size = mtdblk->cache_size;
 	size_t retlen;
 	int ret;
 
-	pr_debug("mtdblock: read on \"%s\" at 0x%lx, size 0x%x\n",
-			mtd->name, pos, len);
+	pr_debug("mtdblock: read on \"%s\" at 0x%llx, size 0x%x\n",
+		 mtd->name, pos, len);
 
 	if (!sect_size)
 		return mtd_read(mtd, pos, len, &retlen, buf);
 
 	while (len > 0) {
-		unsigned long sect_start = (pos/sect_size)*sect_size;
-		unsigned int offset = pos - sect_start;
+		unsigned int offset;
+		loff_t sect_start =
+			div_u64_rem(pos, sect_size, &offset) * sect_size;
 		unsigned int size = sect_size - offset;
 		if (size > len)
 			size = len;
@@ -259,7 +260,7 @@ static int mtdblock_readsect(struct mtd_blktrans_dev *dev,
 			      unsigned long block, char *buf)
 {
 	struct mtdblk_dev *mtdblk = container_of(dev, struct mtdblk_dev, mbd);
-	return do_cached_read(mtdblk, block<<9, 512, buf);
+	return do_cached_read(mtdblk, (loff_t)block << 9, 512, buf);
 }
 
 static int mtdblock_writesect(struct mtd_blktrans_dev *dev,
@@ -275,7 +276,7 @@ static int mtdblock_writesect(struct mtd_blktrans_dev *dev,
 		 * return -EAGAIN sometimes, but why bother?
 		 */
 	}
-	return do_cached_write(mtdblk, block<<9, 512, buf);
+	return do_cached_write(mtdblk, (loff_t)block << 9, 512, buf);
 }
 
 static int mtdblock_open(struct mtd_blktrans_dev *mbd)
diff --git a/drivers/mtd/mtdblock_ro.c b/drivers/mtd/mtdblock_ro.c
index fb5dc89369de..92829e3fb3b7 100644
--- a/drivers/mtd/mtdblock_ro.c
+++ b/drivers/mtd/mtdblock_ro.c
@@ -31,7 +31,7 @@ stati
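
The reason for the (loff_t) cast in mtdblock_readsect() and mtdblock_writesect() can be illustrated outside the kernel. On a 32-bit build, unsigned long is 32 bits wide, so "block << 9" wraps once the byte offset reaches 4 GiB. The sketch below models that with fixed-width types so that it behaves the same on any host; the function names are hypothetical and int64_t stands in for loff_t.

```c
#include <stdint.h>

/* Old behaviour on a 32-bit build, modelled with uint32_t:
 * the shift is done in 32 bits and the high bits are dropped. */
static uint32_t offset_32bit(uint32_t block)
{
	return block << 9;
}

/* Patched behaviour: widen first, then shift, like "(loff_t)block << 9". */
static int64_t offset_64bit(uint32_t block)
{
	return (int64_t)block << 9;
}
```

For sector 0x800000, which is the first sector at the 4 GiB boundary, offset_32bit() returns 0 while offset_64bit() returns 0x100000000, so without the cast an access to that sector would land at the start of the device.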
[PATCH] Make mtdblock can handle partition bigger than 4G.
Signed-off-by: Lepton Wu <ytht@gmail.com>
---
 drivers/mtd/mtdblock.c    | 33 +++++++++++++++++----------------
 drivers/mtd/mtdblock_ro.c |  4 ++--
 2 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/drivers/mtd/mtdblock.c b/drivers/mtd/mtdblock.c
index bb4c14f83c75..3d2da76287a7 100644
--- a/drivers/mtd/mtdblock.c
+++ b/drivers/mtd/mtdblock.c
@@ -61,8 +61,8 @@ static void erase_callback(struct erase_info *done)
 	wake_up(wait_q);
 }
 
-static int erase_write (struct mtd_info *mtd, unsigned long pos,
-			int len, const char *buf)
+static int erase_write(struct mtd_info *mtd, loff_t pos, int len,
+		       const char *buf)
 {
 	struct erase_info erase;
 	DECLARE_WAITQUEUE(wait, current);
@@ -88,8 +88,7 @@ static int erase_write (struct mtd_info *mtd, unsigned long pos,
 	if (ret) {
 		set_current_state(TASK_RUNNING);
 		remove_wait_queue(&wait_q, &wait);
-		printk (KERN_WARNING "mtdblock: erase of region [0x%lx, 0x%x] "
-				     "on \"%s\" failed\n",
+		pr_warn("mtdblock: erase of region [0x%llx, 0x%x] on \"%s\" failed\n",
 			pos, len, mtd->name);
 		return ret;
 	}
@@ -139,23 +138,24 @@ static int write_cached_data (struct mtdblk_dev *mtdblk)
 }
 
 
-static int do_cached_write (struct mtdblk_dev *mtdblk, unsigned long pos,
-			    int len, const char *buf)
+static int do_cached_write(struct mtdblk_dev *mtdblk, loff_t pos,
+			   int len, const char *buf)
 {
 	struct mtd_info *mtd = mtdblk->mbd.mtd;
 	unsigned int sect_size = mtdblk->cache_size;
 	size_t retlen;
 	int ret;
 
-	pr_debug("mtdblock: write on \"%s\" at 0x%lx, size 0x%x\n",
+	pr_debug("mtdblock: write on \"%s\" at 0x%llx, size 0x%x\n",
 		mtd->name, pos, len);
 
 	if (!sect_size)
 		return mtd_write(mtd, pos, len, &retlen, buf);
 
 	while (len > 0) {
-		unsigned long sect_start = (pos/sect_size)*sect_size;
-		unsigned int offset = pos - sect_start;
+		unsigned int offset;
+		loff_t sect_start =
+			div_u64_rem(pos, sect_size, &offset) * sect_size;
 		unsigned int size = sect_size - offset;
 		if( size > len )
 			size = len;
@@ -209,23 +209,24 @@ static int do_cached_write (struct mtdblk_dev *mtdblk, unsigned long pos,
 }
 
 
-static int do_cached_read (struct mtdblk_dev *mtdblk, unsigned long pos,
-			   int len, char *buf)
+static int do_cached_read(struct mtdblk_dev *mtdblk, loff_t pos,
+			  int len, char *buf)
 {
 	struct mtd_info *mtd = mtdblk->mbd.mtd;
 	unsigned int sect_size = mtdblk->cache_size;
 	size_t retlen;
 	int ret;
 
-	pr_debug("mtdblock: read on \"%s\" at 0x%lx, size 0x%x\n",
+	pr_debug("mtdblock: read on \"%s\" at 0x%llx, size 0x%x\n",
 			mtd->name, pos, len);
 
 	if (!sect_size)
 		return mtd_read(mtd, pos, len, &retlen, buf);
 
 	while (len > 0) {
-		unsigned long sect_start = (pos/sect_size)*sect_size;
-		unsigned int offset = pos - sect_start;
+		unsigned int offset;
+		loff_t sect_start =
+			div_u64_rem(pos, sect_size, &offset) * sect_size;
 		unsigned int size = sect_size - offset;
 		if (size > len)
 			size = len;
@@ -259,7 +260,7 @@ static int mtdblock_readsect(struct mtd_blktrans_dev *dev,
 			      unsigned long block, char *buf)
 {
 	struct mtdblk_dev *mtdblk = container_of(dev, struct mtdblk_dev, mbd);
-	return do_cached_read(mtdblk, block<<9, 512, buf);
+	return do_cached_read(mtdblk, (loff_t)block<<9, 512, buf);
 }
 
 static int mtdblock_writesect(struct mtd_blktrans_dev *dev,
@@ -275,7 +276,7 @@ static int mtdblock_writesect(struct mtd_blktrans_dev *dev,
 		 * return -EAGAIN sometimes, but why bother?
 		 */
 	}
-	return do_cached_write(mtdblk, block<<9, 512, buf);
+	return do_cached_write(mtdblk, (loff_t)block<<9, 512, buf);
 }
 
 static int mtdblock_open(struct mtd_blktrans_dev *mbd)
diff --git a/drivers/mtd/mtdblock_ro.c b/drivers/mtd/mtdblock_ro.c
index fb5dc89369de..92829e3fb3b7 100644
--- a/drivers/mtd/mtdblock_ro.c
+++ b/drivers/mtd/mtdblock_ro.c
@@ -31,7 +31,7 @@ static int mtdblock_readsect(struct mtd_blktrans_dev *dev,
 {
 	size_t retlen;
 
-	if (mtd_read(dev->mtd, (block * 512), 512, &retlen, buf))
+	if (mtd_read(dev->mtd, (loff_t)block << 9, 512, &retlen, buf))
 		return 1;
 	return 0;
 }
@@ -41
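
The patch also replaces "(pos/sect_size)*sect_size" with a call to div_u64_rem(), the kernel helper that divides a 64-bit value by a 32-bit one and hands back the remainder through a pointer. The sketch below is a user-space stand-in, assuming the usual semantics of that helper, to show how sect_start and offset come out of a single division; the names div_u64_rem_sketch and sect_start_of are hypothetical and are not kernel functions.

```c
#include <stdint.h>

/* User-space stand-in for the kernel's div_u64_rem(): returns the
 * quotient and stores the remainder through *remainder. */
static uint64_t div_u64_rem_sketch(uint64_t dividend, uint32_t divisor,
				   uint32_t *remainder)
{
	*remainder = (uint32_t)(dividend % divisor);
	return dividend / divisor;
}

/* Hypothetical helper mirroring the patched loop body: returns the start
 * of the cache sector containing pos and stores pos's offset within that
 * sector through *offset. */
static uint64_t sect_start_of(uint64_t pos, uint32_t sect_size,
			      uint32_t *offset)
{
	return div_u64_rem_sketch(pos, sect_size, offset) * sect_size;
}
```

For example, with a 128 KiB sector size (0x20000) and pos = 0x100000123, sect_start_of() returns 0x100000000 and sets the offset to 0x123. Both values are correct beyond 4 GiB because the division is carried out on a 64-bit dividend.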
Re: [PATCH] 2.6.22.6 user-mode linux: use address instead of value as argument in os_free_irq_by_cb
I added dump_stack and some printk calls to the host kernel. The following
is what I got when sys_reboot is called in the host kernel. The first line
prints the process state, the ptrace state and the pid of the calling
process; the lines after it are the call path.

Sep 22 14:25:49 pc kernel: linux Rptrace: pid:3698
Sep 22 14:25:49 pc kernel: [c01234a7] kernel_halt+0x8/0x24
Sep 22 14:25:49 pc kernel: [c01239b2] sys_reboot+0x14c/0x1c7
Sep 22 14:25:49 pc kernel: [c0120712] recalc_sigpending+0xb/0x1d
Sep 22 14:25:49 pc kernel: [c0121ed2] dequeue_signal+0x9d/0x11b
Sep 22 14:25:49 pc kernel: [c01b] get_signal_to_deliver+0xe1/0x389
Sep 22 14:25:49 pc kernel: [c0101bd5] do_notify_resume+0x84/0x5f1
Sep 22 14:25:49 pc kernel: [c012212c] ptrace_notify+0x6f/0x8d
Sep 22 14:25:49 pc kernel: [c010249e] syscall_call+0x7/0xb
Sep 22 14:25:49 pc kernel: ===
Sep 22 14:25:50 pc kernel: sd 1:0:0:0: [sda] Synchronizing SCSI cache
Sep 22 14:25:50 pc kernel: sd 1:0:0:0: [sda] Stopping disk
Sep 22 14:25:51 pc kernel: System halted.

On Mon, Sep 24, 2007 at 01:02:20PM -0400, Jeff Dike wrote:
> I would say that you shouldn't be running UMLs as root. However, a
> sys_reboot shouldn't escape being ptraced either. This is a bug, but
> I don't see any connection between it and ptrace missing a final
> system call.
>
> Jeff
>
> --
> Work email - jdike at linux dot intel dot com

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[PATCH] 2.6.22.6 user-mode linux: fix error in check_sysemu
It is an error to do count++ here: it makes the following comparison
(8 lines later), "if (!count)", always false.

Signed-off-by: Lepton Wu <[EMAIL PROTECTED]>

diff -X linux-2.6.22.6/Documentation/dontdiff -pr -U 8 linux-2.6.22.6/arch/um/os-Linux/start_up.c linux-2.6.22.6-uml/arch/um/os-Linux/start_up.c
--- linux-2.6.22.6/arch/um/os-Linux/start_up.c	2007-09-14 17:41:10.0 +0800
+++ linux-2.6.22.6-uml/arch/um/os-Linux/start_up.c	2007-09-23 20:14:08.0 +0800
@@ -250,17 +250,16 @@ static void __init check_sysemu(void)
 	non_fatal("Checking advanced syscall emulation patch for ptrace...");
 	pid = start_ptraced_child(&stack);
 
 	if((ptrace(PTRACE_OLDSETOPTIONS, pid, 0,
 		   (void *) PTRACE_O_TRACESYSGOOD) < 0))
 		fatal_perror("check_ptrace: PTRACE_OLDSETOPTIONS failed");
 
 	while(1){
-		count++;
 		if(ptrace(PTRACE_SYSEMU_SINGLESTEP, pid, 0, 0) < 0)
 			goto fail;
 		CATCH_EINTR(n = waitpid(pid, &status, WUNTRACED));
 		if(n < 0)
 			fatal_perror("check_ptrace : wait failed");
 
 		if(WIFSTOPPED(status) && (WSTOPSIG(status) == (SIGTRAP|0x80))){
 			if (!count)
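The effect of the stray increment can be sketched outside the kernel. This is a hedged model, not the UML code: `stop_kind` and `check_fires` are hypothetical names, and the assumption that `count` is otherwise incremented once per single-step stop concerns code outside the quoted hunk.

```c
/* Hypothetical model of the stops the tracer sees. */
enum stop_kind { STOP_SINGLESTEP, STOP_SYSCALL };

/* Returns 1 if the "no single step seen before the syscall stop" check
 * fires, 0 otherwise. stray_increment models the count++ at the top of
 * the loop that the patch removes. */
static int check_fires(const enum stop_kind *events, int n, int stray_increment)
{
	int count = 0;
	int i;

	for (i = 0; i < n; i++) {
		if (stray_increment)
			count++;	/* the removed line */
		if (events[i] == STOP_SYSCALL)
			return !count;	/* mirrors "if (!count)" */
		count++;		/* assumed: one increment per single-step stop */
	}
	return 0;
}
```

When the very first stop is the syscall stop, the check fires only without the stray increment; with it, `count` is already 1 and the check can never trigger.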
[RFC PATCH] 2.6.22.6 user-mode linux: No need to new a stack for clone without CLONE_VM
Since we call clone without CLONE_VM, there is no need to use an anonymous
mmap to get a new stack frame. Let's keep the code simple.

Signed-off-by: Lepton Wu <[EMAIL PROTECTED]>

diff -X linux-2.6.22.6-uml/Documentation/dontdiff -pru linux-2.6.22.6/arch/um/os-Linux/start_up.c linux-2.6.22.6-uml/arch/um/os-Linux/start_up.c
--- linux-2.6.22.6/arch/um/os-Linux/start_up.c	2007-09-14 17:41:10.0 +0800
+++ linux-2.6.22.6-uml/arch/um/os-Linux/start_up.c	2007-09-22 23:28:49.0 +0800
@@ -101,19 +101,13 @@ static void non_fatal(char *fmt, ...)
 	fflush(stdout);
 }
 
-static int start_ptraced_child(void **stack_out)
+static int start_ptraced_child(void)
 {
-	void *stack;
-	unsigned long sp;
+	unsigned long sp = (((unsigned long) &sp) & ~(UM_KERN_PAGE_SIZE-1)) +
+		UM_KERN_PAGE_SIZE - sizeof(void *);
 	int pid, n, status;
 
-	stack = mmap(NULL, UM_KERN_PAGE_SIZE,
-		     PROT_READ | PROT_WRITE | PROT_EXEC,
-		     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
-	if(stack == MAP_FAILED)
-		fatal_perror("check_ptrace : mmap failed");
-	sp = (unsigned long) stack + UM_KERN_PAGE_SIZE - sizeof(void *);
-	pid = clone(ptrace_child, (void *) sp, SIGCHLD, NULL);
+	pid = clone(ptrace_child, sp, SIGCHLD, NULL);
 	if(pid < 0)
 		fatal_perror("start_ptraced_child : clone failed");
 	CATCH_EINTR(n = waitpid(pid, &status, WUNTRACED));
@@ -123,7 +117,6 @@ static int start_ptraced_child(void **st
 		fatal("check_ptrace : expected SIGSTOP, got status = %d",
 		      status);
 
-	*stack_out = stack;
 	return pid;
 }
 
@@ -133,7 +126,7 @@ static int start_ptraced_child(void **st
  * So only for SYSEMU features we test mustpanic, while normal host features
  * must work anyway!
  */
-static int stop_ptraced_child(int pid, void *stack, int exitcode,
+static int stop_ptraced_child(int pid, int exitcode,
 			      int mustexit)
 {
 	int status, n, ret = 0;
@@ -154,8 +147,6 @@ static int stop_ptraced_child(int pid, v
 		ret = -1;
 	}
 
-	if(munmap(stack, UM_KERN_PAGE_SIZE) < 0)
-		fatal_perror("check_ptrace : munmap failed");
 	return ret;
 }
 
@@ -207,13 +198,12 @@ __uml_setup("nosysemu", nosysemu_cmd_par
 
 static void __init check_sysemu(void)
 {
-	void *stack;
 	unsigned long regs[MAX_REG_NR];
 	int pid, n, status, count=0;
 
 	non_fatal("Checking syscall emulation patch for ptrace...");
 	sysemu_supported = 0;
-	pid = start_ptraced_child(&stack);
+	pid = start_ptraced_child();
 
 	if(ptrace(PTRACE_SYSEMU, pid, 0, 0) < 0)
 		goto fail;
@@ -240,7 +230,7 @@ static void __init check_sysemu(void)
 		goto fail;
 	}
 
-	if (stop_ptraced_child(pid, stack, 0, 0) < 0)
+	if (stop_ptraced_child(pid, 0, 0) < 0)
 		goto fail_stopped;
 
 	sysemu_supported = 1;
@@ -248,7 +238,7 @@ static void __init check_sysemu(void)
 	set_using_sysemu(!force_sysemu_disabled);
 
 	non_fatal("Checking advanced syscall emulation patch for ptrace...");
-	pid = start_ptraced_child(&stack);
+	pid = start_ptraced_child();
 
 	if((ptrace(PTRACE_OLDSETOPTIONS, pid, 0,
 		   (void *) PTRACE_O_TRACESYSGOOD) < 0))
@@ -279,7 +269,7 @@ static void __init check_sysemu(void)
 			fatal("check_ptrace : expected SIGTRAP or "
 			      "(SIGTRAP | 0x80), got status = %d", status);
 	}
-	if (stop_ptraced_child(pid, stack, 0, 0) < 0)
+	if (stop_ptraced_child(pid, 0, 0) < 0)
 		goto fail_stopped;
 
 	sysemu_supported = 2;
@@ -290,18 +280,17 @@ static void __init check_sysemu(void)
 	return;
 
 fail:
-	stop_ptraced_child(pid, stack, 1, 0);
+	stop_ptraced_child(pid, 1, 0);
 fail_stopped:
 	non_fatal("missing\n");
 }
 
 static void __init check_ptrace(void)
 {
-	void *stack;
 	int pid, syscall, n, status;
 
 	non_fatal("Checking that ptrace can change system call numbers...");
-	pid = start_ptraced_child(&stack);
+	pid = start_ptraced_child();
 
 	if((ptrace(PTRACE_OLDSETOPTIONS, pid, 0,
 		   (void *) PTRACE_O_TRACESYSGOOD) < 0))
@@ -331,7 +320,7 @@ static void __init check_ptrace(void)
 			break;
 		}
 	}
-	stop_ptraced_child(pid, stack, 0, 1);
+	stop_ptraced_child(pid, 0, 1);
 	non_fatal("OK\n");
 	check_sysemu();
 }
@@ -412,11 +401,10 @@ __uml_setup("noptraceldt", noptraceldt_c
 static inline void check_skas3_ptrace_faultinfo(void)
 {
 	struct ptrace_faultinfo fi;
-	void *stack;
 	int pid, n;
 
 	non_fatal("  - PTRACE_FAULTINFO...");
-	pid = start_ptraced_child(&stack);
+	pid = start_ptraced_child();
 
 	n = ptrace(PTRACE_FAULTINFO, pid, 0, &fi);
 	if (n < 0) {
@@ -434,13
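The new initializer for sp in start_ptraced_child is plain address arithmetic: round the address of a local variable down to its page, then point at the last pointer-sized slot of that page. A minimal sketch of just that arithmetic, with a hypothetical helper name and the page size passed in as a parameter instead of UM_KERN_PAGE_SIZE:

```c
#include <stdint.h>

/* Hypothetical helper, not part of the patch: returns the address of the
 * last pointer-sized slot in the page that contains addr. page_size is
 * assumed to be a power of two. */
static uintptr_t top_slot_of_page(uintptr_t addr, uintptr_t page_size)
{
	return (addr & ~(page_size - 1)) + page_size - sizeof(void *);
}
```

Any address inside the same page maps to the same slot, so the result depends only on which page the local variable happens to live in.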
[RFC PATCH] 2.6.22.6 user-mode linux: before abort, we make it sure all children quit
In a stock 2.6.22.6 kernel, powering off a user-mode Linux guest (2.6.22.6
running in skas0 mode) halts the host Linux system. I think the reason is
that the kernel thread aborts because of a bug. The sys_reboot call made by
the guest process is then not trapped by the user-mode Linux kernel and is
executed by the host. I think it is better to make sure all of our child
processes quit when the user-mode Linux kernel aborts.

Signed-off-by: Lepton Wu <[EMAIL PROTECTED]>

diff -X linux-2.6.22.6/Documentation/dontdiff -pru linux-2.6.22.6/arch/um/os-Linux/util.c linux-2.6.22.6-lepton/arch/um/os-Linux/util.c
--- linux-2.6.22.6/arch/um/os-Linux/util.c	2007-09-14 17:41:10.0 +0800
+++ linux-2.6.22.6-lepton/arch/um/os-Linux/util.c	2007-09-22 13:56:05.0 +0800
@@ -106,5 +106,6 @@ int setjmp_wrapper(void (*proc)(void *, 
 void os_dump_core(void)
 {
 	signal(SIGSEGV, SIG_DFL);
+	kill(0, SIGTERM);
 	abort();
 }
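For readers unfamiliar with kill(0, ...): it sends the signal to every process in the caller's process group, the caller included. A minimal sketch of that behaviour, assuming a POSIX host; group_kill_demo is a hypothetical helper and not UML code. It runs in its own process group so the signal cannot reach the program that calls it, and the group leader ignores SIGTERM so that it survives to report the result.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if a child in the same process group was terminated by the
 * SIGTERM sent with kill(0, SIGTERM), 0 if it was not, -1 on setup failure. */
static int group_kill_demo(void)
{
	int status;
	pid_t leader = fork();

	if (leader < 0)
		return -1;
	if (leader == 0) {
		pid_t child;

		/* New process group, so the caller of the demo is not signalled. */
		if (setpgid(0, 0) < 0)
			_exit(2);
		child = fork();
		if (child < 0)
			_exit(2);
		if (child == 0) {
			/* Stays in the leader's group; default SIGTERM action applies. */
			for (;;)
				pause();
		}
		/* kill(0, ...) also signals the caller; ignore it to report back. */
		signal(SIGTERM, SIG_IGN);
		kill(0, SIGTERM);
		if (waitpid(child, &status, 0) != child)
			_exit(2);
		_exit(WIFSIGNALED(status) && WTERMSIG(status) == SIGTERM ? 0 : 1);
	}
	if (waitpid(leader, &status, 0) != leader || !WIFEXITED(status))
		return -1;
	if (WEXITSTATUS(status) == 0)
		return 1;
	if (WEXITSTATUS(status) == 1)
		return 0;
	return -1;
}
```

Because the caller is part of its own group, a process that keeps the default SIGTERM action would normally be terminated by the signal itself; whether that matters for os_dump_core depends on how SIGTERM is handled there, which the patch does not show.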
[PATCH] 2.6.22.6 user-mode linux: use address instead of value as argument in os_free_irq_by_cb
Hi,

There is a bug in os_free_irq_by_cb: when the first element of the
active_fds list is freed, the value of active_fds is not updated; only the
copy on the stack is updated.

The interesting thing is that without this patch, a poweroff in a user-mode
Linux guest halts the host Linux system. It seems that after the tracing
thread dies, the sys_reboot syscall of the traced thread is executed by the
host. I don't know whether that is another bug.

Signed-off-by: Lepton Wu <[EMAIL PROTECTED]>

diff -X linux-2.6.22.6/Documentation/dontdiff -pru linux-2.6.22.6/arch/um/include/os.h linux-2.6.22.6-lepton/arch/um/include/os.h
--- linux-2.6.22.6/arch/um/include/os.h	2007-09-14 17:41:10.0 +0800
+++ linux-2.6.22.6-lepton/arch/um/include/os.h	2007-09-22 12:15:28.0 +0800
@@ -325,7 +325,7 @@ extern void reboot_skas(void);
 extern int os_waiting_for_events(struct irq_fd *active_fds);
 extern int os_create_pollfd(int fd, int events, void *tmp_pfd, int size_tmpfds);
 extern void os_free_irq_by_cb(int (*test)(struct irq_fd *, void *), void *arg,
-		struct irq_fd *active_fds, struct irq_fd ***last_irq_ptr2);
+		struct irq_fd **active_fds_ptr, struct irq_fd ***last_irq_ptr2);
 extern void os_free_irq_later(struct irq_fd *active_fds,
 		int irq, void *dev_id);
 extern int os_get_pollfd(int i);
diff -X linux-2.6.22.6/Documentation/dontdiff -pru linux-2.6.22.6/arch/um/kernel/irq.c linux-2.6.22.6-lepton/arch/um/kernel/irq.c
--- linux-2.6.22.6/arch/um/kernel/irq.c	2007-09-14 17:41:10.0 +0800
+++ linux-2.6.22.6-lepton/arch/um/kernel/irq.c	2007-09-22 12:15:05.0 +0800
@@ -218,7 +218,7 @@ static void free_irq_by_cb(int (*test)(s
 	unsigned long flags;
 
 	spin_lock_irqsave(&irq_lock, flags);
-	os_free_irq_by_cb(test, arg, active_fds, &last_irq_ptr);
+	os_free_irq_by_cb(test, arg, &active_fds, &last_irq_ptr);
 	spin_unlock_irqrestore(&irq_lock, flags);
 }
 
diff -X linux-2.6.22.6/Documentation/dontdiff -pru linux-2.6.22.6/arch/um/os-Linux/irq.c linux-2.6.22.6-lepton/arch/um/os-Linux/irq.c
--- linux-2.6.22.6/arch/um/os-Linux/irq.c	2007-09-14 17:41:10.0 +0800
+++ linux-2.6.22.6-lepton/arch/um/os-Linux/irq.c	2007-09-22 12:15:42.0 +0800
@@ -84,12 +84,12 @@ int os_create_pollfd(int fd, int events,
 }
 
 void os_free_irq_by_cb(int (*test)(struct irq_fd *, void *), void *arg,
-		struct irq_fd *active_fds, struct irq_fd ***last_irq_ptr2)
+		struct irq_fd **active_fds_ptr, struct irq_fd ***last_irq_ptr2)
 {
 	struct irq_fd **prev;
 	int i = 0;
 
-	prev = &active_fds;
+	prev = active_fds_ptr;
 	while (*prev != NULL) {
 		if ((*test)(*prev, arg)) {
 			struct irq_fd *old_fd = *prev;
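The bug described in this mail is the classic "list head passed by value" problem. A minimal sketch with a hypothetical node type (not the real struct irq_fd) shows the difference. Both functions walk the list with the same pointer-to-pointer idiom as os_free_irq_by_cb and differ only in where the walk starts; nothing is freed here, to keep the example short.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct irq_fd. */
struct node {
	struct node *next;
	int fd;
};

/* Buggy shape: "head" is a copy of the caller's list head, so unlinking
 * the first node only changes the copy on this function's stack. */
static void unlink_fd_by_value(struct node *head, int fd)
{
	struct node **prev = &head;

	while (*prev != NULL) {
		if ((*prev)->fd == fd) {
			*prev = (*prev)->next;
			continue;
		}
		prev = &(*prev)->next;
	}
}

/* Fixed shape: the caller passes the address of its list head, so
 * unlinking the first node updates the caller's variable. */
static void unlink_fd_by_address(struct node **head_ptr, int fd)
{
	struct node **prev = head_ptr;

	while (*prev != NULL) {
		if ((*prev)->fd == fd) {
			*prev = (*prev)->next;
			continue;
		}
		prev = &(*prev)->next;
	}
}
```

Removing a node from the middle of the list works with either version, because the update then goes through a `next` field that really lives in the list. Only removal of the first node is lost in the by-value version, and in the real code the caller would be left holding a pointer to a node that has already been freed.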
Re: [PATCH RESEND] 2.6.22.6 networking [ipv4]: fix wrong destination when reply packetes
Currently icmp_reply is only called by icmp_echo and icmp_timestamp, and
ip_send_reply is only called by tcp_v4_send_reset and tcp_v4_send_ack.

I think that in all of these situations ip_hdr(skb)->saddr is set and should
be the destination of the reply packets. If using rt->rt_src as the
destination is correct in some situation, can anyone give me an example?

I think it may be a copy and paste from code like ip_build_and_send_pkt, but
reply packets in these situations (icmp_echo, icmp_timestamp,
tcp_v4_send_ack and tcp_v4_send_reset) are different. I think we can just
use ip_hdr(skb)->saddr as the destination address.

On Thu, Sep 20, 2007 at 09:35:09PM -0700, David Stevens wrote:
> I'm not sure why it's using rt_src here, but there are relevant cases that
> your description doesn't cover. For example, what happens if the source
> is not set in the original packet? Does NAT affect this?
>
> You quote RFC text for ICMP echo and the case where the receiving machine
> is the final destination, but you're modifying code that is used for all
> ICMP types and used for ICMP errors generated when acting as an
> intermediate router.
>
> In ordinary cases, and certainly with ICMP echo when the source is set in
> the original packet and no rewriting is going on (and the address is not
> spoofed), using the original source as the destination is fine. But have
> you tested or considered the other cases?
>
> +-DLS
[PATCH RESEND] 2.6.22.6 networking [ipv4]: fix wrong destination when reply packetes
Hi,

This is a resend of this patch with more details. I hope it can be accepted
this time.

The problem:

In the icmp_reply and ip_send_reply functions, we currently use rt->rt_src
as the destination when sending out packets. For packets received on the
loopback device, this is sometimes wrong. Here is an example (NOTE: the eth0
address is set to 10.10.10.1/24):

# tcpdump -n -i lo icmp &
[1] 3155
...
# hping3 --icmp --spoof 10.10.10.3 10.10.10.1
...
09:53:49.508449 IP 10.10.10.3 > 10.10.10.1: icmp 8: echo request seq 0
09:53:49.508482 IP 10.10.10.1 > 10.10.10.1: icmp 8: echo reply seq 0
09:53:50.525560 IP 10.10.10.3 > 10.10.10.1: icmp 8: echo request seq 256
09:53:50.525589 IP 10.10.10.1 > 10.10.10.1: icmp 8: echo reply seq 256

The same thing happens for TCP:

# hping3 --syn --destport 1234 --spoof 10.10.10.3 10.10.10.1
(NOTE: no service is listening on port 1234)
...
10:02:59.395715 IP 10.10.10.3.2787 > 10.10.10.1.1234: S 72057069:72057069(0) win 512 <mss 1414>
10:02:59.395746 IP 10.10.10.1.1234 > 10.10.10.1.2787: R 0:0(0) ack 72057070 win 0

As you can see, the destination address of every reply is wrong.

This problem comes from the fact that route selection for packets travelling
over the loopback device only happens once. When we send out packets on the
loopback device, skb->dst is assigned in ip_route_output. It is not
reassigned in the packet receive path. So rt->rt_src does not equal
ip_hdr(skb)->saddr.

I don't know why we must use rt->rt_src as the destination address. At least
for ICMP reply packets, I think we should use ip_hdr(skb)->saddr as the
destination address. This is according to RFC792:

...
   Addresses

      The address of the source in an echo message will be the
      destination of the echo reply message. To form an echo reply
      message, the source and destination addresses are simply reversed,
      the type code changed to 0, and the checksum recomputed.

A possible fix is to do ip_route_input in ip_rcv_finish for packets received
on the loopback device. But I think simply using ip_hdr(skb)->saddr instead
of rt->rt_src as the destination of reply packets is a simpler fix.

Thanks to Kenan Kalajdzic <[EMAIL PROTECTED]> for helping me with more
details about this problem.

Signed-off-by: Lepton Wu <[EMAIL PROTECTED]>

diff -X linux-2.6.22.6/Documentation/dontdiff -pru linux-2.6.22.6/net/ipv4/icmp.c linux-2.6.22.6-lepton/net/ipv4/icmp.c
--- linux-2.6.22.6/net/ipv4/icmp.c	2007-09-14 17:41:18.0 +0800
+++ linux-2.6.22.6-lepton/net/ipv4/icmp.c	2007-09-18 09:57:30.0 +0800
@@ -382,6 +382,7 @@ static void icmp_reply(struct icmp_bxm *
 	struct ipcm_cookie ipc;
 	struct rtable *rt = (struct rtable *)skb->dst;
 	__be32 daddr;
+	struct iphdr *ip = ip_hdr(skb);
 
 	if (ip_options_echo(&icmp_param->replyopts, skb))
 		return;
@@ -393,7 +394,7 @@ static void icmp_reply(struct icmp_bxm *
 		icmp_out_count(icmp_param->data.icmph.type);
 
 		inet->tos = ip_hdr(skb)->tos;
-		daddr = ipc.addr = rt->rt_src;
+		daddr = ipc.addr = ip->saddr;
 		ipc.opt = NULL;
 		if (icmp_param->replyopts.optlen) {
 			ipc.opt = &icmp_param->replyopts;
diff -X linux-2.6.22.6/Documentation/dontdiff -pru linux-2.6.22.6/net/ipv4/ip_output.c linux-2.6.22.6-lepton/net/ipv4/ip_output.c
--- linux-2.6.22.6/net/ipv4/ip_output.c	2007-09-14 17:41:18.0 +0800
+++ linux-2.6.22.6-lepton/net/ipv4/ip_output.c	2007-09-18 09:57:13.0 +0800
@@ -1337,11 +1337,12 @@ void ip_send_reply(struct sock *sk, stru
 	struct ipcm_cookie ipc;
 	__be32 daddr;
 	struct rtable *rt = (struct rtable*)skb->dst;
+	struct iphdr *ip = ip_hdr(skb);
 
 	if (ip_options_echo(&replyopts.opt, skb))
 		return;
 
-	daddr = ipc.addr = rt->rt_src;
+	daddr = ipc.addr = ip->saddr;
 	ipc.opt = NULL;
 
 	if (replyopts.opt.optlen) {
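The RFC 792 rule quoted in this mail can be written down in a few lines. This is a simplified sketch, not kernel code: echo_hdr, IPV4 and make_echo_reply are hypothetical names, addresses are held in host byte order for readability, and the checksum recomputation that the RFC also requires is left out.

```c
#include <stdint.h>

/* Simplified stand-in for the IPv4/ICMP header fields that matter here. */
struct echo_hdr {
	uint32_t saddr;
	uint32_t daddr;
	uint8_t  icmp_type;	/* 8 = echo request, 0 = echo reply */
};

/* Builds a host-byte-order IPv4 address from its four octets. */
#define IPV4(a, b, c, d) \
	(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
	 ((uint32_t)(c) << 8) | (uint32_t)(d))

/* RFC 792: source and destination are simply reversed and the type is
 * changed to 0. The reply's destination comes from the request header,
 * not from any routing state. */
static struct echo_hdr make_echo_reply(const struct echo_hdr *req)
{
	struct echo_hdr rep;

	rep.saddr = req->daddr;
	rep.daddr = req->saddr;
	rep.icmp_type = 0;
	return rep;
}
```

Applied to the spoofed request from the tcpdump trace (10.10.10.3 to 10.10.10.1), this yields a reply addressed to 10.10.10.3, which is what the patch aims for, instead of the 10.10.10.1 seen in the capture.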
[RFC PATCH] 2.6.22.6 netfilter: sk_setup_caps in ip_route_me_harder
Hi,

For packets with a local source address, it is better to update sk_route_caps in ip_route_me_harder.

Signed-off-by: Lepton Wu <[EMAIL PROTECTED]>

diff -pru -X linux-2.6.22.6/Documentation/dontdiff linux-2.6.22.6/net/ipv4/netfilter.c linux-2.6.22.6-lepton/net/ipv4/netfilter.c
--- linux-2.6.22.6/net/ipv4/netfilter.c	2007-09-19 13:19:13.0 +0800
+++ linux-2.6.22.6-lepton/net/ipv4/netfilter.c	2007-09-19 17:10:36.0 +0800
@@ -37,6 +37,10 @@ int ip_route_me_harder(struct sk_buff **
 		/* Drop old route. */
 		dst_release((*pskb)->dst);
 		(*pskb)->dst = &rt->u.dst;
+		if((*pskb)->sk){
+			dst_hold((*pskb)->dst);
+			sk_setup_caps((*pskb)->sk, (*pskb)->dst);
+		}
 	} else {
 		/* non-local src, find valid iif to satisfy
 		 * rp-filter when calling ip_route_input. */
Re: [PATCH] 2.6.22.6 NETWORKING [IPV4]: Always use source addr in skb to reply packet
Hi,

Sorry for my error. The problem is that the current icmp_reply and ip_send_reply will send out packets with the wrong destination address, not the wrong source address.

My point is that we should always use the source address of the packets we received as the destination address of our reply packets.

On Mon, Sep 17, 2007 at 08:14:56PM -0700, [EMAIL PROTECTED] wrote:
> On Tue, 18 Sep 2007, YOSHIFUJI Hideaki wrote:
>
> >In article <[EMAIL PROTECTED]> (at Mon, 17 Sep
> >2007 19:20:44 -0700 (PDT)), David Miller <[EMAIL PROTECTED]> says:
> >
> >>From: lepton <[EMAIL PROTECTED]>
> >>Date: Tue, 18 Sep 2007 10:16:17 +0800
> >>
> >>>Hi,
> >>>  In some situation, icmp_reply and ip_send_reply will send
> >>>  out packet with the wrong source addr, the following patch
> >>>  will fix this.
> >>>
> >>>  I don't understand why we must use rt->rt_src in the current
> >>>  code, if this is a wrong fix, please correct me.
> >>>
> >>>Signed-off-by: Lepton Wu <[EMAIL PROTECTED]>
> >>
> >>That the address is wrong is your opinion only :-)
> >>
> >>Source address selection is a rather complex topic, and
> >>here we are definitely purposefully using the source
> >>address selected by the routing lookup for the reply.
> >
> >And, if you do think something is "wrong", you need to describe it
> >in detail, at least.
>
> I missed the beginning of the discussion, so apologies if I'm way off
> base.
>
> it sounds like the question is, when a packet hits the box that causes a
> icmp_reply (or other packet) to be generated, which IP address should be
> used as the source
>
> 1. the destination address of the packet that generated the message
>
> or.
>
> 2. the IP address that the machine would use by default if the machine
> were to generate a new connection to the destination.
>
> I understand that in many cases the historical approach has been #2, but
> as more machines get multiple IP addresses on each interface, I believe
> that it's less of a surprise to other systems if the default is #1. most
> of the time the other systems don't care (and usually don't want to
> know) if the service they are contacting is on a dedicated machine or is
> just one IP among many sharing a box.
>
> it gets especially bad when you have load balancing going on and the
> results could come from multiple boxes.
>
> yes, sysadmins deal with this today, but it's a pain to do so and is a
> continuing dribble of surprises when things don't quite work the way you
> expect them to as you consolidate things onto more powerful systems (or
> distribute them among multiple systems).
>
> if the packet got to the machine and the machine is accepting it, replying
> back from the destination IP of that packet should be legitimate (it's
> what you would do if there was a full connection after all) and greatly
> reduces the cases where things change.
>
> David Lang
Re: [PATCH] 2.6.22.6 NETWORKING [IPV4]: Always use source addr in skb to reply packet
Hi,

Sorry for my previous email. What I mean is that icmp_reply and ip_send_reply will, in some situations, send out packets with the wrong DESTINATION address. The source address is always correct.
Re: [PATCH] 2.6.22.6 NETWORKING [IPV4]: Always use source addr in skb to reply packet
Hi,

Sorry for the lack of details.

Let's think about ip_send_reply. It is only called by tcp_v4_send_ack and tcp_v4_send_reset. I don't know why we need a source address different from ip_hdr(skb)->saddr.

icmp_reply is only called by icmp_echo and icmp_timestamp. Is there a situation where we need to use a source address different from ip_hdr(skb)->saddr?

My situation is: I DNAT some TCP packets to my box. Sometimes the box will reply with a reset or ack packet through tcp_v4_send_ack and tcp_v4_send_reset. When this happens, it will use rt->rt_src instead of ip_hdr(skb)->saddr, and the packet will be sent out without its source address being changed, because netfilter doesn't know that these packets belong to the DNATed connection.

Another person's situation is (quoted from an email to me):

    While conducting a research about networking, I discovered improper
    handling of ICMP echo reply messages in Linux 2.4.26. I looked into
    the code and noticed that the icmp_reply function sets the destination
    address in the reply packet to rt->rt_src. This produces strange
    results in some cases as can be easily shown with hping and tcpdump.

    Here is an example (NOTE: eth0 address is set to 10.10.10.1/24):

    # tcpdump -n -i any icmp &
    [1] 16842
    tcpdump: WARNING: Promiscuous mode not supported on the "any" device
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on any, link-type LINUX_SLL (Linux cooked), capture size 96 bytes

    # hping2 --icmp --spoof 10.10.10.3 10.10.10.1
    HPING 10.10.10.1 (eth0 10.10.10.1): icmp mode set, 28 headers + 0 data bytes
    02:16:53.206016 IP 10.10.10.3 > 10.10.10.1: icmp 8: echo request seq 0
    02:16:53.206082 IP 10.10.10.1 > 10.10.10.1: icmp 8: echo reply seq 0
    02:16:54.202123 IP 10.10.10.3 > 10.10.10.1: icmp 8: echo request seq

    If ICMP echo requests with a spoofed source address are sent to the
    address of our eth0 interface (which of course happens through the
    loopback interface), the code of icmp_reply sets the destination
    address in the reply to 10.10.10.1 instead of simply reversing the
    source and destination addresses as required by the RFC.

On Tue, Sep 18, 2007 at 11:26:44AM +0900, YOSHIFUJI Hideaki wrote:
> In article <[EMAIL PROTECTED]> (at Mon, 17 Sep 2007 19:20:44 -0700 (PDT)),
> David Miller <[EMAIL PROTECTED]> says:
>
> > From: lepton <[EMAIL PROTECTED]>
> > Date: Tue, 18 Sep 2007 10:16:17 +0800
> >
> > > Hi,
> > >   In some situation, icmp_reply and ip_send_reply will send
> > >   out packet with the wrong source addr, the following patch
> > >   will fix this.
> > >
> > >   I don't understand why we must use rt->rt_src in the current
> > >   code, if this is a wrong fix, please correct me.
> > >
> > > Signed-off-by: Lepton Wu <[EMAIL PROTECTED]>
> >
> > That the address is wrong is your opinion only :-)
> >
> > Source address selection is a rather complex topic, and
> > here we are definitely purposefully using the source
> > address selected by the routing lookup for the reply.
>
> And, if you do think something is "wrong", you need to describe it
> in detail, at least.
>
> --yoshfuji
[PATCH] 2.6.22.6 NETWORKING [IPV4]: Always use source addr in skb to reply packet
Hi,

In some situation, icmp_reply and ip_send_reply will send out packet with the wrong source addr, the following patch will fix this.

I don't understand why we must use rt->rt_src in the current code, if this is a wrong fix, please correct me.

Signed-off-by: Lepton Wu <[EMAIL PROTECTED]>

diff -X linux-2.6.22.6/Documentation/dontdiff -pru linux-2.6.22.6/net/ipv4/icmp.c linux-2.6.22.6-lepton/net/ipv4/icmp.c
--- linux-2.6.22.6/net/ipv4/icmp.c	2007-09-14 17:41:18.0 +0800
+++ linux-2.6.22.6-lepton/net/ipv4/icmp.c	2007-09-18 09:57:30.0 +0800
@@ -382,6 +382,7 @@ static void icmp_reply(struct icmp_bxm *
 	struct ipcm_cookie ipc;
 	struct rtable *rt = (struct rtable *)skb->dst;
 	__be32 daddr;
+	struct iphdr *ip = ip_hdr(skb);
 
 	if (ip_options_echo(&icmp_param->replyopts, skb))
 		return;
@@ -393,7 +394,7 @@ static void icmp_reply(struct icmp_bxm *
 	icmp_out_count(icmp_param->data.icmph.type);
 
 	inet->tos = ip_hdr(skb)->tos;
-	daddr = ipc.addr = rt->rt_src;
+	daddr = ipc.addr = ip->saddr;
 	ipc.opt = NULL;
 	if (icmp_param->replyopts.optlen) {
 		ipc.opt = &icmp_param->replyopts;
diff -X linux-2.6.22.6/Documentation/dontdiff -pru linux-2.6.22.6/net/ipv4/ip_output.c linux-2.6.22.6-lepton/net/ipv4/ip_output.c
--- linux-2.6.22.6/net/ipv4/ip_output.c	2007-09-14 17:41:18.0 +0800
+++ linux-2.6.22.6-lepton/net/ipv4/ip_output.c	2007-09-18 09:57:13.0 +0800
@@ -1337,11 +1337,12 @@ void ip_send_reply(struct sock *sk, stru
 	struct ipcm_cookie ipc;
 	__be32 daddr;
 	struct rtable *rt = (struct rtable*)skb->dst;
+	struct iphdr *ip = ip_hdr(skb);
 
 	if (ip_options_echo(&replyopts.opt, skb))
 		return;
 
-	daddr = ipc.addr = rt->rt_src;
+	daddr = ipc.addr = ip->saddr;
 	ipc.opt = NULL;
 
 	if (replyopts.opt.optlen) {
[PATCH] 2.6.22.6 reiserfs: work around for dead loop in finish_unfinished
Hi,

There is a possible dead loop in the finish_unfinished function.

In most situations, the call chain iput -> ... -> reiserfs_delete_inode -> remove_save_link will succeed. But for some reason, such as data corruption, reiserfs_delete_inode can fail in reiserfs_do_truncate -> search_for_position_by_key. Then remove_save_link is never called, and we always get the same "save_link_key" in the while loop in finish_unfinished.

The following patch adds a check for the possible dead loop and just removes the save link when a dead loop is detected. (against 2.6.22.6)

Signed-off-by: Lepton Wu <[EMAIL PROTECTED]>

diff -X linux-2.6.22.6/Documentation/dontdiff -pru linux-2.6.22.6/fs/reiserfs/super.c linux-2.6.22.6-lepton/fs/reiserfs/super.c
--- linux-2.6.22.6/fs/reiserfs/super.c	2007-09-14 17:41:15.00000 +0800
+++ linux-2.6.22.6-lepton/fs/reiserfs/super.c	2007-09-16 19:06:39.0 +0800
@@ -144,7 +144,7 @@ static int finish_unfinished(struct supe
 {
 	INITIALIZE_PATH(path);
 	struct cpu_key max_cpu_key, obj_key;
-	struct reiserfs_key save_link_key;
+	struct reiserfs_key save_link_key, last_inode_key;
 	int retval = 0;
 	struct item_head *ih;
 	struct buffer_head *bh;
@@ -165,6 +165,8 @@ static int finish_unfinished(struct supe
 	set_cpu_key_k_offset(&max_cpu_key, ~0U);
 	max_cpu_key.key_length = 3;
 
+	memset(&last_inode_key, 0, sizeof(last_inode_key));
+
 #ifdef CONFIG_QUOTA
 	/* Needed for iput() to work correctly and not trash data */
 	if (s->s_flags & MS_ACTIVE) {
@@ -277,8 +279,16 @@ static int finish_unfinished(struct supe
 			REISERFS_I(inode)->i_flags |= i_link_saved_unlink_mask;
 			/* not completed unlink (rmdir) found */
 			reiserfs_info(s, "Removing %k..", INODE_PKEY(inode));
-			/* removal gets completed in iput */
-			retval = 0;
+			if(memcmp(&last_inode_key, INODE_PKEY(inode),
+					sizeof(last_inode_key))){
+				last_inode_key = *INODE_PKEY(inode);
+				/* removal gets completed in iput */
+				retval = 0;
+			} else {
+				reiserfs_warning(s, "Dead loop in finish_unfinished detected, just remove save link\n");
+				retval = remove_save_link_only(s,
+						&save_link_key, 0);
+			}
 		}
 
 		iput(inode);
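The idea behind the patch above, remembering the key handled in the previous iteration and forcing removal when the same key comes back, can be sketched in userspace C as below. Everything here is hypothetical scaffolding for illustration (struct demo_key, struct demo_queue, drop_head, process_all); it does not model reiserfs itself, only the loop-breaking technique. Like the patch, the sketch starts from an all-zero "last" key, so a genuine all-zero key would be treated as a repeat on its first appearance.

```c
#include <string.h>

/* Hypothetical stand-in for a save-link key. */
struct demo_key {
	unsigned int dir_id;
	unsigned int objectid;
};

/* Hypothetical list of pending save links; keys[0] is found first. */
struct demo_queue {
	struct demo_key keys[8];
	int count;
	int stuck;	/* nonzero: normal processing never removes the head */
	int forced;	/* number of forced removals performed */
};

static void drop_head(struct demo_queue *q)
{
	memmove(q->keys, q->keys + 1, (q->count - 1) * sizeof(q->keys[0]));
	q->count--;
}

/* Process every pending key; returns the number of loop iterations. */
static int process_all(struct demo_queue *q)
{
	struct demo_key last;
	int iterations = 0;

	memset(&last, 0, sizeof(last));
	while (q->count > 0) {
		struct demo_key cur = q->keys[0];

		iterations++;
		if (memcmp(&last, &cur, sizeof(last)) != 0) {
			/* New key: remember it and try the normal path. */
			last = cur;
			if (!q->stuck)
				drop_head(q);
		} else {
			/* Same key twice in a row: the normal path failed,
			 * so remove the entry directly to break the loop. */
			drop_head(q);
			q->forced++;
		}
	}
	return iterations;
}
```

With stuck set, each key costs two iterations (one failed attempt, one forced removal) instead of looping forever; with stuck clear, the forced path is never taken.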
[PATCH] 2.6.22.6 fix kernel panic on corrupted reiserfs directory
Hi,

When reading corrupted reiserfs directory data, d_reclen could be a negative number or a big positive number. This can lead to a kernel panic or oops. The following patch adds a sanity check. (against 2.6.22.6)

Signed-off-by: Lepton Wu <[EMAIL PROTECTED]>

diff -X linux-2.6.22.6-lepton/Documentation/dontdiff -pru linux-2.6.22.6/fs/reiserfs/dir.c linux-2.6.22.6-lepton/fs/reiserfs/dir.c
--- linux-2.6.22.6/fs/reiserfs/dir.c	2007-09-14 17:41:15.0 +0800
+++ linux-2.6.22.6-lepton/fs/reiserfs/dir.c	2007-09-14 18:02:10.0 +0800
@@ -121,6 +121,16 @@ static int reiserfs_readdir(struct file
 					continue;
 				d_reclen = entry_length(bh, ih, entry_num);
 				d_name = B_I_DEH_ENTRY_FILE_NAME(bh, ih, deh);
+
+				if (d_reclen <= 0 ||
+				    d_name + d_reclen > bh->b_data + bh->b_size) {
+					/* There is corrupted data in entry,
+					 * We'd better stop here */
+					pathrelse(&path_to_entry);
+					ret = -EIO;
+					goto out;
+				}
+
 				if (!d_name[d_reclen - 1])
 					d_reclen = strlen(d_name);
 
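The sanity check in the patch above rejects a name length that is non-positive or that would run past the end of the block buffer. A minimal userspace sketch of the same test follows. The function entry_in_block and its parameters are hypothetical names, and the sketch deliberately works on an offset into the block rather than on pointers, which is a variation on the patch and not what the patch itself does; the point is that the comparison cannot wrap however large d_reclen is.

```c
#include <stddef.h>

/*
 * Return 1 if a name of d_reclen bytes starting name_off bytes into a
 * block of b_size bytes lies entirely inside the block, 0 otherwise.
 */
static int entry_in_block(size_t b_size, size_t name_off, int d_reclen)
{
	if (d_reclen <= 0)		/* negative or empty: corrupted */
		return 0;
	if (name_off >= b_size)		/* name starts outside the block */
		return 0;
	/* b_size - name_off cannot underflow after the check above. */
	return (size_t)d_reclen <= b_size - name_off;
}
```

For a 4096-byte block, a 6-byte name at offset 4090 is accepted and a 7-byte name at the same offset is rejected, as is any length of 0 or below.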
I have two 1394 port in my computer, why did I get only one eth1394 interface?
Hi,

My computer has two 1394 ports: one is on the front panel and the other is on the back. I found that with Linux 1394 ethernet support, I only get one ethernet device, named eth1. After reading the code, I found that the author says "This is where we add all of our ethernet * devices. One for each host."

So my questions are:

1. Is it possible to use every 1394 port as an ethernet device?
2. If not, which port should I plug my firewire cable into?
3. If I must plug my firewire cable into a particular port, can I change the default port to use?

Thanks.
Re: [PATCH] 2.6.20.4 fix kernel panic on corrupted reiserfs directory
Yes, you are right. I need more work on my trivial patch.

On Thu, Apr 05, 2007 at 01:34:42PM +0600, Alexander E. Patrakov wrote:
> lepton wrote:
> >Hi,
> >  When reading corrupted reiserfs directory data, d_reclen
> >  could be a negative number, then memcpy will overflow
> >  kernel stack. This can lead to kernel panic.
> >  The following patch adds a sanity check. (against 2.6.20.4)
>
> Is it possible to get a large positive number here due to other fs
> corruption and bypass your sanity check? If I read the code correctly, this
> would still oops in the "if" statement just below the part you patched.
>
> >Signed-off-by: Lepton Wu <[EMAIL PROTECTED]>
> >
> >diff -pru linux-2.6/fs/reiserfs/dir.c linux-2.6-lepton/fs/reiserfs/dir.c
> >--- linux-2.6/fs/reiserfs/dir.c	2007-02-20 14:34:32.0 +0800
> >+++ linux-2.6-lepton/fs/reiserfs/dir.c	2007-04-05 14:35:58.0 +0800
> >@@ -121,6 +121,11 @@ static int reiserfs_readdir(struct file
> > 					/* it is hidden entry */
> > 					continue;
> > 				d_reclen = entry_length(bh, ih, entry_num);
> >+				if (d_reclen < 0) {
> >+					pathrelse(&path_to_entry);
> >+					ret = -EIO;
> >+					goto out;
> >+				}
> > 				d_name = B_I_DEH_ENTRY_FILE_NAME(bh, ih, deh);
> > 				if (!d_name[d_reclen - 1])
> > 					d_reclen = strlen(d_name);
[PATCH] 2.6.20.4 fix kernel panic on corrupted reiserfs directory
Hi,

When reading corrupted reiserfs directory data, d_reclen could be a negative number, then memcpy will overflow kernel stack. This can lead to kernel panic. The following patch adds a sanity check. (against 2.6.20.4)

Signed-off-by: Lepton Wu <[EMAIL PROTECTED]>

diff -pru linux-2.6/fs/reiserfs/dir.c linux-2.6-lepton/fs/reiserfs/dir.c
--- linux-2.6/fs/reiserfs/dir.c	2007-02-20 14:34:32.0 +0800
+++ linux-2.6-lepton/fs/reiserfs/dir.c	2007-04-05 14:35:58.0 +0800
@@ -121,6 +121,11 @@ static int reiserfs_readdir(struct file
 					/* it is hidden entry */
 					continue;
 				d_reclen = entry_length(bh, ih, entry_num);
+				if (d_reclen < 0) {
+					pathrelse(&path_to_entry);
+					ret = -EIO;
+					goto out;
+				}
 				d_name = B_I_DEH_ENTRY_FILE_NAME(bh, ih, deh);
 				if (!d_name[d_reclen - 1])
 					d_reclen = strlen(d_name);
[PATCH] 2.6.20.4 fix kernel panic on corrupted reiserfs directory
Hi, When reading corrupted reiserfs directory data, d_reclen could be a negative number, then memcpy will overflow kernel stack. This can lead to kernel panic. The following patch adds a sanity check. (against 2.6.20.4) Signed-off-by: Lepton Wu [EMAIL PROTECTED] diff -pru linux-2.6/fs/reiserfs/dir.c linux-2.6-lepton/fs/reiserfs/dir.c --- linux-2.6/fs/reiserfs/dir.c 2007-02-20 14:34:32.0 +0800 +++ linux-2.6-lepton/fs/reiserfs/dir.c 2007-04-05 14:35:58.0 +0800 @@ -121,6 +121,11 @@ static int reiserfs_readdir(struct file /* it is hidden entry */ continue; d_reclen = entry_length(bh, ih, entry_num); + if (d_reclen 0) { + pathrelse(path_to_entry); + ret = -EIO; + goto out; + } d_name = B_I_DEH_ENTRY_FILE_NAME(bh, ih, deh); if (!d_name[d_reclen - 1]) d_reclen = strlen(d_name); O - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] 2.6.20.4 fix kernel panic on corrupted reiserfs directory
Yes, you are right. My trivial patch needs more work.

On Thu, Apr 05, 2007 at 01:34:42PM +0600, Alexander E. Patrakov wrote:
> lepton wrote:
> > Hi,
> > When reading corrupted reiserfs directory data, d_reclen could be
> > a negative number, then memcpy will overflow kernel stack. This
> > can lead to kernel panic. The following patch adds a sanity check.
> > (against 2.6.20.4)
>
> Is it possible to get a large positive number here due to other fs
> corruption and bypass your sanity check? If I read the code correctly,
> this would still oops in the if statement just below the part you
> patched.
>
> > Signed-off-by: Lepton Wu <[EMAIL PROTECTED]>
> >
> > diff -pru linux-2.6/fs/reiserfs/dir.c linux-2.6-lepton/fs/reiserfs/dir.c
> > --- linux-2.6/fs/reiserfs/dir.c	2007-02-20 14:34:32.0 +0800
> > +++ linux-2.6-lepton/fs/reiserfs/dir.c	2007-04-05 14:35:58.0 +0800
> > @@ -121,6 +121,11 @@ static int reiserfs_readdir(struct file
> >  				/* it is hidden entry */
> >  				continue;
> >  			d_reclen = entry_length(bh, ih, entry_num);
> > +			if (d_reclen < 0) {
> > +				pathrelse(&path_to_entry);
> > +				ret = -EIO;
> > +				goto out;
> > +			}
> >  			d_name = B_I_DEH_ENTRY_FILE_NAME(bh, ih, deh);
> >  			if (!d_name[d_reclen - 1])
> >  				d_reclen = strlen(d_name);
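Alexander's objection can be sketched as a two-sided check. This is only an illustration, not the fix that went into the kernel: reclen_is_sane() and MAX_NAME_LEN are hypothetical names, and the limit of 255 is an assumed value chosen for the example. The lower bound is set to 1 here because the statement after the patched lines reads d_name[d_reclen - 1].

```c
#include <stdbool.h>

/* Assumed upper limit for an entry name; hypothetical value for this sketch. */
#define MAX_NAME_LEN 255

/* Accept only lengths that are safe for both d_name[reclen - 1]
 * and a later copy of reclen bytes. */
static bool reclen_is_sane(int reclen)
{
	return reclen >= 1 && reclen <= MAX_NAME_LEN;
}
```

A check of this shape rejects both the negative value from the original report and a large positive value produced by other corruption.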
[PATCH 2.6.12.5]error condition fix in usbnet
Hi!

I think this condition is strange; it could be a typo. See the following patch.

Signed-off-by: Wu Tao <[EMAIL PROTECTED]>

diff -pru linux-2.6-curr/drivers/usb/net/usbnet.c linux-2.6-curr-lepton/drivers/usb/net/usbnet.c
--- linux-2.6-curr/drivers/usb/net/usbnet.c	2005-06-30 07:00:53.0 +0800
+++ linux-2.6-curr-lepton/drivers/usb/net/usbnet.c	2005-08-24 11:26:49.0 +0800
@@ -3807,7 +3807,7 @@ usbnet_probe (struct usb_interface *udev
 		if ((dev->driver_info->flags & FLAG_ETHER) != 0
 				&& (net->dev_addr [0] & 0x02) == 0)
 			strcpy (net->name, "eth%d");
-	} else if (!info->in || info->out)
+	} else if (!info->in || !info->out)
 		status = get_endpoints (dev, udev);
 	else {
 		dev->in = usb_rcvbulkpipe (xdev, info->in);
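The difference between the two conditions is easiest to see as a truth table. The sketch below assumes that info->in and info->out hold endpoint numbers supplied by the driver, with 0 meaning "not supplied"; the function names old_condition() and new_condition() are made up for this example.

```c
#include <stdbool.h>

/* Condition before the patch: scan for endpoints if "in" is missing
 * OR "out" is present. */
static bool old_condition(int in, int out)
{
	return !in || out;
}

/* Condition after the patch: scan for endpoints if either one is missing. */
static bool new_condition(int in, int out)
{
	return !in || !out;
}
```

Under that assumption, a driver that supplies both endpoints (for example in = 1, out = 2) is sent to get_endpoints() by the old condition even though nothing is missing, while a driver that supplies only "in" skips the scan and ends up using an unset "out" value. The patched condition reverses both outcomes.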
[PATCH 2.6.12.5]fix gl_skb/skb type error in genelink driver in usbnet in 2.6
Hi!

I think a typo was introduced when the genelink driver was ported to 2.6. With this error, a Linux host will panic when it is linked with a Windows host. See the following patch.

diff -pruNX linux-2.6-curr/Documentation/dontdiff linux-2.6-curr/drivers/usb/net/usbnet.c linux-2.6-curr-lepton/drivers/usb/net/usbnet.c
--- linux-2.6-curr/drivers/usb/net/usbnet.c	2005-06-30 07:00:53.0 +0800
+++ linux-2.6-curr-lepton/drivers/usb/net/usbnet.c	2005-08-22 13:55:18.0 +0800
@@ -1922,7 +1922,7 @@ static int genelink_rx_fixup (struct usb
 
 			// copy the packet data to the new skb
 			memcpy(skb_put(gl_skb, size), packet->packet_data, size);
-			skb_return (dev, skb);
+			skb_return (dev, gl_skb);
 		}
 
 		// advance to the next packet
Why don't register a d_delete function for externfs_dentry_ops in 2.4 kernel?
Hi!

I read the code of linux-2.4.31/arch/um/fs/hostfs/externfs.c. I found that you have defined a function named exterfs_d_delete, but you don't register this function in externfs_dentry_ops. This behavior is different from the hostfs code in the 2.6 kernel. It leads to strange problems like this:

On the guest UML box:
ls -l /tmp/nonexist
(the result is "file not found", which is correct)

In the directory on the host OS:
touch /tmp/nonexist

On the guest UML box again:
ls -l /tmp/nonexist
(the result is still "file not found", an incorrect negative dcache hit)

I don't know why the 2.4 kernel has not fixed it. What do you think about it?

Thanks
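The stale result described above can be modelled with a small user-space toy. This is not kernel code: struct toy_fs and toy_lookup() are hypothetical, and the drop_on_release flag stands in for a registered d_delete hook that tells the dcache not to keep the entry once the last reference is dropped.

```c
#include <stdbool.h>

/* Toy model: one host file and one cached lookup result for it. */
struct toy_fs {
	bool host_has_file;   /* does the file exist on the host? */
	bool cached;          /* is a lookup result cached in the guest? */
	bool cached_exists;   /* cached answer (false = negative entry) */
	bool drop_on_release; /* models a d_delete hook that discards the entry */
};

/* Look the file up from the guest, then drop the reference again. */
static bool toy_lookup(struct toy_fs *fs)
{
	bool exists;

	if (fs->cached) {
		/* Cache hit: the host is not consulted at all. */
		exists = fs->cached_exists;
	} else {
		exists = fs->host_has_file;
		fs->cached = true;
		fs->cached_exists = exists;
	}

	/* Last reference dropped: keep or discard the cached entry. */
	if (fs->drop_on_release)
		fs->cached = false;

	return exists;
}
```

Without the flag, a file created on the host after the first lookup stays invisible to the guest because the negative entry keeps being served. With the flag set, every lookup goes back to the host.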