query re unlink() ... inotify ... open() race

2015-09-11 Thread Pádraig Brady
Hi,

We're noticing a rare race here with open() in tail(1), where this happens:

  tail --follow=name "file"
  /* "file" is unlink()ed by another process */
  read(IN_ATTRIB from inotify); /* for st_nlink-- */
  open("file"); /* Done to check if deleted, but this succeeds! */

The open() succeeding is surprising. Is that allowed?
The summary of the sequence in the kernel is:

  vfs_unlink() {
    mutex_lock(&(dentry->d_inode->i_mutex));
    security_inode_unlink(dir, dentry);
    try_break_deleg(target, delegated_inode);
    dir->i_op->unlink(dir, dentry);
    dont_mount(dentry);
    detach_mounts(dentry);
    mutex_unlock(&(dentry->d_inode->i_mutex));

    fsnotify_link_count(target);
    d_delete(dentry);
  }

thanks,
Pádraig.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 1/2] mm: hugetlb: proc: add HugetlbPages field to /proc/PID/smaps

2015-09-07 Thread Pádraig Brady
On 07/09/15 10:52, Pádraig Brady wrote:
> On 07/09/15 07:46, Naoya Horiguchi wrote:
>> On Mon, Sep 07, 2015 at 02:23:44AM +, Horiguchi Naoya(堀口 直也) wrote:
>>> On Mon, Sep 07, 2015 at 02:29:53AM +0100, Pádraig Brady wrote:
>>>> On 20/08/15 09:26, Naoya Horiguchi wrote:
>>>>> Currently /proc/PID/smaps provides no usage info for vma(VM_HUGETLB), which
>>>>> is inconvenient when we want to know per-task or per-vma base hugetlb usage.
>>>>> To solve this, this patch adds a new line for hugetlb usage like below:
>>>>>
>>>>>   Size:              20480 kB
>>>>>   Rss:                   0 kB
>>>>>   Pss:                   0 kB
>>>>>   Shared_Clean:          0 kB
>>>>>   Shared_Dirty:          0 kB
>>>>>   Private_Clean:         0 kB
>>>>>   Private_Dirty:         0 kB
>>>>>   Referenced:            0 kB
>>>>>   Anonymous:             0 kB
>>>>>   AnonHugePages:         0 kB
>>>>>   HugetlbPages:      18432 kB
>>>>>   Swap:                  0 kB
>>>>>   KernelPageSize:     2048 kB
>>>>>   MMUPageSize:        2048 kB
>>>>>   Locked:                0 kB
>>>>>   VmFlags: rd wr mr mw me de ht
>>>>>
>>>>> Signed-off-by: Naoya Horiguchi 
>>>>> Acked-by: Joern Engel 
>>>>> Acked-by: David Rientjes 
>>>>> ---
>>>>> v3 -> v4:
>>>>> - suspend Acked-by tag because v3->v4 change is not trivial
>>>>> - I stated in previous discussion that HugetlbPages line can contain page
>>>>>   size info, but that's not necessary because we already have KernelPageSize
>>>>>   info.
>>>>> - merged documentation update, where the current documentation doesn't mention
>>>>>   AnonHugePages, so it's also added.
>>>>> ---
>>>>>  Documentation/filesystems/proc.txt |  7 +--
>>>>>  fs/proc/task_mmu.c | 29 +
>>>>>  2 files changed, 34 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git v4.2-rc4/Documentation/filesystems/proc.txt v4.2-rc4_patched/Documentation/filesystems/proc.txt
>>>>> index 6f7fafde0884..22e40211ef64 100644
>>>>> --- v4.2-rc4/Documentation/filesystems/proc.txt
>>>>> +++ v4.2-rc4_patched/Documentation/filesystems/proc.txt
>>>>> @@ -423,6 +423,8 @@ Private_Clean:         0 kB
>>>>>  Private_Dirty:         0 kB
>>>>>  Referenced:          892 kB
>>>>>  Anonymous:             0 kB
>>>>> +AnonHugePages:         0 kB
>>>>> +HugetlbPages:          0 kB
>>>>>  Swap:                  0 kB
>>>>>  KernelPageSize:        4 kB
>>>>>  MMUPageSize:           4 kB
>>>>> @@ -440,8 +442,9 @@ indicates the amount of memory currently marked as referenced or accessed.
>>>>>  "Anonymous" shows the amount of memory that does not belong to any file.  Even
>>>>>  a mapping associated with a file may contain anonymous pages: when MAP_PRIVATE
>>>>>  and a page is modified, the file page is replaced by a private anonymous copy.
>>>>> -"Swap" shows how much would-be-anonymous memory is also used, but out on
>>>>> -swap.
>>>>> +"AnonHugePages" shows the ammount of memory backed by transparent hugepage.
>>>>> +"HugetlbPages" shows the ammount of memory backed by hugetlbfs page.
>>>>> +"Swap" shows how much would-be-anonymous memory is also used, but out on swap.
>>>>
>>>> There is no distinction between "private" and "shared" in this "huge page" 
>>>> accounting right?
>>>
>>> Right for current version. And I think that private/shared distinction
>>> gives some help.
>>>
>>>> Would it be possible to account for the huge pages in the 
>>>> {Private,Shared}_{Clean,Dirty} fields?
>>>> Or otherwise split the huge page accounting into shared/private?
>>
>> Sorry, I didn't catch you properly.
>> I think that accounting for hugetlb pages should be done only with HugetlbPages
>> or any other new field for hugetlb, in order not to break the behavior of existing
>> fields.
> 
> On a more general note I'd be inclined to just account
> for hugetlb pages in Rss and {Private,Shared}_Dirty
> and fix any tools that double count.

By the same argument I presume the existing THP "AnonHugePages" smaps field
is not accounted for in the {Private,Shared}_... fields?
I.E. AnonHugePages may also benefit from splitting to Private/Shared?

thanks,
Pádraig.


Re: [PATCH v5 1/2] mm: hugetlb: proc: add HugetlbPages field to /proc/PID/smaps

2015-09-07 Thread Pádraig Brady
On 07/09/15 07:46, Naoya Horiguchi wrote:
> On Mon, Sep 07, 2015 at 02:23:44AM +, Horiguchi Naoya(堀口 直也) wrote:
>> On Mon, Sep 07, 2015 at 02:29:53AM +0100, Pádraig Brady wrote:
>>> On 20/08/15 09:26, Naoya Horiguchi wrote:
>>>> Currently /proc/PID/smaps provides no usage info for vma(VM_HUGETLB), which
>>>> is inconvenient when we want to know per-task or per-vma base hugetlb usage.
>>>> To solve this, this patch adds a new line for hugetlb usage like below:
>>>>
>>>>   Size:              20480 kB
>>>>   Rss:                   0 kB
>>>>   Pss:                   0 kB
>>>>   Shared_Clean:          0 kB
>>>>   Shared_Dirty:          0 kB
>>>>   Private_Clean:         0 kB
>>>>   Private_Dirty:         0 kB
>>>>   Referenced:            0 kB
>>>>   Anonymous:             0 kB
>>>>   AnonHugePages:         0 kB
>>>>   HugetlbPages:      18432 kB
>>>>   Swap:                  0 kB
>>>>   KernelPageSize:     2048 kB
>>>>   MMUPageSize:        2048 kB
>>>>   Locked:                0 kB
>>>>   VmFlags: rd wr mr mw me de ht
>>>>
>>>> Signed-off-by: Naoya Horiguchi 
>>>> Acked-by: Joern Engel 
>>>> Acked-by: David Rientjes 
>>>> ---
>>>> v3 -> v4:
>>>> - suspend Acked-by tag because v3->v4 change is not trivial
>>>> - I stated in previous discussion that HugetlbPages line can contain page
>>>>   size info, but that's not necessary because we already have KernelPageSize
>>>>   info.
>>>> - merged documentation update, where the current documentation doesn't mention
>>>>   AnonHugePages, so it's also added.
>>>> ---
>>>>  Documentation/filesystems/proc.txt |  7 +--
>>>>  fs/proc/task_mmu.c | 29 +
>>>>  2 files changed, 34 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git v4.2-rc4/Documentation/filesystems/proc.txt v4.2-rc4_patched/Documentation/filesystems/proc.txt
>>>> index 6f7fafde0884..22e40211ef64 100644
>>>> --- v4.2-rc4/Documentation/filesystems/proc.txt
>>>> +++ v4.2-rc4_patched/Documentation/filesystems/proc.txt
>>>> @@ -423,6 +423,8 @@ Private_Clean:         0 kB
>>>>  Private_Dirty:         0 kB
>>>>  Referenced:          892 kB
>>>>  Anonymous:             0 kB
>>>> +AnonHugePages:         0 kB
>>>> +HugetlbPages:          0 kB
>>>>  Swap:                  0 kB
>>>>  KernelPageSize:        4 kB
>>>>  MMUPageSize:           4 kB
>>>> @@ -440,8 +442,9 @@ indicates the amount of memory currently marked as referenced or accessed.
>>>>  "Anonymous" shows the amount of memory that does not belong to any file.  Even
>>>>  a mapping associated with a file may contain anonymous pages: when MAP_PRIVATE
>>>>  and a page is modified, the file page is replaced by a private anonymous copy.
>>>> -"Swap" shows how much would-be-anonymous memory is also used, but out on
>>>> -swap.
>>>> +"AnonHugePages" shows the ammount of memory backed by transparent hugepage.
>>>> +"HugetlbPages" shows the ammount of memory backed by hugetlbfs page.
>>>> +"Swap" shows how much would-be-anonymous memory is also used, but out on swap.
>>>
>>> There is no distinction between "private" and "shared" in this "huge page" 
>>> accounting right?
>>
>> Right for current version. And I think that private/shared distinction
>> gives some help.
>>
>>> Would it be possible to account for the huge pages in the 
>>> {Private,Shared}_{Clean,Dirty} fields?
>>> Or otherwise split the huge page accounting into shared/private?
> 
> Sorry, I didn't catch you properly.
> I think that accounting for hugetlb pages should be done only with HugetlbPages
> or any other new field for hugetlb, in order not to break the behavior of existing
> fields.

On a more general note I'd be inclined to just account
for hugetlb pages in Rss and {Private,Shared}_Dirty
and fix any tools that double count.

> So splitting HugetlbPages into shared/private looks good to me.

Yes this is the most compatible solution,
and will allow one to accurately determine
how much core mem a process is using.

thanks!
Pádraig.



Re: [PATCH v5 1/2] mm: hugetlb: proc: add HugetlbPages field to /proc/PID/smaps

2015-09-06 Thread Pádraig Brady
On 20/08/15 09:26, Naoya Horiguchi wrote:
> Currently /proc/PID/smaps provides no usage info for vma(VM_HUGETLB), which
> is inconvenient when we want to know per-task or per-vma base hugetlb usage.
> To solve this, this patch adds a new line for hugetlb usage like below:
> 
>   Size:              20480 kB
>   Rss:                   0 kB
>   Pss:                   0 kB
>   Shared_Clean:          0 kB
>   Shared_Dirty:          0 kB
>   Private_Clean:         0 kB
>   Private_Dirty:         0 kB
>   Referenced:            0 kB
>   Anonymous:             0 kB
>   AnonHugePages:         0 kB
>   HugetlbPages:      18432 kB
>   Swap:                  0 kB
>   KernelPageSize:     2048 kB
>   MMUPageSize:        2048 kB
>   Locked:                0 kB
>   VmFlags: rd wr mr mw me de ht
> 
> Signed-off-by: Naoya Horiguchi 
> Acked-by: Joern Engel 
> Acked-by: David Rientjes 
> ---
> v3 -> v4:
> - suspend Acked-by tag because v3->v4 change is not trivial
> - I stated in previous discussion that HugetlbPages line can contain page
>   size info, but that's not necessary because we already have KernelPageSize
>   info.
> - merged documentation update, where the current documentation doesn't mention
>   AnonHugePages, so it's also added.
> ---
>  Documentation/filesystems/proc.txt |  7 +--
>  fs/proc/task_mmu.c | 29 +
>  2 files changed, 34 insertions(+), 2 deletions(-)
> 
> diff --git v4.2-rc4/Documentation/filesystems/proc.txt v4.2-rc4_patched/Documentation/filesystems/proc.txt
> index 6f7fafde0884..22e40211ef64 100644
> --- v4.2-rc4/Documentation/filesystems/proc.txt
> +++ v4.2-rc4_patched/Documentation/filesystems/proc.txt
> @@ -423,6 +423,8 @@ Private_Clean:         0 kB
>  Private_Dirty:         0 kB
>  Referenced:          892 kB
>  Anonymous:             0 kB
> +AnonHugePages:         0 kB
> +HugetlbPages:          0 kB
>  Swap:                  0 kB
>  KernelPageSize:        4 kB
>  MMUPageSize:           4 kB
> @@ -440,8 +442,9 @@ indicates the amount of memory currently marked as referenced or accessed.
>  "Anonymous" shows the amount of memory that does not belong to any file.  Even
>  a mapping associated with a file may contain anonymous pages: when MAP_PRIVATE
>  and a page is modified, the file page is replaced by a private anonymous copy.
> -"Swap" shows how much would-be-anonymous memory is also used, but out on
> -swap.
> +"AnonHugePages" shows the ammount of memory backed by transparent hugepage.
> +"HugetlbPages" shows the ammount of memory backed by hugetlbfs page.
> +"Swap" shows how much would-be-anonymous memory is also used, but out on swap.

There is no distinction between "private" and "shared" in this "huge page"
accounting, right?
Would it be possible to account for the huge pages in the 
{Private,Shared}_{Clean,Dirty} fields?
Or otherwise split the huge page accounting into shared/private?

thanks!
Pádraig.



Re: [PATCH 0/8] watchdog: Add support for keepalives triggered by infrastructure

2015-08-04 Thread Pádraig Brady
On 04/08/15 03:13, Guenter Roeck wrote:
> The watchdog infrastructure is currently purely passive, meaning
> it only passes information from user space to drivers and vice versa.
> 
> Since watchdog hardware tends to have its own quirks, this can result
> in quite complex watchdog drivers. A number of scenarios are especially
> common.
> 
> - A watchdog is always active and can not be disabled, or can not be disabled
>   once enabled. To support such hardware, watchdog drivers have to implement
>   their own timers and use those timers to trigger watchdog keepalives while
>   the watchdog device is not or not yet opened.
> - A variant of this is the desire to enable a watchdog as soon as its driver
>   has been instantiated, to protect the system while it is still booting up,
>   but the watchdog daemon is not yet running.

Just mentioning that patting the watchdog in the boot loader
(by patching grub etc.) can be a more general solution here, as it
avoids hangs if the kernel crashes before it runs the watchdog driver,
which is especially relevant if the kernel is PXE loaded across the net, for example.
Also this tends to be better spaced between boot start and user space loading.

> - Some watchdogs have a very short maximum timeout, in the range of just a few
>   seconds. Such low timeouts are difficult if not impossible to support from
>   user space. Drivers supporting such watchdog hardware need to implement
>   a timer function to augment heartbeats from user space.

Fair enough.

thanks,
Pádraig.



Re: [PATCH v3] tags: much faster, parallel "make tags"

2015-05-10 Thread Pádraig Brady
On 10/05/15 14:26, Alexey Dobriyan wrote:
> On Sat, May 09, 2015 at 06:07:18AM +0100, Pádraig Brady wrote:
>> On 08/05/15 14:26, Alexey Dobriyan wrote:
> 
>>>  exuberant()
>>>  {
>>> -   all_target_sources | xargs $1 -a\
>>> +   rm -f .make-tags.*
>>> +
>>> +   all_target_sources >.make-tags.src
>>> +   NR_CPUS=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
>>
>> `nproc` is simpler and available since coreutils 8.1 (2009-11-18)
> 
> nproc was discarded because getconf is standardized.

Note getconf doesn't honor CPU affinity which may be fine here?

$ taskset -c 0 getconf _NPROCESSORS_ONLN
4
$ taskset -c 0 nproc
1

>>> +   NR_LINES=$(wc -l <.make-tags.src)
>>> +   NR_LINES=$((($NR_LINES + $NR_CPUS - 1) / $NR_CPUS))
>>> +
>>> +   split -a 6 -d -l $NR_LINES .make-tags.src .make-tags.src.
>>
>> `split -d -nl/$(nproc)` is simpler and available since coreutils 8.8 (2010-12-22)
> 
> -nl/ can't count and always make first file somewhat bigger, which is
> suspicious. What else it can't do right?

It avoids the overhead of reading all data and counting the lines,
by splitting the data into approx equal numbers of lines as detailed at:
http://gnu.org/s/coreutils/split
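
A minimal sketch of that line-aware chunking, assuming GNU split from coreutils 8.8 or later. The scratch directory and the chunk names are hypothetical, not the ones used by scripts/tags.sh:

```shell
# Hypothetical scratch input: 10 short lines to be spread over 4 chunks.
tmp=$(mktemp -d)
seq 1 10 > "$tmp/src"

# -n l/4 divides the input by size but only breaks at line boundaries,
# so no separate "wc -l" counting pass over the data is needed.
split -d -n l/4 "$tmp/src" "$tmp/chunk."

# Every input line ends up in exactly one chunk.
chunks=$(ls "$tmp"/chunk.* | wc -l)
lines=$(cat "$tmp"/chunk.* | wc -l)
echo "chunks=$chunks lines=$lines"
rm -rf "$tmp"
```

Because the division is by size rather than by line count, the chunks can hold slightly different numbers of lines, which matches the unevenness mentioned in the quoted reply.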

>>> +   sort .make-tags.* >>$2
>>> +   rm -f .make-tags.*
>>
>> Using sort --merge would speed up significantly?
> 
> By ~1 second, yes.
> 
>> Even faster would be to get sort to skip the header lines, avoiding the need 
>> for sed.
>> It's a bit awkward and was discussed at:
>> http://lists.gnu.org/archive/html/coreutils/2013-01/msg00027.html
>> Summarising that: if not using merge you can:
>>
>>   tlines=$(($(wc -l < "$2") + 1))
>>   tail -q -n+$tlines .make-tags.* | LC_ALL=C sort >>$2
>>
>> Or if merge is appropriate then:
>>
>>   tlines=$(($(wc -l < "$2") + 1))
>>   eval "eval LC_ALL=C sort -m '<(tail -n+$tlines .make-tags.'{1..$(nproc)}')'" >>$2
> 
> Might as well teach ctags to do real parallel processing.
> LC_* are set by top level Makefile.
> 
>> p.p.s. You may want to `trap cleanup EXIT` to rm -f .make-tags.*
> 
> The real question is how to kill ctags reliably.
> Naive
> 
>   trap 'kill $(jobs -p); rm -f .make-tags.*' TERM INT
> 
> doesn't work.
> 
> Files are removed, but processes aren't.

Is $(jobs -p) generating the correct list?
On an interactive shell here it is.
Perhaps you need to explicitly use #!/bin/sh -m
at the start to enable job control like that?
Another option would be to append each background $! pid
to a list and kill that list.
Note also you may want to `wait` after the kill too.
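
A minimal sketch of the PID-list option, with `sleep` as a hypothetical stand-in for the background ctags runs. The `cleanup` function and the variable names are illustrative only:

```shell
pids=""

cleanup() {
    # Kill only the jobs started below, then reap them before
    # removing the temporary files.
    if [ -n "$pids" ]; then
        kill $pids 2>/dev/null
    fi
    wait
    pids=""
    rm -f .make-tags.*
}
trap 'cleanup; exit 130' INT TERM

for i in 1 2 3; do
    sleep 30 &              # stand-in for one background ctags run
    pids="$pids $!"         # remember its PID explicitly
done

# Calling cleanup directly here shows the effect without sending a signal.
started=$pids
cleanup
alive=0
for p in $started; do
    kill -0 "$p" 2>/dev/null && alive=$((alive + 1))
done
echo "still running: $alive"
```

Tracking `$!` avoids depending on `jobs -p`, so the sketch does not rely on job control being enabled in a non-interactive shell.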

cheers,
Pádraig.


Re: [PATCH v3] tags: much faster, parallel "make tags"

2015-05-08 Thread Pádraig Brady
On 08/05/15 14:26, Alexey Dobriyan wrote:
> ctags is single-threaded program. Split list of files to be tagged into
> equal parts, 1 part for each CPU and then merge the results.
> 
> Speedup on one 2-way box I have is ~143 s => ~99 s (-31%).
> On another 4-way box: ~120 s => ~65 s (-46%!).
> 
> Resulting "tags" files aren't byte-for-byte identical because ctags
> program numbers anon struct and enum declarations with "__anonNNN"
> symbols. If those lines are removed, "tags" file becomes byte-for-byte
> identical with those generated with current code.
> 
> Signed-off-by: Alexey Dobriyan 
> ---
> 
>  scripts/tags.sh |   36 +++-
>  1 file changed, 31 insertions(+), 5 deletions(-)
> 
> --- a/scripts/tags.sh
> +++ b/scripts/tags.sh
> @@ -152,7 +152,19 @@ dogtags()
>  
>  exuberant()
>  {
> - all_target_sources | xargs $1 -a\
> + rm -f .make-tags.*
> +
> + all_target_sources >.make-tags.src
> + NR_CPUS=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

`nproc` is simpler and available since coreutils 8.1 (2009-11-18)

> + NR_LINES=$(wc -l <.make-tags.src)
> + NR_LINES=$((($NR_LINES + $NR_CPUS - 1) / $NR_CPUS))
> +
> + split -a 6 -d -l $NR_LINES .make-tags.src .make-tags.src.

`split -d -nl/$(nproc)` is simpler and available since coreutils 8.8 (2010-12-22)

> +
> + for i in .make-tags.src.*; do
> + N=$(echo $i | sed -e 's/.*\.//')
> + # -u: don't sort now, sort later
> + xargs <$i $1 -a -f .make-tags.$N -u \
>   -I __initdata,__exitdata,__initconst,   \
>   -I __cpuinitdata,__initdata_memblock\
>   -I __refdata,__attribute,__maybe_unused,__always_unused \
> @@ -211,7 +223,21 @@ exuberant()
>   --regex-c='/DEFINE_PCI_DEVICE_TABLE\((\w*)/\1/v/'   \
>   --regex-c='/(^\s)OFFSET\((\w*)/\2/v/'   \
>   --regex-c='/(^\s)DEFINE\((\w*)/\2/v/'   \
> - --regex-c='/DEFINE_HASHTABLE\((\w*)/\1/v/'
> + --regex-c='/DEFINE_HASHTABLE\((\w*)/\1/v/'  \
> + &
> + done
> + wait
> + rm -f .make-tags.src .make-tags.src.*
> +
> + # write header
> + $1 -f $2 /dev/null
> + # remove headers
> + for i in .make-tags.*; do
> + sed -i -e '/^!/d' $i &
> + done
> + wait
> + sort .make-tags.* >>$2
> + rm -f .make-tags.*

Using sort --merge would speed up significantly?

Even faster would be to get sort to skip the header lines, avoiding the need 
for sed.
It's a bit awkward and was discussed at:
http://lists.gnu.org/archive/html/coreutils/2013-01/msg00027.html
Summarising that: if not using merge you can:

  tlines=$(($(wc -l < "$2") + 1))
  tail -q -n+$tlines .make-tags.* | LC_ALL=C sort >>$2

Or if merge is appropriate then:

  tlines=$(($(wc -l < "$2") + 1))
  eval "eval LC_ALL=C sort -m '<(tail -n+$tlines .make-tags.'{1..$(nproc)}')'" >>$2

Note eval is fine here as inputs are controlled within the script
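
Whether merge is appropriate depends on the inputs: `sort -m` only interleaves files that are each already sorted under the same collation, and the patch runs ctags with -u (unsorted). A minimal sketch of that precondition, using hypothetical two-line part files rather than real tags output:

```shell
tmp=$(mktemp -d)

# Each part is sorted on its own first (these could run in parallel).
printf 'd\nb\n' | LC_ALL=C sort > "$tmp/part.1"
printf 'c\na\n' | LC_ALL=C sort > "$tmp/part.2"

# The merge is then a single cheap pass over the pre-sorted parts.
merged=$(LC_ALL=C sort -m "$tmp"/part.*)
echo "$merged"
rm -rf "$tmp"
```

If the parts were left unsorted, `sort -m` would still run but its output would not be fully ordered, so the per-part sort is not optional.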

cheers,
Pádraig.

p.s. To avoid temp files altogether you could wire everything up through fifos,
though that's probably overkill here TBH

p.p.s. You may want to `trap cleanup EXIT` to rm -f .make-tags.*


Re: [PATCH] ftracetest: Do not use usleep

2015-03-30 Thread Pádraig Brady
On 31/03/15 01:48, Namhyung Kim wrote:
> Hi Steve,
> 
> On Mon, Mar 30, 2015 at 05:15:11PM -0400, Steven Rostedt wrote:
>> On Thu, 26 Mar 2015 09:32:23 +0900
>> Namhyung Kim  wrote:
>>
>>> The usleep is only provided on distros from Redhat so running ftracetest
>>> on other distro resulted in failures due to the missing usleep.
>>>
>>> The reason of using [u]sleep in the test was to generate (scheduler)
>>> events. But as we use 'cat trace | grep | wc -l' to read the events,
>>> the command themselves already generate some events before reading the
>>> trace file so no need to call [u]sleep explicitly.
>>
>> Note, opening "trace" via cat stops tracing. There is a possible race
>> where the cat will not produce events. My worry is that if the shell
>> implements its own "cat" command, it may not fork, and open the trace
>> file. Which would not have any events in it, and opening it will
>> disable the rest of the command from having events.
> 
> I understand your point.  But this is not just cat, it needs grep and
> wc also.  So I think there should be scheduler event(s).
> 
>>
>> What about using:
>>
>>  ping localhost -c 1
>>
>> ?
> 
> I'm okay with ping though but worried if some tiny system might lack
> the ping command..

I'd use a fallback method like:

  yield() { sleep .001 || usleep 1 || sleep 1; }

Then just s/usleep 1/yield/

cheers,
Pádraig.


Re: [PATCH] ftracetest: replace usleep by sleep

2015-03-25 Thread Pádraig Brady
On 25/03/15 18:57, Steven Rostedt wrote:
> On Wed, 25 Mar 2015 17:36:34 +
> Luis Henriques  wrote:
> 
>> 'usleep' seems to be a distro-specific utility and may not be
>> available:
>>
>>  [5] event tracing - enable/disable with event level files   [FAIL]
>>  execute: /home/miguel/linux/tools/testing/selftests/ftrace/test.d/event/event-enable.tc
>>  + . /home/miguel/linux/tools/testing/selftests/ftrace/test.d/event/event-enable.tc
>>  + [ ! -f set_event -o ! -d events/sched ]
>>  + reset_tracer
>>  + echo nop
>>  + do_reset
>>  + echo
>>  + clear_trace
>>  + echo
>>  + echo sched:sched_switch
>>  + usleep 1
>>  ./ftracetest: 24: /home/miguel/linux/tools/testing/selftests/ftrace/test.d/event/event-enable.tc: usleep: not found
>>
>> Replace it with the more standard sleep.
>>
>> Signed-off-by: Luis Henriques 
>> ---
>>  tools/testing/selftests/ftrace/test.d/event/event-enable.tc | 6 +++---
>>  tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc | 6 +++---
>>  2 files changed, 6 insertions(+), 6 deletions(-)
>>
>> diff --git a/tools/testing/selftests/ftrace/test.d/event/event-enable.tc b/tools/testing/selftests/ftrace/test.d/event/event-enable.tc
>> index 668616d9bb03..abafc0c3605c 100644
>> --- a/tools/testing/selftests/ftrace/test.d/event/event-enable.tc
>> +++ b/tools/testing/selftests/ftrace/test.d/event/event-enable.tc
>> @@ -21,7 +21,7 @@ reset_tracer
>>  do_reset
>>  
>>  echo 'sched:sched_switch' > set_event
>> -usleep 1
>> +sleep 0.001
> 
> We had patches out about this, because sleep 0.001 is not always
> supported either.

Really?
In that edge case you might use:

  sleep .001 || sleep 1

Pádraig.


RFC: More functions allowed with O_PATH

2015-01-27 Thread Pádraig Brady
Since fsync(), fdatasync() and syncfs() work on an identifying descriptor,
and all work against a read-only file for example,
should any or all of these functions work with a descriptor opened with O_PATH?

thanks,
Pádraig.


Re: [PATCH v2] modsign: use shred to overwrite the private key before deleting it

2015-01-24 Thread Pádraig Brady
On 24/01/15 12:29, Alexander Holler wrote:
> On 24.01.2015 at 13:09, Alexander Holler wrote:
>> On 24.01.2015 at 12:37, Alexander Holler wrote:
>>> On 24.01.2015 at 11:45, Alexander Holler wrote:
>>>
>>>> It uses shred, in the hope it will somedays learn how to shred stuff on
>>>> FLASH based devices securely too, once that has become possible.
>>>
>>> BTW: This is a good example where technology failed to keep the needs of
>>> users in mind.
>>
>> Failed completely.
>>
>> Since ever it's a problem for people to securely delete files on storage.
>>
>> Also it should be very simple to securely erase files on block based
>> devices, people have to try cruel ways in the hope to get securely rid
>> of files nobody else should be able to see ever again.
>>
>> It's almost unbelievable how completely the IT industry (including the
>> field I'm working myself: SW) failed in regard to that since 30 years or
>> even more.
> 
> And it isn't such that this is a new requirement. Humans are doing such
> since thousands of years. They use fire to get rid of paper documents
> and even the ancient Egyptians were able to destroy stuff on stones by using
> simple steps. Just the IT failed completely.
> 
> Really unbelievable.
> 
> So, sorry if anyone got bored by this mail, but I think that really has 
> to be said and repeated.

Well not failed completely, just used a different method (encryption).

As for "shredding", that improves in effectiveness the lower you go.
I.E. it's effective for the whole file system (SSD range), or whole device.

Pádraig.


Re: [PATCH] modsign: overwrite keys with zero before deleting them

2015-01-23 Thread Pádraig Brady
On 24/01/15 00:13, Alexander Holler wrote:
> On 24.01.2015 at 00:58, David Howells wrote:
>> Alexander Holler  wrote:
>>
>>> This is for the more paranoid people, also it's
>>> questionable what paranoid nowadays means.
>>
>> shred?
> 
> Seems to do the same like when using dd, just that it does it multiple
> times.
> 
> And according to an article I've read some years ago, overwriting
> blocks on harddisks multiple times doesn't really make sense because
> doing it just once is enough (the necessity to do it multiple times
> seems to have been one of these unexplainable myths in the IT).
> 
> So I've no idea if it's worth to use shred and have no idea if it's part
> of any GNU/Linux system (seems likely as it's part of coreutils), how
> it's maintained and how long it will be available.
> 
> But if requested, I will replace that dd with shred or just feel free to
> do it yourself.

shred is in the same package as dd (coreutils).
It's a bit more paranoid about syncing.
It also tries to write the exact size of the file,
and then rounded up block sizes to decrease the
chance of file system reallocation.
Agreed on the multiple writes being quite futile these days.
Generally overwriting with dd or shred etc. is only useful
at the device level rather than at the file system level.
Anyway to be slightly more paranoid and explicit you could:

  shred -n1 ./signing_key.priv
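
A minimal sketch of that single-pass use, run against a hypothetical throwaway file rather than a real key. `-n1` and `-u` are standard GNU shred options; the path and contents are illustrative:

```shell
tmp=$(mktemp -d)
key="$tmp/signing_key.priv"
printf 'not a real key\n' > "$key"

# One overwrite pass in place; the file itself is kept.
shred -n1 "$key"
if grep -q 'not a real key' "$key"; then
    overwritten=no
else
    overwritten=yes
fi

# Overwrite once more and unlink the file afterwards.
shred -n1 -u "$key"
if [ -e "$key" ]; then removed=no; else removed=yes; fi

echo "overwritten=$overwritten removed=$removed"
rm -rf "$tmp"
```

This only shows that the data visible through the file has changed. As noted above, it does not guarantee that the file system or the storage device kept no older copy of the blocks.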

Pádraig.



Re: [RFC] The SIGINFO signal from BSD

2014-11-06 Thread Pádraig Brady
On 11/05/2014 11:13 PM, Martin Tournoij wrote:
> On Wed, Nov 5, 2014, at 20:31, Austin S Hemmelgarn wrote:
>> The people to talk to about that for the core 
>> utilities on Linux would be the maintainers of the GNU coreutils, or 
>> whatever your distribution might use in their place (I think it's very 
>> unlikely that busybox or toybox would implement it however).
> 
> Well, if the kernel doesn't provide the feature, then we can be sure it
> will never be implemented :-)
> I thought this was a good place to start asking, and even if GNU
> coreutils opt to not implement this for whatever reasons, other
> applications still can (mine will!)

GNU coreutils dd will already support SIGINFO if available
(and if dd is recompiled with appropriate headers).
dd currently emulates that using SIGUSR1 though as you say
that's awkward to support robustly, and there has been
a recent GNU coreutils change in relation to that:
http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commit;h=27d2c738

I like the idea of making SIGINFO generally available
and GNU coreutils at least would add extra handling
to cp etc. if that was the case.

thanks,
Pádraig.


nanosleep truncated on 64 bit Linux by 292 billion years

2014-10-26 Thread Pádraig Brady
I noticed that nanosleep() on 64 bit, "only" supports 292 years,
rather than the full potential 292 billion years with 64 bit time_t, due to:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/include/linux/time.h?id=refs/tags/v3.16#n87
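
The 292-year figure follows from simple arithmetic, sketched here under the assumption that the kernel-side limit is a signed 64-bit count of nanoseconds. The variable names are illustrative, and the shell is assumed to use 64-bit arithmetic:

```shell
ktime_max_ns=9223372036854775807        # 2^63 - 1 nanoseconds
max_sec=$((ktime_max_ns / 1000000000))  # whole seconds that fit
year_sec=31556952                       # 365.2425 days * 86400 s
max_years=$((max_sec / year_sec))
echo "$max_sec seconds, about $max_years years"
```

A full 64-bit time_t counting seconds rather than nanoseconds would extend the range by a factor of a billion, which is where the 292 billion years comes from.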

Attached is a program from Paul Eggert that illustrates the bug.
Running this program on a buggy host outputs something like this:

  Setting alarm for 1 second from now ...
  Sleeping for 9223372036854775807.999999999 seconds...
  After alarm sent off, remaining time is 9223357678.462306617 seconds;
  i.e., nanosleep claimed that it slept for about 293079448610.606445 years.

Gnulib-using applications have a workaround for this bug, but a workaround
shouldn't be necessary.  For what it's worth, the bug is fixed in Solaris 11 
(x86-64),
though it's present in Solaris 10 (64-bit sparc).

thanks,
Pádraig.
#include <errno.h>
#include <limits.h>
#include <signal.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static void
check_for_SIGALRM (int sig)
{
  if (sig != SIGALRM)
    _exit (1);
}

int
main (void)
{
  static struct sigaction act;
  struct timespec forever, remaining;
  time_t time_t_max = (1ull << (sizeof time_t_max * CHAR_BIT - 1)) - 1;
  act.sa_handler = check_for_SIGALRM;
  sigemptyset (&act.sa_mask);
  sigaction (SIGALRM, &act, NULL);
  forever.tv_sec = time_t_max;
  forever.tv_nsec = 999999999;
  printf ("Setting alarm for 1 second from now ...\n");
  alarm (1);
  printf ("Sleeping for %lld.%09ld seconds...\n",
          (long long) forever.tv_sec, forever.tv_nsec);
  if (nanosleep (&forever, &remaining) == 0)
    return 2;
  if (errno != EINTR)
    return 3;
  if (remaining.tv_sec < time_t_max - 10)
    {
      printf ("After alarm sent off, remaining time is %lld.%09ld seconds;\n",
              (long long) remaining.tv_sec, remaining.tv_nsec);
      printf ("i.e., nanosleep claimed that it slept for about %f years.\n",
              (forever.tv_sec - remaining.tv_sec) / (24 * 60 * 60 * 364.2425));
      return 4;
    }
  printf ("ok\n");
  return 0;
}


Re: [PATCH] fanotify: add a flag to allow setting O_CLOEXEC on event fd

2014-10-02 Thread Pádraig Brady
On 10/02/2014 08:52 AM, Yann Droneaud wrote:
> In order to not potentially break applications which were
> requesting O_CLOEXEC on event file descriptors but which
> actually need it to be not effective as the kernel currently
> ignores the flag, so the file descriptor is inherited across
> exec regardless of O_CLOEXEC (please forgive me for the
> wording), this patch introduces FAN_FD_CLOEXEC flag to
> fanotify_init() so that application can request O_CLOEXEC
> to be effective.
> Newer application would use FAN_FD_CLOEXEC flag along
> O_CLOEXEC to enable close on exec on newly created
> file descriptor:
> 
>   fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK|FAN_FD_CLOEXEC,
>  O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME);

Ugh really?
IMHO there should be widespread or at least known breakage with
O_CLOEXEC before adding messiness like this.
It seems surprising to me that apps would depend on
O_CLOEXEC being ineffective.

please reconsider this one.

thanks,
Pádraig.



Re: Initramfs FSID altered in 3.14

2014-04-03 Thread Pádraig Brady
On 04/03/2014 06:57 PM, Dave Reisner wrote:
> Hi,
> 
> [This is a repost of a G+ post at Tejun's request]
> 
> With Linux 3.14, you might notice in /proc/self/mountinfo that your
> root's parent FSID is now 0, instead of the 1 that it's been for the
> last N years. Tejun wrote the change (9e30cc9595303b27b48) that caused
> this, but the change comes in a rather innocuous way. Instead of an
> internal kernel mount of sysfs being assigned 0, it's now the initramfs.
> 
> So far, this has already caused switch_root and findmnt (from
> util-linux) to break, cp (from coreutils) to break when using the -x
> flag in early userspace, and it's also been pointed out that systemd's
> readahead code makes assumptions about a device number of 0.

For reference we've changed coreutils not to assume 0 is an invalid device ID:
http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commit;h=d0294ff3

> Are we now supposed to go and change all the assumptions in userspace
> about 0 being special? I'm conflicted. The kernel isn't supposed to
> break userspace, but it seems to me that FSIDs were never something to
> rely on -- similar to the block device numbering scheme.

I would say the kernel doesn't care what the value is,
so to ease compat worries just use >= 1.

cheers,
Pádraig
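
The general idea of not treating device number 0 as "unset" can be sketched
with an explicit flag next to the stored dev_t.  This is only an
illustration of that idea, not the referenced coreutils change; struct
root_dev and both function names are hypothetical.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper type: remembers the device of the starting point
   with an explicit "set" flag rather than a magic dev_t value.  */
struct root_dev
{
  bool set;
  dev_t dev;
};

/* Record the device that PATH lives on.  Return 0 on success, -1 on error.  */
int
root_dev_init (struct root_dev *rd, const char *path)
{
  struct stat st;
  if (stat (path, &st) != 0)
    return -1;
  rd->dev = st.st_dev;
  rd->set = true;
  return 0;
}

/* True if ST is on a different file system than the recorded root.
   A recorded device number of 0 is compared like any other value.  */
bool
crosses_file_system (const struct root_dev *rd, const struct stat *st)
{
  return rd->set && st->st_dev != rd->dev;
}
```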


Re: [PATCH 00/33] [RFC] Non disruptive application core dump infrastructure

2014-03-20 Thread Pádraig Brady
On 03/20/2014 09:39 AM, Janani Venkataraman wrote:
> Hi all,
> 
> The following series implements an infrastructure for capturing the core of an
> application without disrupting its process.
> 
> Kernel Space Approach:
> 
> 1) Posted an RFD to LKML explaining the various kernel-methods being analysed.
> 
> https://lkml.org/lkml/2013/9/3/122
> 
> 2) Went ahead to implement the same using the task_work_add approach and
> posted an RFC to LKML.
> 
> http://lwn.net/Articles/569534/
> 
> Based on the responses, the present approach implements the same in
> User-Space.
> 
> User Space Approach:
> 
> We didn't adopt the CRIU approach because our method would give us a head
> start, as all that the distro would need is the PTRACE_functionality and
> nothing more which is available from kernel versions 3.4 and above.
> 
> Basic Idea of User Space:
> 
> 1) The threads are held using PTRACE_SEIZE and PTRACE_INTERRUPT.
> 
> 2) The dump is then taken using the following:
>    1) The register sets namely general purpose, floating point and the arch
>       specific register sets are collected through PTRACE_GETREGSET calls by
>       passing the appropriate register type as parameter.
>    2) The virtual memory maps are collected from /proc/pid/maps.
>    3) The auxiliary vector is collected from /proc/pid/auxv.
>    4) Process state information for filling the notes such as PRSTATUS and
>       PRPSINFO are collected from /proc/pid/stat and /proc/pid/status.
>    5) The actual memory is read through process_vm_readv syscall as suggested
>       by Andi Kleen.
>    6) Command line arguments are collected from /proc/pid/cmdline
> 
> 3) The threads are then released using PTRACE_DETACH.
> 
> Self Dump:
> 
> A self dump is implemented with the following approach which was adapted
> from CRIU:
> 
> Gencore Daemon
> 
> The programs can request a dump using gencore() API, provided through
> libgencore. This is implemented through a daemon which listens on a UNIX File
> socket. The daemon is started immediately post installation.
> 
> We have provided service scripts for integration with systemd.
> 
> NOTE:
> 
> On systems with systemd, we could make use of socket option, which will avoid
> the need for running the gencore daemon always. The systemd can wait on the
> socket for requests and trigger the daemon as and when required. However,
> since the systemd socket APIs are not exported yet, we have disabled the
> supporting code for this feature.
> 
> libgencore:
> 
> 1) The client interface is a standard library call. All that the dump
> requester does is open the library and call the gencore() API and the dump
> will be generated in the path specified(relative/absolute).
> 
> To Do:
> 
> 1) Presently we wait indefinitely for the all the threads to seize. We can add
> a time-out to decide how much time we need to wait for the threads to be
> seized. This can be passed as command line argument in the case of a third
> party dump and in the case of the self-dump through the library call. We need
> to work on how much time to wait.
> 
> 2) Like mentioned before, the systemd socket APIs are not exported yet and
> hence this option is disabled now. Once these API's are available we can
> enable the socket option.
> 
> We would like to push this to one of the following packages:
> a) util-linux
> b) coreutils
> c) procps-ng
> 
> We are not sure which one would suit this application the best.
> Please let us know your views on the same.

Well from a coreutils perspective, they're generally
non Linux specific _commands_, and so wouldn't be
a natural home for this (despite the _core_ in the name :)).

thanks,
Pádraig.


RE: [PATCH 39/52] tools/perf/build: Automatically build in parallel, based on number of CPUs in the syst

2013-10-08 Thread Pádraig Brady
On 10/08/2013 10:02 AM, Ingo Molnar wrote:
> +ifeq ($(JOBS),)
> +  JOBS := $(shell grep -c ^processor /proc/cpuinfo 2>/dev/null)

nproc is probably ubiquitous enough to use now
(available since coreutils 8.1 (end of 2009))

As well as being more concise, it will take
account of offline CPUs etc.

> +  ifeq ($(JOBS),)
> +JOBS := 1
> +  endif
> +endif
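
For comparison, the count that matters for a default job number (CPUs the
current process may actually run on) can be obtained in C roughly as below.
This is a Linux/glibc-specific sketch, not nproc's implementation; the
function name usable_cpus is hypothetical.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <unistd.h>

/* Return the number of CPUs this process may run on, or 1 if that
   cannot be determined.  */
long
usable_cpus (void)
{
  cpu_set_t set;
  if (sched_getaffinity (0, sizeof set, &set) == 0)
    {
      int n = CPU_COUNT (&set);
      if (n > 0)
        return n;
    }
  /* Fall back to the count of online CPUs, then to a single job.  */
  long online = sysconf (_SC_NPROCESSORS_ONLN);
  return online > 0 ? online : 1;
}
```

Counting "processor" lines in /proc/cpuinfo ignores the affinity mask, so
the two approaches can disagree under taskset or in restricted containers.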


Re: Copy on write hard links?

2013-09-29 Thread Pádraig Brady
On 09/29/2013 08:14 AM, Richard Weinberger wrote:
> On Sun, Sep 29, 2013 at 7:22 AM, Pádraig Brady  wrote:
>> On 09/25/2013 03:37 PM, richard -rw- weinberger wrote:
>>> On Wed, Sep 25, 2013 at 4:28 PM, Thomas Meyer  wrote:
>>>> Am Mittwoch, den 25.09.2013, 08:59 -0500 schrieb Rob Landley:
>>>>> On 09/24/2013 01:36:56 PM, Thomas Meyer wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Is there such a thing?
>>>>>
>>>>> In the kernel's vfs layer?
>>>>
>>>> Yes, that would be a nice feature!
>>>
>>> You mean reflinks?
>>> Currently only OCFS2 and btrfs support them.
>>> Both using a fs specific ioctl().
>>> IIRC GNU cp uses the btrfs specific one if the --reflink parameter is used.
>>
>> coreutils is waiting for a reflink syscall to materialize
>> rather than adding new per filesystem support
>> http://lwn.net/Articles/335380/
> 
> Is this the correct link? It's a proposal for a reflink() syscall.
> But currently both OCFS2 and btrfs are using ioctl().
> 
> Digging into GNU coreutils shows that their cp's
> clone_file() only supports the btrfs ioctl().
> I don't know what the GNU folks big plan is, maybe you know more. :-)

The current coreutils plan is not to call any more file system specific
ioctls, but rather to wait until a more general syscall is available.

thanks,
Pádraig.
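
A file-system-independent clone request did appear in later kernels as the
FICLONE ioctl in <linux/fs.h>; it did not exist when this thread was
written.  The sketch below shows how a caller might try it and leave the
fallback to an ordinary copy; the name try_reflink is hypothetical, and the
#ifdef covers headers that predate FICLONE.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* FICLONE, on headers that have it */

/* Try to make DEST_FD share SRC_FD's data blocks (a "reflink").
   Return 0 on success, -1 with errno set otherwise; a caller would
   then fall back to an ordinary read/write copy.  */
int
try_reflink (int dest_fd, int src_fd)
{
#ifdef FICLONE
  return ioctl (dest_fd, FICLONE, src_fd);
#else
  errno = ENOTSUP;
  return -1;
#endif
}
```

On file systems without reflink support the call simply fails, so the copy
fallback remains necessary either way.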


Re: Copy on write hard links?

2013-09-28 Thread Pádraig Brady
On 09/25/2013 03:37 PM, richard -rw- weinberger wrote:
> On Wed, Sep 25, 2013 at 4:28 PM, Thomas Meyer  wrote:
>> Am Mittwoch, den 25.09.2013, 08:59 -0500 schrieb Rob Landley:
>>> On 09/24/2013 01:36:56 PM, Thomas Meyer wrote:
>>>> Hi,
>>>>
>>>> Is there such a thing?
>>>
>>> In the kernel's vfs layer?
>>
>> Yes, that would be a nice feature!
> 
> You mean reflinks?
> Currently only OCFS2 and btrfs support them.
> Both using a fs specific ioctl().
> IIRC GNU cp uses the btrfs specific one if the --reflink parameter is used.

coreutils is waiting for a reflink syscall to materialize
rather than adding new per filesystem support
http://lwn.net/Articles/335380/

thanks,
Pádraig.


Re: RFC: allow empty symlink targets

2013-05-16 Thread Pádraig Brady
On 05/15/2013 11:03 PM, Al Viro wrote:
> On Wed, May 15, 2013 at 01:38:48PM +0100, Pádraig Brady wrote:
>>>> In today's Austin Group meeting, I was tasked to open a new bug that
>>>> would state specifically how the empty symlink is resolved; the intent
>>>> is to allow both Solaris behavior (current directory) and BSD behavior
>>>> (ENOENT).  Meanwhile, everyone was in agreement that the Linux kernel
>>>> has a bug for rejecting the creation of an empty symlink, but once that
>>>> bug is fixed, then Linux can choose either Solaris or BSD behavior for
>>>> how to resolve such a symlink.
> 
> Austin Group Is At It Again, Demands at 11...
> 
> Would you mind explaining who's "everyone" and why would we possibly
> want to honour that agreement of yours?  Functionality in question is
> utterly pointless, seeing that semantics of such symlinks is OS-dependent
> anyway *and* that blanket refusal to traverse such beasts is a legitimate
> option.  What's the point in allowing to create them in the first place?

That's a fair point.
I guess the main reason to allow is for consistency with
other systems that do allow it.

What triggered this was a user who was using ln to
store "non file name" strings in symlinks,
and was surprised by the Linux error here,
and annoyed by the non portability of his script.

cheers,
Pádraig.


Re: RFC: allow empty symlink targets

2013-05-15 Thread Pádraig Brady
On 05/15/2013 03:40 PM, Eric Blake wrote:
> On 05/15/2013 06:38 AM, Pádraig Brady wrote:
>> On 01/17/2013 04:22 PM, Pádraig Brady wrote:
>>> On 01/17/2013 01:03 PM, Pádraig Brady wrote:
>>>> The discussion leading to this is at http://bugs.gnu.org/13447
>>>> In summary other systems allow an empty target for a symlink,
>>>> and POSIX specifies that it should be allowed?
>>>
>>> In relation to this, Eric Blake said:
>>>
>>>> In today's Austin Group meeting, I was tasked to open a new bug that
>>>> would state specifically how the empty symlink is resolved; the intent
>>>> is to allow both Solaris behavior (current directory) and BSD behavior
>>>> (ENOENT).  Meanwhile, everyone was in agreement that the Linux kernel
>>>> has a bug for rejecting the creation of an empty symlink, but once that
>>>> bug is fixed, then Linux can choose either Solaris or BSD behavior for
>>>> how to resolve such a symlink.
>>>>
>>>> It will probably be a bug report similar to this one, which regarded how
>>>> to handle a symlink containing just slashes:
>>>> http://austingroupbugs.net/view.php?id=541
>>
>> Following up from http://austingroupbugs.net/view.php?id=649
>> It seems POSIX will now allow the current Linux behavior of returning ENOENT,
> 
> Huh?  Linux currently doesn't allow the creation of an empty symlink.
> That link mentions the current BSD behavior of returning ENOENT when
> resolving such a symlink (that is, what stat() does when chasing through
> an empty symlink, provided such a symlink is first created).

Ah OK. The standards are hard enough to interpret,
never mind the comments discussing the standards :)
Not helping was that symlink() returns ENOENT in this case too.

>> or the Solaris behavior of allowing empty symlink targets.
> 
> The point made in that bug report is that Linux is buggy for not
> allowing symlink() to create an empty symlink in the first place; once
> you allow the creation of an empty symlink, then how to handle such a
> symlink in stat() is up to you whether to copy Solaris' or BSD's example.

OK cool, that makes more sense to me.

Adding in a couple more recipients to garner interest...

thanks,
Pádraig.
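
The creation-time behavior under discussion is easy to probe from
userspace.  The following is a minimal sketch; the function name
probe_empty_symlink is hypothetical, and the result depends on the system
it runs on.

```c
#include <errno.h>
#include <unistd.h>

/* Try to create LINKPATH as a symlink with an empty target.
   Return 0 if the system allowed it (the link is removed again),
   otherwise the errno value reported by symlink().  */
int
probe_empty_symlink (const char *linkpath)
{
  if (symlink ("", linkpath) == 0)
    {
      unlink (linkpath);
      return 0;
    }
  return errno;
}
```

Per the thread, a Linux kernel of this era would be expected to return
ENOENT here, while systems that permit empty targets would return 0.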


Re: RFC: allow empty symlink targets

2013-05-15 Thread Pádraig Brady
On 01/17/2013 04:22 PM, Pádraig Brady wrote:
> On 01/17/2013 01:03 PM, Pádraig Brady wrote:
>> The discussion leading to this is at http://bugs.gnu.org/13447
>> In summary other systems allow an empty target for a symlink,
>> and POSIX specifies that it should be allowed?
> 
> In relation to this, Eric Blake said:
> 
>> In today's Austin Group meeting, I was tasked to open a new bug that
>> would state specifically how the empty symlink is resolved; the intent
>> is to allow both Solaris behavior (current directory) and BSD behavior
>> (ENOENT).  Meanwhile, everyone was in agreement that the Linux kernel
>> has a bug for rejecting the creation of an empty symlink, but once that
>> bug is fixed, then Linux can choose either Solaris or BSD behavior for
>> how to resolve such a symlink.
>>
>> It will probably be a bug report similar to this one, which regarded how
>> to handle a symlink containing just slashes:
>> http://austingroupbugs.net/view.php?id=541

Following up from http://austingroupbugs.net/view.php?id=649
It seems POSIX will now allow the current Linux behavior of returning ENOENT,
or the Solaris behavior of allowing empty symlink targets.

cheers,
Pádraig.



Re: New copyfile system call - discuss before LSF?

2013-03-31 Thread Pádraig Brady
On 03/30/2013 08:08 PM, Andreas Dilger wrote:
> On 2013-03-30, at 12:49 PM, Pavel Machek wrote:
>> Hmm, really? AFAICT it would be simple to provide an
>> open_deleted_file("directory") syscall. You'd open_deleted_file(),
>> copy source file into it, then fsync(), then link it into filesystem.
>>
>> That should have atomicity properties reflected.
> 
> Actually, the open_deleted_file() syscall is quite useful for many
> different things all by itself.  Lots of applications need to create
> temporary files that are unlinked at application failure (without a
> race if app crashes after creating the file, but before unlinking).
> It also avoids exposing temporary files into the namespace if other
> applications are accessing the directory.
> 
> We've added a library routine that does this for Lustre in a hackish
> way (magical filename created in target directory) for being able to
> migrate files between data servers, HSM, defragmentation, rsync, etc.
> 
> Cheers, Andreas

This reminds me of the flink() discussion:
http://marc.info/?l=linux-kernel&m=104965452917349

Also kinda related is the exchangedata() OSX system call to
"atomically exchange data between two files"

thanks,
Pádraig.
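
The "create unnamed, fill, fsync, then link" sequence described above later
became expressible on Linux with the O_TMPFILE open flag (added after this
thread) plus linkat() through /proc/self/fd.  The sketch below assumes a
kernel and file system that support O_TMPFILE; the function name
write_then_link is hypothetical.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Create an unnamed file in DIR, fill it with DATA, flush it, and only
   then give it the name DEST.  Return 0 on success, -1 on error.  */
int
write_then_link (const char *dir, const char *dest,
                 const void *data, size_t len)
{
  int fd = open (dir, O_TMPFILE | O_WRONLY, 0600);
  if (fd < 0)
    return -1;

  /* The unnamed file is reachable through its /proc/self/fd entry.  */
  char proc_path[64];
  snprintf (proc_path, sizeof proc_path, "/proc/self/fd/%d", fd);

  int ok = (write (fd, data, len) == (ssize_t) len
            && fsync (fd) == 0
            && linkat (AT_FDCWD, proc_path, AT_FDCWD, dest,
                       AT_SYMLINK_FOLLOW) == 0);
  close (fd);
  return ok ? 0 : -1;
}
```

If the process dies before linkat(), the file never appears in the
namespace and its storage is reclaimed when the descriptor is closed.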


Re: [PATCH 1/1] watchdog:improve w83627hf_wdt to timeout in minutes

2013-03-29 Thread Pádraig Brady
On 03/25/2013 04:15 AM, Tony Chung wrote:
> The current maximum of 255 seconds is insufficient.
> For example, crash dump could take 5+ minutes.
> 
> Signed-off-by: Tony Chung 
> ---
>  drivers/watchdog/w83627hf_wdt.c |   73 ++
>  1 files changed, 57 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/watchdog/w83627hf_wdt.c b/drivers/watchdog/w83627hf_wdt.c
> index 92f1326..d26adfb 100644
> --- a/drivers/watchdog/w83627hf_wdt.c
> +++ b/drivers/watchdog/w83627hf_wdt.c
> @@ -58,7 +58,7 @@ MODULE_PARM_DESC(wdt_io, "w83627hf/thf WDT io port (default 0x2E)");
>  static int timeout = WATCHDOG_TIMEOUT;   /* in seconds */
>  module_param(timeout, int, 0);
>  MODULE_PARM_DESC(timeout,
> - "Watchdog timeout in seconds. 1 <= timeout <= 255, default="
> + "Watchdog timeout in seconds. 1 <= timeout <= 15300, default="
>   __MODULE_STRING(WATCHDOG_TIMEOUT) ".");
>  
>  static bool nowayout = WATCHDOG_NOWAYOUT;
> @@ -67,6 +67,29 @@ MODULE_PARM_DESC(nowayout,
>   "Watchdog cannot be stopped once started (default="
>   __MODULE_STRING(WATCHDOG_NOWAYOUT) ")");
>  
> +/* timeout unit in minute */
> +static bool use_minute;
> +static char *unit;
> +static int max_timeout = 15300; /* 255*60 seconds */
> +
> +static inline int wdt_round_secs_to_minute(int t)
> +{
> + return (t + 59) / 60;
> +}
> +
> +static inline int wdt_use_minute(int t)
> +{
> + if (t > 255) {
> + t = wdt_round_secs_to_minute(t);
> + use_minute = 1;
> + unit = "minutes";
> + } else {
> + use_minute = 0;
> + unit = "secs";
> + }
> + return t;
> +}
> +
>  /*
>   *   Kernel methods.
>   */
> @@ -107,6 +130,23 @@ static void w83627hf_unselect_wd_register(void)
>   outb_p(0xAA, WDT_EFER); /* Leave extended function mode */
>  }
>  
> +/* set CRF5 register */
> +static void w83627hf_set_crf5(void)
> +{
> + unsigned char t;
> +
> + outb_p(0xF5, WDT_EFER); /* Select CRF5 */
> + t = inb_p(WDT_EFDR);  /* read CRF5 */
> + t &= ~0x0C;   /* set second mode & disable keyboard
> + turning off watchdog */
> + t |= 0x02;/* enable the WDTO# output low pulse
> + to the KBRST# pin (PIN60) */
> + if (use_minute)
> + t |= 0x80;/* set timeout in minute */
> +
> + outb_p(t, WDT_EFDR);/* Write back to CRF5 */
> +}
> +
>  /* tyan motherboards seem to set F5 to 0x4C ?
>   * So explicitly init to appropriate value. */
>  
> @@ -116,21 +156,16 @@ static void w83627hf_init(void)
>  
>   w83627hf_select_wd_register();
>  
> + w83627hf_set_crf5();
> +
>   outb_p(0xF6, WDT_EFER); /* Select CRF6 */
>   t = inb_p(WDT_EFDR);  /* read CRF6 */
>   if (t != 0) {
> - pr_info("Watchdog already running. Resetting timeout to %d sec\n",
> - timeout);
> + pr_info("Watchdog already running.  Reset timeout to %d %s\n",
> + timeout, unit);

"Resetting" is better as otherwise it might read as an instruction to the user.
Alternatively you could have: "Timeout was reset to ..."

>   outb_p(timeout, WDT_EFDR);/* Write back to CRF6 */
>   }
>  
> - outb_p(0xF5, WDT_EFER); /* Select CRF5 */
> - t = inb_p(WDT_EFDR);  /* read CRF5 */
> - t &= ~0x0C;   /* set second mode & disable keyboard
> - turning off watchdog */
> - t |= 0x02;/* enable the WDTO# output low pulse
> - to the KBRST# pin (PIN60) */
> - outb_p(t, WDT_EFDR);/* Write back to CRF5 */
>  
>   outb_p(0xF7, WDT_EFER); /* Select CRF7 */
>   t = inb_p(WDT_EFDR);  /* read CRF7 */
> @@ -147,6 +182,9 @@ static void wdt_set_time(int timeout)
>  
>   w83627hf_select_wd_register();
>  
> + if (use_minute)
> + w83627hf_set_crf5();

Why is this bit needed in set_time()?

On a general note, one has to be careful when changing
between second and minute modes, if the watchdog is already running
(from the BIOS for example).

thanks,
Pádraig.
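
The unit selection in the quoted patch boils down to the arithmetic below.
This is a standalone restatement for illustration only, not driver code;
struct wdt_setting and wdt_pick_unit are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical mirror of the quoted patch's unit selection: the counter
   register holds 1..255, counted either in seconds or in minutes.  */
struct wdt_setting
{
  bool use_minute;   /* true: counter is in minutes */
  int count;         /* value for the 8-bit counter register */
};

struct wdt_setting
wdt_pick_unit (int timeout_secs)
{
  struct wdt_setting s;
  if (timeout_secs > 255)
    {
      s.use_minute = true;
      s.count = (timeout_secs + 59) / 60;   /* round up to whole minutes */
    }
  else
    {
      s.use_minute = false;
      s.count = timeout_secs;
    }
  return s;
}
```

Note the granularity jump at the boundary: a request for 256 seconds
becomes 5 minutes, i.e. 300 seconds, and the same register value means
something different depending on the mode bit, which is why switching
modes on an already running watchdog needs care.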


Re: [PATCH v2 2/7] watchdog: w83627hf: Enable watchdog only once

2013-03-21 Thread Pádraig Brady
On 03/19/2013 08:02 PM, Guenter Roeck wrote:
> On Tue, Mar 19, 2013 at 05:26:26PM +0000, Pádraig Brady wrote:
>> On 03/10/2013 11:14 PM, Guenter Roeck wrote:
>>> It is unnecessary to enable the logical device and WDT0 each time
>>> the watchdog is accessed. Do it only once during initialization.
>>
>> Is this also the case on systems where the superio
>> chip is used for other things? I've the impression
>> that this may break some systems (though I no longer
>> have the hardware to test). Arbitration of multiple
>> users of the superio device may be managed by a central
>> user space app, or by a kernel level arbitrator.
>>
> Not sure if I understand you correctly.
> 
> You mean some entity might actually disable the watchdog between accesses
> to it by the watchdog driver ? That would make it pretty useless.
> Might as well turn it off entirely if that is the case.
> 
> Or do you refer to _selecting_ the hwmon logical device ? If so, this patch
> is about enabling it only once, not about selecting it only once.

I meant selecting.
Enabling only once is fine.

sorry for the noise,

thanks,
Pádraig.


Re: [PATCH v2 2/7] watchdog: w83627hf: Enable watchdog only once

2013-03-19 Thread Pádraig Brady
On 03/10/2013 11:14 PM, Guenter Roeck wrote:
> It is unnecessary to enable the logical device and WDT0 each time
> the watchdog is accessed. Do it only once during initialization.

Is this also the case on systems where the superio
chip is used for other things? I've the impression
that this may break some systems (though I no longer
have the hardware to test). Arbitration of multiple
users of the superio device may be managed by a central
user space app, or by a kernel level arbitrator.

thanks,
Pádraig.


Re: kswapd craziness round 2

2013-03-19 Thread Pádraig Brady
On 03/08/2013 11:21 PM, Jiri Slaby wrote:
> On 03/08/2013 07:42 AM, Hillf Danton wrote:
>> On Fri, Mar 8, 2013 at 3:37 AM, Jiri Slaby  wrote:
>>> On 03/01/2013 03:02 PM, Hillf Danton wrote:
>>>> On Fri, Mar 1, 2013 at 1:02 AM, Jiri Slaby  wrote:
>>>>>
>>>>> Ok, no difference, kswap is still crazy. I'm attaching the output of
>>>>> "grep -vw '0' /proc/vmstat" if you see something there.
>>>>>
>>>> Thanks to you for test and data.
>>>>
>>>> Lets try to restore the deleted nap, then.
>>>
>>> Oh, it seems to be nice now:
>>> root       579  0.0  0.0      0     0 ?        S    Mar04   0:13 [kswapd0]
>>>
>> Double thanks.
> 
> There is one downside. I'm not sure whether that patch was the culprit.
> My Thunderbird is jerky when scrolling and lags while writing this
> message. The letters sometimes appear later than typed and in groups. Like
> I (kbd): My Thunder
> TB: My Thunder
> I (kbd): b-i-r-d
> TB: is silent
> I (kbd): still typing...
> TB: bird is
> 
> Perhaps it's not only TB.

I notice the same thunderbird issue on the much older 2.6.40.4-5.fc15.x86_64
which I'd hoped would be fixed on upgrade :(

My Thunderbird is using 1957m virt, 722m RSS on my 3G system.
What are your corresponding mem values?

For reference:
http://marc.info/?t=13086502551&r=1&w=2
https://bugzilla.redhat.com/show_bug.cgi?id=712019

thanks,
Pádraig.


Re: RFC: allow empty symlink targets

2013-01-17 Thread Pádraig Brady

On 01/17/2013 01:03 PM, Pádraig Brady wrote:

> The discussion leading to this is at http://bugs.gnu.org/13447
> In summary other systems allow an empty target for a symlink,
> and POSIX specifies that it should be allowed?


In relation to this, Eric Blake said:

> In today's Austin Group meeting, I was tasked to open a new bug that
> would state specifically how the empty symlink is resolved; the intent
> is to allow both Solaris behavior (current directory) and BSD behavior
> (ENOENT).  Meanwhile, everyone was in agreement that the Linux kernel
> has a bug for rejecting the creation of an empty symlink, but once that
> bug is fixed, then Linux can choose either Solaris or BSD behavior for
> how to resolve such a symlink.
>
> It will probably be a bug report similar to this one, which regarded how
> to handle a symlink containing just slashes:
> http://austingroupbugs.net/view.php?id=541

thanks,
Pádraig.


[PATCH] symlink: allow an empty target string

2013-01-17 Thread Pádraig Brady
POSIX only states that ENOENT should be returned
if an empty string is specified for the link name.
In fact it states the link target...
"shall be treated only as a character string and
 shall not be validated as a pathname".

Signed-off-by: Pádraig Brady 
---
 fs/namei.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 43a97ee..26dd264 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3533,12 +3533,13 @@ SYSCALL_DEFINE3(symlinkat, const char __user *, oldname,
int, newdfd, const char __user *, newname)
 {
int error;
+   int empty;
struct filename *from;
struct dentry *dentry;
struct path path;
unsigned int lookup_flags = 0;
 
-   from = getname(oldname);
+   from = getname_flags(oldname, LOOKUP_EMPTY, &empty);
if (IS_ERR(from))
return PTR_ERR(from);
 retry:
-- 
1.7.6.4



Re: [RFC] Capabilities still can't be inherited by normal programs

2012-12-19 Thread Pádraig Brady

On 12/12/2012 06:29 PM, Andy Lutomirski wrote:

> On Sat, Dec 8, 2012 at 3:57 PM, Andy Lutomirski  wrote:
>>
>> I just tried to search to find actual uses of pI/fI.  Here's what I found:
>
> I downloaded all the Fedora spec files and searched for file
> capabilities.  Assuming I didn't mess up, here's what I found:


Just pointing out a handy online search tool
for this sort of thing.

http://searchco.de/?q=%25caps+url:fedora+ext:spec

thanks,
Pádraig.


Re: urandom is too slow

2012-10-30 Thread Pádraig Brady

On 10/30/2012 06:54 PM, Theodore Ts'o wrote:

> On Tue, Oct 30, 2012 at 04:55:22PM +0200, Lasse Kärkkäinen wrote:
>> Apparently there has been little or no development on urandom even
>> though the device is in widespread use for disk shredding and such
>> use. The device emits data at rather slow rate of 19 MB/s even on
>> modern hardware where other software-based PRNGs could do far
>> better. An even better option seems to be utilizing AES for
>> encrypting zeroes, using a random key, allowing for rates up to 500
>> MB/s with hardware that has AES-NI instructions.
>>
>> Why is urandom so slow and why isn't AES hardware acceleration utilized?
>
> If you can use a software-based PRNG, you should use one in userspace.
> The intended use of urandom is for cryptographic purposes (i.e.,
> generating random session keys, long-term public keys, etc.).  If you
> just want to wipe a disk, you shouldn't be using /dev/urandom for that
> purpose.


For the record, shred has used a user space PRNG for speed
for the last 3 years or so, rather than using /dev/urandom:
http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commit;h=af5723c7

$ shred-old -v -n3 t
shred-old: t: pass 1/3 (random)...
shred-old: t: pass 1/3 (random)...8.3MiB/1000MiB 0%
shred-old: t: pass 1/3 (random)...17MiB/1000MiB 1%
shred-old: t: pass 1/3 (random)...32MiB/1000MiB 3%
...

$ time shred-new -v t
shred-new: t: pass 1/3 (random)...
shred-new: t: pass 1/3 (random)...116MiB/1000MiB 11%
shred-new: t: pass 1/3 (random)...216MiB/1000MiB 21%
shred-new: t: pass 1/3 (random)...340MiB/1000MiB 34%
...

cheers,
Pádraig.


Re: [Perf] Adding timeout option

2012-10-22 Thread Pádraig Brady

On 10/21/2012 05:18 AM, abhishek agarwal wrote:


> perfmon had "timeout" option and i guess, same do oprofile.
>
> On Sat, Oct 20, 2012 at 1:58 AM, Pádraig Brady  wrote:
>> On 10/13/2012 08:54 AM, abhishek agarwal wrote:
>>> Hi folks..
>>>
>>> I was thinking that why cant we have a timeout option in perf stat
>>> command.  The timeout feature will help us to profile a process for a
>>> stipulated time (preferably in millisecs) and make perf stat return
>>> after that time.
>>> Eg:
>>>
>>> perf stat --timeout=10 sleep 100
>>>
>>> This will make perf return and report stats after 10 ms...
>>>
>>> Hope anyone can shed some more light on the idea
>>
>> It seems preferable to use the timeout program to do this.
>> Either sending a handled signal to the perf process like:
>>
>>  timeout -s HUP 10 perf stat sleep 100
>>
>> Or even better, just use that to kill the monitored process itself
>>
>>  perf stat timeout 10 sleep 100
>
> sleep does not offer a good timeout resolution and it is not a good
> option (according to me) either.

sleep was just following on the command example.
timeout(1) supports subsecond resolution,
through timer_create (CLOCK_REALTIME...) and
timeout indication through signals.

thanks,
Pádraig.


Re: [Perf] Adding timeout option

2012-10-20 Thread Pádraig Brady

On 10/13/2012 08:54 AM, abhishek agarwal wrote:

> Hi folks..
>
> I was thinking that why cant we have a timeout option in perf stat
> command.  The timeout feature will help us to profile a process for a
> stipulated time (preferably in millisecs) and make perf stat return
> after that time.
> Eg:
>
> perf stat --timeout=10 sleep 100
>
> This will make perf return and report stats after 10 ms...
>
> Hope anyone can shed some more light on the idea


It seems preferable to use the timeout program to do this.
Either sending a handled signal to the perf process like:

timeout -s HUP 10 perf stat sleep 100

Or even better, just use that to kill the monitored process itself

perf stat timeout 10 sleep 100

cheers,
Pádraig.


Re: [PATCH] procfs: don't need a PATH_MAX allocation to hold a string representation of an int

2012-09-07 Thread Pádraig Brady

On 09/07/2012 01:48 PM, Jeff Layton wrote:
> On Fri,  7 Sep 2012 08:34:53 -0400
> Jeff Layton wrote:
>
>> Signed-off-by: Jeff Layton
>> ---
>>   fs/proc/base.c | 5 +++--
>>   1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/proc/base.c b/fs/proc/base.c
>> index 1b6c84c..58e801b 100644
>> --- a/fs/proc/base.c
>> +++ b/fs/proc/base.c
>> @@ -2758,7 +2758,8 @@ static void *proc_self_follow_link(struct dentry *dentry,
>> struct nameidata *nd)
>>  pid_t tgid = task_tgid_nr_ns(current, ns);
>>  char *name = ERR_PTR(-ENOENT);
>>  if (tgid) {
>> -   name = __getname();
>> +   /* 10 for max length of an int in decimal + NULL terminator */
>> +   name = kmalloc(11, GFP_KERNEL);
>
> ^
> Bah...my mistake. This should be "12", since it's possible (though
> unlikely) that this value could be negative. Is there a better way to
> express "strlen of max representation of an int in decimal" ?


See INT_BUFSIZE_BOUND() in:
http://git.sv.gnu.org/gitweb/?p=gnulib.git;a=blob;f=lib/intprops.h;hb=HEAD
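
To give a rough idea of the approach (this is not gnulib's actual code), a compile-time bound can be derived from the width of the type. The macro and function names below are made up for this sketch. It relies on 146/485 being slightly larger than log10(2), so the digit count is never underestimated. For a 32-bit int it evaluates to 12, which agrees with the corrected value quoted above.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* 1 if integer type t is signed, else 0. */
#define TYPE_IS_SIGNED(t) (!((t)0 < (t)-1))

/* Upper bound on the characters needed to print any value of integer
   type t in decimal, counting a '-' for signed types but not the
   trailing NUL.  146/485 slightly exceeds log10(2) = 0.30103... */
#define DEC_STRLEN_BOUND(t) \
    ((sizeof(t) * CHAR_BIT - TYPE_IS_SIGNED(t)) * 146 / 485 \
     + 1 + TYPE_IS_SIGNED(t))

/* Same bound plus one byte for the trailing NUL. */
#define DEC_BUFSIZE_BOUND(t) (DEC_STRLEN_BOUND(t) + 1)

/* Hypothetical helper: format value into a buffer sized by the macro.
   Returns the number of characters written (excluding the NUL). */
int format_int(char buf[DEC_BUFSIZE_BOUND(int)], int value)
{
    int n = snprintf(buf, DEC_BUFSIZE_BOUND(int), "%d", value);

    /* The bound guarantees the output was not truncated. */
    assert(n > 0 && (size_t)n < DEC_BUFSIZE_BOUND(int));
    return n;
}
```

A caller would declare `char name[DEC_BUFSIZE_BOUND(int)];` rather than hard-coding 11 or 12, so the size follows the type if it ever changes.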

cheers,
Pádraig.


Re: Question about the fallocate system call

2012-07-26 Thread Pádraig Brady
On 07/26/2012 03:30 PM, Jidong Xiao wrote:
> Hi,
> 
> I just have a simple question about fallocate.
> 
> I want to test the punch hole function of fallocate(). So I wrote such
> a simple program:
> 
> yosemite:/mnt # cat test.c
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> 
> int main(void)
> {
> int fd;
> 
> fd = open("testfile", O_RDWR);
> fallocate(fd,FALLOC_FL_PUNCH_HOLE,0,500*1024*1024);
> close(fd);
> 
> return 0;
> }
> 
> I created a file called "testfile" whose size is 1GB, however, when I
> run the above program, the size of the testfile simply won't change,
> if I use stat command to check the file status, nothing is changed when I
> execute the above program. My filesystem is ext4, as I understand,
> ideally when I run the above program, the file size should decrease
> from 1GB to 512MB, is there anything wrong with the program or I just
> understood incorrectly?
> 
> Thank you for any inputs/comments.

The code looks OK,
but you're not checking the return value from fallocate().
I'm guessing it's returning -1 with errno = ENOTSUP.
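
For what it's worth, a minimal sketch with the return value checked might look like the following. It assumes Linux with glibc, and punch_hole() and allocated_bytes() are only illustrative names. Two points from fallocate(2) are relevant here: FALLOC_FL_PUNCH_HOLE has to be ORed with FALLOC_FL_KEEP_SIZE (otherwise the call fails with EOPNOTSUPP, which is the same value as ENOTSUP on Linux), and punching a hole never changes the file size reported by stat, only the space allocated to the file.

```c
#define _GNU_SOURCE             /* for fallocate() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>       /* FALLOC_FL_PUNCH_HOLE, FALLOC_FL_KEEP_SIZE */
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Punch a hole of len bytes at offset off in the file at path.
   Returns 0 on success or a negative errno value on failure. */
int punch_hole(const char *path, off_t off, off_t len)
{
    int ret = 0;
    int fd;

    assert(path != NULL && off >= 0 && len > 0);

    fd = open(path, O_RDWR);
    if (fd < 0)
        return -errno;

    /* PUNCH_HOLE is only accepted together with KEEP_SIZE. */
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  off, len) < 0) {
        ret = -errno;
        fprintf(stderr, "fallocate: %s\n", strerror(-ret));
    }

    if (close(fd) < 0 && ret == 0)
        ret = -errno;
    return ret;
}

/* Space actually allocated to the file, in bytes, or -1 on error.
   st_blocks is counted in 512-byte units. */
long long allocated_bytes(const char *path)
{
    struct stat st;

    if (stat(path, &st) < 0)
        return -1;
    return (long long)st.st_blocks * 512;
}
```

Comparing allocated_bytes() (or the "Blocks:" value from stat(1), or du) before and after the call shows whether the hole was punched; st_size stays at its original value either way.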

cheers,
Pádraig.


Re: Question about free/used memory on Linux

2007-11-14 Thread Pádraig Brady
Ravinandan Arakali (rarakali) wrote:
> Hi Vaidy,
> What do you think is the right way to get the memory usage of a
> process, I mean the actual physical memory used ? Basically,
> I'm interested in the incremental cost of a process, which
> means, I don't want to include the text segments of shared
> libraries which would remain even after the process is killed
> (since it would be used by other processes).
> 
> Is the RSS field of "ps aux" command the right one or use 
> "pmap" command and look at the "writeable" segments ?

I already commented on this thread with a python script
for reporting RAM usage for programs.

RSS = Private and Shared Resident pages.
You can get the shared value for a process from /proc/$$/smaps
(as is done in the script)
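
This is not the script itself, only a small sketch in C of the same idea: add up the Private_* and Shared_* fields of an smaps file. The struct and function names are invented for the example, and it only looks at the four Clean/Dirty fields.

```c
#include <assert.h>
#include <stdio.h>

struct smaps_totals {
    long private_kb;    /* Private_Clean + Private_Dirty */
    long shared_kb;     /* Shared_Clean + Shared_Dirty */
};

/* Sum the private and shared resident sizes from an smaps-format
   stream.  Returns 0 on success, -1 on a read error. */
int sum_smaps(FILE *fp, struct smaps_totals *out)
{
    char line[256];
    long kb;

    assert(fp != NULL && out != NULL);
    out->private_kb = 0;
    out->shared_kb = 0;

    while (fgets(line, sizeof line, fp) != NULL) {
        /* sscanf() returns 1 only when the literal prefix matched
           and a number followed it. */
        if (sscanf(line, "Private_Clean: %ld", &kb) == 1 ||
            sscanf(line, "Private_Dirty: %ld", &kb) == 1)
            out->private_kb += kb;
        else if (sscanf(line, "Shared_Clean: %ld", &kb) == 1 ||
                 sscanf(line, "Shared_Dirty: %ld", &kb) == 1)
            out->shared_kb += kb;
    }
    return ferror(fp) ? -1 : 0;
}
```

Called on a stream opened with fopen("/proc/self/smaps", "r") (or the smaps file of another pid, permissions allowing), private_kb approximates the incremental cost of the process, while shared_kb is the part that other processes may also be using.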

Pádraig.


Re: Laptop's HDD

2007-11-05 Thread Pádraig Brady
Alberto Gonzalez wrote:
> Hi,
> 
> Maybe some of you have been hearing lately about a problem with laptop's hard 
> disk drives being killed by *insert Linux distro here* [1]

I asked about this on the fedora devel list:
http://www.redhat.com/archives/fedora-devel-list/2007-October/msg02324.html

I don't think the kernel should worry about this.
I don't even think distros should change the default settings of the BIOS/disk.

Up to and including Fedora 7 on my laptop, the disk did a load cycle
on average once every 48 seconds. Mounting the filesystems noatime changed this
to once every 108 seconds, which is a little aggressive still but not too bad.

Note fedora 8 will have the relatime option on by default for all filesystems.

Pádraig.


Re: Possibility of adding -march=native to x86

2007-10-26 Thread Pádraig Brady
Adrian Bunk wrote:
> On Thu, Oct 25, 2007 at 09:12:45PM +0100, Michael Lothian wrote:
>>> The MPENTIUM4 option does not only set -march=pentium4, it also enables
>>> several other options in arch/i386/Kconfig.cpu resulting in better
>>> performance.
>> How about an autodetect to set the right options here too using cpuid?
>>
>> With a warning of course that the code produced will be specifically
>> for the native cpu that it's compiled on.
> 
> If you don't know or can figure out yourself the CPU you have, you'd 
> better not compile your own kernel...

There also is the added variable of what your version of gcc supports.
The kernel gcc options would have to be the highest common factor.
See also http://www.pixelbeat.org/scripts/gcccpuopt

Pádraig.


Re: Question about free/used memory on Linux

2007-10-22 Thread Pádraig Brady
Ravinandan Arakali (rarakali) wrote:
> Hi kernel gurus,
> I am trying to find out the memory that's used on my linux box.
> I find that there are quite a few confusing metrics. How do
> I find out the "true" used memory ?
> 
> 1. For eg. "free -m" shows free memory (excluding buffers/caches) 
> as 308 MB while I can see(from "df" output) that the the tmpfs 
> partitions take up about 400 MB. So, does "free -m" not consider 
> the tmpfs partitions ?
> 
> 2. I try to add up RSS field of all processes reported by
> "ps aux" command. But is it true that this would be misleading
> in that, shared memory used by, say 2 processes would show
> up twice here although there's only one copy in memory. Also
> does this consider the fact that there's only one copy
> of shared libraries ?

Have a look at this script to show RAM used by programs:
http://www.pixelbeat.org/scripts/ps_mem.py

Note to display totals you will need this patch applied:
http://lkml.org/lkml/2007/8/13/1224

Pádraig.


Re: [v4l-dvb-maintainer] [GIT PATCHES] V4L/DVB changes for 2.6.24

2007-10-11 Thread Pádraig Brady
Aidan Thornton wrote:
> I looked at this recently, and I'm not sure the core em28xx code was
> really that different (at least, pre-userspace). Most of the core
> changes seemed to be related to Markus' driver having (semi-working)
> VBI support. I haven't tried this recently; I disabled it a while back
> because it had a bug that caused a kernel panic half the time when
> attempting to record something with MythTV.
> 
> The in-kernel driver looks mostly sound, though I can't test it
> myself. (One other interesting thing that was added in Markus' driver
> is various v4l1 ioctls, which may be useful to some people.)

Yes, for example VLC doesn't support v4l2 yet.
Here is a patch I backported to 2.6.17 last year:
http://www.pixelbeat.org/patches/linux-2.6.17-em28xx-v4l1.diff
I didn't try to get it merged as I thought Markus would do it,
but it looks like that's unlikely now.

Also here is a patch to allow shared access to the video device
(so you can have a separate tuner program to VLC for example):
http://www.pixelbeat.org/patches/linux-2.6.17-em28xx-shared.diff

> Incidentally, I notice you appear to be developing userspace drivers
> for the tvp5150 and zl10353. Is that really necessary?

It is necessary if Markus wants to stop people merging code back from
his in-kernel driver fork. Call me a cynic, but I'm confused about Markus'
motives in all this.

Markus, please do the right thing and just merge your code!
(and please don't reply to this giving reasons you won't/can't do this).

Pádraig.


Re: A little coding style nugget of joy

2007-09-20 Thread Pádraig Brady
Matt LaPlante wrote:
> Since everyone loves random statistics, here are a few gems to give you a 
> break from your busy day:
> 
> Number of lines in the 2.6.22 Linux kernel source that include one or more 
> trailing whitespaces: 135209
> Bytes saved by removing said whitespace: 151809
> Lines in the (unified) diff: 455437
> Size of the diff: 15M
> People brave enough to submit the patch: ~0

It's gradually getting better so:
http://lwn.net/2001/1129/a/whitespace.php3


Re: [ANNOUNCE] Btrfs: a copy on write, snapshotting FS

2007-06-19 Thread Pádraig Brady
Vladislav Bolkhovitin wrote:
> 
> I would also suggest one more feature: support for block level
> de-duplication. I mean:
> 
> 1. Ability for Btrfs to have blocks in several files to point to the
> same block on disk
> 
> 2. Support for new syscall or IOCTL to de-duplicate as a single
> transaction two or more blocks on disk, i.e. link them to one of them
> and free others
> 
> 3. De-de-duplicate blocks on disk, i.e. copy them on write
> 
> I suppose that de-duplication itself would be done by some user space
> process that would scan files, determine blocks with the same data and
> then de-duplicate them by using syscall or IOCTL (2).
> 
> That would be very usable feature, which in most cases would allow to
> shrink occupied disk space on 50-90%.

Have you references for this number?
In my experience one gets a lot of benefit from
the much simpler process of "de-duplication" of files.

Note that a checksum stored in file metadata,
automatically invalidated on write, would
speed up user space file de-duplication,
rsync, etc.

Pádraig.


Re: [PATCH] sendfile removal

2007-06-01 Thread Pádraig Brady
H. Peter Anvin wrote:
> Eric Dumazet wrote:
>> As I said, this new non blocking feature on the input side (disk), is
>> nice and usefull. (For people scared by splice() syscall :) )
>>
>> Just have to mention it is a change of behavior, and documentation
>> probably needs to reflect this change. "Since linux 2.6.23, sendfile()
>> repects O_NONBLOCK on in_fd as well"
>>
> 
> Fair enough.  Unix has traditionally not acknowledged the possibility of
> nonblocking I/O on conventional files, for some odd reason.

That reminds me of this patch:
http://lkml.org/lkml/2004/10/1/217
which went in for a while but was reverted:
http://lkml.org/lkml/2004/10/17/17

Pádraig


Re: max_loop limit

2007-03-22 Thread Pádraig Brady
William Lee Irwin III wrote:
> Any chance we can get some kind of devices set up for partitions of
> loop devices if we're going to redo loopdev setup? That's been a thorn
> in my side for some time.

This script might be of use:
http://www.pixelbeat.org/scripts/lomount.sh

cheers,
Pádraig.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: userspace pagecache management tool

2007-03-07 Thread Pádraig Brady
Andrew Morton wrote:
> On Tue, 06 Mar 2007 12:10:49 +
> Pádraig Brady <[EMAIL PROTECTED]> wrote:
>> Perhaps one could possibly just evict pages with _mapcount==0 ?
> 
> That is the present fadvise(FADV_DONTNEED) behaviour.

Ah right. It doesn't invalidate page_mapped() pages.
If that means it doesn't invalidate pages previously cached
by other processes, then great.

However, I think what I meant was that fadvise(FADV_DONTNEED)
should only invalidate pages where page_count()<=1

From include/linux/mm.h:

" For pages belonging to inodes, the page_count() is the number of
  attaches, plus 1 if `private' contains something, plus one for
  the page cache itself."

cheers,
Pádraig.



Re: userspace pagecache management tool

2007-03-06 Thread Pádraig Brady
Andrew Morton wrote:
> Yes.  Let's flesh it out the backup program policy some more:
> 
> - Unconditionally invalidate output files
> 
> - on entry to read(), probe pagecache, record which pages in the range are 
> present
> 
> - on entry to next read(), shoot down those pages from the previous read
>   which weren't in pagecache.
> 
> - But we can do better!  LRU the page's files up to a certain number of pages.
> 
> - Once that point is exceeded, we need to reclaim some pages.  Which
>   ones?  Well, we've been observing all reads, so we can record which pages
>   were referenced once, and which ones were referenced multiple times so we
>   can do arbitrarily complex page aging in there.
> 
> - On close(), nuke all pages which weren't in core during open(), even if
>   this app referenced them multiple times.
> 
> - If the backup program decided to read its input files with mmap we're
>   rather screwed.  We can't intercept pagefaults so the best we can do is
>   to restore the file's pagecache to its previous state on close().
> 
>   Or if it's really a problem, get control in there somehow and
>   periodically poll the pagecache occupancy via mincore(), use madvise()
>   then fadvise() to trim it back.
> 
> That all sounds reasonably doable.  It'd be pretty complex to do it
> in-kernel but we could do it there too.  Problem is if course that the
> above strategy is explicitly optimised for the backup program and if it's
> in-kernel it becomes applicable to all other workloads.

I can see the above being possible, but I can't see the reason
for exposing that complexity to userspace. If I'm the target
audience for that API then it's broken as I'd mess it up,
or would take too long to get it right.

Can't we just fix the posix_fadvise() implementation to
only evict pages paged in by the current process?
Perhaps one could just evict pages with _mapcount==0 ?

cheers,
Pádraig.


Re: userspace pagecache management tool

2007-03-05 Thread Pádraig Brady
Andrew Morton wrote:
> I've uploaded to http://userweb.kernel.org/~akpm/pagecache-management/ a
> little tool which permits the management of the pagecache usage of
> arbitrary applications.  Effectively it prevents the targetted application
> from using any pagecache at all.

Cool, Kinda like noca?
http://kernel.umbrella.ro/vm/
I could easily read your code,
but couldn't immediately figure out what noca was doing.

I used posix_fadvise in an app I did recently:
http://www.pixelbeat.org/programs/dvd-vr/
There is a stream_data() func there that does:

read(src)
write(dst)
posix_fadvise(src)
posix_fadvise(dst)

For performance I found I needed to do it in that order
so that any readahead done with the read(src)
was not thrown away by the posix_fadvise(src).
In addition to the order, one must be careful
to throw away only what you've actually written.
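
As a rough sketch of that ordering (not the actual stream_data() from dvd-vr), the loop below reads, writes, and only then drops the cache for exactly the range just transferred. stream_copy() and CHUNK are names invented here, both descriptors are assumed to start at offset 0, and the fdatasync() is an assumption of this sketch: posix_fadvise(2) notes that pages not yet written out are unaffected by POSIX_FADV_DONTNEED, so the destination is flushed first, at some cost in throughput.

```c
#define _POSIX_C_SOURCE 200809L /* posix_fadvise(), fdatasync() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

#define CHUNK (64 * 1024)

/* Copy src to dst (both assumed positioned at offset 0), dropping the
   page cache for each chunk only after it has been read and written.
   Returns the number of bytes copied, or -1 on error. */
long long stream_copy(int src, int dst)
{
    static char buf[CHUNK];
    off_t done = 0;

    assert(src >= 0 && dst >= 0);

    for (;;) {
        ssize_t n = read(src, buf, sizeof buf);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            break;              /* end of input */

        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(dst, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }

        /* The advice is best effort, so results are ignored.  Only the
           range just read and written is discarded, which leaves any
           readahead beyond it in the cache. */
        posix_fadvise(src, done, n, POSIX_FADV_DONTNEED);
        fdatasync(dst);
        posix_fadvise(dst, done, n, POSIX_FADV_DONTNEED);
        done += n;
    }
    return (long long)done;
}
```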

I'm not sure your lib gives enough control over this,
as you essentially do:

posix_fadvise(src)
read(src)
posix_fadvise(dst)
write(dst)

cheers,
Pádraig.


Re: Question about setting affinity in 2.4

2007-02-19 Thread Pádraig Brady
Arjan van de Ven wrote:
> sched_setaffinity takes 3 not 2 parameters.

Yep, the interface changed 3 times, hence
it's probably better to use the syscall directly.
Search my notes for sched_setaffinity here:
http://www.pixelbeat.org/programming/c_c++_notes.html
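
The following is not copied from those notes; it is just a sketch of what calling the system call directly can look like on a current Linux system, which sidesteps whichever library prototype happens to be installed. The function names and the MASK_WORDS limit are arbitrary choices for the example.

```c
#define _GNU_SOURCE             /* for syscall() */
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#define MASK_WORDS 128          /* sketch limit: 128 longs worth of CPUs */

/* Pin process pid (0 means the caller) to one CPU using the raw
   system call.  Returns 0 on success or a negative errno value. */
int pin_to_cpu(pid_t pid, unsigned cpu)
{
    unsigned long mask[MASK_WORDS];
    const unsigned bits = 8 * sizeof(unsigned long);

    assert(cpu < MASK_WORDS * bits);
    memset(mask, 0, sizeof mask);
    mask[cpu / bits] |= 1UL << (cpu % bits);

    /* Arguments: pid, size of the mask in bytes, pointer to the mask. */
    if (syscall(SYS_sched_setaffinity, (long)pid, sizeof mask, mask) < 0)
        return -errno;
    return 0;
}

/* Lowest-numbered CPU the caller is allowed to run on,
   or a negative errno value. */
int first_allowed_cpu(void)
{
    unsigned long mask[MASK_WORDS];
    const unsigned bits = 8 * sizeof(unsigned long);

    memset(mask, 0, sizeof mask);
    if (syscall(SYS_sched_getaffinity, 0L, sizeof mask, mask) < 0)
        return -errno;

    for (unsigned cpu = 0; cpu < MASK_WORDS * bits; cpu++)
        if (mask[cpu / bits] & (1UL << (cpu % bits)))
            return (int)cpu;
    return -ENOENT;
}
```

For example, pin_to_cpu(0, (unsigned)first_allowed_cpu()) restricts the calling process to the first CPU it is currently permitted to use.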

Pádraig.


Re: Finding hardlinks

2007-01-11 Thread Pádraig Brady
Frank van Maarseveen wrote:
> On Tue, Jan 09, 2007 at 11:26:25AM -0500, Steven Rostedt wrote:
>> On Mon, 2007-01-08 at 13:00 +0100, Miklos Szeredi wrote:
>>
>>>> 50% probability of false positive on 4G files seems like very ugly
>>>> design problem to me.
>>> 4 billion files, each with more than one link is pretty far fetched.
>>> And anyway, filesystems can take steps to prevent collisions, as they
>>> do currently for 32bit st_ino, without serious difficulties
>>> apparently.
>> Maybe not 4 billion files, but you can get a large number of >1 linked
>> files, when you copy full directories with "cp -rl".
> 
> Yes but "cp -rl" is typically done by _developers_ and they tend to
> have a better understanding of this (uh, at least within linux context
> I hope so).

I'm not really following this thread, but that's wrong.
A lot of people use hardlinks to provide snapshot functionality.
I.E. the following can be used to efficiently make snapshots:

rsync -a /src/ /backup/today
cp -al /backup/today /backup/$Date

See also:

http://www.dirvish.org/
http://www.rsnapshot.org/
http://igmus.org/code/

> Also, just adding hard-links doesn't increase the number of inodes.

I don't think that was the point.

Pádraig.