Re: proposed pathchk change, in response to today's POSIX interpretation

2005-01-07 Thread Pádraig Brady
Paul Eggert wrote:

 Here's a proposed patch to pathchk.  It's not urgent, as pathchk
 conforms to POSIX now, but it implements a new -P option suggested
 in a POSIX interpretation released today.

 -  -p, --portability   check for all POSIX systems, not only this one\n\
 +  -p  check for most POSIX systems\n\
 +  -P  check for leading \"-\"\n\
 +  --portability   check for all POSIX systems (equivalent to -p -P)\n\

Interesting.
I'm afraid I don't know anything about POSIX standards structure.
You reference the following, which then references other POSIX stuff:
http://www.opengroup.org/austin/interps/doc.tpl?gdid=6232
Is the full definition of what pathchk deems
portable available publicly?
I attach my findnl util for reference.
--
Pádraig Brady - http://www.pixelbeat.org
--
#!/bin/sh
#note if files have names with ascii char 1 in them will have this converted
#to \n, ls will say file not found in this case and hence the real filename
#can be inferred.
#
#Note expressions are evaluated left to right and as soon as
#one is matched the rest aren't evaluated (makes sense as doing
#an or operation, not and). Anyway this implies that one should check
#things like filename length before the actual valid character
#combination tests.
#
#Don't need to handle ^~ | ^- etc as always get fully qualified path from find
#
#Even if -n1 specified for xargs there will be no ls processes run
#to display output until the end as the pipe is line buffered and
#there is only 1 line passed to xargs. This is required so paths
#with linefeeds are supported.
#
#How do you add expression to only include filenames with @ most
#1 consecutive character (for e.g. :) I was trying: :{1,1}
#but the second 1 was ignored? I know grep -Ev :{2,} works but
#I need the expression as part of the large expression for obvious
#reasons. For now I'm allowing any filenames with :, but this could
#cause interoperability problems with NTFS for e.g. which uses this char
#to indicate a separate stream in the file.
#
#man ascii was useful when writing this.
#
#Hmm just noticed the GNU pathchk utility, which
#should probably be integrated in some way?
#
#Note theoretically the only char UNIX doesn't allow in file/dir names
#is /. This can make things very awkward, especially for future extensions
#like streams etc.

. fslver

Usage() {
ProgName=`basename $0`
echo "find Name (directory or file) Lint.
Usage: $ProgName [-1] [-2] [-3] [-p] [[-r] [-f] path(s) ...]

These options are mutually exclusive (i.e. only the last one takes effect).
-1 is least checking, -3 is most. The default is 2.

-p is most stringent and applies POSIX.1 filename portability testing.
I.E. characters are limited to [A-Za-z0-9_.-] and max name length = 14 and
max path length = 255.

If no path(s) specified then the current directory is assumed."
exit
}

#default settings
level=2
MaxNameLen=129
MaxPathLen=2049

for arg
do
case $arg in
-1)
level=1
MaxNameLen=256
MaxPathLen=4097 ;;
-2)
: ;; #defaults set above
-3)
level=3
MaxNameLen=65
MaxPathLen=1025 ;;
-p)
level=p
MaxNameLen=15
MaxPathLen=256 ;;
-h|--help|-help)
Usage ;;
-v|--version)
Version ;;
*)
argsToPassOn="$argsToPassOn $arg" ;;
esac
done

#-p = POSIX.1 checking (names = 14 chars) etc.
expressionsp="(.*/[^/]{$MaxNameLen,}$)" #name length = MaxNameLen

#-1 = min checking(most still only require shell quoting, but very bad practice)
expressions1="$expressionsp|( +$)"  #spaces @ end of name
expressions1="$expressions1|(.*/ [^/]*$)"   #spaces @ start of name
expressions1="$expressions1|(.*/[^/]*[ ]{2,}[^/]*$)" #2 or more adjacent spaces
expressions1="$expressions1|(.*/-[^/]*$)"   #- @ start of name
expressions1="$expressions1|(.*/[^/]* -[^/]*$)" #- after space in name

#-2 = default checking (characters requiring shell quoting etc)
expressions2="$expressions1|(.*/[^/]*\{[^/]*,[^/]*\}[^/]*$)" #name with {,} pat
expressions2="$expressions2|(.*/[^/]*\[[^/]+\][^/]*$)"  #name with [.+] pattern
expressions2="$expressions2|(.*/~[^/]*$)"   #~ @ start of name

#-3 = max checking
expressions3="$expressions2|(.*/[^/]*[.]{2,}[^/]*$)" #2 or more adjacent .'s
expressions3="$expressions3|(.+[.]$)" #trailing .(s)

charactersp=[:alnum:]_./ #/ included as can't be in name and simpler exprs
characters3=$charactersp,~+
characters2=[EMAIL PROTECTED]:;'% 
characters1=$characters2()\$|\\
#any other characters are never OK and that includes \t*? etc.
#Note characters ]- are included in expressions below.

#The following is clever if I say so myself
expressions=`eval echo -n '$'expressions$level`
characters=`eval echo -n '$'characters$level`

set -f #no globbing
. getfpf -f $argsToPassOn
#forcing -f as expressions assume / before name
#Note could 

sort -nu treatment of 0 empty lines

2005-01-07 Thread Ulrich Hermisson
Hello,
the following behaviour of sort -nu need not necessarily be considered a
bug:

$ echo ,2,0,1,-2,3,-4 | tr , '\n' | sort -nu
-4
-2

1
2
3
This means: an empty line (and likewise a line not starting with a
number) is interpreted as having the key 0. Since it occurs before the
line starting with 0 here, the latter is missing from the output, which
contains the empty line instead. So one may think that the key 0 does
not appear in the input, even more so if there are no keys with
negative numbers.

Possible solutions: interpret empty lines (and lines not starting with a
number) as having the key plus or minus infinity, or leave it as it is
and mention the chosen convention in the documentation.
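
A script can sidestep the ambiguity in the meantime by dropping lines that do not start with a number before sorting. This is only a sketch and is not taken from the report: the helper name numeric_lines and its pattern are assumptions for illustration, and the pattern only recognises plain integers and decimals with an optional minus sign.

```shell
# Hypothetical helper: keep only lines that begin with optional blanks,
# an optional minus sign, and at least one digit.
numeric_lines() {
    grep -E '^[[:blank:]]*-?[0-9]'
}

# Same input as in the report, with the empty first line filtered out.
result=$(printf '%s\n' '' 2 0 1 -2 3 -4 | numeric_lines | sort -nu)
printf '%s\n' "$result"
```

With the empty line removed, the line containing 0 is the only one with key 0, so it survives the -u pass.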
Thank you!

With kind regards
Ulrich Hermisson

___
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils


Leading spaces from wc --lines

2005-01-07 Thread Bryce Nesbitt (mailing list account)
The wc command always seems to place leading spaces on all results, 
for example:

hardhat:Log wc 1
79 1741579 1
hardhat:Log wc --lines 1
79 1
Is there a way to get wc --lines to return just the number for easy 
scripting, e.g.:

hardhat:Log wc --lines 1
79
If not, is there a chance this can be changed?
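
Two common workarounds, sketched here rather than taken from the thread: reading the file on standard input drops the file name from the output, and shell arithmetic expansion discards any padding that remains. The temporary file and the variable names are made up for the example.

```shell
# Create a small three-line sample file (hypothetical, for the demo only).
sample=$(mktemp)
printf 'a\nb\nc\n' > "$sample"

# Reading from standard input suppresses the file name; some wc
# implementations may still pad the number with leading blanks.
wc -l < "$sample"

# Arithmetic expansion evaluates the number and drops surrounding blanks.
count=$(( $(wc -l < "$sample") ))
echo "$count"

rm -f "$sample"
```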


Re: sort -nu treatment of 0 empty lines

2005-01-07 Thread Jim Meyering
Ulrich Hermisson wrote:

 Hello,

 the following behaviour of sort -nu need not necessarily be considered a
 bug:

 $ echo ,2,0,1,-2,3,-4 | tr , '\n' | sort -nu
 -4
 -2

 1
 2
 3

Thanks for the report.
POSIX requires sort, with -n or `-kM,Nn', to interpret an empty field as 0.
The documentation (info sort) already says that an empty field is valid
but doesn't specify how it's interpreted.

I've just added a sentence saying that.
Here's the new description:

`--numeric-sort'
 Sort numerically: the number begins each line; specifically, it
 consists of optional blanks, an optional `-' sign, and zero or more
 digits possibly separated by thousands separators, optionally
 followed by a decimal-point character and zero or more digits.  A
 string of zero digits is interpreted as `0'.  The `LC_NUMERIC'
 locale specifies the decimal-point character and thousands
 separator.
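
A quick way to see the documented rule in action, as a sketch assuming a sort that follows the POSIX behaviour described above: an empty line, a line containing 0, and a non-numeric line all get the numeric key 0, so -u keeps only one of them. The variable name kept is made up for the example.

```shell
# Three input lines that all have numeric key 0 under -n:
# an empty line, "0", and a line with no leading number.
kept=$(printf '%s\n' '' 0 abc | sort -nu | wc -l)

# Arithmetic expansion strips any padding wc may add.
echo "lines kept: $(( kept ))"
```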




Re: sort -nu treatment of 0 empty lines

2005-01-07 Thread Jim Meyering
 Jim Meyering wrote:
  A string of zero digits is interpreted as `0'.

 That sounds to me like it's talking about "00".  Maybe "A string
 of no digits"?  Or "A line that ends or has nondigit characters
 where the number would be"?

Good point.
Thanks.

  by a decimal-point character and zero or more digits.  A string of
  no digits is interpreted as @samp{0}.  The @env{LC_NUMERIC}




Re: proposed pathchk change, in response to today's POSIX interpretation

2005-01-07 Thread Paul Eggert
Pádraig Brady writes:

 Is the full definition of what pathchk deems
 portable available publicly?

The current standard is here:

http://www.opengroup.org/onlinepubs/009695399/utilities/pathchk.html

In addition to http://www.opengroup.org/austin/interps/doc.tpl?gdid=6232,
which I already mentioned, you probably also want to look here:

http://www.opengroup.org/austin/interps/doc.tpl?gdid=6233
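
To get a feel for the checks from the command line, here is a small sketch. It assumes a pathchk that implements the -P option discussed in this thread (current GNU coreutils does); the file names and variable names are made up for the example.

```shell
# A short name drawn from the portable character set should pass -p.
if pathchk -p 'ok_name.txt'; then portable=yes; else portable=no; fi

# -P rejects file name components with a leading "-".
# The "--" keeps pathchk from parsing the operand as an option.
if pathchk -P -- '-foo' 2>/dev/null; then dash_ok=yes; else dash_ok=no; fi

echo "portable=$portable dash_ok=$dash_ok"
```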

 I attach my findnl util for reference.

Yes, that's the sort of thing that pathchk was designed for.
However, I see several quoting problems in findnl, e.g.
...\{

Another example:

 #The following is clever if I say so myself
 expressions=`eval echo -n '$'expressions$level`

isn't portable, since you can't rely on either -n or backslash
handling with echo.  Better is something like this:

eval expressions=\$expressions$level
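
A self-contained sketch of that eval form follows. The variable names mirror findnl, but the values are placeholders chosen for the example, not the real expressions.

```shell
# Placeholder values standing in for the per-level expressions.
level=2
expressions1='(a)'
expressions2='(a)|(b)'

# After the first round of expansion eval sees: expressions=$expressions2
# so no echo, and therefore no -n or backslash handling, is involved.
eval "expressions=\$expressions$level"
printf '%s\n' "$expressions"
```

Because the value is assigned directly rather than passed through a command substitution, it is copied unchanged, including any backslashes.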

