Re: [9fans] sed question (OT)

2009-11-11 Thread frankg
On Oct 30, 12:58 pm, noah.ev...@gmail.com (Noah Evans) wrote:
> This kind of problem is character processing, which I would argue is
> C's domain. You can massage awk and sed to do the job for you, but at
> least for me it's conceptually simpler to just bang out the following
> C program:
>
> #include <u.h>
> #include <libc.h>
> #include <bio.h>
>
> #define isupper(r)      (L'A' <= (r) && (r) <= L'Z')
> #define islower(r)      (L'a' <= (r) && (r) <= L'z')
> #define isalpha(r)      (isupper(r) || islower(r))
> #define isspace(r)      ((r) == L' ' || (r) == L'\t' \
>                         || (0x0A <= (r) && (r) <= 0x0D))
> #define toupper(r)      ((r)-'a'+'A')
>
> void
> usage(char *me)
> {
>         fprint(2, "%s: usage\n", me);
> }
>
> void
> main(int argc, char **argv)
> {
>         Biobuf in, out;
>         int c, waswhite, nwords;
>
>         ARGBEGIN{
>         default:
>                 usage(argv[0]);
>         }ARGEND;
>         Binit(&in, 0, OREAD);
>         Binit(&out, 1, OWRITE);
>
>         waswhite = 0;
>         nwords = 0;
>         while((c = Bgetc(&in)) != Beof){
>                 if(isalpha(c))
>                 if(waswhite)
>                 if(nwords < 2){
>                         if(islower(c))
>                                 c = toupper(c);
>                         nwords++;
>                 }
>                 if(isspace(c))
>                         waswhite = 1;
>                 else
>                         waswhite = 0;
>                 if(c == '\n')
>                         nwords = 0;
>                 Bputc(&out, c);
>         }
>         exits(0);
> }
>
> Noah


Simple, and wrong. You need to initialize waswhite to 1, not 0.



Re: [9fans] sed question (OT)

2009-10-30 Thread Eris Discordia
The script has a small bug one might say: it capitalizes the first two 
words on a line that are _not_ already capitalized. If one of the first two 
words is capitalized then the third will get capitalized.


--On Thursday, October 29, 2009 15:41 + Steve Simon 
st...@quintile.net wrote:



> Sorry, not really the place for such questions but...
>
> I always struggle with sed, awk is easy but sed makes my head hurt.
>
> I am trying to capitalise the first two words on each line (I could use
> awk as well but I have to use sed so it seems churlish to start another
> process).
>
> capitalising the first word on the line is easy enough:
>
> h
> s/^(.).*/\1/
> y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
> x
> s/^.(.*)/\1/
> x
> G
> s/\n//
>
> Though there may be a much easier/more elegant way to do this,
> but for the 2nd word it gets much harder.
>
> What I really want is sam's ability to select a letter and operate on it
> rather than everything being line based as sed seems to be.
>
> any neat solutions? (extra points awarded for use of the branch operator
> :-)
>
> -Steve









Re: [9fans] sed question (OT)

2009-10-30 Thread Eris Discordia

Listing of file 'sedscr':


s/^/ /;
s/$/aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ/;
s/ \([a-z]\)\(.*\1\)\(.\)/ \3\2\3/;
s/ \([a-z]\)\(.*\1\)\(.\)/ \3\2\3/;
s/.\{52\}$//;
s/ //;


$ echo This is a test | sed -f sedscr
This Is a test
$ echo someone forgot to capitalize | sed -f sedscr
Someone Forgot to capitalize

This works with '/usr/bin/sed' from a FreeBSD 6.2-RELEASE installation.

Above sed script stolen from:

http://dervish.wsisiz.edu.pl/~bse26236/batutil/help/sed/CAPITALI.HTM

With a minor change: first three words to first two words.
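
A note on how the listing works: the table appended to each line is what makes it tick. `.*\1` runs greedily to the last occurrence of the matched lowercase letter, which is always the table's copy, and `\3` then picks up the uppercase letter stored right after it. Because the pattern looks for the first blank followed by [a-z], words that already start with a capital are skipped rather than counted. A minimal sketch of an anchored variant follows, assuming a POSIX sed with basic regular expressions, no leading blanks, and words separated by spaces. The function name cap2_table is made up for illustration, and none of this has been tried on Plan 9 sed.

```shell
# Sketch only: anchored variant of the lookup-table trick.
# Assumes POSIX sed (BRE syntax), no leading blanks, space-separated words.
# cap2_table is a hypothetical name, not something from the thread.
cap2_table() {
    # LC_ALL=C keeps [a-z] from matching capitals in some locales.
    LC_ALL=C sed \
        -e 's/$/aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ/' \
        -e 's/^\([a-z]\)\(.*\1\)\(.\)/\3\2\3/' \
        -e 's/^\([^ ]* \{1,\}\)\([a-z]\)\(.*\2\)\(.\)/\1\4\3\4/' \
        -e 's/.\{52\}$//'
}

printf 'this is a test\n' | cap2_table
printf 'This is a test\n' | cap2_table
```

The first substitution only looks at the first character of the line and the second only at the character after the first run of blanks, so a word that is already capitalized is left alone without shifting the work onto the third word.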













Re: [9fans] sed question (OT)

2009-10-30 Thread dave . l

You can do it, definitely.

Caveat: I'm in bed with a virus and the brain's on impulse power
so these are untested and may be highly suboptimal.

Is the input guaranteed to have 2 words on each line?
What are your definitions of words and blanks?

I know from your snippet that there are no leading blanks and no empty lines.


Assuming there are 2 words on every line, something like:
h
s/[A-Za-z0-9_-]+(.).*/\1/
y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
G
s/(.)\n([A-Za-z0-9_-]+).(.*)/\2\1\3/

ought to roughly work after your fragment.

If >= 2 words per line isn't assumed:
h
t urnofflag
: urnofflag
s/[A-Za-z0-9_-]+[^ A-Za-z0-9_-]*(.).*/\1/
t for2
b cosnot2wds
: for2
y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
G
s/(.)\n([A-Za-z0-9_-]+[^ A-Za-z0-9_-]*).(.*)/\2\1\3/
b
: cosnot2wds
g

Bizarrely, within its limitations (\n, \0, size limits), sed is, in some
sense, complete, since you can store any number of things in the spaces
(using /(.*\n)/ etc.) and branch conditionally.


Another insane possibility, since there are only 26 variations, is to do:

s/^a/A/
s/^([A-Z][A-Za-z0-9]+[^ A-Za-z0-9_-]*)a/\1A/
s/^b/B/
s/^([A-Z][A-Za-z0-9]+[^ A-Za-z0-9_-]*)b/\1B/

You can of course, use sed to create the above script like so:
echo abcdefghijklmnopqrstuvwxyz | sed ...
Filling in the ellipses is left as an exercise for the already addled  
reader.
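
For readers who would rather not do the exercise in sed itself, here is a hedged sketch that takes a different route and lets a shell loop write the 52 rules. It assumes a POSIX sh and a POSIX sed; the rules are written in basic-regexp syntax rather than the Plan 9 syntax used above, the word separator is simplified to one or more spaces, and the name gen_rules is invented for this example.

```shell
# Sketch only: emit one first-word and one second-word rule per letter.
# gen_rules is a hypothetical helper; the separator is assumed to be spaces.
gen_rules() {
    for pair in aA bB cC dD eE fF gG hH iI jJ kK lL mM nN oO pP qQ rR sS tT uU vV wW xX yY zZ; do
        lower=${pair%?}    # first character of the pair
        upper=${pair#?}    # second character of the pair
        printf 's/^%s/%s/\n' "$lower" "$upper"
        printf 's/^\\([^ ]* \\{1,\\}\\)%s/\\1%s/\n' "$lower" "$upper"
    done
}

# Feed the generated script straight to sed (or save it and use sed -f).
printf 'someone forgot to capitalize\n' | sed -e "$(gen_rules)"
```

At most one first-word rule and one second-word rule can fire on any line, because once a letter has been capitalized none of the remaining lowercase rules match at that position.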


BTW: if you're shovelling a lot of this kind of muck, it may,
paradoxically, be easier to do it on the command line and use your
shell's variables for the repeated bits of regexps, commands etc.
The only caveats are that this technique will curdle your brain even
more than sed already does and it may, oddly, be the exception to the
rule that rc is more elegant than sh, due to caret vs. double-quotes.


Apologies for grandstanding, but I used to do this sort of stuff for a living.
I wrote a piece of training courseware for sed once which had far worse
excesses than the above as examples.

RFC-822 header-reassembly anyone?

I also used to get my intellectual rocks off on stuff like this until I
finally grew up (in my late 40s).


Dave.

SEE ALSO
teco, assembler, qed.








Re: [9fans] sed question (OT)

2009-10-30 Thread erik quanstrom
On Fri Oct 30 11:31:24 EDT 2009, dav...@mac.com wrote:
> You can do it, definitely.

well played!

- erik



Re: [9fans] sed question (OT)

2009-10-30 Thread W B Hacker

Eris Discordia wrote:
> The script has a small bug one might say: it capitalizes the first two
> words on a line that are _not_ already capitalized. If one of the first
> two words is capitalized then the third will get capitalized.


Call me a Dinosaur, but - so long as it is ASCII or EBCDIC it is relatively 
trivial to implement that in hardware AND NOT have the issue of altering any but 
the first two words AND NOT have issues where there is only one word or a 
numeral or punctuation or hidden/control character rather than alpha.


Hint: Among other simple stuff, needs XOR capability.

'Dinosaur' 'coz the last time I did one of the key portions of it was converting 
a Data Printer CT-1064 chaintrain from HP-3000 MKIII use to work with an S-100 
Z-80. That capitalized *every* alpha character, but took just two 74-series IC's 
to replace a pair of lookup-table PROMS.


One would need to add logic to detect space or newline, set/unset a few latches 
- not a lot more.


Could have built it in less time than this thread has been running...

;-)


Bill















Re: [9fans] sed question (OT) (OT) (OT)

2009-10-30 Thread Tim Newsham
> Call me a Dinosaur, but - so long as it is ASCII or EBCDIC it is relatively
> trivial to implement that in hardware AND NOT have the issue of altering any
> but the first two words AND NOT have issues where there is only one word or a
> numeral or punctuation or hidden/control character rather than alpha.

You should have added an extra (OT) to the subject line.
I'm adding a few more just to be fair.

> Could have built it in less time than this thread has been running...

then what have you been doing all this time?

> Bill


Tim Newsham
http://www.thenewsh.com/~newsham/



Re: [9fans] sed question (OT) (OT) (OT) (OT) (OT)(OT)(OT)(OT)(OT)(OT)(OT)(OT)(OT)(OT)

2009-10-30 Thread W B Hacker

Tim Newsham wrote:
>> Call me a Dinosaur, but - so long as it is ASCII or EBCDIC it is
>> relatively trivial to implement that in hardware AND NOT have the
>> issue of altering any but the first two words AND NOT have issues
>> where there is only one word or a numeral or punctuation or
>> hidden/control character rather than alpha.
>
> You should have added an extra (OT) to the subject line.
> I'm adding a few more just to be fair.
>
>> Could have built it in less time than this thread has been running...
>
> then what have you been doing all this time?
>
>> Bill
>
> Tim Newsham
> http://www.thenewsh.com/~newsham/




Honestly?

Trying to determine what a valid USE for capitalizing exactly the first 'n' 
words on a line might be.


Especially as it calls for ONE or TWO but never THREE or more.

Document 'sideheads', maybe??

- but those may not be limited to 2 words.

The need is as puzzling as some of the solutions..

Bill



Re: [9fans] sed question (OT)

2009-10-30 Thread Noah Evans
This kind of problem is character processing, which I would argue is
C's domain. You can massage awk and sed to do the job for you, but at
least for me it's conceptually simpler to just bang out the following
C program:

#include <u.h>
#include <libc.h>
#include <bio.h>

#define isupper(r)      (L'A' <= (r) && (r) <= L'Z')
#define islower(r)      (L'a' <= (r) && (r) <= L'z')
#define isalpha(r)      (isupper(r) || islower(r))
#define isspace(r)      ((r) == L' ' || (r) == L'\t' \
                        || (0x0A <= (r) && (r) <= 0x0D))
#define toupper(r)      ((r)-'a'+'A')

void
usage(char *me)
{
        fprint(2, "%s: usage\n", me);
}

void
main(int argc, char **argv)
{
        Biobuf in, out;
        int c, waswhite, nwords;

        ARGBEGIN{
        default:
                usage(argv[0]);
        }ARGEND;
        Binit(&in, 0, OREAD);
        Binit(&out, 1, OWRITE);

        waswhite = 0;
        nwords = 0;
        while((c = Bgetc(&in)) != Beof){
                if(isalpha(c))
                if(waswhite)
                if(nwords < 2){
                        if(islower(c))
                                c = toupper(c);
                        nwords++;
                }
                if(isspace(c))
                        waswhite = 1;
                else
                        waswhite = 0;
                if(c == '\n')
                        nwords = 0;
                Bputc(&out, c);
        }
        exits(0);
}

Noah







Re: [9fans] sed question (OT)

2009-10-29 Thread Lorenzo Bolla
To capitalize the first letter of each line wouldn't this be enough?

s/^./\u&/

L.






Re: [9fans] sed question (OT)

2009-10-29 Thread W B Hacker

Steve Simon wrote:

> Sorry, not really the place for such questions but...
>
> I always struggle with sed, awk is easy but sed makes my head hurt.
>
> I am trying to capitalise the first two words on each line (I could use awk
> as well but I have to use sed so it seems churlish to start another process).
>
> capitalising the first word on the line is easy enough:
>
> h
> s/^(.).*/\1/
> y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
> x
> s/^.(.*)/\1/
> x
> G
> s/\n//
>
> Though there may be a much easier/more elegant way to do this,
> but for the 2nd word it gets much harder.
>
> What I really want is sam's ability to select a letter and operate on it
> rather than everything being line based as sed seems to be.
>
> any neat solutions? (extra points awarded for use of the branch operator :-)
>
> -Steve




I'd be sore tempted to move the needful files into an environment where I could 
use multiple passes of 'rpl' (or 'back in the day' BRIEF).


BFBI .. far less capable tools, perhaps - BUT by the time you've figured out how 
to even *tell* awk or sed what to do, I'm working on some other task...


'If at first you don't succeed - cheat'

YMMV,

Bill



Re: [9fans] sed question (OT)

2009-10-29 Thread erik quanstrom
> To capitalize the first letter of each line wouldn't this be enough?
>
> s/^./\u&/

; echo abc def | sed 's/^.\u&/'
sed: s command garbled: s/^.\u&/

- erik



Re: [9fans] sed question (OT)

2009-10-29 Thread Iruata Souza
On Thu, Oct 29, 2009 at 2:08 PM, erik quanstrom quans...@quanstro.net wrote:
>> To capitalize the first letter of each line wouldn't this be enough?
>>
>> s/^./\u&/
>
> ; echo abc def | sed 's/^.\u&/'
> sed: s command garbled: s/^.\u&/

i guess you missed the second slash



Re: [9fans] sed question (OT)

2009-10-29 Thread erik quanstrom
On Thu Oct 29 12:31:23 EDT 2009, iru.mu...@gmail.com wrote:
> On Thu, Oct 29, 2009 at 2:08 PM, erik quanstrom quans...@quanstro.net wrote:
>>> To capitalize the first letter of each line wouldn't this be enough?
>>>
>>> s/^./\u&/
>>
>> ; echo abc def | sed 's/^.\u&/'
>> sed: s command garbled: s/^.\u&/
>
> i guess you missed the second slash
 

now it is less helpful:

; echo abc def | sed 's/^./\u&/'
uabc def

- erik



Re: [9fans] sed question (OT)

2009-10-29 Thread Iruata Souza
On Thu, Oct 29, 2009 at 2:06 PM, Lorenzo Bolla lbo...@gmail.com wrote:
> To capitalize the first letter of each line wouldn't this be enough?
> s/^./\u&/
>
> L.

% echo rwrong | sed 's/^./\u&/'
urwrong



Re: [9fans] sed question (OT)

2009-10-29 Thread Lorenzo Bolla
I forgot the 9.
This works for GNU sed version 4.2.1
L.

On Thu, Oct 29, 2009 at 4:33 PM, Iruata Souza iru.mu...@gmail.com wrote:

> On Thu, Oct 29, 2009 at 2:06 PM, Lorenzo Bolla lbo...@gmail.com wrote:
>> To capitalize the first letter of each line wouldn't this be enough?
>> s/^./\u&/
>>
>> L.
>
> % echo rwrong | sed 's/^./\u&/'
> urwrong
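
For context on why the two seds disagree: `\u` in the replacement is a GNU sed extension that uppercases the next character, so `\u&` capitalizes the first letter of whatever matched, while a sed without the extension just emits a literal u. The sketch below is for GNU sed only and carries the idea over to the first two words using the numeric flag of the s command. It assumes words are separated by spaces, and the function name cap2_gnu is made up for this example.

```shell
# Sketch only, GNU sed specific: \u is a GNU extension, not POSIX.
# cap2_gnu is a hypothetical name; words are assumed to be space-separated.
cap2_gnu() {
    # The trailing 1 and 2 pick the first and second match of the word pattern.
    sed -e 's/[^ ][^ ]*/\u&/1' -e 's/[^ ][^ ]*/\u&/2'
}

printf 'abc def ghi\n' | cap2_gnu
```

On a sed that lacks the extension, such as the Plan 9 one shown in the replies, the same script would insert a literal u instead.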




Re: [9fans] sed question (OT)

2009-10-29 Thread Jason Catena
> Sorry, not really the place for such questions but...

Try stackoverflow.com.  They delight in problems such as these.

> I am trying to capitalise the first two words on each line

I store the original line with h, and then pull it back out repeatedly
with G to mangle it.
I got far enough to translate "first second ..." to "First s" with this:

h
s/^(.).*/\1/
y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
G
s/^.([^ ]+ ).*/\1/
s/^.([^ ]+)$/\1/
G
s/^.[^ ]+ (.).*/\1/
#y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
#3y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
s/\n//g

There are a couple of problems.  (1) It doesn't handle the case with only
one word on a line, because it's hard to tell, later on, that I pulled
out the single word once already. (2) I'd like to put in one of the
commented-out y commands, but (2a) the first uppercases the entire
pattern space, and (2b) the second refers to line 3 of the entire
file, not line 3 of the pattern space.

> -Steve

Jason Catena
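
One possible way around both problems, offered as an untested-on-Plan-9 sketch: shrink the pattern space to just the two initials before running y, so that uppercasing "the entire pattern space" is exactly what is wanted, then use G to fetch the saved line and splice the initials back in. A t branch separates the two-word case from the one-word case. This assumes a POSIX sed with basic regular expressions, no leading blanks, and space-separated words; the function name cap2_hold and the label up are invented for the example.

```shell
# Sketch only: hold-space approach with a branch for one-word lines.
# Assumes POSIX sed (BRE), no leading blanks, space-separated words.
# cap2_hold and the label "up" are hypothetical names.
cap2_hold() {
    sed -e 'h' \
        -e 's/^\(.\)[^ ]* \{1,\}\([^ ]\).*/\1\2/' \
        -e 't up' \
        -e 's/^\(.\).*/\1/' \
        -e ':up' \
        -e 'y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/' \
        -e 'G' \
        -e 's/^\(.\)\(.\)\n.\([^ ]* \{1,\}\).\(.*\)/\1\3\2\4/' \
        -e 's/^\(.\)\n.\(.*\)/\1\2/' \
        -e 's/^\n//'
}

printf 'first second third\n' | cap2_hold
printf 'hello\n' | cap2_hold
```

After G the pattern space holds the uppercased initials, a newline, and the original line. The first merge handles two initials, the second handles a single initial, and the last one cleans up after an empty input line.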