Re: GREP/Regex for dummies?

2011-09-06 Thread Ronald J Kimball
On Tue, Sep 06, 2011 at 07:50:30AM -0700, jefferis wrote:
 Hi,all. I'm trying to do the research and learn the proper syntax, but
 I keep getting confused by the web page descriptions. Just a little
 background: I am a visual learner. I was great in Geometry but my
 algebra teacher said Jeff tries very hard but has no natural
 aptitude :-)  In other words, I can see pictures in my head and
 rotate 3D puzzles, but abstract equations with symbolic language
 become a complete mystery to me. Rather than pester this group for
 basic questions, can anyone recommend a short site or GOOD, SIMPLE
 book that would show me how?

If you want to learn how regular expressions work and how to use them, I'd
recommend Mastering Regular Expressions, from O'Reilly:
http://oreilly.com/catalog/9780596528126

If you just want a syntax reference, they also have a Regular Expression
Pocket Reference: http://oreilly.com/catalog/9780596514273

As to visual learning, you might look for a text that explains regular
expressions in terms of finite automata, with diagrams.  I don't remember
if Mastering Regular Expressions includes some of that.


 In the meantime, I'm faced with a complex replacement problem I
 would appreciate instruction on it.
 
 My client wants me to take a tool tip for hundreds of items and wants
 me to add a tab after the item number and then remove numbers at the
 end of the phrase.
 
 As a sample the numbers and letters up to the first space are the item
 number. Description text is variable in length and the last set of
 numbers are to be removed. The codes are only numbers preceded by a
 space.
 
 26119BCZZD002CR01 Edward Piguet Tourbillon with Diamond Bezel White
 and lots of gold stuff  73245

We need a little more information to solve this.

First, what is the form of the item numbers?  Do they always start with the
same digits, or have the same pattern of digits and letters?

Second, what is the context of these strings?  Are they attribute values in
HTML pages, or one per line in text files, or something else?

You need to not just match the strings you want, but also not match the
strings you don't want.  That's why we need that additional information.


That said, your search/replace would look something like this:

Find

(\d{5}[A-Z]{5}\d{3}[A-Z]{2}\d{2}) (.*?) +(\d+)

Replace

\1\t\2

Here I've assumed that item numbers are identified by the pattern of
digits and letters.  It matches the item number, a space, the description,
and then one or more spaces followed by a number.  It replaces this with
the item number, a tab, and the description.
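
For anyone who wants to try this pattern outside BBEdit, here is a minimal sketch in Python. Python's re module is only a stand-in for BBEdit's Grep engine (an assumption; the two differ in some details), and the function name add_tab_and_strip_code is made up for illustration.

```python
import re

# Ronald's pattern: item number, one space, description (lazy),
# one or more spaces, then the trailing numeric code.
ITEM_PATTERN = re.compile(r"(\d{5}[A-Z]{5}\d{3}[A-Z]{2}\d{2}) (.*?) +(\d+)")


def add_tab_and_strip_code(line):
    """Turn 'ITEM description  CODE' into 'ITEM<tab>description'."""
    return ITEM_PATTERN.sub(r"\1\t\2", line)


sample = ("26119BCZZD002CR01 Edward Piguet Tourbillon with Diamond Bezel "
          "White and lots of gold stuff  73245")
print(repr(add_tab_and_strip_code(sample)))
```

Because the pattern is not anchored to the end of the line and .*? is lazy, a description that itself contains a space followed by digits would be cut at that point. That is one reason the exact form of the data matters.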


Ronald

-- 
You received this message because you are subscribed to the 
BBEdit Talk discussion group on Google Groups.
To post to this group, send email to bbedit@googlegroups.com
To unsubscribe from this group, send email to
bbedit+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/bbedit?hl=en
If you have a feature request or would like to report a problem, 
please email supp...@barebones.com rather than posting to the group.
Follow @bbedit on Twitter: http://www.twitter.com/bbedit


Re: GREP/Regex for dummies?

2011-09-06 Thread Fletcher Sandbeck
On Sep 6, 2011, at 7:50 AM, jefferis wrote:

 My client wants me to take a tool tip for hundreds of items and wants
 me to add a tab after the item number and then remove numbers at the
 end of the phrase.
 
 As a sample the numbers and letters up to the first space are the item
 number. Description text is variable in length and the last set of
 numbers are to be removed. The codes are only numbers preceded by a
 space.
 
 26119BCZZD002CR01 Edward Piguet Tourbillon with Diamond Bezel White
 and lots of gold stuff  73245

BBEdit's help contains a good introduction to regular expressions and GREP.  
I find most of the O'Reilly books little better than the help available online, 
but YMMV.  I usually just google "regular expression converting telephone 
numbers" or whatever to find common patterns that I need, but of course you do 
need to know the basics to see if you're on the right track.

You can think of the regexp as a little machine that consumes the string 
character by character.  When one part of the pattern can't match any more it 
goes on to the next part.  [A-Z0-9]+ will match a string of one or more capital 
letters or numbers.  [0-9]+ will match a string of numbers.  .* will match 
anything.  So, your example will be matched by the following.  You can test 
this by entering it into the find dialog and then command-g through some 
matches.

  [A-Z0-9]+ .* [0-9]+

To do a replace we identify the parts of the string we want to keep with 
parentheses.  We want to keep the first and second parts and get rid of the 
third.  The parentheses are otherwise basically ignored.  They don't affect the 
match.

  ([A-Z0-9]+) (.*) [0-9]+

To create the replacement we reference the parentheses with \1 and \2 and add 
the tab.

  \1\t\2
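
To see what the parentheses capture, here is a small sketch in Python, used only as a stand-in for BBEdit's Grep engine (an assumption). The sample line is invented for illustration.

```python
import re

# The unanchored pattern with two capturing groups.
pattern = re.compile(r"([A-Z0-9]+) (.*) [0-9]+")

line = "AB12 Gold watch 100"  # hypothetical sample entry
match = pattern.search(line)

item_number = match.group(1)         # what \1 refers to
description = match.group(2)         # what \2 refers to
replaced = match.expand(r"\1\t\2")   # the replacement text built from the groups

print(repr(item_number), repr(description), repr(replaced))
```

The third part of the pattern has no parentheses, so the trailing number is matched but never referenced in the replacement, which is how it gets dropped.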

Finally, if your data is in a file with one entry per line then you can add ^ 
and $ to the find pattern to ensure that you match only entire lines.  
Otherwise, if the data is embedded in other text you might need something a 
little different.

  Find: ^([A-Z0-9]+) (.*) [0-9]+$
  Replace: \1\t\2
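
A sketch of the anchored version in Python, again only as a stand-in for BBEdit (an assumption). In Python, ^ and $ match at every line boundary only when re.MULTILINE is given. The sample text and the name reformat_lines are invented for illustration.

```python
import re

# Anchored pattern: each match must cover one entire line.
LINE_PATTERN = re.compile(r"^([A-Z0-9]+) (.*) [0-9]+$", re.MULTILINE)


def reformat_lines(text):
    """Apply the find/replace to every line that fits the pattern."""
    return LINE_PATTERN.sub(r"\1\t\2", text)


sample = (
    "AB12 First watch 100\n"
    "CD34 Second watch 2 dials 200\n"
    "not an item line\n"
)
result = reformat_lines(sample)
print(result)
```

The greedy .* keeps the "2" inside the second description, because only the last run of digits before the end of the line is left for [0-9]+$. The third line does not start with capitals or digits followed by a space, so it is left alone.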

[fletcher]



Re: GREP/Regex for dummies?

2011-09-06 Thread Jefferis Peterson
On 9/6/11 12:00 PM, Ronald J Kimball r...@tamias.net wrote:
Thanks for the references!  I'll look into them. I like to understand what
I'm doing, and that is why just having the pocket reference may not be enough
for me. It's like looking at formulas alone: it is harder for me to
grasp what I'm doing.
 We need a little more information to solve this.
 
 First, what is the form of the item numbers?  Do they always start with the
 same digits, or have the same pattern of digits and letters?

The item ID is a mix of letters and numbers with no repeating pattern,
so 0-9 and A-Z cover all possible characters, and the IDs vary in length.
The only constant is that the item ID is followed by a space.
The trailing text to be removed is all digits, with no alphabetic
characters.
 Second, what is the context of these strings?  Are they attribute values in
 HTML pages, or one per line in text files, or something else?

I have put the entries into BBEdit one per line, each followed by \r.
I'll probably turn the file into a table with 2 columns and multiple rows in
BBEdit (Item # followed by description)
and then place it into an HTML page, but that is the easy part.
 
 You need to not just match the strings you want, but also not match the
 strings you don't want.  That's why we need that additional information.
 
 
 That said, your search/replace would look something like this:
 
 Find
 
 (\d{5}[A-Z]{5}\d{3}[A-Z]{2}\d{2}) (.*?) +(\d+)
 
 Replace
 
 \1\t\2
 
 Here I've assumed that item numbers are identified by the pattern of
 digits and letters.  It matches the item number, a space, the description,
 and then one or more spaces followed by a number.  It replaces this with
 the item number, a tab, and the description.


Jefferis Peterson, Pres.
Web Design and Marketing
http://www.PetersonSales.com
(724)-482-2015




Re: GREP/Regex for dummies?

2011-09-06 Thread Jefferis Peterson
On 9/6/11 12:24 PM, Fletcher Sandbeck fletc...@cumuli.com wrote:

 BBEdit's help contains a good introduction to regular expressions and GREP.
 I find most of the O'Reilly books little better than the help available
 online, but YMMV.  I usually just google "regular expression converting
 telephone numbers" or whatever to find common patterns that I need, but of
 course you do need to know the basics to see if you're on the right track.

I read the reviews and thought I should try something a bit more basic so I
ordered: Regular Expressions Cookbook [Paperback]
Jan Goyvaerts (Author), Steven Levithan (Author)
Which appears to be a better introduction for someone my speed.

For some reason this worked well without the line delimiters. With the ^ $,
it only found one line and no more:
 

  Find: ^([A-Z0-9]+) (.*) [0-9]+$
  Replace: \1\t\2

But without the delimiters, it processed the page perfectly.
Thanks for my first lesson in depth OBI-WAN :-)

I really appreciate it.
Jeff 

Jefferis Peterson, Pres.
Web Design and Marketing
http://www.PetersonSales.com
(724)-482-2015




Re: GREP/Regex for dummies?

2011-09-06 Thread Bruce Van Allen

On 2011-09-06, Jefferis Peterson wrote:


Find: ^([A-Z0-9]+) (.*) [0-9]+$
Replace: \1\t\2

But without the delimiters, it processed the page perfectly.
Thanks for my first lesson in depth OBI-WAN :-)


A guess is that the non-matching lines have some extra 
whitespace at the start or end of the line. Unless I'm quite 
sure about it, I don't assume that a line has no extra 
whitespace, especially at the end. So when using ^  $ in a 
pattern, I tend to include an expression for zero or more 
whitespace characters, \s*, or zero or more tabs or spaces, 
[ \t]*, thus:


Find: ^\s*([A-Z0-9]+) (.*) [0-9]+\s*$
Replace: \1\t\2
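
A minimal sketch of the whitespace-tolerant idea in Python, used only as a stand-in for BBEdit's Grep engine (an assumption). It uses the [ \t]* form so the pattern cannot run across line breaks. The sample text is invented for illustration.

```python
import re

# Optional tabs or spaces are allowed before the item number and after
# the trailing code; neither ends up in the captured groups.
TOLERANT = re.compile(r"^[ \t]*([A-Z0-9]+) (.*) [0-9]+[ \t]*$", re.MULTILINE)

text = (
    "AB12 Gold watch 100 \n"      # trailing space
    "  CD34 Steel watch 200\n"    # leading spaces
    "EF56 Plain watch 300\n"      # no extra whitespace
)
cleaned = TOLERANT.sub(r"\1\t\2", text)
print(repr(cleaned))
```

The order of the two characters matters: \s* means zero or more whitespace characters, while \*s means a literal asterisk followed by the letter s, which none of these lines contain.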

An additional note is that the dot . matches any character EXCEPT 
the newline (\n or \r), unless you tell it otherwise*. If that 
weren't the case, without the ^  $ the above pattern would find 
ONE match: everything from the first id # to the description of 
the last item in the list.


(*As you move up in levels of regex, you'll encounter pattern 
modifiers, which do things like allowing . to match newline 
characters or making character matches not case-sensitive.)
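
The effect of letting the dot match newlines can be sketched in Python, where the modifier is the re.DOTALL flag (BBEdit spells its modifiers differently; Python is only a stand-in here, and the sample text is invented).

```python
import re

text = "AB12 Gold watch 100\nCD34 Steel watch 200\n"
pattern = r"([A-Z0-9]+) (.*) [0-9]+"

# Default: the dot stops at line breaks, so each line matches on its own.
per_line = [m.group(0) for m in re.finditer(pattern, text)]

# With re.DOTALL the dot also matches "\n", and the greedy .* runs from
# the first item number all the way to the last trailing code.
spanning = [m.group(0) for m in re.finditer(pattern, text, re.DOTALL)]

# re.IGNORECASE is the flag that makes letter matches case-insensitive.
case_blind = re.search(r"[A-Z]+", "gold", re.IGNORECASE)

print(len(per_line), len(spanning), case_blind.group(0))
```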


HTH


   - Bruce

_bruce__van_allen__santa_cruz_ca_



Re: GREP/Regex for dummies?

2011-09-06 Thread Jefferis Peterson
On 9/6/11 2:24 PM, Bruce Van Allen b...@cruzio.com wrote:

 
 Find: ^\s*([A-Z0-9]+) (.*) [0-9]+\*s$
 Replace: \1\t\2

That didn't work either, but I opened a fresh copy of the original text and
removed the \s*  and it worked on all lines

^([A-Z0-9]+) (.*) [0-9]+ $

Just curious... A space in the Find field using Grep... Isn't seen is it?
Like the space before the $  ?

If no spaces are found before or after in your original Find above, does
that stop the search?   IOW, what is the symbol for optional but not
necessary spaces? 

Jefferis Peterson, Pres.
Web Design and Marketing
http://www.PetersonSales.com
(724)-482-2015




Re: GREP/Regex for dummies?

2011-09-06 Thread Chip Warden
On Sep 6, 2011, at 12:01 PM, Jefferis Peterson wrote:
 
 I read the reviews and thought I should try something a bit more basic so I
 ordered: Regular Expressions Cookbook [Paperback]
 Jan Goyvaerts (Author), Steven Levithan (Author)
 Which appears to be a better introduction for someone my speed.

Jeff,

I own this book, and it is good for case studies, but not for learning how 
regex works, IMO. 

Just don't want you to be disappointed.

Chip




Re: GREP/Regex for dummies?

2011-09-06 Thread Doug McNutt
At 14:58 -0400 9/6/11, Jefferis Peterson wrote:

^([A-Z0-9]+) (.*) [0-9]+ $

Just curious... A space in the Find field using Grep... Isn't seen is it?
Like the space before the $  ?

If no spaces are found before or after in your original Find above, does
that stop the search?   IOW, what is the symbol for optional but not
necessary spaces? 


The spaces are treated as part of the expression. You are demanding their 
presence to get a hit.

That's more obvious in perl where regular expressions are always quoted, 
usually using the / character as the quoting character.

/^([A-Z0-9]+) (.*) [0-9]+ $/

Is what you would use. There are three spaces in it and all are required to get 
a hit.
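
A quick way to see that the spaces are required is this sketch in Python, used only as a stand-in for Perl or BBEdit (an assumption). The two sample strings are invented for illustration.

```python
import re

# Literal space before $: the line must end with exactly one space.
with_space = re.compile(r"^([A-Z0-9]+) (.*) [0-9]+ $")
# " ?" makes that final space optional (zero or one).
optional_space = re.compile(r"^([A-Z0-9]+) (.*) [0-9]+ ?$")

trailing = "AB12 Gold watch 100 "   # ends with a space
clean = "AB12 Gold watch 100"       # no trailing space

results = {
    "required, trailing": bool(with_space.search(trailing)),
    "required, clean": bool(with_space.search(clean)),
    "optional, trailing": bool(optional_space.search(trailing)),
    "optional, clean": bool(optional_space.search(clean)),
}
print(results)
```

Only the combination of a required space and a line without one fails to match.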

(.*) followed by a space is curious. The .* will itself match spaces, and demanding a 
real space after it can cause problems with greediness, where a match goes as far 
as possible. (.*?) would turn off the greediness.  The ? mark can also be used 
to specify an optional character as in ( ?) where those parentheses are not 
required unless you want a capture.
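
The difference between greedy and non-greedy matching can be sketched in Python, used only as a stand-in for Perl (an assumption). The sample line, with a number inside the description, is invented for illustration.

```python
import re

line = "AB12 Gold watch 2 dials 100"

# Greedy: .* grabs as much as it can, then backs off to the LAST " digits".
greedy = re.search(r"^([A-Z0-9]+) (.*) [0-9]+", line)

# Lazy: .*? grabs as little as it can, stopping at the FIRST " digits".
lazy = re.search(r"^([A-Z0-9]+) (.*?) [0-9]+", line)

# Lazy but anchored: the $ forces the match to reach the end of the line.
lazy_anchored = re.search(r"^([A-Z0-9]+) (.*?) [0-9]+$", line)

print(repr(greedy.group(2)))
print(repr(lazy.group(2)))
print(repr(lazy_anchored.group(2)))
```

Here the greedy and the anchored lazy patterns both keep "Gold watch 2 dials", while the unanchored lazy one stops at "Gold watch".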

Note also that [A-Z] will match only upper case.  There are also \d for 
digits and \w for word characters (letters, digits, and underscore).

And I still use the Regular Expression Bestiary, a chapter in Programming 
Perl which is another O'Reilly book that's not for geometric readers.


-- 

   Fe++
//  \
Fe++  Fe++
  |   ||
Fe++  Fe++
   \\/
   Fe++



Re: GREP/Regex for dummies?

2011-09-06 Thread John Delacour

At 14:58 -0400 06/09/2011, Jefferis Peterson wrote:


IOW, what is the symbol for optional but not
necessary spaces?


This should be near the first page of anything you read about regex.

ab\s*c   will match abc and ab[any amount of white space]c


Paste this into a BBEdit doc and run it from the #! menu:

#!/usr/bin/perl
# put any number of white space chars or none between b and c
$_ = "ab   c";
/ab\s*c/ ? print "matched\n" : print "no match\n";

JD
