Re: GREP/Regex for dummies?
On Tue, Sep 06, 2011 at 07:50:30AM -0700, jefferis wrote:

> Hi, all. I'm trying to do the research and learn the proper syntax, but I keep getting confused by the web page descriptions. Just a little background: I am a visual learner. I was great in Geometry, but my algebra teacher said, "Jeff tries very hard but has no natural aptitude" :-) In other words, I can see pictures in my head and rotate 3D puzzles, but abstract equations with symbolic language become a complete mystery to me. Rather than pester this group with basic questions, can anyone recommend a short site or a GOOD, SIMPLE book that would show me how?

If you want to learn how regular expressions work and how to use them, I'd recommend Mastering Regular Expressions, from O'Reilly:

http://oreilly.com/catalog/9780596528126

If you just want a syntax reference, they also have a Regular Expression Pocket Reference:

http://oreilly.com/catalog/9780596514273

As to visual learning, you might look for a text that explains regular expressions in terms of finite automata, with diagrams. I don't remember if Mastering Regular Expressions includes some of that.

> In the meantime, I'm faced with a complex replacement problem, and I would appreciate instruction on it. My client wants me to take a tool tip for hundreds of items, add a tab after the item number, and then remove the numbers at the end of the phrase. As a sample, the numbers and letters up to the first space are the item number. The description text is variable in length, and the last set of numbers is to be removed. The codes are only numbers preceded by a space.
>
> 26119BCZZD002CR01 Edward Piguet Tourbillon with Diamond Bezel White and lots of gold stuff 73245

We need a little more information to solve this. First, what is the form of the item numbers? Do they always start with the same digits, or have the same pattern of digits and letters? Second, what is the context of these strings? Are they attribute values in HTML pages, or one per line in text files, or something else?

You need to not just match the strings you want, but also not match the strings you don't want. That's why we need that additional information.

That said, your search/replace would look something like this:

Find
    (\d{5}[A-Z]{5}\d{3}[A-Z]{2}\d{2}) (.*?) +(\d+)

Replace
    \1\t\2

Here I've assumed that item numbers are identified by the pattern of digits and letters. It matches the item number, a space, the description, and then one or more spaces followed by a number. It replaces this with the item number, a tab, and the description.

Ronald

--
You received this message because you are subscribed to the BBEdit Talk discussion group on Google Groups.
To post to this group, send email to bbedit@googlegroups.com
To unsubscribe from this group, send email to bbedit+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/bbedit?hl=en
If you have a feature request or would like to report a problem, please email supp...@barebones.com rather than posting to the group.
Follow @bbedit on Twitter: http://www.twitter.com/bbedit
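To make Ronald's Find/Replace pair concrete, here is a minimal sketch in Python rather than BBEdit. Python's re module is an assumption here, not something the thread uses, but it accepts this pattern and the \1\t\2 replacement unchanged. The "tricky" line and the anchored variant are hypothetical additions meant to illustrate his point about not matching what you don't want.

```python
import re

# Ronald's Find pattern: fixed-shape item number, lazy description, trailing code.
FIND = re.compile(r"(\d{5}[A-Z]{5}\d{3}[A-Z]{2}\d{2}) (.*?) +(\d+)")
# Hypothetical variant: the same pattern pinned to the end of the string.
ANCHORED = re.compile(FIND.pattern + r"$")
REPLACE = r"\1\t\2"  # item number, tab, description

# The sample line from the original question.
sample = ("26119BCZZD002CR01 Edward Piguet Tourbillon with Diamond Bezel "
          "White and lots of gold stuff 73245")
result = FIND.sub(REPLACE, sample)
print(repr(result))

# Made-up line whose description contains a number of its own.
tricky = "26119BCZZD002CR01 Model 3 Chronograph 73245"
print(repr(FIND.sub(REPLACE, tricky)))      # lazy match stops at " 3"
print(repr(ANCHORED.sub(REPLACE, tricky)))  # anchoring forces the last number
```

Without an anchor, the lazy (.*?) stops at the first space-plus-digits it meets, so a description that contains a number is cut short. That is one reason the context of the strings matters.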
Re: GREP/Regex for dummies?
On Sep 6, 2011, at 7:50 AM, jefferis wrote:

> My client wants me to take a tool tip for hundreds of items, add a tab after the item number, and then remove the numbers at the end of the phrase. As a sample, the numbers and letters up to the first space are the item number. The description text is variable in length, and the last set of numbers is to be removed. The codes are only numbers preceded by a space.
>
> 26119BCZZD002CR01 Edward Piguet Tourbillon with Diamond Bezel White and lots of gold stuff 73245

BBEdit's help contains a good introduction to regular expressions and GREP. I find most of the O'Reilly books little better than the help available online, but YMMV. I usually just google "regular expression converting telephone numbers" or whatever to find common patterns that I need, but of course you do need to know the basics to see if you're on the right track.

You can think of the regexp as a little machine that consumes the string character by character. When one part of the pattern can't match any more, it goes on to the next part.

    [A-Z0-9]+    will match a string of one or more capital letters or digits.
    [0-9]+       will match a string of one or more digits.
    .*           will match anything.

So, your example will be matched by the following. You can test this by entering it into the Find dialog and then pressing command-G to step through some matches.

    [A-Z0-9]+ .* [0-9]+

To do a replace, we identify the parts of the string we want to keep with parentheses. We want to keep the first and second parts and get rid of the third. The parentheses are otherwise basically ignored. They don't affect the match.

    ([A-Z0-9]+) (.*) [0-9]+

To create the replacement, we reference the parentheses with \1 and \2 and add the tab.

    \1\t\2

Finally, if your data is in a file with one entry per line, then you can add ^ and $ to the find pattern to ensure that you match only entire lines. Otherwise, if the data is embedded in other text, you might need something a little different.

    Find:    ^([A-Z0-9]+) (.*) [0-9]+$
    Replace: \1\t\2

[fletcher]
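The build-up Fletcher describes (match, then capture, then anchor) can be sketched outside BBEdit as follows. This is an illustration in Python, which is an assumption rather than anything used in the thread. The second data line is made up, and re.MULTILINE is needed in Python so that ^ and $ apply to each line.

```python
import re

# One entry per line. The first is a shortened form of the thread's sample;
# the second line is hypothetical data for illustration only.
entries = (
    "26119BCZZD002CR01 Edward Piguet Tourbillon with Diamond Bezel 73245\n"
    "15300STOO1220ST03 Hypothetical second item with steel case 88120\n"
)

# Step 1 and 2: the capturing pattern, tried on the first line only.
first_line = entries.splitlines()[0]
m = re.match(r"([A-Z0-9]+) (.*) [0-9]+", first_line)
item_number, description = m.group(1), m.group(2)
print(item_number)
print(description)

# Step 3: anchor to whole lines and replace with "\1<tab>\2".
whole_line = re.compile(r"^([A-Z0-9]+) (.*) [0-9]+$", re.MULTILINE)
converted = whole_line.sub(r"\1\t\2", entries)
print(converted)
```

The greedy (.*) first runs to the end of the line and then backs off until the rest of the pattern, a space and the trailing digits, can match. That is why the description keeps everything up to the final number.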
Re: GREP/Regex for dummies?
On 9/6/11 12:00 PM, Ronald J Kimball r...@tamias.net wrote:

Thanks for the references! I'll look into them. I like to understand what I'm doing, and that is why just having the pocket reference may not be enough for me... Also, like looking at formulas alone... it is harder for me to grasp what I'm doing.

> We need a little more information to solve this. First, what is the form of the item numbers? Do they always start with the same digits, or have the same pattern of digits and letters?

The first set is a pattern of letters and numbers with no repeating values, so 0-9 and A-Z would include all possible combinations, and the IDs vary in length. The only constant is that the item ID is followed by a space. The trailing text to be removed is all numbers, with no alphabetic characters.

> Second, what is the context of these strings? Are they attribute values in HTML pages, or one per line in text files, or something else?

I have put each entry into BBEdit as a single line followed by \r. I'll probably turn the file into a table with 2 columns and multiple rows in BBEdit (item # followed by description) and then place it into an HTML page, but that is the easy part.

> You need to not just match the strings you want, but also not match the strings you don't want. That's why we need that additional information.
>
> That said, your search/replace would look something like this:
>
> Find
>     (\d{5}[A-Z]{5}\d{3}[A-Z]{2}\d{2}) (.*?) +(\d+)
>
> Replace
>     \1\t\2
>
> Here I've assumed that item numbers are identified by the pattern of digits and letters. It matches the item number, a space, the description, and then one or more spaces followed by a number. It replaces this with the item number, a tab, and the description.

Jefferis Peterson, Pres.
Web Design and Marketing
http://www.PetersonSales.com
(724)-482-2015
Re: GREP/Regex for dummies?
On 9/6/11 12:24 PM, Fletcher Sandbeck fletc...@cumuli.com wrote:

> BBEdit's help contains a good introduction to regular expressions and GREP. I find most of the O'Reilly books little better than the help available online, but YMMV. I usually just google "regular expression converting telephone numbers" or whatever to find common patterns that I need, but of course you do need to know the basics to see if you're on the right track.

I read the reviews and thought I should try something a bit more basic, so I ordered Regular Expressions Cookbook by Jan Goyvaerts and Steven Levithan, which appears to be a better introduction for someone my speed.

For some reason this worked well without the line delimiters. With the ^ $, it only found one line and no more:

    Find:    ^([A-Z0-9]+) (.*) [0-9]+$
    Replace: \1\t\2

But without the delimiters, it processed the page perfectly. Thanks for my first lesson in depth, OBI-WAN :-) I really appreciate it.

Jeff

Jefferis Peterson, Pres.
Web Design and Marketing
http://www.PetersonSales.com
(724)-482-2015
Re: GREP/Regex for dummies?
On 2011-09-06, Jefferis Peterson wrote:

> Find:    ^([A-Z0-9]+) (.*) [0-9]+$
> Replace: \1\t\2
>
> But without the delimiters, it processed the page perfectly. Thanks for my first lesson in depth, OBI-WAN :-)

A guess is that the non-matching lines have some extra whitespace at the start or end of the line.

Unless I'm quite sure about it, I don't assume that a line has no extra whitespace, especially at the end. So when using ^ $ in a pattern, I tend to include an expression for zero or more whitespace characters, \s*, or zero or more tabs or spaces, [ \t]*, thus:

    Find:    ^\s*([A-Z0-9]+) (.*) [0-9]+\*s$
    Replace: \1\t\2

An additional note: the dot . matches any character EXCEPT the newline (\n or \r), unless you tell it otherwise*. If that weren't the case, without the ^ $ the above pattern would find ONE match: everything from the first ID # to the description of the last item in the list.

(*As you move up in levels of regex, you'll encounter pattern modifiers, which do things like allowing . to match newline characters or making character matches not case-sensitive.)

HTH

- Bruce

_bruce__van_allen__santa_cruz_ca_
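Bruce's two points can be checked with a small sketch. It is written in Python as an assumption (the thread itself uses BBEdit), and both data lines are made up. One detail worth noting: the Find pattern as posted ends in \*s, which a regex engine reads as a literal asterisk followed by the letter s. The surrounding prose suggests \s* was intended, so the sketch compares both, using the [ \t]* form Bruce also mentions for the tolerant version.

```python
import re

# Made-up data: the first line has a trailing space, the second does not.
text = "A1 first item 111 \nB2 second item 222\n"

strict = re.compile(r"^([A-Z0-9]+) (.*) [0-9]+$", re.MULTILINE)
tolerant = re.compile(r"^[ \t]*([A-Z0-9]+) (.*) [0-9]+[ \t]*$", re.MULTILINE)
as_posted = re.compile(r"^\s*([A-Z0-9]+) (.*) [0-9]+\*s$", re.MULTILINE)

strict_hits = len(strict.findall(text))        # trailing space blocks line 1
tolerant_hits = len(tolerant.findall(text))    # optional whitespace allowed
as_posted_hits = len(as_posted.findall(text))  # needs a literal "*s" at the end
print(strict_hits, tolerant_hits, as_posted_hits)

# Dot and newlines: without anchors, "." stops at the line break by default,
# so each line matches separately. With re.DOTALL, ".*" runs across lines.
clean = "A1 first item 111\nB2 second item 222\n"
unanchored = r"([A-Z0-9]+) (.*) [0-9]+"
per_line_hits = len(re.findall(unanchored, clean))
dotall_match = re.search(unanchored, clean, re.DOTALL)
dotall_hits = len(re.findall(unanchored, clean, re.DOTALL))
print(per_line_hits, dotall_hits)
print(repr(dotall_match.group(2)))
```

With re.DOTALL the single match runs from the first ID to the last trailing number, which is the "ONE match" behavior Bruce describes. Note that Python's dot excludes only \n by default, so this sketch uses \n line endings rather than \r.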
Re: GREP/Regex for dummies?
On 9/6/11 2:24 PM, Bruce Van Allen b...@cruzio.com wrote:

> Find:    ^\s*([A-Z0-9]+) (.*) [0-9]+\*s$
> Replace: \1\t\2

That didn't work either, but I opened a fresh copy of the original text, removed the \s*, and it worked on all lines:

    ^([A-Z0-9]+) (.*) [0-9]+ $

Just curious... A space in the Find field using Grep isn't seen, is it? Like the space before the $? If no spaces are found before or after in your original Find above, does that stop the search? IOW, what is the symbol for optional but not necessary spaces?

Jefferis Peterson, Pres.
Web Design and Marketing
http://www.PetersonSales.com
(724)-482-2015
Re: GREP/Regex for dummies?
On Sep 6, 2011, at 12:01 PM, Jefferis Peterson wrote:

> I read the reviews and thought I should try something a bit more basic, so I ordered Regular Expressions Cookbook by Jan Goyvaerts and Steven Levithan, which appears to be a better introduction for someone my speed.

Jeff, I own this book, and it is good for case studies, but not for learning how regex works, IMO. Just don't want you to be disappointed.

Chip
Re: GREP/Regex for dummies?
At 14:58 -0400 9/6/11, Jefferis Peterson wrote:

>     ^([A-Z0-9]+) (.*) [0-9]+ $
>
> Just curious... A space in the Find field using Grep isn't seen, is it? Like the space before the $? If no spaces are found before or after in your original Find above, does that stop the search? IOW, what is the symbol for optional but not necessary spaces?

The spaces are treated as part of the expression. You are demanding their presence to get a hit. That's more obvious in Perl, where regular expressions are always quoted, usually using the / character as the quoting character.

    /^([A-Z0-9]+) (.*) [0-9]+ $/

is what you would use. There are three spaces in it, and all are required to get a hit.

(.*) followed by a space is curious. The .* will itself match spaces, and demanding a real space after it can cause problems with greediness, where matches go as far as possible. (.*?) would turn off the greediness. The ? can also be used to specify an optional character, as in ( ?), where the parentheses are not required unless you want a capture.

Note also that [A-Z] will match only upper case. There are also \d for digits and \w for word characters (letters, digits, and underscore).

And I still use the Regular Expression Bestiary, a chapter in Programming Perl, which is another O'Reilly book that's not for geometric readers.
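The points about required spaces, the optional " ?", and greedy versus lazy matching can be tried out with a short sketch. It uses Python rather than Perl or BBEdit, which is an assumption, and all of the sample strings are made up.

```python
import re

with_space = "A1 gold watch 73245 "   # made-up line ending in a space
without_space = "A1 gold watch 73245"  # same line, no trailing space

# The space before $ is a literal character that must be present.
required = re.compile(r"^([A-Z0-9]+) (.*) [0-9]+ $")
# " ?" makes that one space optional.
optional = re.compile(r"^([A-Z0-9]+) (.*) [0-9]+ ?$")

required_results = [required.match(s) is not None
                    for s in (with_space, without_space)]
optional_results = [optional.match(s) is not None
                    for s in (with_space, without_space)]
print(required_results, optional_results)

# Greedy vs. lazy, shown on an unanchored pattern and a description
# that happens to contain a number.
line = "A1 model 300 diver 73245"
greedy = re.match(r"([A-Z0-9]+) (.*) [0-9]+", line).group(2)
lazy = re.match(r"([A-Z0-9]+) (.*?) [0-9]+", line).group(2)
print(repr(greedy))  # runs to the last space-plus-digits
print(repr(lazy))    # stops at the first space-plus-digits
```

When the pattern is anchored with $, both forms are forced to end at the final number, so the difference only shows up in unanchored patterns like the one above.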
Re: GREP/Regex for dummies?
At 14:58 -0400 06/09/2011, Jefferis Peterson wrote:

> IOW, what is the symbol for optional but not necessary spaces?

This should be near the first page of anything you read about regex.

ab\s*c will match abc and ab[any amount of white space]c

Paste this into a BBEdit doc and run it from the #! menu:

    #!/usr/bin/perl
    # Put any number of whitespace characters, or none, between b and c.
    $_ = "ab   c";
    # \s* means "zero or more whitespace characters".
    /ab\s*c/ ? print "matched\n" : print "no match\n";

JD