Hi Harvey,

Regex is probably not the best tool for fixing HTML. HTML Tidy will probably be a better solution.
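To illustrate why a parser-based tool tends to cope better than a regex, here is a rough sketch. It is not HTML Tidy itself: it swaps in Python's standard html.parser module as a stand-in, and the names AttributeQuoter and quote_attributes are made up for the example. It re-emits every tag with its attribute values in double quotes. It lowercases tag names and is not a complete serializer, so treat it as a starting point only.

```python
from html import escape
from html.parser import HTMLParser


class AttributeQuoter(HTMLParser):
    """Re-emit HTML with every attribute value wrapped in double quotes."""

    def __init__(self):
        # convert_charrefs=False passes entities such as &amp; through as-is
        super().__init__(convert_charrefs=False)
        self.out = []

    def _tag(self, tag, attrs, close=""):
        parts = [tag]
        for name, value in attrs:
            if value is None:
                # bare attribute with no value, e.g. "disabled"
                parts.append(name)
            else:
                # the parser hands us the unescaped value, so escape it again
                parts.append('%s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<" + " ".join(parts) + close + ">")

    def handle_starttag(self, tag, attrs):
        self._tag(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._tag(tag, attrs, " /")

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)


def quote_attributes(html):
    """Return html with all attribute values double-quoted."""
    parser = AttributeQuoter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(quote_attributes("<img src=filename.jpg width=100 height=100>"))
```

Because the parser already understands where each attribute starts and ends, a single pass covers every attribute in a tag, not only src.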
Looking at your regex, a few comments:

- Do you really need to use \s (which will match a space, tab, carriage return, or newline), or will a space suffice?
- The pattern in the capturing parentheses could probably be simplified to something like: .*?
  NOTE: you would wrap that pattern in capturing parentheses and put a trailing space after the closing parenthesis.

It is hard to write regex here, but maybe something like this (untested):

src *= *(.*?) 

NOTE: there is a trailing space in the regex.

The replacement string would be something like this (untested again):

src="$1" 

NOTE: the match consumes src= and the trailing space, so the replacement puts both back around the quoted value.

Hope this helps.

On Jun 28, 5:50 pm, [email protected] wrote:
> Thanks for the replies everyone. My mail is with Webdrive so I lost
> email shortly after posting this request, so I couldn't check replies or
> reply myself any sooner. I managed to find my own solution in the meantime.
>
> In this case, I only really cared about missing src attributes in img
> tags, so this is what I came up with.
>
> src\s*=\s*([/a-zA-z0-9].*?)(>|( [a-z]+)=)
>
> Which needs to be run at least twice to clean all attributes in a tag.
>
> Thanks,
>
> Harvey.
>
> On 28/06/2011 10:24 a.m., Matthew Whyte wrote:
> > Hi Harvey,
> >
> > I don't have a regex handy, but from memory the last time I needed to
> > do something similar I used the "clean up HTML" option in Dreamweaver,
> > which did the trick. (I don't use Dreamweaver for anything else, I've
> > only got it because it came as part of the Adobe Suite!)
> >
> > Cheers,
> >
> > Matthew Whyte
> > Managing Director | digiCreative
> >
> > On Tue, Jun 28, 2011 at 10:17 AM, <[email protected]> wrote:
> >
> > Hi All,
> >
> > I need to fix up some sloppy HTML which is (in some cases) missing
> > quotes around the HTML attributes.
> >
> > eg <img src=filename.jpg width=100 height=100>
> >
> > Does anyone have a tested regex sitting in their collection for
> > adding back in those missing quotes?
> >
> > Thanks,
> >
> > Harvey.

--
NZ PHP Users Group: http://groups.google.com/group/nzphpug
To post, send email to [email protected]
To unsubscribe, send email to [email protected]
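P.S. As a follow-up to the src pattern suggested at the top of this reply, here is a rough sketch of how it could be tried out. It is written in Python purely for illustration; in PHP, preg_replace with a $1 backreference should play the same role. It assumes one variation on the pattern above: the value is captured with a character class that stops at whitespace, a quote, or >, instead of relying on a trailing space, so a src that is the last attribute in the tag is still caught. The names UNQUOTED_SRC and quote_src are made up for the example.

```python
import re

# Assumed variation on the pattern from the reply: capture the value up to
# the first whitespace, quote or ">" rather than up to a literal space.
# Values that already start with a quote do not match, so they are left alone.
UNQUOTED_SRC = re.compile(r'src *= *([^\s"\'>]+)')


def quote_src(html):
    """Wrap unquoted src attribute values in double quotes."""
    # \1 is Python's spelling of the $1 backreference; "src=" is written
    # back explicitly because the match consumes it.
    return UNQUOTED_SRC.sub(r'src="\1"', html)


print(quote_src("<img src=filename.jpg width=100 height=100>"))
```

This only touches src, as in Harvey's case, and running it a second time changes nothing because quoted values no longer match. It will also match attributes whose names merely end in src, so it is still a quick fix and not a substitute for a real HTML cleaner.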
