Hi,
There are a few reasons why I like the idea of regex in this instance.
The main one is that I'm not actually looking to do a full clean-up
of the HTML, just to fix this one specific problem, which is breaking other
post-processing regexes that assume attributes are quoted. The offending
HTML is generated by a WYSIWYG editor, so any attempt to clean the HTML
externally and reimport it will be thwarted the next time the user
saves their page, which is also why I prefer the post-processing regex
approach in this case.
But if I was trying to fix/validate a complete document, I definitely
understand why a parser-based approach like Tidy is a better idea than
regexing.
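For comparison, a parser-based pass might look roughly like the sketch below. It is written in Python with the standard library's html.parser, not Tidy itself. The names AttributeQuoter and quote_with_parser are made up for illustration, and it only handles the common node types (tags, text, entities, comments, declarations).

```python
from html import escape
from html.parser import HTMLParser


class AttributeQuoter(HTMLParser):
    """Re-emit the parsed markup with every attribute value double-quoted."""

    def __init__(self):
        # Keep entities in text as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []

    def _tag(self, tag, attrs, closing=''):
        parts = [tag]
        for name, value in attrs:
            # Valueless attributes (e.g. "disabled") arrive with value None.
            if value is None:
                parts.append(name)
            else:
                parts.append(f'{name}="{escape(value, quote=True)}"')
        return '<' + ' '.join(parts) + closing + '>'

    def handle_starttag(self, tag, attrs):
        self.out.append(self._tag(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        self.out.append(self._tag(tag, attrs, ' /'))

    def handle_endtag(self, tag):
        self.out.append(f'</{tag}>')

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f'&{name};')

    def handle_charref(self, name):
        self.out.append(f'&#{name};')

    def handle_comment(self, data):
        self.out.append(f'<!--{data}-->')

    def handle_decl(self, decl):
        self.out.append(f'<!{decl}>')


def quote_with_parser(html):
    parser = AttributeQuoter()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return ''.join(parser.out)


print(quote_with_parser('<p>Hi <img src=filename.jpg width=100 height=100> there</p>'))
```

Note that this rewrites the whole fragment (tag names come back lowercased, for example), which is exactly the kind of full clean-up that a single targeted regex avoids.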
Thanks,
Harvey.
On 28/06/2011 11:39 p.m., .Net2Php wrote:
Hi Harvey,
Regex is probably not the best thing to use to fix HTML. HTML Tidy
will probably be a better solution.
Looking at your regex, a few comments:
- Do you really need to use \s (which will match a space, tab,
carriage return, new line) or will a space suffice?
- The pattern in the capturing parentheses probably could be
simplified to something like: .*?
-- NOTE: you would wrap that pattern in capturing parentheses and put
a trailing space after the closing parenthesis
Hard to do regex here, but maybe something like this (untested):
src *= *(.*?)
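NOTE: there is a trailing space in the regex. Because the match also
consumes "src=" and that trailing space, the replacement string has to
put both back, something like this (untested again, and again with a
trailing space):
src="$1" 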
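To make the suggestion above concrete, here is a rough Python sketch (Python's re module writes the backreference as \1 rather than $1). The function name quote_src is hypothetical, and having the replacement restore "src=" and the trailing space is an assumption about the intent.

```python
import re

# "src", optional spaces around "=", a lazily matched value, then one space.
SRC_UNQUOTED = re.compile(r'src *= *(.*?) ')


def quote_src(html):
    # Put back "src=" and the trailing space that the match consumed.
    return SRC_UNQUOTED.sub(r'src="\1" ', html)


print(quote_src('<img src=filename.jpg width=100 height=100>'))
# <img src="filename.jpg" width=100 height=100>

print(quote_src('<img src=filename.jpg>'))
# unchanged: no space follows the value, so the pattern does not match
```

Two limitations follow from relying on the trailing space: a src that is the last attribute in the tag is missed, and a src that is already quoted would be wrapped in a second pair of quotes, so the pattern would need a guard before being run over mixed input.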
Hope this helps.
On Jun 28, 5:50 pm, [email protected] wrote:
Thanks for the replies everyone. My mail is with Webdrive so I lost
email shortly after posting this request, so I couldn't check replies or
reply myself any sooner. I managed to find my own solution in the meantime.
In this case, I only really cared about unquoted src attributes in img
tags, so this is what I came up with.
src\s*=\s*([/a-zA-Z0-9].*?)(>|( [a-z]+)=)
Which needs to be run at least twice to clean all attributes in a tag.
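A small Python sketch of why more than one pass is needed. It makes two assumptions that are not in the pattern as posted: "src" is generalised to any lowercase attribute name, and the character class is written as A-Z. The replacement string and the names UNQUOTED and quote_attributes are also made up for illustration.

```python
import re

# name = value, where the value starts with "/", a letter or a digit and runs
# lazily up to either ">" or the start of the next " name=" pair.
UNQUOTED = re.compile(r'([a-z]+)\s*=\s*([/a-zA-Z0-9].*?)(>|( [a-z]+)=)')


def quote_attributes(html, max_passes=10):
    """Re-run the substitution until the text stops changing."""
    for _ in range(max_passes):
        fixed = UNQUOTED.sub(r'\1="\2"\3', html)
        if fixed == html:
            break
        html = fixed
    return html


tag = '<img src=filename.jpg width=100 height=100>'

# One pass: the match for src also consumes " width=", so width is skipped.
print(UNQUOTED.sub(r'\1="\2"\3', tag))
# <img src="filename.jpg" width=100 height="100">

print(quote_attributes(tag))
# <img src="filename.jpg" width="100" height="100">
```

Each match swallows the name of the following attribute as its terminator, so every other attribute is skipped on a given pass. Looping until nothing changes avoids having to guess how many passes a tag needs. The pattern is not restricted to text inside tags, so anything that looks like name=value elsewhere in the page would be touched as well.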
Thanks,
Harvey.
On 28/06/2011 10:24 a.m., Matthew Whyte wrote:
Hi Harvey,
I don't have a regex handy, but from memory the last time I needed to
do something similar I used the "clean up HTML" option in Dreamweaver,
which did the trick. (I don't use Dreamweaver for anything else; I've
only got it because it came as part of the Adobe Suite!)
Cheers,
Matthew Whyte
Managing Director | digiCreative
T: +64 7 959 8230
F: +64 7 974 9059
E: [email protected]
W: digicreative.co.nz
5 King St | PO Box 19492, Hamilton, New Zealand
On Tue, Jun 28, 2011 at 10:17 AM, [email protected] wrote:
Hi All,
I need to fix up some sloppy HTML which is (in some cases) missing
quotes around the HTML attributes.
e.g. <img src=filename.jpg width=100 height=100>
Does anyone have a tested regex sitting in their collection for
adding back in those missing quotes?
Thanks,
Harvey.
--
Harvey Kane
Phone:
- Auckland: +64 9 950 4133
- Wanaka: +64 3 746 8133
- Mobile: +64 21 811 951
Email: [email protected]
If you need to contact me urgently, please read my email policy
www.ragepank.com/email/
--
NZ PHP Users Group: http://groups.google.com/group/nzphpug
To post, send email to [email protected]
To unsubscribe, send email to
[email protected]