RE: regex help for grabbing values of html tag attributes

2005-03-23 Thread Pascal Peters
Google doesn't put quotes around most attributes. The following works
(it takes single quotes, double quotes, or no quotes at all into
consideration). Watch out for line wrapping in the regular expressions.
The function returns all values of one attribute in one or more tags
(see the examples).

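<cfscript>
// Returns an array holding every value of attr found on the given tag(s).
// tag may be an alternation such as 'a|td'.
function GetAttributeValue(str,tag,attr){
    // Value alternatives: single-quoted, double-quoted, or unquoted.
    var regexp = "<(#tag#)\s[^>]*#attr#=('.*?'|"".*?""|[^\s>]+)[^>]*>";
    var aReturn = ArrayNew(1);
    var start = 1;
    var stTmp = StructNew();

    while(true){
        stTmp = REFindNoCase(regexp,str,start,true);
        if(stTmp.pos[1] IS 0) break;

        // Index 3 is the second subexpression (the raw value); strip surrounding quotes.
        ArrayAppend(aReturn,REReplace(Mid(str,stTmp.pos[3],stTmp.len[3]),"^['""](.*)['""]$","\1"));
        // Continue searching right after the tag that was just matched.
        start = stTmp.pos[1] + stTmp.len[1];
    }

    return aReturn;
}
</cfscript>
<cfhttp url="http://www.google.com/" throwonerror="yes"></cfhttp>
<cfoutput>#HTMLCodeFormat(cfhttp.filecontent)#</cfoutput>
<cfdump var="#GetAttributeValue(cfhttp.filecontent,'a','href')#">
<cfdump var="#GetAttributeValue(cfhttp.filecontent,'img','src')#">
<cfdump var="#GetAttributeValue(cfhttp.filecontent,'a|td','class')#">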
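
For anyone without a ColdFusion server at hand, the pattern above can be tried out in Python, whose re module accepts the same Perl-style syntax used here. This is only a sketch under a few assumptions: the function name get_attribute_values and the sample markup are made up for illustration, and re.finditer stands in for the REFindNoCase loop.

```python
import re

def get_attribute_values(html, tag, attr):
    """Return every value of `attr` found on `tag` elements (hypothetical helper)."""
    # Same shape as the CFML pattern: tag name, anything up to the attribute,
    # then a single-quoted, double-quoted, or unquoted value.
    pattern = re.compile(
        r"<(" + tag + r")\s[^>]*" + attr + r"=('.*?'|\".*?\"|[^\s>]+)[^>]*>",
        re.IGNORECASE,
    )
    values = []
    for match in pattern.finditer(html):
        # Group 2 is the raw value; strip one pair of surrounding quotes if present.
        values.append(re.sub(r"^['\"](.*)['\"]$", r"\1", match.group(2)))
    return values

# Made-up sample markup mixing the three quoting styles.
sample = (
    "<a href=/intl/en/about.html class=q>About</a>"
    '<img src="logo.gif" width=276>'
    "<td class='nav'><A HREF='/help'>Help</A></td>"
)

print(get_attribute_values(sample, "a", "href"))      # ['/intl/en/about.html', '/help']
print(get_attribute_values(sample, "img", "src"))     # ['logo.gif']
print(get_attribute_values(sample, "a|td", "class"))  # ['q', 'nav']
```

Because [^>]* cannot cross a closing angle bracket, each match stays inside a single tag, which is why the last call skips the anchor that has no class attribute.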

Pascal


~|
Find out how CFTicket can increase your company's customer support 
efficiency by 100%
http://www.houseoffusion.com/banners/view.cfm?bannerid=49

Message: http://www.houseoffusion.com/lists.cfm/link=i:4:199743
Archives: http://www.houseoffusion.com/cf_lists/threads.cfm/4
Subscription: http://www.houseoffusion.com/lists.cfm/link=s:4
Unsubscribe: 
http://www.houseoffusion.com/cf_lists/unsubscribe.cfm?user=11502.10531.4
Donations  Support: http://www.houseoffusion.com/tiny.cfm/54


Re: regex help for grabbing values of html tag attributes

2005-03-22 Thread Ben Doom
What version of CF?

--Ben

Burns, John D wrote:
> Does anyone have a regex already written (or would any of you regex
> gurus like to put something up) that could take the source code of an
> HTML file and grab the value of an attribute, given the tag and the
> attribute to be grabbed?  For instance, if I wanted to get the
> value of any classes used on a span tag, it would search for span tags,
> look for a class attribute, and return the value within the quotes
> after class=.  Or, for images, it would search for the img tag,
> find the src attribute, and return the url listed in there.  I have
> tried a few things but haven't had a whole lot of luck.  Any help would
> be great.  Thanks!
>
> John Burns
> Certified Advanced ColdFusion MX Developer
> Wyle Laboratories, Inc. | Web Developer
  
  
 
 
 

Message: http://www.houseoffusion.com/lists.cfm/link=i:4:199700


RE: regex help for grabbing values of html tag attributes

2005-03-22 Thread Burns, John D
6.1.  I was looking at the archives and came up with the expression
below, but it's erroring.

I'm using the img instance because it's easier to test on pages that
have multiple images...

#refindnocase("<img[^>]*src="([^"]*)*>",cfhttp.fileContent,0,true)#


John Burns
Certified Advanced ColdFusion MX Developer
Wyle Laboratories, Inc. | Web Developer
 

Message: http://www.houseoffusion.com/lists.cfm/link=i:4:199703


Re: regex help for grabbing values of html tag attributes

2005-03-22 Thread Ben Doom
Well, I see a couple of problems with what you're using.  First, you've 
not got a closing " on the attribute.  Second, you've wrapped a regex 
that contains a " in "s, which will error out if you don't escape the 
inner "s.  You can wrap it with single quotes to fix that.  Also, the 
last * boggles me.  I don't know why it's there.

Or, try this:

'<#tag#.*?#att#="(.*?)".*?>'

where (should be obvious) tag and att are defined as the tag and 
attribute you want.  Please note that if you define them as span and 
class and you have this:
<span>stuff in between</span><span class="bob">
the whole tag match will return both span tags and the stuff in 
between.  The attribute match will return bob.  So, if this might be the 
case, lemme know and we'll tweak the regex.
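
The caveat is easy to reproduce. The following Python sketch (Python's re module supports the same lazy quantifiers; the sample string and variable names are invented for illustration) compares the lazy-dot pattern with one possible tweak, which swaps the dot for [^>]* so the search cannot leave the tag it started in.

```python
import re

# Invented sample: the first span has no class attribute, the second does.
html = '<span>stuff in between</span><span class="bob">'

# Lazy dot: starts at the first <span and keeps going until class= turns up.
loose = re.search(r'<span.*?class="(.*?)".*?>', html, re.IGNORECASE)

# Negated class: [^>]* stops at the end of the current tag.
strict = re.search(r'<span\s[^>]*class="(.*?)"[^>]*>', html, re.IGNORECASE)

print(loose.group(0))   # <span>stuff in between</span><span class="bob">
print(loose.group(1))   # bob
print(strict.group(0))  # <span class="bob">
print(strict.group(1))  # bob
```

Both patterns capture bob, but only the second reports the tag that actually carries the attribute.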

Not tested, your miles may vary, trix are for kids, etc.

--Ben

Message: http://www.houseoffusion.com/lists.cfm/link=i:4:199710


RE: regex help for grabbing values of html tag attributes

2005-03-22 Thread Burns, John D
Ben,

I can see what you've got (I think) and it makes sense, but for some
reason, it's not working.  I'm grabbing the html from www.google.com and
running the regex against it, and this is what I've got in my code:

#refindnocase('<img.*?src="(.*?)".*?>',cfhttp.fileContent,0,true)#

I'm using cfdump to display that info and what I see are 2 arrays (len
and pos) and both have values of 1 and 0.  I thought that if the first
value was 1, the second value would be the position of the occurrence of
the search string.  I know google has an image, and I'm displaying the
cfhttp.filecontent in a textarea above so that I can ensure the results
are coming back as expected.  Any ideas?  Am I doing something wrong?


John Burns
Certified Advanced ColdFusion MX Developer
Wyle Laboratories, Inc. | Web Developer
 

Message: http://www.houseoffusion.com/lists.cfm/link=i:4:199716


Re: regex help for grabbing values of html tag attributes

2005-03-22 Thread Ben Doom
Try
refindnocase('<img.*?src="(.*?)".*?>',cfhttp.fileContent,1,'true')

I think the 0 and the non-quoted true are confusing it.  Just a guess, 
though.  Also, have you verified the contents of cfhttp.filecontent?
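
On reading the result: with returnsubexpressions set to true, REFindNoCase returns a structure of two arrays, pos and len. Element 1 describes the whole match, and a single 0 in each array means nothing matched. The Python sketch below imitates that return shape. The helper re_find_no_case and the sample markup are invented for illustration and are not part of ColdFusion. It also shows one plausible cause of a zero result, in line with the remark elsewhere in this thread that Google leaves most attributes unquoted: a pattern that insists on double quotes, run against markup that has none.

```python
import re

def re_find_no_case(pattern, text, start=1):
    """Rough stand-in for REFindNoCase(pattern, text, start, true); an assumption, not CF itself."""
    match = re.compile(pattern, re.IGNORECASE).search(text, start - 1)
    if match is None:
        # No match: both arrays hold a single 0.
        return {"pos": [0], "len": [0]}
    spans = [match.span(i) for i in range(match.re.groups + 1)]
    return {
        "pos": [begin + 1 for begin, end in spans],  # 1-based, as in CFML
        "len": [end - begin for begin, end in spans],
    }

# Invented markup with an unquoted src attribute.
html = "<img src=/images/logo.gif width=276>"

# Requires double quotes around the value, so it finds nothing here.
quoted_only = r'<img.*?src="(.*?)".*?>'
print(re_find_no_case(quoted_only, html))  # {'pos': [0], 'len': [0]}

# Accepts single-quoted, double-quoted, or unquoted values.
any_quoting = r"<img\s[^>]*src=('.*?'|\".*?\"|[^\s>]+)[^>]*>"
print(re_find_no_case(any_quoting, html))  # {'pos': [1, 10], 'len': [36, 16]}
```

In the second result, the pair at index 2 (pos 10, len 16) is what a call to Mid would use to pull out /images/logo.gif.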

--Ben

Message: http://www.houseoffusion.com/lists.cfm/link=i:4:199718


Re: regex help for grabbing values of html tag attributes

2005-03-22 Thread Claude Schneegans
What you're trying to do is far from trivial. However, I'm pretty sure
that CF_REextract will help you a lot. See the link below.

-- 
___
REUSE CODE! Use custom tags;
See http://www.contentbox.com/claude/customtags/tagstore.cfm
(Please send any spam to this address: [EMAIL PROTECTED])
Thanks.


Message: http://www.houseoffusion.com/lists.cfm/link=i:4:199723