[issue13358] HTMLParser incorrectly handles cdata elements.

2011-11-17 Thread Michael Brooks

Michael Brooks firealwayswo...@gmail.com added the comment:

Has anyone else been able to verify this?

On Mon, Nov 7, 2011 at 7:46 AM, Michael Brooks rep...@bugs.python.org wrote:


 Michael Brooks firealwayswo...@gmail.com added the comment:

 This one should also have a priority change. Tested python 2.7.3

 --MIke

 On Sun, Nov 6, 2011 at 12:54 PM, Michael Brooks rep...@bugs.python.org
 wrote:

 
  Michael Brooks firealwayswo...@gmail.com added the comment:
 
  Yes I am running python 2.7.2.
 
  On Sun, Nov 6, 2011 at 12:52 PM, Ezio Melotti rep...@bugs.python.org
  wrote:
 
  
   Ezio Melotti ezio.melo...@gmail.com added the comment:
  
   Have you tried with the latest 2.7? (see msg147170)
  
   --
   nosy: +ezio.melotti
   stage:  - test needed
  
   ___
   Python tracker rep...@bugs.python.org
   http://bugs.python.org/issue13358
   ___
  
 
  --
 
  ___
  Python tracker rep...@bugs.python.org
  http://bugs.python.org/issue13358
  ___
 

 --

 ___
 Python tracker rep...@bugs.python.org
 http://bugs.python.org/issue13358
 ___


--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue13358
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue13358] HTMLParser incorrectly handles cdata elements.

2011-11-17 Thread Michael Brooks

Michael Brooks firealwayswo...@gmail.com added the comment:

OK, so until you fix this bug, I'll be overriding HTMLParser with my fix,
because this is a blocking issue for my project.  My HTMLParser must behave
like a browser, period, end of story.

Thanks.

On Thu, Nov 17, 2011 at 9:24 AM, Ezio Melotti rep...@bugs.python.org wrote:


 Ezio Melotti ezio.melo...@gmail.com added the comment:

 It seems to me that the arguments are parsed correctly, but handle_data is
 called multiple times between handle_starttag and handle_endtag.
 This might happen, e.g., when the source lines are fed one by one to the
 parser, but in this case it seems to happen whenever </ is found.
 (The tests didn't detect this because they join the data to avoid buffer
 artifacts.)
 I'm not sure if this can be considered a bug, but the situation can indeed
 be improved.

 --

 ___
 Python tracker rep...@bugs.python.org
 http://bugs.python.org/issue13358
 ___


--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue13358
___
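
A minimal sketch of the workaround implied by the quoted comment above: buffer every handle_data() chunk seen between the opening and closing script tag and join them, so the caller receives one data segment. It assumes Python 3's html.parser module rather than the Python 2.7 HTMLParser module discussed in this thread; ScriptCollector and collect_scripts are hypothetical names used only for illustration.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Joins the handle_data() chunks seen inside each script element."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self._chunks = []
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self._chunks = []

    def handle_data(self, data):
        # May be called several times per element; just buffer the chunk.
        if self._in_script:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_script:
            self.scripts.append("".join(self._chunks))
            self._in_script = False


def collect_scripts(pieces):
    """Feed the given strings one by one and return the script bodies."""
    parser = ScriptCollector()
    for piece in pieces:
        parser.feed(piece)
    parser.close()
    return parser.scripts
```

Joining the chunks is the same trick the comment says the test suite uses, so the result should not depend on how many times handle_data() happens to fire.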



[issue13358] HTMLParser incorrectly handles cdata elements.

2011-11-17 Thread Michael Brooks

Michael Brooks firealwayswo...@gmail.com added the comment:

Oh, then there is a misunderstanding.  No browser will parse the HTML
that is declared within a JavaScript variable; it must be treated as a
continuous data segment (with cdata properties) until the terminator
</\s*script\s*> is encountered (and if this tag is found anywhere, even in a
quoted string, it will still terminate this data segment, because it's a
cdata element).  The snippet of HTML provided must be a single data
segment. </ alone is not a proper terminator.
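
To illustrate the rule described above, here is a small sketch that splits a script body at the first closing script tag. The pattern is the one from the proposed patch, written with the angle brackets that the archive appears to have stripped (an assumption); split_script_body is a hypothetical helper, not part of HTMLParser.

```python
import re

# Terminator from the proposed patch, with angle brackets restored (assumed).
SCRIPT_END = re.compile(r'(?i)</\s*script\s*>')


def split_script_body(text):
    """Return (script_body, remainder) split at the first closing script tag."""
    match = SCRIPT_END.search(text)
    if match is None:
        # No terminator yet: everything is still script data.
        return text, ""
    return text[:match.start()], text[match.end():]
```

With this pattern a </b> inside the script does not end the segment, while a closing script tag does, even when it sits inside a quoted JavaScript string.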

On Thu, Nov 17, 2011 at 11:17 AM, Ezio Melotti rep...@bugs.python.org wrote:


 Ezio Melotti ezio.melo...@gmail.com added the comment:

 It already behaves like a browser, it just gives you data in chunks
 instead of calling handle_data() only once at the end.  The documentation
 is not clear about this though.  It says that feed() can be called several
 times, but it doesn't say that handle_data() (and possibly other methods)
 might get called more than once.  This seems to always be the case while
 calling feed() several times.

 --

 ___
 Python tracker rep...@bugs.python.org
 http://bugs.python.org/issue13358
 ___


--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue13358
___
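
The chunking behaviour described in the quoted comment can be observed with a short sketch like the one below. It assumes Python 3's html.parser (the thread itself concerns the Python 2.7 HTMLParser module); ChunkRecorder and feed_in_pieces are hypothetical names. The exact number of chunks is an implementation detail, so the sketch only relies on the chunks joining back into the original text.

```python
from html.parser import HTMLParser


class ChunkRecorder(HTMLParser):
    """Records every handle_data() call as a separate list entry."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def feed_in_pieces(pieces):
    """Call feed() once per piece and return the recorded data chunks."""
    parser = ChunkRecorder()
    for piece in pieces:
        parser.feed(piece)
    parser.close()
    return parser.chunks
```

For example, feed_in_pieces(["<p>hello ", "world</p>"]) may report the text in more than one chunk, which is why callers that need a single string should join the list themselves.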



[issue13358] HTMLParser incorrectly handles cdata elements.

2011-11-07 Thread Michael Brooks

Michael Brooks firealwayswo...@gmail.com added the comment:

This one should also have a priority change. Tested python 2.7.3

--MIke

On Sun, Nov 6, 2011 at 12:54 PM, Michael Brooks rep...@bugs.python.org wrote:


 Michael Brooks firealwayswo...@gmail.com added the comment:

 Yes I am running python 2.7.2.

 On Sun, Nov 6, 2011 at 12:52 PM, Ezio Melotti rep...@bugs.python.org
 wrote:

 
  Ezio Melotti ezio.melo...@gmail.com added the comment:
 
  Have you tried with the latest 2.7? (see msg147170)
 
  --
  nosy: +ezio.melotti
  stage:  - test needed
 
  ___
  Python tracker rep...@bugs.python.org
  http://bugs.python.org/issue13358
  ___
 

 --

 ___
 Python tracker rep...@bugs.python.org
 http://bugs.python.org/issue13358
 ___


--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue13358
___



[issue13357] HTMLParser parses attributes incorrectly.

2011-11-06 Thread Michael Brooks

New submission from Michael Brooks firealwayswo...@gmail.com:

Open the attached file red_test.html in a browser.  The bad elements are 
blue because the style tag isn't parsed by any known browser.   However,  the 
HTMLParser library will incorrectly recognize them.

--
components: Library (Lib)
files: red_test.html
messages: 147169
nosy: Michael.Brooks
priority: normal
severity: normal
status: open
title: HTMLParser parses attributes incorrectly.
type: behavior
versions: Python 2.7
Added file: http://bugs.python.org/file23618/red_test.html

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue13357
___



[issue13358] HTMLParser incorrectly handles cdata elements.

2011-11-06 Thread Michael Brooks

New submission from Michael Brooks firealwayswo...@gmail.com:

The script tag at the bottom of this page is correctly identified as having 
cdata-like properties and triggers set_cdata_mode().  Due to the cdata 
properties of this tag, the only way to end the data segment is with a closing 
</script> tag; NO OTHER tag can close this data segment.  Currently, in cdata 
mode, HTMLParser uses this regular expression to close the script tag: 
re.compile(r'<(/|\Z)').  However, this script tag sets a variable with 
data that contains </b>, which terminates the script tag prematurely.
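
The difference between the two terminators can be checked in isolation with a sketch like this one. Both patterns are reconstructions: the expression quoted in the report and the one from the patch are written here with the angle brackets that the archive appears to have dropped, which is an assumption. first_stop and the sample body are made up for illustration.

```python
import re

# Pattern the report attributes to cdata mode (angle bracket assumed).
OLD_TERMINATOR = re.compile(r'<(/|\Z)')
# Pattern proposed in the patch for script elements (angle brackets assumed).
PATCHED_TERMINATOR = re.compile(r'(?i)</\s*script\s*>')

# Hypothetical script body containing markup inside a JavaScript string.
SAMPLE_BODY = 'var html = "<b>bold</b>"; run();</script>'


def first_stop(pattern, text):
    """Return the index where the pattern would end the data segment, or -1."""
    match = pattern.search(text)
    return match.start() if match else -1
```

On the sample body, the first pattern stops at the </b> inside the string, while the patched pattern only stops at the closing script tag.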

I have written and tested the following patch on my system:

import re
import HTMLParser  # Python 2 module name

# Used to terminate cdata elements.
endtagfind_script = re.compile(r'(?i)</\s*script\s*>')
endtagfind_style = re.compile(r'(?i)</\s*style\s*>')

class html_patch(HTMLParser.HTMLParser):
    # Internal -- sets the proper tag terminator based on cdata element type
    def set_cdata_mode(self, tag):
        # We check whether the tag is either a style or a script,
        # based on self.CDATA_CONTENT_ELEMENTS.
        if tag == "style":
            self.interesting = endtagfind_style
        elif tag == "script":
            self.interesting = endtagfind_script
        else:
            self.error("Unknown cdata type:" + tag)  # should never happen
        self.cdata_tag = tag


This cdata tag isn't parsed properly by HTMLParser,  but it works fine in a 
browser:
<script>
pwa.setup(
pwa.searchview,
'lhid_searchheader',
'lhid_content',
'lhid_trayhandle',
'lhid_tray',
{'query': 'test',
'tagQuery': '',
'searchScope': '',
'owner': '',
'doCrowding': false,
'isOwner': false,
'albumId': ''
,'experimentalsearchquality': true},
'firealwaysworks'
,
{feedUrl: 
'https://picasaweb.google.com/data/feed/tiny/all?alt=jsonmamp;kind=photoamp;access=publicamp;filter=1amp;q=test',
feedPreload: null},
{NEW_HOMEPAGE:1,NEW_ONE_BAR:1,fr:1,tags:1,search:1,globalsearch:1,globalsearchpromo:1,newfeatureslink:1,cart:1,contentcaching:1,developerlink:1,payments:1,newStrings:1,cccquota:1,signups:1,flashSlideshow:1,URL_SHORTENER_VISIBILITY:1,emailupload:1,photopickeralbumview:1,PWA_NEWUI:1,WILDCARD_QUERY_FEED:1,recentphotos:1,editinpicasa:1,imagesearch:1,froptin:1,FR_CONTINUOUS_CLUSTERING:1,asyncUploads:1,PERFORMANCE_EXPERIMENTS:1,BAKED_PRELOAD_FEEDS:1,albumviewlimit:1,HQ_VIDEOS:1,VIDEO_INFO_DISPLAY:1,CSI:1,EXPERIMENTAL_SEARCH_QUALITY:1,COMMENT_TRANSLATION:1,NEW_COMMENT_STYLE:1,ENABLE_NEW_FLAG_ABUSE_FORM:1,QRCODE:1,CHINA:1,GWS_URL_REDIRECTION:1,FEATURED_PHOTOS:1,COMMENT_SUBSCRIPTION:1,COMMENT_SUBSCRIPTION_SETTING:1,PICASA_MAC:1,AD_ON_SEARCHPAGE:1,API_AUTO_ACCOUNTS:1,FOCUS_GROUP_ACL:1,PHOTOSTREAM:1,BACKEND_ACL:1,ADVANCED_SEARCH:1,FACE_SEARCH:1,CAMERA_SEARCH:1,NOTIFICATION:1,PIXELATED_PREVIEW:1,TRANSPARENT_PIXELATED_PREVIEW:1,NEW_SETTINGS_PAGE:1,VIEW_STARRERS:1,FR_FOCUS_MERGE:1,AD_ON_SEARCH_ONEUP:1,GALLERY_COMMENTS:1,COMMENT_ABUSE_BLOCKING:1,FAVORITE_NOTIFICATION:1,IMAGE_ONLY_LINK:1,RECENT_PHOTOS_SLIDESHOW:1,HEART:1,SMALLER_IMAGE:1,FAST_SLIDESHOW:1,VIEW_CONTACTS:1,COLLABORATIVE_ALBUMS:1,PRINT_MARKETPLACE:1,PRINT_MARKETPLACE_REPLACEMENT:1,VIEW_COUNT:1,POST_TO:1,GAPLUS:1,PICASA_PROMO:1,DOUBLECLICK_PREMIUM_ADS:1,DOUBLECLICK_EXPLORE_MAIN:1,DOUBLECLICK_MYPHOTOS:1,DOUBLECLICK_PUBLIC_GALLERY:1,DOUBLECLICK_USER_ALBUM:1,DOUBLECLICK_USER_PHOTO:1,DOUBLECLICK_VISITOR_ADS:1,PRODUCTION:1,NOSCRIPT:1,UNLISTED_GALLERY:1,GA_TRACKING:1,UNLIMITED_GALLERY:1,PICNIK_EDIT:1,MICROSCOPE_ZOOM:1,FR_V2:1,FAVORITE_SUGGESTION:1,FAVORITE_UPDATE:1,MERGED_PROFILES_SOFTLAUNCH:1,MERGED_PROFILES:1,MERGED_PROFILES_ASYNC:1,NEW_FR_UI:1,GAPLUS_UNMERGED_SOCIALIZATION:1,OPTOUT_ACL_NOTIFICATION:1,HTTPS_VISIBILITY:1,DEFAULT_HTTPS:1,EXTENDED_EXIF:1,DOUBLECLICK_MULTISLOT:1,ONEPICK:1,PER_ALBUM_GEO_VISIBILITY:1,FOCUS_MERGE_LINK_DIALOG_VISIBILITY:1,SHAREBOX_VISIBILITY:1,AUTO_DOWNSIZE:1,BULK_ALBUM_EDITOR_VISIBILITY:1,PROFILE_NAME_CHECK:1,COLLABORATIVE_NAMETAGS:1,NOT_FOUND_404:1,REDIRECT_TO_PLUS:1},
{
'gdataVersion': '4.0',
'updateCartPath': '\x2Flh\x2FupdateCart?rtok=b8S9ibYqrTMF',
'editCaptionsPath': '',
'albumMapPath': '',
'albumKmlUrl': '',
'selectedPhotosPath': 
'\x2Flh\x2FselectedPhotos?tok=QUI1UGxRYk9fNmw1Q2tVeS1DWnY3UlFoTTY1RzRNNWphdzoxMzIwNjAyMzA3NDYx',
'setLicensePath': '',
'setStarPath': 
'\x2Flh\x2FsetStar?tok=QUI1UGxRWW4zY1ZKb3U0TzROZU5tUHhIV3hhRW9HcUYwQToxMzIwNjAyMzA3NDYx',
'peopleManagerPath': '',
'peopleSearchPath': '',
'clusterViewPath': '',
'frOptStatus': 'OptedIn',
'isNameTagsVisible': '','authUserIsPhotosUser': true,
'authUserNickname': 'Some Nickname',
'authUserPortraitUrl': 
'https:\x2F\x2Flh4.googleusercontent.com\x2F-UI9ZfIFfyQI\x2FAAI\x2FAAA\x2Fm0enLvZXYbI\x2Fs32-c\x2Ffirealwaysworks.jpg',
'authUserProfileUrl':'https:\x2F\x2Fprofiles.google.com\x2F115162402406836485912',
 
'authUser':{name:'firealwaysworks',isProfileUser:1,isLoggedIn:1,user:1,isOwner:1
,'showGeo': 0
},
'foreignNickname': '',
'subjects': [
]
,
'owner': {name:'firealwaysworks',nickname:'Michael 
Brooks',portrait:'https:\x2F\x2Flh4.googleusercontent.com\x2F-UI9ZfIFfyQI

[issue13358] HTMLParser incorrectly handles cdata elements.

2011-11-06 Thread Michael Brooks

Changes by Michael Brooks firealwayswo...@gmail.com:


--
type:  - behavior

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue13358
___



[issue13357] HTMLParser parses attributes incorrectly.

2011-11-06 Thread Michael Brooks

Michael Brooks firealwayswo...@gmail.com added the comment:

Yes, I am running the latest version,  which is python 2.7.2.

On Sun, Nov 6, 2011 at 12:14 PM, Ezio Melotti rep...@bugs.python.org wrote:


 Ezio Melotti ezio.melo...@gmail.com added the comment:

 Thanks for the report.
 Could you try with the latest 2.7 and see if you can reproduce the
 problem? (see the devguide for instructions.)

 If you can reproduce the issue even on the latest 2.7, it would be great
 if you could provide a patch with a test case like the ones in
 Lib/test/test_htmlparser.py.

 --
 nosy: +ezio.melotti
 stage:  - test needed

 ___
 Python tracker rep...@bugs.python.org
 http://bugs.python.org/issue13357
 ___


--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue13357
___



[issue13358] HTMLParser incorrectly handles cdata elements.

2011-11-06 Thread Michael Brooks

Michael Brooks firealwayswo...@gmail.com added the comment:

Yes I am running python 2.7.2.

On Sun, Nov 6, 2011 at 12:52 PM, Ezio Melotti rep...@bugs.python.org wrote:


 Ezio Melotti ezio.melo...@gmail.com added the comment:

 Have you tried with the latest 2.7? (see msg147170)

 --
 nosy: +ezio.melotti
 stage:  - test needed

 ___
 Python tracker rep...@bugs.python.org
 http://bugs.python.org/issue13358
 ___


--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue13358
___



[issue13357] HTMLParser parses attributes incorrectly.

2011-11-06 Thread Michael Brooks

Michael Brooks firealwayswo...@gmail.com added the comment:

Python 2.7.3 is still affected by both of these issues.

On Sun, Nov 6, 2011 at 12:56 PM, Ezio Melotti rep...@bugs.python.org wrote:


 Ezio Melotti ezio.melo...@gmail.com added the comment:

 I mean 2.7.3 (i.e. the development version).
 You need to get a clone of Python as explained here:
 http://docs.python.org/devguide/

 --

 ___
 Python tracker rep...@bugs.python.org
 http://bugs.python.org/issue13357
 ___


--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue13357
___



[issue10599] sgmllib.parse_endtag() is not respecting quoted text

2010-12-01 Thread Michael Brooks

New submission from Michael Brooks firealwayswo...@gmail.com:

In the attached example is a very simple usage of sgmllib that is trying to 
parse:
input value=a href=http://buglink/a

The bug is that sgmllib is parsing this href.  Browsers on the other hand see 
this as the input's value.  
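
As a rough illustration of the browser-like behaviour expected here, the sketch below records start tags for markup whose quoted attribute value contains a link. It uses Python 3's html.parser, since sgmllib exists only in Python 2; the sample markup is an assumed reconstruction, because the quoting in the report above was lost in the archive, and StartTagRecorder and start_tags are hypothetical names.

```python
from html.parser import HTMLParser


class StartTagRecorder(HTMLParser):
    """Records (tag, attrs) for every start tag the parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, attrs))


def start_tags(markup):
    """Parse the markup and return the list of reported start tags."""
    parser = StartTagRecorder()
    parser.feed(markup)
    parser.close()
    return parser.tags


# Assumed reconstruction of the reported input; the original quoting is unknown.
SAMPLE = '<input value="<a href=\'http://buglink\'></a>">'
```

When the quoted value is respected, start_tags(SAMPLE) reports a single input tag whose value attribute holds the whole link text, and no separate a tag.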

Also keep in mind that escaping of quote marks in HTML is not like python.  \ 
is not a character literal   thus input value=\a 
href=http://buglink/a is still quoted text and the href should not be 
parsed. 

Thank you

--
components: None
files: sgmllib_bug.py
messages: 123016
nosy: Michael.Brooks
priority: normal
severity: normal
status: open
title: sgmllib.parse_endtag() is not respecting quoted text
type: behavior
versions: Python 2.6
Added file: http://bugs.python.org/file19895/sgmllib_bug.py

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10599
___



[issue10599] sgmllib.parse_endtag() is not respecting quoted text

2010-12-01 Thread Michael Brooks

Michael Brooks firealwayswo...@gmail.com added the comment:

Oops, I made a mistake in my bug report. 
input value=\a href=http://buglink/a is not escaped, and therefore the 
href should be parsed in this condition, but it should not be parsed in the 
attached sgmllib_bug.py.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10599
___