This is an automated email from the git hooks/post-receive script.

kanashiro-guest pushed a change to annotated tag upstream/3.72
in repository libhtml-parser-perl.

        at  acbad4a   (tag)
   tagging  295ddd70d43196c874abf14ac23f55e7c406b85a (commit)
  replaces  upstream/3.71
 tagged by  Lucas Kanashiro
        on  Wed Jan 20 23:04:34 2016 -0200

- Log -----------------------------------------------------------------
Upstream version 3.72

Antonio Radici (1):
      Reference HTML::LinkExttor [RT#43164]

Barbie (1):
      fix to TokeParser to correctly handle option configuration

Chip Salzenberg (1):
      Avoid crash (referenced pend_text instead of skipped_text)

Damyan Ivanov (1):
      Short description of the htextsub example

David Steinbrunner (2):
      typo fix
      typo fixes

François Perrad (1):
      Fix for cross-compiling with Buildroot

Gisle Aas (55):
      Start using GIT to track the sources.
      Patch by CHORNY that provide compatibility with older perls.
      Recognize the </script> and </style> end tags even if quoted.
      Parse the <iframe> content in literal/CDATA mode.
      Release 3.57
      Recognize the Unicode BOM in utf8_mode as well [RT#27522]
      Avoid ending up with '/' keys attribute in Link headers.
      Suppress "Parsing of undecoded UTF-8 will give garbage" warning with attr_encoded [RT#29089]
      Don't hardcode source line numbers [RT#38114]
      Release 3.58
      Restore perl-5.6 compatiblity for HTML::HeadParser
      Tell git to ignore the dist tarballs
      Update for GIT and other tweaks.
      More meta info
      Release 3.59
      Release 3.60.
      Test that triggers the crash that Chip fixed
      Complete documented list of literal tags
      Release 3.61
      Avoid "my" variable $p masks earlier declaration warning from test
      Doc patch: Make it clearer what the return value from ->parse is
      Update TODO list
      Release 3.62
      Take more care to prepare the char range for encode_entities [RT#50170]
      decode_entities confused by trailing incomplete entity
      Release 3.63
      Convert files to UTF-8
      Don't allow decode_entities() to generate illegal Unicode chars
      Copyright 2009
      Remove rendundant (repeated) test
      Make parse_file() method use 3-arg open [RT#49434]
      Release 3.64
      Eliminate buggy entities_decode_old
      Release 3.65
      Fix entity decoding in utf8_mode for the title header
      Release 3.66
      chmod +x [RT#58016]
      Release 3.67
      Declare the encoding of the POD to be utf8
      Release 3.68
      Documentation fix; encode_utf8 mixup [RT#71151]
      Make it clearer that there are 2 (actually 3) options for handing "UTF-8 
      Github is the official repo
      Can't be bothered to try to fix the failures that occur on perl-5.6
      Release 3.69
      Comment typo fix
      Release 3.70
      Transform ':' in headers to '-' [RT#80524]
      Release 3.71
      Merge branch 'master' of
      Avoid more clang casting warnings
      Remove trailing whitespace
      Ensure entities expand to utf8 sequences under 'utf8_mode' [RT#99755]
      Copyright 2016
      Release 3.72

Jacques Germishuys (1):
      Silence clang warning

Jon Jensen (1):
      Aesthetic change: remove extra ;

Lucas Kanashiro (1):
      Imported Upstream version 3.72

Mike South (1):
      Suppress warning when encode_entities is called with undef [RT#27567]

Nicholas Clark (1):
      bleadperl 2154eca7 breaks HTML::Parser 3.66 [RT#60368]

Salvatore Bonaccorso (1):
      Fixed endianness typo [RT#50811]

Ville Skyttä (12):
      Spelling fixes.
      Test multi-value headers.
      Documentation improvements.
      Do not terminate head parsing on the <object> element (added in HTML 4.0).
      Add support for HTML 5 <meta charset> and new HEAD elements.
      HTTP::Header doc typo fix.
      Do not bother tracking style or script, they're ignored.
      Bring HTML 5 head elements up to date with WD-html5-20090423.
      Improve HeadParser performance.
      Documentation fixes.
      Trim surrounding whitespace from extracted URLs.
      Merge pull request #6 from dsteinbrunner/patch-1

Yves Orton (1):
      Fix Issue #3 / RT #84144: HTML::Entities::decode_entities() needs to call SV_CHECK_THINKFIRST() before checking READONLY flag

Zefram (1):
      HTML::Parser doesn't compile with perl 5.8.0.

aas (110):
      First revsion.
      Fake-compile regexps using anonymous subs.  More documentation.
      Removed trailing whitespace and unexpanded the text (replaced initial space with
      Fixed copyright message.
      Moved from ../base
      Avoid quotes in hash key.
      First revision.
      Added test based on RFC1866
      Included additional ISO-8859/1 entities listed in rfc1866 (section 14).
      Typo fix by Bob Dalgleish <>
      First version.  Posted on the mailing list 1996-07-08.
      Clear links when calling parse_file().
      Parse <link> attributes in head.  Renamed header Base: to Content-Base:
      Slightly better documentation.
      Renamed Base: to Content-Base: and Implemented Link:
      First revision.
      Got Ambiguous use of {links} resolved to {"links"}
      Added support for <embed src="..."> as suggested by Hans de Graaff
      Added <frame src="..."> to the things recognized
      Added an example to the documentation.
      Added test to check that the links method work when there are no links in the parsed document.
      Avoid 'Can't use an undefined value as an ARRAY reference message' when no links are found in the document.
      Must escape literal $ in regular expression.
      $p->eof instead of $p->parse(undef)
      Support netscape_buggy_comment() and implement the eof() method.
      Added two new start() parameters; $attrseq and $origtext.
      First revision.
      Allow "_" in attribute names since Netscape really use this in their 
      Initialize from all <meta> as X-Meta-Foo
      Parser was very confused about "</" when it did not start an end tag.
      $p->links now truncates the list.
      Added SYNOPSIS to all libraries since perl5.003_97 warns if it
      Updated the documentation.
      Only modify arguments in void context.  Requires 5.004
      Doc bug spotted by Martijn Koster
      Know about <applet code=URL>.  Patch from Daniel V Klein 
      Check for Bill Simpson-Young's problem.
      Might introduce ";" for things that look like entities but is not.  Reported by Bill Simpson-Young <>
      Documentation update.
      =head2 replaced by =item
      Reformatting by Martijn.
      Replaced netscape_buggy_comment() with strict_comment().  Documentation 
      Pass original text to end() method.  Patch by Brian McCauley
      First revision.
      Added documentation.
      Fix TableStripper example bug.
      First revision.
      Optimized by moving lookup of !$self->{'_strict_comment'} out of the
      Document how chuck size influence efficiency.  Reduce chunk size in
      Special case for plain start tags give 2.5% speed up.
      Use last instead of return to get of the the while-loop in parse().
      Added a BUGS section.
      Added $VERSION.
      use strict;
      Don't call the text() method with zero length text any more.
      First revision.
      Increment version number.
      First revision.
      Added Changes.
      Added some more real content.
      New (more interesting) date.
      First revision.
      Splitted test based on wheater URI::URL is available or not.
      Only make the URI::URL module required if a $base URL is given
      Make it work even without HTTP::Headers installed.
      Provide our own header object implementation.  Does not depend on
      First revision.
      Make it work better.
      New tests.
      Documentation flikking.
      2.15 changes.
      Used to be called parser.t
      Replaced with a real test.
      Some more HTML.
      Broke HEX entities &#xFF
      The old t/parser.t is now t/cases.t
      Always clean up tmpfile.
      Make it release 2.16 instead.
      Updated manual page.
      Never split words (a sequence of non-space) between two invocations of
      parse_file now use smaller chunks.
      Document smaller chunk.
      Incremented version number (sub-modules changed).
      Make it better subclass-able by calling $self->_found_link each time a
      Provide a parse_file method that cares about the return value from
      Test $p->parse_file method
      Documentation fix.
      2.18 changes.
      Don't leave space and end of chunk when trying to avoid breaking words.
      First revision.
      Added HTML::TokeParser
      Much more stuff.
      Reference to TokeParser
      First revision.
      Added documentation.
      Added Author address
      Updated with new manual page.  Mention HTML::TokeParser.
      More tests.
      Support reading from plain strings and from globs.
      Netscape comment patch by Peter Orbaek <>.
      Protect eval from $SIG{__DIE__}
      Incremented version number.

bulk88 (1):

gisle (892):
      Removed wrong expired address
      Various spell fixes.
      Fixed my email address.
      Documentation update.
      New year.
      Incremented version number.
      From: Clinton Wong <>
      Better recognition of GLOBs in parse_file().
      Added t/parsefile.t
      First revision.
      Test parsing of large inline documents too.
      More efficient parsing of large inline documents.
      Don't die just because the filename passed to $p->parse_file() can't
      Document that the scalar passed to the constructor must stay the same
      Get rid of the file in the end.
      Documentation update.
      Updated mailing list address.  Removed formatted HTML::Parser manpage.
      Get rid of $Id$ line again.
      Summarized 2.24
      Asjustment of parse_file() change description.
      First revision.
      End tags are recognized.
      Recongnize processing instructions.
      Beginning of declaration and comment matching.
      Parse declarations.
      Parse start tags too.
      Push PL_sv_yes
      More testing.
      Free memory assosiated with tokens arrays for premature and error parsing.
      First revision.
      Set DISTNAME.
      First revision.
      Added some real XS glue.
      Small adjustments.
      Real callbacks for text and end tags.
      Added copyright notice.
      Added rest of callbacks.
      Set up method callbacks.
      strict_comment().  A few small tweaks.
      Callbacks now get a reference to the parser object as 1st argument.
      Keep white space together.
      Make test compatible with HTML::Parser 3 which have its own DESTROY 
      New parse_file() implementation to keep in sync with HTML::Parser's
      Some tweaks here and there.
      Attribute keys are now already lowercased
      pass_cbdata boolean
      Added typemap.
      First revision.
      Added README
      Also set up processing instructions.
      Incremented version number.
      Implemented strict comments.
      Implemented keep_case option.
      Added accum attribute.
      Fill accum array as various tokens are found.
      Incremented version number again.
      Allow ':' in identifiers (isHALNUM).
      Allow ":" in attribute names because it is used by Microsoft.
      Version 2.25
      Don't print filtered any more.
      Check for $self->{parse_file_stop}
      Avoid parse_file() duplication.
      Summarized 2.25 changes.
      Minor detail.
      First revision.
      Look for $self->{parse_file_stop} in $self->parse_file loop.
      Added lib files and t files.
      <XMP>...</XMP> support.
      <xmp> support.
      Increased version number again.
      Replaced <xmp> support with the more general literal_mode.
      Added TODO list.
      We did not get out of literal mode as we should.
      Another todo item.
      More todo things.
      Another break.
      Killed some unneeded conditionals.
      2.99_04 release.
      New release again.
      Incremented version number.
      Implemented xml_mode.
      Implemented bool_attr_value
      If no bool_attr_val is set, then it will take the value of the attribute.
      First revision.
      Added Solaris hints to avoid gcc compilator bug.
      Inline decode_entities function.
      Updated todo.
      Load HTML::Entities.
      2.99_06 release.
      Rely on XS implementation of decode_entities_old.
      Integrated HTML-Parser-XS version 2.99_06.
      Version 2.99_07
      Attribute values entities are now expanded in the start callback.
      New bool attribute: decode_text_entities.
      Call the bool_parser_attr() function strict_comment() in order to avoid
      Got back old README text.
      Updated bug section.
      We got problems with ERROR.  Trying with FAIL instead:
      Tweaks to make it compile with perl5.004_04 too.
      Avoid calling SvREFCNT_inc() in void context (mostly).
      Make a copy of assigned 'bool_attr_val'.
      Fix serious memory leak.  We allocated an SV for text content twice.
      In xml_mode, don't report empty start tags with an extra parameter,
      Added line number counting as an option.
      Summarized _07 changes.
      Make it compile on perl5.004_05.
      Need to push references to PVAVs onto the accum array.
      More newRV-fixing when pushing array elements into an array.
      Implemented v2_compat flag.
      Reply on $p->v2_compat to set up method callbacks.
      Implemented by taking advantage of $p->accum.
      Also filter process instructions.
      Moved to
      Set up start-callback function instead of relying on method callbacks.
      Passing callbacks in ctor did not work (Need to try to set callbacks
      Close file to make sure it is not empty..
      Warn if unlink($filename) fails.
      Close filehandle before trying to unlink it.
      close files.
      Better unlink warning.
      Don't catch exceptions when trying to call ctor key arguments as a
      Moved comment parsing out of html_parse_decl into its own procedure.
      Added a process instruction to the stuff.
      Rely on the complete process instructions to be available is second
      Implemented 'default' handler.  All document text is passed to this
      Summarized 2.99_08
      Grammar fixes by Michael A. Chase <>
      Added binmode() to test since it was done to the $p->parse_file method
      Incremented version number to 2.99_09
      From: "John Hurst" <>
      close($io) as workaround for perl-close bug.
      Some minor cleanup.
      All specific parsing now delegated to parse functions.  Simplifies
      Select parse function by an array lookup instead of a series of if-tests.
      First revision.
      Set up dependecy for pfunc.h
      Added mkpfunc.
      Use type 'bool' for boolean attributes in PSTATE
      Added mkhctype.
      #include "hctype.h"
      First revision.
      Build "hctype.h"
      Use hctype-macros to implement strict names.
      Prepare for 2.99_09
      Avoid \z which did not do the right thing for perl5.004
      Avoid \z which don't work for perl5.004
      Better alpha release summary
      Summarized 2.99_10
      The old POD is back.
      Added documentation note.
      Parse <!> as an empty comment.  Hooks for marked_section implementation.
      Incomplete marked section support.
      Markde CDATA/RCDATA sections now work.
      Make marked section support deselectable.
      Don't leak any $@ messages.
      Be case insensitive when matching the end tag in literal_mode.
      Added even more link tags as suggested by
      Complete marked section support.
      Put magic number into the header of p_state.
      Ask if marked sections should be there.
      Implemented unbroken_text option.
      Implemented attr_pos().
      Gramar changes from Michael A. Chase.
      Gramar fixes by Michael A. Chase.
      Text change.
      Make attr_pos "work" for boolean attributes too.
      Report end of previous attribute/tag as first number for attr_pos
      Callbacks are now set up with _cb suffix.
      For the constructor arguments, we now use _cb as suffix for those that
      pass_cbdata renamed to pass_self.
      pass_cbdata renamed as pass_self
      Expanded TODO section.
      One more optimization to think about.
      Summarized 2.99_12.
      Gramar corrections by Michael A. Chase
      Case insensitive yes.
      Documentation patch from Michael.
      Various documentation updates.
      More updates to documentation.
      First revision.
      First revision.
      Test accum filling.
      Added two new tests.
      Make it possible to unset callbacks.
      First revision.
      HCTYPE_NOT_SPACE_EQ_SLASH_GT 0x40 was not initialized.
      First revision.
      Two more tests.
      Summarize 2.99_13.
      From: "Michael A. Chase" <>
      Some more todo.
      In perl5.004_05 we can't return PL_sv_undef safely.
      Forgot a little detail.
      Fixes by Michael A. Chase
      Documentation update by Michael A. Chase.
      One more todo option.
      Incremented version number.
      Prepare for 2.99_14.
      Better warning if undefined document is passed in.
      First revision.
      First revision.
      Renamed as tokenpos.h
      Added another .h file.  Made marked section support the default.
      First take at normalizing everything to call html_handle().  We still
      Now also html_parse_start() calls html_handle().
      Version 2.99_15
      Added handler stuct array to pstate.  Replaced $p->callback and
      Basically set up callback loop.
      Set up all basic arguments.
      Trimmed out various boolean attributes.  The ones eliminated are:
      Implemented cdata argspec.
      Updated TODO list.
      Killed all the routines that was replaced by html_handle().
      Direct method calls.
      Added MAC to copyright notice.
      New callback interface.
      token1 indentifier in attrspec
      Allow handler to be specified as an array of two values too.
      Look for MS_IGNORE in html_handle().
      New syntax.
      Move to new syntax.
      Better default handlers.
      Took out accum test.
      Fit with new way of doing things.
      Avoid reporting empty text segments.
      Set up our own accumulator array.
      Changed sequence of handler arguments.
      Reversed order of $p->handler arguments.
      Added tokenpos.h
      We did copy from the wrong place.
      First revision.
      Added largetags.
      Killed unused $a
      Support "event" in argspec.
      Test with ">" after ms.
      Documentation update from MAC
      MAC patch to support accumulator array in html_handle().
      version => 3 ctor option.
      Artificial end tag should have empty origtext.
      Test that artificial end tag get empty origtext.
      api_version => 3
      api_version => 3.
      Don't ask about marked sections any more.
      Don't eat newline after "]]>"
      Fix some obvious memory leaks.
      ]]> dont swallow "\n" any more.
      "realloc" as parameter name created problems.  Fix by Paul Schinder 
      Patch from MAC that makes it into a real test.
      Documentation patch from MAC.
      Working array dest.
      Use internal array-as-handler-destination-support.  Patch by MAC.
      Since we are faster we need longer speed test.
      Moved some functions out of Parser.xs
      Added copyright
      Dropped html_ prefix.
      First revision.
      Moved stuff out of Parser.xs
      More H files.
      More stuff.
      Some attrspec renaming.
      Minor spellfix.
      beta now
      Does not make sense in XS parser world.
      Moved literal_mode_elem to hparser.c
      Remove some commented-out code.
      Documentation patch from MAC.
      Updated it.
      Reduce length of speed test.
      Initial support for offset.
      pending_text gone.
      Added offset.
      Document offset.
      Working "offset" in attrspec.
      First revision.
      Added offset.
      First revision.
      New case.
      Added t/attrspec.t
      Doc patch from MAC.
      One more.
      Typo fix by MAC.
      Fix tokens reported in the artificial case.  Patch by MAC.
      <a "> core dump.
      First revision.
      Back out some more changes.
      Take out linepos
      For boolean attributes would could get very strange values unless
      Bug tokens for artificial tag fixed by MAC.
      Language fixes by Michael.
      Documentation update from MAC.
      Minor layout fixes by MAC.
      Another DOC patch.
      Don't make empty token/tokenpos arrays.
      Changed behaviour.
      Renamed token1 as token0
      av_extend() token/tokenpos arrays.
      For artificial end tag we don't report any tokenpos, but report tokens.
      Update from me.
      Rename bool_attr_value
      Doc patch from MAC.
      Renamed attrspec.t as argspec.t
      Renamed attrspec as argspec.
      Introduced enum argspec_opcode.
      Renamed opcode as argcode and OP_ as ARG_
      enum argcode
      Nothing much.
      First revision.
      Renamed bool_attr_value as boolean_attribute_value
      Added eg/hrefsub
      Added a BUGS section.
      argspec length
      Documented literal string in argspec.
      Off by one error when reporting literal end token.
      First revison.
      Added htext.
      First revision.
      Added t/exit-via-next.t
      Argspec undef
      First revision.
      Added eg/hstrip
      Doc patch from MAC.
      Typo fixes.
      One more attrspec cusin.
      Simplified hrefsub by working right to left.  Patch by MAC.
      Protect " inside $new_v
      Better fail message.
      Taken out debug stuff.
      Renamed cdata_flag as is_cdata
      Added usage string.
      Added short description of each file.
      Need a statement after a label.  Fix pointed out by
      Some more thoughts.
      MAC improvement (remove stuff from left)
      A generic bug.  Don't test for it any more.
      t/exit-via-next.t gone
      if we killed all attributed, kill any extra whitespace too
      Some adjustments by MAC.
      Fix core dump.
      Simplified check_handler()
      First revision.
      Don't get double refcnt decrement if argspec_compile() or
      Remove debugging output.
      Allow h->argspec to be NULL in report_event()
      Don't allow handler arguments to be grouped as an array reference.
      First revision.
      Added two more tests.
      Yet another update.
      Statement that is not correct any more.
      Documentation update.
      Documented return value from $p->handler().
      Doc patch from MAC.
      Added <�� as test case.
      A little more precision.
      First revision.
      Added a comment.
      Fix core dump reported by Doug MacEachern.
      First revision.
      Test netscape_buggy_comment too.
      Test process too.
      carp about netscape_buggy_comment instead of a warning.
      First revision.
      Note about depreciate state of this module.
      Updated again.
      Another update.
      Changed name of hash entry to _hparser_xs_state.
      Two more sections.
      First revision.
      Make \\ reserved in argspec literals so we can use it as escape character 
      More to go.
      One more change.
      Allow handlers to call $p->eof to abort parsing.
      $p->eof in handlers is now supported.
      Updates to the examples.
      Handler $p->eof
      First revision.
      Added many new tests.
      Added header.
      Various documentation and english tweaks from MAC.
      Don't use a Perl-hash for argspec any more.  Instead we simply use a
      I also decided to take a swing at the IGNORE handler.  Any false value
      Summarized 2.99_96
      Minor tweak.
      Yet another one of those useless tweaks.
      Test patch from Michael:
      Final POD tweaks from Michael.
      3.00 and some minor doc tweaks.
      Added MAC to Copyright messages
      Avoid calling method callbacks as options.
      Killed DISTNAME
      Make '3.00' a string.
      Removed beta blurb.
      First revision.
      After ispell
      Use "" instead of &ignore.  Patch by MAC.
      One additional paragraph from MAC.
      After MAC hacking.
      3.00 ready.
      Assertion was backwards.
      The hash function has probably changed so we need sorting to ensure
      Use ~-magic to trigger deallocation when IV that points to struct p_state goes away.
      Summarized new stuff.
      Tweaks before 3.01
      Added an "also"
      Make _hparser_xs_state into a reference to the IV-pointer
      Adjusted because _hparser_xs_state is now a reference to the IV-pointer.
      Introduced init().
      Reuse earlier 'Not a reference to a hash'-message.
      First revision.
      Added comment parsing.
      2000 copyright.
      Version 3.03 (new year)
      Prepare for 3.03
      We did not get out of comment mode for comments ending with an
      Try 3 dashes in a row.
      Fixed marked_sections without an s
      Back out option checking patch by MAC.
      Kill documentation of init().
      Minor doc tweaks by me.
      Backed out some of 3.03 patch.
      One more thing.
      Some typos fixed.
      xml_mode should prevent special treatment of <script>, <style>...
      Fix example.  Some more text.
      Don't enter CDATA mode for some tags in XML mode.
      Don't enter literal_mode when XML mode is enabled
      No Literal mode for XML.
      Special CDATA parsing for XML is gone now.
      Moved HTML::Filter to Decpreciated section.
      Implemented unbroken_text.
      Did not set is_cdata when we got out of outer level CDATA MS.
      Get the offset correct when alternating between CDATA/!CDATA modes.
      Don't initialize handler before we have to.  I am still wondering
      First revision.
      Also try <xmp>...</xmp>
      Don't keep text unbroken between unreported tags.
      An extra newline...
      New test.
      Fix last test.
      unbroken text done
      3.05 soon ready.
      require 3.00
      From: James Walden <>
      First revision.
      First revision.
      Fixed warning.
      Avoid some "statement not reached" from picky compilers.
      From: Doug MacEachern <>
      Version number is now 3.06
      Added eg/htextsub
      Fix for 5.004.  By avoiding OUTPUT: RETVAL we don't get sv_2mortal()
      Incremented version number.
      Copyright 2000.
      Only continue with declaration parsing when we find "DOCTYPE" or "ENTITY".  Based on patch by la mouton <>.
      First revision.
      Added t/declaration.t
      First revision.
      A short comment.
      Added hanchor.
      Typo fix.
      Fixed typo spotted by Jamie McCarthy <>.
      Match typo fix in
      Avoid access to freed() memory.
      Version number is now 3.08
      Changes for 3.08
      Document that the $p->parse() argument should not be modified.
      Added a litle description of what 'token0' is for process and comment
      Documentation update as suggested by Paul Makepeace 
      Make a mortal copy of the self argument passed to a handler.
      Another change in 3.09
      More mortal copies.  SPAGAIN after flush_pending_text()
      Get %linkElements from HTML::Tagset.
      Grab link data from HTML::Tagset
      Rely on HTML::Tagset
      Spelling patch from David Dyck <>
      PREREQ_PM HTML::Tagset.
      Get it to compile with "Optimierender Microsoft (R) 32-Bit
      A change missing in the log.
      Deal with unicode entities.
      Copyright 2000
      Added unicode entities from HTML4.0.1 spec.
      Deal with numification.
      Added uentities.
      Only 9 tests.
      Check for overflow.
      Better overflow check.
      Test overflow detection.
      Avoid failure under unicode.
      Don't set UNICODE_ENTITIES if $] > 5.006.
      Prompt for -DUNICODE_ENTITIES
      Don't test if UNICODE_SUPPORT is not enabled.
      Fix infinite loop in case the handler triggered by ->eof
      Incremented version number: 3.14
      Allow declaration parsing to take place for lowercase <!doctype ...>
      Release 3.14
      Escape new hash keys that happens to be perl keywords.
      $p->get_tag() can now take multiple tag names to match.
      Test with multiple arguments to $p->get_tag
      Really hide debugging code.
      UTF8 entities has already been done.
      Require 5.7.0 or better in order to offer "Unicode entities".
      Disable GET_CONTEXT for threaded perls because "we want efficiency".
      Get out a few more dTHXs by passing context with pTHX_ and aTHX_
      Release 3.15.
      Document that HTML::Tagset is a PREREQUISITE.
      Weaken then libwww-perl PREREQUISITE.
      Deleted note about v2 compatibility.
      Use INT2PTR instead of cast directly between pointers and IV.
      Set up INT2PTR unless perl provide it.
      Version 3.16 and Copyright -2001.
      A few more ideas.
      use strict
      unbroken_text now works across ignored tags.
      unbroken text behaviour fixed.
      Test one more range.
      Fix decoding of unicode entities.
      Copyright 2001.
      Always update size.
      Added _decode_entities(). Reindent.
      Export _decode_entities()
      Added t/entities2.t
      Forgot about pTHX_ from grow_gap().
      Release 3.17.
      Removed ANNOUNCEMENT.
      C++ comment left over from debugging removed.
      Release 3.18.
      Use get_hv() as documented in perlapi.
      Avoid global entity2char.  Patch by Sarathy. Version 3.19
      Support @attr argspec.
      Allow @{....} in argspec to signal flatting of array.
      Implemented ignore_tags/ignore_elements/report_tags
      Documents filter methods.
      Added test for @attr and @{...}
      Test new filter methods.
      Renamed report_tags as report_only_tags.
      Release 3.19_90
      Allow array references passed into $p->ignore_tags.
      Doc update about the effect on offset/length under unbroken_text
      The netscape_buggy_comment now gives mandatory warning
      Clear ignoring_element on eof.
      Simplify ARG_ATTR code a bit.
      Simplify by using ignore_tags/ignore_elements.
      No need for end_h
      Minor stylistic issue.
      Simplify by using report_only_tags
      Optimize tag reporting.  Image text should not be array ref.
      Doc tweak for report_only_tags()
      Version 3.19_91
      User filters.
      Use filters.
      Make it possible to pass key/value arguments to the constructor.
      Attr needed for textify.
      Introduced HTML::PullParser.
      Support parsing from doc => $str
      Test HTML::PullParser
      Reference HTML::PullParser instead of HTML::TokeParser.
      A clearer separation between 'doc' and 'file' parsing.
      Release 3.19_92
      Track unicode support as of perl@9359
      Avoid sv_catpvf(sv, "%c",...) as it wants to upgrade
      Doc fix.
      Release 3.19_93
      Support "tag" argspec.
      Document "tag" argspec.
      Prev patch broke lowercasing of tagnames.
      Test "tag" argspec
      Example of PullParser usage.
      Doc update.
      Implemented tracing of line and column numbers.
      Column numbers was off by one.
      Print line/column numbers instead.
      Test col/line.
      Get offsets/line- and column- numbers correct when skipping
      Release 3.19_94
      Include description of HTML::PullParser.  Remove description of 
      Ref hform example in doc.
      Release 3.20
      Don't promise any utf8 option.
      Avoid compiler warnings on some some compilers.  The DEC C said:
      Fix memory leak in filters.
      Optimize: Reuse the same SV for filtering by tagnames.
      Release 3.21
      Decode &apos;
      Parse <textarea> in literal mode, but not with is_cdata flag set.
      Release 3.22
      Moved filter testing code up a bit.  The ignore_elements filter
      Release 3.23
      Support parsing from code.
      use strict.
      Added start_document and end_document events (as for SAX).
      Implemented skipped_text argspec.
      Fixed interaction between unbroken_text and skipped_text.
      Implemented offset_end argspec.
      Doc update.  Release 3.24.
      Test offset_end.
      Release 3.24.
      Fix plaintext parsing.
      <plaintext> fixed.
      Some more state that was not reset on EOF.
      perl5.004_04 did not have ERRSV
      croak(0) was not present for 5.6.0
      From: "Stephane Barizien" <>
      Release 3.25
      Don't encode \r as suggested by Sean M. Burke.
      Make 'make clean' also clean up generated *.h files
      From: "Timur I. Bakeyev" <>
      Another example program.
      Avoid warnings emitted by perl-5.7.3
      From: Guy Albertelli II <>
      Added a few tests.  Resorted.
      More doc updates explaining C<case sensitive>
      Calling perl_call_* without G_EVAL always means trouble.
      Don't get fooled by an empty http-equiv
      We already had a RETHROW macro defined.
      Release 3.26
      First revision.
      Added eg/hlc to the example programs.
      Typo spotted by Marc Lehmann <>.
      From: "Sean M. Burke" <>
      Test encode_entities_numeric
      Release 3.27
      Fixed typo.  Spotted by Sean.
      Pass context around instead of using dTHX;  This should be faster.
      Make <!454554> be treated as a comment unless strict_comment is enabled.
      Version 3.28.
      avoid Visual C warning.  Patch by
      Don't use the pfunc by default.  On Intel P4 that saves about 3000 bytes on the binary, but there was no easy-to-measure speed difference.
      xml_mode implies strict_names also for end tags.
      64-bit fix from Doug Larrick <>
      Documentation patch: <textarea> is also literal mode.
      MSIE compatibility stuff.
      Need <!-- for strange <script> behaviour to show up.
      Allow crap in end tags as MSIE does.
      The name token name 'empty' was not good.
      Parse <! "<>"> as comment (MSIE compat).
      Implement 'strict_end' to control acceptance of junk at the end of end 
      Parse with <--comments> like this if we can't find the real thing.
      Release 3.29.
      From: Steve Hay <>
      Avoid RETVAL warnings as reported by Steve Hay <>
      Perl-5.7 should be gone by now.
      Better fix for the RETVAL warnings.  Use PPCODE for the parse functions.
      Missing unicode support noted.
      Also PPCODify handler().  Fixed return value for eof().
      The assert() apparently needs my_perl so ignore it.
      Documentation: Don't reference perl 5.7 any more.
      Release 3.30.
      Release 3.31
      Stale stuff.
      If the document ends with "some kind of unterminated markup", then
      Show skipped reason in the official way.
      Updated documentation.
      Include $Id$.
      Let the get_text() and get_trimmed_text() methods take multiple
      Document the </script> inside quotes case as a BUG.
      Typo spotted by S Page <>
      Apply patch (partly) from S Page <> that adds some 
      Note that parsing of Unicode does not work yet.
      Added dump script.
      Release 3.32.
      Implement get_phrase().
      Make get_text() expand most skipped tags to " "
      We don't support 5.004 any more.  For some strange reason the
      Release 3.33
      Fix release date for 3.33
      Avoid core dump when the stack gets reallocated during the parse() call.
      Added testcase for the stack realloc bug to the test suite.
      Release 3.34
      No need to redeclare SP.
      From: "Croome, Paul" <>
      Release 3.35
      When an attribute occurs use the first one in 'attr' instead of
      Compute hash only once.
      Release 3.36
      Silence 'gcc -Wall' - the prev_token might be a real issue.
      Time to ditch the v2 synopsis.
      Improve the handling of surrogate pairs.  Based on patch by
      Match perl's rules for Unicode non-chars.
      Avoid temp modification of argspec strings.
      Must also upgrade chars after the gap.  Otherwise we might produce
      Release 3.37
      Make closing of <plaintext> configurable.
      Release 3.38
      Parse <title> in literal mode.
      Updated copyright year.
      Make the UTF8-ness of strings parsed propagate.
      Disable Unicode stuff for perl < 5.8.  I still want HTML-Parser
      Get offsets right for Unicode string.
      Removed Unicode noop.
      Test Unicode parsing behaviour.
      Don't consider perl-5.6 Unicode capable.
      Release 3.39_90
      Usually there is only one <title>.
      Unicode basically done.
      Convert to use
      Header is not done if we see the Unicode BOM.
      Unicode is not supported.
      Unicode BOM tests.
      UTF-8 BOM warning only when Unicode is available.
      BOM tests.
      Some behaviour seen in KHTML sources.
      Implement quote behaviour for <script> tags.
      Test quote behaviour.
      Propagate UTF-8-ness during flushing at eot.
      If literal tags are unterminated, flush them out with the text
      Make Unicode BOM warnings optional and document them.
      This change was supposed to go somewhere else.
      Document that these modules need decoded chars to parse.
      Release 3.39_91
      Some new MSIE compatibility issues.
      MSIE compatibility: Expand unterminated entities in 'dtext' and
      Improve decode_entities() documentation.
      Test parsing of Unicode from file.
      Try to describe Unicode issues better.
      Added attribute 'utf8_mode'.
      Sort documentation; boolean attributes, argspecs, events.
      Test utf8_mode.
      Fix utf8_mode semantics.  The entities are now decoded as UTF-8.
      Release 3.39_92.
      Simpler HTML link.
      Trigger UTF8 warning if anything in the first chunk looks like hibit UTF8.
      The utf8_mode produces garbage for older perls.
      Least expensive tests first.
      Release 3.40.
      Make it work with perl-5.005
      Release 3.41
      Use push_header for all headers added.  Do not want to lose any values.  Better to duplicate fields.
      Silence warnings from the HP C compiler about char/U8 mismatches.
      Typo in r2.26
      Avoid sv_catpvn_utf8_upgrade; make us perl-5.8.0 compatible.
      perl-5.8.0 does not have utf8::is_utf8.
      Release 3.42.
      Fix test failure on Windows.
      Forgot to set repl_utf8 flag which might lead to utf8 corruption.
      Release 3.43
      Fix the handling of quoted strings.
      Release 3.44.
      Fix stack leak.
      Release 3.45.
      Explain affected code.
      From APEE build log with the HP native C compiler.
      Fix typo spotted by Stefan Funke <>.
      From: Norbert Kiesel <>
      Test pod correctness and fix up missing =back.
      use strict;
      Don't treat 0xA0 as space, since it's not really and XML agrees.
      Try parsing of \x0420.
      Release 3.46
      From: Norbert Kiesel <>
      Make unbroken_text the default for HTML::TokeParser.
      Silence all the diag noise.
      Skip blocks need to be called SKIP for it to work.
      perl-5.8.0 is just too buggy for HTML-Parser.
      Faster load time with XSLoader.
      Make the source ASCII only.
      Better use of Test::More.
      An explicit binmode() makes this test pass with perl-5.8.0
      encode &apos by default.
      Make tests pass for perl-5.6.
      It seems to work with perl-5.8.0 now.
      Add empty_element_tag and xml_pic attributes.
      xml_pic has been added
      Need to look for '/>' in more places when strict_names isn't enabled.
      Make empty_element_tag default on for HTML::TokeParser.
      Documentation tweaks.
      Add some empty elements tests.
      Rename as empty_element_tags (with s)
      Release 3.47.
      Test empty_element_tags/xml_pic.
      Fix typo.
      Don't enable empty_element_tags by default.  It breaks HTML::Form :(
      Adjust token counts now that empty_element_tags is not the default.
      marked_sections omit first 3 bytes "<!["  from "skipped_text"
      perl 5.6 is required.
      Release 3.48
      First revision.
      Events could still fire after a handler has signaled eof.
      marked_sections with text ending in square bracket parsed wrong
      Release 3.49.
      Updated copyright year.
      From: Steve Hay <>
      Release 3.50.
      Typos spotted by
      Improved MSIE compatibility.  Only the Latin-1 entities
      First revision.
      More tests.
      One more ref.
      Updated documentation.
      Release 3.51.
      Typo fixes are also in 3.51.
      Add some results.
      Link to
      Added HTML-Parser to the result table.
      Safari results.
      Documentation typo fix.
      Make sure 'start_document' is triggered exactly once per document.
      Documentation tweaks.  Recommend empty_element_tags.
      Documentation typo fixes.
      Release 3.52.
      ignore_element treated </script> like <script>.
      Release 3.53.
      Enabling of empty_element_tag interacted badly with literal mode.
      Release 3.54.
      Yaakov Belch was responsible for release 3.53 and 3.54.
      Test that empty_element_tags works for <script/> too.
      Consider <!a'b> a comment by itself.
      From: Gisle Aas <>
      Treat <> at end as text.
      Test <!a'b> comments.
      Release 3.55.
      Support threads cloning.  Contributed by Bo Lindbergh.
      New test file.
      Release 3.56.
      Restore perl-5.6 compatibility.
      New year.
      Remove debug printout.
      State Test::More dependency.
      Don't require whitespace between declaration tokens.
      Extra plaintext test from Alex Kapranoff <>.
      Alex Kapranoff claims the closing_plaintext behaviour only occurred
      Implement backquote() attribute as requested by Alex Kapranoff.


No new revisions were added by this update.

Alioth's /usr/local/bin/git-commit-notice on 

Pkg-perl-cvs-commits mailing list