This is an automated email from the git hooks/post-receive script.

kanashiro-guest pushed a change to branch master
in repository libhtml-parser-perl.

      from  2c2921f   releasing package libhtml-parser-perl version 3.71-2
      adds  c9940d5   First revision.
      adds  a06b717   Fake-compile regexps using anonymous subs.  More 
      adds  520b920   Removed trailing whitespace and unexpanded the text 
(replaced initial space with tab where possible.)
      adds  3e56ac9   Fixed copyright message.
      adds  4a0c7ac   Moved from ../base
      adds  fac3ed4   Avoid quotes in hash key.
      adds  aeb6d0b   First revision.
      adds  f61c1ec   Added test based on RFC1866
      adds  3d86f12   Included additional ISO-8859/1 entities listed in rfc1866 
(section 14). More comments.
      adds  c1abfc7   Typo fix by Bob Dalgleish <>
      adds  ae62354   First version.  Posted on the mailing list 1996-07-08.
      adds  87a6568   Clear links when calling parse_file().
      adds  aaadedc   Parse <link> attributes in head.  Renamed header Base: to 
Content-Base: to be compatible with HTTP/1.1.
      adds  7f4f938   Slightly better documentation.
      adds  0027ad6   Renamed Base: to Content-Base: and Implemented Link:
      adds  1fab0b1   First revision.
      adds  9f73735   Got Ambiguous use of {links} resolved to {"links"}
      adds  49b715e   Added support for <embed src="..."> as suggested by Hans 
de Graaff
      adds  a1e34b8   Added <frame src="..."> to the things recognized
      adds  6e95e54   Added an example to the documentation.
      adds  b4823db   Added test to check that the links method work when there 
are no links in the parsed document.
      adds  09552bf   Avoid 'Can't use an undefined value as an ARRAY reference 
message' when no links are found in the document.
      adds  9e095d5   Must escape literal $ in regular expression.
      adds  d13a8a8   $p->eof instead of $p->parse(undef)
      adds  c615f75   Support netscape_buggy_comment() and implement the eof() 
      adds  3a961f8   Added two new start() parameters; $attrseq and $origtext.
      adds  5472d51   First revision.
      adds  41442e4   Allow "_" in attribute names since Netscape really use 
this in their bookmarks.html
      adds  6843a65   Initialize from all <meta> as X-Meta-Foo
      adds  f39f6db   Parser was very confused about "</" when it did not start 
an end tag.
      adds  d861a10   $p->links now truncates the list.
      adds  4d8587c   Added SYNOPSIS to all libraries since perl5.003_97 warns 
if it is not present.
      adds  d26df9b   Updated the documentation.
      adds  bfe5f10   Only modify arguments in void context.  Requires 5.004
      adds  6a90d09   Doc bug spotted by Martijn Koster
      adds  20eaee6   Know about <applet code=URL>.  Patch from Daniel V Klein 
      adds  c15026e   Check for Bill Simpson-Young's problem.
      adds  bdcb447   Might introduce ";" for things that look like entities 
but is not.  Reported by Bill Simpson-Young <>
      adds  00def24   Documentation update.
      adds  0fa9bae   =head2 replaced by =item
      adds  671fe46   Reformatting by Martijn.
      adds  f8b44fc   Replaced netscape_buggy_comment() with strict_comment().  
Documentation update.
      adds  6200d9f   Pass original text to end() method.  Patch by Brian 
McCauley <>.
      adds  0d321af   First revision.
      adds  6060a3c   Added documentation.
      adds  1861f51   Fix TableStripper example bug.
      adds  37c7810   First revision.
      adds  a7ac97c   Optimized by moving lookup of !$self->{'_strict_comment'} 
out of the parser loop.  I got a 5% speedup by this.
      adds  301b665   Document how chunk size influences efficiency.  Reduce 
chunk size in parse_file().
      adds  aafe0c0   Special case for plain start tags give 2.5% speed up.
      adds  cdcad86   Use last instead of return to get out of the while-loop 
in parse().
      adds  bcace72   Added a BUGS section.
      adds  da6c9ff   Added $VERSION.
      adds  619498f   use strict;
      adds  02f1974   Don't call the text() method with zero length text any 
      adds  d15b6e6   First revision.
      adds  bc54567   Increment version number.
      adds  8549602   First revision.
      adds  2f017b5   Added Changes.
      adds  ecbcc0a   Added some more real content.
      adds  7addd67   New (more interesting) date.
      adds  0b06822   First revision.
      adds  6c83110   Split test based on whether URI::URL is available or 
      adds  28cca7a   Only make the URI::URL module required if a $base URL is 
given to the constructor.
      adds  c20ebc1   Make it work even without HTTP::Headers installed. 
Documentation update.
      adds  d9ec2ab   Provide our own header object implementation.  Does not 
depend on HTTP::Headers any more.
      adds  5739159   First revision.
      adds  839d89e   Make it work better.
      adds  4413674   New tests.
      adds  b119720   Documentation patching. Increment version number.
      adds  8f8cd20   2.15 changes.
      adds  5a59b70   Typo.
      adds  9e01a15   Tweaks.
      adds  99e3b2f   Used to be called parser.t
      adds  3554904   Replaced with a real test.
      adds  3ec15d1   Some more HTML.
      adds  13eb5d4   Broke HEX entities &#xFF
      adds  951ac50   The old t/parser.t is now t/cases.t
      adds  af954c4   Always clean up tmpfile.
      adds  05813f8   Make it release 2.16 instead.
      adds  5b32205   Updated manual page.
      adds  78b7dd2   Never split words (a sequence of non-space) between two 
invocations of $self->text.
      adds  8596106   2.17
      adds  a188fd1   parse_file now use smaller chunks.
      adds  652da52   Document smaller chunk.
      adds  8c132c0   Incremented version number (sub-modules changed).
      adds  8436834   Make it better subclass-able by calling 
$self->_found_link each time a new link is found.  The default implementation 
of _found_link will call a callback or add links to $self->{'links'}.
      adds  259d299   Provide a parse_file method that cares about the return 
value from $self->parse.
      adds  c986edb   Test $p->parse_file method
      adds  c509339   Documentation fix.
      adds  1beb530   2.18 changes.
      adds  f80c12d   Don't leave space at end of chunk when trying to avoid 
breaking words.
      adds  2370229   2.19
      adds  2e3c549   First revision.
      adds  250f8d8   Added HTML::TokeParser
      adds  d1a60b6   Much more stuff.
      adds  075e0d7   Reference to TokeParser
      adds  333610c   tokeparser.t
      adds  6ad3097   First revision.
      adds  63a2742   Added documentation.
      adds  ecc12a3   2.20
      adds  2028576   Added Author address
      adds  6822a32   Updated with new manual page.  Mention HTML::TokeParser.
      adds  84e09d7   More tests.
      adds  1855a64   Support reading from plain strings and from globs.
      adds  4f9fe5f   Netscape comment patch by Peter Orbaek <>.
      adds  cbffd18   2.21
      adds  6ccd81d   Protect eval from $SIG{__DIE__}
      adds  09cf24f   2.22
      adds  dca87c0   Incremented version number.
      adds  6500748   Removed wrong expired address
      adds  e4a4253   Various spell fixes.
      adds  b5b2377   Fixed my email address.
      adds  e2610b8   Documentation update.
      adds  b8c33a5   New year.
      adds  3037ade   Incremented version number.
      adds  dd7036e   2.23.
      adds  43e80d9   From: Clinton Wong <> Subject: 
HTML::LinkExtor patch To: Date: Tue, 29 Jun 1999 14:02:31 -0700 
      adds  40f48a3   Better recognition of GLOBs in parse_file().
      adds  5e63027   Added t/parsefile.t
      adds  82a8d37   First revision.
      adds  8734107   Test parsing of large inline documents too.
      adds  652e7e4   More efficient parsing of large inline documents.
      adds  9911265   Don't die just because the filename passed to 
$p->parse_file() can't be opened.
      adds  fa195ca   Document that the scalar passed to the constructor must 
stay the same during parsing.
      adds  605ce2e   Get rid of the file in the end.
      adds  a535c93   Documentation update.
      adds  34bb597   Updated mailing list address.  Removed formatted 
HTML::Parser manpage.
      adds  8d3f59b   Get rid of $Id$ line again.
      adds  0751bcc   Summarized 2.24
      adds  28aa662   Adjustment of parse_file() change description.
      adds  bce2144   First revision.
      adds  ea58d53   -Wall
      adds  d5eb670   End tags are recognized.
      adds  4fff0df   Recognize processing instructions.
      adds  f0f0686   Beginning of declaration and comment matching.
      adds  4ed8f5b   Parse declarations.
      adds  b6eac7c   Parse start tags too.
      adds  cdb2901   Push PL_sv_yes
      adds  28ece90   More testing.
      adds  b842c98   Free memory associated with tokens arrays for premature 
and error parsing.
      adds  6fe8b7b   Bye.
      adds  e60f1d5   Updated.
      adds  caf3f9a   First revision.
      adds  b80576d   Makefile
      adds  98d25c3   Set DISTNAME.
      adds  b1e3155   First revision.
      adds  2f54742   Added some real XS glue.
      adds  d1e556d   Small adjustments.
      adds  f3009af   Real callbacks for text and end tags.
      adds  389c799   Added copyright notice.
      adds  09390c1   Added rest of callbacks.
      adds  0a27464   Set up method callbacks.
      adds  20eaae0   strict_comment().  A few small tweaks.
      adds  7b4e642   Callbacks now get a reference to the parser object as 1st 
      adds  ce1fbd2   Keep white space together.
      adds  7d634f4   Make test compatible with HTML::Parser 3 which have its 
own DESTROY method.
      adds  4e8d89b   New parse_file() implementation to keep in sync with 
HTML::Parser's method.
      adds  29bf96c   Some tweaks here and there.
      adds  1b41f5b   Attribute keys are now already lowercased
      adds  94578bb   Reduction.
      adds  c3fd0f3   pass_cbdata()
      adds  e830513   pass_cbdata boolean
      adds  cdeb9a5   Added typemap.
      adds  674bd40   First revision.
      adds  33aa362   Added README
      adds  7132db8   Also set up processing instructions.
      adds  ca097df   Incremented version number.
      adds  2e775fd   Implemented strict comments.
      adds  f0c2ea2   Implemented keep_case option.
      adds  a6c4ee1   Added accum attribute.
      adds  e18e983   Fill accum array as various tokens are found.
      adds  48983ab   Incremented version number again.
      adds  b815650   Allow ':' in identifiers (isHALNUM).
      adds  f49fdee   Allow ":" in attribute names because it is used by 
      adds  b6ec709   Version 2.25
      adds  f87a6a4   Don't print filtered any more.
      adds  c2629e8   Check for $self->{parse_file_stop}
      adds  6de9c1a   Avoid parse_file() duplication.
      adds  41bc120   Summarized 2.25 changes.
      adds  2daa27d   Minor detail.
      adds  152aaf1   First revision.
      adds  866b5a8   Look for $self->{parse_file_stop} in $self->parse_file 
      adds  dffb25f   Added lib files and t files.
      adds  1a37f26   <XMP>...</XMP> support.
      adds  fc1f387   <xmp> support.
      adds  e6c340e   Increased version number again.
      adds  92c4f4a   Replaced <xmp> support with the more general 
literal_mode. This allow us to parse <script>/<style>/<xmp> better.
      adds  1036a7d   Added TODO list.
      adds  ee10358   We did not get out of literal mode as we should.
      adds  6db0714   Another todo item.
      adds  a527a14   More todo things.
      adds  2f97397   Another break.
      adds  cc31506   Killed some unneeded conditionals.
      adds  2e92ca8   2.99_04 release.
      adds  2d12922   2.99_05
      adds  6b00300   New release again.
      adds  09cb022   Blush!
      adds  29becee   Incremented version number.
      adds  684cb62   Implemented xml_mode.
      adds  591ee30   Implemented bool_attr_value
      adds  22da862   If no bool_attr_val is set, then it will take the value 
of the attribute.
      adds  bf83788   First revision.
      adds  a489e6e   Added Solaris hints to avoid gcc compiler bug.
      adds  febfe91   Inline decode_entities function.
      adds  74fc347   Updated todo.
      adds  1f05327   Load HTML::Entities.
      adds  62914f6   2.99_06 release.
      adds  21f3243   Rely on XS implementation of decode_entities_old.
      adds  2fcd66a   Integrated HTML-Parser-XS version 2.99_06.
      adds  9504886   Version 2.99_07
      adds  21e0637   Attribute values entities are now expanded in the start 
      adds  451071f   New bool attribute: decode_text_entities. Implemented 
access to all boolean parser attributes with a single aliased function.
      adds  455addc   Call the bool_parser_attr() function strict_comment() in 
order to avoid an extra version with ix = 0.
      adds  befc0c1   Got back old README text.
      adds  ae2f6ab   Updated bug section.
      adds  a5d933c   We got problems with ERROR.  Trying with FAIL instead:
      adds  c131626   Tweaks to make it compile with perl5.004_04 too.
      adds  224278c   Avoid calling SvREFCNT_inc() in void context (mostly).
      adds  4481d20   Make a copy of assigned 'bool_attr_val'.
      adds  9ce9ba8   Fix serious memory leak.  We allocated an SV for text 
content twice.
      adds  f4c7cb9   In xml_mode, don't report empty start tags with an extra 
parameter, but instead append an artificial endtag.  This end tag is marked 
special by having an orig_text argument which is empty.
      adds  fd092ff   Added line number counting as an option.
      adds  60abef9   Summarized _07 changes.
      adds  a5674bb   Make it compile on perl5.004_05.
      adds  5b0a3e5   Need to push references to PVAVs onto the accum array.
      adds  b516151   More newRV-fixing when pushing array elements into an 
      adds  e574f46   Implemented v2_compat flag.
      adds  34b2f00   Rely on $p->v2_compat to set up method callbacks.
      adds  3803d69   Implemented by taking advantage of $p->accum.
      adds  dc524e4   Also filter process instructions.
      adds  1bd0175   Moved to
      adds  599c447   Set up start-callback function instead of relying on 
method callbacks. Use (instead of URI::URL) for absolutizing URIs.
      adds  53e8e8d   Passing callbacks in ctor did not work (Need to try to 
set callbacks before trying plain attribute.)
      adds  e2f2865   Close file to make sure it is not empty..
      adds  70bd6e3   Warn if unlink($filename) fails.
      adds  b13ee56   Close filehandle before trying to unlink it.
      adds  ff667e6   close files.
      adds  1f56240   Better unlink warning.
      adds  72674d0   Don't catch exceptions when trying to call ctor key 
arguments as a method.
      adds  ecca1f2   Moved comment parsing out of html_parse_decl into its own 
      adds  006e0a4   Added a process instruction to the stuff.
      adds  2981299   Rely on the complete process instructions to be available 
as second argument.  Without this we would need special stuff in xml_mode.
      adds  e58fc57   Implemented 'default' handler.  All document text is 
passed to this callback when no other callback have shown any interest.  If 
accum is activated, then default will never be activated.
      adds  4037883   Summarized 2.99_08
      adds  eac7315   Grammar fixes by Michael A. Chase <>
      adds  12e3bbe   Added binmode() to test since it was done to the 
$p->parse_file method
      adds  c957465   Incremented version number to 2.99_09
      adds  b49fbc9   From: "John Hurst" <> Subject: tags with 
links for LinkExtor To: <> Date: Thu, 11 Nov 1999 09:31:06 +1300 
Reply-To: "John Hurst" <>
      adds  49ccf7c   close($io) as workaround for perl-close bug.
      adds  9da8ced   Some minor cleanup.
      adds  a8c4957   All specific parsing now delegated to parse functions.  
Simplifies html_parse a bit.
      adds  fdaa3ad   Select parse function by an array lookup instead of a 
series of if-tests.
      adds  a82ccc3   First revision.
      adds  6ee5aad   Set up dependency for pfunc.h
      adds  8aea7b1   Added mkpfunc.
      adds  fc7c529   Use type 'bool' for boolean attributes in PSTATE
      adds  ae5005b   Added mkhctype.
      adds  7da3e98   #include "hctype.h"
      adds  67fdba7   First revision.
      adds  61698e9   Build "hctype.h"
      adds  0c4bada   Use hctype-macros to implement strict names.
      adds  f62c93d   Prepare for 2.99_09
      adds  5977782   Avoid \z which did not do the right thing for perl5.004
      adds  8bcc344   Avoid \z which don't work for perl5.004
      adds  51b9b0e   Better alpha release summary
      adds  30b63d6   2.99_10.
      adds  58552a3   Summarized 2.99_10
      adds  5e56c6e   The old POD is back.
      adds  0c8c718   Added documentation note.
      adds  2a88688   Parse <!> as an empty comment.  Hooks for marked_section 
      adds  bb412e4   Incomplete marked section support.
      adds  56c374c   Marked CDATA/RCDATA sections now work.
      adds  9d3fe62   Make marked section support deselectable.
      adds  6c89d3c   Don't leak any $@ messages.
      adds  0e10592   Be case insensitive when matching the end tag in 
      adds  f3ad6f0   2.99_11.
      adds  a808222   Added even more link tags as suggested by Sean M. Burke 
      adds  fa33b86   Complete marked section support.
      adds  6ae3831   Put magic number into the header of p_state.
      adds  4bfb7cf   Ask if marked sections should be there.
      adds  c2a4455   Implemented unbroken_text option.
      adds  964dd0a   Implemented attr_pos().
      adds  7e39aec   Grammar changes from Michael A. Chase.
      adds  30abe0c   Grammar fixes by Michael A. Chase.
      adds  ead3fc7   Text change.
      adds  8e2025c   Make attr_pos "work" for boolean attributes too.
      adds  c43e9cd   Report end of previous attribute/tag as first number for 
      adds  914e182   Callbacks are now set up with _cb suffix.
      adds  19bf184   For the constructor arguments, we now use _cb as suffix 
for those that are callbacks.
      adds  bc88bb9   pass_cbdata renamed to pass_self.
      adds  1f3b2ae   pass_cbdata renamed as pass_self
      adds  e50d243   Expanded TODO section.
      adds  ccaa58a   One more optimization to think about.
      adds  2b45fef   Summarized 2.99_12.
      adds  ef480ce   2.99_13
      adds  c0d8b7a   Grammar corrections by Michael A. Chase
      adds  3aeb8e0   Case insensitive yes.
      adds  dba106e   Documentation patch from Michael.
      adds  ec9f035   Various documentation updates.
      adds  7607ba3   More updates to documentation.
      adds  2f06b38   First revision.
      adds  c0bb160   First revision.
      adds  2a702cf   Test accum filling.
      adds  e0b7ed5   Added two new tests.
      adds  b6f3998   Make it possible to unset callbacks.
      adds  dea8270   First revision.
      adds  d846366   HCTYPE_NOT_SPACE_EQ_SLASH_GT 0x40 was not initialized.
      adds  bcb0bff   First revision.
      adds  6b5729f   Two more tests.
      adds  c03ec6f   Summarize 2.99_13.
      adds  ecee086   From: "Michael A. Chase" <> Subject: 
[PATCH]HTML::Parser-XS-2.9913_mac-1 To: "libwww" <>, "Gisle Aas" 
<> Date: Wed, 24 Nov 1999 19:38:53 -0800
      adds  d1483af   Some more todo.
      adds  62afd18   In perl5.004_05 we can't return PL_sv_undef safely.
      adds  e2cf2cb   Forgot a little detail.
      adds  1b7637a   Fixes by Michael A. Chase
      adds  d3abcf6   Documentation update by Michael A. Chase.
      adds  6a59e84   One more todo option.
      adds  0bb7d1f   Incremented version number.
      adds  0c41ef0   Prepare for 2.99_14.
      adds  de0db9e   Better warning if undefined document is passed in.
      adds  a9ea92b   First revision.
      adds  4af9067   First revision.
      adds  f6e4066   Renamed as tokenpos.h
      adds  b45e345   Added another .h file.  Made marked section support the 
      adds  c643086   First take at normalizing everything to call 
html_handle().  We still don't call it for E_START.
      adds  5865986   Now also html_parse_start() calls html_handle().
      adds  aa58179   Version 2.99_15
      adds  7c8b452   Added handler struct array to pstate.  Replaced 
$p->callback and $p->accum with $p->handler.
      adds  fc64935   Basically set up callback loop.
      adds  f6e2762   Set up all basic arguments.
      adds  9592693   Trimmed out various boolean attributes.  The ones 
eliminated are:
      adds  a49b144   Implemented cdata argspec.
      adds  a7ca7b2   Updated TODO list.
      adds  86317eb   Killed all the routines that was replaced by 
      adds  d010f68   attrspec_compile()
      adds  08c5f14   Direct method calls.
      adds  8b4ae25   Added MAC to copyright notice.
      adds  5694886   New callback interface.
      adds  aef7aa3   token1 identifier in attrspec don't use references as 
method names.
      adds  2dc5af0   Allow handler to be specified as an array of two values 
      adds  6637919   Look for MS_IGNORE in html_handle().
      adds  0d58634   New syntax.
      adds  d143cd9   Move to new syntax.
      adds  7865ad2   Better default handlers.
      adds  211b79c   Took out accum test.
      adds  4e13ac2   Fit with new way of doing things.
      adds  b6d2cac   Avoid reporting empty text segments.
      adds  bbbc0da   Set up our own accumulator array.
      adds  bc3b75a   Changed sequence of handler arguments.
      adds  2fac1da   Reversed order of $p->handler arguments.
      adds  afeacdb   Added tokenpos.h
      adds  084f106   2.99_15
      adds  dd703e1   We did copy from the wrong place.
      adds  e6203d4   First revision.
      adds  931de5e   Added largetags.
      adds  0a84940   Killed unused $a
      adds  15732d3   Support "event" in argspec.
      adds  665b220   2.99_16.
      adds  ae198c2   2.99_16
      adds  7265bba   Test with ">" after ms.
      adds  950516e   Documentation update from MAC
      adds  da6420d   MAC patch to support accumulator array in html_handle().
      adds  707d15e   version => 3 ctor option. Documentation update.
      adds  d362b0d   Artificial end tag should have empty origtext.
      adds  53aa8c3   Test that artificial end tag get empty origtext.
      adds  c7da5e0   api_version.
      adds  bebb78c   api_version => 3
      adds  134487f   api_version => 3.
      adds  34b0f82   Don't ask about marked sections any more.
      adds  abe38d3   Don't eat newline after "]]>"
      adds  a20ed1a   Fix some obvious memory leaks.
      adds  ab0b5d4   ]]> dont swallow "\n" any more.
      adds  daf6a29   2.99_17
      adds  2f5c728   "realloc" as parameter name created problems.  Fix by 
Paul Schinder <>
      adds  640097d   Patch from MAC that makes it into a real test.
      adds  aa090bb   Documentation patch from MAC.
      adds  a58c7fe   Working array dest.
      adds  2a28e9a   Use internal array-as-handler-destination-support.  Patch 
by MAC.
      adds  21efabd   Since we are faster we need longer speed test.
      adds  ace6600   Moved some functions out of Parser.xs
      adds  c53ff4c   Prettifying.
      adds  1b6a295   Added copyright
      adds  c7fbece   Dropped html_ prefix.
      adds  dd2577f   Update.
      adds  3401fd7   First revision.
      adds  cdd72a3   Moved stuff out of Parser.xs
      adds  2a55a91   More H files.
      adds  bcfe686   More stuff.
      adds  6172d8d   2.99_90
      adds  ed8bbff   Some attrspec renaming.
      adds  3285d4e   2.99_90
      adds  ef5bbcc   Minor spellfix.
      adds  906754c   beta now
      adds  f7587ee   Does not make sense in XS parser world.
      adds  48ba894   literal_mode_elem
      adds  b669e30   Moved literal_mode_elem to hparser.c
      adds  26c3953   Remove some commented-out code.
      adds  1e684a2   Documentation patch from MAC.
      adds  033a5ce   Updated it.
      adds  9fd5597   Reduce length of speed test.
      adds  502b5f1   Initial support for offset.
      adds  6423b02   pending_text gone.
      adds  6913607   Update.
      adds  f1e1d6b   Added offset. Removed pending_text. Some shuffling of 
fields in p_state.
      adds  5f349c0   Document offset.
      adds  382c757   Working "offset" in attrspec.
      adds  71bc81a   First revision.
      adds  b29df0d   Added offset.
      adds  ca836a8   Updated.
      adds  afdf3cd   2.99_91
      adds  48bf5ed   First revision.
      adds  e11730e   New case.
      adds  ba145d4   Added t/attrspec.t
      adds  50fc9c4   Doc patch from MAC.
      adds  e3844ed   One more.
      adds  fba8fd2   Typo fix by MAC.
      adds  89122be   Fix tokens reported in the artificial case.  Patch by MAC.
      adds  c5c532c   <a "> core dump.
      adds  d0f564d   First revision.
      adds  84dd1c7   Back out some more changes.
      adds  322f98a   Take out linepos
      adds  ceea58e   For boolean attributes we could get very strange 
values unless strict_names() were on.
      adds  40961cf   Bug tokens for artificial tag fixed by MAC.
      adds  33f7563   Update.
      adds  42e7bbc   Language fixes by Michael.
      adds  638d271   Documentation update from MAC.
      adds  d69b9cf   Minor layout fixes by MAC.
      adds  ae0c48c   Another DOC patch.
      adds  fef90e9   Don't make empty token/tokenpos arrays.
      adds  becb50c   Changed behaviour.
      adds  2ab9fc0   Renamed token1 as token0
      adds  e004567   av_extend() token/tokenpos arrays.
      adds  725e796   token0
      adds  5c0337f   For artificial end tag we don't report any tokenpos, but 
report tokens. Boolean attribute values are reported as 0,0 in tokenpos and in 
tokens we care about bool_attr_value.
      adds  ff8fb15   Update from me.
      adds  681bcc3   Rename bool_attr_value
      adds  2828698   2.99_92
      adds  1cbb1f8   Doc patch from MAC. _93.
      adds  7a454b2   Renamed attrspec.t as argspec.t
      adds  f17f7d7   Renamed attrspec as argspec.
      adds  e16f4bd   Introduced enum argspec_opcode.
      adds  d4bd443   Renamed opcode as argcode and OP_ as ARG_
      adds  1ac0ead   enum argcode
      adds  e720730   Nothing much.
      adds  192452f   First revision.
      adds  8d5e3cc   Renamed bool_attr_value as boolean_attribute_value
      adds  636553f   Added eg/hrefsub
      adds  d5a5321   Added a BUGS section.
      adds  077e53f   Updated.
      adds  5bb60c3   2.99_93
      adds  7587748   argspec length
      adds  ee5d508   _94
      adds  5a1340b   Documented literal string in argspec.
      adds  2b6d8cc   Off by one error when reporting literal end token.
      adds  9ef50a0   First revision.
      adds  0884e1e   shift2
      adds  8ddfa64   Added htext.
      adds  7d4b5b0   First revision.
      adds  c6764ed   Added t/exit-via-next.t
      adds  71921da   IGNORE.
      adds  5268c88   Argspec undef
      adds  88e66ad   First revision.
      adds  fc3263c   Added eg/hstrip
      adds  469d4cf   Doc patch from MAC.
      adds  0131c48   Typo fixes.
      adds  e2f4bfa   One more attrspec cousin.
      adds  15ce8b2   Simplified hrefsub by working right to left.  Patch by 
      adds  b72f733   Protect " inside $new_v
      adds  6194d36   Better fail message.
      adds  80a4e0e   Taken out debug stuff.
      adds  349ba76   Renamed cdata_flag as is_cdata
      adds  93f2b04   Updated.
      adds  fa338ea   Updated.
      adds  d2fff03   Added usage string.
      adds  d783105   Added short description of each file.
      adds  1760568   Need a statement after a label.  Fix pointed out by 
Matthew Langford <>.
      adds  52aeebe   Some more thoughts.
      adds  44c1161   MAC improvement (remove stuff from left)
      adds  58ccb20   A generic bug.  Don't test for it any more.
      adds  fd1bbcd   t/exit-via-next.t gone
      adds  c38ea19   if we killed all attributes, kill any extra whitespace too
      adds  2ac9baf   Some adjustments by MAC.
      adds  1b32a9b   Fix core dump.
      adds  a54634a   Simplified check_handler()
      adds  5b959a8   First revision.
      adds  c571455   Don't get double refcnt decrement if argspec_compile() or 
check_handler() croaks.
      adds  e0eefd8   Remove debugging output.
      adds  5bedc48   Allow h->argspec to be NULL in report_event()
      adds  a226358   Don't allow handler arguments to be grouped as an array 
reference. This created ambiguity when we used an array as handler reference.
      adds  6ed63a3   First revision.
      adds  ce0c373   Added two more tests.
      adds  76cdb14   Yet another update.
      adds  137b279   Statement that is not correct any more.
      adds  b3820e3   Documentation update.
      adds  0e47ea8   $self->{parse_file_stop}
      adds  5e6934e   Documented return value from $p->handler().
      adds  655a098   2.99_94
      adds  7225188   Doc patch from MAC.
      adds  3ab160d   Added <�� as test case.
      adds  9d6d2dc   A little more precision.
      adds  59152d1   First revision.
      adds  70526c6   Added a comment.
      adds  3fad039   Fix core dump reported by Doug MacEachern.
      adds  29412aa   First revision.
      adds  16b2a89   Test netscape_buggy_comment too.
      adds  61ee014   Test process too.
      adds  579c3e0   carp about netscape_buggy_comment instead of a warning.
      adds  16e4908   First revision.
      adds  17a8ec0   Note about deprecated state of this module.
      adds  6bf2447   Updated.
      adds  6055419   Updated again.
      adds  34e9a33   2.99_95
      adds  f0d4f77   Another update.
      adds  4fdb87f   _hparser.
      adds  8a3bb4a   Changed name of hash entry to _hparser_xs_state.
      adds  c0a19f6   Two more sections.
      adds  10066b1   First revision.
      adds  bdecab7   Make \\ reserved in argspec literals so we can use it as 
escape character later.
      adds  63e748f   More to go.
      adds  b868b8f   One more change.
      adds  864f8fb   Allow handlers to call $p->eof to abort parsing.
      adds  6474d9f   $p->eof in handlers is now supported.
      adds  dc36cd3   Updates to the examples.
      adds  5da9dbb   Handler $p->eof
      adds  20efd6f   First revision.
      adds  230f43e   Added many new tests.
      adds  7a6bdcb   Added header.
      adds  f1d4460   Various documentation and english tweaks from MAC.
      adds  d39e044   Don't use a Perl-hash for argspec any more.  Instead we 
simply use a static array.
      adds  9bca9bb   I also decided to take a swing at the IGNORE handler.  
Any false value (usually '' or 0) in the SV pointed to by h->cb triggers the 
      adds  647381c   Summarized 2.99_96
      adds  c498996   Minor tweak.
      adds  d5907cf   Yet another one of those useless tweaks.
      adds  424f8fc   Simplified.
      adds  c10d28a   Test patch from Michael:
      adds  2958990   Final POD tweaks from Michael.
      adds  f306528   3.00 and some minor doc tweaks.
      adds  84a3b09   Added MAC to Copyright messages
      adds  81e2527   Avoid calling method callbacks as options.
      adds  e8a83e8   Killed DISTNAME
      adds  eeb34ef   Make '3.00' a string.
      adds  3c93aaa   Removed beta blurb.
      adds  9d1302e   Added ANNOUNCEMENT
      adds  41ea75d   First revision.
      adds  101fa58   After ispell
      adds  f0245b9   Use "" instead of &ignore.  Patch by MAC.
      adds  ad64c08   One additional paragraph from MAC.
      adds  3c26488   After MAC hacking.
      adds  fa9ce3a   3.00
      adds  8e0c00c   3.00 ready.
      adds  d20c225   Assertion was backwards.
      adds  a7e2dbd   The hash function has probably changed so we need sorting 
to ensure sequence of attr keys.
      adds  fe84f54   Use ~-magic to trigger deallocation when IV that points 
to struct p_state goes away.
      adds  d0cf59d   3.01
      adds  138dee5   Summarized new stuff.
      adds  0a6392c   Tweaks before 3.01
      adds  d59e321   Added an "also"
      adds  1d13bba   Make _hparser_xs_state into a reference to the IV-pointer
      adds  4dde476   Adjusted because _hparser_xs_state is now a reference to 
the IV-pointer.
      adds  c3c1727   Introduced init(). Filled out DIAGNOSTICS.
      adds  28dabf1   Reuse earlier 'Not a reference to a hash'-message.
      adds  e85582e   3.02
      adds  b60ed7a   Rephrasing.
      adds  e0cf45b   First revision.
      adds  99d117c   Added comment parsing.
      adds  58c064c   2000 copyright.
      adds  a3537ad   Version 3.03 (new year)
      adds  5341167   Prepare for 3.03
      adds  fb73147   We did not get out of comment mode for comments ending 
with an odd number of "-" before ">".  Patch by la mouton <>
      adds  3990c0f   Try 3 dashes in a row.
      adds  857be89   Fixed marked_sections without an s
      adds  53d3dcb   Back out option checking patch by MAC.
      adds  65068bc   Kill documentation of init().
      adds  bfb93c5   Minor doc tweaks by me.
      adds  07adcd3   Backed out some of 3.03 patch.
      adds  de4702e   One more thing.
      adds  31e4833   Some typos fixed.
      adds  0d9fd28   xml_mode should prevent special treatment of <script>, 
      adds  4f1936f   Fix example.  Some more text.
      adds  cb224e4   Don't enter CDATA mode for some tags in XML mode.
      adds  b2d95bf   Don't enter literal_mode when XML mode is enabled
      adds  55e9585   No Literal mode for XML.
      adds  6647602   Special CDATA parsing for XML is gone now. Version 3.05
      adds  4305211   Moved HTML::Filter to Decpreciated section.
      adds  ab64ea8   Implemented unbroken_text.
      adds  dd5c0a8   Did not set is_cdata when we got out of outer level CDATA 
      adds  dd6ff05   Get the offset correct when alternating between 
      adds  2777600   Don't initialize handler before we have to.  I am still 
wondering about whether to put unbroken text before or after early return 
because no handler is there to get anything.
      adds  b11a63b   First revision.
      adds  c04823d   Also try <xmp>...</xmp>
      adds  a1ab160   Don't keep text unbroken between unreported tags. Offset 
was wrong for some text.
      adds  649629e   An extra newline...
      adds  6b73622   New test.
      adds  0bf18f8   Fix last test.
      adds  7f6b398   unbroken text done
      adds  b3f2862   3.05 soon ready.
      adds  cfe7289   require 3.00
      adds  354e9aa   From: James Walden <> Subject: 
Patches for building with xlc To: Date: Fri, 4 Feb 2000 
15:34:03 -0800 (PST)
      adds  d37f32a   First revision.
      adds  482ee77   First revision.
      adds  217273e   Fixed warning.
      adds  a5f160b   Avoid some "statement not reached" from picky compilers.
      adds  ba6a7a3   From: Doug MacEachern <> Subject: [PATCH 
HTML-Parser-3.05] v5.5.670 i686-linux-thread-multi To: cc: Date: Wed, 1 Mar 2000 13:38:32 -0800 (PST)
      adds  82a72ae   Version number is now 3.06
      adds  5c651b1   3.06.
      adds  c9f6075   Added eg/htextsub
      adds  72cbc1f   Typo.
      adds  75b749a   Fix for 5.004.  By avoiding OUTPUT: RETVAL we don't get 
sv_2mortal() called on &PL_sv_undef in $p->handler.
      adds  cce43f5   Incremented version number.
      adds  ac849f0   Copyright 2000.
      adds  53ce3bb   Only continue with declaration parsing when we find 
"DOCTYPE" or "ENTITY".  Based on patch by la mouton <>.
      adds  a1ecf83   First revision.
      adds  2440612   Added t/declaration.t
      adds  644964d   3.07.
      adds  0adb4db   First revision.
      adds  65ce3ca   A short comment.
      adds  03c6e2a   Added hanchor.
      adds  5e8fc82   Typo fix.
      adds  79f15c4   Fixed typo spotted by Jamie McCarthy <>.
      adds  6404161   Match typo fix in
      adds  cec1dca   Avoid access to freed() memory.
      adds  0817754   Version number is now 3.08
      adds  c5b7848   Changes for 3.08
      adds  004c3a8
      adds  a2615b0   Document that the $p->parse() argument should not be 
      adds  fe1abfb   Added a litle description of what 'token0' is for process 
and comment events.
      adds  561df94   Documentation update as suggested by Paul Makepeace 
      adds  c84d5e9   3.09
      adds  3aeba52   Make a mortal copy of the self argument passed to a 
handler. Avoid core dump if somebody clobbers the aliased $self argument of a 
      adds  2af94b1   Another change in 3.09
      adds  1b9d96d   More mortal copies.  SPAGAIN after flush_pending_text()
      adds  171cafa   3.10
      adds  1447c66   Typo.
      adds  ae11e8c   Get %linkElements from HTML::Tagset.
      adds  66efa03   Grab link data from HTML::Tagset
      adds  f3caa1c   3.11
      adds  308566f   Rely on HTML::Tagset
      adds  bdd7c83   Spelling patch from David Dyck <>
      adds  7e250e6   PREREQ_PM HTML::Tagset.
      adds  9c03cbf   3.12
      adds  3e200a9   3.12.
      adds  b5b4407   Get it to compile with "Optimierender Microsoft (R) 
32-Bit C/C++-Compiler, Version 12.00.8168, fuer x86". Based on patch by 
Matthias Waldorf <>
      adds  5eaea80   A change missing in the log.
      adds  847fda7   Set up UNICODE_ENTITIES.
      adds  efb5fdc   Deal with unicode entities.
      adds  6aec595   Copyright 2000
      adds  1149a79   Added unicode entities from HTML4.0.1 spec.
      adds  b081b03   Deal with numification.
      adds  76cfea8   Added uentities.
      adds  34d014a   Only 9 tests.
      adds  e808562   Check for overflow.
      adds  943e236   Better overflow check.
      adds  da08b11   Test overflow detection.
      adds  02b1beb   Avoid failure under unicode.
      adds  4542e34   Don't set UNICODE_ENTITIES if $] > 5.006.
      adds  61cf7ab   3.13
      adds  f1c364c   Prompt for -DUNICODE_ENTITIES
      adds  d7fe027   UNICODE_SUPPORT
      adds  5e68b2a   Don't test if UNICODE_SUPPORT is not enabled.
      adds  516b2c3   3.13
      adds  b3822f8   Fix infinite loop in case the handler triggered by ->eof 
actually called ->eof too.
      adds  0116e6c   Incremented version number: 3.14
      adds  33738f0   Allow declaration parsing to take place for lowercase 
<!doctype ...> and <!entity ...>.  In XML mode uppercase versions are still 
      adds  7bcde03   Release 3.14
      adds  bf5da9a   Escape new hash keys that happens to be perl keywords. 
perl-5.004 make a lot of noise about them otherwise.
      adds  b780e56   $p->get_tag() can now take multiple tag names to match. 
Updated documentation.
      adds  1aa27de   Test with multiple arguments to $p->get_tag
      adds  4865380   Really hide debugging code.
      adds  3276912   UTF8 entities has already been done.
      adds  6d316d6   Require 5.7.0 or better in order to offer "Unicode 
      adds  a23bfa9   Disable GET_CONTEXT for threaded perls because "we want 
      adds  d7b846b   Get out a few more dTHXs by passing context with pTHX_ 
and aTHX_
      adds  f409e8e   Release 3.15.
      adds  8aae91a   Document that HTML::Tagset is a PREREQUISITE.
      adds  c163335   Weaken then libwww-perl PREREQUISITE.
      adds  f211f3a   Deleted note about v2 compatibility.
      adds  41b6859   Use INT2PTR instead of cast directly between pointers and 
      adds  1456e83   Set up INT2PTR unless perl provide it.
      adds  2794b54   Version 3.16 and Copyright -2001.
      adds  28abf42   A few more ideas.
      adds  453f4b5   use strict
      adds  356ca57   unbroken_text now works across ignored tags.
      adds  9190a0c   unbroken text behaviour fixed.
      adds  ff7ebdd   Test one more range.
      adds  7047c06   Fix decoding of unicode entities.
      adds  a5090f1   Copyright 2001.
      adds  d5999e6   Always update size.
      adds  1593a10   Reindent.
      adds  eaab8fb   Added _decode_entities(). Reindent.
      adds  37f6348   Export _decode_entities()
      adds  616ddfd   Added t/entities2.t
      adds  532158b   Reindent.
      adds  bbc60ee   3.16
      adds  c1b1d9d   Forgot about pTHX_ from grow_gap().
      adds  52a1e76   Release 3.17.
      adds  7f82c9a   Removed ANNOUNCEMENT.
      adds  5ab6e80   C++ comment left over from debugging removed.
      adds  1d4012c   Release 3.18.
      adds  aa9f9ac   Use get_hv() as documented in perlapi.
      adds  414c555   Avoid global entity2char.  Patch by Sarathy. Version 3.19
      adds  7863287   Support @attr argspec.
      adds  7d4f6ba   Allow @{....} in argspec to signal flatting of array.
      adds  e84ae7c   Implemented ignore_tags/ignore_elements/report_tags
      adds  713cf01   Documents filter methods.
      adds  5ce391b   Added test for @attr and @{...}
      adds  c9f9f0a   Test new filter methods.
      adds  07c57e4   Renamed report_tags as report_only_tags.
      adds  af1c4af   Release 3.19_90
      adds  b987444   Allow array references passed into $p->ignore_tags.
      adds  d989c2d   Doc update about the effect on offset/length under 
      adds  f7f5274   The netscape_buggy_comment now gives mandatory warning 
about deprecation.
      adds  8d6fc30   Clear ignoring_element on eof.
      adds  91369c6   Simplify ARG_ATTR code a bit.
      adds  3d5b3f1   Simplify by using ignore_tags/ignore_elements.
      adds  8001840   No need for end_h
      adds  15f188b   Minor stylistic issue.
      adds  a0cb8a6   Simplify by using report_only_tags
      adds  6ec3e35   Optimize tag reporting.  Image text should not be array 
      adds  687759c   Doc tweak for report_only_tags()
      adds  84e5806   Version 3.19_91
      adds  a06ce57   User filters.
      adds  ad7bf22   Use filters.
      adds  63c1ebc   Make it possible to pass key/value arguments to the 
constructor. The extra info reported for tokens can be changed with *_args 
parameters. Only decode cdata text.
      adds  c7726a8   Attr needed for textify.
      adds  257bf6c   Introduced HTML::PullParser.
      adds  3d30138   Support parsing from doc => $str
      adds  c4558ba   Test HTML::PullParser
      adds  377e104   Reference HTML::PullParser instead of HTML::TokeParser.
      adds  6e95a5d   A clearer separation between 'doc' and 'file' parsing. 
Improved documentation.
      adds  d2e357c   Release 3.19_92
      adds  3ffda98   s/report_only_tags/report_only/
      adds  dce9a1b   Track unicode support as of perl@9359
      adds  21b90d5   Avoid sv_catpvf(sv, "%c",...) as it wants to upgrade the 
SV to UTF8 far to easily.
      adds  c3d68b8   Doc fix.
      adds  19c63a3   Release 3.19_93
      adds  f042574   Support "tag" argspec.
      adds  2039b52   Document "tag" argspec.
      adds  ac7b505   Prev patch broke lowercasing of tagnames.
      adds  2a806e8   Test "tag" argspec
      adds  604cb66   Example of PullParser usage.
      adds  5d5de2b   Doc update.
      adds  b91eec5   Implemented tracing of line and column numbers.
      adds  36e9b77   Column numbers was off by one.
      adds  ff2ff9b   Print line/column numbers instead.
      adds  8ee0d07   Test col/line.
      adds  4b0b742   Get offsets/line- and column- numbers correct when 
skipping marked section markup.
      adds  7f2741f   Release 3.19_94
      adds  dbb574a   Include description of HTML::PullParser.  Remove 
description of HTML::Filter.
      adds  e96a0bf   Ref hform example in doc.
      adds  ca38192   Release 3.20
      adds  cee9521   Don't promise any utf8 option.
      adds  a6a47a0   Avoid compiler warnings on some some compilers.  The DEC 
C said:   cc: Warning: hparser.h, line 42: Trailing comma found in enumerator 
list.   cc: Warning: hparser.c, line 55: Trailing comma found in enumerator 
list.   cc: Warning: hparser.c, line 612: In this statement, the referenced 
type       of the pointer value "buf" is "unsigned char", which is not 
compatible       with "const char".
      adds  d76633f   Fix memory leak in filters.
      adds  bc468b0   Optimize: Reuse the same SV for filtering by tagnames.
      adds  68d9086   Release 3.21
      adds  07b8496   Decode &apos;
      adds  83f1401   Parse <textarea> in literal mode, but not with is_cdata 
flag set.
      adds  3678082   Release 3.22
      adds  af08565   Moved filter testing code up a bit.  The ignore_elements 
filter did not get out of ignore mode if there was no end-event handler 
      adds  1b724eb   Release 3.23
      adds  24cfeff   Support parsing from code.
      adds  11b84e0   use strict.
      adds  b3a8d35   Added start_document and end_document events (as for SAX).
      adds  df4da1b   Implemented skipped_text argspec.
      adds  c1cb2cb   Fixed interaction between unbroken_text and skipped_text.
      adds  49f9c62   Implemented offset_end argspec.
      adds  0a68ae3   Doc update.  Release 3.24.
      adds  b7cfaba   Test offset_end.
      adds  a712a20   Release 3.24.
      adds  2f26b5f   Fix plaintext parsing.
      adds  42d1c59   <plaintext> fixed.
      adds  589004f   Some more state that was not reset on EOF.
      adds  90c25b6   perl5.004_04 did not have ERRSV
      adds  24da747   croak(0) was not present for 5.6.0
      adds  0f8b950   From: "Stephane Barizien" <> Subject: 
HTML::Entities 3.24 does not build on NT To: Date: Thu, 10 May 
2001 14:16:06 +0200 Organization: Oce-Industries
      adds  052fec7   Release 3.25
      adds  20bb42d   Don't encode \r as suggested by Sean M. Burke.
      adds  e612526   Make 'make clean' also clean up generated *.h files
      adds  f68e790   From: "Timur I. Bakeyev" <> Subject: Small 
bug in HTML::HeadParser To: Gisle Aas <> Date: Sat, 29 Dec 
2001 01:12:01 +0100
      adds  437cd3d   Another example program.
      adds  bb77d5f   Avoid warnings emitted by perl-5.7.3
      adds  643b33e   From: Guy Albertelli II <> Subject: 
attr_encoded case_sensitive patch To: CC: Gerd Kortemeyer 
<> Date: Thu, 7 Mar 2002 16:17:21 -0500 (EST) Reply-To:
      adds  c2cf5bd   Added a few tests.  Resorted.
      adds  eb1c492   More doc updates explaining C<case sensitive>
      adds  af13d66   Calling perl_call_* without G_EVAL always means trouble.
      adds  cb51cfb   Dont get fooled by an emtpy http-equiv
      adds  17a3bc5   We already had a RETHROW macro defined.
      adds  0737ec7   Release 3.26
      adds  d9c8ff6   First revision.
      adds  20e5043   Added eg/hlc to the example programs.
      adds  a4b3e5e   Typo spotted by Marc Lehmann <>.
      adds  fbd3467   Typo.
      adds  7046df1   From: "Sean M. Burke" <> Subject: 
HTML::Entities patch To: Gisle Aas <> Date: Fri, 17 Jan 2003 
23:13:52 -0900
      adds  51fb009   Test encode_entities_numeric
      adds  dc9fe52   Release 3.27
      adds  101835e   Fixed typo.  Spotted by Sean.
      adds  2e23db3   Pass context around instead of using dTHX;  This should 
be faster.
      adds  abfc8d2   Make <!454554> be treated as a comment unless 
strict_comment is enabled.
      adds  8d408c4   Version 3.28.
      adds  1a1db37   avoid Visual C warning.  Patch by
      adds  0b4cc41   Don't use the pfunc by default.  On Intel P4 that saves 
about 3000 bytes on the binary but there was no easy to measure speed 
      adds  56e9b9c   xml_mode implies strict_names also for end tags.
      adds  9e94d5a   64-bit fix from Doug Larrick <>
      adds  4501c22   Documentation patch: <textarea> is also literal mode.
      adds  d0571a0   MSIE compatibility stuff.
      adds  8bc90c1   Need <!-- for strange <script> behaviour to show up.
      adds  e1fd702   Allow crap in end tags as MSIE does.
      adds  bbe02ba   The name token name 'empty' was not good.
      adds  67326b1   Parse <! "<>"> as comment (MSIE compat).
      adds  21ad535   Implement 'strict_end' to control acceptance of junk at 
the end of end tags.
      adds  67ed643   Parse with <--comments> like this if we can't find the 
real thing.
      adds  14132c7   Release 3.29.
      adds  27fda3c   From: Steve Hay <> Subject: 
HTML::Parser 3.29 To: Date: Fri, 15 Aug 2003 14:56:39 
+0100 Organization: Radan Computational Limited
      adds  1071102   Avoid RETVAL warnings as reported by Steve Hay 
<> for MSVC++ 6.0 on WinXP:
      adds  c4745a3   Perl-5.7 should be gone by now.
      adds  e46723f   Better fix for the RETVAL warnings.  Use PPCODE for the 
parse functions.
      adds  5df47e4   Missing unicode support noted.
      adds  b71c6ba   Also PPCODify handler().  Fixed return value for eof().
      adds  9b30572   The assert() apparently needs my_perl so ignore it.
      adds  3a0d3e6   Documentation: Don't reference perl 5.7 any more.
      adds  52468f0   Release 3.30.
      adds  101f4f2   Release 3.31
      adds  4f51d9f   Stale stuff.
      adds  654ea79   If the document ends with "some kind of unterminated 
markup", then we did not clear the buffer.  The result is that this markup 
shows up in the beginning of the next document parsed.
      adds  00e0dd5
      adds  c30282a   Show skipped reason in the official way.
      adds  e81c27b   Updated documentation.
      adds  c7c6552   Include $Id$.
      adds  6c93cb4   Let the get_text() and get_trimmed_text() methods take 
multiple end tags as argument.  Based on patch by <>.
      adds  141426e   Document the </script> inside quotes case as a BUG.
      adds  2dc6fc5   Typo spotted by S Page <>
      adds  9672e9b   Apply patch (partly) from S Page <> 
that adds some comments.
      adds  a452209   Note that parsing of Unicode does not work yet.
      adds  8d422a4   Added dump script.
      adds  983b7f4   Release 3.32.
      adds  8f1d150   Implement get_phrase().
      adds  dad360c   Make get_text() expand most skipped tags to " "
      adds  e9607d8   We don't support 5.004 any more.  For some strange reason 
the new tokeparser tests fail to pass.  If there is a big outcry because of 
this I might redeside.
      adds  6ef0947   Release 3.33
      adds  c1d7882   Fix release date for 3.33
      adds  a5cd728   Avoid core dump when the stack get reallocated during the 
parse() call.
      adds  1fd5d8d   Added testcase for the stack realloc bug to the test 
      adds  24ebf15   Release 3.34
      adds  a7d5dee   No need to redeclare SP.
      adds  73adbc8   From: "Croome, Paul" <> 
Subject: A few patches for the POD in HTML::Parser To: 
"''" <> Date: Fri, 12 Dec 2003 
14:42:50 +0100
      adds  6875636   Release 3.35
      adds  e107397   When an attribute occurs use the first one in 'attr' 
instead of the last one.  This is apparently what MSIE and Mozilla do.
      adds  c5c1b06   Compute hash only once.
      adds  c834052   Release 3.36
      adds  740f633   Silence 'gcc -Wall' - the prev_token might be a real 
      adds  d3083c3   Time to ditch the v2 synopsis.
      adds  26f4905   Improve the handling of surrogate pairs.  Based on patch 
by <>.
      adds  9a612de   Match perl's rules for Unicode non-chars.
      adds  7e3c90a   Avoid temp modification of argspec strings. I don't think 
it really matters, but it is possible to imagine shared readonly SVs between 
threads. Patch contributed by <>.
      adds  0f09533   Must also upgrade chars after the gap.  Otherwise we 
might produce a badly encoded SvUTF8(sv).
      adds  03d6aa9   Release 3.37
      adds  e19ba13   Make closing of <plaintext> configurable. Contributed by 
Alex Kapranoff <>
      adds  255ac5e   Release 3.38
      adds  4b46eee   Typo.
      adds  124ec21   Parse <title> in literal mode.
      adds  ccaf5eb   Updated copyright year.
      adds  f548267   Make the UTF8-ness of strings parsed propagate. Patch by 
John Gardiner Myers <>.
      adds  e543479   Disable Unicode stuff for perl < 5.8.  I still want 
HTML-Parser to be compatible with these.
      adds  0882ace   Get offsets right for Unicode string.
      adds  9d927b3   Removed Unicode noop.
      adds  61a9944   Test Unicode parsing behaviour.
      adds  cd23c2c   Don't consider perl-5.6 Unicode capable.
      adds  76128a1   Release 3.39_90
      adds  7d73154   Usually there is only one <title>.
      adds  49f4e2b   Unicode basically done.
      adds  6f9a5a9   Convert to use
      adds  45bad29   Header is not done if we see the Unicode BOM.
      adds  21b8c01   Unicode is not supported.
      adds  4f8064d   Unicode BOM tests.
      adds  c7e3280   UTF-8 BOM warning only when Unicode is avalable.
      adds  ba81bf6   BOM tests.
      adds  fa469aa   Some behaviour seen in KHTML sources.
      adds  733eb2c   Implement quote behaviour for <script> tags. The 
behaviour is derived from behaviour seen in KHTML sources.
      adds  1039f1c   Test quote behaviour.
      adds  5a3466d   Propagate UTF-8-ness during flushing at eot.
      adds  52f7543   If literal tags are unterminated, flush them out with the 
text that follows and restart parsing.
      adds  7271e9b   Make Unicode BOM warnings optional and document them.
      adds  5a8e89b   This change was supposed to go somewhere else.
      adds  4ac0714   Document that these modules need decoded chars to parse.
      adds  05d9609   Release 3.39_91
      adds  82184fb   Some new MSIE comptibility issues.
      adds  cef2249   MSIE compatibility: Expand unterminated entities in 
'dtext' and expose the _decode_entities() routine.
      adds  f073533   Improve decode_entities() documentation.
      adds  c97d2c1   Tweaks.
      adds  57476e3   Simplify.
      adds  0569139   Test parsing of Unicode from file.
      adds  c35b6a1   Try to describe Unicode issues better.
      adds  359fe43   Added attribute 'utf8_mode'.
      adds  3979c0b   Sort documentation; boolean attributes, argspecs, events.
      adds  fc168e2   Test utf8_mode.
      adds  34bfa36   Fix utf8_mode semantics.  The entities are now decoded as 
      adds  4591a34   Release 3.39_92.
      adds  cfa9027   Simpler HTML link.
      adds  9f9ee31   Trigger UTF8 warning if anything in the first chunk looks 
like hibit UTF8.
      adds  cbb8192   The utf8_mode produce garbage for older perls.
      adds  876c70f   Least expensive tests first.
      adds  3388ec5   Release 3.40.
      adds  3d642c0   Make it work with perl-5.005
      adds  440940c   Release 3.41
      adds  c6d3c8a   Use push_header for all headers added.  Do not want to 
loose any values.  Better to duplicate fields.
      adds  5b2c0fa   Silence warnings from the HP C compiler about char/U8 
      adds  e333540   Typo in r2.26
      adds  36fff43   Avoid sv_catpvn_utf8_upgrade; make us perl-5.8.0 
compatible. Patch by Reed Russell <>.
      adds  88f32c4   perl-5.8.0 does not have utf8::is_utf8.
      adds  6f5dc1d   Release 3.42.
      adds  859a842   Fix test failure on Windows.
      adds  9e8cea7   Forgot to set repl_utf8 flag which might lead to utf8 
corruption. This showed as test failure with native compilers on Windows, HP-UX 
and Solaris.
      adds  6276749   Release 3.43
      adds  da7e4da   Fix the handling of quoted strings.
      adds  d6009eb   Release 3.44.
      adds  1c3b2d0   Fix stack leak. Patch contributed by Gurusamy Sarathy 
<>. From ActiveState p4 change #125001.
      adds  87e8f54   Release 3.45.
      adds  ec1d534   Explain affected code.
      adds  7b8850b   From APEE build log with the HP native C compiler. 
HTML-Parser:3876: Warning 430: "Parser.c", line 604 # The variable 'RETVAL' is 
never initialized. HTML-Parser:3879: Warning 11010: Exact position unknown; 
near ["XS_HTML__Entities__probably_utf8_chunk", line 614]. # ["Parser.c", line 
614:18 XS_HTML__Entities__probably_utf8_chunk] Uninitialized variable 'RETVAL'
      adds  fba0404   Fix typo spotted by Stefan Funke <>.
      adds  aed1a26   From: Norbert Kiesel <> Subject: 
Re: HTML::Parser: how can I reset report_tags to report all tags? To: Gisle Aas 
<> Cc: Date: Sun, 19 Jun 2005 16:36:38 
-0700 Organization: TBD Networks
      adds  73785be   Test pod correctness and fix up missing =back.
      adds  25a4576   use strict;
      adds  fda1b7e   Don't treat 0xA0 as space, since it's not really and XML 
agrees. This also creates problems when parsing UTF-8 which is how it supports 
Unicode (
      adds  6a657d7   Try parsing of \x0420.
      adds  ef63a22   Release 3.46
      adds  e627259   From: Norbert Kiesel <> Subject: 
Re: HTML::Parser: how can I reset report_tags to report all tags? To: Gisle Aas 
<> Date: Tue, 21 Jun 2005 11:57:27 -0700 Organization: TBD 
      adds  6a7ec2a   Make unbroken_text the default for HTML::TokeParser.
      adds  137b1ad   Silence all the diag noise.
      adds  7113f57   Skip blocks needs to be called SKIP for it to work.
      adds  8782b89   perl-5.8.0 is just too buggy for HTML-Parser.
      adds  7161c7c   Faster load time with XSLoader.
      adds  774eda0   Make the source ASCII only.
      adds  31dad60   Better use of Test::More.
      adds  d41828f   An explicit binmode() make this test pass with perl-5.8.0
      adds  5da6eb1   encode &apos by default.
      adds  289196d   Make tests pass for perl-5.6.
      adds  81d8e9c   It seems to work with perl-5.8.0 now.
      adds  f562c5a   Typos.
      adds  9475423   Add empty_element_tag and xml_pic attributes.
      adds  d7ef967   xml_pic has been added
      adds  46fe801   Need to look for '/>' in more places when strict_names 
isn't enabled.
      adds  991e983   Make empty_element_tag default on for HTML::TokeParser.
      adds  d792aac   Documentation tweaks.
      adds  86daece   Add some empty elements tests.
      adds  acf1523   Rename as empty_element_tags (with s)
      adds  74d789d   Release 3.47.
      adds  5782978   Test empty_element_tags/xml_pic.
      adds  dad791d   Fix typo.
      adds  b1fd168   Don't enable empty_element_tags by default.  It breaks 
HTML::Form :(
      adds  26cd626   Adjust token counts now that empty_element_tags is not 
the default.
      adds  fba150f   marked_sections omit first 3 bytes "<!["  from 
      adds  0e6a426   perl 5.6 is required.
      adds  f1ec99b   Release 3.48
      adds  031ac9c   First revision.
      adds  c4065e2   Events could still fire after a handler has signaled eof.
      adds  3ba63e6   marked_sections with text ending in square bracket parsed 
      adds  4b43bac   Release 3.49.
      adds  41bc0e9   Updated copyright year.
      adds  3d7e92e   From: Steve Hay <> Subject: [PATCH] 
Fix code-before-declaration error with VC++ in HTML-Parser-3.49 To: Gisle Aas 
<> Date: Tue, 14 Feb 2006 16:25:16 +0000
      adds  5433201   Release 3.50.
      adds  88ad35b   Typos spotted by
      adds  b519469   Improved MSIE compatibility.  Only the Latin-1 entities 
expand without the trailing semicolon.
      adds  4e99699   First revision.
      adds  e222f07   More tests.
      adds  371d20e   One more ref.
      adds  02ef206   Updated documentation.
      adds  fce3e20   Release 3.51.
      adds  e5cfceb   Typo fixes are also in 3.51.
      adds  e00adee   Bye.
      adds  2609435   Add some results.
      adds  c95d5aa   Link to
      adds  10c3c37   Added HTML-Parser to the result table.
      adds  954fc52   Safari results.
      adds  6a0bd24   Documentation typo fix.
      adds  ac2d477   Make sure 'start_document' is triggered exactly once per 
      adds  91e3ee5   Documentation tweaks.  Recommend empty_element_tags.
      adds  9240d53   Documentation typo fixes.
      adds  d20a679   Release 3.52.
      adds  aff7ddd   ignore_element treated </script> like <script>.
      adds  7bfa2d1   Release 3.53.
      adds  cf7b0dc   Enabling of empty_element_tag interacted badly with 
literal mode. Fixes
      adds  8a2a526   Release 3.54.
      adds  3d41f39   Yaakov Belch was responsible for release 3.53 and 3.54.
      adds  b9b9835   Test that empty_element_tags works for <script/> too.
      adds  a43519a   Consider <!a'b> a comment by itself. Feedback from the 
AntiSpam guys at Sophos.
      adds  eabafaf   From: Gisle Aas <> Subject: Re: 
Autoclose for <script> and <style> in HTML::Parser Newsgroups: 
gmane.comp.lang.perl.modules.lwp Cc: Date: 09 Jun 2006 01:50:00 
      adds  8875e60   Treat <> at end as text.
      adds  59c57ce   Test <!a'b> comments.
      adds  ccfa79b   Release 3.55.
      adds  ada5e9c   Support threads cloning.  Contributed by Bo Lindbergh.
      adds  0c9324a   New test file.
      adds  6efeaa8   Release 3.56.
      adds  6db8378   Restore perl-5.6 compatiblity.
      adds  ffa28c5   New year.
      adds  b7f6dbc   Remove debug printout.
      adds  b8aef93   State Test::More dependency.
      adds  bf85099   Don't require whitespace between declaration tokens. 
      adds  e58f986   Extra plaintext test from Alex Kapranoff 
      adds  912ae95   Alex Kapranoff claims the closing_plaintext behaviour 
only occured in Firefox 1.0.  <>
      adds  ee9b1d4   Implement backquote() attribute as requested by Alex 
Kapranoff. <>
      adds  3fc553b   Start using GIT to track the sources.
      adds  2661287   Patch by CHORNY that provide compatibility with older 
      adds  7e60bae   Recognize the </script> and </style> end tags even if 
      adds  e5e4055   Parse the <iframe> content in literal/CDATA mode.
      adds  fdab46a   Release 3.57
      adds  138d548   Recognize the Unicode BOM in utf8_mode as well [RT#27522]
      adds  8cc4600   Avoid ending up with '/' keys attribute in Link headers.
      adds  da2490b   Suppress "Parsing of undecoded UTF-8 will give garbage" 
warning with attr_encoded [RT#29089]
      adds  32a48ac   Don't hardcode source line numbers [RT#38114]
      adds  d04a2ee   Release 3.58
      adds  6f67ef9   Restore perl-5.6 compatiblity for HTML::HeadParser
      adds  04cc7e1   Tell git to ignore the dist tarballs
      adds  f21f13e   Update for GIT and other tweaks.
      adds  baf34a6   More meta info
      adds  2c2bca3   Release 3.59
      adds  a408fd2   Spelling fixes.
      adds  6fda22b   Test multi-value headers.
      adds  4af036e   Documentation improvements.
      adds  9056415   Do not terminate head parsing on the <object> element 
(added in HTML 4.0).
      adds  06f4603   Add support for HTML 5 <meta charset> and new HEAD 
      adds  ca6ece6   HTML::Parser doesn't compile with perl 5.8.0.
      adds  9946fcf   Short description of the htextsub example
      adds  2b5088d   Suppress warning when encode_entities is called with 
undef [RT#27567]
      adds  a540419   Release 3.60.
      adds  bbe0e91   Avoid crash (referenced pend_text instead of skipped_text)
      adds  b893efb   Test that triggers the crash that Chip fixed
      adds  1347607   Reference HTML::LinkExttor [RT#43164]
      adds  2df45a1   Complete documented list of literal tags
      adds  71cfecd   Release 3.61
      adds  a8d27fb   Avoid "my" variable $p masks earlier declaration warning 
from test
      adds  0423689   HTTP::Header doc typo fix.
      adds  2309028   Do not bother tracking style or script, they're ignored.
      adds  4429d49   Bring HTML 5 head elements up to date with 
      adds  7a85e26   Doc patch: Make it clearer what the return value from 
->parse is
      adds  32851f1   Improve HeadParser performance.
      adds  e002d1d   Update TODO list
      adds  f397f25   Release 3.62
      adds  6e91cf4   Take more care to prepare the char range for 
encode_entities [RT#50170]
      adds  b9aae1e   decode_entities confused by trailing incomplete entity
      adds  ddbac59   Release 3.63
      adds  2d1a720   Convert files to UTF-8
      adds  6acae76   Don't allow decode_entities() to generate illegal Unicode 
      adds  914183b   Copyright 2009
      adds  22b36c2   Remove rendundant (repeated) test
      adds  ea81f5e   Make parse_file() method use 3-arg open [RT#49434]
      adds  d30f3be   Release 3.64
      adds  f3cfa07   Fixed endianness typo [RT#50811]
      adds  f1f22f5   Documentation fixes.
      adds  e7b9431   Eliminate buggy entities_decode_old
      adds  f7f7eb7   Release 3.65
      adds  d53f98d   Fix entity decoding in utf8_mode for the title header
      adds  3786975   Release 3.66
      adds  b575543   bleadperl 2154eca7 breaks HTML::Parser 3.66 [RT#60368]
      adds  d8a6e70   chmod +x [RT#58016]
      adds  7b0848b   Release 3.67
      adds  7b355e3   Declare the encoding of the POD to be utf8
      adds  f126b4e   Release 3.68
      adds  0087ee1   Trim surrounding whitespace from extracted URLs.
      adds  1e1fdec   Documentation fix; encode_utf8 mixup [RT#71151]
      adds  5e4e410   Make it clearer that there are 2 (actually 3) options for 
handing "UTF-8 garbage"
      adds  fff22c7   Github is the official repo
      adds  5978308   fix to TokeParser to correctly handle option configuration
      adds  0a6fa6b   Aesthetic change: remove extra ;
      adds  f38f8f1   Can't be bothered to try to fix the failures that occur 
on perl-5.6
      adds  f11ee5b   Release 3.69
      adds  736d02c   Comment typo fix
      adds  af4a57e   Fix for cross-compiling with Buildroot
      adds  fb77cfa   Fix Issue #3 / RT #84144: 
HTML::Entities::decode_entities() needs to call SV_CHECK_THINKFIRST() before 
checking READONLY flag
      adds  6227fd9   Release 3.70
      adds  a8f8283   Transform ':' in headers to '-' [RT#80524]
      adds  4bc4979   Release 3.71
      adds  15d7713   typo fix
      adds  1a1f83d   typo fixes
      adds  5184435   Merge pull request #6 from dsteinbrunner/patch-1
      adds  0a35724   Merge branch 'master' of
      adds  59d39b5   Silence clang warning
      adds  c689e39   Avoid more clang casting warnings
      adds  00139ae   const+static-ing
      adds  4fe7b98   Remove trailing whitespace
      adds  ac31d36   Ensure entities expand to utf8 sequences under 
'utf8_mode' [RT#99755]
      adds  c50db95   Copyright 2016
      adds  ce42c7b   Release 3.72
      adds  295ddd7   Imported Upstream version 3.72
       new  0091495   Merge tag 'upstream/3.72'
       new  6053043   Update debian/changelog
       new  09b5b96   Refresh patches
       new  89db6b7   Update year of upstream copyright
       new  518bb1c   Update packaging copyright
       new  c5773d2   Releasing libhtml-parser-perl version 3.72-1

The 6 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "adds" were already present in the repository and have only
been added to this reference.

Summary of changes:
 Changes                                       | 19 +++++++++
 META.json                                     |  4 +-
 META.yml                                      |  6 +--
                                               | 14 ++-----
 Parser.xs                                     | 12 +++---
 README                                        |  2 +-
 debian/changelog                              | 10 +++++
 debian/copyright                              |  3 +-
 debian/patches/debian_examples_location.patch |  2 +-
 debian/patches/example_selfdocs.patch         |  4 +-
 hparser.c                                     | 58 ++++++++++++++-------------
 hparser.h                                     |  5 +--
 lib/HTML/                            |  4 +-
 t/unicode.t                                   | 18 ++++++++-
 14 files changed, 101 insertions(+), 60 deletions(-)

Alioth's /usr/local/bin/git-commit-notice on 

