This is an automated email from the git hooks/post-receive script. kanashiro-guest pushed a change to branch master in repository libhtml-parser-perl.
from 2c2921f releasing package libhtml-parser-perl version 3.71-2
adds c9940d5 First revsion.
adds a06b717 Fake-compile regexps using anonymous subs. More documentation.
adds 520b920 Removed trailing whitespace and unexpanded the text (replaced initial space with tab where possible.)
adds 3e56ac9 Fixed copyright message.
adds 4a0c7ac Moved from ../base
adds fac3ed4 Avoid quotes in hash key.
adds aeb6d0b First revision.
adds f61c1ec Added test based on RFC1866
adds 3d86f12 Included additional ISO-8859/1 entities listed in rfc1866 (section 14). More comments.
adds c1abfc7 Typo fix by Bob Dalgleish <bob_dalgle...@develcon.com>
adds ae62354 First version. Posted on the mailing list 1996-07-08.
adds 87a6568 Clear links when calling parse_file().
adds aaadedc Parse <link> attributes in head. Renamed header Base: to Content-Base: to be compatimble with HTTP/1.1.
adds 7f4f938 Slightly better documentation.
adds 0027ad6 Renamed Base: to Content-Base: and Implemented Link:
adds 1fab0b1 First revision.
adds 9f73735 Got Ambiguous use of {links} resolved to {"links"}
adds 49b715e Added support for <embed src="..."> as suggested by Hans de Graaff
adds a1e34b8 Added <frame src="..."> to the things recognized
adds 6e95e54 Added an example to the documentation.
adds b4823db Added test to check that the links method work when there are no links in the parsed document.
adds 09552bf Avoid 'Can't use an undefined value as an ARRAY reference message' when no links are found in the document.
adds 9e095d5 Must escape literal $ in regular expression.
adds d13a8a8 $p->eof instead of $p->parse(undef)
adds c615f75 Support netscape_buggy_comment() and implement the eof() method.
adds 3a961f8 Added two new start() parameters; $attrseq and $origtext.
adds 5472d51 First revision.
adds 41442e4 Allow "_" in attribute names since Netscape really use this in their bookmarks.html
adds 6843a65 Initialize from all <meta> as X-Meta-Foo
adds f39f6db Parser was very confused about "</" when it did not start an end tag.
adds d861a10 $p->links now truncates the list.
adds 4d8587c Added SYNOPSIS to all libraries since perl5.003_97 warns if it is not present.
adds d26df9b Updated the documentation.
adds bfe5f10 Only modify arguments in void context. Requires 5.004
adds 6a90d09 Doc bug spotted by Martijn Koster
adds 20eaee6 Know about <applet code=URL>. Patch from Daniel V Klein <d...@lonewolf.com>
adds c15026e Check for Bill Simpson-Young's problem.
adds bdcb447 Might introduce ";" for things that look like entities but is not. Reported by Bill Simpson-Young <bill.simpson-yo...@cmis.csiro.au>
adds 00def24 Documentation update.
adds 0fa9bae =head2 replaced by =item
adds 671fe46 Reformatting by Martijn.
adds f8b44fc Replaced netscape_buggy_comment() with strict_comment(). Documentation update.
adds 6200d9f Pass original text to end() method. Patch by Brian McCauley <b.a.mccau...@bham.ac.uk>.
adds 0d321af First revision.
adds 6060a3c Added documentation.
adds 1861f51 Fix TableStripper example bug.
adds 37c7810 First revision.
adds a7ac97c Optimized by moving lookup of !$self->{'_strict_comment'} out of the parser loop. I got a 5% speedup by this.
adds 301b665 Document how chuck size influence efficiency. Reduce chunk size in parse_file().
adds aafe0c0 Special case for plain start tags give 2.5% speed up.
adds cdcad86 Use last instead of return to get of the the while-loop in parse().
adds bcace72 Added a BUGS section.
adds da6c9ff Added $VERSION.
adds 619498f use strict;
adds 02f1974 Don't call the text() method with zero length text any more.
adds d15b6e6 First revision.
adds bc54567 Increment version number.
adds 8549602 First revision.
adds 2f017b5 Added Changes.
adds ecbcc0a Added some more real content.
adds 7addd67 New (more interesting) date.
adds 0b06822 First revision.
adds 6c83110 Splitted test based on wheater URI::URL is available or not.
adds 28cca7a Only make the URI::URL module required if a $base URL is given to the constructor.
adds c20ebc1 Make it work even without HTTP::Headers installed. Documentation update.
adds d9ec2ab Provide our own header object implementation. Does not depend on HTTP::Headers any more.
adds 5739159 First revision.
adds 839d89e Make it work better.
adds 4413674 New tests.
adds b119720 Documentation flikking. Increment version number.
adds 8f8cd20 2.15 changes.
adds 5a59b70 Typo.
adds 9e01a15 Tweaks.
adds 99e3b2f Used to be called parser.t
adds 3554904 Replaced with a real test.
adds 3ec15d1 Some more HTML.
adds 13eb5d4 Broke HEX entities ÿ
adds 951ac50 The old t/parser.t is now t/cases.t
adds af954c4 Always clean up tmpfile.
adds 05813f8 Make it release 2.16 instead.
adds 5b32205 Updated manual page.
adds 78b7dd2 Never split words (a sequence of non-space) between two invocations of $self->text.
adds 8596106 2.17
adds a188fd1 parse_file now use smaller chunks.
adds 652da52 Document smaller chunk.
adds 8c132c0 Incremented version number (sub-modules changed).
adds 8436834 Make it better subclass-able by calling $self->_found_link each time a new link is found. The default implementation of _found_link will call a callback or add links to $self->{'links'}.
adds 259d299 Provide a parse_file method that cares about the return value from $self->parse.
adds c986edb Test $p->parse_file method
adds c509339 Documentation fix.
adds 1beb530 2.18 changes.
adds f80c12d Don't leave space and end of chunk when trying to avoid breaking words.
adds 2370229 2.19
adds 2e3c549 First revision.
adds 250f8d8 Added HTML::TokeParser
adds d1a60b6 Much more stuff.
adds 075e0d7 Reference to TokeParser
adds 333610c tokeparser.t
adds 6ad3097 First revision.
adds 63a2742 Added documentation.
adds ecc12a3 2.20
adds 2028576 Added Author address
adds 6822a32 Updated with new manual page. Mention HTML::TokeParser.
adds 84e09d7 More tests.
adds 1855a64 Support reading from plain strings and from globs.
adds 4f9fe5f Netscape comment patch by Peter Orbaek <p...@daimi.au.dk>.
adds cbffd18 2.21
adds 6ccd81d Protect eval from $SIG{__DIE__}
adds 09cf24f 2.22
adds dca87c0 Incremented version number.
adds 6500748 Removed wrong expired address
adds e4a4253 Various spell fixes.
adds b5b2377 Fixed my email address.
adds e2610b8 Documentation update.
adds b8c33a5 New year.
adds 3037ade Incremented version number.
adds dd7036e 2.23.
adds 43e80d9 From: Clinton Wong <clin...@netcom.com> Subject: HTML::LinkExtor patch To: gi...@aas.no Date: Tue, 29 Jun 1999 14:02:31 -0700 (PDT)
adds 40f48a3 Better recognition of GLOBs in parse_file().
adds 5e63027 Added t/parsefile.t
adds 82a8d37 First revision.
adds 8734107 Test parsing of large inline documents too.
adds 652e7e4 More efficient parsing of large inline documents.
adds 9911265 Don't die just because the filename passed to $p->parse_file() can't be opened.
adds fa195ca Document that the scalar passed to the constructor must stay the same during parsing.
adds 605ce2e Get rid of the file in the end.
adds a535c93 Documentation update.
adds 34bb597 Updated mailing list address. Removed formatted HTML::Parser manpage.
adds 8d3f59b Get rid of $Id$ line again.
adds 0751bcc Summarized 2.24
adds 28aa662 Asjustment of parse_file() change description.
adds bce2144 First revision.
adds ea58d53 -Wall
adds d5eb670 End tags are recognized.
adds 4fff0df Recongnize processing instructions.
adds f0f0686 Beginning of declaration and comment matching.
adds 4ed8f5b Parse declarations.
adds b6eac7c Parse start tags too.
adds cdb2901 Push PL_sv_yes
adds 28ece90 More testing.
adds b842c98 Free memory assosiated with tokens arrays for premature and error parsing.
adds 6fe8b7b Bye.
adds e60f1d5 Updated.
adds caf3f9a First revision.
adds b80576d Makefile
adds 98d25c3 Set DISTNAME.
adds b1e3155 First revision.
adds 2f54742 Added some real XS glue.
adds d1e556d Small adjustments.
adds f3009af Real callbacks for text and end tags.
adds 389c799 Added copyright notice.
adds 09390c1 Added rest of callbacks.
adds 0a27464 Set up method callbacks.
adds 20eaae0 strict_comment(). A few small tweaks.
adds 7b4e642 Callbacks now get a reference to the parser object as 1st argument.
adds ce1fbd2 Keep white space together.
adds 7d634f4 Make test compatible with HTML::Parser 3 which have its own DESTROY method.
adds 4e8d89b New parse_file() implementation to keep in sync with HTML::Parser's method.
adds 29bf96c Some tweaks here and there.
adds 1b41f5b Attribute keys are now already lowercased
adds 94578bb Reduction.
adds c3fd0f3 pass_cbdata()
adds e830513 pass_cbdata boolean
adds cdeb9a5 Added typemap.
adds 674bd40 First revision.
adds 33aa362 Added README
adds 7132db8 Also set up processing instructions.
adds ca097df Incremented version number.
adds 2e775fd Implemented strict comments.
adds f0c2ea2 Implemented keep_case option.
adds a6c4ee1 Added accum attribute.
adds e18e983 Fill accum array as various tokens are found.
adds 48983ab Incremented version number again.
adds b815650 Allow ':' in identifiers (isHALNUM).
adds f49fdee Allow ":" in attribute names because it is used by Microsoft.
adds b6ec709 Version 2.25
adds f87a6a4 Don't print filtered any more.
adds c2629e8 Check for $self->{parse_file_stop}
adds 6de9c1a Avoid parse_file() duplication.
adds 41bc120 Summarized 2.25 changes.
adds 2daa27d Minor detail.
adds 152aaf1 First revision.
adds 866b5a8 Look for $self->{parse_file_stop} in $self->parse_file loop.
adds dffb25f Added lib files and t files.
adds 1a37f26 <XMP>...</XMP> support.
adds fc1f387 <xmp> support.
adds e6c340e Increased version number again.
adds 92c4f4a Replaced <xmp> support with the more general literal_mode. This allow us to parse <script>/<style>/<xmp> better.
adds 1036a7d Added TODO list.
adds ee10358 We did not get out of literal mode as we should.
adds 6db0714 Another todo item.
adds a527a14 More todo things.
adds 2f97397 Another break.
adds cc31506 Killed some unneeded conditionals.
adds 2e92ca8 2.99_04 release.
adds 2d12922 2.99_05
adds 6b00300 New release again.
adds 09cb022 Blush!
adds 29becee Incremented version number.
adds 684cb62 Implemented xml_mode.
adds 591ee30 Implemented bool_attr_value
adds 22da862 If no bool_attr_val is set, then it will take the value of the attribute.
adds bf83788 First revision.
adds a489e6e Added Solaris hints to avoid gcc compilator bug.
adds febfe91 Inline decode_entities function.
adds 74fc347 Updated todo.
adds 1f05327 Load HTML::Entities.
adds 62914f6 2.99_06 release.
adds 21f3243 Rely on XS implementation of decode_entities_old.
adds 2fcd66a Integrated HTML-Parser-XS version 2.99_06.
adds 9504886 Version 2.99_07
adds 21e0637 Attribute values entities are now expanded in the start callback.
adds 451071f New bool attribute: decode_text_entities. Implemented access to all boolean parser attributes with a single aliased function.
adds 455addc Call the bool_parser_attr() function strict_comment() in order to avoid an extra version with ix = 0.
adds befc0c1 Got back old README text.
adds ae2f6ab Updated bug section.
adds a5d933c We got problems with ERROR. Trying with FAIL instead:
adds c131626 Tweaks to make it compile with perl5.004_04 too.
adds 224278c Avoid calling SvREFCNT_inc() in void context (mostly).
adds 4481d20 Make a copy of assigned 'bool_attr_val'.
adds 9ce9ba8 Fix serious memory leak. We allocated an SV for text content twice.
adds f4c7cb9 In xml_mode, don't report empty start tags with an extra parameter, but instead append an artificial endtag. This end tag is marked special by having an orig_text argument which is empty.
adds fd092ff Added line number counting as an option.
adds 60abef9 Summarized _07 changes.
adds a5674bb Make it compile on perl5.004_05.
adds 5b0a3e5 Need to push references to PVAVs onto the accum array.
adds b516151 More newRV-fixing when pushing array elements into an array.
adds e574f46 Implemented v2_compat flag.
adds 34b2f00 Reply on $p->v2_compat to set up method callbacks.
adds 3803d69 Implemented by taking advantage of $p->accum.
adds dc524e4 Also filter process instructions.
adds 1bd0175 Moved to URI.pm
adds 599c447 Set up start-callback function instead of relying on method callbacks. Use URI.pm (instead of URI::URL) for absolutizing URIs.
adds 53e8e8d Passing callbacks in ctor did not work (Need to try to set callbacks before trying plain attribute.)
adds e2f2865 Close file to make sure it is not empty..
adds 70bd6e3 Warn if unlink($filename) fails.
adds b13ee56 Close filehandle before trying to unlink it.
adds ff667e6 close files.
adds 1f56240 Better unlink warning.
adds 72674d0 Don't catch exceptions when trying to call ctor key arguments as a method.
adds ecca1f2 Moved comment parsing out of html_parse_decl into its own procedure.
adds 006e0a4 Added a process instruction to the stuff.
adds 2981299 Rely on the complete process instructions to be available is second argument. Without this we would need special stuff in xml_mode.
adds e58fc57 Implemented 'default' handler. All document text is passed to this callback when no other callback have shown any interest. If accum is activated, then default will never be activated.
adds 4037883 Summarized 2.99_08
adds eac7315 Grammar fixes by Michael A. Chase <mch...@ix.netcom.com>
adds 12e3bbe Added binmode() to test since it was done to the $p->parse_file method
adds c957465 Incremented version number to 2.99_09
adds b49fbc9 From: "John Hurst" <jbhu...@ibm.net> Subject: tags with links for LinkExtor To: <lib...@perl.org> Date: Thu, 11 Nov 1999 09:31:06 +1300 Reply-To: "John Hurst" <jbhu...@ibm.net>
adds 49ccf7c close($io) as workaround for perl-close bug.
adds 9da8ced Some minor cleanup.
adds a8c4957 All specific parsing now delegated to parse functions. Simplifies html_parse a bit.
adds fdaa3ad Select parse function by an array lookup instead of a series of if-tests.
adds a82ccc3 First revision.
adds 6ee5aad Set up dependecy for pfunc.h
adds 8aea7b1 Added mkpfunc.
adds fc7c529 Use type 'bool' for boolean attributes in PSTATE
adds ae5005b Added mkhctype.
adds 7da3e98 #include "hctype.h"
adds 67fdba7 First revision.
adds 61698e9 Build "hctype.h"
adds 0c4bada Use hctype-macros to implement strict names.
adds f62c93d Prepare for 2.99_09
adds 5977782 Avoid \z which did not do the right thing for perl5.004
adds 8bcc344 Avoid \z which don't work for perl5.004
adds 51b9b0e Better alpha release summary
adds 30b63d6 2.99_10.
adds 58552a3 Summarized 2.99_10
adds 5e56c6e The old POD is back.
adds 0c8c718 Added documentation note.
adds 2a88688 Parse <!> as an empty comment. Hooks for marked_section implementation.
adds bb412e4 Incomplete marked section support.
adds 56c374c Markde CDATA/RCDATA sections now work.
adds 9d3fe62 Make marked section support deselectable.
adds 6c89d3c Don't leak any $@ messages.
adds 0e10592 Be case insensitive when matching the end tag in literal_mode.
adds f3ad6f0 2.99_11.
adds a808222 Added even more link tags as suggested by Sean M. Burke <sbu...@netadventure.net>.
adds fa33b86 Complete marked section support.
adds 6ae3831 Put magic number into the header of p_state.
adds 4bfb7cf Ask if marked sections should be there.
adds c2a4455 Implemented unbroken_text option.
adds 964dd0a Implemented attr_pos().
adds 7e39aec Gramar changes from Michael A. Chase.
adds 30abe0c Gramar fixes by Michael A. Chase.
adds ead3fc7 Text change.
adds 8e2025c Make attr_pos "work" for boolean attributes too.
adds c43e9cd Report end of previous attribute/tag as first number for attr_pos
adds 914e182 Callbacks are now set up with _cb suffix.
adds 19bf184 For the constructor arguments, we now use _cb as suffix for those that are callbacks.
adds bc88bb9 pass_cbdata renamed to pass_self.
adds 1f3b2ae pass_cbdata renamed as pass_self
adds e50d243 Expanded TODO section.
adds ccaa58a One more optimization to think about.
adds 2b45fef Summarized 2.99_12.
adds ef480ce 2.99_13
adds c0d8b7a Gramar corrections by Michael A. Chase
adds 3aeb8e0 Case insensitive yes.
adds dba106e Documentation patch from Michael.
adds ec9f035 Various documentation updates.
adds 7607ba3 More updates to documentation.
adds 2f06b38 First revision.
adds c0bb160 First revision.
adds 2a702cf Test accum filling.
adds e0b7ed5 Added two new tests.
adds b6f3998 Make it possible to unset callbacks.
adds dea8270 First revision.
adds d846366 HCTYPE_NOT_SPACE_EQ_SLASH_GT 0x40 was not initialized.
adds bcb0bff First revision.
adds 6b5729f Two more tests.
adds c03ec6f Summarize 2.99_13.
adds ecee086 From: "Michael A. Chase" <mch...@ix.netcom.com> Subject: [PATCH]HTML::Parser-XS-2.9913_mac-1 To: "libwww" <lib...@perl.org>, "Gisle Aas" <gi...@aas.no> Date: Wed, 24 Nov 1999 19:38:53 -0800
adds d1483af Some more todo.
adds 62afd18 In perl5.004_05 we can't return PL_sv_undef safely.
adds e2cf2cb Forgot a little detail.
adds 1b7637a Fixes by Michael A. Chase
adds d3abcf6 Documentation update by Michael A. Chase.
adds 6a59e84 One more todo option.
adds 0bb7d1f Incremented version number.
adds 0c41ef0 Prepare for 2.99_14.
adds de0db9e Better warning if undefined document is passed in.
adds a9ea92b First revision.
adds 4af9067 First revision.
adds f6e4066 Renamed as tokenpos.h
adds b45e345 Added another .h file. Made marked section support the default.
adds c643086 First take at normalizing everything to call html_handle(). We still don't call it for E_START.
adds 5865986 Now also html_parse_start() calls html_handle().
adds aa58179 Version 2.99_15
adds 7c8b452 Added handler stuct array to pstate. Replaced $p->callback and $p->accum with $p->handler.
adds fc64935 Basically set up callback loop.
adds f6e2762 Set up all basic arguments.
adds 9592693 Trimmed out various boolean attributes. The ones eliminated are:
adds a49b144 Implemented cdata argspec.
adds a7ca7b2 Updated TODO list.
adds 86317eb Killed all the routines that was replaced by html_handle().
adds d010f68 attrspec_compile()
adds 08c5f14 Direct method calls.
adds 8b4ae25 Added MAC to copyright notice.
adds 5694886 New callback interface.
adds aef7aa3 token1 indentifier in attrspec don't use references as method names.
adds 2dc5af0 Allow handler to be specified as an array of two values too.
adds 6637919 Look for MS_IGNORE in html_handle().
adds 0d58634 New syntax.
adds d143cd9 Move to new syntax.
adds 7865ad2 Better default handlers.
adds 211b79c Took out accum test.
adds 4e13ac2 Fit with new way of doing things.
adds b6d2cac Avoid reporting empty text segments.
adds bbbc0da Set up our own accumulator array.
adds bc3b75a Changed sequence of handler arguments.
adds 2fac1da Reversed order of $p->handler arguments.
adds afeacdb Added tokenpos.h
adds 084f106 2.99_15
adds dd703e1 We did copy from the wrong place.
adds e6203d4 First revision.
adds 931de5e Added largetags.
adds 0a84940 Killed unused $a
adds 15732d3 Support "event" in argspec.
adds 665b220 2.99_16.
adds ae198c2 2.99_16
adds 7265bba Test with ">" after ms.
adds 950516e Documentation update from MAC
adds da6420d MAC patch to support accumulator array in html_handle().
adds 707d15e version => 3 ctor option. Documentation update.
adds d362b0d Artificial end tag should have empty origtext.
adds 53aa8c3 Test that artificial end tag get empty origtext.
adds c7da5e0 api_version.
adds bebb78c api_version => 3
adds 134487f api_version => 3.
adds 34b0f82 Don't ask about marked sections any more.
adds abe38d3 Don't eat newline after "]]>"
adds a20ed1a Fix some obvious memory leaks.
adds ab0b5d4 ]]> dont swallow "\n" any more.
adds daf6a29 2.99_17
adds 2f5c728 "realloc" as parameter name created problems. Fix by Paul Schinder <schin...@pobox.com>
adds 640097d Patch from MAC that makes it into a real test.
adds aa090bb Documentation patch from MAC.
adds a58c7fe Working array dest.
adds 2a28e9a Use internal array-as-handler-destination-support. Patch by MAC.
adds 21efabd Since we are faster we need longer speed test.
adds ace6600 Moved some functions out of Parser.xs
adds c53ff4c Prettifying.
adds 1b6a295 Added copyright
adds c7fbece Dropped html_ prefix.
adds dd2577f Update.
adds 3401fd7 First revision.
adds cdd72a3 Moved stuff out of Parser.xs
adds 2a55a91 More H files.
adds bcfe686 More stuff.
adds 6172d8d 2.99_90
adds ed8bbff Some attrspec renaming.
adds 3285d4e 2.99_90
adds ef5bbcc Minor spellfix.
adds 906754c beta now
adds f7587ee Does not make sense in XS parser world.
adds 48ba894 literal_mode_elem
adds b669e30 Moved literal_mode_elem to hparser.c
adds 26c3953 Remove some commented-out code.
adds 1e684a2 Documentation patch from MAC.
adds 033a5ce Updated it.
adds 9fd5597 Reduce length of speed test.
adds 502b5f1 Initial support for offset.
adds 6423b02 pending_text gone.
adds 6913607 Update.
adds f1e1d6b Added offset. Removed pending_text. Some shuffling of fields in p_state.
adds 5f349c0 Document offset.
adds 382c757 Working "offset" in attrspec.
adds 71bc81a First revision.
adds b29df0d Added offset.
adds ca836a8 Updated.
adds afdf3cd 2.99_91
adds 48bf5ed First revision.
adds e11730e New case.
adds ba145d4 Added t/attrspec.t
adds 50fc9c4 Doc patch from MAC.
adds e3844ed One more.
adds fba8fd2 Typo fix by MAC.
adds 89122be Fix tokens reported in the artificial case. Patch by MAC.
adds c5c532c <a "> core dump.
adds d0f564d First revision.
adds 84dd1c7 Back out some more changes.
adds 322f98a Take out linepos
adds ceea58e For boolean attributes would could get very strange values unless strict_names() were on.
adds 40961cf Bug tokens for artificial tag fixed by MAC.
adds 33f7563 Update.
adds 42e7bbc Language fixes by Michael.
adds 638d271 Documentation update from MAC.
adds d69b9cf Minor layout fixes by MAC.
adds ae0c48c Another DOC patch.
adds fef90e9 Don't make empty token/tokenpos arrays.
adds becb50c Changed behaviour.
adds 2ab9fc0 Renamed token1 as token0
adds e004567 av_extend() token/tokenpos arrays.
adds 725e796 token0
adds 5c0337f For artificial end tag we don't report any tokenpos, but report tokens. Boolean attribute values are reported as 0,0 in tokenpos and in tokens we care about bool_attr_value.
adds ff8fb15 Update from me.
adds 681bcc3 Rename bool_attr_value
adds 2828698 2.99_92
adds 1cbb1f8 Doc patch from MAC. _93.
adds 7a454b2 Renamed attrspec.t as argspec.t
adds f17f7d7 Renamed attrspec as argspec.
adds e16f4bd Introduced enum argspec_opcode.
adds d4bd443 Renamed opcode as argcode and OP_ as ARG_
adds 1ac0ead enum argcode
adds e720730 Nothing much.
adds 192452f First revision.
adds 8d5e3cc Renamed bool_attr_value as boolean_attribute_value
adds 636553f Added eg/hrefsub
adds d5a5321 Added a BUGS section.
adds 077e53f Updated.
adds 5bb60c3 2.99_93
adds 7587748 argspec length
adds ee5d508 _94
adds 5a1340b Documented literal string in argspec.
adds 2b6d8cc Off by one error when reporting literal end token.
adds 9ef50a0 First revison.
adds 0884e1e shift2
adds 8ddfa64 Added htext.
adds 7d4b5b0 First revision.
adds c6764ed Added t/exit-via-next.t
adds 71921da IGNORE.
adds 5268c88 Argspec undef
adds 88e66ad First revision.
adds fc3263c Added eg/hstrip
adds 469d4cf Doc patch from MAC.
adds 0131c48 Typo fixes.
adds e2f4bfa One more attrspec cusin.
adds 15ce8b2 Simplified hrefsub by working right to left. Patch by MAC.
adds b72f733 Protect " inside $new_v
adds 6194d36 Better fail message.
adds 80a4e0e Taken out debug stuff.
adds 349ba76 Renamed cdata_flag as is_cdata
adds 93f2b04 Updated.
adds fa338ea Updated.
adds d2fff03 Added usage string.
adds d783105 Added short description of each file.
adds 1760568 Need a statement after a label. Fix pointed out by Matthew Langford <lang...@eng.auburn.edu>.
adds 52aeebe Some more thoughts.
adds 44c1161 MAC improvement (remove stuff from left)
adds 58ccb20 A generic bug. Don't test for it any more.
adds fd1bbcd t/exit-via-next.t gone
adds c38ea19 if we killed all attributed, kill any extra whitespace too
adds 2ac9baf Some adjustments by MAC.
adds 1b32a9b Fix core dump.
adds a54634a Simplified check_handler()
adds 5b959a8 First revision.
adds c571455 Don't get double refcnt decrement if argspec_compile() or check_handler() croaks.
adds e0eefd8 Remove debugging output.
adds 5bedc48 Allow h->argspec to be NULL in report_event()
adds a226358 Don't allow handler arguments to be grouped as an array reference. This created ambiguty when we used and array as handler reference.
adds 6ed63a3 First revision.
adds ce0c373 Added two more tests.
adds 76cdb14 Yet another update.
adds 137b279 Statement that is not correct any more.
adds b3820e3 Documentation update.
adds 0e47ea8 $self->{parse_file_stop}
adds 5e6934e Documented return value from $p->handler().
adds 655a098 2.99_94
adds 7225188 Doc patch from MAC.
adds 3ab160d Added <�� as test case.
adds 9d6d2dc A little more precision.
adds 59152d1 First revision.
adds 70526c6 Added a comment.
adds 3fad039 Fix core dump reported by Doug MacEachern.
adds 29412aa First revision.
adds 16b2a89 Test netscape_buggy_comment too.
adds 61ee014 Test process too.
adds 579c3e0 carp about netscape_buggy_comment instead of a warning.
adds 16e4908 First revision.
adds 17a8ec0 Note about depreciate state of this module.
adds 6bf2447 Updated.
adds 6055419 Updated again.
adds 34e9a33 2.99_95
adds f0d4f77 Another update.
adds 4fdb87f _hparser.
adds 8a3bb4a Changed name of hash entry to _hparser_xs_state.
adds c0a19f6 Two more sections.
adds 10066b1 First revision.
adds bdecab7 Make \\ reserved in argspec literals so we can use it as escape character later.
adds 63e748f More to go.
adds b868b8f One more change.
adds 864f8fb Allow handlers to call $p->eof to abort parsing.
adds 6474d9f $p->eof in handlers is now supported.
adds dc36cd3 Updates to the examples.
adds 5da9dbb Handler $p->eof
adds 20efd6f First revision.
adds 230f43e Added many new tests.
adds 7a6bdcb Added header.
adds f1d4460 Various documentation and english tweaks from MAC.
adds d39e044 Don't use a Perl-hash for argspec any more. Instead we simply use a static array.
adds 9bca9bb I also decided to take a swing at the IGNORE handler. Any false value (usually '' or 0) in the SV pointed to by h->cb triggers the behavior.
adds 647381c Summarized 2.99_96
adds c498996 Minor tweak.
adds d5907cf Yet another one of those useless tweaks.
adds 424f8fc Simplified.
adds c10d28a Test patch from Michael:
adds 2958990 Final POD tweaks from Michael.
adds f306528 3.00 and some minor doc tweaks.
adds 84a3b09 Added MAC to Copyright messages
adds 81e2527 Avoid calling method callbacks as options.
adds e8a83e8 Killed DISTNAME
adds eeb34ef Make '3.00' a string.
adds 3c93aaa Removed beta blurb.
adds 9d1302e Added ANNOUNCEMENT
adds 41ea75d First revision.
adds 101fa58 After ispell
adds f0245b9 Use "" instead of &ignore. Patch by MAC.
adds ad64c08 One additional paragraph from MAC.
adds 3c26488 After MAC hacking.
adds fa9ce3a 3.00
adds 8e0c00c 3.00 ready.
adds d20c225 Assertion was backwards.
adds a7e2dbd The hash function has probably changed so we need sorting to ensure sequence of attr keys.
adds fe84f54 Use ~-magic to trigger deallocation when IV that points to struct p_state goes away.
adds d0cf59d 3.01
adds 138dee5 Summarized new stuff.
adds 0a6392c Tweaks before 3.01
adds d59e321 Added an "also"
adds 1d13bba Make _hparser_xs_state into a reference to the IV-pointer
adds 4dde476 Adjusted because _hparser_xs_state is now a reference to the IV-pointer.
adds c3c1727 Introduced init(). Filled out DIAGNOSTICS.
adds 28dabf1 Reuse earlier 'Not a reference to a hash'-message.
adds e85582e 3.02
adds b60ed7a Rephrasing.
adds e0cf45b First revision.
adds 99d117c Added comment parsing.
adds 58c064c 2000 copyright.
adds a3537ad Version 3.03 (new year)
adds 5341167 Prepare for 3.03
adds fb73147 We did not get out of comment mode for comments ending with an odd number of "-" before ">". Patch by la mouton <k...@3sheep.com>
adds 3990c0f Try 3 dashes in a row.
adds 857be89 Fixed marked_sections without an s
adds 53d3dcb Back out option checking patch by MAC.
adds 65068bc Kill documentation of init().
adds bfb93c5 Minor doc tweaks by me.
adds 07adcd3 Backed out some of 3.03 patch.
adds de4702e One more thing.
adds 31e4833 Some typos fixed.
adds 0d9fd28 xml_mode should prevent special treatment of <script>, <style>...
adds 4f1936f Fix example. Some more text.
adds cb224e4 Don't enter CDATA mode for some tags in XML mode.
adds b2d95bf Don't enter literal_mode when XML mode is enabled
adds 55e9585 No Literal mode for XML.
adds 6647602 Special CDATA parsing for XML is gone now. Version 3.05
adds 4305211 Moved HTML::Filter to Decpreciated section.
adds ab64ea8 Implemented unbroken_text.
adds dd5c0a8 Did not set is_cdata when we got out of outer level CDATA MS.
adds dd6ff05 Get the offset correct when alternating between CDATA/!CDATA modes.
adds 2777600 Don't initialize handler before we have to. I am still wondering about whether to put unbroken text before or after early return because no handler is there to get anything.
adds b11a63b First revision.
adds c04823d Also try <xmp>...</xmp>
adds a1ab160 Don't keep text unbroken between unreported tags. Offset was wrong for some text.
adds 649629e An extra newline...
adds 6b73622 New test.
adds 0bf18f8 Fix last test.
adds 7f6b398 unbroken text done
adds b3f2862 3.05 soon ready.
adds cfe7289 require 3.00
adds 354e9aa From: James Walden <jam...@ichips.intel.com> Subject: Patches for building with xlc To: lib...@perl.org Date: Fri, 4 Feb 2000 15:34:03 -0800 (PST)
adds d37f32a First revision.
adds 482ee77 First revision.
adds 217273e Fixed warning.
adds a5f160b Avoid some "statement not reached" from picky compilers.
adds ba6a7a3 From: Doug MacEachern <do...@pobox.com> Subject: [PATCH HTML-Parser-3.05] v5.5.670 i686-linux-thread-multi To: lib...@perl.org cc: perl5-port...@perl.org Date: Wed, 1 Mar 2000 13:38:32 -0800 (PST)
adds 82a72ae Version number is now 3.06
adds 5c651b1 3.06.
adds c9f6075 Added eg/htextsub
adds 72cbc1f Typo.
adds 75b749a Fix for 5.004. By avoiding OUTPUT: RETVAL we don't get sv_2mortal() called on &PL_sv_undef in $p->handler.
adds cce43f5 Incremented version number.
adds ac849f0 Copyright 2000.
adds 53ce3bb Only continue with declaration parsing when we find "DOCTYPE" or "ENTITY". Based on patch by la mouton <k...@3sheep.com>.
adds a1ecf83 First revision.
adds 2440612 Added t/declaration.t
adds 644964d 3.07.
adds 0adb4db First revision.
adds 65ce3ca A short comment.
adds 03c6e2a Added hanchor.
adds 5e8fc82 Typo fix.
adds 79f15c4 Fixed typo spotted by Jamie McCarthy <ja...@mccarthy.org>.
adds 6404161 Match typo fix in Parser.pm.
adds cec1dca Avoid access to freed() memory.
adds 0817754 Version number is now 3.08
adds c5b7848 Changes for 3.08
adds 004c3a8 ActiveState.com
adds a2615b0 Document that the $p->parse() argument should not be modified.
adds fe1abfb Added a litle description of what 'token0' is for process and comment events.
adds 561df94 Documentation update as suggested by Paul Makepeace <paul.makepe...@realprogrammers.com>.
adds c84d5e9 3.09
adds 3aeba52 Make a mortal copy of the self argument passed to a handler. Avoid core dump if somebody clobbers the aliased $self argument of a handler.
adds 2af94b1 Another change in 3.09
adds 1b9d96d More mortal copies. SPAGAIN after flush_pending_text()
adds 171cafa 3.10
adds 1447c66 Typo.
adds ae11e8c Get %linkElements from HTML::Tagset.
adds 66efa03 Grab link data from HTML::Tagset
adds f3caa1c 3.11
adds 308566f Rely on HTML::Tagset
adds bdd7c83 Spelling patch from David Dyck <d...@tc.fluke.com>
adds 7e250e6 PREREQ_PM HTML::Tagset.
adds 9c03cbf 3.12
adds 3e200a9 3.12.
adds b5b4407 Get it to compile with "Optimierender Microsoft (R) 32-Bit C/C++-Compiler, Version 12.00.8168, fuer x86". Based on patch by Matthias Waldorf <matthias.wald...@zoom.de>
adds 5eaea80 A change missing in the log.
adds 847fda7 Set up UNICODE_ENTITIES.
adds efb5fdc Deal with unicode entities.
adds 6aec595 Copyright 2000
adds 1149a79 Added unicode entities from HTML4.0.1 spec.
adds b081b03 Deal with numification.
adds 76cfea8 Added uentities.
adds 34d014a Only 9 tests.
adds e808562 Check for overflow.
adds 943e236 Better overflow check.
adds da08b11 Test overflow detection.
adds 02b1beb Avoid failure under unicode.
adds 4542e34 Don't set UNICODE_ENTITIES if $] > 5.006.
adds 61cf7ab 3.13
adds f1c364c Prompt for -DUNICODE_ENTITIES
adds d7fe027 UNICODE_SUPPORT
adds 5e68b2a Don't test if UNICODE_SUPPORT is not enabled.
adds 516b2c3 3.13
adds b3822f8 Fix infinite loop in case the handler triggered by ->eof actually called ->eof too.
adds 0116e6c Incremented version number: 3.14
adds 33738f0 Allow declaration parsing to take place for lowercase <!doctype ...> and <!entity ...>. In XML mode uppercase versions are still required.
adds 7bcde03 Release 3.14
adds bf5da9a Escape new hash keys that happens to be perl keywords. perl-5.004 make a lot of noise about them otherwise.
adds b780e56 $p->get_tag() can now take multiple tag names to match. Updated documentation.
adds 1aa27de Test with multiple arguments to $p->get_tag
adds 4865380 Really hide debugging code.
adds 3276912 UTF8 entities has already been done.
adds 6d316d6 Require 5.7.0 or better in order to offer "Unicode entities".
adds a23bfa9 Disable GET_CONTEXT for threaded perls because "we want efficiency".
adds d7b846b Get out a few more dTHXs by passing context with pTHX_ and aTHX_
adds f409e8e Release 3.15.
adds 8aae91a Document that HTML::Tagset is a PREREQUISITE.
adds c163335 Weaken then libwww-perl PREREQUISITE.
adds f211f3a Deleted note about v2 compatibility.
adds 41b6859 Use INT2PTR instead of cast directly between pointers and IV. adds 1456e83 Set up INT2PTR unless perl provide it. adds 2794b54 Version 3.16 and Copyright -2001. adds 28abf42 A few more ideas. adds 453f4b5 use strict adds 356ca57 unbroken_text now works across ignored tags. adds 9190a0c unbroken text behaviour fixed. adds ff7ebdd Test one more range. adds 7047c06 Fix decoding of unicode entities. adds a5090f1 Copyright 2001. adds d5999e6 Always update size. adds 1593a10 Reindent. adds eaab8fb Added _decode_entities(). Reindent. adds 37f6348 Export _decode_entities() adds 616ddfd Added t/entities2.t adds 532158b Reindent. adds bbc60ee 3.16 adds c1b1d9d Forgot about pTHX_ from grow_gap(). adds 52a1e76 Release 3.17. adds 7f82c9a Removed ANNOUNCEMENT. adds 5ab6e80 C++ comment left over from debugging removed. adds 1d4012c Release 3.18. adds aa9f9ac Use get_hv() as documented in perlapi. adds 414c555 Avoid global entity2char. Patch by Sarathy. Version 3.19 adds 7863287 Support @attr argspec. adds 7d4f6ba Allow @{....} in argspec to signal flatting of array. adds e84ae7c Implemented ignore_tags/ignore_elements/report_tags adds 713cf01 Documents filter methods. adds 5ce391b Added test for @attr and @{...} adds c9f9f0a Test new filter methods. adds 07c57e4 Renamed report_tags as report_only_tags. adds af1c4af Release 3.19_90 adds b987444 Allow array references passed into $p->ignore_tags. adds d989c2d Doc update about the effect on offset/length under unbroken_text adds f7f5274 The netscape_buggy_comment now gives mandatory warning about deprecation. adds 8d6fc30 Clear ignoring_element on eof. adds 91369c6 Simplify ARG_ATTR code a bit. adds 3d5b3f1 Simplify by using ignore_tags/ignore_elements. adds 8001840 No need for end_h adds 15f188b Minor stylistic issue. adds a0cb8a6 Simplify by using report_only_tags adds 6ec3e35 Optimize tag reporting. Image text should not be array ref. 
adds 687759c Doc tweak for report_only_tags() adds 84e5806 Version 3.19_91 adds a06ce57 User filters. adds ad7bf22 Use filters. adds 63c1ebc Make it possible to pass key/value arguments to the constructor. The extra info reported for tokens can be changed with *_args parameters. Only decode cdata text. adds c7726a8 Attr needed for textify. adds 257bf6c Introduced HTML::PullParser. adds 3d30138 Support parsing from doc => $str adds c4558ba Test HTML::PullParser adds 377e104 Reference HTML::PullParser instead of HTML::TokeParser. adds 6e95a5d A clearer separation between 'doc' and 'file' parsing. Improved documentation. adds d2e357c Release 3.19_92 adds 3ffda98 s/report_only_tags/report_only/ adds dce9a1b Track unicode support as of perl@9359 adds 21b90d5 Avoid sv_catpvf(sv, "%c",...) as it wants to upgrade the SV to UTF8 far to easily. adds c3d68b8 Doc fix. adds 19c63a3 Release 3.19_93 adds f042574 Support "tag" argspec. adds 2039b52 Document "tag" argspec. adds ac7b505 Prev patch broke lowercasing of tagnames. adds 2a806e8 Test "tag" argspec adds 604cb66 Example of PullParser usage. adds 5d5de2b Doc update. adds b91eec5 Implemented tracing of line and column numbers. adds 36e9b77 Column numbers was off by one. adds ff2ff9b Print line/column numbers instead. adds 8ee0d07 Test col/line. adds 4b0b742 Get offsets/line- and column- numbers correct when skipping marked section markup. adds 7f2741f Release 3.19_94 adds dbb574a Include description of HTML::PullParser. Remove description of HTML::Filter. adds e96a0bf Ref hform example in doc. adds ca38192 Release 3.20 adds cee9521 Don't promise any utf8 option. adds a6a47a0 Avoid compiler warnings on some some compilers. The DEC C said: cc: Warning: hparser.h, line 42: Trailing comma found in enumerator list. cc: Warning: hparser.c, line 55: Trailing comma found in enumerator list. 
cc: Warning: hparser.c, line 612: In this statement, the referenced type of the pointer value "buf" is "unsigned char", which is not compatible with "const char". adds d76633f Fix memory leak in filters. adds bc468b0 Optimize: Reuse the same SV for filtering by tagnames. adds 68d9086 Release 3.21 adds 07b8496 Decode &apos; adds 83f1401 Parse <textarea> in literal mode, but not with is_cdata flag set. adds 3678082 Release 3.22 adds af08565 Moved filter testing code up a bit. The ignore_elements filter did not get out of ignore mode if there was no end-event handler registered. adds 1b724eb Release 3.23 adds 24cfeff Support parsing from code. adds 11b84e0 use strict. adds b3a8d35 Added start_document and end_document events (as for SAX). adds df4da1b Implemented skipped_text argspec. adds c1cb2cb Fixed interaction between unbroken_text and skipped_text. adds 49f9c62 Implemented offset_end argspec. adds 0a68ae3 Doc update. Release 3.24. adds b7cfaba Test offset_end. adds a712a20 Release 3.24. adds 2f26b5f Fix plaintext parsing. adds 42d1c59 <plaintext> fixed. adds 589004f Some more state that was not reset on EOF. adds 90c25b6 perl5.004_04 did not have ERRSV adds 24da747 croak(0) was not present for 5.6.0 adds 0f8b950 From: "Stephane Barizien" <s...@ocegr.fr> Subject: HTML::Entities 3.24 does not build on NT To: lib...@perl.org Date: Thu, 10 May 2001 14:16:06 +0200 Organization: Oce-Industries adds 052fec7 Release 3.25 adds 20bb42d Don't encode \r as suggested by Sean M. Burke. adds e612526 Make 'make clean' also clean up generated *.h files adds f68e790 From: "Timur I. Bakeyev" <ti...@gnu.org> Subject: Small bug in HTML::HeadParser To: Gisle Aas <gi...@activestate.com> Date: Sat, 29 Dec 2001 01:12:01 +0100 adds 437cd3d Another example program.
adds bb77d5f Avoid warnings emitted by perl-5.7.3 adds 643b33e From: Guy Albertelli II <g...@albertelli.com> Subject: attr_encoded case_sensitive patch To: gi...@activestate.com CC: Gerd Kortemeyer <ko...@lite.msu.edu> Date: Thu, 7 Mar 2002 16:17:21 -0500 (EST) Reply-To: g...@albertelli.com adds c2cf5bd Added a few tests. Resorted. adds eb1c492 More doc updates explaining C<case sensitive> adds af13d66 Calling perl_call_* without G_EVAL always means trouble. adds cb51cfb Dont get fooled by an emtpy http-equiv adds 17a3bc5 We already had a RETHROW macro defined. adds 0737ec7 Release 3.26 adds d9c8ff6 First revision. adds 20e5043 Added eg/hlc to the example programs. adds a4b3e5e Typo spotted by Marc Lehmann <p...@goof.com>. adds fbd3467 Typo. adds 7046df1 From: "Sean M. Burke" <sbu...@cpan.org> Subject: HTML::Entities patch To: Gisle Aas <gi...@aas.no> Date: Fri, 17 Jan 2003 23:13:52 -0900 adds 51fb009 Test encode_entities_numeric adds dc9fe52 Release 3.27 adds 101835e Fixed typo. Spotted by Sean. adds 2e23db3 Pass context around instead of using dTHX; This should be faster. adds abfc8d2 Make <!454554> be treated as a comment unless strict_comment is enabled. adds 8d408c4 Version 3.28. adds 1a1db37 avoid Visual C warning. Patch by g...@activestate.com. adds 0b4cc41 Don't use the pfunc by default. On Intel P4 that saves about 3000 bytes on the binary but there was no easy to measure speed difference. adds 56e9b9c xml_mode implies strict_names also for end tags. adds 9e94d5a 64-bit fix from Doug Larrick <d...@ties.org> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=195500 adds 4501c22 Documentation patch: <textarea> is also literal mode. adds d0571a0 MSIE compatibility stuff. adds 8bc90c1 Need <!-- for strange <script> behaviour to show up. adds e1fd702 Allow crap in end tags as MSIE does. adds bbe02ba The name token name 'empty' was not good. adds 67326b1 Parse <! "<>"> as comment (MSIE compat). 
adds 21ad535 Implement 'strict_end' to control acceptance of junk at the end of end tags. adds 67ed643 Parse with <--comments> like this if we can't find the real thing. adds 14132c7 Release 3.29. adds 27fda3c From: Steve Hay <steve....@uk.radan.com> Subject: HTML::Parser 3.29 To: gi...@activestate.com Date: Fri, 15 Aug 2003 14:56:39 +0100 Organization: Radan Computational Limited adds 1071102 Avoid RETVAL warnings as reported by Steve Hay <steve....@uk.radan.com> for MSVC++ 6.0 on WinXP: adds c4745a3 Perl-5.7 should be gone by now. adds e46723f Better fix for the RETVAL warnings. Use PPCODE for the parse functions. adds 5df47e4 Missing unicode support noted. adds b71c6ba Also PPCODify handler(). Fixed return value for eof(). adds 9b30572 The assert() apparently needs my_perl so ignore it. adds 3a0d3e6 Documentation: Don't reference perl 5.7 any more. adds 52468f0 Release 3.30. adds 101f4f2 Release 3.31 adds 4f51d9f Stale stuff. adds 654ea79 If the document ends with "some kind of unterminated markup", then we did not clear the buffer. The result is that this markup shows up in the beginning of the next document parsed. adds 00e0dd5 http://rt.cpan.org/Ticket/Display.html?id=3954 adds c30282a Show skipped reason in the official way. adds e81c27b Updated documentation. adds c7c6552 Include $Id$. adds 6c93cb4 Let the get_text() and get_trimmed_text() methods take multiple end tags as argument. Based on patch by <siegm...@tinbergen.nl>. adds 141426e Document the </script> inside quotes case as a BUG. adds 2dc6fc5 Typo spotted by S Page <sp...@macromedia.com> adds 9672e9b Apply patch (partly) from S Page <sp...@macromedia.com> that adds some comments. adds a452209 Note that parsing of Unicode does not work yet. adds 8d422a4 Added dump script. adds 983b7f4 Release 3.32. adds 8f1d150 Implement get_phrase(). adds dad360c Make get_text() expand most skipped tags to " " adds e9607d8 We don't support 5.004 any more. 
For some strange reason the new tokeparser tests fail to pass. If there is a big outcry because of this I might redeside. adds 6ef0947 Release 3.33 adds c1d7882 Fix release date for 3.33 adds a5cd728 Avoid core dump when the stack get reallocated during the parse() call. adds 1fd5d8d Added testcase for the stack realloc bug to the test suite. http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=217616 adds 24ebf15 Release 3.34 adds a7d5dee No need to redeclare SP. adds 73adbc8 From: "Croome, Paul" <paul.cro...@softwareag.com> Subject: A few patches for the POD in HTML::Parser To: "'gi...@activestate.com'" <gi...@activestate.com> Date: Fri, 12 Dec 2003 14:42:50 +0100 adds 6875636 Release 3.35 adds e107397 When an attribute occurs use the first one in 'attr' instead of the last one. This is apparently what MSIE and Mozilla do. adds c5c1b06 Compute hash only once. adds c834052 Release 3.36 adds 740f633 Silence 'gcc -Wall' - the prev_token might be a real issue. adds d3083c3 Time to ditch the v2 synopsis. adds 26f4905 Improve the handling of surrogate pairs. Based on patch by <jgmy...@proofpoint.com>. http://rt.cpan.org/Ticket/Display.html?id=7785 adds 9a612de Match perl's rules for Unicode non-chars. adds 7e3c90a Avoid temp modification of argspec strings. I don't think it really matters, but it is possible to imagine shared readonly SVs between threads. Patch contributed by <jgmy...@proofpoint.com>. http://rt.cpan.org/Ticket/Display.html?id=7786 adds 0f09533 Must also upgrade chars after the gap. Otherwise we might produce a badly encoded SvUTF8(sv). adds 03d6aa9 Release 3.37 adds e19ba13 Make closing of <plaintext> configurable. Contributed by Alex Kapranoff <a...@kapranoff.ru> https://rt.cpan.org/Ticket/Display.html?id=8362 adds 255ac5e Release 3.38 adds 4b46eee Typo. https://rt.cpan.org/Ticket/Display.html?id=8432 adds 124ec21 Parse <title> in literal mode. adds ccaf5eb Updated copyright year. adds f548267 Make the UTF8-ness of strings parsed propagate. 
Patch by John Gardiner Myers <jgmy...@proofpoint.com>. https://rt.cpan.org/Ticket/Display.html?id=7014 adds e543479 Disable Unicode stuff for perl < 5.8. I still want HTML-Parser to be compatible with these. adds 0882ace Get offsets right for Unicode string. adds 9d927b3 Removed Unicode noop. adds 61a9944 Test Unicode parsing behaviour. adds cd23c2c Don't consider perl-5.6 Unicode capable. adds 76128a1 Release 3.39_90 adds 7d73154 Usually there is only one <title>. adds 49f4e2b Unicode basically done. adds 6f9a5a9 Convert to use Test.pm adds 45bad29 Header is not done if we see the Unicode BOM. adds 21b8c01 Unicode is not supported. adds 4f8064d Unicode BOM tests. adds c7e3280 UTF-8 BOM warning only when Unicode is avalable. adds ba81bf6 BOM tests. adds fa469aa Some behaviour seen in KHTML sources. adds 733eb2c Implement quote behaviour for <script> tags. The behaviour is derived from behaviour seen in KHTML sources. adds 1039f1c Test quote behaviour. adds 5a3466d Propagate UTF-8-ness during flushing at eot. adds 52f7543 If literal tags are unterminated, flush them out with the text that follows and restart parsing. adds 7271e9b Make Unicode BOM warnings optional and document them. adds 5a8e89b This change was supposed to go somewhere else. adds 4ac0714 Document that these modules need decoded chars to parse. adds 05d9609 Release 3.39_91 adds 82184fb Some new MSIE comptibility issues. adds cef2249 MSIE compatibility: Expand unterminated entities in 'dtext' and expose the _decode_entities() routine. adds f073533 Improve decode_entities() documentation. adds c97d2c1 Tweaks. adds 57476e3 Simplify. adds 0569139 Test parsing of Unicode from file. adds c35b6a1 Try to describe Unicode issues better. adds 359fe43 Added attribute 'utf8_mode'. adds 3979c0b Sort documentation; boolean attributes, argspecs, events. adds fc168e2 Test utf8_mode. adds 34bfa36 Fix utf8_mode semantics. The entities are now decoded as UTF-8. adds 4591a34 Release 3.39_92. 
adds cfa9027 Simpler HTML link. adds 9f9ee31 Trigger UTF8 warning if anything in the first chunk looks like hibit UTF8. adds cbb8192 The utf8_mode produce garbage for older perls. adds 876c70f Least expensive tests first. adds 3388ec5 Release 3.40. adds 3d642c0 Make it work with perl-5.005 adds 440940c Release 3.41 adds c6d3c8a Use push_header for all headers added. Do not want to loose any values. Better to duplicate fields. adds 5b2c0fa Silence warnings from the HP C compiler about char/U8 mismatches. adds e333540 Typo in r2.26 adds 36fff43 Avoid sv_catpvn_utf8_upgrade; make us perl-5.8.0 compatible. Patch by Reed Russell <russell.r...@acxiom.com>. adds 88f32c4 perl-5.8.0 does not have utf8::is_utf8. adds 6f5dc1d Release 3.42. adds 859a842 Fix test failure on Windows. adds 9e8cea7 Forgot to set repl_utf8 flag which might lead to utf8 corruption. This showed as test failure with native compilers on Windows, HP-UX and Solaris. adds 6276749 Release 3.43 adds da7e4da Fix the handling of quoted strings. adds d6009eb Release 3.44. adds 1c3b2d0 Fix stack leak. Patch contributed by Gurusamy Sarathy <g...@sophos.com>. From ActiveState p4 change #125001. adds 87e8f54 Release 3.45. adds ec1d534 Explain affected code. adds 7b8850b From APEE build log with the HP native C compiler. HTML-Parser:3876: Warning 430: "Parser.c", line 604 # The variable 'RETVAL' is never initialized. HTML-Parser:3879: Warning 11010: Exact position unknown; near ["XS_HTML__Entities__probably_utf8_chunk", line 614]. # ["Parser.c", line 614:18 XS_HTML__Entities__probably_utf8_chunk] Uninitialized variable 'RETVAL' adds fba0404 Fix typo spotted by Stefan Funke <bu...@adm.arcor.net>. adds aed1a26 From: Norbert Kiesel <nkie...@tbdnetworks.com> Subject: Re: HTML::Parser: how can I reset report_tags to report all tags? To: Gisle Aas <gi...@activestate.com> Cc: lib...@perl.org Date: Sun, 19 Jun 2005 16:36:38 -0700 Organization: TBD Networks adds 73785be Test pod correctness and fix up missing =back. 
adds 25a4576 use strict; adds fda1b7e Don't treat 0xA0 as space, since it's not really and XML agrees. This also creates problems when parsing UTF-8 which is how it supports Unicode (https://rt.cpan.org/Ticket/Display.html?id=15068). adds 6a657d7 Try parsing of \x0420. https://rt.cpan.org/Ticket/Display.html?id=15068 adds ef63a22 Release 3.46 adds e627259 From: Norbert Kiesel <nkie...@tbdnetworks.com> Subject: Re: HTML::Parser: how can I reset report_tags to report all tags? To: Gisle Aas <gi...@activestate.com> Date: Tue, 21 Jun 2005 11:57:27 -0700 Organization: TBD Networks adds 6a7ec2a Make unbroken_text the default for HTML::TokeParser. adds 137b1ad Silence all the diag noise. adds 7113f57 Skip blocks needs to be called SKIP for it to work. adds 8782b89 perl-5.8.0 is just too buggy for HTML-Parser. adds 7161c7c Faster load time with XSLoader. https://rt.cpan.org/Ticket/Display.html?id=13409. adds 774eda0 Make the source ASCII only. https://rt.cpan.org/Ticket/Display.html?id=11380 adds 31dad60 Better use of Test::More. adds d41828f An explicit binmode() make this test pass with perl-5.8.0 adds 5da6eb1 encode &apos by default. adds 289196d Make tests pass for perl-5.6. adds 81d8e9c It seems to work with perl-5.8.0 now. adds f562c5a Typos. adds 9475423 Add empty_element_tag and xml_pic attributes. adds d7ef967 xml_pic has been added adds 46fe801 Need to look for '/>' in more places when strict_names isn't enabled. adds 991e983 Make empty_element_tag default on for HTML::TokeParser. adds d792aac Documentation tweaks. adds 86daece Add some empty elements tests. adds acf1523 Rename as empty_element_tags (with s) adds 74d789d Release 3.47. adds 5782978 Test empty_element_tags/xml_pic. adds dad791d Fix typo. adds b1fd168 Don't enable empty_element_tags by default. It breaks HTML::Form :( https://rt.cpan.org/Ticket/Display.html?id=16164 adds 26cd626 Adjust token counts now that empty_element_tags is not the default. 
adds fba150f marked_sections omit first 3 bytes "<![" from "skipped_text" https://rt.cpan.org/Ticket/Display.html?id=16207 adds 0e6a426 perl 5.6 is required. adds f1ec99b Release 3.48 adds 031ac9c First revision. adds c4065e2 Events could still fire after a handler has signaled eof. adds 3ba63e6 marked_sections with text ending in square bracket parsed wrong http://rt.cpan.org/Ticket/Display.html?id=16749 adds 4b43bac Release 3.49. adds 41bc0e9 Updated copyright year. adds 3d7e92e From: Steve Hay <steve....@uk.radan.com> Subject: [PATCH] Fix code-before-declaration error with VC++ in HTML-Parser-3.49 To: Gisle Aas <gi...@activestate.com> Date: Tue, 14 Feb 2006 16:25:16 +0000 adds 5433201 Release 3.50. adds 88ad35b Typos spotted by will...@knowmad.com. http://rt.cpan.org/Ticket/Display.html?id=18062 adds b519469 Improved MSIE compatibility. Only the Latin-1 entities expand without the trailing semicolon. adds 4e99699 First revision. adds e222f07 More tests. adds 371d20e One more ref. adds 02ef206 Updated documentation. adds fce3e20 Release 3.51. adds e5cfceb Typo fixes are also in 3.51. adds e00adee Bye. adds 2609435 Add some results. adds c95d5aa Link to search.cpan.org. adds 10c3c37 Added HTML-Parser to the result table. adds 954fc52 Safari results. adds 6a0bd24 Documentation typo fix. adds ac2d477 Make sure 'start_document' is triggered exactly once per document. adds 91e3ee5 Documentation tweaks. Recommend empty_element_tags. adds 9240d53 Documentation typo fixes. adds d20a679 Release 3.52. adds aff7ddd ignore_element treated </script> like <script>. http://rt.cpan.org/Ticket/Display.html?id=18936 adds 7bfa2d1 Release 3.53. adds cf7b0dc Enabling of empty_element_tag interacted badly with literal mode. Fixes http://rt.cpan.org//Ticket/Display.html?id=18965 adds 8a2a526 Release 3.54. adds 3d41f39 Yaakov Belch was responsible for release 3.53 and 3.54. adds b9b9835 Test that empty_element_tags works for <script/> too. 
adds a43519a Consider <!a'b> a comment by itself. Feedback from the AntiSpam guys at Sophos. adds eabafaf From: Gisle Aas <gi...@activestate.com> Subject: Re: Autoclose for <script> and <style> in HTML::Parser Newsgroups: gmane.comp.lang.perl.modules.lwp Cc: lib...@perl.org Date: 09 Jun 2006 01:50:00 -0700 adds 8875e60 Treat <> at end as text. adds 59c57ce Test <!a'b> comments. adds ccfa79b Release 3.55. adds ada5e9c Support threads cloning. Contributed by Bo Lindbergh. adds 0c9324a New test file. adds 6efeaa8 Release 3.56. adds 6db8378 Restore perl-5.6 compatiblity. adds ffa28c5 New year. adds b7f6dbc Remove debug printout. adds b8aef93 State Test::More dependency. https://rt.cpan.org/Ticket/Display.html?id=21387. adds bf85099 Don't require whitespace between declaration tokens. <http://rt.cpan.org/Ticket/Display.html?id=20864> adds e58f986 Extra plaintext test from Alex Kapranoff <ka...@rambler-co.ru>. adds 912ae95 Alex Kapranoff claims the closing_plaintext behaviour only occured in Firefox 1.0. <20070206143237.ga34...@capella.park.rambler.ru> adds ee9b1d4 Implement backquote() attribute as requested by Alex Kapranoff. <20070206145513.ga34...@capella.park.rambler.ru> adds 3fc553b Start using GIT to track the sources. adds 2661287 Patch by CHORNY that provide compatibility with older perls. adds 7e60bae Recognize the </script> and </style> end tags even if quoted. adds e5e4055 Parse the <iframe> content in literal/CDATA mode. adds fdab46a Release 3.57 adds 138d548 Recognize the Unicode BOM in utf8_mode as well [RT#27522] adds 8cc4600 Avoid ending up with '/' keys attribute in Link headers. adds da2490b Suppress "Parsing of undecoded UTF-8 will give garbage" warning with attr_encoded [RT#29089] adds 32a48ac Don't hardcode source line numbers [RT#38114] adds d04a2ee Release 3.58 adds 6f67ef9 Restore perl-5.6 compatiblity for HTML::HeadParser adds 04cc7e1 Tell git to ignore the dist tarballs adds f21f13e Update for GIT and other tweaks. 
adds baf34a6 More meta info adds 2c2bca3 Release 3.59 adds a408fd2 Spelling fixes. adds 6fda22b Test multi-value headers. adds 4af036e Documentation improvements. adds 9056415 Do not terminate head parsing on the <object> element (added in HTML 4.0). adds 06f4603 Add support for HTML 5 <meta charset> and new HEAD elements. adds ca6ece6 HTML::Parser doesn't compile with perl 5.8.0. adds 9946fcf Short description of the htextsub example adds 2b5088d Suppress warning when encode_entities is called with undef [RT#27567] adds a540419 Release 3.60. adds bbe0e91 Avoid crash (referenced pend_text instead of skipped_text) adds b893efb Test that triggers the crash that Chip fixed adds 1347607 Reference HTML::LinkExttor [RT#43164] adds 2df45a1 Complete documented list of literal tags adds 71cfecd Release 3.61 adds a8d27fb Avoid "my" variable $p masks earlier declaration warning from test adds 0423689 HTTP::Header doc typo fix. adds 2309028 Do not bother tracking style or script, they're ignored. adds 4429d49 Bring HTML 5 head elements up to date with WD-html5-20090423. adds 7a85e26 Doc patch: Make it clearer what the return value from ->parse is adds 32851f1 Improve HeadParser performance. adds e002d1d Update TODO list adds f397f25 Release 3.62 adds 6e91cf4 Take more care to prepare the char range for encode_entities [RT#50170] adds b9aae1e decode_entities confused by trailing incomplete entity adds ddbac59 Release 3.63 adds 2d1a720 Convert files to UTF-8 adds 6acae76 Don't allow decode_entities() to generate illegal Unicode chars adds 914183b Copyright 2009 adds 22b36c2 Remove rendundant (repeated) test adds ea81f5e Make parse_file() method use 3-arg open [RT#49434] adds d30f3be Release 3.64 adds f3cfa07 Fixed endianness typo [RT#50811] adds f1f22f5 Documentation fixes. 
adds e7b9431 Eliminate buggy entities_decode_old adds f7f7eb7 Release 3.65 adds d53f98d Fix entity decoding in utf8_mode for the title header adds 3786975 Release 3.66 adds b575543 bleadperl 2154eca7 breaks HTML::Parser 3.66 [RT#60368] adds d8a6e70 chmod +x [RT#58016] adds 7b0848b Release 3.67 adds 7b355e3 Declare the encoding of the POD to be utf8 adds f126b4e Release 3.68 adds 0087ee1 Trim surrounding whitespace from extracted URLs. adds 1e1fdec Documentation fix; encode_utf8 mixup [RT#71151] adds 5e4e410 Make it clearer that there are 2 (actually 3) options for handing "UTF-8 garbage" adds fff22c7 Github is the official repo adds 5978308 fix to TokeParser to correctly handle option configuration adds 0a6fa6b Aesthetic change: remove extra ; adds f38f8f1 Can't be bothered to try to fix the failures that occur on perl-5.6 adds f11ee5b Release 3.69 adds 736d02c Comment typo fix adds af4a57e Fix for cross-compiling with Buildroot adds fb77cfa Fix Issue #3 / RT #84144: HTML::Entities::decode_entities() needs to call SV_CHECK_THINKFIRST() before checking READONLY flag adds 6227fd9 Release 3.70 adds a8f8283 Transform ':' in headers to '-' [RT#80524] adds 4bc4979 Release 3.71 adds 15d7713 typo fix adds 1a1f83d typo fixes adds 5184435 Merge pull request #6 from dsteinbrunner/patch-1 adds 0a35724 Merge branch 'master' of github.com:gisle/html-parser adds 59d39b5 Silence clang warning adds c689e39 Avoid more clang casting warnings adds 00139ae const+static-ing adds 4fe7b98 Remove trailing whitespace adds ac31d36 Ensure entities expand to utf8 sequences under 'utf8_mode' [RT#99755] adds c50db95 Copyright 2016 adds ce42c7b Release 3.72 adds 295ddd7 Imported Upstream version 3.72 new 0091495 Merge tag 'upstream/3.72' new 6053043 Update debian/changelog new 09b5b96 Refresh patches new 89db6b7 Update year of upstream copyright new 518bb1c Update packaging copyright new c5773d2 Releasing libhtml-parser-perl version 3.72-1 The 6 revisions listed above as "new" are entirely new to 
this repository and will be described in separate emails. The revisions listed as "adds" were already present in the repository and have only been added to this reference.

Summary of changes:
 Changes                                       | 19 +++++++++
 META.json                                     |  4 +-
 META.yml                                      |  6 +--
 Parser.pm                                     | 14 ++-----
 Parser.xs                                     | 12 +++---
 README                                        |  2 +-
 debian/changelog                              | 10 +++++
 debian/copyright                              |  3 +-
 debian/patches/debian_examples_location.patch |  2 +-
 debian/patches/example_selfdocs.patch         |  4 +-
 hparser.c                                     | 58 ++++++++++++++-------------
 hparser.h                                     |  5 +--
 lib/HTML/Filter.pm                            |  4 +-
 t/unicode.t                                   | 18 ++++++++-
 14 files changed, 101 insertions(+), 60 deletions(-)

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/pkg-perl/packages/libhtml-parser-perl.git

_______________________________________________
Pkg-perl-cvs-commits mailing list
Pkg-perl-cvs-commits@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-perl-cvs-commits