Re: HTML::Parser bug

2005-03-21 Thread Reinier Post
On Sun, Mar 20, 2005 at 01:51:25PM -0800, Bill Moseley wrote:
> On Sun, Mar 20, 2005 at 06:02:26PM +0300, [EMAIL PROTECTED] wrote:
> > Hello libwww,
> >
> > using it to parse html-forms etc...
> > noticed, that it recognizes strange comment
> > like <!--> as starting of the comment,
> > not like the whole empty comment, as IE.
>
> Doesn't seem like that's a valid comment.
>
> http://www.w3.org/TR/WD-html40-970917/intro/sgmltut.html#h-3.1.4

Well, the HTML::Parser perldoc says:

  HTML::Parser is not a generic SGML parser. We have tried to make it
  able to deal with the HTML that is actually out there, and it normally
  parses as closely as possible to the way the popular web browsers do it
  instead of strictly following one of the many HTML specifications from
  W3C. Where there is disagreement, there is often an option that you can
  enable to get the official behaviour.

But do all versions of IE parse this the same way?
What do other popular user agents do?

-- 
Reinier


Re: HTML::Parser bug

2005-03-21 Thread Bill Moseley
On Mon, Mar 21, 2005 at 06:51:42PM +0100, Reinier Post wrote:
> On Sun, Mar 20, 2005 at 01:51:25PM -0800, Bill Moseley wrote:
> > On Sun, Mar 20, 2005 at 06:02:26PM +0300, [EMAIL PROTECTED] wrote:
> > > Hello libwww,
> > >
> > > using it to parse html-forms etc...
> > > noticed, that it recognizes strange comment
> > > like <!--> as starting of the comment,
> > > not like the whole empty comment, as IE.
> >
> > Doesn't seem like that's a valid comment.
> >
> > http://www.w3.org/TR/WD-html40-970917/intro/sgmltut.html#h-3.1.4
>
> Well, the HTML::Parser perldoc says:
>
>   HTML::Parser is not a generic SGML parser. We have tried to make it
>   able to deal with the HTML that is actually out there, and it normally
>   parses as closely as possible to the way the popular web browsers do it
>   instead of strictly following one of the many HTML specifications from
>   W3C. Where there is disagreement, there is often an option that you can
>   enable to get the official behaviour.

Hard to imagine handling every possibility as an option.

I would have thought an empty comment would be at a minimum:

  <!-- -->

or maybe 

  <!>

although I'm still trying to grasp the concept of an empty comment.


-- 
Bill Moseley
[EMAIL PROTECTED]



RE: HTML::Parser bug

2005-03-21 Thread Cahoon, Forrest
Although not identical to your short comment, Microsoft intentionally
uses similar comments like 

<!--[if gte mso 9]> (something read by MSIE 5+ but correctly considered
to be a comment by other browsers) <![endif]-->

See http://office.microsoft.com/en-us/assistance/HA010549981033.aspx for
more info.

Forrest Cahoon
not speaking for merrill corporation



Re: HTML::Parser bug

2005-03-20 Thread Andy Lester
> like <!--> as starting of the comment,
> not like the whole empty comment, as IE.

Lots of browsers allow crap that modules don't.
--
Andy Lester => [EMAIL PROTECTED] => www.petdance.com => AIM:petdance


Re: HTML::Parser bug?

2001-08-30 Thread Randal L. Schwartz

>>>>> "Pedro" == Pedro ProençA <[EMAIL PROTECTED]> writes:

Pedro> Hi all,
Pedro> When I pass the following string to HTML::Parser::parse():

Pedro> String containing entities to be replaced, for instance uarr2;a;

Pedro> this is what I get in my text handler:

Pedro> String containing entities to be replaced, for instance

Pedro> I am using Perl 5.6.0 on Mandrake Linux 8.0 (kernel 2.4.3-20mdk) and
Pedro> the latest HTML::Parser version (3.25).
Pedro> Is this a known problem?  Is there any workaround?

$ perl
use HTML::Parser;
my @a;
my $p = HTML::Parser->new( handlers => { text => [\@a, "text"] });
$p->parse("String containing entities to be replaced, for instance uarr2;a");
$p->eof;

print map "[$_->[0]]", @a;
^D
[String containing entities to be replaced, for instance][ uarr2;a]
$ 

Looks fine to me.  Try that example.  Notice that it pulls it in two pieces.
That's expected unless you also set $p->unbroken_text(1) before parsing.
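
A minimal sketch of the unbroken_text setting, assuming HTML::Parser 3.x is
installed. The sample text and the @chunks name are made up for illustration;
the input is fed in two parse() calls to mimic text arriving in pieces:

```perl
use strict;
use warnings;
use HTML::Parser;

my @chunks;

# With an array ref as the handler, each text event pushes [ $text ] onto @chunks.
my $p = HTML::Parser->new(
    api_version => 3,
    text_h      => [\@chunks, "text"],
);

# Ask the parser to hold text back until it can report it as one event.
$p->unbroken_text(1);

# Hypothetical input, deliberately split across two parse() calls.
$p->parse("String containing entities ");
$p->parse("to be replaced");
$p->eof;

print scalar(@chunks), " text event(s)\n";
print map("[$_->[0]]", @chunks), "\n";
```

Without the unbroken_text call the same text may be reported in several
pieces, so a handler that only keeps the latest chunk can appear to lose data.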

print "Just another Perl hacker,";
-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<[EMAIL PROTECTED]> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!



Re: HTML::Parser bug?

2000-07-07 Thread Michael A. Chase

Perhaps you could give us an example of the text you are trying to parse
that includes a comment that gets passed to the 'comment' event handler, but
doesn't get passed to the 'default' event handler when the 'comment' handler
isn't defined.

A short example script that shows the problem would also be handy.  I'd be
especially interested in seeing all HTML::Parser method calls.
--
Mac :})
** I may forward private database questions to the DBI mail lists. **
----- Original Message -----
From: "Hugo Haas" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, July 07, 2000 3:28 PM
Subject: HTML::Parser bug?


> The man page says about handlers:
>
>    Events
>
>    Handlers for the following events can be registered:
>
>    [..]
>
>    default
>        This event is triggered for events that do not have a
>        specific handler.  You can set up a handler for this
>        event to catch stuff you did not want to catch
>        explicitly.
>
> so I didn't assign any handler to the comment event, thinking default
> would be called:
>
>   $p->handler(default => 'text', 'self, text');
>
> This doesn't do what I was expecting whereas:
>
>   $p->handler(comment => 'text', 'self, text');
>   $p->handler(default => 'text', 'self, text');
>
> this does (with version 3.08 and 3.10).
>
> Is it me not reading the documentation right (in that case, I think that
> it is unclear) or is it a bug?





Re: HTML::Parser bug?

2000-07-07 Thread Hugo Haas

On Fri, Jul 07, 2000, Michael A. Chase wrote:
> Perhaps you could give us an example of the text you are trying to parse
> that includes a comment that gets passed to the 'comment' event handler, but
> doesn't get passed to the 'default' event handler when the 'comment' handler
> isn't defined.

Sorry, I realized that I sent my example without enough details but you
replied before I could submit an example.

> A short example script that shows the problem would also be handy.  I'd be
> especially interested in seeing all HTML::Parser method calls.

I was running a test on an excerpt of an HTML file (this is not valid
HTML by itself, but I did that to isolate the problem):

<!-- test
--> <a href="fdasfafdas"></a>

Here's a sample script:

use strict;
require HTML::Parser;
my $p = HTML::Parser->new;
$p->handler(@EVENT@ => \&text, 'text');
$p->parse_file('/tmp/foo.html');

sub text() {
  my ($t) = @_;
  print $t . "\n";
}

With @EVENT@ being 'comment':

[hugo:pts/2] larve:~ perl -w test.pl 
<!-- test
-->
[hugo:pts/2] larve:~ 

With @EVENT@ being 'default':

[hugo:pts/2] larve:~ perl -w test.pl 
[hugo:pts/2] larve:~ 

-- 
Hugo Haas, Webmaster, Systems Team - W3C/MIT
mailto:[EMAIL PROTECTED] - tel:+1-617-452-2092



Re: HTML::Parser bug?

2000-07-07 Thread Hugo Haas

On Fri, Jul 07, 2000, Michael A. Chase wrote:
> I quote:
>    If new() is called without any arguments, it will create a parser that
>    uses callback methods compatible with version 2 of HTML::Parser.
>    See the section on "version 2 compatibility" below for details.
>
> A HTML::Parser v2 compatible parser has handlers defined for the usual
> events so the default handler does not get called.

I missed that! Sorry for the trouble. It works much better like that
indeed.
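
For the archive, a rough sketch of the working setup, assuming HTML::Parser
3.x: passing api_version => 3 to new() avoids the version 2 compatibility
handlers, so a lone default handler sees the comment. The @seen array and the
inline test string are only for illustration:

```perl
use strict;
use warnings;
use HTML::Parser;

my @seen;

# Giving new() an argument (api_version => 3) means no v2-style handlers
# are installed, so unhandled events fall through to 'default'.
my $p = HTML::Parser->new(api_version => 3);

# Record the event name and the original text for every unhandled event.
$p->handler(default => sub { push @seen, [@_] }, 'event, text');

# Same excerpt as in the test file above, passed as a string.
$p->parse(qq{<!-- test\n--> <a href="fdasfafdas"></a>});
$p->eof;

for my $e (@seen) {
    my ($event, $text) = @$e;
    $text = '' unless defined $text;
    print "$event: $text\n";
}
```

With a bare HTML::Parser->new the same default handler is never reached for
the comment, which matches the empty output shown earlier in the thread.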

Thanks,

Hugo