[ http://issues.apache.org/jira/browse/NUTCH-54?page=all ]

Andrzej Bialecki  updated NUTCH-54:
-----------------------------------

    Attachment: parsestatus.patch

* The HTML "meta" tag processor has been extended so that it collects and 
processes all meta tags. Convenience methods have been added to handle the 
refresh meta tag, so that Fetcher can support multiple redirects.

* the interaction between content parsers and their users has been changed 
from exception-driven to status-driven. This gives much better control over 
the logic flow, and lets us communicate more information than just a 
plain-text message. I plan to make similar changes for protocol handlers, which 
should greatly simplify the logic in Fetcher.

* preliminary changes to Fetcher to support an automatic redirection loop when 
parsers report a "refresh" meta directive.

* scaffolding to support parsing complete pages (i.e. pages fetched together 
with all their elements, such as JavaScript and CSS).
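
To make the status-driven idea and the redirection loop more concrete, here is 
a minimal sketch in Java. It is not taken from parsestatus.patch: the class 
RefreshRedirectSketch, the Code enum, ParseResult, parse(), follow() and the 
maxRedirects limit are all hypothetical names chosen for illustration, and the 
regular expression is a simplification that only recognises the common 
http-equiv-then-content attribute order. Pages are looked up in an in-memory 
map instead of being fetched.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: none of these names come from the actual patch.
class RefreshRedirectSketch {

    /** Hypothetical status codes reported by a parser. */
    enum Code { SUCCESS, REFRESH, FAILED }

    /** Hypothetical result object: a status plus details, returned instead of throwing. */
    static final class ParseResult {
        final Code code;
        final String target;   // refresh target URL, or null when there is none
        final String message;  // human-readable detail

        ParseResult(Code code, String target, String message) {
            this.code = code;
            this.target = target;
            this.message = message;
        }
    }

    // Simplified matcher for <meta http-equiv="refresh" content="N; url=...">.
    private static final Pattern META_REFRESH = Pattern.compile(
        "<meta\\s+http-equiv=[\"']?refresh[\"']?\\s+content=[\"']?\\s*\\d+\\s*;\\s*url=([^\"'>\\s]+)",
        Pattern.CASE_INSENSITIVE);

    /** Reports the outcome as a status; never throws for missing content. */
    static ParseResult parse(String html) {
        if (html == null) {
            return new ParseResult(Code.FAILED, null, "no content");
        }
        Matcher m = META_REFRESH.matcher(html);
        if (m.find()) {
            return new ParseResult(Code.REFRESH, m.group(1), "meta refresh");
        }
        return new ParseResult(Code.SUCCESS, null, "ok");
    }

    /**
     * Follows refresh directives, starting at startUrl, for at most
     * maxRedirects hops. Returns every URL visited, in order.
     */
    static List<String> follow(String startUrl, Map<String, String> pages, int maxRedirects) {
        List<String> visited = new ArrayList<>();
        String url = startUrl;
        for (int hops = 0; hops <= maxRedirects; hops++) {
            visited.add(url);
            ParseResult result = parse(pages.get(url));
            if (result.code != Code.REFRESH) {
                break;           // SUCCESS or FAILED: nothing more to follow
            }
            url = result.target; // REFRESH: continue with the target URL
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<String, String> pages = new HashMap<>();
        pages.put("a", "<meta http-equiv=\"refresh\" content=\"0; url=b\">");
        pages.put("b", "<html><body>final page</body></html>");
        System.out.println(follow("a", pages, 3));
    }
}
```

The caller branches on the returned code instead of catching exceptions, and 
the hop limit keeps a page that refreshes to itself from looping forever.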

Any comments and suggestions are welcome!

> Fetcher improvements
> --------------------
>
>          Key: NUTCH-54
>          URL: http://issues.apache.org/jira/browse/NUTCH-54
>      Project: Nutch
>         Type: Improvement
>   Components: fetcher
>     Reporter: Andrzej Bialecki 
>     Assignee: Andrzej Bialecki 
>  Attachments: parsestatus.patch
>
> Fetcher improvements.

_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers