Hello - of course, please open a ticket and provide patches. If people feel 
they need this, and Nutch can offer it, it should :)
Looking forward to a new parser plugin implementation.

Markus

-----Original message-----
> From: Joseph Naegele <[email protected]>
> Sent: Wednesday 6th April 2016 22:21
> To: [email protected]
> Subject: CSS parser
> 
> Hi everyone,
>
> Would anyone find a parser for collecting outlinks from CSS (stylesheets)
> useful?
>
> As far as I can tell, Tika doesn't offer this (it looks like Tika 1.12 parses
> CSS as plain text; correct me if I'm wrong). Modern CSS often contains
> "url(...)" links to content needed to style pages properly (e.g. fonts,
> images). I have a simple, working, tested "parse-css" plugin that uses
> http://cssparser.sourceforge.net/ and extracts only outlinks. If it's not
> something that belongs in Nutch, that's fine; otherwise I'll happily open a
> pull request.
>
> Thanks,
> 
> Joe
> 
> 
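For illustration, here is a minimal sketch of the kind of outlink extraction Joe describes: finding "url(...)" references in a stylesheet and resolving them against the stylesheet's own URL. It does not use the cssparser library his plugin is built on; it swaps in a plain regular expression, and the class and method names (`CssOutlinkSketch`, `extractOutlinks`) are hypothetical, not part of Nutch or Tika. Relative references are resolved with `java.net.URI`, and inline `data:` URIs are skipped because they are not fetchable outlinks.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not a Nutch or Tika API): pulls url(...) outlinks out of CSS text. */
final class CssOutlinkSketch {

    // Matches url(...) with optional single or double quotes; group 2 holds the reference.
    // Case-insensitive so that URL(...) is also recognized.
    private static final Pattern URL_PATTERN =
        Pattern.compile("url\\(\\s*(['\"]?)(.*?)\\1\\s*\\)", Pattern.CASE_INSENSITIVE);

    /** Returns absolute outlinks found in {@code css}, resolved against {@code baseUrl}. */
    static List<String> extractOutlinks(String css, String baseUrl) {
        List<String> links = new ArrayList<>();
        URI base = URI.create(baseUrl);
        Matcher m = URL_PATTERN.matcher(css);
        while (m.find()) {
            String ref = m.group(2).trim();
            // Skip empty references and inline data: URIs, which point at nothing to fetch.
            if (ref.isEmpty() || ref.regionMatches(true, 0, "data:", 0, 5)) {
                continue;
            }
            try {
                // Relative paths such as ../img/bg.png are resolved against the stylesheet URL.
                links.add(base.resolve(ref).toString());
            } catch (IllegalArgumentException e) {
                // Malformed reference (e.g. an unescaped space): ignore it, keep the rest.
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String css = "@font-face { src: url(\"fonts/a.woff\"); }\n"
            + "body { background: URL( '../img/bg.png' ); }\n"
            + ".icon { background: url(data:image/png;base64,AAAA); }";
        // Prints http://example.com/css/fonts/a.woff, then http://example.com/img/bg.png
        for (String link : extractOutlinks(css, "http://example.com/css/site.css")) {
            System.out.println(link);
        }
    }
}
```

A regex like this is only an approximation: it will also pick up "url(...)" inside CSS comments and misses `@import "file.css"` written without `url()`. Those cases are a reasonable argument for building the plugin on a real CSS parser, as Joe's does.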