Hello - of course, please open a ticket and provide patches. If people feel they need this, and Nutch can offer it, it should :) Looking forward to a new parser plugin implementation.
Markus

-----Original message-----
> From:Joseph Naegele <[email protected]>
> Sent: Wednesday 6th April 2016 22:21
> To: [email protected]
> Subject: CSS parser
>
> Hi everyone,
>
> Would anyone find useful a parser for collecting outlinks from CSS
> (stylesheets)?
>
> As far as I can tell Tika doesn't offer this (it looks like Tika 1.12 parses
> CSS as plain text, correct me if I'm wrong). Modern CSS often contains
> "url(.)" links to content needed to properly style pages (e.g. fonts,
> images). I have a simple, working, tested "parse-css" plugin that uses
> http://cssparser.sourceforge.net/ and parses only outlinks, but if it's not
> something that belongs in Nutch that's fine. Otherwise I'll happily open a
> pull request.
>
> Thanks,
> Joe
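
For anyone following the thread, here is a rough sketch of the kind of outlink extraction discussed above. It is not Joe's plugin and it uses neither cssparser nor the Nutch parser API: it is a plain regular-expression pass in Python, and the names `extract_css_outlinks` and `URL_RE` are made up for illustration. It only looks at url() references, so comments and bare-string @import rules are not handled.

```python
import re
from urllib.parse import urljoin

# Hypothetical helper, not part of Nutch or cssparser.
# Matches url(...) with optional single or double quotes around the target.
URL_RE = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE)


def extract_css_outlinks(css_text, base_url):
    """Return absolute URLs referenced via url(...) in css_text.

    URLs are resolved against base_url (the stylesheet's own URL),
    kept in order of first appearance, and de-duplicated.
    """
    seen = set()
    outlinks = []
    for match in URL_RE.finditer(css_text):
        target = match.group(2).strip()
        # Skip empty targets and inline data: URIs; neither is a fetchable outlink.
        if not target or target.lower().startswith("data:"):
            continue
        absolute = urljoin(base_url, target)
        if absolute not in seen:
            seen.add(absolute)
            outlinks.append(absolute)
    return outlinks


if __name__ == "__main__":
    sample = (
        '@font-face { src: url("fonts/a.woff"); } '
        "body { background: url(../img/bg.png); }"
    )
    for link in extract_css_outlinks(sample, "http://example.com/css/site.css"):
        print(link)
```

A real CSS parser would be more robust than a regex against odd quoting, escapes, and commented-out rules, which is presumably why the plugin builds on one.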

