[ https://issues.apache.org/jira/browse/TIKA-337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julien Nioche updated TIKA-337:
-------------------------------

    Attachment: test.swf

Test file for the SWF parser.

> SWF parser
> ----------
>
>                 Key: TIKA-337
>                 URL: https://issues.apache.org/jira/browse/TIKA-337
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Julien Nioche
>            Assignee: Jukka Zitting
>         Attachments: test.swf, TIKA-337.patch
>
>
> Here is an initial implementation of an SWF parser that uses JavaSWF and has 
> been adapted from A. Bialecki's implementation for Nutch.
> The main differences from the Nutch implementation are that we use the 
> latest version of JavaSWF and do not try to extract text from the actions or 
> structured URLs. As usual, URLs can be obtained from the extracted text using 
> ParserPostProcessor.
> JavaSWF has changed quite a bit since the Nutch integration, and I wanted to 
> keep this initial port nice and simple. It should be possible to extract the 
> URLs from the actions using JavaSWF's API; I think this is what they did in 
> Heritrix.
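
The description above says URLs can be recovered from the extracted text rather than from the SWF actions. A minimal sketch of that idea follows, assuming a simple regular-expression scan over plain text. It is not Tika's ParserPostProcessor code; the class name `UrlFromTextSketch`, the method `extractUrls`, and the pattern are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tika or JavaSWF.
public class UrlFromTextSketch {

    // Assumed pattern: an http(s) scheme followed by anything that is not
    // whitespace, an angle bracket, or a quote character.
    private static final Pattern URL = Pattern.compile("https?://[^\\s<>\"']+");

    // Returns every URL-looking substring of the text, in order of appearance.
    public static List<String> extractUrls(String text) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL.matcher(text);
        while (m.find()) {
            // Drop trailing sentence punctuation picked up by the greedy match.
            String url = m.group().replaceAll("[.,;:!?)]+$", "");
            urls.add(url);
        }
        return urls;
    }
}
```

Calling `UrlFromTextSketch.extractUrls(text)` on the text produced by a parser would give the candidate outlinks. URLs that exist only inside SWF actions would still be missed, which is the limitation the description mentions.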

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
