PDF files with the same name can exist on different hosts, so keying the
password on the URL would be better than keying it on the file name. All of
this information can go in an XML file that the PDF plugin reads.
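
A minimal sketch of that idea, using only the JDK's XML parser. The file
layout (a <pdf-passwords> root with <pdf url="..." password="..."/> entries),
the class name PdfPasswordMap and the method passwordFor are all assumptions
made up for illustration; nothing like this exists in Nutch today. Wiring the
lookup into parse-tika (for example through Tika's PasswordProvider) is not
shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: maps a full PDF URL to its password.
final class PdfPasswordMap {
    private final Map<String, String> passwordsByUrl = new HashMap<>();

    // Reads every <pdf url="..." password="..."/> element (assumed layout).
    static PdfPasswordMap load(InputStream in) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(in);
        PdfPasswordMap map = new PdfPasswordMap();
        NodeList entries = doc.getElementsByTagName("pdf");
        for (int i = 0; i < entries.getLength(); i++) {
            Element entry = (Element) entries.item(i);
            map.passwordsByUrl.put(entry.getAttribute("url"),
                                   entry.getAttribute("password"));
        }
        return map;
    }

    // Returns the password for the exact URL, or null if none is configured.
    String passwordFor(String url) {
        return passwordsByUrl.get(url);
    }

    public static void main(String[] args) throws Exception {
        // Two files with the same name on different hosts, different passwords.
        String xml = "<pdf-passwords>"
                + "<pdf url=\"http://a.example/report.pdf\" password=\"secret-a\"/>"
                + "<pdf url=\"http://b.example/report.pdf\" password=\"secret-b\"/>"
                + "</pdf-passwords>";
        PdfPasswordMap map = load(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println(map.passwordFor("http://a.example/report.pdf"));
        System.out.println(map.passwordFor("http://b.example/report.pdf"));
    }
}
```

Because the key is the full URL, the two report.pdf files above resolve to
different passwords, which a filename-only key could not do.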

Thanks,
Tejas Patil


On Wed, Feb 13, 2013 at 10:35 AM, Jorge Luis Betancourt Gonzalez <
[email protected]> wrote:

> What would be a good way of specifying which password goes with which PDF
> file? By full URI, by filename, or something else?
>
> ----- Original Message -----
> From: "Julien Nioche" <[email protected]>
> To: [email protected], "John Dhabolt" <[email protected]>
> Sent: Wednesday, February 13, 2013 13:04:27
> Subject: Re: How do I pass a password to Tika from Nutch for encrypted PDFs?
>
> Hi John,
>
> Currently not, but it should be relatively straightforward to modify
> parse-tika to do so, and it would be a nice contribution to Nutch.
>
> Julien
>
> On 13 February 2013 13:53, John Dhabolt <[email protected]> wrote:
>
> > Hi,
> >
> > We have PDFs we need to crawl that are password-protected. I don't see
> > a way to pass this password to Tika. Apparently, prior to Tika 1.1, the
> > password would have been passed in the Tika metadata. In Tika 1.1 and
> > later, they've added a new ParseContext object, PasswordProvider, which
> > provides a getPassword method. Is either of these mechanisms available
> > to Nutch 1.6 through a property setting?
> >
>
>
>
> --
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
>
