Different hosts can serve PDF files with the same name, so keying the password on the URL would be better than keying it on the file name. All of this information could live in an XML file that the PDF plugin reads.
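A minimal sketch of the lookup described above, assuming a hypothetical XML format (the `pdf-passwords` and `pdf` element names and the `url`/`password` attributes are made up for illustration, not an existing Nutch format). It keys on the full URL, so two `report.pdf` files on different hosts can carry different passwords. The `PdfPasswordTable` class is hypothetical; only JDK XML parsing is used.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/*
 * Hypothetical helper: loads a URL -> password table from XML such as
 *
 *   <pdf-passwords>
 *     <pdf url="http://host-a.example/docs/report.pdf" password="secret1"/>
 *     <pdf url="http://host-b.example/docs/report.pdf" password="secret2"/>
 *   </pdf-passwords>
 *
 * Element and attribute names are assumptions for this sketch only.
 */
class PdfPasswordTable {
    private final Map<String, String> passwordsByUrl = new HashMap<>();

    PdfPasswordTable(InputStream xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(xml);
            // Each <pdf> element maps one exact URL to one password.
            NodeList entries = doc.getElementsByTagName("pdf");
            for (int i = 0; i < entries.getLength(); i++) {
                Element entry = (Element) entries.item(i);
                passwordsByUrl.put(entry.getAttribute("url"),
                                   entry.getAttribute("password"));
            }
        } catch (Exception e) {
            // Wrap parser/IO failures so callers need not handle checked exceptions.
            throw new IllegalArgumentException("Could not read PDF password file", e);
        }
    }

    /** Convenience factory for an in-memory XML string. */
    static PdfPasswordTable fromXml(String xml) {
        return new PdfPasswordTable(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    /** Returns the password configured for this exact URL, or null if none. */
    String passwordFor(String url) {
        return passwordsByUrl.get(url);
    }

    // Assumed wiring inside a modified parse-tika (not compiled here): with
    // Tika 1.1+, the lookup could back a PasswordProvider on the ParseContext:
    //   context.set(PasswordProvider.class, metadata -> table.passwordFor(url));
}
```

Keying on the exact URL keeps the lookup unambiguous, at the cost of listing every protected file; matching on a host or path prefix would be a possible extension.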
Thanks,
Tejas Patil

On Wed, Feb 13, 2013 at 10:35 AM, Jorge Luis Betancourt Gonzalez <[email protected]> wrote:

> What would be a good way of specifying which password goes with which PDF
> file? By full URI, by filename, or something else?
>
> ----- Original message -----
> From: "Julien Nioche" <[email protected]>
> To: [email protected], "John Dhabolt" <[email protected]>
> Sent: Wednesday, 13 February 2013 13:04:27
> Subject: Re: How do I pass a password to Tika from Nutch for encrypted PDFs?
>
> Hi John,
>
> Currently not, but it should be relatively straightforward to modify
> parse-tika to do so, and it would be a nice contribution to Nutch.
>
> Julien
>
> On 13 February 2013 13:53, John Dhabolt <[email protected]> wrote:
>
> > Hi,
> >
> > We have PDFs we need to crawl that have a password associated. I don't see
> > a way to pass this password to Tika. Apparently prior to Tika 1.1 the
> > password would have been passed in Tika metadata. In Tika 1.1 and greater,
> > they've added a new ParseContext object, PasswordProvider, which adds a
> > getPassword method. Are either of these methods available to Nutch 1.6
> > through a property setting?
>
> --
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble

