I get that, but it would be really tedious to list a password for every PDF file that will be crawled, don't you think?

----- Original Message -----
From: "Tejas Patil" <[email protected]>
To: [email protected]
Sent: Wednesday, 13 February 2013 14:03:21
Subject: Re: How do I pass a password to Tika from Nutch for encrypted PDFs?

There can be PDF files with the same name on different hosts, so using the
URL would be better than using the name. All this info can go in an XML file
that is read by the PDF plugin.

Thanks,
Tejas Patil

On Wed, Feb 13, 2013 at 10:35 AM, Jorge Luis Betancourt Gonzalez <[email protected]> wrote:

> What would be a good way of specifying which password goes with which PDF
> file? By full URI, by filename, or something else?
>
> ----- Original Message -----
> From: "Julien Nioche" <[email protected]>
> To: [email protected], "John Dhabolt" <[email protected]>
> Sent: Wednesday, 13 February 2013 13:04:27
> Subject: Re: How do I pass a password to Tika from Nutch for encrypted PDFs?
>
> Hi John,
>
> Currently not, but it should be relatively straightforward to modify
> parse-tika to do so, and it would be a nice contribution to Nutch.
>
> Julien
>
> On 13 February 2013 13:53, John Dhabolt <[email protected]> wrote:
>
> > Hi,
> >
> > We have PDFs we need to crawl that have a password associated. I don't see
> > a way to pass this password to Tika. Apparently prior to Tika 1.1 the
> > password would have been passed in Tika metadata. In Tika 1.1 and greater,
> > they've added a new ParseContext object, PasswordProvider, which adds a
> > getPassword method. Is either of these methods available to Nutch 1.6
> > through a property setting?
>
> --
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
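
To make the URL-keyed XML file suggested above a bit more concrete, here is a minimal sketch in plain Java. It is not existing Nutch or Tika code: the class name UrlPasswordTable, its methods, and the choice of the standard java.util.Properties XML layout are all assumptions made for illustration. It also assumes that matching on the longest URL prefix is acceptable, so that a single entry can cover a whole host or directory instead of one entry per PDF. Wiring the lookup into parse-tika (for example through a Tika PasswordProvider placed in the ParseContext) is not shown.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

/**
 * Hypothetical helper, not part of Nutch or Tika: maps URL prefixes to
 * PDF passwords and resolves a document URL to the most specific entry.
 */
class UrlPasswordTable {
    private final Map<String, String> passwordsByPrefix;

    UrlPasswordTable(Map<String, String> passwordsByPrefix) {
        // Defensive copy so later changes to the caller's map have no effect.
        this.passwordsByPrefix = new HashMap<>(passwordsByPrefix);
    }

    /**
     * Loads entries from an XML file in the java.util.Properties format
     * (an assumed layout): <entry key="URL prefix">password</entry>.
     */
    static UrlPasswordTable fromXml(InputStream in) throws IOException {
        Properties props = new Properties();
        props.loadFromXML(in);
        Map<String, String> entries = new HashMap<>();
        for (String prefix : props.stringPropertyNames()) {
            entries.put(prefix, props.getProperty(prefix));
        }
        return new UrlPasswordTable(entries);
    }

    /**
     * Returns the password of the longest prefix that the URL starts with,
     * or null when no entry applies.
     */
    String lookup(String url) {
        String bestPrefix = null;
        for (String prefix : passwordsByPrefix.keySet()) {
            boolean matches = url.startsWith(prefix);
            boolean moreSpecific =
                bestPrefix == null || prefix.length() > bestPrefix.length();
            if (matches && moreSpecific) {
                bestPrefix = prefix;
            }
        }
        return bestPrefix == null ? null : passwordsByPrefix.get(bestPrefix);
    }
}
```

With entries for "http://example.com/" and "http://example.com/reports/", a PDF under /reports/ would get the second password and any other PDF on that host the first, so the file only needs one line per group of documents that share a password. Keeping passwords in a plain XML file has obvious security implications that this sketch does not address.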

