Re: [Zope3-Users] Indexing PDF files
Hi,

On Wed, May 10, 2006 at 03:29:34PM -0500, Sreeram Raghav wrote:
> [snip]
> Initially the only files being indexed were ZPT pages, but after writing the adapter, text files were indexed as well. However, the problem is that when I try to add PDF or Word documents, the files are not indexed and an error is shown saying the files cannot be decoded.

This adapter was just a demonstration of how to index a content object containing a text field. It assumes that context.data contains just a plain string.

To index PDF files, you'll have to somehow convert the PDF data to plain text:

    from ModuleYouHaveToWrite import MagicPdfToText

    class SearchableTextAdapter(object):
        [...]
        def getSearchableText(self):
            # Convert the stored PDF bytes to plain text first.
            text = MagicPdfToText(self.context.pdfdata)
            return (text,)

I don't know if there's a pure Python solution for extracting text from PDF files, but you might consider calling an external program like 'pdftotext' to do the job. Either way, it's your adapter's responsibility to act as defined by the interface, and 'ISearchableText' says the adapter must provide plain indexable text.

Regards,
Frank
___
Zope3-users mailing list
Zope3-users@zope.org
http://mail.zope.org/mailman/listinfo/zope3-users
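`MagicPdfToText` above is a placeholder for a module you have to write. One way it might look, as a minimal sketch assuming the `pdftotext` command-line tool (shipped with xpdf and poppler) is installed and on the PATH; the name `pdf_to_text` and its parameters are hypothetical, and the code is Python 3 rather than the Python 2 that Zope 3 ran on at the time:

```python
import os
import subprocess
import tempfile

# Hypothetical default command: pdftotext, asked to emit UTF-8 text.
PDFTOTEXT = ("pdftotext", "-enc", "UTF-8")


def pdf_to_text(pdf_bytes, command=PDFTOTEXT, timeout=60):
    """Convert raw PDF bytes to plain text with an external converter.

    The PDF is written to a temporary file because pdftotext takes an
    input file name; passing "-" as the output file sends the text to stdout.
    """
    fd, path = tempfile.mkstemp(suffix=".pdf")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(pdf_bytes)
        result = subprocess.run(
            [*command, path, "-"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout,
            check=True,  # raise CalledProcessError if the converter fails
        )
    finally:
        # Always clean up the temporary copy of the PDF.
        os.remove(path)
    # Undecodable bytes become U+FFFD instead of raising UnicodeDecodeError.
    return result.stdout.decode("utf-8", errors="replace")
```

Inside the adapter, a call such as `pdf_to_text(self.context.data)` could then stand in for `MagicPdfToText`. The temporary file is a deliberate choice: "-" as the output name is the documented way to get text on stdout, while reading the PDF itself from stdin is less portable across pdftotext versions.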
[Zope3-Users] Indexing PDF files
Hi all,

Thanks to Frank and Brend, I have successfully created a catalog and indexed ZPT pages. Then I wanted to index text files and PDF files, so I wrote an adapter like this:

- adapter.py -

    """The adapter SearchableTextAdapter adapts the interface IFile to the
    interface ISearchableText. Based on Frank's 'adapter.py'."""

    from zope.index.text.interfaces import ISearchableText
    from zope.component import adapts
    from zope.interface import implements
    from zope.app.file.interfaces import IFile

    class SearchableTextAdapter:
        implements(ISearchableText)
        adapts(IFile)

        def __init__(self, context):
            self.context = context

        def getSearchableText(self):
            return self.context.data

-- configure.zcml --

    <adapter factory=".adapter.SearchableTextAdapter" />

--

After doing this I restarted Zope3, went to the Zope3 management interface and added a new text file. Initially the only files being indexed were ZPT pages, but after writing the adapter, text files were indexed as well. However, the problem is that when I try to add PDF or Word documents, the files are not indexed and an error is shown saying the files cannot be decoded.

Could somebody please suggest a solution for this problem?

Thanks
--
Sreeram Nudurupati
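The adapter above hands `self.context.data` to the index for every IFile, so binary PDF and Word bodies reach a text index that expects decodable text, which is consistent with the decode error reported. A more defensive variant might look like the following minimal sketch, assuming the adapted file exposes `contentType` and `data` attributes (as the File class in `zope.app.file` does). The `converters` table and the `decode_plain_text` helper are hypothetical, the zope `implements`/`adapts` declarations are omitted so the sketch runs standalone, and it is written for Python 3:

```python
def decode_plain_text(data):
    """Decode a text/plain file body; bad bytes become U+FFFD."""
    if isinstance(data, str):
        return data
    return data.decode("utf-8", errors="replace")


class SearchableTextAdapter(object):
    """Sketch of an ISearchableText adapter for files.

    In Zope 3 this class would also declare implements(ISearchableText)
    and adapts(IFile), as in the adapter above.
    """

    # Hypothetical table: content type -> function turning file data into text.
    # A PDF converter (e.g. one wrapping pdftotext) would be added here.
    converters = {
        "text/plain": decode_plain_text,
    }

    def __init__(self, context):
        self.context = context

    def getSearchableText(self):
        # Drop parameters such as "; charset=utf-8" from the content type.
        raw_type = self.context.contentType or ""
        content_type = raw_type.split(";")[0].strip().lower()
        convert = self.converters.get(content_type)
        if convert is None:
            # Unknown types (e.g. Word files) are not indexed at all,
            # instead of pushing undecodable bytes into the text index.
            return None
        # The interface asks for a sequence of strings, hence the 1-tuple.
        return (convert(self.context.data),)
```

Returning `None` for unknown types relies on the `ISearchableText` convention, as I read its docstring, that `None` means the object should not be indexed. Supporting PDFs would then be a matter of adding an `"application/pdf"` entry that calls a PDF-to-text function.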