Re: [Zope3-Users] Indexing PDF files

2006-05-11 Thread Frank Burkhardt
Hi,

On Wed, May 10, 2006 at 03:29:34PM -0500, Sreeram Raghav wrote:

[snip]

 Initially the only files being indexed were ZPT pages, but after writing
 the adapter even text files were being indexed.
 However, the problem is that when I try to add PDF or Word documents, the
 files are not indexed and I get an error saying the files cannot be decoded.

This adapter was just a demonstration of how to index a content object
containing a text field. It assumes that context.data contains just a plain
string. To index PDF files, you'll have to convert the PDF data to plain
text somehow:

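from ModuleYouHaveToWrite import MagicPdfToText

class SearchableTextAdapter(object):
    # [...]

    def getSearchableText(self):
        # IFile keeps the raw file bytes in .data; convert them to plain
        # text before handing them to the text index.
        text = MagicPdfToText(self.context.data)
        return (text,)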

I don't know if there's a pure Python solution for extracting text from PDF
files. But you might consider calling an external program like 'pdftotext' to
do the job.
However, it's your adapter's responsibility to act as defined by the interface,
and 'ISearchableText' says the adapter must provide plain, indexable text.
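A minimal sketch of such a converter, assuming the 'pdftotext' command-line tool (from xpdf/Poppler) is installed and on the PATH. The function name pdf_to_text is hypothetical; it plays the role of MagicPdfToText above. The code targets a current Python 3; on the Python 2.4 of a 2006-era Zope 3 you would use subprocess.Popen(...).communicate() instead of subprocess.run.

```python
import os
import subprocess
import tempfile


def pdf_to_text(pdf_data, command="pdftotext"):
    """Extract plain text from raw PDF bytes using an external converter.

    Hypothetical helper: assumes a pdftotext binary is available. The PDF is
    written to a temporary file; passing '-' as the output file makes
    pdftotext print the extracted text to stdout, which is captured here.
    """
    fd, path = tempfile.mkstemp(suffix=".pdf")
    try:
        with os.fdopen(fd, "wb") as pdf_file:
            pdf_file.write(pdf_data)
        result = subprocess.run(
            [command, "-enc", "UTF-8", path, "-"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            check=True,  # raise CalledProcessError if the conversion fails
        )
    finally:
        # Always clean up the temporary PDF, even if the converter failed.
        os.remove(path)
    # Decode leniently so a stray invalid byte cannot break indexing.
    return result.stdout.decode("utf-8", errors="replace")
```

Inside the adapter, getSearchableText() would then return (pdf_to_text(self.context.data),). Going through a temporary file keeps the sketch independent of whether a given pdftotext build can read the PDF from stdin.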

Regards,

Frank
___
Zope3-users mailing list
Zope3-users@zope.org
http://mail.zope.org/mailman/listinfo/zope3-users


[Zope3-Users] Indexing PDF files

2006-05-10 Thread Sreeram Raghav
Hi all,
Thanks to Frank and Brend, I have successfully created a catalog and indexed ZPT pages.
Then I wanted to index text files and PDF files, so I wrote an adapter like this:
-
adapter.py
-

"""The adapter SearchableTextAdapter adapts the interface IFile to the
interface ISearchableText. Based on Frank's 'adapter.py'.
"""

from zope.index.text.interfaces import ISearchableText
from zope.component import adapts
from zope.interface import implements
from zope.app.file.interfaces import IFile

class SearchableTextAdapter:
    implements(ISearchableText)
    adapts(IFile)

    def __init__(self, context):
        self.context = context

    def getSearchableText(self):
        # Returns the raw file contents unchanged: fine for plain text,
        # but binary formats such as PDF are passed through as-is.
        return self.context.data

--
configure.zcml 
--
<adapter
    factory=".adapter.SearchableTextAdapter"
    />

--

After doing this I restarted Zope3, went to the Zope3 manager and added a new text file.
Initially the only files being indexed were ZPT pages, but after writing the adapter even text files were being indexed. 
However, the problem is that when I try to add PDF or Word documents, the files are not indexed and I get an error saying the files cannot be decoded.
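The "cannot decode" error is consistent with the adapter handing the raw bytes of every file, binary PDFs and Word documents included, straight to the text index. A rough sketch of a more defensive getSearchableText() that checks the file's contentType first: text/* data is decoded (assuming UTF-8), types with a registered converter are converted, and anything else yields no text. It is framework-free so it runs on its own; in Zope 3 the class would additionally declare implements(ISearchableText) and adapts(IFile). The names FileTextExtractor, CONVERTERS and FakeFile are hypothetical.

```python
# Hypothetical registry mapping a MIME type to a function that turns raw
# bytes into text, e.g. "application/pdf": <your PDF-to-text function>.
CONVERTERS = {}


class FileTextExtractor(object):
    """Framework-free stand-in for an ISearchableText adapter of IFile.

    In Zope 3 the class would also declare implements(ISearchableText) and
    adapts(IFile); those lines are left out so the sketch runs alone.
    """

    def __init__(self, context):
        self.context = context

    def getSearchableText(self):
        # Drop parameters such as "; charset=..." from the content type.
        content_type = (self.context.contentType or "").split(";")[0].strip()
        data = self.context.data
        if content_type.startswith("text/"):
            # Assume UTF-8; undecodable bytes become U+FFFD instead of raising.
            return (data.decode("utf-8", errors="replace"),)
        converter = CONVERTERS.get(content_type)
        if converter is not None:
            return (converter(data),)
        # Unsupported binary format: contribute no text rather than failing.
        return ()


class FakeFile(object):
    """Minimal stand-in for a file object with data and contentType."""

    def __init__(self, data, contentType):
        self.data = data
        self.contentType = contentType


if __name__ == "__main__":
    print(FileTextExtractor(FakeFile(b"hello world", "text/plain")).getSearchableText())
    # ('hello world',)
    print(FileTextExtractor(FakeFile(b"\xd0\xcf\x11\xe0", "application/msword")).getSearchableText())
    # ()
```

Returning an empty tuple for unsupported types is a choice made for this sketch: such a file simply contributes no words instead of raising an error. A converter for Word files (for example one wrapping an external tool such as antiword) could be registered in CONVERTERS the same way.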

Could somebody please suggest a solution for this problem?
Thanks
-- Sreeram Nudurupati 