Re: [Zope] ZCatalog attachments?

2000-08-13 Thread jan

  Simon Coles writes:
We have binary files stored in Zope, for example Word documents (but
could be any of a variety of document types).
   
We would like to be able to index and search the contents of these
files using ZCatalog. So if a Word file contains the word "Fred",
then any search for "Fred" would include that file in the list of
documents returned.
  Someone else already told you that you must create a parameterless
  method (it need not necessarily be named "PrincipiaSearchSource")
  that returns the file's content.
 
  You may not need to keep the rendered version around, but
  may be able to extract the plain text on demand.
  I think there is a "word.dll" that provides access to
  MS Word from applications. Alternatively, you could
  control Word via COM.

There is a Perl (I know, I know...) script to convert Word DOC
files into HTML. That should work well enough to make the stuff
searchable (I would use doc2html.pl | lynx -d to get a pure ASCII
version, though).
It is probably fast enough to just render on the fly (i.e., upon
indexing).
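As a rough illustration of this render-on-the-fly idea, here is a Python sketch (Zope itself is written in Python). `html_to_text` is a stdlib-only stand-in for the lynx step (presumably `lynx -dump`), not a reimplementation of lynx. `doc_to_text` is hypothetical wiring that assumes `doc2html.pl` is on the PATH and writes HTML to standard output when given a file name; check the script's actual usage before relying on it.

```python
import subprocess
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects character data, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True: "&amp;" arrives as "&"
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Join chunks with spaces, then collapse all whitespace runs.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html):
    """Rough stdlib stand-in for the lynx dump step: HTML in, plain text out."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def doc_to_text(path):
    """Render a Word file to plain text on demand (hypothetical wiring).

    ASSUMPTION: a converter named doc2html.pl is on PATH and prints HTML
    to stdout when given a file path. Latin-1 decoding is also an
    assumption; adjust it to whatever the converter actually emits.
    """
    result = subprocess.run(["doc2html.pl", path], capture_output=True, check=True)
    return html_to_text(result.stdout.decode("latin-1"))


print(html_to_text("<p>Minutes</p><p>Fred &amp; Jan agreed</p>"))
# Minutes Fred & Jan agreed
```

Joining text chunks with spaces keeps words from adjacent paragraphs apart, at the cost of occasionally splitting a word that inline markup breaks up; for indexing, that seems the safer trade-off.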

HTH,
Jan

___
Zope maillist  -  [EMAIL PROTECTED]
http://lists.zope.org/mailman/listinfo/zope
**   No cross posts or HTML encoding!  **
(Related lists - 
 http://lists.zope.org/mailman/listinfo/zope-announce
 http://lists.zope.org/mailman/listinfo/zope-dev )




Re: [Zope] ZCatalog attachments?

2000-08-05 Thread Dieter Maurer

Simon Coles writes:
  We have binary files stored in Zope, for example Word documents (but 
  could be any of a variety of document types).
  
  We would like to be able to index and search the contents of these 
  files using ZCatalog. So if a Word file contains the word "Fred", 
  then any search for "Fred" would include that file in the list of 
  documents returned.
Someone else already told you that you must create a parameterless
method (it need not necessarily be named "PrincipiaSearchSource")
that returns the file's content.

You may not need to keep the rendered version around, but
may be able to extract the plain text on demand.
I think there is a "word.dll" that provides access to
MS Word from applications. Alternatively, you could
control Word via COM.
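A minimal Python sketch of this suggestion, assuming the catalog's text index is set up to call a parameterless method by name. `SearchableFile` and the `extract_text` callable are hypothetical names; in practice `extract_text` might drive Word via COM or run an external converter. This is plain Python, not a drop-in Zope product class.

```python
class SearchableFile:
    """A stored binary file whose text is extracted only when asked for.

    HYPOTHETICAL: `extract_text` is any callable mapping raw bytes to
    plain text, e.g. a wrapper that drives Word via COM.
    """

    def __init__(self, data, extract_text):
        self.data = data                  # raw binary content, e.g. a .doc file
        self._extract_text = extract_text

    def PrincipiaSearchSource(self):
        # Takes no arguments, so an index can call it by name. No rendered
        # copy is kept: the text is re-extracted on every call (e.g. at
        # indexing time).
        return self._extract_text(self.data)


# Usage with a trivial stand-in extractor; real Word files need a converter.
doc = SearchableFile(b"Fred wrote this memo", lambda raw: raw.decode("ascii"))
print(doc.PrincipiaSearchSource())  # Fred wrote this memo
```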


Dieter





[Zope] ZCatalog attachments?

2000-08-04 Thread Simon Coles

Hi,

We have binary files stored in Zope, for example Word documents (but 
could be any of a variety of document types).

We would like to be able to index and search the contents of these 
files using ZCatalog. So if a Word file contains the word "Fred", 
then any search for "Fred" would include that file in the list of 
documents returned.

Is anyone doing something like this? If so, how?



Simon
-- 
- My opinions are my own, NIP's opinions are theirs --
Simon J. Coles Email: [EMAIL PROTECTED]
New Information Paradigms  Work Phone: +44 1344 753703
http://www.nipltd.com/ Work Fax:   +44 1344 753742
=== Life is too precious to take seriously ===





Re: [Zope] ZCatalog attachments?

2000-08-04 Thread Aleksander Salwa


On Fri, 4 Aug 2000, Simon Coles wrote:

 We have binary files stored in Zope, for example Word documents (but 
 could be any of a variety of document types).
 
 We would like to be able to index and search the contents of these 
 files using ZCatalog. So if a Word file contains the word "Fred", 
 then any search for "Fred" would include that file in the list of 
 documents returned.
 
 Is anyone doing something like this? If so, how?
 

A simple search of the binary data won't do it, of course, because of the
complex format of Word documents. So:
Try keeping, alongside every document, its 'rendered' version converted to
plain text (created by saving it from Word in plain text format).
Then create a class representing your document. This class should provide
a parameterless method 'PrincipiaSearchSource' returning the rendered
version of the document. It's untested, but it seems to be a step in the
right direction ;)
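This variant stores the plain-text rendering instead of computing it. The Python sketch below assumes the text copy was produced in advance (for example by saving from Word as plain text). `RenderedDocument` is a hypothetical name, and `search` is a toy word lookup that only mimics what a catalog text query would return; it is not the ZCatalog API.

```python
import re


class RenderedDocument:
    """Keeps the binary original next to a pre-rendered plain-text copy.

    ASSUMPTION: the text copy was made in advance, e.g. by saving the
    document from Word in plain text format.
    """

    def __init__(self, doc_id, original, rendered_text):
        self.id = doc_id
        self.original = original            # binary data; never searched directly
        self.rendered_text = rendered_text  # plain-text version of the same document

    def PrincipiaSearchSource(self):
        # Parameterless; just hands back the stored rendered version.
        return self.rendered_text


def search(documents, word):
    """Toy word lookup (NOT the ZCatalog API): ids of documents whose
    searchable text contains `word`, ignoring case and punctuation."""
    word = word.lower()
    return [
        doc.id
        for doc in documents
        if word in re.findall(r"\w+", doc.PrincipiaSearchSource().lower())
    ]


docs = [
    RenderedDocument("memo.doc", b"<binary Word data>", "Minutes: Fred agreed."),
    RenderedDocument("plan.doc", b"<binary Word data>", "Budget for next year"),
]
print(search(docs, "Fred"))  # ['memo.doc']
```

Compared with extracting text on demand, this trades extra storage (and the risk of the two copies drifting apart if only the original is updated) for not needing a converter on the server at indexing time.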


[EMAIL PROTECTED]

/--\
| `long long long' is too long for GCC |
\--/


