Re: [Zope] ZCatalog attachments?

2000-08-13 Thread jan

> > Simon Coles writes:
> >  > We have binary files stored in Zope, for example Word documents (but
> >  > could be any of a variety of document types).
> >  >
> >  > We would like to be able to index and search the contents of these
> >  > files using ZCatalog. So if a Word file contains the word "Fred",
> >  > then any search for "Fred" would include that file in the list of
> >  > documents returned.
> > Someone else already told you that you must create a parameterless
> > method (it need not necessarily be named "PrincipiaSearchSource")
> > that returns the file's content.
> >
> > You may not need to keep the rendered version around but
> > may be able to extract the plain text on demand.
> > I think, there is a "word.dll" that provides access to
> > MS Word from applications. Alternatively, you could
> > control Word via COM.

There is a Perl (I know, I know...) script to convert Word DOC
files into HTML. That should work well enough to make the stuff
searchable (I would use doc2html.pl | lynx -d to get a pure ASCII
version, though).
It is probably fast enough to just render on the fly (i.e., upon
indexing).
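
The pipeline described above could be driven from Python roughly as
follows. This is only a sketch under assumptions: the message names
doc2html.pl but not its arguments, so the command lines below (that the
script reads the document on standard input, and that lynx is called with
its -stdin and -dump options) should be checked against the local tools.
The names run_pipeline and word_to_text are illustrative.

```python
import subprocess

# Assumed command lines. The script name comes from the message above, but
# how it takes its input is a guess; adjust both entries for your system.
WORD_TO_TEXT = [
    ["perl", "doc2html.pl"],
    ["lynx", "-stdin", "-dump"],
]


def run_pipeline(data, commands):
    """Feed `data` (bytes) through each command in turn, like a shell pipe."""
    for argv in commands:
        # check=True raises CalledProcessError if a stage exits non-zero.
        result = subprocess.run(
            argv, input=data, stdout=subprocess.PIPE, check=True
        )
        data = result.stdout
    return data


def word_to_text(doc_bytes):
    """Return a plain-text rendering of a Word file, for use at index time."""
    raw = run_pipeline(doc_bytes, WORD_TO_TEXT)
    # The output encoding is not known here; latin-1 never fails to decode.
    return raw.decode("latin-1")
```

Calling word_to_text from the method that the catalog indexes would give
the "render on the fly" behaviour, at the cost of two child processes per
indexed document.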

HTH,
Jan

___
Zope maillist  -  [EMAIL PROTECTED]
http://lists.zope.org/mailman/listinfo/zope
**   No cross posts or HTML encoding!  **
(Related lists - 
 http://lists.zope.org/mailman/listinfo/zope-announce
 http://lists.zope.org/mailman/listinfo/zope-dev )




Re: [Zope] ZCatalog attachments?

2000-08-05 Thread Phil Harris

All,

Doing the text pulling from COM is *SLOW*, to say the least. You'd probably
be better off converting them to RTF and then using something like OmniMark
to convert to XML.

That way you'd have the best of both worlds, including something you can
render to HTML when zDOM/zXSLT becomes a reality.

I already do this, and the conversion is fast enough: a 500k doc takes
about 2 seconds.
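
Once a document exists as XML, pulling out the text to index is simple.
The sketch below uses Python's standard xml.etree.ElementTree module; the
function name xml_to_search_text is illustrative, and nothing is assumed
about the element names the conversion produces.

```python
import xml.etree.ElementTree as ET


def xml_to_search_text(xml_source):
    """Join all character data in an XML document into one searchable string."""
    root = ET.fromstring(xml_source)
    # itertext() walks every element in document order and yields its text.
    pieces = (piece.strip() for piece in root.itertext())
    return " ".join(piece for piece in pieces if piece)
```

The markup itself is discarded here, so the same XML can still be kept
for rendering to HTML later.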

hth

Phil
[EMAIL PROTECTED]
- Original Message -
From: Dieter Maurer <[EMAIL PROTECTED]>
To: Simon Coles <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Friday, August 04, 2000 9:06 PM
Subject: Re: [Zope] ZCatalog attachments?


> Simon Coles writes:
>  > We have binary files stored in Zope, for example Word documents (but
>  > could be any of a variety of document types).
>  >
>  > We would like to be able to index and search the contents of these
>  > files using ZCatalog. So if a Word file contains the word "Fred",
>  > then any search for "Fred" would include that file in the list of
>  > documents returned.
> Someone else already told you that you must create a parameterless
> method (it need not necessarily be named "PrincipiaSearchSource")
> that returns the file's content.
>
> You may not need to keep the rendered version around but
> may be able to extract the plain text on demand.
> I think, there is a "word.dll" that provides access to
> MS Word from applications. Alternatively, you could
> control Word via COM.
>
>
> Dieter






Re: [Zope] ZCatalog attachments?

2000-08-05 Thread Dieter Maurer

Simon Coles writes:
 > We have binary files stored in Zope, for example Word documents (but 
 > could be any of a variety of document types).
 > 
 > We would like to be able to index and search the contents of these 
 > files using ZCatalog. So if a Word file contains the word "Fred", 
 > then any search for "Fred" would include that file in the list of 
 > documents returned.
Someone else already told you that you must create a parameterless
method (it need not necessarily be named "PrincipiaSearchSource")
that returns the file's content.

You may not need to keep the rendered version around but
may be able to extract the plain text on demand.
I think there is a "word.dll" that provides access to
MS Word from applications. Alternatively, you could
control Word via COM.
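
A minimal sketch of the "extract on demand" idea, in plain Python rather
than as a real Zope product. The class name IndexableFile, the method name
search_text and the extract_text callable are all hypothetical; the only
point is that the catalog is given a method it can call with no arguments
to get text back.

```python
class IndexableFile:
    """Plain-Python stand-in for an object holding an uploaded binary file."""

    def __init__(self, data, extract_text):
        self.data = data                    # raw bytes of the uploaded file
        self._extract_text = extract_text   # hypothetical callable: bytes -> str

    def search_text(self):
        """Parameterless method that a text index could be pointed at.

        Nothing is cached: the plain text is extracted on every call.
        """
        return self._extract_text(self.data)

    # The same method under the conventional name, for indexes that expect it.
    PrincipiaSearchSource = search_text
```

For example, IndexableFile(b"Dear Fred", lambda raw: raw.decode("ascii"))
returns "Dear Fred" from either method; a real extractor for Word files
would take the place of the lambda.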


Dieter





Re: [Zope] ZCatalog attachments?

2000-08-04 Thread Aleksander Salwa


On Fri, 4 Aug 2000, Simon Coles wrote:

> We have binary files stored in Zope, for example Word documents (but 
> could be any of a variety of document types).
> 
> We would like to be able to index and search the contents of these 
> files using ZCatalog. So if a Word file contains the word "Fred", 
> then any search for "Fred" would include that file in the list of 
> documents returned.
> 
> Is anyone doing something like this? If so, how?
> 

A simple search of the binary data won't work, of course, because of the
complex format of Word documents. So:
Try keeping a 'rendered' version beside every document, converted to
plain text (created by saving it from Word in plain text format).
Then create a class representing your document. This class should provide
a parameterless method 'PrincipiaSearchSource' that returns the rendered
version of the document. This is untested, but it seems to be a step in the
right direction ;)
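
The suggestion can be illustrated with a small, self-contained sketch.
WordDocument and ToyTextIndex are hypothetical names; ToyTextIndex is a
tiny inverted index standing in for a catalog and is not ZCatalog's API.
It only shows why a search for "Fred" finds the document once the index
reads the plain-text rendering instead of the binary.

```python
import re
from collections import defaultdict


class WordDocument:
    """Holds the original binary next to its plain-text rendering."""

    def __init__(self, doc_id, data, rendered_text):
        self.id = doc_id
        self.data = data                    # the Word file as uploaded
        self.rendered_text = rendered_text  # e.g. saved from Word as plain text

    def PrincipiaSearchSource(self):
        """Parameterless method returning the text to be indexed."""
        return self.rendered_text


class ToyTextIndex:
    """Minimal inverted index: lowercased word -> set of document ids."""

    def __init__(self):
        self._postings = defaultdict(set)

    def index_object(self, obj):
        text = obj.PrincipiaSearchSource().lower()
        for word in re.findall(r"\w+", text):
            self._postings[word].add(obj.id)

    def search(self, word):
        return sorted(self._postings.get(word.lower(), ()))
```

Indexing a WordDocument whose rendering is "Lunch with Fred on Friday"
and then calling search("Fred") returns a list containing that
document's id.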


[EMAIL PROTECTED]

/--------------------------------------\
| `long long long' is too long for GCC |
\--------------------------------------/


