RE: [Zope] Advice on searching/indexing Word documents?

2001-01-04 Thread Tres Seaver

[EMAIL PROTECTED] wrote:
 
 I really like the idea of extending OFS:File to support different file
 types, but what I would like to see is something that is
 format/filter/library agnostic.

Please have a look at the Hookable PUT proposal (which has already
been implemented for 2.3):

  http://dev.zope.org/Wikis/Proposals/HookablePUTCreation

This project adds an API to the handler for HTTP/FTP PUT requests to
non-existent objects (so that you can specify/tweak the object which
is created).
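
As a rough illustration of the kind of hook this enables, here is a sketch in plain Python. It imports nothing from Zope; the function name `put_factory`, its arguments, and the `WordDocument` and `DefaultFile` classes are all made up for this example, so check the proposal page for the actual API.

```python
class DefaultFile:
    """Stand-in for whatever object a PUT would create by default."""

    def __init__(self, name, body):
        self.name = name
        self.body = body


class WordDocument(DefaultFile):
    """Hypothetical File-like class that knows how to handle Word uploads."""


def put_factory(name, content_type, body):
    """Choose the object to create for a PUT to a name that doesn't exist yet."""
    # Assumption: the hook sees the target name, the content type and the body.
    if content_type == "application/msword" or name.lower().endswith(".doc"):
        return WordDocument(name, body)
    return DefaultFile(name, body)


created = put_factory("cv.doc", "application/octet-stream", b"...")
print(type(created).__name__)
```

The point is only that the container, not the server, decides which class gets instantiated for a new upload.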

Handling PUT in the object directly (for WebDAV/FTP/HTTP uploads)
would be the job of your File-like object.  The PTK's content objects
do this now, for limited types of content (structured text with
RFC822-style headers for the metadata);  we plan to add other filters
there, as well.
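
For readers unfamiliar with the format, here is a minimal sketch of splitting RFC822-style metadata headers from a text body, using the standard library `email` module. This is not the PTK code; `parse_document` and the sample header names are assumptions chosen for illustration.

```python
from email import message_from_string


def parse_document(text):
    """Split 'Header: value' lines from the body that follows the blank line."""
    message = message_from_string(text)
    metadata = dict(message.items())  # header name -> header value
    body = message.get_payload()      # everything after the first blank line
    return metadata, body


SAMPLE = (
    "Title: Staff CV\n"
    "Subject: skills, zope\n"
    "\n"
    "Ten years of Python experience.\n"
)

metadata, body = parse_document(SAMPLE)
print(metadata["Title"])
print(body.strip())
```

A filter for another format would only need to produce the same two pieces: a metadata mapping and the searchable text.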

We'd be glad of your help defining the API.  Could you take the
text of your message and create a fishbowl proposal with it on
the dev.zope.org site?

Tres.
-- 
===
Tres Seaver          [EMAIL PROTECTED]
Digital Creations Zope Dealers   http://www.zope.org

___
Zope maillist  -  [EMAIL PROTECTED]
http://lists.zope.org/mailman/listinfo/zope
**   No cross posts or HTML encoding!  **
(Related lists - 
 http://lists.zope.org/mailman/listinfo/zope-announce
 http://lists.zope.org/mailman/listinfo/zope-dev )




RE: [Zope] Advice on searching/indexing Word documents?

2001-01-03 Thread sean . upton

I really like the idea of extending OFS:File to support different file
types, but what I would like to see is something that is
format/filter/library agnostic.  That is to say, perhaps the way we
ought to go about this is to create an API framework that, upon upload,
filters the file with a specified filter for its mime-type.  One option is
a generic base class that implements a generic API for filtering a
file, from which more specific classes can inherit for files of
particular types or groups (fine-grained by mime-type, or grouped by
category, e.g. "Illustration").
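
A minimal sketch of that base-class idea might look like the following. It is plain Python rather than anything built on the real Zope File class, and every name in it (`FileFilter`, `PlainTextFilter`, `FilteredFile`, `extract_text`) is hypothetical.

```python
class FileFilter:
    """Generic filter API: turn raw file bytes into indexable text."""

    mime_types = ()  # mime-types this filter claims to handle

    def extract_text(self, data):
        raise NotImplementedError("subclasses implement the actual filtering")


class PlainTextFilter(FileFilter):
    """The simplest possible concrete filter."""

    mime_types = ("text/plain",)

    def extract_text(self, data):
        return data.decode("utf-8", errors="replace")


class FilteredFile:
    """File-like object that runs its filter once, at upload time."""

    def __init__(self, name, file_filter):
        self.name = name
        self.file_filter = file_filter
        self.data = b""
        self.searchable_text = ""

    def upload(self, data):
        self.data = data
        # Filtering on upload means searches never have to touch the raw bytes.
        self.searchable_text = self.file_filter.extract_text(data)


cv = FilteredFile("cv.txt", PlainTextFilter())
cv.upload(b"Zope and Python skills")
print(cv.searchable_text)
```

A Word or PDF filter would be another `FileFilter` subclass with a different `extract_text`.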

Having such a generic framework would make Zope an excellent
platform for digital asset management.  Suppose you had a class for all files
serving a particular purpose, and those files would always be one of a particular
set of mime-types, like Illustrator, PDF, or PostScript.  For example, if
someone working at a newspaper creates a new file class called
"DisplayAd," which is used for PostScript files with embedded fonts
containing specific text, a filter set up as part of the extended
DisplayAd class would detect the type of file, determine it was PostScript,
and extract the text and the face names of the embedded fonts.  If the file
were a PDF or an AI file instead, it would run the appropriate filter for that type.

It might also be nice to have an extended class (inherited from File) that
works for all types and keeps a configurable plugin registry,
so that we can create plugin classes for specific mime-types but
only have to use one class for the objects themselves.  This might be more
practical.
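
The registry variant could be sketched roughly as below: one object class, plus a table of filter functions keyed by mime-type. The names `FILTERS`, `register_filter` and `GenericFile` are invented for the example, and the HTML filter is deliberately naive.

```python
from html.parser import HTMLParser

FILTERS = {}  # mime-type -> function(bytes) -> str


def register_filter(mime_type):
    """Decorator that adds a filter function to the registry."""
    def decorator(func):
        FILTERS[mime_type] = func
        return func
    return decorator


@register_filter("text/plain")
def plain_text(data):
    return data.decode("utf-8", errors="replace")


class _TextCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


@register_filter("text/html")
def html_text(data):
    parser = _TextCollector()
    parser.feed(data.decode("utf-8", errors="replace"))
    parser.close()
    # Collapse runs of whitespace left behind by the removed tags.
    return " ".join(" ".join(parser.chunks).split())


class GenericFile:
    """Single class for all uploads; the registry supplies the behaviour."""

    def __init__(self, name, mime_type, data):
        self.name = name
        self.mime_type = mime_type
        self.data = data
        extract = FILTERS.get(mime_type)
        # Unknown types are stored but contribute nothing to the index.
        self.searchable_text = extract(data) if extract else ""


page = GenericFile("ad.html", "text/html", b"<p>Zope <b>rocks</b></p>")
print(page.searchable_text)
```

Adding support for a new format then means registering one more function, with no new object class.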

One thing that seems important: creating an API like this could allow us to
write filter "plugins" in a variety of Zope supported configs, like
completely in python, a python class extending a C shared library, something
written in a combination of C/Lex, or the python-based plex scanner that was
mentioned earlier - for that matter, even proprietary user-space binaries
called via python code might be fair game...
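
For the "user-space binary" case, a plugin could be little more than a wrapper around `subprocess`. The sketch below assumes the external program reads the file on stdin and writes text to stdout; `run_external_filter` and `FAKE_CONVERTER` are made-up names, and the "converter" is just a Python one-liner standing in for a real tool.

```python
import subprocess
import sys


def run_external_filter(command, data, timeout=30):
    """Pipe raw file bytes through an external program; return its stdout as text."""
    completed = subprocess.run(
        command,
        input=data,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout.decode("utf-8", errors="replace")


# Stand-in "converter": upper-cases whatever arrives on stdin.
# A real deployment would name an actual converter binary here instead.
FAKE_CONVERTER = [
    sys.executable, "-c",
    "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read().upper())",
]

if __name__ == "__main__":
    print(run_external_filter(FAKE_CONVERTER, b"display ad"))
```

Because the wrapper only deals in bytes in and text out, it fits the same filter API as a pure-Python or C-backed plugin.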

I really think that this idea has potential as a project, and would be
willing to contribute.

Sean

-Original Message-
From: Bjorn Stabell [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, January 02, 2001 10:07 PM
To: [EMAIL PROTECTED]
Subject: RE: [Zope] Advice on searching/indexing Word documents?


This is something I've wanted for a long time.  wvWare is cool, and
it should also be able to access properties of many Windows (OLE)
documents, not just Word documents.

I've been thinking about extending the File class so that it becomes
aware of the different file types and allows access to (read/write) meta
data and indexing of the files' content.  If we can set up a nice
framework for it, I'm sure a lot of people could contribute code for
specific file formats.

Bye,
-- 
Bjorn

-Original Message-
From: Jens Vagelpohl [mailto:[EMAIL PROTECTED]]
Posted At: Wednesday, January 03, 2001 11:28
Posted To: Zope List
Conversation: [Zope] Advice on searching/indexing Word documents?
Subject: Re: [Zope] Advice on searching/indexing Word documents?


if you're on linux check out WVWare:

http://www.wvware.com

it's a C library that handles all word doc formats since 6.0 or so

jens


On Tue, 02 Jan 2001, Bowyer, Alex wrote:
 Our company has a repository of staff CVs (Resumes) as Word Documents and I
 am about to embark on creating a new feature for our Zope Intranet to allow
 project managers to search those documents for keywords such as particular
 skills or projects.

 I am thinking about several possibilities such as a skills/CVs database
 linked in via ODBC, or some task that converts the Word documents to text
 files which can then be searched by Zope (I think Zope can do this, and I
 assume it can't search Word format directly?).

 Has anyone ever approached a similar problem, does anyone have any tips on
 how to index/search a load of documents in Zope?

 Any tips/suggestions/comments would be most welcome.

 Thanks,

 Alex






Re: [Zope] Advice on searching/indexing Word documents?

2001-01-03 Thread Jonothan Farr

This sounds pretty exciting. Sounds like someone should set up a proposal on
dev.zope.org. I'm afraid I wouldn't be able to contribute much development right
now but I'd be willing to help test and participate in discussions.

--jfarr

- Original Message -
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Wednesday, January 03, 2001 8:25 AM
Subject: RE: [Zope] Advice on searching/indexing Word documents?


 I really like the idea of extending OFS:File to support different file
 types, but what I would like to see is something that is
 format/filter/library agnostic.  That is to say, that perhaps the way we
 ought to go about this is to create an API framework that upon upload
 filters the file with a specified filter for its mime-type.

[snip]







RE: [Zope] Advice on searching/indexing Word documents?

2001-01-02 Thread sean . upton

I've been thinking about doing this. I wonder if there are any C filter
libraries that read Word docs. The Word 2000 docs are supposedly non-binary,
so you could probably write a parser of sorts in Python or C/Lex; I used to
write text filters in C and Lex for my previous employer - one of these days
I will figure out how to extend Python with C and do this.  I'm thinking
about doing this type of thing in order to make PDFs searchable (as well as
IPTC caption data in JPG files).

Perhaps in the meantime, one could set up a macro in the Normal.dot template
file that ftps the doc to Zope on every save and updates properties
containing the full text for the document.  Sort of kludgy, but I assume it
would work, if you were familiar with VBA coding, and had access to an HTTP
client component.

Doing it this way would make it so you would likely have to manually reindex
the catalog.  There might be a way around that though, to automate it...
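
For comparison, the upload step itself is small if done from Python over HTTP rather than from a VBA macro. This is only a sketch: the URL is a placeholder, authentication is left out entirely, and `build_put_request` and `upload` are names invented here.

```python
import urllib.request


def build_put_request(url, data, content_type="application/msword"):
    """Prepare (but do not send) an HTTP PUT carrying the document bytes."""
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={"Content-Type": content_type},
    )


def upload(url, data, content_type="application/msword"):
    """Send the PUT and return the HTTP status code from the server."""
    request = build_put_request(url, data, content_type)
    with urllib.request.urlopen(request) as response:
        return response.status


# Placeholder URL; nothing is sent until upload() is called.
request = build_put_request("http://localhost:8080/cvs/alex.doc", b"...")
print(request.get_method(), request.full_url)
```

Reindexing the catalog after the upload would still be a separate step.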

Sean

=
Sean Upton
Senior Programmer/Analyst
SignOnSanDiego.com
The San Diego Union-Tribune
619.718.5241
[EMAIL PROTECTED]
=


-Original Message-
From: Bowyer, Alex [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, January 02, 2001 2:45 PM
To: '[EMAIL PROTECTED]'
Subject: [Zope] Advice on searching/indexing Word documents?


Our company has a repository of staff CVs (Resumes) as Word Documents and I
am about to embark on creating a new feature for our Zope Intranet to allow
project managers to search those documents for keywords such as particular
skills or projects.

I am thinking about several possibilities such as a skills/CVs database
linked in via ODBC, or some task that converts the Word documents to text
files which can then be searched by Zope (I think Zope can do this, and I
assume it can't search Word format directly?).

Has anyone ever approached a similar problem, does anyone have any tips on
how to index/search a load of documents in Zope?

Any tips/suggestions/comments would be most welcome.

Thanks,

Alex

==
Alex Bowyer
IT Consultant, Logica Australasia
Tel: +61 2 9202 8130
Fax: +61 2 9922 7466
E-mail : [EMAIL PROTECTED]
WWW: http://www.logica.com.au/
==





Re: [Zope] Advice on searching/indexing Word documents?

2001-01-02 Thread Jonothan Farr

I used to
 write text filters in C and Lex for my previous employer - one of these days
 I will figure out how to extend python with C and do this.  

Here's one that's written entirely in Python:
http://www.cosc.canterbury.ac.nz/~greg/python/Plex/

I've seen a couple of other implementations out there.

--jfarr







Re: [Zope] Advice on searching/indexing Word documents?

2001-01-02 Thread Phil Harris

Alex,

If you're running Zope on Win32 you could use COM to snatch the text fairly
easily and quickly.

The COM interface to Word is well documented in the help files (if they're 
not installed, you'll find them on the CD).

I used this technique to create XML from Word in the past and it works (was 
too slow for my needs tho, YMMV)
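
A rough sketch of the COM approach from Python, assuming the pywin32 package and an installed copy of Word on the same machine. The function name `word_to_text` and the optional `dispatch` argument are inventions for this example; the argument exists only so the logic can be exercised without Word.

```python
def word_to_text(path, dispatch=None):
    """Open a Word document through COM and return its text content."""
    if dispatch is None:
        # Windows only; requires pywin32 and Word to be installed.
        from win32com.client import Dispatch as dispatch
    word = dispatch("Word.Application")
    word.Visible = False
    try:
        document = word.Documents.Open(path)
        try:
            return document.Content.Text
        finally:
            document.Close(False)  # close without saving changes
    finally:
        word.Quit()  # always shut the Word process down again
```

Starting and stopping Word for every document is part of why this is slow; a batch job could keep one instance open across many files.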

hth

Phil
[EMAIL PROTECTED]

On Tue, 02 Jan 2001, Bowyer, Alex wrote:
 Our company has a repository of staff CVs (Resumes) as Word Documents and I
 am about to embark on creating a new feature for our Zope Intranet to allow
 project managers to search those documents for keywords such as particular
 skills or projects.

 I am thinking about several possibilities such as a skills/CVs database
 linked in via ODBC, or some task that converts the Word documents to text
 files which can then be searched by Zope (I think Zope can do this, and I
 assume it can't search Word format directly?).

 Has anyone ever approached a similar problem, does anyone have any tips on
 how to index/search a load of documents in Zope?

 Any tips/suggestions/comments would be most welcome.

 Thanks,

 Alex

 ==
 Alex Bowyer
 IT Consultant, Logica Australasia
 Tel: +61 2 9202 8130
 Fax: +61 2 9922 7466
 E-mail : [EMAIL PROTECTED]
 WWW: http://www.logica.com.au/
 ==





RE: [Zope] Advice on searching/indexing Word documents?

2001-01-02 Thread sean . upton

Cool.  I'll have to take a look at this.  Does anyone know if there is any
effort aimed at writing document filters for use with Zope?  A lot of
commercial products used for knowledge management (like NextPage
LivePublish, some Intranet search engines, etc) already have features like
this, and I would think that a project for document filters would be a good
idea, if something like this doesn't already exist.

Possible things that could be filtered for input:
- The IPTC header data from a JPG/TIF image - contains a few things like the
caption (the same one that you can edit in Photoshop) - This would be a good
addition to various Image classes.
- Office documents (word, excel, powerpoint, wordperfect, staroffice, etc)
- PDF and Postscript documents
- Illustration files (Illustrator, CorelDraw)

The value of such filters to Zope for use in knowledge-management and
digital asset management would be great; I'm wondering if anyone is working
on anything like this?

Sean

-Original Message-
From: Jonothan Farr [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, January 02, 2001 4:22 PM
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: [Zope] Advice on searching/indexing Word documents?


I used to
 write text filters in C and Lex for my previous employer - one of these days
 I will figure out how to extend python with C and do this.  

Here's one that's written entirely in Python:
http://www.cosc.canterbury.ac.nz/~greg/python/Plex/

I've seen a couple of other implementations out there.

--jfarr

