RE: [Zope] Advice on searching/indexing Word documents?
[EMAIL PROTECTED] wrote:
>
> I really like the idea of extending OFS:File to support different file
> types, but what I would like to see is something that is
> format/filter/library agnostic.

Please have a look at the "Hookable PUT" proposal (which has already been implemented for 2.3):

  http://dev.zope.org/Wikis/Proposals/HookablePUTCreation

This project adds an API to the handler for HTTP/FTP PUT requests to non-existent objects (so that you can specify/tweak the object which is created). Handling PUT in the object directly (for WebDAV/FTP/HTTP "uploads") would be the job of your File-like object. The PTK's content objects do this now, for limited types of content (structured text with RFC822-style headers for the metadata); we plan to add other filters there, as well.

We'd be glad of your help defining the API. Could you take the text of your message and create a "fishbowl" proposal with it on the dev.zope.org site?

Tres.
--
===
Tres Seaver [EMAIL PROTECTED]
Digital Creations "Zope Dealers" http://www.zope.org

___
Zope maillist - [EMAIL PROTECTED]
http://lists.zope.org/mailman/listinfo/zope
** No cross posts or HTML encoding! **
(Related lists -
 http://lists.zope.org/mailman/listinfo/zope-announce
 http://lists.zope.org/mailman/listinfo/zope-dev )
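A rough sketch of the idea behind the hook, in plain Python with no Zope imports. It assumes, as I read the proposal, that the container can supply a method named PUT_factory which receives the new object's name, its content type, and the request body, and returns the object to create. The classes StoredFile, TextDocument and UploadFolder are hypothetical stand-ins, not Zope or PTK classes.

```python
class StoredFile:
    """Hypothetical stand-in for a generic File-like object."""

    def __init__(self, name, content_type, body):
        self.name = name
        self.content_type = content_type
        self.body = body


class TextDocument(StoredFile):
    """Hypothetical stand-in for a text content object."""


class UploadFolder:
    """Hypothetical container that decides what a PUT to a new name creates."""

    def PUT_factory(self, name, typ, body):
        # Pick a class from the major content type; fall back to StoredFile.
        major = (typ or "").split("/", 1)[0]
        cls = TextDocument if major == "text" else StoredFile
        return cls(name, typ, body)


folder = UploadFolder()
cv = folder.PUT_factory("cv.txt", "text/plain", b"Skills: Zope, Python")
doc = folder.PUT_factory("cv.doc", "application/msword", b"\xd0\xcf\x11\xe0")
print(type(cv).__name__, type(doc).__name__)
```

A format-specific filter could then live in the class that the factory returns, which keeps the upload handler itself unaware of file formats.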
Re: [Zope] Advice on searching/indexing Word documents?
This sounds pretty exciting. Sounds like someone should set up a proposal on dev.zope.org. I'm afraid I wouldn't be able to contribute much development right now, but I'd be willing to help test and participate in discussions.

--jfarr

- Original Message -
From: <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Wednesday, January 03, 2001 8:25 AM
Subject: RE: [Zope] Advice on searching/indexing Word documents?

> I really like the idea of extending OFS:File to support different file
> types, but what I would like to see is something that is
> format/filter/library agnostic. That is to say, that perhaps the way we
> ought to go about this is to create an API framework that upon upload
> filters the file with a specified filter for its mime-type.

[snip]
RE: [Zope] Advice on searching/indexing Word documents?
I really like the idea of extending OFS:File to support different file types, but what I would like to see is something that is format/filter/library agnostic. That is to say, perhaps the way we ought to go about this is to create an API framework that, upon upload, filters the file with a specified filter for its mime-type. Perhaps we could create a generic base class that implements a generic API for filtering a file, and extend it by inheriting more specific classes for files of particular types or groups (fine-grained to mime-type, or grouped by category, e.g. "Illustration").

Having such a generic framework would enable Zope to be an excellent platform for digital asset management. Suppose you had a class for all files for a particular purpose, and those files would always be of a particular set of mime-types, like Illustrator, PDF, or PostScript. For example, suppose someone working at a newspaper creates a new file class instance called "DisplayAd," which is used for PostScript files with embedded fonts, containing specific text. A filter set up as part of the extended class for DisplayAd files would detect the type of file, determine that it was PostScript, and extract the text and the face names of the embedded fonts. If the file were instead a PDF or an AI file, it would run the appropriate filter for that type.

It might also be nice to have an extended class (inherited from File) that works for all types and keeps some sort of configurable plugin registry, so that we can create plugin classes for specific mime-types but only have to use one class for the objects themselves. This might be more practical.
One thing that seems important: creating an API like this could allow us to write filter "plugins" in a variety of Zope-supported configurations: completely in Python, as a Python class extending a C shared library, as something written in a combination of C/Lex, or with the Python-based Plex scanner that was mentioned earlier. For that matter, even proprietary user-space binaries called via Python code might be fair game...

I really think that this idea has potential as a project, and would be willing to contribute.

Sean

-Original Message-
From: Bjorn Stabell [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, January 02, 2001 10:07 PM
To: [EMAIL PROTECTED]
Subject: RE: [Zope] Advice on searching/indexing Word documents?

This is something I've been wanting for a long time. wvWare is cool, and it should also be able to access properties of many Windows (OLE) documents, not just Word documents.

I've been thinking about extending the File class so that it becomes aware of the different file types and allows access to (read/write) metadata and indexing of the files' content. If we can set up a nice framework for it, I'm sure a lot of people could contribute code for specific file formats.

Bye,
--
Bjorn
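The plugin registry idea from Sean's message above (one File-like class, with filters looked up by mime-type) might be sketched roughly as follows. This is only an illustration under stated assumptions: FilterRegistry, plain_text_filter and the "major/*" wildcard convention are hypothetical names and choices, not part of Zope or of any existing proposal.

```python
class FilterRegistry:
    """Hypothetical registry mapping mime-types to filter callables."""

    def __init__(self):
        self._filters = {}

    def register(self, mime_type, func):
        # mime_type may be exact ("application/pdf") or a wildcard ("text/*").
        self._filters[mime_type] = func

    def lookup(self, mime_type):
        # Prefer an exact match, then fall back to the "major/*" wildcard.
        if mime_type in self._filters:
            return self._filters[mime_type]
        major = mime_type.split("/", 1)[0]
        return self._filters.get(major + "/*")

    def filter(self, mime_type, data):
        # Return extracted properties, or an empty dict if no filter applies.
        func = self.lookup(mime_type)
        if func is None:
            return {}
        return func(data)


def plain_text_filter(data):
    # Simplest possible filter: decode the bytes and expose them as "text".
    return {"text": data.decode("utf-8", errors="replace")}


registry = FilterRegistry()
registry.register("text/*", plain_text_filter)

result = registry.filter("text/plain", b"Zope and Python")
unknown = registry.filter("application/msword", b"\xd0\xcf")
print(result, unknown)
```

Because a filter is just a callable taking bytes and returning a dict, it could equally wrap a C extension, a Plex scanner, or an external binary, which is the flexibility the message argues for.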
RE: [Zope] Advice on searching/indexing Word documents?
This is something I've been wanting for a long time. wvWare is cool, and it should also be able to access properties of many Windows (OLE) documents, not just Word documents.

I've been thinking about extending the File class so that it becomes aware of the different file types and allows access to (read/write) metadata and indexing of the files' content. If we can set up a nice framework for it, I'm sure a lot of people could contribute code for specific file formats.

Bye,
--
Bjorn

-Original Message-
From: Jens Vagelpohl [mailto:[EMAIL PROTECTED]]
Posted At: Wednesday, January 03, 2001 11:28
Posted To: Zope List
Conversation: [Zope] Advice on searching/indexing Word documents?
Subject: Re: [Zope] Advice on searching/indexing Word documents?

if you're on linux check out wvWare:

  http://www.wvware.com

it's a C library that handles all word doc formats since 6.0 or so

jens
Re: [Zope] Advice on searching/indexing Word documents?
if you're on linux check out wvWare:

  http://www.wvware.com

it's a C library that handles all word doc formats since 6.0 or so

jens

On Tue, 02 Jan 2001, Bowyer, Alex wrote:
> Our company has a repository of staff CVs (Resumes) as Word Documents and I
> am about to embark on creating a new feature for our Zope Intranet to allow
> project managers to search those documents for keywords such as particular
> skills or projects.
>
> I am thinking about several possibilities such as a skills/CVs database
> linked in via ODBC, or some task that converts the Word documents to text
> files which can then be searched by Zope (I think Zope can do this, and I
> assume it can't search Word format directly?).
>
> Has anyone ever approached a similar problem, does anyone have any tips on
> how to index/search a load of documents in Zope?
>
> Any tips/suggestions/comments would be most welcome.
>
> Thanks,
>
> Alex
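One way to use wvWare from Python without writing a C extension is to call its command-line converter. The sketch below assumes the wvText script that ships with wvWare is on the PATH and takes an input file followed by an output file; that invocation and the helper names build_command and word_to_text are assumptions, so check them against the installed version.

```python
import os
import shutil
import subprocess
import tempfile


def build_command(doc_path, out_path, tool="wvText"):
    # Assumed invocation: <tool> <input.doc> <output.txt>
    return [tool, doc_path, out_path]


def word_to_text(doc_path, tool="wvText"):
    """Return the text extracted by the converter, or None if it is not installed."""
    if shutil.which(tool) is None:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        out_path = os.path.join(tmp, "out.txt")
        # check=True raises CalledProcessError if the converter fails.
        subprocess.run(build_command(doc_path, out_path, tool), check=True)
        with open(out_path, encoding="utf-8", errors="replace") as fh:
            return fh.read()
```

The returned text could then be stored as a property on the Zope object and indexed like any other text.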
RE: [Zope] Advice on searching/indexing Word documents?
Cool. I'll have to take a look at this.

Does anyone know if there is any effort aimed at writing document filters for use with Zope? A lot of commercial products used for knowledge management (like NextPage LivePublish, some Intranet search engines, etc.) already have features like this, and I would think that a project for document filters would be a good idea, if something like this doesn't already exist.

Possible things that could be filtered for input:

- The IPTC header data from a JPG/TIF image
  - contains a few things like the caption (the same one that you can edit in Photoshop)
  - This would be a good addition to various Image classes.
- Office documents (Word, Excel, PowerPoint, WordPerfect, StarOffice, etc.)
- PDF and PostScript documents
- Illustration files (Illustrator, CorelDraw)

The value of such filters to Zope for use in knowledge management and digital asset management would be great; I'm wondering if anyone is working on anything like this?

Sean

-Original Message-
From: Jonothan Farr [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, January 02, 2001 4:22 PM
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: [Zope] Advice on searching/indexing Word documents?

> I used to
> write text filters in C and Lex for my previous employer - one of these days
> I will figure out how to extend python with C and do this.

Here's one that's written entirely in Python:

  http://www.cosc.canterbury.ac.nz/~greg/python/Plex/

I've seen a couple of other implementations out there.

--jfarr
Re: [Zope] Advice on searching/indexing Word documents?
Alex,

If you're running Zope on Win32 you could use COM to snatch the text fairly easily and quickly. The COM interface to Word is well documented in the help files (if they're not installed, you'll find them on the CD).

I used this technique to create XML from Word in the past and it works (it was too slow for my needs though, YMMV).

hth

Phil
[EMAIL PROTECTED]

On Tue, 02 Jan 2001, Bowyer, Alex wrote:
> Our company has a repository of staff CVs (Resumes) as Word Documents and I
> am about to embark on creating a new feature for our Zope Intranet to allow
> project managers to search those documents for keywords such as particular
> skills or projects.
>
> I am thinking about several possibilities such as a skills/CVs database
> linked in via ODBC, or some task that converts the Word documents to text
> files which can then be searched by Zope (I think Zope can do this, and I
> assume it can't search Word format directly?).
>
> Has anyone ever approached a similar problem, does anyone have any tips on
> how to index/search a load of documents in Zope?
>
> Any tips/suggestions/comments would be most welcome.
>
> Thanks,
>
> Alex
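A minimal sketch of the COM approach Phil describes, assuming a Windows machine with Word installed and the pywin32 package available. The function names are hypothetical, and the Word calls follow the Word object model as I understand it (Documents.Open, Content.Text, Close, Quit); treat the details as assumptions and check them against the Word help files he mentions.

```python
def clean_word_text(raw):
    # Word's Range.Text ends paragraphs with a carriage return,
    # so normalise those to newlines for indexing.
    return raw.replace("\r\n", "\n").replace("\r", "\n")


def extract_text_via_com(path):
    """Open a Word document through COM and return its text (Windows only)."""
    # Imported lazily so this module still loads where pywin32 is absent.
    import win32com.client

    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        # Positional arguments: FileName, ConfirmConversions, ReadOnly
        doc = word.Documents.Open(path, False, True)
        try:
            return clean_word_text(doc.Content.Text)
        finally:
            doc.Close(False)  # close without saving changes
    finally:
        word.Quit()
```

Starting Word for every document is slow, as Phil notes, so a batch job that keeps one Word instance open for a whole directory of CVs would probably be the more practical shape.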
Re: [Zope] Advice on searching/indexing Word documents?
> I used to
> write text filters in C and Lex for my previous employer - one of these days
> I will figure out how to extend python with C and do this.

Here's one that's written entirely in Python:

  http://www.cosc.canterbury.ac.nz/~greg/python/Plex/

I've seen a couple of other implementations out there.

--jfarr
RE: [Zope] Advice on searching/indexing Word documents?
I've been thinking about doing this. I wonder if there are any C filter libraries that read Word docs. The Word 2000 docs are supposedly non-binary, so you could probably write a parser of sorts in Python or C/Lex; I used to write text filters in C and Lex for my previous employer - one of these days I will figure out how to extend Python with C and do this. I'm thinking about doing this type of thing in order to make PDFs searchable (as well as IPTC caption data in JPG files).

Perhaps in the meantime, one could set up a macro in the Normal.dot template file that FTPs the doc to Zope on every save and updates properties containing the full text of the document. Sort of kludgy, but I assume it would work if you were familiar with VBA coding and had access to an HTTP client component. Doing it this way, you would likely have to reindex the catalog manually. There might be a way around that, though, to automate it...

Sean

=
Sean Upton
Senior Programmer/Analyst
SignOnSanDiego.com
The San Diego Union-Tribune
619.718.5241
[EMAIL PROTECTED]
=

-Original Message-
From: Bowyer, Alex [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, January 02, 2001 2:45 PM
To: '[EMAIL PROTECTED]'
Subject: [Zope] Advice on searching/indexing Word documents?

Our company has a repository of staff CVs (Resumes) as Word Documents and I am about to embark on creating a new feature for our Zope Intranet to allow project managers to search those documents for keywords such as particular skills or projects.

I am thinking about several possibilities such as a skills/CVs database linked in via ODBC, or some task that converts the Word documents to text files which can then be searched by Zope (I think Zope can do this, and I assume it can't search Word format directly?).

Has anyone ever approached a similar problem, does anyone have any tips on how to index/search a load of documents in Zope?

Any tips/suggestions/comments would be most welcome.
Thanks,

Alex

==
Alex Bowyer
IT Consultant, Logica Australasia
Tel: +61 2 9202 8130
Fax: +61 2 9922 7466
E-mail : [EMAIL PROTECTED]
WWW: http://www.logica.com.au/
==
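Once the Word documents have been converted to plain text by whatever means, the keyword search Alex asks about comes down to an inverted index. The sketch below is a bare-bones illustration in plain Python; it is not how ZCatalog is implemented, and KeywordIndex, tokenize and the sample file names are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lower-case the text and split it into runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


class KeywordIndex:
    """Hypothetical inverted index: word -> set of document ids."""

    def __init__(self):
        self._postings = defaultdict(set)

    def index(self, doc_id, text):
        for word in tokenize(text):
            self._postings[word].add(doc_id)

    def search(self, query):
        # Return the ids of documents containing every word in the query.
        words = tokenize(query)
        if not words:
            return set()
        hits = set(self._postings.get(words[0], set()))
        for word in words[1:]:
            hits &= self._postings.get(word, set())
        return hits


cvs = KeywordIndex()
cvs.index("alice.doc", "Skills: Python, Zope, Oracle")
cvs.index("bob.doc", "Skills: Java, Oracle")
print(sorted(cvs.search("oracle")))
print(sorted(cvs.search("zope oracle")))
```

In Zope itself the extracted text would more likely be stored as a property on each object and handed to a catalog text index, so the catalog rather than hand-written code answers the queries.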