I am sure this is going to end up a long email so I will apologise in advance.
I am looking for some advice on which direction to take in a project I am considering at work.

We currently use Lotus Notes for email and document storage. (This is my fault, a decision made 6 or 7 years ago before I knew better.) We have a little over a thousand 1 or 2 page documents stored in native Notes format. They are things like product specifications, recipes, etc. These are available to staff internally using native Notes access. Notes also serves them to the web for customer access, dynamically converting them to HTML, and provides indexing, searching and browsing. The documents tend to be in only about 6 standard formats.

I want to move away from Notes for a number of reasons, including:

- Notes is proprietary.
- Notes native is OK but Notes web is slow.
- I am keen to serve the documents as PDFs. These seem to be reasonably universally readable and have several other advantages. They make it much more difficult for end users to modify the documents. (We have had problems with people downloading large portions of our web site and presenting it as their own.) We can also include things like a watermark.

However, storing the documents as PDF seems like a bad option. I have been thinking about this for a while and, as usual with Linux, there is more than one way to do it. I have lots of ideas and questions bumping around in my head, but none jumps out as an ideal solution. The purpose of this email is to seek advice about what direction to head in and/or where to look for more information.

My ideas/questions include:

- We could edit the documents in something like OpenOffice, save as PostScript and convert from there. This would provide flexibility and a good editing platform, but I think it would be slow and cumbersome. OpenOffice is supposed to store as XML. However, a cursory look at a file created by OpenOffice didn't show anything human readable.
- Should I store the text in a database or in individual files? The database encourages structure and makes it possible to apply simple changes across all documents. Storing as individual files provides flexibility at the expense of having lots of small things to manipulate.
- I have read a little about Zope but am not sure whether it might be useful or a bit too much like Notes. Zope might be useful for the rest of the web site.
- As the documents are generally limited to about 6 different layouts, DocBook might be a good option. If so, do we store the text in files or a database? How do I edit it? How do I train authorised users to modify documents?
- We have experimented a little with LaTeX. Is LaTeX a better option and, if so, do we store the markup bits with the text, or store the text in a database and add markup on the fly to present documents in a standard format? One advantage of LaTeX seems to be that we could store each document individually and so have very flexible formats. However, again, training could be a problem. No WYSIWYG.
- How do we provide the indexing, searching and browsing functions with any/all of the above?

I am leaning a little towards PostgreSQL and Perl for several reasons:

1) Gus says Perl is good.
2) I have a little (read very little) Perl experience.
3) I suspect in the long run it is going to be easier and more efficient to store all the text in an RDBMS than as text files.
4) I need to have a system for authenticating users and ensuring they have access only to selected subsets of the documents. Documents could be more easily categorised in the RDBMS.

I am not sure whether this project is beyond our capabilities given other commitments. It may be necessary to contract out the initial work and then take up maintenance / improvements ourselves.

Thank you & regards,
Steven

--
SLUG - Sydney Linux User's Group - http://slug.org.au/
More Info: http://lists.slug.org.au/listinfo/slug
