+1
I too will need to store binaries in the not-so-distant future. I also
agree with the conclusions Tom has drawn.
Tom, would it be possible for this solution to allow some form of
envelope for header-type information and possibly a (proprietary) system ID
(not the collection-document ID)? This would be incredibly helpful. Maybe a
'BinaryCollectionFacade' could actually contain two hidden collections: an
XmlCollection and a BinaryCollection. The XML collection holds the envelope
with searchable fields and references the documentId of the binary in the
binary collection. The binary is stored exactly as you described, but we get
the best of both worlds without sacrificing any performance. This way I
can look at the info for all binaries without actually having to retrieve
them. It may sound more familiar if I say it's like an HTTP 'HEAD' request.
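A minimal Python sketch of the facade idea, assuming plain dicts stand in for the two hidden collections. The class name comes from the proposal above; the method names, the envelope's element and attribute names, and the use of UUIDs as binary ids are all hypothetical and not part of any Xindice API. It shows the envelope referencing the binary by id, and a HEAD-style lookup that never reads the binary itself.

```python
import uuid
import xml.etree.ElementTree as ET


class BinaryCollectionFacade:
    """Hypothetical facade over two hidden collections (sketch only).

    Plain dicts stand in for the XML envelope collection and the binary
    collection; nothing here is an actual Xindice API.
    """

    def __init__(self):
        self._xml = {}     # binary id -> serialized XML envelope
        self._binary = {}  # binary id -> raw bytes, stored untouched

    def store(self, data: bytes, system_id: str, headers: dict) -> str:
        binary_id = uuid.uuid4().hex
        self._binary[binary_id] = bytes(data)
        # The envelope references the binary by id and carries searchable fields.
        env = ET.Element("envelope", systemId=system_id,
                         binaryId=binary_id, size=str(len(data)))
        for name, value in headers.items():
            ET.SubElement(env, "header", name=name).text = value
        self._xml[binary_id] = ET.tostring(env, encoding="unicode")
        return binary_id

    def head(self, binary_id: str) -> dict:
        """Like an HTTP HEAD: return envelope fields without reading the binary."""
        env = ET.fromstring(self._xml[binary_id])
        info = dict(env.attrib)
        for header in env.findall("header"):
            info[header.get("name")] = header.text
        return info

    def find(self, name: str, value: str) -> list:
        """Search envelopes only; no binary is touched."""
        return [bid for bid in self._xml if self.head(bid).get(name) == value]

    def get(self, binary_id: str) -> bytes:
        """Fetch the full binary only when it is actually needed."""
        return self._binary[binary_id]


if __name__ == "__main__":
    facade = BinaryCollectionFacade()
    bid = facade.store(b"\x89PNG\r\n", "sys-42", {"contentType": "image/png"})
    print(facade.head(bid))
```

In a real deployment the envelope store would be an XML collection that can be queried directly, so a lookup like `find` would be a query over envelopes rather than a Python loop.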
Kevin Ross
www.bredex.com
Tom Bradford <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Date: 06/28/2002 10:37 AM
Subject: Re: Binary Files
Please respond to xindice-dev
On Thursday, June 27, 2002, at 02:51 PM, Francesco Bellomi wrote:
> I would rather see binary as lower level than XML, not the other way
> round. XML itself needs to be ultimately encoded in some binary form
> (such as UTF-8) to be written to a file, whereas encoding binary as
> (base64) XML is a less-than-optimal solution for both space and time.
OK, so here is the issue with binary resources in Xindice, and
hopefully this will allow people to think of it from an implementation
perspective rather than from a knee-jerk 'we need this' point of view.
Support for binary resources would *not* be as easy as everyone says it
would be, for one very simple reason. When you mix and match tokenized
document streams (which is how documents are represented inside
Xindice, rather than as text) and binary streams in a single collection, you
open up the possibility of major data corruption when people start
reading/writing the binary image of XML documents directly (accidentally
or not), or when you try to read/overwrite a binary resource as if it
were a document.
There are two solutions to this. The first is to have a special
signature at the beginning of *every* tokenized stream that identifies
it as such, so that the collection manager can check individual streams
to determine exactly what they are. This would lessen the possibility
of data corruption, though not eliminate it completely. It is an
expensive operation and would also require changing the tokenized format.
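A small Python sketch of the signature approach, assuming a made-up 4-byte marker; Xindice's actual tokenized format has no such prefix, which is exactly why adopting this would mean a format change. It shows that every read has to peek at a record's leading bytes, and that a binary which happens to begin with the same bytes is still misclassified, so the check lessens corruption but cannot rule it out.

```python
# Hypothetical 4-byte signature; Xindice's real tokenized format has none.
TOKENIZED_MAGIC = b"\x00XTK"


def wrap_tokenized(stream: bytes) -> bytes:
    """Prefix a tokenized document stream with the signature on write."""
    return TOKENIZED_MAGIC + stream


def classify(record: bytes) -> str:
    """Every read must peek at the first bytes to decide what a record is."""
    return "xml" if record.startswith(TOKENIZED_MAGIC) else "binary"


def read_document(record: bytes) -> bytes:
    """Refuse to treat a non-tokenized record as a document."""
    if classify(record) != "xml":
        raise TypeError("record is not a tokenized XML document")
    return record[len(TOKENIZED_MAGIC):]


if __name__ == "__main__":
    doc = wrap_tokenized(b"<tokens>")
    print(classify(doc), classify(b"\x89PNG\r\n"))  # xml binary
    # A binary that happens to start with the same bytes is misclassified,
    # which is why a signature lessens corruption but cannot rule it out.
    print(classify(TOKENIZED_MAGIC + b"raw image bytes"))  # xml
```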
The other option is to tag a collection as either 'binary' or 'xml'. I
am not opposed to saying that a collection can be a 'binary collection'
or an 'xml collection', but not both. My vote would be +1 for this
option. It wouldn't require changes to the tokenized format, and it is a
solution that I think everyone can live with.
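A minimal Python sketch of tagging a collection once, at creation time. The class, enum, and method names are hypothetical, not Xindice's API, and a dict stands in for storage. The kind check is a single metadata comparison per call, so stored records need no signature and the tokenized format stays as it is; reading a binary as a document simply fails.

```python
from enum import Enum


class CollectionKind(Enum):
    XML = "xml"
    BINARY = "binary"


class TypedCollection:
    """Hypothetical collection whose kind is fixed once, at creation time."""

    def __init__(self, name: str, kind: CollectionKind):
        self.name = name
        self.kind = kind
        self._records = {}  # key -> tokenized document or raw bytes

    def _require(self, kind: CollectionKind) -> None:
        # One metadata check per call; records themselves carry no signature.
        if self.kind is not kind:
            raise TypeError(f"{self.name!r} is a {self.kind.value} collection, "
                            f"not {kind.value}")

    def put_document(self, key: str, tokenized: bytes) -> None:
        self._require(CollectionKind.XML)
        self._records[key] = tokenized

    def get_document(self, key: str) -> bytes:
        self._require(CollectionKind.XML)
        return self._records[key]

    def put_binary(self, key: str, data: bytes) -> None:
        self._require(CollectionKind.BINARY)
        self._records[key] = bytes(data)

    def get_binary(self, key: str) -> bytes:
        self._require(CollectionKind.BINARY)
        return self._records[key]


if __name__ == "__main__":
    images = TypedCollection("images", CollectionKind.BINARY)
    images.put_binary("logo", b"\x89PNG\r\n")
    try:
        images.get_document("logo")  # reading a binary as a document
    except TypeError as err:
        print(err)  # 'images' is a binary collection, not xml
```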
> By the way, I do currently use base64 for embedding binary in my
> Xindice database, but it's not a good solution.
FWIW, I agree with this.
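The space cost of the base64 workaround is easy to put in numbers with a short Python sketch using only the standard library (the 1,000,000-byte payload is an arbitrary example). Padded base64 turns every 3 input bytes into 4 output characters, so an embedded binary grows by about a third before any surrounding XML is counted, and it must be encoded on every write and decoded on every read.

```python
import base64
import os


def base64_size(n: int) -> int:
    """Length of padded standard base64 output for n input bytes."""
    return 4 * ((n + 2) // 3)


payload = os.urandom(1_000_000)      # ~1 MB of arbitrary binary data
encoded = base64.b64encode(payload)  # what gets embedded in the XML
decoded = base64.b64decode(encoded)  # and decoded again on every read

assert decoded == payload
assert len(encoded) == base64_size(len(payload))
print(len(payload), "->", len(encoded), "bytes")  # 1000000 -> 1333336 bytes
```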
--
Tom Bradford - http://www.tbradford.org
Architect - XQRL (XQuery Engine) - http://www.xqrl.com
Apache Xindice (XML Database) - http://xml.apache.org/xindice
Labrador (Web Services Hub) - http://www.notdotnet.org/labrador