Hi Dave --

On Thu, 13 Mar 2003, at 10:14 [-0800], Dave Viner ([EMAIL PROTECTED]) wrote:
> Hi Gary,
> I too don't want this discussion to be a "mine is better than yours".
> Bickering rarely leads to a good solution. Sorry if my initial comments
> came off that way. I really didn't intend them to have that effect.

No problem, I was just making it clear that I don't want to go there either... To clarify a little more, even... right now I think we are still clarifying our understanding of 1) what we can do with the different implementations, 2) what the gains and penalties are for the different implementations, and 3) whether the gains justify having two. Until we seem to share the same idea of all three, at least in terms of the facts, I'm going to keep bringing up relevant points until I can't think of any more :) So, in the immortal words of Frank Zappa, "here's some more".

>
> You're correct in that your implementation avoids the double disk write
> issue. I'm not sure what craziness was coursing thru me when I missed that,
> rather obvious, point.
>
> This topic was discussed (at length) before:
> http://marc.theaimsgroup.com/?l=xindice-dev&m=104066672331546&w=2
> http://marc.theaimsgroup.com/?l=xindice-dev&m=103946437104874&w=2
> http://marc.theaimsgroup.com/?l=xindice-dev&m=103828918030140&w=2
> http://marc.theaimsgroup.com/?t=102873960400001&r=1&w=2

I had read the more recent thread and will read the 8/2002 thread later today. Thanks for the links!

>
> The "inline" metadata approach certainly provides a lot of functionality.
> However, as noted in some of the archived messages, for some applications
> (mine included), altering the document itself is simply not an option.
> True, one could provide an API to fetch the document without the inlined
> metadata, but that requires more work than I'd want to do simply for the
> possibility of accessing metadata.

Let me see if I understand what you are implying with "altering the document itself is simply not an option". There are essentially four cases of interest:

1) The original save of the document.
   No problem, inline metadata goes into the record with the document.
2) Modification of the document.
   Doesn't happen!
3) Modification of the system metadata without changing the document.
   Last access time seems like the canonical example where this is an issue, and I hadn't thought of it before.
4) Introducing metadata into a collection which doesn't have it.
   Again, the document must be read and re-written.

I am guessing that (3) is what you're talking about, and it's a good point. Is "most recent access time" typically available in SQL databases, for instance? Is it something your applications need or use? It _sounds_ useful, if only for reports, and the inline model cannot support it efficiently. I suspect that the BTree could be modified to manage certain kinds of metadata efficiently, but that's yet another set of changes. It does make me wonder if that's the way I should be approaching this issue, though... first record block is metadata, following blocks are data? Interesting...

As for an API to fetch the document without the metadata, no separate API is required. Fetching a document works the same way it always has: the metadata is automatically stripped off by the reader plugins, with essentially no performance penalty. I haven't yet dealt with accessing system metadata from outside the internals, but I understand the desire to do so. I suspect the API you're using now would work fine for the inline system metadata as well, and the inline performance advantage would still apply, assuming caching.

>
> On a separate note, have you performance tested the existing MetaData
> implementation and found it to be below your requirements? If so, are you
> at liberty to disclose your requirements and tests? I'd love to see Xindice
> improve the performance of Metadata if it's subpar.

That's an excellent question. I saw the doubling issue and thought "this is crazy". That's as far as I went. I take your point: if ya can't tell, does it matter?
On the other hand, for a general-purpose tool like a database, I have the impression that the goal is to go as fast as possible, within reason. Xindice could have been built as file-per-document, directory-per-collection, and it would have been only a little slower. Lots of much easier things could have been done, but in fact the current solution is fairly close to state of the art. What drove that? Not a particular application, I'd wager (and maybe lose :), but probably someone's urge to "get it right".

I am all for metadata. For optional metadata that is only updated by the user, independent of the document, performance is not that critical, and your solution is optimal anyway! But doubling access time for every document read or write truly concerns me. Can it really be argued that doubling disk accesses is irrelevant? I hear your argument and would like to pursue this question further.

>
> Have you contacted Murray about the XNode implementation? It was put into
> the scratchpad, and looked promising but we hit some odd licensing issues
> that were never resolved. (Or at least that's the last I remember of it.)

No, I haven't. I will try to read about it again; I've forgotten what I read the first time!

Regards,
Gary

> dave