2008/6/26 Alexander Wagner <[EMAIL PROTECTED]>:
>
> > Build it into scid: well this is the way around using
> > UIDs for categories and keywords. Not a flexible way but
> > a doable way. I'd suggest to read them in from a file,
> > though. That way every contributor could add a new term
> > if (s)he feels necessary without touching program code.
> >
> > I prefer predefined categories, even if the list can be extended from
> > time to time. If you leave this "user defined", you will get:
> > - tactics
> > - tactic
> > - tactical shot
> > - taktik
>
> That's why I always vote for normalised material. ;)
> Actually, as a librarian you'd set up a thesaurus to resolve
> all this to the normalised term.
>
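A thesaurus of the kind described above can be sketched as a simple lookup table. This is only an illustration in Python: the table contents and the `normalise` helper are hypothetical and not part of Scid.

```python
# Hypothetical thesaurus: every known variant maps to one normalised term.
THESAURUS = {
    "tactics": "tactics",
    "tactic": "tactics",
    "tactical shot": "tactics",
    "taktik": "tactics",
}


def normalise(term):
    """Return the normalised term, or None when the variant is unknown."""
    return THESAURUS.get(term.strip().lower())
```

An unknown variant returns None, so it still has to be reviewed by a person before it enters the table.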
I prefer a process that does not permit human errors. See how French people
test software:
http://www.monkeyproofsoftware.com/en/
and especially the validation process that Scid *must* pass:
http://www.monkeyproofsoftware.com/en/validation.php
>
> > 4. UID
> [...]
> > <basename>:<gameversion>-<number>
> >
> >
> > PGN tags are handled as strings. With this layout, each game will have
> > an UID which will cost around 20 chars. So the sg3 file is increased by
> > 60 MB for a 3 M games DB. So the average number of games in each block
> > will decrease, leading to an overall search penalty (more I/O). Note
> > that events, white and black players tags for example are stored as
> > references in the file sn3, because there are some repetitions. Of
> > course UID does not repeat ...
>
> Up to the bare number it does. But I see your point
> nevertheless. You'd prefer a bare number to store it as an
> integer? Eg. something like 3 digits for the version 12
> digits for the individual number just concatenated and
> filled by 0 up to 12 digits all the time, resulting in a 15
> digits bare number. Would this be more efficient? Ie. if one
> translates
> CentriScid:12-12345678
>
> as
>
> 012000012345678
>
> stored as 12000012345678.
>
> This does not work out for PGN headers. Did I get that
> right?
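The packing described in the quoted example can be sketched in Python as follows. The `pack_uid` helper is hypothetical. Note that the base name is not encoded in the number, so it would have to be stored elsewhere. A 15-digit number fits comfortably in a 64-bit integer.

```python
def pack_uid(uid):
    """Split '<basename>:<gameversion>-<number>' and pack it into 15 digits.

    Returns (basename, zero-padded 15-digit string, integer value).
    """
    basename, rest = uid.split(":", 1)
    version, number = rest.split("-", 1)
    if len(version) > 3 or len(number) > 12:
        raise ValueError("version or number does not fit the 3+12 digit layout")
    digits = f"{int(version):03d}{int(number):012d}"
    # Converting to int drops the leading zero, as in the example above.
    return basename, digits, int(digits)
```

Called with "CentriScid:12-12345678", this yields the digit string 012000012345678 and the integer 12000012345678.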
The penalty is at least 15 bytes + tag (uid) = 18 bytes. I checked the
"enormous" base on Crafty's site, and each game is 82 bytes on average. So
the UID would make up about 18% of each stored game (18 bytes out of 100)!
That figure is high; it should drop to around 8% for games whose header part
carries more info (as in Chessbase, for example), where the average size of
a game is about 200 bytes.
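The percentages above can be checked with a few lines of arithmetic. The 18% and 8% figures correspond to the UID's share of the enlarged game record. Measured against the original game size, the increase is somewhat higher. The function names are only illustrative.

```python
UID_BYTES = 18  # 15 digits plus the tag, as estimated above


def uid_share(avg_game_bytes, uid_bytes=UID_BYTES):
    """Fraction of the enlarged game record that is taken up by the UID."""
    return uid_bytes / (avg_game_bytes + uid_bytes)


def uid_increase(avg_game_bytes, uid_bytes=UID_BYTES):
    """Growth of the game data relative to its original size."""
    return uid_bytes / avg_game_bytes


for size in (82, 200):
    print(size, f"{uid_share(size):.0%}", f"{uid_increase(size):.0%}")
```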
> BTW: the multiple fields you suggested, wouldn't
> they produce a much larger overhead? (It's a serious
> question.)
No, because the idea is to use a flag (one that already exists) and add
information only to the relevant games. For example, consider the tactics
flag: only a few games would need this flag along with the extra PGN tags.
Consider the main references:
- CT Art: 1200 games
- Polgar's endgames, middlegame and endings books: around 5000 games each.
If we manage to set flags and PGN tags correctly in a big ref DB for as
many games as above, this would mean about 5000*50 bytes = 250 kB of extra
info in the sg3 file, with nearly no penalty on searches (because of the flag).
And having games annotated / flagged would give very good "added value"
to Scid's users. In fact I would gladly accept a performance decrease in
exchange for a DB with a lot of info and training exercises!
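To put the two approaches side by side, here is a rough comparison in Python. It reuses the figures from this thread (3 M games, 18 bytes per UID, 5000 flagged games at 50 bytes each). The variable names are only illustrative.

```python
GAMES_IN_DB = 3_000_000            # size of the big reference DB
UID_BYTES_PER_GAME = 18            # UID stored on every game
FLAGGED_GAMES = 5_000              # games that carry the extra PGN tags
EXTRA_BYTES_PER_FLAGGED_GAME = 50  # extra tag data per flagged game

uid_total = GAMES_IN_DB * UID_BYTES_PER_GAME
tags_total = FLAGGED_GAMES * EXTRA_BYTES_PER_FLAGGED_GAME

print(f"UID on every game : {uid_total / 1e6:.0f} MB")
print(f"tags on flagged   : {tags_total / 1e3:.0f} kB")
print(f"ratio             : {uid_total // tags_total}x")
```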
> Actually, I thought I could reduce that by using
> a single string.
>
> > From a DB point of view, the key to find a game is for
> > example the tuple (white, black, date, round). You
> > propose a UID that acts as a "primary key" or "index" if
> > you refer to the RDBMS point of view (DB2, ORACLE).
>
> Sure, a primary key. I always think of a real DB. Maybe
> that's a fault with scid.
>
Not at all. Scid's design is what gives it its performance. Suppose you have
UIDs in a base, with chunks allocated to users, and that you merge 10 small
DBs (from those users) into this big refDB. Then you sort the whole base by
ECO, and after that you save a new game: could you give me a quick algorithm
defining a new UID for this new game? I already have some workarounds in
mind, but this is no piece of cake. And the added value of this tricky
process for the end user is very small.
Also consider merging 2 bases where users did not respect their UID chunks
(human error): for each game inserted you have to check that its UID does
not already exist among the other 3M UIDs.
Remember that with RDBMS like ORACLE or DB2 you have to run "indexes
reorganisation" from time to time or you will get poor performance. Scid
does not have this drawback, because it has no indexes (that is no UIDs).
The maintenance of Scid bases is simple and fast enough. And Scid's
performance remains stable.
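The per-game check needed when merging bases with UIDs can be sketched like this. It is a minimal Python illustration, not Scid code: `merge_with_uid_check` and the data layout are assumptions. A set keeps each lookup cheap on average, but all existing UIDs must be held in memory or in a maintained index, which is exactly the extra upkeep discussed above.

```python
def merge_with_uid_check(ref_uids, incoming):
    """Merge (uid, game) pairs into a base whose UIDs are in the set ref_uids.

    Returns (accepted, collisions). ref_uids is updated in place.
    """
    accepted, collisions = [], []
    for uid, game in incoming:
        if uid in ref_uids:
            # UID already used: the contributor left their chunk, or the
            # game was merged before. Someone has to resolve this by hand.
            collisions.append((uid, game))
        else:
            ref_uids.add(uid)
            accepted.append((uid, game))
    return accepted, collisions
```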
[...]
> > We can already easily import one DB into another with Scid, (CTRL D +
> > drag and drop). The distribution of diff is trivial.
>
> The point is not the import but the creation of the diff.
>
If I append new games to a base:
- I reset the flag "user" (for example) for all games
- I enter my games with the "user" flag
- I set the filter to all "user" flagged games
- I copy the filter to the diff base
Or: my base contains 1000 games. I enter 20 new games (the diff). I export
the 20 latest games.
Or: I set a PGN tag (v12) that I remove after the diff has been generated.
I think there are many ways of creating a diff that are not as good as
"export version 11 to a new base", but certainly "good enough".
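The first two ways of building a diff can be modelled in a few lines of Python. This is only a sketch: games are plain dicts with a "flags" set, and `diff_by_flag` and `diff_by_tail` are hypothetical helpers, not Scid functions.

```python
def diff_by_flag(games, new_games, flag="user"):
    """Reset the flag everywhere, add flagged new games, return the flagged ones."""
    for game in games:
        game["flags"].discard(flag)      # reset the flag for all games
    for game in new_games:
        game["flags"].add(flag)          # enter the new games with the flag
        games.append(game)
    # The filter: every game that carries the flag is part of the diff.
    return [game for game in games if flag in game["flags"]]


def diff_by_tail(games, count):
    """Return the latest `count` games, assuming new games are appended."""
    return games[-count:] if count > 0 else []
```

Both approaches rely on the base not being sorted or compacted between adding the games and exporting the diff.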
Pascal
_______________________________________________
Scid-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scid-users