Re: [Scid-users] Database for Scid: CentriScid

Pascal Georges Thu, 26 Jun 2008 11:27:27 -0700

2008/6/25 Alexander Wagner <[EMAIL PROTECTED]>:

> Pascal Georges wrote:
>
> It is worth asking whether the array of flags could be
> extended a bit, that is, whether there could be more flags
> without breaking compatibility. (E.g. personally I'd like to
> have more user flags.)
>


This is not possible. There are only 16 flags.
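The example below shows why the count is fixed. It assumes the flags are packed into one 16-bit field per game. That is an illustration, not a description of Scid's actual index layout. Under that assumption, a 17th flag would need a 17th bit, which means a larger record and a break with existing files. The flag names and helpers in this minimal Python sketch are hypothetical:

```python
from enum import IntFlag

FLAG_FIELD_BITS = 16  # assumed width of the per-game flags field


class GameFlag(IntFlag):
    """Hypothetical subset of flags; each one owns a single bit."""
    DELETE = 1 << 0
    ENDGAME = 1 << 1
    TACTICS = 1 << 2
    USER = 1 << 3


def pack_flags(flags: GameFlag) -> bytes:
    """Serialise the flags into an assumed 2-byte little-endian field."""
    return int(flags).to_bytes(FLAG_FIELD_BITS // 8, "little")


def has_flag(field: bytes, flag: GameFlag) -> bool:
    """A flag search is one bitwise AND per game: no tag parsing needed."""
    return (int.from_bytes(field, "little") & int(flag)) != 0


def bit_for_flag(index: int) -> int:
    """Bit mask for the index-th flag (0-based); index 16 does not fit."""
    if not 0 <= index < FLAG_FIELD_BITS:
        raise OverflowError(f"flag #{index + 1} needs bit {index}, outside the 16-bit field")
    return 1 << index
```

The same bitwise test is why flags make searching fast. Checking one bit per game costs far less than comparing header strings.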

[...]


> 1. Categories
>
>    I think it is most suitable to define the necessary
>    categories while actually building the DB. One could set
>    up a handful and add new ones as necessary. You surely
>    will not know in advance all the categories needed, nor
>    whether the ones invented will actually be used. (This is
>    direct experience from my job with our rule sets, and
>    those are really complex rules.)
>
>    Why: this avoids a lot of theoretical speculation and
>    the invention of groups that are not actually needed.
>
>    How: when a category is defined, write it down. It is
>    essential that everyone in the group uses the same words
>    for the categories.
>
>    Build it into Scid: this is the way around using UIDs
>    for categories and keywords. Not a flexible way, but a
>    doable one. I'd suggest reading them in from a file,
>    though. That way every contributor could add a new term
>    if (s)he feels it necessary, without touching program code.
>

I prefer predefined categories, even if the list can be extended from time to
time. If you leave this "user defined", you will get:
- tactics
- tactic
- tactical shot
- taktik

for the same category, and then searches become impossible. This is an
exaggerated example, but the problem is real.
Moreover, categories can be translated if they form a finite list, but not
otherwise.
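The search problem can be sketched in Python, along with the reason a finite list makes translation possible. The category names, the German labels and the helper functions are hypothetical. They were chosen only to mirror the example above:

```python
from enum import Enum


class Category(Enum):
    """Closed, predefined set of categories (hypothetical entries)."""
    TACTICS = "tactics"
    ENDGAME = "endgame"
    OPENING = "opening"


# Display labels per language: only possible because the set is finite.
LABELS = {
    "en": {Category.TACTICS: "Tactics", Category.ENDGAME: "Endgame", Category.OPENING: "Opening"},
    "de": {Category.TACTICS: "Taktik", Category.ENDGAME: "Endspiel", Category.OPENING: "Eröffnung"},
}


def parse_category(text: str) -> Category:
    """Accept only predefined values; raises ValueError for e.g. "taktik"."""
    return Category(text)


def drop_down_choices(lang: str) -> list[str]:
    """Labels for a drop-down box, in the order the categories are defined."""
    return [LABELS[lang][c] for c in Category]


def count_exact(tags: list[str], wanted: str) -> int:
    """Exact-match search over free-text tags, as a plain search would do."""
    return sum(1 for t in tags if t == wanted)


# Free-text tagging: four spellings meant as one category.
free_text_tags = ["tactics", "tactic", "tactical shot", "taktik"]
```

`count_exact(free_text_tags, "tactics")` finds only one of the four games. With `parse_category`, the variants are rejected when the tag is entered instead of being silently missed at search time. The same closed list also feeds `drop_down_choices`.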


>
> 2. Keywords
>
>    Pascal names some which are surely suitable. I'd suggest
>    putting them in a list that is also extended while
>    building the DB. You most likely will not know all the
>    terms needed in advance.
>
>    Why and how: see 1. Categories


Same as above; in addition, I prefer predefined keywords, so that drop-down
boxes can be used.


>
>
> 3. Flags
>
>    It's IMHO essential to use them to allow fast
>    searching. I'd suggest setting flags while building up
>    the reference DB I thought aloud about.
>

Yes, this is mandatory.

>
> 4. UID
>
>    IMHO this is essential. The lack of UIDs is IMHO a major
>    current drawback in Scid. I have been missing this
>    feature for a while.
>
>    Why: it allows unique referencing of a game within a DB.
>    The game number is not a good ID, as it changes if the
>    base is resorted, appended, compressed and so on.
>
>    A unique ID is the _only_ reliable way for a computer to
>    get a unique answer for a query. Michal's argument "if I
>    know that game I know the players" only holds as long as
>    Michal is the one looking at it; it no longer holds once I
>    want to make automatic queries, e.g. to extract a certain
>    part of the DB. (See next point.)
>
>    How: it should be unique, allow identification of the
>    associated base, and allow for versioning, as a game may
>    change over the history of the DB. All this should be
>    reflected by the UID.
>
>    I suggest (again):
>
>    <basename>:<gameversion>-<number>
>

PGN tags are handled as strings. With this layout, each game will have a UID
costing around 20 characters, so the sg3 file grows by about 60 MB for a
3-million-game DB. The average number of games in each block will therefore
decrease, leading to an overall search penalty (more I/O). Note that the
Event, White and Black tags, for example, are stored as references into the
sn3 file, because their values repeat. A UID, of course, never repeats.
From experience, games take around 100 bytes (often less), so adding a UID,
even in compressed binary form, would increase the average game size by a
non-negligible percentage.
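The orders of magnitude can be checked directly. This assumes one byte per UID character and the roughly 100-byte average game size quoted above:

```latex
\begin{aligned}
\Delta_{\text{sg3}} &\approx 20\,\tfrac{\text{bytes}}{\text{game}} \times 3\times10^{6}\ \text{games} = 6\times10^{7}\ \text{bytes} \approx 60\ \text{MB},\\
\frac{\Delta_{\text{game}}}{\text{game size}} &\approx \frac{20\ \text{bytes}}{100\ \text{bytes}} = 20\,\%.
\end{aligned}
```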

From a DB point of view, the key to find a game is, for example, the tuple
(white, black, date, round). You propose a UID that acts as a "primary key"
or "index" in RDBMS terms (DB2, Oracle). But Scid's queries are of a
different kind, so this key is not strictly necessary (there are no join
queries).
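Below is a minimal Python sketch of using the natural key (white, black, date, round) instead of a UID. The tag values and helper names are illustrative, and the date is written with PGN's `??` placeholders rather than guessed. The point is that the key is derived from the headers. It therefore stays valid when a game is copied to another base, while the game number does not:

```python
from typing import NamedTuple


class GameKey(NamedTuple):
    """Natural key built from header tags, in place of a stored UID."""
    white: str
    black: str
    date: str
    round: str


def key_of(headers: dict) -> GameKey:
    return GameKey(headers["White"], headers["Black"], headers["Date"], headers["Round"])


def build_index(games: list) -> dict:
    """Map each natural key to the game's number (1-based) in one base."""
    index = {}
    for number, headers in enumerate(games, start=1):
        key = key_of(headers)
        if key in index:
            # Two games share the key: the base cannot designate them uniquely.
            raise ValueError(f"duplicate key {key}")
        index[key] = number
    return index


spassky_fischer_3 = {"White": "Spassky, Boris", "Black": "Fischer, Robert James",
                     "Date": "1972.??.??", "Round": "3"}
other = {"White": "Player, A", "Black": "Player, B", "Date": "2008.06.26", "Round": "1"}

base_a = [other, spassky_fischer_3]  # game number 2 in this base
base_b = [spassky_fischer_3]         # copied over: game number 1 here
```

The game number differs between `base_a` and `base_b`, but `key_of(spassky_fischer_3)` finds the game in both. The duplicate check in `build_index` flags the case where the tuple is not unique.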

Moreover, I prefer the reference "Spassky Fischer 1972, round 3" to "game
my_huge_ref_db:v12-12345678", because if I copy a game from one base to
another, "Spassky Fischer 1972, round 3" is still valid, whereas "game
my_huge_ref_db:v12-12345678" is not.

And:
- as a human, I prefer "Spassky Fischer 1972, round 3"
- as a computer, I don't need the UID.

[...]



>    This does _not_ necessarily mean that the ref-db contains
>    all versions that ever existed (though I think this has
>    some charming side effects). BUT it makes it easy to
>    distribute a diff between versions of the DB. Distributing
>    a diff is not a matter of bandwidth, it is a matter of
>    convenience for the user: if I add my own additions to
>    the ref DB (e.g. analysis I did), I do not want to throw
>    them away because a new DB comes out. I want to smoothly
>    add the new contents.
>

We can already easily import one DB into another with Scid (Ctrl+D and drag
and drop), so distributing a diff is trivial.
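Given a reliable natural key, a release diff reduces to a set difference, and merging it leaves the user's own additions untouched. Below is a minimal Python sketch under that assumption. The function names are hypothetical, and this is not how Scid's import is implemented:

```python
def game_key(game: dict) -> tuple:
    """Natural key (white, black, date, round) taken from the header tags."""
    return (game["White"], game["Black"], game["Date"], game["Round"])


def release_diff(old_release: list, new_release: list) -> list:
    """Games in the new release whose key is absent from the old one."""
    old_keys = {game_key(g) for g in old_release}
    return [g for g in new_release if game_key(g) not in old_keys]


def merge_into(user_base: list, diff: list) -> list:
    """Append diff games the user does not already have; user games are kept as-is."""
    existing = {game_key(g) for g in user_base}
    return user_base + [g for g in diff if game_key(g) not in existing]
```

This sketch only detects added games. A corrected game that keeps the same key would need a field-by-field comparison, which is the case the versioning proposed in point 4 is meant to cover.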

[...]


> 5. Training DBs and RefDB (i.e. CentriScid)
>
>    Whenever I talked about CentriScid, I was talking about a
>    (hopefully large) high-quality reference DB. This is
>    important. I never referred to small specialised training
>    DBs. CentriScid, for me, is meant to grow large.
>
>    I suggest creating this large reference DB with a strong
>    focus on the quality of the header tags (what I call
>    metadata), on completeness with regard to tournaments and
>    events, and on move orders as far as possible. I'd make
>    "as reliable as possible", not "as big as possible", the
>    primary target.
>

OK, I don't want to be negative, but where is the starting point for such a DB?

[...]

>      Still, it's up to the community to decide whether the
>      V:12 or the V:17 version of the game in question is
>      kept.


I don't believe in a real "community" process here. For example, look at the
contributions at

http://www.chesspositiontrainer.com/English/Downloads/FreeChessRepertoires.aspx

Each repertoire is made by a single person, and nobody ever gives feedback
when they find an improvement or enhance the repertoire. That's life.

[...]


>    I'd make CentriScid open to new tags added by the
>    community of users who are not actively working on
>    CentriScid. But this _requires_ UIDs, so that you know
>    exactly which game the information provided from
>    outside should be added to.


This is not absolutely necessary. You can precisely designate a game by the
tuple described above. If you cannot, the DB is of poor quality.

[...]

>    I suggest generating training DBs from the large DB as
>    dumps of specific subsets, e.g. make a query against
>    CentriScid that selects all rook endings and copy the
>    result to a DB for endgame training.


This looks OK to me.
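The query-and-dump step can be sketched in a few lines of Python. Several parts of this are assumptions made for illustration:

* The `ending` field stands for a classification computed while building the reference DB. Detecting rook endings from the moves is not shown.
* The players and the position are made up.
* Each training entry keeps only the start position and the solution, and the header tags point back to the full game.

`SetUp` and `FEN` are the standard PGN tags for a game that starts from a given position.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RefGame:
    white: str
    black: str
    date: str
    round: str
    ending: Optional[str]  # hypothetical precomputed class, e.g. "rook"
    start_fen: str         # position where the exercise begins
    solution: str          # solution as SAN movetext


def dump_subset(games: list, ending: str) -> list:
    """The 'query': select all reference games of one ending type."""
    return [g for g in games if g.ending == ending]


def to_training_pgn(g: RefGame) -> str:
    """Start position plus solution; the header tags identify the full game."""
    return (
        '[Event "?"]\n[Site "?"]\n'
        f'[Date "{g.date}"]\n[Round "{g.round}"]\n'
        f'[White "{g.white}"]\n[Black "{g.black}"]\n'
        '[Result "*"]\n[SetUp "1"]\n'
        f'[FEN "{g.start_fen}"]\n\n'
        f"{g.solution} *\n"
    )


reference = [
    RefGame("Player, A", "Player, B", "2008.06.26", "1", "rook",
            "6k1/5ppp/8/8/8/8/r4PPP/3R2K1 w - - 0 1", "1. Rd8#"),
    RefGame("Player, C", "Player, D", "2008.06.26", "2", None,
            "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1", "1. e4"),
]
training_db = [to_training_pgn(g) for g in dump_subset(reference, "rook")]
```

Because the dump is regenerated from the reference DB, the training DBs stay disposable, and only the reference DB needs careful maintenance.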


>    At this point, my primary idea was not to put the whole
>    game into this training DB, but just the interesting
>    start position and the solution, PLUS a PGN header field
>    containing the UID (see 4.) of the game itself, so the
>    user can easily look up the entire game. Hence,
>    CentriScid would contain the _whole_ game, but you would
>    not train against CentriScid itself, but against a
>    specialised partial dump that looks like the current
>    training DBs.
>
>    I suggest setting up a simple automatic query tool to
>    generate such a partial dump. This does not need to be
>    done by the community setting up the CentriScid DB.
>

This is OK in theory, but I'd prefer to get my hands on something concrete,
just to put some reality behind all that.

Pascal

_______________________________________________
Scid-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scid-users
