Pascal Georges wrote:
Hi!
Let me join in here and take up a few points. First, let me
remark that there is still a fundamental misunderstanding
of my suggestions. I hope I can clear that up.
> To get a base for Scid that can be used for training (for example) and
> to keep track of extra info, I think there is a workaround by mixing
> indexed flags and PGN tags.
Hey! You got it :)
> Each game can get one or several flags that are :
> IDX_FLAG_START // Game has own start position.
[...]
> IDX_FLAG_USER // User-defined flag.
The question is whether the array of flags could be extended
a bit, that is, whether there could be more flags without
breaking any compatibility. (Personally, I'd like to have
more user flags.)
> So imagine you have a big (or small ... it also works) DB and want to
> keep track of tactics. So each relevant game gets the flag
> IDX_FLAG_TACTICS
> and for example the PGN tag is appended :
> FLAG_TACTICS_data "Removal of the guard/23/black/easy"
>
> that is in order : type/move/side/difficulty
You got it again :) Except for one single point (I'll come
to that): I would not need the FLAG_TACTICS_data.
> Searches in base are fast because the prefilter of indexed flags.
Yes! :)
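A minimal sketch of the two-step search described above: a cheap test on the indexed flag first, and only then a look into the PGN tag. The bit value of IDX_FLAG_TACTICS and the Game class are assumptions made for illustration; they are not taken from Scid's code.

```python
from dataclasses import dataclass, field

# Hypothetical bit value; Scid's real flag constants are not reproduced here.
IDX_FLAG_TACTICS = 1 << 0


@dataclass
class Game:
    """Stand-in for a database entry: indexed flags plus PGN tags."""
    flags: int = 0
    tags: dict = field(default_factory=dict)


def parse_tactics_data(value):
    """Split 'type/move/side/difficulty' into named fields."""
    kind, move, side, difficulty = value.split("/")
    return {"type": kind, "move": int(move), "side": side,
            "difficulty": difficulty}


def find_tactics(games, difficulty):
    """Prefilter on the flag, then inspect the tag of the survivors."""
    hits = []
    for game in games:
        if not game.flags & IDX_FLAG_TACTICS:
            continue  # flag not set: skip without touching the tags
        raw = game.tags.get("FLAG_TACTICS_data")
        if raw and parse_tactics_data(raw)["difficulty"] == difficulty:
            hits.append(game)
    return hits


games = [
    Game(IDX_FLAG_TACTICS,
         {"FLAG_TACTICS_data": "Removal of the guard/23/black/easy"}),
    Game(0, {}),
]
easy = find_tactics(games, "easy")
```

Only games that pass the flag test ever have their tag parsed, which is where the speed-up would come from.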
> The data added to "tactics" should be defined once for all and part of
> Scid's UI.
I think a file of tags that is read in would be more
suitable for this kind of data. I'll explain in a minute
why.
> The most interesting flags are (I think)
> Middlegame
> Endgame
> Tactics
Depends. I also use a lot: white opening, black opening,
brilliancy, blunder and actually user.
> So for each category could someone list the necessary fields like, for
> the tactics example :
>
> Category Tactics :
> type : pin, overburden, ... , undefined
> move : the move number
> side : white or black
> difficulty : very easy, easy, medium, difficult, very difficult,
> undefined
> solved : solved or unsolved
> comment : free text
Ok, now my suggestions, actually the other way round, as your
mail is ordered that way. Let's start with the "keyword"-like
stuff. First of all: I did _NEVER_ suggest reinventing some
system like DDC. It was just meant as an example, as especially
those colleagues from the Americas take it in with their
mother's milk.
Generally, my ideas aim to _MINIMISE_ the work for the
contributors, not to maximise it. IMHO the contributors
should have the maximum amount of time for collecting and
indexing, not for formal stuff. (Some of the ideas below
actually come from experience in my job.)
1. Categories
I think it is most suitable to define the necessary
categories while actually building the DB. One could set
up a handful and add new ones as necessary. You surely
will not know in advance all the categories needed, nor
whether the ones you invent will actually be used. (This is
direct experience from my job with our rule sets, and those
are really complex rules.)
Why: this avoids a lot of theoretical "think about" and
the invention of groups that are actually not needed.
How: if a category is defined write it down. It is
essential that all people in the group use the same words
for the categories.
Build it into scid: well, this is the way around using
UIDs for categories and keywords. Not a flexible way, but
a doable one. I'd suggest reading them in from a file,
though. That way every contributor could add a new term
if (s)he feels it necessary, without touching program code.
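A rough sketch of how such a term file could be read, assuming a plain-text format with one term per line and "#" for comments. The format and the function names are made up for illustration; nothing like this exists in scid as far as this mail is concerned.

```python
import io


def load_terms(lines):
    """Read one term per line; blank lines and '#' comments are ignored."""
    terms = set()
    for line in lines:
        term = line.strip()
        if term and not term.startswith("#"):
            terms.add(term.lower())
    return terms


def unknown_terms(used, known):
    """Return the terms a contributor used that are not in the shared list."""
    return sorted(t for t in used if t.lower() not in known)


# An in-memory file stands in for the shared term file here.
sample = io.StringIO("# categories\nTactics\nEndgame\n\nMiddlegame\n")
known = load_terms(sample)
missing = unknown_terms(["Tactics", "Opening"], known)
```

Adding a category then means adding a line to the file, and a check like unknown_terms helps to make sure everybody in the group uses the same words.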
2. Keywords
Pascal names some which are surely suitable. I'd suggest
putting them in a list that is also extended while
building the DB. You most likely will not know all the
terms necessary in advance.
Why and how: see 1. Categories
3. Flags
It's IMHO essential to use them to allow for fast
searching. I'd suggest setting flags while building up the
reference DB I thought aloud about.
Why: Flagging correctly, together with proper
keywording, allows drawing exactly those games from the big
DB to create a small, specialised training collection
from it.
This saves work as it is done only once, i.e. those
working on the RefDB also create the training DBs on
the fly and vice versa. (I.e. someone who sets up a
training DB could add the games to stage 3, see below.
They would not have to be complete yet!)
4. UID
IMHO this is essential. That there are no UIDs is IMHO a
major current drawback in scid. I have been missing this
feature for a while.
Why: it allows unique referencing of a game within a DB.
The game number is not a good ID, as it changes if the
base is re-sorted, appended to, compressed and so on.
A unique ID is the _only_ reliable way for a computer to
get a unique answer to a query. Michal's argument "if I
know that game I know the players" only holds as long as
Michal is the one looking at it. It no longer holds once I
want to make automatic queries, e.g. to extract a certain
part of the DB. (See next point.)
How: it should be unique, allow the identification of the
associated base, and allow for versioning, as a
game may change over the history of the DB. All this should
be reflected by the UID.
I suggest (again):
<basename>:<gameversion>-<number>
This ID would be placed in a normal PGN header field. I
suggest using CmailGameName, as this is already used in
the CC code to make up for the lack of UIDs. (The name has to
do with compatibility with cmail, used in email chess.)
The idea of this format is that even a computer program
could extract from the UID:
- which database to load
- which game number to search for
- whether the game's version is correct
This allows things to be done automatically later on
if the need arises. You can say: this need does not arise.
I tell you from experience: it does, and you'll damn the
day you decided not to add this single line of metadata.
(I already do from time to time. ;)
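To illustrate how a program could take such a UID apart, here is a small sketch for the suggested <basename>:<gameversion>-<number> form. The regular expression, the class and the function names are assumptions; in particular it assumes that version and number are plain decimal integers and that the basename contains no colon.

```python
import re
from typing import NamedTuple


class GameUID(NamedTuple):
    basename: str   # which database to load
    version: int    # which version of the game this is
    number: int     # which game to search for


# Assumed syntax: basename without ':', then decimal version and number.
_UID_RE = re.compile(r"^(?P<base>[^:]+):(?P<version>\d+)-(?P<number>\d+)$")


def parse_uid(text):
    """Split a UID string into its three parts or raise ValueError."""
    m = _UID_RE.match(text)
    if m is None:
        raise ValueError(f"not a valid UID: {text!r}")
    return GameUID(m["base"], int(m["version"]), int(m["number"]))


def format_uid(uid, width=2):
    """Write a UID back out, zero-padding the version to `width` digits."""
    return f"{uid.basename}:{uid.version:0{width}d}-{uid.number}"


uid = parse_uid("CentriScid:05-1234")
```

With the three parts available separately, a tool can decide which base to open, look up the number and compare the version without a human looking at the game.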
For <number> I suggest that each contributor to the base
gets a block of numbers to use up and is then assigned a
new block. The numbers would not be consecutive, but this
simplifies the procedure and prevents the same number from
being given out twice.
I strongly suggest that UIDs of games that get deleted
are _NOT_ reused, ever. (There's an infinite amount of
numbers, no need to be tight with them here.)
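The block scheme could be sketched roughly as follows. The block size, the class name and the contributor names are invented for the example; the only properties taken from the text are that blocks never overlap and that no number is ever handed out again.

```python
class BlockAllocator:
    """Hands out disjoint number blocks; nothing is ever handed out twice."""

    def __init__(self, block_size=1000, start=1):
        self.block_size = block_size
        self.next_free = start
        self.assigned = []  # list of (contributor, first, last)

    def new_block(self, contributor):
        """Reserve the next free block for a contributor."""
        first = self.next_free
        last = first + self.block_size - 1
        self.next_free = last + 1  # only ever moves forward
        self.assigned.append((contributor, first, last))
        return range(first, last + 1)


alloc = BlockAllocator(block_size=100)
block_a = alloc.new_block("contributor-a")  # hypothetical names
block_b = alloc.new_block("contributor-b")
block_c = alloc.new_block("contributor-a")  # used up the first block
```

Because next_free only grows, numbers of deleted games are not recycled, and unused numbers inside a block simply stay unused.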
This does _not_ necessarily mean that the ref-db contains
all versions that ever existed (though I think this has
some charming side effects). BUT it allows distributing
a diff between versions of the DB easily. Distributing a
diff is not a matter of bandwidth, it is a matter of
convenience for the user: if I add my own additions to
the ref db (e.g. analysis I did) I do not want to throw them
away because a new db comes out. I want to smoothly add the
new contents.
I suggest that a database index is created for the
new UID field, to allow for searches in this particular
field which are as fast as the common searches, e.g. for
players.
5. Training DB and RefDB (i.e. CentriScid)
Whenever I talked about CentriScid I talked about a,
hopefully large, high-quality reference db. This is
important. I never referred to small specialised training
DBs. CentriScid, for me, is meant to get large.
I suggest creating this large reference DB with a strong
focus on quality concerning header tags (what I call
metadata), completeness with regard to tournaments and
events, and move orders, as far as this is possible. I'd make
"as reliable as possible" the primary target, not "as big
as possible".
I suggest that this DB is set up by volunteers here from
the community who want to contribute to the scid project
as a whole, which is not only about a piece of software. I
feel that we have many pretty good players around who
would do better to contribute their chess knowledge to
building up such a DB than to sit down and learn Tcl ;) (My
chess unfortunately does not get better as I get better in
Tcl. Well, Pascal may have his doubts whether even the
latter happens ;)
I suggest that the minimum header field info required is
laid down somewhere in written form for easy reference.
And I suggest that any additional information that is
available is not thrown away but kept.
I suggest that especially those of you join in who
skim through games regularly anyway and are building up
something like such a DB on their own. To
those I suggest sharing their work in this community
effort.
I suggest to build this base in 3 stages:
- first stage: A new event comes in, the PGNs are added,
but not yet checked at all. At this stage each game is
already assigned a UID of the form
CentriScid:00-<number>
This stage also allows for the addition of empty games
as well as unfinished games. Typically, TWIC would end
up each week in this stage.
- second stage: the event (tournament or whatever) is
checked for completeness, and formal things get checked, i.e.
the spelling of the event and the consistency of the naming
(is it ol, olympiad, olym. or what?). All checked games get a
promoted UID
CentriScid:05-<number>
where <number> is _not_ changed. It is not necessary to
keep the old, uncorrected version. Whether to do this
or not is up to the community to decide.
Duplicate games also get removed here. This is of
special importance!
At this point games have to be finished and tournaments
complete. They stay in stage 1 as long as they
are incomplete. I.e. scid can produce cross tables at
stage 2, stuff like that.
- third stage: someone has gone over the events in the second
stage and given flags and keywords, probably fixing errors
in move order or whatever, if they can be found. (See
also below on "indexing".) These games get their final
UID
CentriScid:<release>-<number>
<number> is still the same, <release> is counted up
for each release. Most likely games at this stage will
never change. But if an error is found later on, and a
game moved to stage 3 at release 12 is to be corrected
and the next release is 17, it gets a promoted UID
CentriScid:17-<number>
<number> still stays the same. But one can now see that
this game was touched, and someone who is at release 14
and wants to upgrade needs to get all games that are
labelled CentriScid:15-* to CentriScid:17-*. A suitable
diff is easily created from the games' UIDs. (Note that
there is a large company in Hamburg that cannot do
this. I asked explicitly. They were very polite in
telling me they cannot accomplish this and are missing
this feature themselves.)
Still, it's up to the community to decide whether both the
V:12 and the V:17 version of the game in question are
kept. As it would be enough to store the V:12 database
somewhere, there is no need to generate duplicates within
the same DB. I never suggested creating double
entries; in particular, I explicitly suggest removing them
for the release version. But I do suggest keeping track
of whether a game gets changed later on.
A release is made from time to time by freezing the DB,
making a clear cut and offering it for general download
with a suitable number. A criterion could be a
certain amount of changes compared to the last
release.
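The upgrade example from the third stage (a user at release 14 who wants release 17) can be sketched like this. The function name is invented, and the sketch assumes that the version part of the UID is an integer that can be compared directly with release numbers.

```python
def games_to_upgrade(uids, have_release, target_release):
    """Return the UIDs whose version lies in (have_release, target_release]."""
    selected = []
    for uid in uids:
        _base, rest = uid.split(":", 1)
        version, _number = rest.split("-", 1)
        if have_release < int(version) <= target_release:
            selected.append(uid)
    return selected


# Made-up sample UIDs; the numbers carry no meaning.
all_uids = [
    "CentriScid:12-7",   # unchanged since release 12
    "CentriScid:15-8",   # touched for release 15
    "CentriScid:17-9",   # touched for release 17
    "CentriScid:05-10",  # still in stage 2
]
diff = games_to_upgrade(all_uids, have_release=14, target_release=17)
```

The diff is computed from the UIDs alone, without comparing any game contents.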
In the following block "indexing" refers to "give
keywords or flags or whatever". It is not the technical
generation of a database index. (The German word I refer
to would be "Erschliessung"; unfortunately "indexing" has
two meanings in English.)
I also suggest intellectually indexing the DB, i.e. setting
keywords wherever the community feels it necessary to point
the user to a very nice game, an interesting variation
and so on. I'd leave the depth at which this is done to
those who do the indexing. I.e. I don't feel it is
necessary that every game is analysed with the assistance
of a GM. Some do deeper indexing, others just check the
formal things as they have no time for in-depth checks at
the moment. The formal stuff should be done thoroughly,
and the target should be not to touch the header once
a game reaches stage 3. Beyond that, I suggest that it is
up to the contributors. I'd also encourage pointers to
the literature, that is a [Ref "Author: Book, page,
number"] style PGN tag, if known and the game is commented
somewhere. I could provide some of those in case of
interest.
I'd make CentriScid open to new tags added by the
community of users who are not actively working on
CentriScid. But this _requires_ UIDs, so that you
know exactly which game to add the info provided from
outside to. This may include the need for versioning, as
a combination might be found only because of an error
in V:12 which is fixed in the current V:17. One could
just send in a mail of the form: "I found ... in <UID>,
please add these tags and that solution". For
combinations at a certain stage, Pascal's suggestion above
could be used to add the necessary data.
I suggest making extensive use of flags for this
indexing, for fast search, plus the use of predefined PGN
header fields which point to the interesting part by
normalised keywords. See above: Pascal's suggestion is
pretty close to what I was talking about all the time.
I suggest building a _simple_ list of keyword terms, not a
complex set. But it should be clear from this list
which is the broader and which the narrower term. Indexing
should always use the narrowest term possible to describe the
thing. (Example: if you have a book about scid and you
could choose between Database and Chessdatabase, use
Chessdatabase as it is closer to the thing.) This list
can IMHO best be built while the DB is growing. It does
not make much sense to invent it beforehand.
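A small sketch of a keyword list that records the broader term for each entry, and of the "use the narrowest term" rule. The sample terms and both function names are made up for illustration; a real list would grow with the DB.

```python
# narrower term -> broader term (hypothetical sample entries)
BROADER = {
    "chess database": "database",
    "rook ending": "endgame",
    "pin": "tactics",
}


def ancestors(term):
    """All broader terms of `term`, nearest first."""
    chain = []
    while term in BROADER:
        term = BROADER[term]
        chain.append(term)
    return chain


def narrowest(terms):
    """Drop every term that is a broader term of another term given."""
    broader = {a for t in terms for a in ancestors(t)}
    return [t for t in terms if t not in broader]


picked = narrowest(["database", "chess database"])
```

A flat mapping like this stays a simple list, yet a search for the broader term can still find everything indexed with a narrower one by walking the mapping.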
I suggest generating training DBs out of the large DB
as dumps of specific subsets. E.g. make a query against
CentriScid that selects all rook endings and copy the result
to a DB for endgame training. At this point my primary idea
was not to place the whole game in this training db but
just the interesting start position and the solution,
PLUS a PGN header field containing the UID (see 4.) of
the game itself, so the user can easily look it up
in full. Hence, CentriScid would contain the _whole_
game, but you would not do your training against
CentriScid but against a specialised partial dump that
looks like the current training dbs.
I suggest setting up a simple automatic query tool to
generate such a partial dump. This does not need to be
done by the community that sets up the CentriScid DB.
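Such a query tool could look roughly like the sketch below. The RefGame class, the "RefUID" tag name and the sample position are assumptions for illustration only; FEN and SetUp are the usual PGN tags for a game that starts from a given position.

```python
from dataclasses import dataclass


@dataclass
class RefGame:
    """Stand-in for one entry of the reference DB."""
    uid: str
    keywords: set
    start_fen: str   # position where the exercise starts
    solution: str    # moves from that position
    full_moves: str  # the whole game, stays in the reference DB only


def training_dump(games, keyword):
    """Build one short PGN entry per game carrying `keyword`."""
    entries = []
    for g in games:
        if keyword in g.keywords:
            entries.append(
                f'[SetUp "1"]\n'
                f'[FEN "{g.start_fen}"]\n'
                f'[RefUID "{g.uid}"]\n'  # hypothetical tag name
                f'\n{g.solution} *\n'
            )
    return entries


# Placeholder data, not a real game.
ref_games = [
    RefGame("CentriScid:17-42", {"rook ending"},
            "8/8/8/4k3/8/8/4P3/4K2R w - - 0 1", "1. Rh5+",
            "1. e4 e5 (rest of the game omitted)"),
    RefGame("CentriScid:17-43", {"pin"},
            "8/8/8/4k3/8/8/4P3/4K2R w - - 0 1", "1. Rh5+",
            "1. d4 d5 (rest of the game omitted)"),
]
dump = training_dump(ref_games, "rook ending")
```

The dump contains only the start position, the solution and the pointer back to the full game, so it stays small while the whole game remains in CentriScid.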
--
Kind regards, / War is Peace.
| Freedom is Slavery.
Alexander Wagner | Ignorance is Strength.
|
| Theory : G. Orwell, "1984"
/ In practice: USA, since 2001