On Wed, 15 Aug 2012, Benoit Thiell wrote:
> Is this a good idea and could this be merged with master?

If we recreate MARCXML, the `standard' master format, out of the passed
pickled `recstruct' version, then it may indeed be good to be able to
pass either format (i.e. `xm' or `recstruct') with essentially the same
effect on Invenio.
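To make the "recreate MARCXML out of `recstruct'" step concrete, here is
a minimal sketch of serialising a `recstruct'-like dict into MARCXML.  It
assumes the layout {tag: [(subfields, ind1, ind2, controlfield_value,
field_number), ...]} with subfields as (code, value) pairs; the function
name `recstruct_to_marcxml' is hypothetical (not BibRecord's own
serialiser) and the MARC21 slim namespace is left out for brevity.

```python
import xml.etree.ElementTree as ET


def recstruct_to_marcxml(record):
    """Serialise a recstruct-like dict into a MARCXML <record> string.

    Assumed layout (illustration only):
    {tag: [(subfields, ind1, ind2, controlfield_value, field_number), ...]}
    where subfields is a list of (code, value) pairs.
    """
    root = ET.Element("record")  # MARC21 slim namespace omitted
    for tag in sorted(record):
        for subfields, ind1, ind2, value, _field_number in record[tag]:
            if tag.startswith("00"):
                # Control fields (001-009) carry a bare value.
                cf = ET.SubElement(root, "controlfield", {"tag": tag})
                cf.text = value
            else:
                # Attributes go in a dict: SubElement's 2nd arg is named `tag'.
                df = ET.SubElement(root, "datafield", {
                    "tag": tag, "ind1": ind1 or " ", "ind2": ind2 or " "})
                for code, text in subfields:
                    sf = ET.SubElement(df, "subfield", {"code": code})
                    sf.text = text
    return ET.tostring(root, encoding="unicode")


rec = {"001": [([], " ", " ", "123", 1)],
       "100": [([("a", "Doe, John")], " ", " ", "", 2)]}
print(recstruct_to_marcxml(rec))
```

bibupload would then treat the resulting string exactly as if `xm' input
had been passed, which is what keeps the two entry points equivalent.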

I have a slight worry, though: `recstruct' was meant to be an internal
format useful to various internal clients for speed purposes.  If we
start using it more visibly outside as well, and allow records to be
inserted in it, then it may be harder to modify this format in the
future, should the need arise.

For example, we recently discussed with Nikola the problem of tracing
two John Does within the same record.  In order to let BibAuthorID
better trace what the cataloguer does to every John Doe entry, we
thought of introducing an internal `permanent field ID' that would work
like the current `field_number' in MARCXML, but in a permanent way;
i.e. this ID would survive field moves made by cataloguers in BibEdit.
This could work e.g. if we extend `recstruct' to hold a new,
automatically handled persistent field ID.  The ID could be computed
from incoming MARCXML based on field differences, etc.  (If someone is
interested in hearing more, we'll soon describe the idea in a new ticket.)
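As a rough illustration of "computed from incoming MARCXML based on
field differences", here is a minimal sketch under stated assumptions:
fields are hashable tuples (tag, ind1, ind2, subfields), and only
exactly matching fields keep their ID.  The function name
`assign_persistent_ids' is hypothetical; a real implementation would
need some similarity measure so that an edited field keeps its ID too.

```python
from collections import defaultdict
from itertools import count


def assign_persistent_ids(old_fields, new_fields, next_id):
    """Carry persistent field IDs over from an old to a new field list.

    old_fields -- list of (pid, field) pairs from the stored record
    new_fields -- list of fields from the incoming MARCXML, in new order
    next_id    -- iterator yielding fresh, never-used IDs
    Only exact matches keep their ID; any edited field gets a fresh one.
    """
    unused = defaultdict(list)
    for pid, field in old_fields:
        unused[field].append(pid)
    result = []
    for field in new_fields:
        if unused[field]:
            # Identical duplicates reuse old IDs in their original order.
            pid = unused[field].pop(0)
        else:
            # New or modified field: allocate a fresh persistent ID.
            pid = next(next_id)
        result.append((pid, field))
    return result


# Two John Does reordered by a cataloguer: each keeps its own ID.
doe_cern = ("700", " ", " ", (("a", "Doe, John"), ("u", "CERN")))
doe_mit = ("700", " ", " ", (("a", "Doe, John"), ("u", "MIT")))
stored = [(1, doe_cern), (2, doe_mit)]
print(assign_persistent_ids(stored, [doe_mit, doe_cern], count(3)))
```

Unlike a positional `field_number', the ID follows the field content, so
BibAuthorID could tell the two John Does apart after a move.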

Another item of interest here is the virtual field facility.  (It is
currently part of ticket:852, but it is being tackled as a separate
issue, hence singled out here.)  In order to introduce virtual field
elements that would be more or less transparent to the current
BibRecord clients, we thought of introducing another master format,
tentatively dubbed `recjson'.  This would be an alternative master
format to `recstruct' that clients could use via a more
programmer-friendly API, with the capability to access possibly dynamic
virtual field elements and values.  `recstruct' would be kept the same
for backwards compatibility, while `recjson' would be where the new cool
stuff is happening.
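One way to picture "virtual field elements transparent to clients" is a
read-only mapping whose missing keys are computed on access.  This is
only a sketch under that assumption; the class `RecJson', the decorator
`virtual_field' and the `number_of_authors' field are all hypothetical
and say nothing about how BibField will actually do it.

```python
from collections.abc import Mapping


class RecJson(Mapping):
    """Hypothetical `recjson'-style record with on-demand virtual fields."""

    # Registry: virtual field name -> function(record) computing its value.
    virtual_fields = {}

    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        if key in self._data:
            return self._data[key]
        if key in self.virtual_fields:
            # Computed at access time, never stored.
            return self.virtual_fields[key](self)
        raise KeyError(key)

    def __iter__(self):
        yield from self._data
        yield from (k for k in self.virtual_fields if k not in self._data)

    def __len__(self):
        return len(set(self._data) | set(self.virtual_fields))


def virtual_field(name):
    """Decorator registering a function as a virtual field of RecJson."""
    def register(func):
        RecJson.virtual_fields[name] = func
        return func
    return register


@virtual_field("number_of_authors")
def _number_of_authors(record):
    return len(record.get("authors", []))


rec = RecJson({"title": "On records", "authors": ["Doe, John", "Doe, Jane"]})
print(rec["number_of_authors"])  # 2
```

Clients just index the record; whether a key is stored or virtual is an
implementation detail, which is what lets `recstruct' stay untouched.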

This goes in the direction of having MARCXML be only one of several
possible master formats.  For the M9 project, we need to support e.g.
the EAD input format.  So an M9 instance will have many records, some of
them coming from the MARCXML master format, some from the EAD master
format, some from the ICCD master format, etc.  So MARCXML won't
necessarily be the master format for everyone.  We thought of using
precisely the virtual field infrastructure (dubbed BibField for the time
being) that I described above, which would be able to work with MARCXML
or with EAD.  The information about the record would be stored in a way
independent of the incoming master format, dubbed `recjson'.  The
clients of the record would then be able to access logical field
concepts such as ``record['first_author']'' regardless of whether the
record is in MARCXML or EAD.  The `recstruct' format may not even have
to come into the picture, unless one wants to use BibEdit and friends.
(Which is not the case for M9, but is the case for a regular Invenio
installation that would still use the MARCXML master format almost
everywhere.)  This means we can progressively introduce `recjson' in
place of `recstruct' module by module, with the two living independently
side by side for the time being.  (Kind of like we shall have `legacy'
Invenio native modules and Flask blueprint modules living side by side
in the `next' branch.)
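The idea of ``record['first_author']'' working for both MARCXML and EAD
can be sketched with per-format extraction rules feeding one common
dict.  Everything here is an assumption for illustration: the `RULES'
table, the function `to_recjson', the choice of MARC 100__a/245__a and
of EAD archdesc/did paths, and the omission of XML namespaces.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-format rules: logical field -> ElementPath expression
# relative to the document root.  XML namespaces omitted for brevity.
RULES = {
    "marcxml": {
        "first_author": "datafield[@tag='100']/subfield[@code='a']",
        "title": "datafield[@tag='245']/subfield[@code='a']",
    },
    "ead": {
        "first_author": "archdesc/did/origination/persname",
        "title": "archdesc/did/unittitle",
    },
}


def to_recjson(blob, master_format):
    """Build a master-format-independent dict from MARCXML or EAD."""
    root = ET.fromstring(blob)
    record = {}
    for field, path in RULES[master_format].items():
        node = root.find(path)
        if node is not None:
            record[field] = node.text
    return record


marc = ("<record><datafield tag='100' ind1=' ' ind2=' '>"
        "<subfield code='a'>Doe, John</subfield></datafield>"
        "<datafield tag='245' ind1=' ' ind2=' '>"
        "<subfield code='a'>On records</subfield></datafield></record>")
ead = ("<ead><archdesc level='fonds'><did>"
       "<unittitle>On records</unittitle>"
       "<origination><persname>Doe, John</persname></origination>"
       "</did></archdesc></ead>")
print(to_recjson(marc, "marcxml")["first_author"])
print(to_recjson(ead, "ead")["first_author"])
```

Adding a new master format such as ICCD would then mean adding a rules
entry rather than touching every client.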

This was just to give at least a few glimpses of what we have in the
pipeline regarding the possibility of the master record format not being
MARCXML, and hence of `recstruct' not being particularly useful or
needed.  Esteban has been working on the BibField module this summer; we
hope to have the first usable version by September.

Summa summarum: we can merge your branch to master/next provided that
(i) bibupload will take care of creating proper MARCXML formats out of
incoming `recstruct'; (ii) there are tests for at least a few
append/correct/replace situations; (iii) `recstruct' is an internal
format and as such it may change between major releases, so some kind
of versioning would be required, unless we tag this facility as
INTERNAL or EXPERIMENTAL so that only knowledgeable Invenio
installations would use it; say we allow it only when the
``--yes-i-know'' flag is in use; then we could merge the facility almost
immediately, since it would satisfy your use case without impacting (or
even being visible to) other installations; (iv) please be aware that
we hope to progressively deprecate the use of `recstruct' in favour of
`recjson' in the future, as described above.
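Points (iii) could look roughly like this: refuse `recstruct' input
unless ``--yes-i-know'' is given, and wrap the pickle in a version tag
so that a format change between releases is detected rather than
silently misread.  This is a sketch only; the `--format' option,
`RECSTRUCT_VERSION', `parse_args' and `load_recstruct' are hypothetical
and not existing bibupload options or functions.

```python
import argparse
import pickle

# Hypothetical version tag, bumped whenever the recstruct layout changes.
RECSTRUCT_VERSION = 1


def parse_args(argv):
    """Accept recstruct input only when --yes-i-know is also given."""
    parser = argparse.ArgumentParser(prog="bibupload-sketch")
    parser.add_argument("--format", choices=["xm", "recstruct"], default="xm")
    parser.add_argument("--yes-i-know", action="store_true")
    args = parser.parse_args(argv)
    if args.format == "recstruct" and not args.yes_i_know:
        # parser.error() prints usage and exits with status 2.
        parser.error("recstruct input is EXPERIMENTAL; rerun with --yes-i-know")
    return args


def load_recstruct(blob):
    """Unpickle a (version, record) pair, refusing unknown versions."""
    # Never unpickle data from untrusted sources: pickle can execute code.
    version, record = pickle.loads(blob)
    if version != RECSTRUCT_VERSION:
        raise ValueError("unsupported recstruct version %r" % (version,))
    return record


args = parse_args(["--format", "recstruct", "--yes-i-know"])
payload = pickle.dumps((RECSTRUCT_VERSION, {"001": [([], " ", " ", "123", 1)]}))
print(args.format, load_recstruct(payload))
```

Installations that never pass the flag see no change at all, which is
the property that would make an early merge low-risk.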

WDYT?

Best regards
-- 
Tibor Simko
