On Wed, 15 Aug 2012, Benoit Thiell wrote:
> Is this a good idea and could this be merged with master?

If we recreate MARCXML as the `standard' master format out of the passed pickled `recstruct' version, then it may indeed be good to be able to pass one format or the other (i.e. `xm' or `recstruct') with essentially the same effect on Invenio.

I have a slight worry in that `recstruct' was meant to be an internal format, useful to various internal clients for speed purposes. If we start using it more visibly outside as well, and allow it to be inserted, then it may be harder to modify this format in the future, should the need arise.

For example, we recently discussed with Nikola the problem of tracing two John Does within the same record. In order for BibAuthorID to better trace what the cataloguer does to every John Doe entry, we thought of introducing an internal `permanent field ID' that would work like the current `field_number' in MARCXML, but in a permanent way; i.e. this ID would survive field moves done by cataloguers in BibEdit. This could work e.g. if we extend `recstruct' to have a new, automatically handled persistent field ID inside. It could be computed from incoming MARCXML based on field differences, etc. (If someone is interested to hear more, we'll soon describe the idea in a new ticket.)

Another item of interest here is the virtual field facility. (Currently part of ticket:852, but it is being attacked as a separate issue, so singled out.) In order to introduce virtual field elements that would be kind of transparent for the current BibRecord clients, we thought of introducing another master format tentatively dubbed `recjson'. This would be an alternative master format to `recstruct' that clients could use with a more programmer-friendly API, with the capability to access possibly dynamic virtual field elements and values. `recstruct' would be kept the same for backwards compatibility, while `recjson' would be where the new cool stuff is happening. This goes in the direction of having MARCXML as only one of the possible master formats.

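To make the `permanent field ID' idea above a bit more concrete, here is a minimal sketch of how IDs might be carried over when a record is re-uploaded with its fields reordered. Everything in it is an assumption made for illustration: the function name `assign_persistent_ids', the flat (tag, value) view of a field, and the content-based matching are all hypothetical, and none of it is existing BibRecord or `recstruct' code.

```python
def assign_persistent_ids(old_fields, new_fields, next_id):
    """Carry persistent field IDs over from an old record version.

    old_fields: list of (persistent_id, tag, value) tuples (hypothetical layout).
    new_fields: list of (tag, value) tuples, in the order of the incoming record.
    next_id:    first ID that has never been used for this record.

    Returns (fields_with_ids, next_unused_id).
    """
    # Group the old IDs by field content, keeping the original order.
    pool = {}
    for pid, tag, value in old_fields:
        pool.setdefault((tag, value), []).append(pid)

    result = []
    for tag, value in new_fields:
        candidates = pool.get((tag, value))
        if candidates:
            # Same content seen before: reuse its ID, wherever the field moved.
            pid = candidates.pop(0)
        else:
            # Genuinely new (or edited) field: hand out a fresh ID.
            pid = next_id
            next_id += 1
        result.append((pid, tag, value))
    return result, next_id


old = [(1, '100', 'Doe, John'), (2, '700', 'Doe, John'), (3, '700', 'Smith, Ann')]
new = [('100', 'Doe, John'), ('700', 'Smith, Ann'),
       ('700', 'Doe, John'), ('700', 'New, Person')]
fields, next_unused = assign_persistent_ids(old, new, next_id=4)
# fields == [(1, '100', 'Doe, John'), (3, '700', 'Smith, Ann'),
#            (2, '700', 'Doe, John'), (4, '700', 'New, Person')]
```

Note the limitation of matching on content alone: two identical John Doe entries under the same tag are handed their IDs in order, so a swap of the two cannot be detected this way. That is exactly the case where the ID would have to be stored in the record itself rather than recomputed.
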
For the M9 project, we need to support e.g. the EAD input format. So, an M9 instance will have many records, some of them coming from the MARCXML master format, some from the EAD master format, some from the ICCD master format, etc. So MARCXML won't necessarily be the master format for everyone.

We thought of using precisely the virtual field infrastructure (dubbed BibField for the time being) that I described above and that would be able to work with MARCXML or with EAD. The information about the record would be stored in an incoming-master-format-independent way dubbed `recjson'. The clients of the record would then be able to access logical field concepts such as ``record['first_author']'' regardless of whether the record is in MARCXML or EAD. The `recstruct' format may not even have to come into the picture, unless one wants to use BibEdit and friends. (Which is not the case for M9, but is the case for a regular Invenio installation that would still use the MARCXML master format almost everywhere.)

This means we can progressively introduce `recjson' in place of `recstruct' module by module, the two living independently side by side for the time being. (Kind of like we shall have `legacy' Invenio native modules and Flask blueprint modules living side by side in the `next' branch.)

This was just to give at least a few glimpses of what we have in the pipeline regarding the possibility of the master record format not being MARCXML, hence the possibility of `recstruct' not being particularly useful or needed. Esteban has been working on the BibField module this summer; we hope to have the first usable version by September.

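The master-format-independent access described above can be illustrated with a small sketch. It is not the BibField API, which was still being written at the time; the class `LogicalRecord', the `RULES' table and the two extractor functions are hypothetical names. The MARC side assumes a `recstruct'-like dictionary mapping a tag to a list of (subfields, ind1, ind2, value, position) tuples, and the EAD side uses a plain dictionary as a stand-in for a parsed EAD document.

```python
def _marc_first_author(raw):
    # Assumed recstruct-like layout: tag -> [(subfields, ind1, ind2, value, position)]
    for subfields, _ind1, _ind2, _value, _position in raw.get('100', []):
        for code, value in subfields:
            if code == 'a':
                return value
    return None


def _ead_first_author(raw):
    # Stand-in for parsed EAD: the creator is taken from an 'origination' list.
    origination = raw.get('origination', [])
    return origination[0] if origination else None


# Hypothetical rule table: master format -> logical field name -> extractor.
RULES = {
    'marcxml': {'first_author': _marc_first_author},
    'ead': {'first_author': _ead_first_author},
}


class LogicalRecord(object):
    """Expose logical fields regardless of the record's master format."""

    def __init__(self, master_format, raw):
        self._rules = RULES[master_format]
        self._raw = raw

    def __getitem__(self, name):
        # Look up the extractor for this logical field and apply it lazily.
        return self._rules[name](self._raw)


marc_record = LogicalRecord('marcxml',
                            {'100': [([('a', 'Doe, John')], ' ', ' ', '', 1)]})
ead_record = LogicalRecord('ead', {'origination': ['Doe, John']})
# Both marc_record['first_author'] and ead_record['first_author']
# evaluate to 'Doe, John'.
```

The point of the design is that client code only ever writes ``record['first_author']''; supporting a further master format such as ICCD would mean adding one more entry to the rule table, not touching the clients.
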
Summa summarum: we can merge your branch to master/next provided that:

(i) bibupload will take care of creating proper MARCXML formats out of incoming `recstruct';

(ii) there are tests for at least a few append/correct/replace situations;

(iii) `recstruct' is an internal format and as such it may change between major releases, so some kind of versioning would be required -- unless we tag this facility as INTERNAL or EXPERIMENTAL so that only knowledgeable Invenio installations would use it; say we allow it only when the ``--yes-i-know'' flag is in use; then we could merge the facility almost immediately, since it would satisfy your use case while not impacting (or even being visible to) other installations;

(iv) please be aware that we hope to progressively deprecate the use of `recstruct' in favour of `recjson' in the future, as described above.

WDYT?

Best regards
--
Tibor Simko

