Re: bibupload "append_or_insert" mode?

Mauricio Acebal Tue, 29 Apr 2014 23:59:28 -0700

Hello Samuele and Alexander,
First of all, thank you for your time considering this use case. I think I will
follow your recommendation and open a ticket, but we can wait for the move to
Github. We will also post the changes we've made locally to achieve this
behavior, trying not to alter any existing functionality, so they can be used
as a starting point if the change is eventually merged upstream.

To continue on Samuele's explanation: this kind of redundant information is
inherent to append mode and can already happen today when you upload, with
'-a', a "duplicate" of an existing record. We discussed this and decided we
would rather have redundant data that can be processed later than read only
one source and reject all the others.
The cases available so far are:
-i: insert; fails if a duplicate already exists.
-a: append; fails if no duplicate is found, because the record needs to be
inserted first. It can leave records with redundant data.
-r: replace; fails if no duplicate is found, because the record needs to be
inserted first.
-i -r: insert or replace; never fails. If a duplicate is found, it is
replaced. If there is no duplicate, the record is inserted as new.
The case we propose:
-i -a: insert or append; never fails. If a duplicate is found, the new
information is appended, with the same redundancy risk. If there is no
duplicate, the record is inserted as new.
What actually happens today:
-i -a: only one of the two options is considered and the other is discarded
(from our tests and a look at the source code, it seems -i always prevails).
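
To make the table above concrete, here is a minimal Python sketch of the
decision we have in mind. It is not bibupload's actual code: the function
name, its arguments and the returned strings are all hypothetical, and it only
models which action is chosen for one incoming record.

```python
def resolve_action(insert, append, replace, duplicate_exists):
    """Pick the action for one incoming record (hypothetical helper).

    Raises ValueError where the corresponding mode is described as failing.
    """
    if not (insert or append or replace):
        raise ValueError("no upload mode given")
    if append and replace:
        # Combining -a and -r is outside the scope of this sketch.
        raise ValueError("append and replace are not combined here")

    if duplicate_exists:
        if append:
            return "append"    # -a, or the proposed -i -a
        if replace:
            return "replace"   # -r, or -i -r
        raise ValueError("insert: a duplicate already exists")

    # No duplicate was found for the incoming record.
    if insert:
        return "insert"        # -i, -i -r, or the proposed -i -a
    raise ValueError("no duplicate found; the record must be inserted first")
```

With this logic, `resolve_action(True, True, False, duplicate_exists)` never
fails: it appends when a duplicate exists and inserts otherwise, mirroring how
-i -r already behaves for replace.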


As we see it, the current "append" behavior is very useful and we don't need
to change it; we just need it to be able to coexist with "insert" mode, as
"replace" does.
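
As a rough illustration of the append behavior we are relying on, here is a
small Python sketch using the example record from Samuele's message quoted
below. The record representation (a dict from tag plus indicators to a list of
fields, each field a tuple of (code, value) subfields) and the function name
are assumptions made only for this example, not Invenio's data structures.
The optional skip_identical flag imitates the "append only new fields" idea
from the commit Samuele mentions: a field is skipped only if an identical one
(same indicators, same subfields in the same order) is already present.

```python
def append_fields(existing, incoming, skip_identical=False):
    """Return a copy of `existing` with the fields of `incoming` appended."""
    merged = {tag: list(fields) for tag, fields in existing.items()}
    for tag, fields in incoming.items():
        if tag == "001":
            # Assumption: the record identifier is never appended.
            continue
        target = merged.setdefault(tag, [])
        for field in fields:
            if skip_identical and field in target:
                continue  # identical field already present
            target.append(field)
    return merged


existing = {
    "001": ["123"],
    "100__": [(("a", "An author"), ("u", "an affiliation"))],
    "245__": [(("a", "The title"),)],
}
incoming = {
    "001": ["123"],
    "100__": [(("a", "An author"), ("u", "an affiliation"),
               ("z", "some more details"))],
    "245__": [(("a", "Another title"),)],
}

naive = append_fields(existing, incoming)
```

Here `naive` ends up with two 100__ fields and two 245__ fields, which is the
redundant but lossless result we would accept and clean up later.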

Best Regards.
Mauricio


-- 
Mauricio Acebal
Senior Software Engineer

Frontiers <http://www.frontiersin.org/>
Centro de Empresas - UPM
Campus de Montegancedo
28223 Pozuelo de Alarcón
Madrid


On Tue, Apr 29, 2014 at 10:15 AM, Samuele Kaplun <[email protected]> wrote:

> Dear Mauricio and Alexander,
>
> On Friday 25 April 2014 08:45:58, Alexander Wagner wrote:
> > > I'm working at frontiers publication group and we are implanting invenio
> >
> > [...]
> >
> > > Considering our case could happen to other people and understanding the
> > > downsides of modifying the standard Invenio sources, we wanted to ask if
> > > this has been discussed before and whether there is another way to achieve
> > > the behavior we want without altering the Invenio sources, or whether you
> > > are considering something that could work for us in future versions.
> > > I hope I've explained myself correctly, but if there is anything else you
> > > need to know, please ask. Thank you very much for your time.
> >
> > For me this sounds like you have a not yet handled use case and some code
> > that handles it. Additionally, you want this to be handled cleanly, that
> > is, with this functionality available upstream in Invenio by default.
> > Without your code it would be a feature request; with your code it would
> > be a feature request including a patch (implying you consent to Invenio's
> > GPL license).
> >
> > It would then be up to the responsible people in the Invenio core team
> > to decide if and how they can add your code. Depending on how it's done
> > there might be some requirements to be met, but in general I think this is
> > the way to go in open source, and it would be the clean way you mention,
> > provided your functionality is not yet there.
>
> Alexander is fully right :-)
>
> I would add that typically you can also first open a ticket (in Trac, but we
> are moving to Github this week, so you might want to wait until next week)
> where you detail the exact behavior you would like to implement. E.g. to me
> it's not fully clear in which way you expect --append to be compatible with
> --insert. --append by definition adds the input fields to an existing
> record. Just in the commit:
> [...]
> commit c22accb7d80dfaf84127cff458b1bfb739861635
> Author: Samuele Kaplun <[email protected]>
> Date:   Wed Jan 29 16:57:43 2014 +0100
>
>     BibUpload: --append only new fields
>
>     * When bibupload --append mode is used never create
>       duplicate fields. Two fields are considered identical
>       if they have the same indicators and exactly the same
>       subfields in the same order.
>       (closes #1440)
>
>     Reported-by: Florian Schwennsen <[email protected]>
>     Signed-off-by: Samuele Kaplun <[email protected]>
>     Tested-by: Tibor Simko <[email protected]>
> [...]
>
> which is only available in master, the possibility was introduced to avoid
> duplicating fields when they exist both in the input record and in the
> already existing one. However, in most cases you will end up adding redundant
> information to an existing record. Is this what you really want?
>
> E.g. take the record (in textmarc):
>
> 001__ 123
> 100__ $$aAn author$$uan affiliation
> 245__ $$aThe title
>
> and an incoming record:
>
> 001__ 123
> 100__ $$aAn author$$uan affiliation$$zsome more details
> 245__ $$aAnother title
>
> if you upload it with --insert --replace you would end up with:
>
> 001__ 123
> 100__ $$aAn author$$uan affiliation$$zsome more details
> 245__ $$aAnother title
>
> but with a naive --insert --append implementation you would end up with:
>
> 001__ 123
> 100__ $$aAn author$$uan affiliation
> 100__ $$aAn author$$uan affiliation$$zsome more details
> 245__ $$aThe title
> 245__ $$aAnother title
>
> Would that fit your requirements?
>
> Best regards,
>         Samuele
>
>
>
> --
> Samuele Kaplun
> Invenio Developer ** <http://invenio-software.org/>
> INSPIRE Service Manager ** <http://inspirehep.net/>
>
>
