Hello, ok! Improving bibmatch has been planned for quite a long time, and it's good to know we can get feedback from you, Guido.
I'll start by improving the documentation behind http://invenio-demo.cern.ch/help/admin/bibmatch-admin-guide. Then comes "fuzziness", that is, matching input records against the collection when they are "close enough": the system would suggest a near match when there are typos, extra punctuation, etc.

Yours,
Marko

On Mon, Mar 29, 2010 at 12:26 AM, Tibor Simko <[email protected]> wrote:
> Hi Guido:
>
> (CC-ing project-cdsware-developers)
>
> On Fri, 26 Mar 2010, Guido Pelzer wrote:
>>> On Fri, 26 Mar 2010, Guido Pelzer wrote:
>>>> many thanks for your mail. bibmatch works good. do you have a
>>>> detailed comment of bibmatch options?
>>>
>>> Currently not much docs besides the guide:
>>>
>>> <http://invenio-demo.cern.ch/help/admin/bibmatch-admin-guide>
>>>
>>> But it may soon be updated, since Marko will work on small fuzzy-like
>>> features:
>>>
>>> <https://savannah.cern.ch/task/?3273>
>>>
>>
>> yes, i had already seen, but i have problems with the advanced options,
>> especially
>> -m --mode=(a|e|o|p|r)[3]
>> -o --operator=(a|o)[2] -> and/or???
>> different between --print-new and --print-match
>
> The mode and operators are taken from search engine API.
>
> The output streams NEW prints unmatched records, MATCH matched records
> when there was exactly one dupe-like hit, and AMBIGUOUS prints matched
> records when there was more than one dupe-like hit.
>
> Personally I would prefer bibmatch to produce more than one output
> stream at the same go, for example in case of two output streams:
>
> $ bibmatch foo.xml > foo_unmatched.xml 2> foo_matched.xml
>
> so that one has to process only its output files without diffing WRT the
> input file.
>
> Since Marko is attacking this module WRT some fuzziness, we can as well
> take this opportunity and change/prettify its API...
>
> Best regards
> --
> Tibor Simko ** CERN Document Server ** <http://cds.cern.ch/>
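A minimal Python sketch of the behaviour discussed in this thread, assuming each incoming record can be reduced to a single title string and compared against the titles already in the collection: no near hit goes to NEW, exactly one to MATCH, more than one to AMBIGUOUS. Unmatched records are written to stdout and the rest to stderr, mirroring the proposed `bibmatch foo.xml > foo_unmatched.xml 2> foo_matched.xml` usage. The helpers `normalize`, `is_near_match`, `classify` and `route`, the title-only record model and the 0.9 similarity threshold are all hypothetical illustrations built on the standard `difflib` module, not bibmatch's actual code or API.

```python
import difflib
import string
import sys


def normalize(text):
    """Lowercase, drop punctuation and collapse whitespace (hypothetical helper)."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(text.lower().translate(table).split())


def is_near_match(a, b, threshold=0.9):
    """Hypothetical fuzzy test: True if the normalized strings are 'close enough'."""
    ratio = difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return ratio >= threshold


def classify(title, collection_titles):
    """Return NEW, MATCH or AMBIGUOUS depending on how many near hits exist."""
    hits = [t for t in collection_titles if is_near_match(title, t)]
    if not hits:
        return "NEW"
    if len(hits) == 1:
        return "MATCH"
    return "AMBIGUOUS"


def route(titles, collection_titles, out=None, err=None):
    """Write unmatched records to `out` (stdout) and matched/ambiguous ones to `err` (stderr)."""
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err
    for title in titles:
        status = classify(title, collection_titles)
        stream = out if status == "NEW" else err
        print(f"{status}\t{title}", file=stream)


if __name__ == "__main__":
    collection = ["Higgs boson searches at the LHC", "Dark matter review"]
    incoming = ["Higgs boson searches, at the LHC.", "Neutrino oscillations"]
    route(incoming, collection)
```

Run as `python sketch.py > unmatched.txt 2> matched.txt`, the punctuation-only variant of the Higgs title lands in the matched file and the neutrino title in the unmatched one, so neither output needs to be diffed against the input.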
