Thank you, dear Stefano,
I am aware of this module; it works great.
But my problem is which regex to use to identify whether a
subfield's content is an ISSN. Say our mrc has ISSNs scattered
across any tag you could imagine...
So my approach would be to search the whole mrc, but I do not know which
regex to use...
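
To make the question concrete, here is a rough starting point in Python rather than a tested solution. It assumes ISSNs appear in the usual hyphenated form (four digits, a hyphen, three digits, then a digit or X) and then applies the standard ISSN mod-11 check digit to discard look-alikes such as year ranges. The names ISSN_RE, issn_check_ok and find_issns are made up for this sketch and are not part of MARCgrep.

```python
import re

# Candidate pattern: NNNN-NNNC, where C is a digit or X.
# The lookarounds reject matches embedded in longer alphanumeric or hyphenated runs.
ISSN_RE = re.compile(r"(?<![\dA-Za-z-])(\d{4})-(\d{3})([\dXx])(?![\dA-Za-z-])")


def issn_check_ok(digits7, check):
    """Return True if `check` is the valid mod-11 check character for `digits7`."""
    # Weights 8, 7, ..., 2 are applied to the first seven digits.
    total = sum(int(d) * w for d, w in zip(digits7, range(8, 1, -1)))
    expected = (11 - total % 11) % 11
    return check.upper() == ("X" if expected == 10 else str(expected))


def find_issns(text):
    """Return all ISSN-shaped substrings of `text` whose check digit is valid."""
    return [
        f"{m.group(1)}-{m.group(2)}{m.group(3).upper()}"
        for m in ISSN_RE.finditer(text)
        if issn_check_ok(m.group(1) + m.group(2), m.group(3))
    ]


# The year range has the right shape but fails the check digit.
print(find_issns("ISSN 0378-5955; see also 1999-2000"))  # ['0378-5955']
```

The idea would be to run find_issns over every subfield value in the file and report the tag and subfield code wherever it returns something. ISSNs stored without the hyphen would need a looser pattern, and a valid check digit alone cannot prove that a string was meant as an ISSN.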
2016-11-02 11:52 GMT+02:00 Stefano Bargioni <bargi...@pusc.it>:
> Hi, Sergio:
> you can try MARCgrep http://en.pusc.it/bib/MARCgrep.
> Its help is:
> Extracts MARC records that match a condition on fields. Count and
> invert are available.
> MARCgrep.pl [options] [-e condition] file.mrc
> -h print this help message and exit
> -c count only
> -e condition
> -f comma separated list of fields to print
> -o output format "marc" | "line" | "inline"
> -s separator string for condition, default ","
> -v invert match
> -h Print this message and exit.
> -c Count and print number of matching records
> -e The condition to match in the record.
> For data fields, the syntax is:
> -e 'tag,indicator1,indicator2,subfield,value'
> where tag, indicator1, indicator2, subfield, and value are
> regular expression patterns.
> Do not put spaces around the separators.
> For control fields, the syntax is:
> -e 'tag,pos1,pos2,value'
> where tag starts with '00' (use '000' or 'LDR' for
> leader), pos1 is the starting position,
> pos2 is the ending position, both 0-based. Value is a
> regular expression.
> Default condition (-e not specified) matches any data field.
> For control fields, only the tag is mandatory.
> Examples: -e '100,,,a,^A' will match records that contain
> 100$a starting with 'A'
> -e '008,35,37,(ita|eng)' will match records with
> language ita or eng in 008
> -e '(1|7)(0|1)(0|1),,2' will match
> 100,110,111,700,710,711 with ind2=2
> -f Comma separated list of fields (tags) to print if output
> is "line" or "inline". Default is any field.
> Note that if a tag is preceded by '#' sign (like in
> '#nnn'), a
> count of occurrences will be printed instead.
> Examples: -f '100,245' will print field 100 and 245
> -f '400,#400' will print all occurrences of 400
> field as well as the number of its occurrences
> -o Output format: "marc" for ISO2709, "line" for each subfield
> a line, "inline" (default) for each field in a line.
> -s Specify a string separator for condition. Default is ','.
> -v Invert the sense of matching, to select non-matching records.
> -V Print the version and exit.
> The mandatory ISO2709 file to read. Can be STDIN, '-'.
> Like grep, the famous Unix utility, MARCgrep.pl allows filtering
> records based on conditions on tag, indicators, and field value.
> Conditions can be applied to data fields, control fields or the
> leader.
> In case of data fields, the condition can specify tag, indicators,
> subfield and value using regular
> expressions. In case of control fields, the condition must contain
> tag name, the starting
> and ending position (both 0-based), and a regular expression for
> the value.
> Options -c and -v allow respectively to count matching records and
> invert the match.
> If option -c is not specified, the output format can be "line" or
> "inline" (both human readable),
> or "marc" for MARC binary (ISO2709). For formats "line" or
> "inline", the -f option allows specifying
> fields to print.
> You can chain more conditions using
> ./MARCgrep.pl -o marc -e condition1 file.mrc | ./MARCgrep.pl -e
> condition2 -
> KNOWN ISSUES
> Accepts and returns only UTF-8.
> Checks are case sensitive.
> Pontificia Universita' della Santa Croce <http://www.pusc.it/bib/>
> Stefano Bargioni <bargi...@pusc.it>
> SEE ALSO
> marktriggs / marcgrep at <https://github.com/marktriggs/marcgrep>
> filtering large data sets
> > On 02 nov 2016, at 09:57, Sergio Letuche <code4libus...@gmail.com>
> > Hello community,
> > how would you treat the following?
> > I need a way to identify all tags - subfields, that have stored an ISSN
> > number in them.
> > What would you suggest as a clever approach for this?
> > Thank you