Re: [RDD] rdimport - Suggested ingestion parameters?

David Klann Tue, 06 Nov 2018 09:17:03 -0800

Hi Richard,

On 11/5/18 5:29 PM, Richard G Elen wrote:
> Hi...
> 
> I am soon to ingest our music library using rdimport.
> 
> The source library structure is basically
> ~/Artistname/Albumname/trackfiles and they may be multiple album
> subfolders in the Artistname folder.
> 
> We've spent quite a lot of time and effort getting our metadata in good
> order so hopefully all will go well.
> 
> looking at the man pages, rdimport has a great number of options, but,
> being new to this, I wonder if anyone can suggest what is a good
> general-purpose set of configuration options that will be most likely to
> give the best results. Fortunately or unfortunately we have an extremely
> eclectic library ranging from mediaeval to rock, so I am expecting to
> have to do some fine tuning.
> 
> But some basic suggestions would be gratefully received, and thanks in
> advance.


I have written a ZSH script to wrap rdimport. Mostly the script is
irrelevant to your question, but you can see how I invoke it at the
bottom of the attached file (in the series of lines starting with
"${DEBUG} ${RDIMPORT}"). The file is also available on GitHub at
https://github.com/opensourceradio/ram/blob/master/usr/local/bin/import-rd

Given the recent discussion on the list regarding normalization level, I
recommend setting (and will change) the default normalization in the
script from -4 dBFS to -12 dBFS.

While rdimport(1) is pretty good at extracting metadata from audio
files, I like to do so manually, and then set the the database fields to
predictable values on the command line rather than simply leaving them
blank. My experience has been that rdimport ignores _all_ of the
metadata if it finds problems with _any_ of the metadata it (someone
please correct me if I misunderestimate that).

Feel free to ask questions!

  ~David Klann

#!/bin/zsh

setopt NO_CASE_MATCH
zmodload zsh/regex
zmodload zsh/terminfo
autoload colors
colors

# This script complies with Semantic Versioning: http://semver.org/
vMajor=0
vMinor=1
vPatch=4
vHash='$Hash: 917b77e$'

# Tell them how to use this command.
usage() {
  myName=${1}

  ${CAT} << EOF
${myName}: import digital audio tracks into the Rivendell Library using 
rdimport(1).

Summary: ${myName##*/} [ --code (-c) { <SchedulerCodes> or 'list' } ]
                   [ --label (-l) <Record Label> ]
                   [ --normalization <Level> ]
                   --group (-g) { <Rivendell Group Name> or 'list' }
                   audio-file ...

You may use the word "list" for the --code (-c) and --group (-g)
options. This will cause ${myName##*/} to list the available codes or
groups and prompt for which one(s) to use. You may specify multiple
Scheduler Codes with the "-c" (aka "--code") option; do this by
specifying the codes separated by commas (you must quote the list of
Scheduler Codes if you include spaces between the codes).

This script will set Scheduler Code to the "genre" meta tag if the
"genre" matches a valid Rivendell Scheduler code. Otherwise the
"genre" meta tag will be ignored.

The script will attempt to extract the label name from the meta
tags. You can explicitely set this with the --label option.

Normalization level defaults to -4 dBFS.

EOF
}

# Check to be sure it is a valid Scheduler Code. Returns true or false
# based on the regex match test.
validSchedCode() {
  local definedCodes=${1} ; shift
  local requestedCode=${1} ; shift

  for schedCode in $(echo ${definedCodes}) ; do
    [[ "${requestedCode}" = ${schedCode} ]] && return 0
  done

  return 1
}

# Check to be sure it is a valid GROUP. Returns true or false based on
# the regex match test
validGroup() {
  local definedGroups=${1} ; shift
  local requestedGroup=${1} ; shift

  [[ "${definedGroups}" =~ ".*${requestedGroup}.*" ]]
}

# Figure out the file type and extract the meta tags from it.
tags() {
  local track="${1}"

  # NB: these string match tests are performed case INsensitively
  # (see setopt, above).
  if [[ "${track##*.}" = 'flac' ]] ; then
    flacTags "${track}"
  elif [[ "${track##*.}" = 'ogg' ]] ; then
    vorbisTags "${track}"
  elif [[ "${track##*.}" = 'mp3' ]] ; then
    id3Tags "${track}"
  else
    echo "${BOLD}Cannot figure out what kind of file this is. I give 
up!${NORM}" >&2
    exit
  fi
}

# We prefer the style of metaflac(1), so leave these alone.
flacTags() {
  local track="${1}"
    
  ${METAFLAC} --list --block-type=VORBIS_COMMENT "${track}" | ${FGREP} ' 
comment['
}

# We prefer the style of metaflac(1), so do only minimal processing.
# Output from vorbiscomment(1) is expected to be simply "key=value" format.
vorbisTags() {
  local track="${1}"
  local index=1
    
  ${VORBISCOMMENT} --list "${track}" | while read line ; do

    echo "comment[$((index++))]: ${line}"

  done
}

# We prefer the style of metaflac(1), so convert the ID3 tags to
# metaflac style.
id3Tags() {
  local track=${1}

  ${ID3INFO} "${track}" | while read line ; do
    case "${line#=== }" in
      TT2*|TIT2*) echo "comment[1]: title=${line#*: }" ;; # 
(Title/songname/content description)
      TAL*|TALB*) echo "comment[2]: album=${line#*: }" ;; # (Album/Movie/Show 
title)
      TCOP*) : ;; # (Copyright message)
      TCO*|TCON*) echo "comment[3]: genre=${line#*: }" ;; # (Content type)
      TRK*|TRCK*) echo "comment[4]: tracknumber=${${line#*: }%/*}" ;; # (Track 
number/Position in set)
      TYE*|TYER*) echo "comment[5]: date=${line#*: }" ;; # (Year)
      TP1*|TPE1*) echo "comment[6]: artist=${line#*: }" ;; # (Lead 
performer(s)/Soloist(s))
      TP2*|TPE2*) : ;; # (Band/orchestra/accompaniment):
      TP3*|TPE3*) : ;; # (Conductor/performer refinement)
      COM*|COMM*) : ;; # (Comments)
      TPA*|TPOS*) : ;; # (Part of a set)
      APIC*) : ;; # (Attached picture)
      PRIV*) : ;; # (Private frame):  (unimplemented)
      TPUB*) echo "comment[7]: label=${line#*: }" ;; # (Publisher)
      TCOM*) : ;; # (Composer)
      *) : ;; # Unknown ID3 keyword.
    esac
  done
}

# Get zsh functions necessary for this script
if [[ -r ${ROOT:-/}usr/local/bin/zsh-functions ]] ; then
  source ${ROOT:-/}usr/local/bin/zsh-functions
else
  echo "${BOLD}Cannot read /usr/local/bin/zsh-functions, which is necessary for 
this script to operate.${NORM}"
  usage ${0##*/}
fi

#################  BEGIN shell commands used in this script.  #################
# This script uses these 9 external commands.
# Look for them in their upper case, parameter expanded form.
ourCommands=(
  cat
  column
  fgrep
  getopt
  id3info
  metaflac
  perl
  rdimport
  vorbiscomment
)
# Find the executables we need; this uses a little old fashioned shell and
# a ZSH trick -- the (U) in the eval(1) says to evaluate the parameter as
# all upper case letters. We will use the command names in upper case as
# variables by which to call the external commands used in this script.
for C in ${ourCommands} ; do
  for D in ${path} ; do
    [[ -x ${D}/${C} ]] && { eval ${(U)C//-/_}=${D}/${C} ; break }
  done
  [[ -x $(eval echo \$${(U)C//-/_}) ]] || { echo "Cannot find ${C}! Done."; 
return 1 }
done
##################  END shell commands used in this script.  ##################

RED="${fg_bold[red]}"
BOLD="${bold_color}"
NORM="${reset_color}"

# these depends on having ~/.my.cnf or /etc/rd.conf configured with
# valid credentials for accessing the Rivendell database
rivendellSchedCodes=( $(doSQL "select CODE from SCHED_CODES") )
rivendellGroupList=( $(doSQL "select NAME from GROUPS") )

# use getopt to parse the command line args
TEMP=$(${GETOPT} -o hc:g:l:n:V --long 
help,codes:,group:,label:,normalization:,version -n ${0:t} -- "${@}")
if [ ${?} != 0 ] ; then echo "${BOLD}Terminating...${NORM}" >&2 ; exit 1 ; fi
# Note the quotes around "$TEMP": they are essential!
eval set -- "${TEMP}"
while :
do
  case "${1}" in
    -c|--code*) ourSchedCodes=${2} ; shift 2 ;;
    -g|--grou*) group=${2} ; shift 2 ;;
    -l|--labe*) label=${2} ; shift 2 ;;
    -n|--norm*) normalization=${2} ; shift 2 ;;
    -h|--help) usage ${0} ; exit ;;
    -V|--vers*) showVersion=1 ; shift ;;
    --) shift ; break ;;
    *) echo "${BOLD}Internal error!${NORM}" ; exit 1 ;;
  esac
done
unset TEMP

if ((showVersion)) ; then
  echo "${0##*/}: version ${vMajor}.${vMinor}.${vPatch}-${${vHash#\$Hash: }%$}"
  exit 0
fi

if [[ -r "${1}" ]] ; then

  # list the groups and prompt them for a group to use in this import
  if [[ -n "${group}" ]] ; then
    if [[ "${group}" = 'list' ]] ; then
      echo "Current list of GROUPs:"
      echo ${(F)rivendellGroupList} | ${COLUMN} -x -c $(( COLUMNS > 80 ? 78 : 
COLUMNS ))
      read group\?"Enter a GROUP to import into: "
    fi
  else
    echo "Current list of GROUPs:"
    echo ${(F)rivendellGroupList} | ${COLUMN} -x -c $(( COLUMNS > 80 ? 78 : 
COLUMNS ))
    read group\?"Enter a GROUP for this import: "
  fi

  validGroup "${rivendellGroupList}" ${group} || { echo
                                                   "${BOLD}${group} is not in 
the list. Please use a valid GROUP
    name.${NORM}" 1>&2 ; exit 2 ; }

  # List the codes and prompt the user for codes to use in this
  # import.  See also, below where we attempt to assign the "genre"
  # to a Scheduler Code.
  if [[ -n "${ourSchedCodes}" ]] ; then
    if [[ "${ourSchedCodes}" = 'list' ]] ; then
      echo "Current list of Scheduler codes:"
      echo ${(F)rivendellSchedCodes} | ${COLUMN} -x -c $(( COLUMNS > 80 ? 78 : 
COLUMNS ))
      read ourSchedCodes\?"Enter a comma-separated list of scheduler codes (no 
spaces): "
    fi

    # We have to echo(1) this else the expanded and edited
    # variable is still a single word.
    for code in $(echo ${ourSchedCodes//,/ })
    do
      validSchedCode "${rivendellSchedCodes}" ${code} || { echo "${BOLD}${code} 
is not in the list. Please use a valid scheduler code.${NORM}" 1>&2 ; exit 2 ; }
    done
  fi

  # Save the initial list of Scheduler Codes (stripping whitespace from the 
list).
  initialSchedCodes="${ourSchedCodes// /}"

  # snarf each track one by one, extract metadata and import with
  # rdimport(1) TODO: add support for other import audio file formats
  for track ; do
    # Reset the list of scheduler Codes for this track.
    ourSchedCodes="${initialSchedCodes}"

    # Store the file location of the source material for reference.
    if [[ "${track}" =~ '^/.*' ]] ; then
      # Track is a full path, save it as is.
      location=${track#/*/*/}
    elif [[ "${track}" =~ '^[^/].*/.*' ]] ; then
      # Track is a relative path, leave it alone.
      location=${track}
    else
      # Track is in the current directory, get the full path.
      location=${PWD}/${track}
    fi

    meta=$(tags ${track})
    artist=$(echo ${meta} | ${PERL} -ne 'if ( /comment.*: artist=/i ) { 
@x=split(/\=/); print $x[1]; }') ; artist=${artist:-UNKNOWN ARTIST}
    title=$(echo ${meta} | ${PERL} -ne 'if ( /comment.*: title=/i ) { 
@x=split(/\=/); print $x[1]; }') ; title=${title:-UNKNOWN TITLE}
    album=$(echo ${meta} | ${PERL} -ne 'if ( /comment.*: album=/i ) { 
@x=split(/\=/); print $x[1]; }') ; album=${album:-UNKNOWN ALBUM}
    year=$(echo ${meta} | ${PERL} -ne 'if ( /comment.*: date=/i ) { 
@x=split(/\=/); print $x[1]; }') ; year=${year:-9999}
    tracknum=$(echo ${meta} | ${PERL} -ne 'if ( /comment.*: tracknumber=/i ) { 
@x=split(/\=/); print $x[1]; }') ; tracknum=${tracknum:-999}
    # The track may have multiple 'genre' and 'label' tags, so
    # stash them in arrays for later processing; set IFS to only
    # NEWLINE while processing.
    tempIFS="${IFS}"
    IFS="
"
    genre=( $(echo ${meta} | ${PERL} -ne 'if ( /comment.*: genre=/i ) { 
@x=split(/\=/); print $x[1]; }') )
    trackLabel=( $(echo ${meta} | ${PERL} -ne 'if ( /comment.*: 
(label|organization)=/i ) { @x=split(/\=/); print $x[1]; }') )
    IFS="${tempIFS}"
    unset tempIFS

    # Attempt to tease out the year from the DATE string; we assume
    # that the year is ALWAYS a four-digit string.
    if [[ "${year}" =~ '[[:digit:]]\{4\}[-/ ][[:digit:]]\{2\}[-/ 
][[:digit:]]\{2\}' ]] ; then
      # YYYY-MM-DD format (separated with "/" or "-" or " ")
      year="${year%%[-/ ]*}"
    elif [[ "${year}" =~ '[["digit:]]\{2\}[-/ ][["digit:]]\{2\}[-/ 
][["digit:]]\{4\}' ]] ; then
      # MM-DD-YYYY or DD-MM-YYYY format (separated with "/" or "-" or " ")
      year="${year##*[-/ ]}"
    fi

    # Prefer the embedded label (publisher), but use the one passed on
    # the command line if necessary (set above).
    # The track may have had multiple embedded label names, so
    # separate them with a semicolon (';')
    if [[ -n "${trackLabel}" ]] ; then
      label="$(echo ${(apj';')trackLabel})"
    fi

    # If there are one or more embedded genres, see if any are in
    # our chosen list of scheduler codes, or if we already have
    # the 'genre' in our scheduler code list.  We need to put a
    # comma at the beginning of the string in order for the
    # parameter expansion below (in the rdimport invocation) to
    # work properly.
    if [[ -n "${genre}" ]] ; then
      # Ignore the embedded 'genre' tags if they are not in the
      # list of Rivendell Scheduler Codes
      for g in ${genre} ; do

        # Get the actual SCHED_CODE if the genre was a parenthesized number.
        if [[ "${g}" =~ '^([[:digit:]][[:digit:]]*)' ]] ; then
          g=$(doSQL "SELECT code FROM SCHED_CODES WHERE description LIKE 
'${g}%'")
        fi

        if validSchedCode "${rivendellSchedCodes}" "${g}" ; then
          # Add the embedded 'genre' to our list of
          # Scheduler Codes if it is not already there.
          if ! validSchedCode "${ourSchedCodes}" "${g}" ; then
            ourSchedCodes=,${g}${ourSchedCodes:+",${ourSchedCodes}"}
          fi
        else
          # We do not know about this genre, ignore it and
          # prepare the list of Scheduler Codes for parmeter
          # substitution.
          ourSchedCodes=${ourSchedCodes:+",${ourSchedCodes}"}
        fi
      done
    else
      # No embedded genre, so prepare the list of Scheduler Codes
      # for parmeter substitution.
      ourSchedCodes=${ourSchedCodes:+",${ourSchedCodes}"}
    fi
    
    ${DEBUG} ${RDIMPORT} \
             --log-mode \
             --verbose \
             --fix-broken-formats \
             --normalization-level=${normalization:-"-4"} \
             --set-string-artist=${artist} \
             --set-string-title=${title} \
             --set-string-album=${album} \
             --set-string-year=${year} \
             --set-string-description="Track ${tracknum}" \
             --set-string-publisher=${label:-"UNKNOWN PUBLISHER"} \
             --set-string-user-defined="source=${location}" \
             $(echo ${ourSchedCodes//,/ --add-scheduler-code=}) \
             ${group} ${track}
  done

else

  if [[ -z "${1}" ]] ; then
    message="No audio file name specified."
  elif [[ -r "${1}" ]] ; then
    message="Cannot read audio file '${1}'."
  else
    message="File '${1}' does not seem to exist (check the name)."
  fi

  echo "${BOLD}${message} Please name at least one audio file for which you 
have read permission to import.${NORM}"

  usage ${0##*/}

fi

exit

# Local Variables: ***
# mode:shell-script ***
# indent-tabs-mode: f ***
# sh-indentation: 2 ***
# sh-basic-offset: 2 ***
# sh-indent-for-do: 0 ***
# sh-indent-after-do: + ***
# sh-indent-comment: t ***
# sh-indent-after-case: + ***
# sh-indent-after-done: 0 ***
# sh-indent-after-else: + ***
# sh-indent-after-if: + ***
# sh-indent-after-loop-construct: + ***
# sh-indent-after-open: + ***
# sh-indent-after-switch: + ***
# sh-indent-for-case-alt: ++ ***
# sh-indent-for-case-label: + ***
# sh-indent-for-continuation: + ***
# sh-indent-for-done: 0 ***
# sh-indent-for-else: 0 ***
# sh-indent-for-fi: 0 ***
# sh-indent-for-then: 0 ***
# End: ***

signature.asc
Description: OpenPGP digital signature

_______________________________________________
Rivendell-dev mailing list
[email protected]
http://caspian.paravelsystems.com/mailman/listinfo/rivendell-dev

Re: [RDD] rdimport - Suggested ingestion parameters?

Reply via email to