Re: [gentoo-dev] [PATCH 01/14] cdrom.eclass: Detect case-insensitively and handle special characters
On Wed, 19 Apr 2017 05:14:34 +0200 Michał Górnywrote: > >> > @@ -243,4 +240,27 @@ _cdrom_locate_file_on_cd() { > >> > done > >> > } > >> > > >> > +# @FUNCTION: _cdrom_glob_match > >> > +# @USAGE: > >> > +# @INTERNAL > >> > +# @DESCRIPTION: > >> > +# Locates the given path ($2) within the given root directory ($1) > >> > +# case-insensitively and returns the first actual matching path. > >This > >> > +# eclass previously used "find -iname" but it only checked the > >file > >> > +# case-insensitively and not the directories. There is "find > >-ipath" but > >> > +# this does not intelligently skip non-matching paths, making it > >> > +# slow. Case-insensitive matching can only be applied to patterns > >so > >> > +# extended globbing is used to turn regular strings into patterns. > >All > >> > +# special characters are escaped so don't worry about breaking > >this. The > >> > +# first person to make this work without an eval wins a cookie. > >> > +_cdrom_glob_match() { > >> > +local p=\?\($(sed -e 's:[^A-Za-z0-9/]:\\\0:g' -e 's:/:)/?(:g' > >> > <<< > >"$2" || die)\) > >> > >> Explanatory comment needed, i.e. what gets converted into what, and > >why. > > > >I'll add this: > > > ># The following line turns this: > ># foo*foo/bar bar/baz/file.zip > ># > ># Into this: > ># ?(foo\*foo)/?(bar\ bar)/?(baz)/?(file\.zip) > ># > ># This turns every path component into an escaped extended glob > ># pattern to allow case-insensitive matching. Globs cannot span > ># directories so each component becomes an individual pattern. > > Why do you escape pattern chars? Wasn't the variable supposed to be a > pattern in the first place? If you mean in the eclass before I changed it then no. In the non CD_ROOT case, the basename was passed to "find -iname" but this was not documented. In the CD_ROOT case, the whole thing was passed to [[ -e ]] so patterns wouldn't have worked here. You wouldn't want to use a pattern anyway as you're trying to uniquely identify the disc using a very specific filename. Conversely, I relaxed case-sensitivity because this can vary depending on whether we're dealing with the original disc, a copy of some kind, or an existing installation that may come from Windows. The Curse of Monkey Island turned out to be a great example. Both discs have some files in common like COMI.LA0, however, when mounted with the default options, it appears upper-cased on the first disc but lower-cased on the second. Why? The second disc doesn't use Joliet as all the filenames have the old 8.3 format. Linux normalises these to lower-case. The first disc does use Joliet because of a single file, "Curse of Monkey Island - Manual.pdf" so all the other 8.3 filename are left as upper-case by Linux. -- James Le Cuirot (chewi) Gentoo Linux Developer pgpeNIjKYyVV0.pgp Description: OpenPGP digital signature
Re: [gentoo-dev] [PATCH 01/14] cdrom.eclass: Detect case-insensitively and handle special characters
Dnia 18 kwietnia 2017 23:31:31 CEST, James Le Cuirotnapisał(a): >On Tue, 18 Apr 2017 08:08:44 +0200 >Michał Górny wrote: > >> On pon, 2017-04-17 at 22:53 +0100, James Le Cuirot wrote: >> > diff --git a/eclass/cdrom.eclass b/eclass/cdrom.eclass >> > index 41488d2446c2..de72f15563db 100644 >> > --- a/eclass/cdrom.eclass >> > +++ b/eclass/cdrom.eclass >> > @@ -79,12 +79,13 @@ cdrom_get_cds() { >> >export CDROM_ROOT=${CD_ROOT_1:-${CD_ROOT}} >> >einfo "Found CD #${CDROM_CURRENT_CD} root at ${CDROM_ROOT}" >> >export CDROM_SET=-1 >> > - for f in ${CDROM_CHECK_1//:/ } ; do >> > + IFS=: >> >> 'local', please. > >This line disappears later in the series but I've amended it for the >history anyway. > >> > @@ -181,28 +182,24 @@ _cdrom_locate_file_on_cd() { >> >local showedmsg=0 showjolietmsg=0 >> > >> >while [[ -z ${CDROM_ROOT} ]] ; do >> > - local i=0 >> > - local -a cdset=(${*//:/ }) >> > + local i=0 cdset >> > + IFS=: read -a cdset <<< "${*}" >> >> -r to avoid handling escapes; -d '' to avoid finishing on newline. > >Good call. > >> > @@ -243,4 +240,27 @@ _cdrom_locate_file_on_cd() { >> >done >> > } >> > >> > +# @FUNCTION: _cdrom_glob_match >> > +# @USAGE: >> > +# @INTERNAL >> > +# @DESCRIPTION: >> > +# Locates the given path ($2) within the given root directory ($1) >> > +# case-insensitively and returns the first actual matching path. >This >> > +# eclass previously used "find -iname" but it only checked the >file >> > +# case-insensitively and not the directories. There is "find >-ipath" but >> > +# this does not intelligently skip non-matching paths, making it >> > +# slow. Case-insensitive matching can only be applied to patterns >so >> > +# extended globbing is used to turn regular strings into patterns. >All >> > +# special characters are escaped so don't worry about breaking >this. The >> > +# first person to make this work without an eval wins a cookie. >> > +_cdrom_glob_match() { >> > + local p=\?\($(sed -e 's:[^A-Za-z0-9/]:\\\0:g' -e 's:/:)/?(:g' <<< >"$2" || die)\) >> >> Explanatory comment needed, i.e. what gets converted into what, and >why. > >I'll add this: > ># The following line turns this: ># foo*foo/bar bar/baz/file.zip ># ># Into this: ># ?(foo\*foo)/?(bar\ bar)/?(baz)/?(file\.zip) ># ># This turns every path component into an escaped extended glob ># pattern to allow case-insensitive matching. Globs cannot span ># directories so each component becomes an individual pattern. Why do you escape pattern chars? Wasn't the variable supposed to be a pattern in the first place? -- Best regards, Michał Górny (by phone) -- Best regards, Michał Górny (by phone)
Re: [gentoo-dev] [PATCH 01/14] cdrom.eclass: Detect case-insensitively and handle special characters
On Tue, 18 Apr 2017 08:08:44 +0200 Michał Górnywrote: > On pon, 2017-04-17 at 22:53 +0100, James Le Cuirot wrote: > > diff --git a/eclass/cdrom.eclass b/eclass/cdrom.eclass > > index 41488d2446c2..de72f15563db 100644 > > --- a/eclass/cdrom.eclass > > +++ b/eclass/cdrom.eclass > > @@ -79,12 +79,13 @@ cdrom_get_cds() { > > export CDROM_ROOT=${CD_ROOT_1:-${CD_ROOT}} > > einfo "Found CD #${CDROM_CURRENT_CD} root at ${CDROM_ROOT}" > > export CDROM_SET=-1 > > - for f in ${CDROM_CHECK_1//:/ } ; do > > + IFS=: > > 'local', please. This line disappears later in the series but I've amended it for the history anyway. > > @@ -181,28 +182,24 @@ _cdrom_locate_file_on_cd() { > > local showedmsg=0 showjolietmsg=0 > > > > while [[ -z ${CDROM_ROOT} ]] ; do > > - local i=0 > > - local -a cdset=(${*//:/ }) > > + local i=0 cdset > > + IFS=: read -a cdset <<< "${*}" > > -r to avoid handling escapes; -d '' to avoid finishing on newline. Good call. > > @@ -243,4 +240,27 @@ _cdrom_locate_file_on_cd() { > > done > > } > > > > +# @FUNCTION: _cdrom_glob_match > > +# @USAGE: > > +# @INTERNAL > > +# @DESCRIPTION: > > +# Locates the given path ($2) within the given root directory ($1) > > +# case-insensitively and returns the first actual matching path. This > > +# eclass previously used "find -iname" but it only checked the file > > +# case-insensitively and not the directories. There is "find -ipath" but > > +# this does not intelligently skip non-matching paths, making it > > +# slow. Case-insensitive matching can only be applied to patterns so > > +# extended globbing is used to turn regular strings into patterns. All > > +# special characters are escaped so don't worry about breaking this. The > > +# first person to make this work without an eval wins a cookie. > > +_cdrom_glob_match() { > > + local p=\?\($(sed -e 's:[^A-Za-z0-9/]:\\\0:g' -e 's:/:)/?(:g' <<< "$2" > > || die)\) > > Explanatory comment needed, i.e. what gets converted into what, and why. I'll add this: # The following line turns this: # foo*foo/bar bar/baz/file.zip # # Into this: # ?(foo\*foo)/?(bar\ bar)/?(baz)/?(file\.zip) # # This turns every path component into an escaped extended glob # pattern to allow case-insensitive matching. Globs cannot span # directories so each component becomes an individual pattern. -- James Le Cuirot (chewi) Gentoo Linux Developer pgpkMgCgMuOu9.pgp Description: OpenPGP digital signature
Re: [gentoo-dev] [PATCH 01/14] cdrom.eclass: Detect case-insensitively and handle special characters
On pon, 2017-04-17 at 22:53 +0100, James Le Cuirot wrote: > This eclass previously used "find -iname" but it only checked the file > case-insensitively and not the directories. There is "find -ipath" but > this does not intelligently skip non-matching paths, making it > slow. Globbing is used here instead. > > The : character has always been used to delimit paths given to > cdrom_get_cds, which makes sense because : generally isn't allowed on > CDs, while whitespace is. Despite that, whitespace was not being > handled properly and neither were wildcard characters. Now all special > characters are automatically escaped. > --- > eclass/cdrom.eclass | 48 ++-- > 1 file changed, 34 insertions(+), 14 deletions(-) > > diff --git a/eclass/cdrom.eclass b/eclass/cdrom.eclass > index 41488d2446c2..de72f15563db 100644 > --- a/eclass/cdrom.eclass > +++ b/eclass/cdrom.eclass > @@ -79,12 +79,13 @@ cdrom_get_cds() { > export CDROM_ROOT=${CD_ROOT_1:-${CD_ROOT}} > einfo "Found CD #${CDROM_CURRENT_CD} root at ${CDROM_ROOT}" > export CDROM_SET=-1 > - for f in ${CDROM_CHECK_1//:/ } ; do > + IFS=: 'local', please. > + for f in ${CDROM_CHECK_1} ; do > + unset IFS > ((++CDROM_SET)) > - [[ -e ${CDROM_ROOT}/${f} ]] && break > + export CDROM_MATCH=$(_cdrom_glob_match "${CDROM_ROOT}" > "${f}") > + [[ -n ${CDROM_MATCH} ]] && return > done > - export CDROM_MATCH=${f} > - return > fi > > # User didn't help us out so lets make sure they know they can > @@ -181,28 +182,24 @@ _cdrom_locate_file_on_cd() { > local showedmsg=0 showjolietmsg=0 > > while [[ -z ${CDROM_ROOT} ]] ; do > - local i=0 > - local -a cdset=(${*//:/ }) > + local i=0 cdset > + IFS=: read -a cdset <<< "${*}" -r to avoid handling escapes; -d '' to avoid finishing on newline. > + > if [[ -n ${CDROM_SET} ]] ; then > - cdset=(${cdset[${CDROM_SET}]}) > + cdset=( "${cdset[${CDROM_SET}]}" ) > fi > > while [[ -n ${cdset[${i}]} ]] ; do > - local dir=$(dirname ${cdset[${i}]}) > - local file=$(basename ${cdset[${i}]}) > - > local point= node= fs= foo= > while read point node fs foo ; do > [[ " cd9660 iso9660 udf " != *" ${fs} "* ]] && \ > ! [[ ${fs} == "subfs" && ",${opts}," == > *",fs=cdfss,"* ]] \ > && continue > point=${point//\040/ } > - [[ ! -d ${point}/${dir} ]] && continue > - [[ -z $(find "${point}/${dir}" -maxdepth 1 > -iname "${file}") ]] \ > - && continue > + export CDROM_MATCH=$(_cdrom_glob_match > "${point}" "${cdset[${i}]}") > + [[ -z ${CDROM_MATCH} ]] && continue > export CDROM_ROOT=${point} > export CDROM_SET=${i} > - export CDROM_MATCH=${cdset[${i}]} > return > done <<< "$(get_mounts)" > > @@ -243,4 +240,27 @@ _cdrom_locate_file_on_cd() { > done > } > > +# @FUNCTION: _cdrom_glob_match > +# @USAGE: > +# @INTERNAL > +# @DESCRIPTION: > +# Locates the given path ($2) within the given root directory ($1) > +# case-insensitively and returns the first actual matching path. This > +# eclass previously used "find -iname" but it only checked the file > +# case-insensitively and not the directories. There is "find -ipath" but > +# this does not intelligently skip non-matching paths, making it > +# slow. Case-insensitive matching can only be applied to patterns so > +# extended globbing is used to turn regular strings into patterns. All > +# special characters are escaped so don't worry about breaking this. The > +# first person to make this work without an eval wins a cookie. > +_cdrom_glob_match() { > + local p=\?\($(sed -e 's:[^A-Za-z0-9/]:\\\0:g' -e 's:/:)/?(:g' <<< "$2" > || die)\) Explanatory comment needed, i.e. what gets converted into what, and why. > + ( > + cd "$1" 2>/dev/null || return > + shopt -s extglob nocaseglob nullglob || die > + eval "ARRAY=( ${p} )" > + echo ${ARRAY[0]} > + ) > +} > + > fi -- Best regards, Michał Górny signature.asc Description: This is a digitally signed message part
[gentoo-dev] [PATCH 01/14] cdrom.eclass: Detect case-insensitively and handle special characters
This eclass previously used "find -iname" but it only checked the file case-insensitively and not the directories. There is "find -ipath" but this does not intelligently skip non-matching paths, making it slow. Globbing is used here instead. The : character has always been used to delimit paths given to cdrom_get_cds, which makes sense because : generally isn't allowed on CDs, while whitespace is. Despite that, whitespace was not being handled properly and neither were wildcard characters. Now all special characters are automatically escaped. --- eclass/cdrom.eclass | 48 ++-- 1 file changed, 34 insertions(+), 14 deletions(-) diff --git a/eclass/cdrom.eclass b/eclass/cdrom.eclass index 41488d2446c2..de72f15563db 100644 --- a/eclass/cdrom.eclass +++ b/eclass/cdrom.eclass @@ -79,12 +79,13 @@ cdrom_get_cds() { export CDROM_ROOT=${CD_ROOT_1:-${CD_ROOT}} einfo "Found CD #${CDROM_CURRENT_CD} root at ${CDROM_ROOT}" export CDROM_SET=-1 - for f in ${CDROM_CHECK_1//:/ } ; do + IFS=: + for f in ${CDROM_CHECK_1} ; do + unset IFS ((++CDROM_SET)) - [[ -e ${CDROM_ROOT}/${f} ]] && break + export CDROM_MATCH=$(_cdrom_glob_match "${CDROM_ROOT}" "${f}") + [[ -n ${CDROM_MATCH} ]] && return done - export CDROM_MATCH=${f} - return fi # User didn't help us out so lets make sure they know they can @@ -181,28 +182,24 @@ _cdrom_locate_file_on_cd() { local showedmsg=0 showjolietmsg=0 while [[ -z ${CDROM_ROOT} ]] ; do - local i=0 - local -a cdset=(${*//:/ }) + local i=0 cdset + IFS=: read -a cdset <<< "${*}" + if [[ -n ${CDROM_SET} ]] ; then - cdset=(${cdset[${CDROM_SET}]}) + cdset=( "${cdset[${CDROM_SET}]}" ) fi while [[ -n ${cdset[${i}]} ]] ; do - local dir=$(dirname ${cdset[${i}]}) - local file=$(basename ${cdset[${i}]}) - local point= node= fs= foo= while read point node fs foo ; do [[ " cd9660 iso9660 udf " != *" ${fs} "* ]] && \ ! [[ ${fs} == "subfs" && ",${opts}," == *",fs=cdfss,"* ]] \ && continue point=${point//\040/ } - [[ ! -d ${point}/${dir} ]] && continue - [[ -z $(find "${point}/${dir}" -maxdepth 1 -iname "${file}") ]] \ - && continue + export CDROM_MATCH=$(_cdrom_glob_match "${point}" "${cdset[${i}]}") + [[ -z ${CDROM_MATCH} ]] && continue export CDROM_ROOT=${point} export CDROM_SET=${i} - export CDROM_MATCH=${cdset[${i}]} return done <<< "$(get_mounts)" @@ -243,4 +240,27 @@ _cdrom_locate_file_on_cd() { done } +# @FUNCTION: _cdrom_glob_match +# @USAGE: +# @INTERNAL +# @DESCRIPTION: +# Locates the given path ($2) within the given root directory ($1) +# case-insensitively and returns the first actual matching path. This +# eclass previously used "find -iname" but it only checked the file +# case-insensitively and not the directories. There is "find -ipath" but +# this does not intelligently skip non-matching paths, making it +# slow. Case-insensitive matching can only be applied to patterns so +# extended globbing is used to turn regular strings into patterns. All +# special characters are escaped so don't worry about breaking this. The +# first person to make this work without an eval wins a cookie. +_cdrom_glob_match() { + local p=\?\($(sed -e 's:[^A-Za-z0-9/]:\\\0:g' -e 's:/:)/?(:g' <<< "$2" || die)\) + ( + cd "$1" 2>/dev/null || return + shopt -s extglob nocaseglob nullglob || die + eval "ARRAY=( ${p} )" + echo ${ARRAY[0]} + ) +} + fi -- 2.11.0