Re: [gentoo-dev] [PATCH 01/14] cdrom.eclass: Detect case-insensitively and handle special characters

2017-04-19 Thread James Le Cuirot
On Wed, 19 Apr 2017 05:14:34 +0200
Michał Górny  wrote:

> >> > @@ -243,4 +240,27 @@ _cdrom_locate_file_on_cd() {
> >> >  done
> >> >  }
> >> >  
> >> > +# @FUNCTION: _cdrom_glob_match
> >> > +# @USAGE:  
> >> > +# @INTERNAL
> >> > +# @DESCRIPTION:
> >> > +# Locates the given path ($2) within the given root directory ($1)
> >> > +# case-insensitively and returns the first actual matching path.  
> >This  
> >> > +# eclass previously used "find -iname" but it only checked the  
> >file  
> >> > +# case-insensitively and not the directories. There is "find  
> >-ipath" but  
> >> > +# this does not intelligently skip non-matching paths, making it
> >> > +# slow. Case-insensitive matching can only be applied to patterns  
> >so  
> >> > +# extended globbing is used to turn regular strings into patterns.  
> >All  
> >> > +# special characters are escaped so don't worry about breaking  
> >this. The  
> >> > +# first person to make this work without an eval wins a cookie.
> >> > +_cdrom_glob_match() {
> >> > +local p=\?\($(sed -e 's:[^A-Za-z0-9/]:\\\0:g' -e 's:/:)/?(:g' 
> >> > <<<  
> >"$2" || die)\)
> >> 
> >> Explanatory comment needed, i.e. what gets converted into what, and  
> >why.
> >
> >I'll add this:
> >
> ># The following line turns this:
> >#  foo*foo/bar bar/baz/file.zip
> >#
> ># Into this:
> >#  ?(foo\*foo)/?(bar\ bar)/?(baz)/?(file\.zip)
> >#
> ># This turns every path component into an escaped extended glob
> ># pattern to allow case-insensitive matching. Globs cannot span
> ># directories so each component becomes an individual pattern.  
> 
> Why do you escape pattern chars? Wasn't the variable supposed to be a
> pattern in the first place?

If you mean in the eclass before I changed it then no. In the non
CD_ROOT case, the basename was passed to "find -iname" but this was not
documented. In the CD_ROOT case, the whole thing was passed to
[[ -e ]] so patterns wouldn't have worked here.

You wouldn't want to use a pattern anyway as you're trying to uniquely
identify the disc using a very specific filename. Conversely, I relaxed
case-sensitivity because this can vary depending on whether we're
dealing with the original disc, a copy of some kind, or an existing
installation that may come from Windows.

The Curse of Monkey Island turned out to be a great example. Both discs
have some files in common like COMI.LA0, however, when mounted with the
default options, it appears upper-cased on the first disc but
lower-cased on the second. Why? The second disc doesn't use Joliet as
all the filenames have the old 8.3 format. Linux normalises these to
lower-case. The first disc does use Joliet because of a single file,
"Curse of Monkey Island - Manual.pdf" so all the other 8.3 filename are
left as upper-case by Linux.

-- 
James Le Cuirot (chewi)
Gentoo Linux Developer


pgpeNIjKYyVV0.pgp
Description: OpenPGP digital signature


Re: [gentoo-dev] [PATCH 01/14] cdrom.eclass: Detect case-insensitively and handle special characters

2017-04-18 Thread Michał Górny
Dnia 18 kwietnia 2017 23:31:31 CEST, James Le Cuirot  
napisał(a):
>On Tue, 18 Apr 2017 08:08:44 +0200
>Michał Górny  wrote:
>
>> On pon, 2017-04-17 at 22:53 +0100, James Le Cuirot wrote:
>> > diff --git a/eclass/cdrom.eclass b/eclass/cdrom.eclass
>> > index 41488d2446c2..de72f15563db 100644
>> > --- a/eclass/cdrom.eclass
>> > +++ b/eclass/cdrom.eclass
>> > @@ -79,12 +79,13 @@ cdrom_get_cds() {
>> >export CDROM_ROOT=${CD_ROOT_1:-${CD_ROOT}}
>> >einfo "Found CD #${CDROM_CURRENT_CD} root at ${CDROM_ROOT}"
>> >export CDROM_SET=-1
>> > -  for f in ${CDROM_CHECK_1//:/ } ; do
>> > +  IFS=:  
>> 
>> 'local', please.
>
>This line disappears later in the series but I've amended it for the
>history anyway.
>
>> > @@ -181,28 +182,24 @@ _cdrom_locate_file_on_cd() {
>> >local showedmsg=0 showjolietmsg=0
>> >  
>> >while [[ -z ${CDROM_ROOT} ]] ; do
>> > -  local i=0
>> > -  local -a cdset=(${*//:/ })
>> > +  local i=0 cdset
>> > +  IFS=: read -a cdset <<< "${*}"  
>> 
>> -r to avoid handling escapes; -d '' to avoid finishing on newline.
>
>Good call.
>
>> > @@ -243,4 +240,27 @@ _cdrom_locate_file_on_cd() {
>> >done
>> >  }
>> >  
>> > +# @FUNCTION: _cdrom_glob_match
>> > +# @USAGE:  
>> > +# @INTERNAL
>> > +# @DESCRIPTION:
>> > +# Locates the given path ($2) within the given root directory ($1)
>> > +# case-insensitively and returns the first actual matching path.
>This
>> > +# eclass previously used "find -iname" but it only checked the
>file
>> > +# case-insensitively and not the directories. There is "find
>-ipath" but
>> > +# this does not intelligently skip non-matching paths, making it
>> > +# slow. Case-insensitive matching can only be applied to patterns
>so
>> > +# extended globbing is used to turn regular strings into patterns.
>All
>> > +# special characters are escaped so don't worry about breaking
>this. The
>> > +# first person to make this work without an eval wins a cookie.
>> > +_cdrom_glob_match() {
>> > +  local p=\?\($(sed -e 's:[^A-Za-z0-9/]:\\\0:g' -e 's:/:)/?(:g' <<<
>"$2" || die)\)  
>> 
>> Explanatory comment needed, i.e. what gets converted into what, and
>why.
>
>I'll add this:
>
># The following line turns this:
>#  foo*foo/bar bar/baz/file.zip
>#
># Into this:
>#  ?(foo\*foo)/?(bar\ bar)/?(baz)/?(file\.zip)
>#
># This turns every path component into an escaped extended glob
># pattern to allow case-insensitive matching. Globs cannot span
># directories so each component becomes an individual pattern.

Why do you escape pattern chars? Wasn't the variable supposed to be a pattern 
in the first place?

-- 
Best regards,
Michał Górny (by phone)
-- 
Best regards,
Michał Górny (by phone)



Re: [gentoo-dev] [PATCH 01/14] cdrom.eclass: Detect case-insensitively and handle special characters

2017-04-18 Thread James Le Cuirot
On Tue, 18 Apr 2017 08:08:44 +0200
Michał Górny  wrote:

> On pon, 2017-04-17 at 22:53 +0100, James Le Cuirot wrote:
> > diff --git a/eclass/cdrom.eclass b/eclass/cdrom.eclass
> > index 41488d2446c2..de72f15563db 100644
> > --- a/eclass/cdrom.eclass
> > +++ b/eclass/cdrom.eclass
> > @@ -79,12 +79,13 @@ cdrom_get_cds() {
> > export CDROM_ROOT=${CD_ROOT_1:-${CD_ROOT}}
> > einfo "Found CD #${CDROM_CURRENT_CD} root at ${CDROM_ROOT}"
> > export CDROM_SET=-1
> > -   for f in ${CDROM_CHECK_1//:/ } ; do
> > +   IFS=:  
> 
> 'local', please.

This line disappears later in the series but I've amended it for the
history anyway.

> > @@ -181,28 +182,24 @@ _cdrom_locate_file_on_cd() {
> > local showedmsg=0 showjolietmsg=0
> >  
> > while [[ -z ${CDROM_ROOT} ]] ; do
> > -   local i=0
> > -   local -a cdset=(${*//:/ })
> > +   local i=0 cdset
> > +   IFS=: read -a cdset <<< "${*}"  
> 
> -r to avoid handling escapes; -d '' to avoid finishing on newline.

Good call.

> > @@ -243,4 +240,27 @@ _cdrom_locate_file_on_cd() {
> > done
> >  }
> >  
> > +# @FUNCTION: _cdrom_glob_match
> > +# @USAGE:  
> > +# @INTERNAL
> > +# @DESCRIPTION:
> > +# Locates the given path ($2) within the given root directory ($1)
> > +# case-insensitively and returns the first actual matching path. This
> > +# eclass previously used "find -iname" but it only checked the file
> > +# case-insensitively and not the directories. There is "find -ipath" but
> > +# this does not intelligently skip non-matching paths, making it
> > +# slow. Case-insensitive matching can only be applied to patterns so
> > +# extended globbing is used to turn regular strings into patterns. All
> > +# special characters are escaped so don't worry about breaking this. The
> > +# first person to make this work without an eval wins a cookie.
> > +_cdrom_glob_match() {
> > +   local p=\?\($(sed -e 's:[^A-Za-z0-9/]:\\\0:g' -e 's:/:)/?(:g' <<< "$2" 
> > || die)\)  
> 
> Explanatory comment needed, i.e. what gets converted into what, and why.

I'll add this:

# The following line turns this:
#  foo*foo/bar bar/baz/file.zip
#
# Into this:
#  ?(foo\*foo)/?(bar\ bar)/?(baz)/?(file\.zip)
#
# This turns every path component into an escaped extended glob
# pattern to allow case-insensitive matching. Globs cannot span
# directories so each component becomes an individual pattern.

-- 
James Le Cuirot (chewi)
Gentoo Linux Developer


pgpkMgCgMuOu9.pgp
Description: OpenPGP digital signature


Re: [gentoo-dev] [PATCH 01/14] cdrom.eclass: Detect case-insensitively and handle special characters

2017-04-18 Thread Michał Górny
On pon, 2017-04-17 at 22:53 +0100, James Le Cuirot wrote:
> This eclass previously used "find -iname" but it only checked the file
> case-insensitively and not the directories. There is "find -ipath" but
> this does not intelligently skip non-matching paths, making it
> slow. Globbing is used here instead.
> 
> The : character has always been used to delimit paths given to
> cdrom_get_cds, which makes sense because : generally isn't allowed on
> CDs, while whitespace is. Despite that, whitespace was not being
> handled properly and neither were wildcard characters. Now all special
> characters are automatically escaped.
> ---
>  eclass/cdrom.eclass | 48 ++--
>  1 file changed, 34 insertions(+), 14 deletions(-)
> 
> diff --git a/eclass/cdrom.eclass b/eclass/cdrom.eclass
> index 41488d2446c2..de72f15563db 100644
> --- a/eclass/cdrom.eclass
> +++ b/eclass/cdrom.eclass
> @@ -79,12 +79,13 @@ cdrom_get_cds() {
>   export CDROM_ROOT=${CD_ROOT_1:-${CD_ROOT}}
>   einfo "Found CD #${CDROM_CURRENT_CD} root at ${CDROM_ROOT}"
>   export CDROM_SET=-1
> - for f in ${CDROM_CHECK_1//:/ } ; do
> + IFS=:

'local', please.

> + for f in ${CDROM_CHECK_1} ; do
> + unset IFS
>   ((++CDROM_SET))
> - [[ -e ${CDROM_ROOT}/${f} ]] && break
> + export CDROM_MATCH=$(_cdrom_glob_match "${CDROM_ROOT}" 
> "${f}")
> + [[ -n ${CDROM_MATCH} ]] && return
>   done
> - export CDROM_MATCH=${f}
> - return
>   fi
>  
>   # User didn't help us out so lets make sure they know they can
> @@ -181,28 +182,24 @@ _cdrom_locate_file_on_cd() {
>   local showedmsg=0 showjolietmsg=0
>  
>   while [[ -z ${CDROM_ROOT} ]] ; do
> - local i=0
> - local -a cdset=(${*//:/ })
> + local i=0 cdset
> + IFS=: read -a cdset <<< "${*}"

-r to avoid handling escapes; -d '' to avoid finishing on newline.

> +
>   if [[ -n ${CDROM_SET} ]] ; then
> - cdset=(${cdset[${CDROM_SET}]})
> + cdset=( "${cdset[${CDROM_SET}]}" )
>   fi
>  
>   while [[ -n ${cdset[${i}]} ]] ; do
> - local dir=$(dirname ${cdset[${i}]})
> - local file=$(basename ${cdset[${i}]})
> -
>   local point= node= fs= foo=
>   while read point node fs foo ; do
>   [[ " cd9660 iso9660 udf " != *" ${fs} "* ]] && \
>   ! [[ ${fs} == "subfs" && ",${opts}," == 
> *",fs=cdfss,"* ]] \
>   && continue
>   point=${point//\040/ }
> - [[ ! -d ${point}/${dir} ]] && continue
> - [[ -z $(find "${point}/${dir}" -maxdepth 1 
> -iname "${file}") ]] \
> - && continue
> + export CDROM_MATCH=$(_cdrom_glob_match 
> "${point}" "${cdset[${i}]}")
> + [[ -z ${CDROM_MATCH} ]] && continue
>   export CDROM_ROOT=${point}
>   export CDROM_SET=${i}
> - export CDROM_MATCH=${cdset[${i}]}
>   return
>   done <<< "$(get_mounts)"
>  
> @@ -243,4 +240,27 @@ _cdrom_locate_file_on_cd() {
>   done
>  }
>  
> +# @FUNCTION: _cdrom_glob_match
> +# @USAGE:  
> +# @INTERNAL
> +# @DESCRIPTION:
> +# Locates the given path ($2) within the given root directory ($1)
> +# case-insensitively and returns the first actual matching path. This
> +# eclass previously used "find -iname" but it only checked the file
> +# case-insensitively and not the directories. There is "find -ipath" but
> +# this does not intelligently skip non-matching paths, making it
> +# slow. Case-insensitive matching can only be applied to patterns so
> +# extended globbing is used to turn regular strings into patterns. All
> +# special characters are escaped so don't worry about breaking this. The
> +# first person to make this work without an eval wins a cookie.
> +_cdrom_glob_match() {
> + local p=\?\($(sed -e 's:[^A-Za-z0-9/]:\\\0:g' -e 's:/:)/?(:g' <<< "$2" 
> || die)\)

Explanatory comment needed, i.e. what gets converted into what, and why.

> + (
> + cd "$1" 2>/dev/null || return
> + shopt -s extglob nocaseglob nullglob || die
> + eval "ARRAY=( ${p} )"
> + echo ${ARRAY[0]}
> + )
> +}
> +
>  fi

-- 
Best regards,
Michał Górny


signature.asc
Description: This is a digitally signed message part


[gentoo-dev] [PATCH 01/14] cdrom.eclass: Detect case-insensitively and handle special characters

2017-04-17 Thread James Le Cuirot
This eclass previously used "find -iname" but it only checked the file
case-insensitively and not the directories. There is "find -ipath" but
this does not intelligently skip non-matching paths, making it
slow. Globbing is used here instead.

The : character has always been used to delimit paths given to
cdrom_get_cds, which makes sense because : generally isn't allowed on
CDs, while whitespace is. Despite that, whitespace was not being
handled properly and neither were wildcard characters. Now all special
characters are automatically escaped.
---
 eclass/cdrom.eclass | 48 ++--
 1 file changed, 34 insertions(+), 14 deletions(-)

diff --git a/eclass/cdrom.eclass b/eclass/cdrom.eclass
index 41488d2446c2..de72f15563db 100644
--- a/eclass/cdrom.eclass
+++ b/eclass/cdrom.eclass
@@ -79,12 +79,13 @@ cdrom_get_cds() {
export CDROM_ROOT=${CD_ROOT_1:-${CD_ROOT}}
einfo "Found CD #${CDROM_CURRENT_CD} root at ${CDROM_ROOT}"
export CDROM_SET=-1
-   for f in ${CDROM_CHECK_1//:/ } ; do
+   IFS=:
+   for f in ${CDROM_CHECK_1} ; do
+   unset IFS
((++CDROM_SET))
-   [[ -e ${CDROM_ROOT}/${f} ]] && break
+   export CDROM_MATCH=$(_cdrom_glob_match "${CDROM_ROOT}" 
"${f}")
+   [[ -n ${CDROM_MATCH} ]] && return
done
-   export CDROM_MATCH=${f}
-   return
fi
 
# User didn't help us out so lets make sure they know they can
@@ -181,28 +182,24 @@ _cdrom_locate_file_on_cd() {
local showedmsg=0 showjolietmsg=0
 
while [[ -z ${CDROM_ROOT} ]] ; do
-   local i=0
-   local -a cdset=(${*//:/ })
+   local i=0 cdset
+   IFS=: read -a cdset <<< "${*}"
+
if [[ -n ${CDROM_SET} ]] ; then
-   cdset=(${cdset[${CDROM_SET}]})
+   cdset=( "${cdset[${CDROM_SET}]}" )
fi
 
while [[ -n ${cdset[${i}]} ]] ; do
-   local dir=$(dirname ${cdset[${i}]})
-   local file=$(basename ${cdset[${i}]})
-
local point= node= fs= foo=
while read point node fs foo ; do
[[ " cd9660 iso9660 udf " != *" ${fs} "* ]] && \
! [[ ${fs} == "subfs" && ",${opts}," == 
*",fs=cdfss,"* ]] \
&& continue
point=${point//\040/ }
-   [[ ! -d ${point}/${dir} ]] && continue
-   [[ -z $(find "${point}/${dir}" -maxdepth 1 
-iname "${file}") ]] \
-   && continue
+   export CDROM_MATCH=$(_cdrom_glob_match 
"${point}" "${cdset[${i}]}")
+   [[ -z ${CDROM_MATCH} ]] && continue
export CDROM_ROOT=${point}
export CDROM_SET=${i}
-   export CDROM_MATCH=${cdset[${i}]}
return
done <<< "$(get_mounts)"
 
@@ -243,4 +240,27 @@ _cdrom_locate_file_on_cd() {
done
 }
 
+# @FUNCTION: _cdrom_glob_match
+# @USAGE:  
+# @INTERNAL
+# @DESCRIPTION:
+# Locates the given path ($2) within the given root directory ($1)
+# case-insensitively and returns the first actual matching path. This
+# eclass previously used "find -iname" but it only checked the file
+# case-insensitively and not the directories. There is "find -ipath" but
+# this does not intelligently skip non-matching paths, making it
+# slow. Case-insensitive matching can only be applied to patterns so
+# extended globbing is used to turn regular strings into patterns. All
+# special characters are escaped so don't worry about breaking this. The
+# first person to make this work without an eval wins a cookie.
+_cdrom_glob_match() {
+   local p=\?\($(sed -e 's:[^A-Za-z0-9/]:\\\0:g' -e 's:/:)/?(:g' <<< "$2" 
|| die)\)
+   (
+   cd "$1" 2>/dev/null || return
+   shopt -s extglob nocaseglob nullglob || die
+   eval "ARRAY=( ${p} )"
+   echo ${ARRAY[0]}
+   )
+}
+
 fi
-- 
2.11.0