Re: awk not just using the Field separator as such. it is using the blank space as well ...

2023-02-21 Thread Steve McIntyre
lbrt...@gmail.com wrote:
>On 2/21/23, Greg Wooledge  wrote:
>> I have a funny feeling Albretch might be using Microsoft file systems
>> (FAT, NTFS) for a large chunk of his system.  Those have a much larger
>> set of restricted characters.
>
> Certainly not FAT32 and definitely not FAT, but at work (I work as a
>Math teacher and most schools use Microsoft) I have had to use WSL and
>NTFS. I always thought that  FSs used length-defined raster data
>structures in order to avoid messing with points and such things.

Different filesystems can vary massively here; you can't really assume
anything. All of the following can vary in filesystems supported by
Linux:

 * allowed characters in filenames
 * allowed filename lengths
 * allowed full-path lengths
 * character encodings for filenames
 * case-sensitivity
 * max number of files per directory
 * max number of files per filesystem
 * timestamps (minimum, maximum and resolution)
 * support for symlinks and hardlinks
 * support for extended attributes, permissions and ACLs
 * ...

The VFS layer does a very good job of hiding the complexity and giving
you a reasonably consistent view, but it's not difficult to find edges
if you look. :-)
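
As a rough illustration of two items on that list, here is a minimal
sketch that asks which filename and path length limits apply to the
filesystem behind a directory. It assumes a system where getconf is
installed; the variable names and the choice of the current directory
are only examples.

```shell
#!/usr/bin/env bash
# Directory whose filesystem we want to inspect (example value).
dir=.

# NAME_MAX: longest single filename, in bytes, on that filesystem.
name_max=$(getconf NAME_MAX "$dir")
# PATH_MAX: longest path, in bytes, relative to that directory.
path_max=$(getconf PATH_MAX "$dir")

printf 'NAME_MAX=%s PATH_MAX=%s for %s\n' "$name_max" "$path_max" "$dir"
```

Running it against mount points of different filesystem types is one
quick way to see the values change.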

-- 
Steve McIntyre, Cambridge, UK.                                st...@einval.com
< sladen> I actually stayed in a hotel and arrived to find a post-it
  note stuck to the mini-bar saying "Paul: This fridge and
  fittings are the correct way around and do not need altering"



Re: awk not just using the Field separator as such. it is using the blank space as well ...

2023-02-21 Thread Albretch Mueller
On 2/21/23, Greg Wooledge  wrote:
> I have a funny feeling Albretch might be using Microsoft file systems
> (FAT, NTFS) for a large chunk of his system.  Those have a much larger
> set of restricted characters.

 Certainly not FAT32 and definitely not FAT, but at work (I work as a
Math teacher and most schools use Microsoft) I have had to use WSL and
NTFS. I always thought that  FSs used length-defined raster data
structures in order to avoid messing with points and such things.
 lbrtchx



Re: awk not just using the Field separator as such. it is using the blank space as well ...

2023-02-21 Thread Greg Wooledge
On Tue, Feb 21, 2023 at 05:19:13AM +, Tim Woodall wrote:
> On Mon, 20 Feb 2023, Albretch Mueller wrote:
> 
> > On 2/15/23, Greg Wooledge  wrote:
> > 
> > The reason why I use pipes as field delimiter is because it is an
> > excellent meta character when you are working with filesystems. Pipes
> > would not accepted for files or directory names for good reasons,
> > anyway.
> > 
> 
> tim@einstein(7):~ (none)$ touch 'i|use|pipes'
> tim@einstein(7):~ (none)$ ls -l i*use*
> -rw-rw-r-- 1 tim tim 0 Feb 21 05:14 'i|use|pipes'
> tim@einstein(7):~ (none)$ rm i\|use\|pipes
> tim@einstein(7):~ (none)$
> 
> AFAIR only / and nul are prohibited in file names.

In Unix-like file systems, including Debian's default ext4, this is true.

I have a funny feeling Albretch might be using Microsoft file systems
(FAT, NTFS) for a large chunk of his system.  Those have a much larger
set of restricted characters.
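
Since NUL is one of the two bytes that can never appear in a filename
on those Unix-like file systems, it is the one delimiter that is safe
for lists of names. A minimal sketch, assuming bash 4.4 or later (for
mapfile -d) and a find that supports -print0; the scratch directory
and the variable names are made up for the demonstration:

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory, created only for this demonstration.
scratch=$(mktemp -d)
touch "$scratch/i|use|pipes" "$scratch/plain name"

# find -print0 ends every path with a NUL byte;
# mapfile -d '' splits its input on NUL and nothing else.
mapfile -d '' -t files < <(find "$scratch" -type f -print0)

printf 'found %d files\n' "${#files[@]}"
printf '<%s>\n' "${files[@]##*/}"

rm -r -- "$scratch"
```

Pipes and spaces inside the names survive intact because neither is
treated as a separator.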



Re: awk not just using the Field separator as such. it is using the blank space as well ...

2023-02-20 Thread Tim Woodall

On Mon, 20 Feb 2023, Albretch Mueller wrote:

> On 2/15/23, Greg Wooledge  wrote:
>
> The reason why I use pipes as field delimiter is because it is an
> excellent meta character when you are working with filesystems. Pipes
> would not be accepted for files or directory names for good reasons,
> anyway.



tim@einstein(7):~ (none)$ touch 'i|use|pipes'
tim@einstein(7):~ (none)$ ls -l i*use*
-rw-rw-r-- 1 tim tim 0 Feb 21 05:14 'i|use|pipes'
tim@einstein(7):~ (none)$ rm i\|use\|pipes
tim@einstein(7):~ (none)$

AFAIR only / and nul are prohibited in file names.




Re: awk not just using the Field separator as such. it is using the blank space as well ...

2023-02-20 Thread Greg Wooledge
On Mon, Feb 20, 2023 at 09:12:08PM +, Albretch Mueller wrote:
>  However this would rightly split that line based on the pipe delimiter:
> 
> $ echo "${_PTH}" | awk -F '|' '{for (i=1; i<=NF; i++) print $i;}'
> 83847547
> 2
> dli.ernet.449320/449320-Seduction Of The Innocent_text.pdf

So you're just converting pipes to newlines?  You can do that with
tr.

tr '|' '\n'

>  There should be a sane way ;-) to feed those three lines into a bash array.

mapfile -t myarray < <(...)

But calling multiple processes just to split *one* line of input
is rather inefficient.
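
To make the comparison concrete, here is a sketch of both routes side
by side, using the sample string from this thread. The array names
via_tr and via_read are invented for the example.

```shell
#!/usr/bin/env bash
line='83847547|2|dli.ernet.449320/449320-Seduction Of The Innocent_text.pdf'

# Variant 1: an external process (tr) turns pipes into newlines and
# mapfile reads the resulting lines via process substitution.
mapfile -t via_tr < <(printf '%s\n' "$line" | tr '|' '\n')

# Variant 2: no external process at all. IFS is changed only for this
# one read command; the extra "|" protects a possible empty last field.
IFS='|' read -ra via_read <<< "$line|"

declare -p via_tr via_read
```

Both arrays end up with the same three elements; the second variant
simply avoids the fork.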



Re: awk not just using the Field separator as such. it is using the blank space as well ...

2023-02-20 Thread Albretch Mueller
 Thank you! I noticed my mistake and yes, once again it was a hack
which I thought to be a typo. I had removed the pipe you had included
in the last part of the input string!: "${_PTH}|"

_PTH="83847547|2|dli.ernet.449320/449320-Seduction Of The Innocent_text.pdf"
IFS="|" read -ra _PTH_AR <<< "${_PTH}|"
_PTH_AR_L=${#_PTH_AR[@]}
echo "// __ \$_PTH_AR_L: |${_PTH_AR_L}|, \"${_PTH}\""
for(( IX=0; IX<${_PTH_AR_L}; ++IX )); do
 echo "// __ [$IX/$_PTH_AR_L): |${_PTH_AR[$IX]}|"
done

// __ $_PTH_AR_L: |3|, "83847547|2|dli.ernet.449320/449320-Seduction Of The Innocent_text.pdf"
// __ [0/3): |83847547|
// __ [1/3): |2|
// __ [2/3): |dli.ernet.449320/449320-Seduction Of The Innocent_text.pdf|

 With awk I used to do things like this:

_PTH_AR=($( echo "${_PTH}" | awk -F '|' '{for (i=1; i<=NF; i++) print $i;}' ))
echo "// __ \$_PTH_AR_L: |${_PTH_AR_L}|, \"${_PTH}\""
// __ $_PTH_AR_L: |1|, "83847547|2|dli.ernet.449320/449320-Seduction Of The Innocent_text.pdf"

 However this would rightly split that line based on the pipe delimiter:

$ echo "${_PTH}" | awk -F '|' '{for (i=1; i<=NF; i++) print $i;}'
83847547
2
dli.ernet.449320/449320-Seduction Of The Innocent_text.pdf
$

 There should be a sane way ;-) to feed those three lines into a bash array.



Re: awk not just using the Field separator as such. it is using the blank space as well ...

2023-02-20 Thread Greg Wooledge
On Mon, Feb 20, 2023 at 07:24:01PM +, Albretch Mueller wrote:
> > https://mywiki.wooledge.org/BashPitfalls#pf47
> >
>  what I am trying to do is split a string using as delimiter a pipe

The web page you cited tells you how, doesn't it?  Assuming your string
is a line (e.g. something you pulled out of a *simplified* CSV file,
where there are no delimiters inside fields), and that you want to store
the fields in an array, you can simply do:

IFS="|" read -ra myarray <<< "$mystring|"

Demonstration:

unicorn:~$ mystring='foo|bar|last|field|is|empty|'
unicorn:~$ IFS="|" read -ra myarray <<< "$mystring|"
unicorn:~$ declare -p myarray
declare -a myarray=([0]="foo" [1]="bar" [2]="last" [3]="field" [4]="is" [5]="empty" [6]="")

> I used to do that with awk,

I don't understand how awk helps you populate the elements of a bash
array.  Awk can write a new string to stdout, but then you still have
to parse that string in bash...?  I don't see what benefit awk gives
you here.

>  How do you split a string using as delimiter a pipe these days
> without using a bloody hack?

You cited a bash web page.  So, everything you're doing is a hack.
That's the nature of bash.



Re: awk not just using the Field separator as such. it is using the blank space as well ...

2023-02-20 Thread Albretch Mueller
> https://mywiki.wooledge.org/BashPitfalls#pf47
>
 what I am trying to do is split a string using as delimiter a pipe. I
used to do that with awk, but it doesn't work anymore after someone
had the great idea of substituting awk with mawk, it seems;  and Hey!
They could have done it with python!:

$ which awk
/usr/bin/awk

$ which mawk
/usr/bin/mawk

$ awk -W version
mawk 1.3.4 20200120
Copyright 2008-2019,2020, Thomas E. Dickey
Copyright 1991-1996,2014, Michael D. Brennan

random-funcs:       srandom/random
regex-funcs:        internal
compiled limits:
sprintf buffer      8192
maximum-integer     2147483647

$ mawk -W version
mawk 1.3.4 20200120
Copyright 2008-2019,2020, Thomas E. Dickey
Copyright 1991-1996,2014, Michael D. Brennan

random-funcs:       srandom/random
regex-funcs:        internal
compiled limits:
sprintf buffer      8192
maximum-integer     2147483647
$

 How do you split a string using as delimiter a pipe these days
without using a bloody hack?



Re: awk not just using the Field separator as such. it is using the blank space as well ...

2023-02-20 Thread Greg Wooledge
On Mon, Feb 20, 2023 at 07:10:11AM +, Albretch Mueller wrote:
> On 2/15/23, Greg Wooledge  wrote:
> > If you want to read FIELDS of a SINGLE LINE as array elements, use
> > read -ra:
> >
> > read -ra myarray <<< "$one_line"
> 
>  It didn't work. I tried different options. I am getting: "bash: read:
> ... : not a valid identifier"
> 
>  _PTH="83847547|2|dli.ernet.449320/449320-Seduction Of The Innocent_text.pdf"
>  echo "// __ \$_PTH: \"${_PTH}\""
> 
> # read -ra -d "\\|" _PTH_AR <<< "${_PTH}"
> # read -ra -d "\|" _PTH_AR <<< "${_PTH}"
> # read -ra -d "|" _PTH_AR <<< "${_PTH}"

The -a option has to be followed by the array name.  The -d option has
to be followed by the delimiter.

However, you do NOT want -d "|" here.  The -d delimiter tells read
where to stop reading entirely.  For you, that's the newline character,
which is the default for read, and which is added by the <<< operator.
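
A small sketch of both points, with made-up variable names: what
-d "|" actually does, and why clustering the options as -ra -d is
rejected.

```shell
#!/usr/bin/env bash
input='one|two|three'

# -d '|' makes read stop at the first pipe, so only "one" is consumed.
IFS= read -r -d '|' first <<< "$input"
echo "first: $first"

# In "read -ra -d ...", -a takes the next word ("-d") as the array
# name and the remaining words are treated as variable names. None of
# them is a valid identifier, so bash refuses the command.
if read -ra -d '|' arr <<< "$input" 2>/dev/null; then
    status=accepted
else
    status=rejected
fi
echo "read -ra -d: $status"
```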

If you wish to do field splitting when using read, that's what IFS is
for.  However, beware of the atrociously stupid pitfall regarding IFS
with non-whitespace values.

unicorn:~$ _PTH="83847547|2|dli.ernet.449320/449320-Seduction Of The Innocent_text.pdf"
unicorn:~$ declare -p _PTH
declare -- _PTH="83847547|2|dli.ernet.449320/449320-Seduction Of The Innocent_text.pdf"
unicorn:~$ IFS="|" read -ra _PTH_AR <<< "${_PTH}|"
unicorn:~$ declare -p _PTH_AR
declare -a _PTH_AR=([0]="83847547" [1]="2" [2]="dli.ernet.449320/449320-Seduction Of The Innocent_text.pdf")

That, I believe, is what you were trying to accomplish.  Note that I
added a trailing | character on the <<< "${_PTH}|" command.  That's
because of this pitfall:

https://mywiki.wooledge.org/BashPitfalls#pf47
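
A minimal sketch of that pitfall, using a made-up input whose last
field is empty. The array names are invented for the example.

```shell
#!/usr/bin/env bash
input='a|b|'    # three fields: "a", "b" and an empty one

# Without the extra delimiter the empty last field is silently dropped.
IFS='|' read -ra without_extra <<< "$input"

# With one extra "|" appended, the empty last field is kept.
IFS='|' read -ra with_extra <<< "$input|"

echo "without extra delimiter: ${#without_extra[@]} fields"   # 2
echo "with extra delimiter:    ${#with_extra[@]} fields"      # 3
```

When the last field is not empty, as in the _PTH example, the extra
delimiter changes nothing, so it is safe to always add it.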

Now we just need to teach you to stop using _ALL_CAPS variable names,
especially ones with leading underscores.



Re: awk not just using the Field separator as such. it is using the blank space as well ...

2023-02-19 Thread Albretch Mueller
On 2/15/23, Greg Wooledge  wrote:
> If you want to read FIELDS of a SINGLE LINE as array elements, use
> read -ra:
>
> read -ra myarray <<< "$one_line"

 It didn't work. I tried different options. I am getting: "bash: read:
... : not a valid identifier"

 _PTH="83847547|2|dli.ernet.449320/449320-Seduction Of The Innocent_text.pdf"
 echo "// __ \$_PTH: \"${_PTH}\""

# read -ra -d "\\|" _PTH_AR <<< "${_PTH}"
# read -ra -d "\|" _PTH_AR <<< "${_PTH}"
# read -ra -d "|" _PTH_AR <<< "${_PTH}"

# read -ra -d '\\|' _PTH_AR <<< "${_PTH}"
# read -ra -d '\|' _PTH_AR <<< "${_PTH}"
# read -ra -d '|' _PTH_AR <<< "${_PTH}"

 _PTH_AR_L=${#_PTH_AR[@]}
 echo "// __ \$_PTH_AR_L: |${_PTH_AR_L}|, \"${_PTH}\""

 The reason why I use pipes as field delimiter is because it is an
excellent meta character when you are working with filesystems. Pipes
would not be accepted for files or directory names for good reasons,
anyway.



Re: awk not just using the Field separator as such. it is using the blank space as well ...

2023-02-15 Thread Greg Wooledge
On Wed, Feb 15, 2023 at 12:09:28PM +, Albretch Mueller wrote:
> On 2/15/23, DdB  wrote:
> > $ echo "Adams, Fred, and Ken Aizawa \"The Bounds of Cognition\"" | awk
> > -F'\"' '{for (i=1; i<=NF; i++) print $i;}'
> > Adams, Fred, and Ken Aizawa
> > The Bounds of Cognition
> 
>  yes and this also works:
> 
> _L="Adams, Fred, and Ken Aizawa \"The Bounds of Cognition\""
> echo "${_L}" | awk -F'\"' '{for (i=1; i<=NF; i++) print $i;}'
> Adams, Fred, and Ken Aizawa
> The Bounds of Cognition
> 
>  but I wasn't able to write the output into an array

If you want to read LINES of a STREAM as array elements, use mapfile:

mapfile -t myarray < <(
printf '%s\n' "$stuff" | awk -F'\"' '...'
)

If you want to read FIELDS of a SINGLE LINE as array elements, use
read -ra:

read -ra myarray <<< "$one_line"

Note the caveats associated with each of these, especially the second
one.  Very few things in bash ever work as you expect once you start
poking at the corner cases.

https://mywiki.wooledge.org/BashPitfalls#pf47
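
For reference, a filled-in sketch of the mapfile form with the sample
line from this thread. The awk program is the one posted earlier in
the thread; the names line and parts are made up here, and any awk
(mawk or gawk) is assumed to be on the PATH.

```shell
#!/usr/bin/env bash
line='Adams, Fred, and Ken Aizawa "The Bounds of Cognition"'

# awk prints one field per line; mapfile stores one line per element,
# so spaces inside a field no longer split it.
mapfile -t parts < <(
    printf '%s\n' "$line" | awk -F'"' '{for (i=1; i<=NF; i++) print $i}'
)

declare -p parts
```

The array gets three elements, the last one empty, because the input
ends with the delimiter.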



Re: awk not just using the Field separator as such. it is using the blank space as well ...

2023-02-15 Thread Albretch Mueller
On 2/15/23, DdB  wrote:
> $ echo "Adams, Fred, and Ken Aizawa \"The Bounds of Cognition\"" | awk
> -F'\"' '{for (i=1; i<=NF; i++) print $i;}'
> Adams, Fred, and Ken Aizawa
> The Bounds of Cognition

 yes and this also works:

_L="Adams, Fred, and Ken Aizawa \"The Bounds of Cognition\""
echo "${_L}" | awk -F'\"' '{for (i=1; i<=NF; i++) print $i;}'
Adams, Fred, and Ken Aizawa
The Bounds of Cognition

 but I wasn't able to write the output into an array

> $ awk --version

 I also discovered that there seems to be something wrong with the
version of awk I am working with:

$ awk --version
awk: not an option: --version

$ which awk
/usr/bin/awk

$ awk -W version
mawk 1.3.4 20200120
Copyright 2008-2019,2020, Thomas E. Dickey
Copyright 1991-1996,2014, Michael D. Brennan

random-funcs:       srandom/random
regex-funcs:        internal
compiled limits:
sprintf buffer      8192
maximum-integer     2147483647
$

On 2/15/23, David  wrote:
> Start reading here:
>   http://mywiki.wooledge.org/BashFAQ/005

 which helped me find a hack around it I am comfortable with:

_DT=$(date +%Y%m%d%H%M%S)
_TMPFL=$(mktemp "$(basename "$(pwd)")_${_DT}.XXXXXX")

_L="Adams, Fred, and Ken Aizawa \"The Bounds of Cognition\""
echo "${_L}" | awk -F'\"' '{for (i=1; i<=NF; i++) print $i;}' > "${_TMPFL}"

mapfile -t _AR < "${_TMPFL}"
_AR_L=${#_AR[@]}
echo "// __ \$_AR_L: |${_AR_L}|"

rm --force --verbose "${_TMPFL}"

 I think the problem is whatever bash is using as "awk" is also
including a blank space as delimiter for the splitting of the string

lbrtchx



Re: awk not just using the Field separator as such. it is using the blank space as well ...

2023-02-15 Thread David
On Wed, 15 Feb 2023 at 18:22, DdB wrote:
> On 15.02.2023 at 07:25, Albretch Mueller wrote:

> > $ _L="Adams, Fred, and Ken Aizawa \"The Bounds of Cognition\""
> > echo "// __ \$_L: |${_L}|"
> > _AR=($(echo "${_L}" | awk -F'\"' '{for (i=1; i<=NF; i++) print $i}' ))
> > _AR_L=${#_AR[@]}
> > echo "// __ \$_AR_L: |${_AR_L}|"
> > for(( _IX=0; _IX<${_AR_L}; _IX++ )); do
> >  echo "// __ [$_IX/$_AR_L): |${_AR[$_IX]}|"
> > done

> what awk are you using? gnu awk works fine. see:

The complaint has nothing to do with awk.

This happens because when the shell creates the
elements of the array _AR from the unquoted $(...),
it splits the output on any whitespace (spaces and
tabs as well as newlines).

Whereas the OP expects the elements to be
separated by newlines only.

Just looking at this made my eyes bleed so that,
combined with the total lack of troubleshooting
effort, means that my answer ends as follows:

Start reading here:
  http://mywiki.wooledge.org/BashFAQ/005
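
The word splitting described above can be reproduced without awk. In
this sketch the function produce is a hypothetical stand-in for the
awk output, and the array names are made up:

```shell
#!/usr/bin/env bash
# Stand-in for the awk output in the original post: two lines of text.
produce() {
    printf '%s\n' 'Adams, Fred, and Ken Aizawa ' 'The Bounds of Cognition'
}

# An unquoted $(...) is split on spaces, tabs and newlines alike.
by_whitespace=( $(produce) )

# mapfile splits on newlines only.
mapfile -t by_line < <(produce)

echo "word splitting: ${#by_whitespace[@]} elements"   # 9
echo "mapfile:        ${#by_line[@]} elements"         # 2
```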



Re: awk not just using the Field separator as such. it is using the blank space as well ...

2023-02-14 Thread DdB
On 15.02.2023 at 08:21, DdB wrote:
> $ awk --version
> GNU Awk 4.2.1, API: 2.0 (GNU MPFR 4.0.2, GNU MP 6.1.2)
> Copyright © 1989, 1991-2018 Free Software Foundation.

Even mawk would work. See:
$ mawk -W version

compiled limits:
max NF 32767
sprintf buffer  2040

$ echo "Adams, Fred, and Ken Aizawa \"The Bounds of Cognition\"" | mawk -F'\"' '{for (i=1; i<=NF; i++) print $i;}'
Adams, Fred, and Ken Aizawa
The Bounds of Cognition




Re: awk not just using the Field separator as such. it is using the blank space as well ...

2023-02-14 Thread DdB
On 15.02.2023 at 07:25, Albretch Mueller wrote:
> $ _L="Adams, Fred, and Ken Aizawa \"The Bounds of Cognition\""
> echo "// __ \$_L: |${_L}|"
> _AR=($(echo "${_L}" | awk -F'\"' '{for (i=1; i<=NF; i++) print $i}' ))
> _AR_L=${#_AR[@]}
> echo "// __ \$_AR_L: |${_AR_L}|"
> for(( _IX=0; _IX<${_AR_L}; _IX++ )); do
>  echo "// __ [$_IX/$_AR_L): |${_AR[$_IX]}|"
> done
what awk are you using? gnu awk works fine. see:

$ echo "Adams, Fred, and Ken Aizawa \"The Bounds of Cognition\"" | awk -F'\"' '{for (i=1; i<=NF; i++) print $i;}'
Adams, Fred, and Ken Aizawa
The Bounds of Cognition

$ awk --version
GNU Awk 4.2.1, API: 2.0 (GNU MPFR 4.0.2, GNU MP 6.1.2)
Copyright © 1989, 1991-2018 Free Software Foundation.

This program is free software. You can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of this license or (at
your option) any later version.

This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details. You should have received a
copy of the GNU General Public License along with this program. If
not, see http://www.gnu.org/licenses/.



awk not just using the Field separator as such. it is using the blank space as well ...

2023-02-14 Thread Albretch Mueller
 Once again one of my silly problems ;-). I have searched and searched
for the reason why this is happening.

$ _L="Adams, Fred, and Ken Aizawa \"The Bounds of Cognition\""
echo "// __ \$_L: |${_L}|"
_AR=($(echo "${_L}" | awk -F'\"' '{for (i=1; i<=NF; i++) print $i}' ))
_AR_L=${#_AR[@]}
echo "// __ \$_AR_L: |${_AR_L}|"
for(( _IX=0; _IX<${_AR_L}; _IX++ )); do
 echo "// __ [$_IX/$_AR_L): |${_AR[$_IX]}|"
done
// __ $_L: |Adams, Fred, and Ken Aizawa "The Bounds of Cognition"|
// __ $_AR_L: |9|
// __ [0/9): |Adams,|
// __ [1/9): |Fred,|
// __ [2/9): |and|
// __ [3/9): |Ken|
// __ [4/9): |Aizawa|
// __ [5/9): |The|
// __ [6/9): |Bounds|
// __ [7/9): |of|
// __ [8/9): |Cognition|
$

 This is the result I am looking for (probably the last empty string
could be discarded):

 // __ $_L: |Adams, Fred, and Ken Aizawa "The Bounds of Cognition"|
// __ $_AR_L: |3|
// __ [0/3): |Adams, Fred, and Ken Aizawa |
// __ [1/3): |The Bounds of Cognition|
// __ [2/3): ||

 lbrtchx