Re: Handling files with CRLF line ending

2022-12-08 Thread Chris Elvidge

On 08/12/2022 19:34, Ángel wrote:
> On 2022-12-07 at 12:38 +, Chris Elvidge wrote:
> > I don't use Python generally, but my understanding of it (only a
> > quick test)
> >
> > f = open("demofile2.txt", "a")
> > f.write("Now the file has more content!")
> > f.close()
> >
> > f.write doesn't append either \r\n or \n unless specified.
> >
> > f.write("Now the file has more content!\n") does, in Windows, seem to
> > append \r\n. In Linux (Slackware) and WSL (Slackware), it only
> > appends \n.
> >
> > So it does seem to be a "problem" with Windows rather than a problem
> > with Python.
> >
> > However, I may be wrong!!
>
> The open() function in python has a newline parameter which can be
> None, '', '\n', '\r', and '\r\n'.
>
> When writing output to the stream, if newline is None (the default),
> any '\n' characters written are translated to the system default line
> separator (os.linesep).
>
> Details on https://docs.python.org/3/library/functions.html#open
>
> Thus, adding newline='\n' to your python open() calls would create the
> files with unix newlines.
>
> For bash itself, it's a bit more convoluted, as it will depend on the C
> library used (which might be doing newline translation or not) that
> will be bridging between the POSIX API and Windows native functions.
>
> Regards

I didn't know that; as I said, I don't generally use Python. Thanks for
the heads-up.



--
Chris Elvidge
England




Re: Handling files with CRLF line ending

2022-12-08 Thread Ángel
On 2022-12-07 at 12:38 +, Chris Elvidge wrote:
> I don't use Python generally, but my understanding of it (only a
> quick test)
> 
> f = open("demofile2.txt", "a")
> f.write("Now the file has more content!")
> f.close()
> 
> f.write doesn't append either \r\n or \n unless specified.
> 
> f.write("Now the file has more content!\n") does, in Windows, seem
> to 
> append \r\n. In Linux (Slackware) and WSL (Slackware), it only
> appends \n.
> 
> So it does seem to be a "problem" with Windows rather than a problem 
> with Python.
> 
> However, I may be wrong!!

The open() function in python has a newline parameter which can be
None, '', '\n', '\r', and '\r\n'.

When writing output to the stream, if newline is None (the default),
any '\n' characters written are translated to the system default line
separator (os.linesep).

Details on https://docs.python.org/3/library/functions.html#open

Thus, adding newline='\n' to your python open() calls would create the
files with unix newlines.
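
For reference, a minimal sketch of the newline parameter described above. The helper name and the temporary demo file are illustrative assumptions only. It writes a line with newline='\n' and reads the raw bytes back, so the result should not depend on the platform default:

```python
import os
import tempfile


def write_with_unix_newlines(path, text):
    """Append text to path, storing every LF as a bare LF, even on Windows."""
    # newline="\n" turns off the translation of LF to os.linesep on output.
    with open(path, "a", newline="\n") as f:
        f.write(text)


# Hypothetical demo file inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmpdir:
    demo_path = os.path.join(tmpdir, "demofile2.txt")
    write_with_unix_newlines(demo_path, "Now the file has more content!\n")
    # Binary mode shows exactly which bytes ended up in the file.
    with open(demo_path, "rb") as f:
        raw = f.read()

print(raw)
```

A file written this way contains no CR bytes, so bash `read` and `readarray` see the same input they would see on Linux.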


For bash itself, it's a bit more convoluted, as it will depend on the C
library used (which might be doing newline translation or not) that
will be bridging between the POSIX API and Windows native functions.

Regards






Re: Handling files with CRLF line ending

2022-12-07 Thread Chris Elvidge

On 06/12/2022 23:39, L A Walsh wrote:
> On 2022/12/06 10:57, Chris Elvidge wrote:
> > Yair, how about using the Python installed in the WSL instance.
>
> ---
> Oh, I wondered why Python used CRLF, but nothing else did.
>
> What version of python are you using?  The Python for WSL,
> the python for cygwin, or the python for Windows?  If you are
> using python for Windows, I'd *sorta* expect it to use CRLF, but
> would expect WSL or Cygwin versions to use just 'LF'.  Similarly w/bash --
> I haven't tested it, but I'd expect bash compiled for windows
> (using mingw toolchain) to use CRLF, but LF for WSL or Cygwin.
>
> Are you using both tools for the same OS and subsys and having
> them conflict?

I don't use Python generally, but my understanding of it (only a quick test)

f = open("demofile2.txt", "a")
f.write("Now the file has more content!")
f.close()

f.write doesn't append either \r\n or \n unless specified.

f.write("Now the file has more content!\n") does, in Windows, seem to 
append \r\n. In Linux (Slackware) and WSL (Slackware), it only appends \n.
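
The observation above can be reproduced on any platform with a small sketch. It assumes that passing newline='\r\n' explicitly stands in for the Windows default and newline='\n' for the Linux/WSL default; the helper name is made up for the illustration:

```python
import os
import tempfile


def written_bytes(text, newline):
    """Write text in text mode with the given newline setting; return the raw bytes."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "demofile2.txt")
        with open(path, "w", newline=newline) as f:
            f.write(text)
        with open(path, "rb") as f:
            return f.read()


line = "Now the file has more content!"

# No LF in the string: nothing is appended, whatever the platform.
no_newline = written_bytes(line, None)

# newline="\r\n" mimics what the default (None) does on Windows.
windows_style = written_bytes(line + "\n", "\r\n")

# newline="\n" mimics what the default does on Linux and WSL.
unix_style = written_bytes(line + "\n", "\n")

print(repr(os.linesep), no_newline, windows_style, unix_style)
```

The translation is applied by Python's text layer according to os.linesep, which is why the same script produces different bytes under Windows and under WSL.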


So it does seem to be a "problem" with Windows rather than a problem 
with Python.


However, I may be wrong!!

Reading the file created in Python/Windows with bash read in linux:
read 

Re: Handling files with CRLF line ending

2022-12-06 Thread Koichi Murase
On Wed, Dec 7, 2022 at 8:40, L A Walsh wrote:
> [...]  Similarly w/bash --
> I haven't tested it, but I'd expect bash compiled for windows
> (using mingw toolchain) to use CRLF, but LF for WSL or Cygwin.

I think there is actually no Bash compiled for Windows (i.e., the pure
Windows API on the Windows subsystem). The Bash that comes with the
MinGW toolchain is linked with msys-2.0.dll (in the case of MSYS2),
which means that the POSIX layer Bash relies on is provided by MSYS
which is a minimized fork of Cygwin. The MSYS Bash treats LF as the
newline but not CRLF.

> Are you using both tools for the same OS and subsys and having
> them conflict?

I think so. I think this means that the reported configuration is
wrong or, at least, very unusual. I don't think we should add in Bash
an option that is only meaningful in a specific non-unix-like
operating system for a heterogeneous amalgam of programs from
different subsystems. That option is practically useless in all of the
major Unix-like systems.

If something were to be changed on the Bash side, there may be a
chance that the Bash in the Cygwin/MSYS packages could be patched with
an option like `shopt -s completion_strip_exe'. But even in that case,
the question remains why filtering with `tr' is not an option. The
answer seemed to be that the programs should work unmodified, but I
don't think we should expect a combination of programs from different
subsystems to work unmodified in general.

--
Koichi



Re: Handling files with CRLF line ending

2022-12-06 Thread L A Walsh

On 2022/12/06 10:57, Chris Elvidge wrote:
> Yair, how about using the Python installed in the WSL instance.

---
   Oh, I wondered why Python used CRLF, but nothing else did.

   What version of python are you using?  The Python for WSL,
the python for cygwin, or the python for Windows?  If you are
using python for Windows, I'd *sorta* expect it to use CRLF, but
would expect WSL or Cygwin versions to use just 'LF'.  Similarly w/bash --
I haven't tested it, but I'd expect bash compiled for windows
(using mingw toolchain) to use CRLF, but LF for WSL or Cygwin.

Are you using both tools for the same OS and subsys and having
them conflict?





Re: Handling files with CRLF line ending

2022-12-06 Thread Chris Elvidge

On 06/12/2022 16:00, Dale R. Worley wrote:
> It seems to me that there's more going on than first meets the eye.


Yes. Yair is trying to process text files written on a Windows system 
(line ending \r\n) on a Linux system (line ending \n). That Python wrote 
them is neither here nor there.


Windows text files have to be converted to Linux format before
processing - either inline (tr -d '\r') or en masse (dos2unix).


Expecting bash to cope is a non-starter.

Yair, how about using the Python installed in the WSL instance.

--
Chris Elvidge
England




Re: Handling files with CRLF line ending

2022-12-06 Thread Yair Lenga
Valid question.

I believe a major goal of bash should be to interoperate with other tools.
In this case, being able to read text files generated by python, when
running under WSL, seems like something bash should do.

On the question of minimal changes: I believe many bash users (some are not
hard-core developers, just devops) are tasked with transferring existing
solutions to WSL. I am not aware of hard data, but I believe those users are
underrepresented in this forum.

I admit I have no hard data to support any of this.

On Mon, Dec 5, 2022, 15:36 Chet Ramey  wrote:

> On 12/3/22 8:53 AM, Yair Lenga wrote:
> > Thank you for suggestions. I want to emphasize: I do not need help in
> > stripping the CR from the input files - it's simple.
> >
> > The challenge is executing a working bash/python solution from Linux on
> > WSL, with MINIMAL changes to the scripts.
>
> That's certainly your priority. But is it a compelling enough reason to
> change bash to accomplish it?
>
> It seems easy enough to set up a pipeline on WSL to provide input in the
> form the script authors assume.
>
>
> --
> ``The lyf so short, the craft so long to lerne.'' - Chaucer
>  ``Ars longa, vita brevis'' - Hippocrates
> Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
>
>


Re: Handling files with CRLF line ending

2022-12-05 Thread Chet Ramey

On 12/3/22 8:53 AM, Yair Lenga wrote:
> Thank you for suggestions. I want to emphasize: I do not need help in
> stripping the CR from the input files - it's simple.
>
> The challenge is executing a working bash/python solution from Linux on
> WSL, with MINIMAL changes to the scripts.


That's certainly your priority. But is it a compelling enough reason to
change bash to accomplish it?

It seems easy enough to set up a pipeline on WSL to provide input in the
form the script authors assume.


--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/




Re: Handling files with CRLF line ending

2022-12-03 Thread Yair Lenga
Thank you for suggestions. I want to emphasize: I do not need help in
stripping the CR from the input files - it's simple.

The challenge is executing a working bash/python solution from Linux on
WSL, with MINIMAL changes to the scripts.

Specifically in my case, the owners of the various modules are working in
Linux. They are research people, with no access to Windows dev boxes. I
would also mention: the research people have little interest in
cross-platform portability issues.

Yair

On Sat, Dec 3, 2022 at 8:44 AM Greg Wooledge  wrote:

> On Sat, Dec 03, 2022 at 05:40:02AM -0500, Yair Lenga wrote:
> > I was recently asked to deploy a bash/python based solution to windows
> > (WSL2).  The solution was developed on Linux. Bash is being used as a glue
> > to connect the python based data processing (pipes, files, ...). Everything
> > works as expected with a small BUT: files created by python can not be read
> > by bash `read` and `readarray`.
> >
> > The root cause is the CRLF line ending ("\r\n") - python on windows uses
> > the platform CRLF line ending (as opposed to LF line ending for Linux).
>
> The files can be read.  You just need to remove the CR yourself.  Probably
> the *easiest* way would be to replace constructs like this:
>
> readarray -t myarray < "$myfile"
>
> with this:
>
> readarray -t myarray < <(tr -d \\r < "$myfile")
>
> And replace constructs like this:
>
> while read -r line; do
> ...
> done < "$myfile"
>
> with either this:
>
> while read -r line; do
> ...
> done < <(tr -d \\r < "$myfile")
>
> or this:
>
> while read -r line; do
> line=${line%$'\r'}
> ...
> done < "$myfile"
>
> > The short term (Dirty, but very quick) solution was to add dos2unix pipe
> > when reading the files.
>
> dos2unix wants to "edit" the files in place.  It's not a filter.
> I'd steer clear of dos2unix, unless that's what you truly want.  Also,
> dos2unix isn't a standard utility, so it might not even be present on
> the target system.
>


Re: Handling files with CRLF line ending

2022-12-03 Thread Greg Wooledge
On Sat, Dec 03, 2022 at 05:40:02AM -0500, Yair Lenga wrote:
> I was recently asked to deploy a bash/python based solution to windows
> (WSL2).  The solution was developed on Linux. Bash is being used as a glue
> to connect the python based data processing (pipes, files, ...). Everything
> works as expected with a small BUT: files created by python can not be read
> by bash `read` and `readarray`.
> 
> The root cause is the CRLF line ending ("\r\n") - python on windows uses
> the platform CRLF line ending (as opposed to LF line ending for Linux).

The files can be read.  You just need to remove the CR yourself.  Probably
the *easiest* way would be to replace constructs like this:

readarray -t myarray < "$myfile"

with this:

readarray -t myarray < <(tr -d \\r < "$myfile")

And replace constructs like this:

while read -r line; do
    ...
done < "$myfile"

with either this:

while read -r line; do
    ...
done < <(tr -d \\r < "$myfile")

or this:

while read -r line; do
    line=${line%$'\r'}
    ...
done < "$myfile"

> The short term (Dirty, but very quick) solution was to add dos2unix pipe
> when reading the files.

dos2unix wants to "edit" the files in place.  It's not a filter.
I'd steer clear of dos2unix, unless that's what you truly want.  Also,
dos2unix isn't a standard utility, so it might not even be present on
the target system.
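
Since Python is already part of the setup under discussion, a stream filter can also be sketched there for systems where dos2unix is missing. This is only an illustration under stated assumptions: strip_cr and chunk_size are made-up names, and unlike `tr -d \\r` (which deletes every CR) this version only rewrites CR LF pairs and leaves a lone CR untouched:

```python
import io


def strip_cr(src, dst, chunk_size=65536):
    """Copy binary stream src to dst, replacing each CR LF pair with LF."""
    pending_cr = False  # True if the previous chunk ended with a CR
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        if pending_cr:
            chunk = b"\r" + chunk
        # Hold back a trailing CR: its LF may open the next chunk.
        pending_cr = chunk.endswith(b"\r")
        if pending_cr:
            chunk = chunk[:-1]
        dst.write(chunk.replace(b"\r\n", b"\n"))
    if pending_cr:
        dst.write(b"\r")  # a final CR with no LF after it is kept as data


# In-memory demonstration; chunk_size=4 forces CR LF pairs to straddle chunks.
source = io.BytesIO(b"one\r\ntwo\r\nthree\n")
result = io.BytesIO()
strip_cr(source, result, chunk_size=4)
print(result.getvalue())
```

In a pipeline the same function could be called as strip_cr(sys.stdin.buffer, sys.stdout.buffer) after importing sys, which reads and writes streams and does not edit any file in place.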



Handling files with CRLF line ending

2022-12-03 Thread Yair Lenga
Hi,

I was recently asked to deploy a bash/python based solution to windows
(WSL2).  The solution was developed on Linux. Bash is being used as a glue
to connect the python based data processing (pipes, files, ...). Everything
works as expected with a small BUT: files created by python can not be read
by bash `read` and `readarray`.

The root cause is the CRLF line ending ("\r\n") - python on windows uses
the platform CRLF line ending (as opposed to LF line ending for Linux).

The short term (Dirty, but very quick) solution was to add dos2unix pipe
when reading the files. However, I'm wondering about a better solution: add
"autocrlf" to basic input/output.

Basically, a new option "autocrlf" (set -o autocrlf), which will allow bash
scripts (under Unix and Windows) to work with text files that have CRLF
line endings. The goal should be to minimize the number of changes that are
needed.

A possibly better alternative would be to use an environment variable
("BASH_AUTOCRLF"?) that would do the same, with the advantage that it
would be inherited by subprocesses, making it possible to activate the
behavior without having to modify the actual code. A huge plus in certain
situations.

Specifically:
* For "read": remove the CR before the line ending if autocrlf is on.
* For "readarray": the '-t' option should also strip the CRLF line ending if
autocrlf is on.
* No impact on other commands, in particular echo/printf. It's impossible
to know if a specific printf/echo should produce CRLF, as it is unknown whether
the program that will read the data is capable of handling CRLF line endings.
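
To make the proposed rules concrete, here is a rough model of them in Python. It is not bash code and not an existing bash feature: read_line, readarray, strip and autocrlf are hypothetical names that mirror the proposal, and IFS splitting and the other details of the real builtins are ignored:

```python
import io


def read_line(stream, autocrlf=False):
    """Model of the proposed `read`: one line without its terminator, or None at EOF."""
    raw = stream.readline()  # binary streams split on LF only
    if not raw:
        return None
    if raw.endswith(b"\n"):
        raw = raw[:-1]
        # Proposed rule: drop a CR only when it directly precedes the LF.
        if autocrlf and raw.endswith(b"\r"):
            raw = raw[:-1]
    return raw


def readarray(stream, strip=False, autocrlf=False):
    """Model of the proposed `readarray`; strip=True plays the role of -t."""
    lines = []
    for raw in stream:
        if strip and autocrlf and raw.endswith(b"\r\n"):
            raw = raw[:-2]  # proposed: -t removes the whole CR LF pair
        elif strip and raw.endswith(b"\n"):
            raw = raw[:-1]  # current behaviour of -t: remove the LF only
        lines.append(raw)
    return lines


data = b"alpha\r\nbeta\r\n"
without_option = readarray(io.BytesIO(data), strip=True)
with_option = readarray(io.BytesIO(data), strip=True, autocrlf=True)
first_line = read_line(io.BytesIO(data), autocrlf=True)
print(without_option, with_option, first_line)
```

Without the option each element keeps a trailing CR, which is the symptom described at the top of this message; with it the elements match what the same script would see on Linux.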

Feedback and comments are welcome.

Yair