Re: [Tutor] implementing sed - termination error

2016-11-02 Thread Peter Otten
bruce wrote:

> Hi
> 
> Running a test on a linux box, with python.
> 
> Trying to do a search/replace over a file, for a given string, and
> replacing the string with a chunk of text that has multiple lines.
> 
> From the cmdline, using sed, no prob. however, implementing sed, runs
> into issues, that result in a "termination error"
> 
> The error gets thrown, due to the "\" of the newline. SO, and other
> sites have plenty to say about this, but haven't run across any soln.
> 
> The test file contains 6K lines, but, the process requires doing lots
> of search/replace operations, so I'm interested in testing this method
> to see how "fast" the overall process is.
> 
> The following psuedo code is what I've used to test. The key point
> being changing the "\n" portion to try to resolved the termination
> error.

Here's a self-contained example that demonstrates that the key change is to 
avoid shell=True. 

$ cat input.txt
foo
alpha
beta foo gamma
epsilon
foo zeta
$ sed s/foo/bar\\nbaz/g input.txt
bar
baz
alpha
beta bar
baz gamma
epsilon
bar
baz zeta
$ python3
Python 3.4.3 (default, Sep 14 2016, 12:36:27) 
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import subprocess
>>> subprocess.call(["sed", "s/foo/bar\\nbaz/g", "input.txt"])
bar
baz
alpha
beta bar
baz gamma
epsilon
bar
baz zeta
0

Both the shell and Python require you to escape, so if you use one after the 
other you have to escape the escapes; but with only one level of escapes and 
a little luck you need not make any changes between Python and the shell.
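
To make the two escaping levels visible, here is a small sketch. It is an illustration added to the session above, not part of it. It only inspects strings and does not run sed; shlex.quote stands in for the extra layer of quoting that a shell command line would need.

```python
import shlex

# In a Python string literal, "\\n" is one backslash followed by the letter n.
script = "s/foo/bar\\nbaz/g"
print(script)  # the text sed receives when this is passed as a list element

# A raw string spells the same value without doubling the backslash.
assert script == r"s/foo/bar\nbaz/g"

# With shell=True the shell parses the command text first, so the script
# needs a second layer of quoting to reach sed unchanged.
shell_command = "sed " + shlex.quote(script) + " input.txt"
print(shell_command)
```

Passing the list form to subprocess skips the second layer entirely, which is why the example above works with a single level of escapes.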



___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] implementing sed - termination error

2016-11-01 Thread cs

On 01Nov2016 20:18, bruce wrote:

> Running a test on a linux box, with python.
> Trying to do a search/replace over a file, for a given string, and
> replacing the string with a chunk of text that has multiple lines.
>
> From the cmdline, using sed, no prob. however, implementing sed, runs
> into issues, that result in a "termination error"


Just terminology: you're not "implementing sed", which is a nontrivial task 
that would involve writing a python program that could do everything sed does.  
You're writing a small python program to call sed to do the work.


Further discussion below.


> The error gets thrown, due to the "\" of the newline. SO, and other
> sites have plenty to say about this, but haven't run across any soln.
>
> The test file contains 6K lines, but, the process requires doing lots
> of search/replace operations, so I'm interested in testing this method
> to see how "fast" the overall process is.
>
> The following psuedo code is what I've used to test. The key point
> being changing the "\n" portion to try to resolved the termination
> error.

> import subprocess
>
> ll_="ffdfdfdfg"
> ll2_="12112121212121212"
> hash="a"
>
> data_=ll_+"\n"+ll2_+"\n"+qq22_
> print data_


Presumably qq22_ is defined elsewhere and simply not shown.


> cc='sed -i "s/'+hash+'/'+data_+'/g" '+dname
> print cc
> proc=subprocess.Popen(cc, shell=True,stdout=subprocess.PIPE)
> res=proc.communicate()[0].strip()


There are two fairly large problems with this program. The first is your need 
to embed newlines in the replacement pattern. You have genuine newlines in your 
string, but a sed command would look like this:


 sed 's/a/ffdfdfdfg\
 12112121212121212\
 q/g'

so you need to replace the newlines with "backslash and newline".

Fortunately strings have a .replace() method which you can use for this 
purpose. Look it up:


 https://docs.python.org/3/library/stdtypes.html#str.replace

You can use it to make data_ how you want it to be for the command.
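
A minimal sketch of that replacement step, assuming qq22_ is just the string "q" (it is not shown in the original post, so that value is a placeholder):

```python
ll_ = "ffdfdfdfg"
ll2_ = "12112121212121212"
qq22_ = "q"  # assumption: the real value is not shown in the post

data_ = ll_ + "\n" + ll2_ + "\n" + qq22_

# Put a backslash in front of every newline, as sed's s command expects.
# "\\\n" is a two-character string: one backslash followed by a newline.
sed_data = data_.replace("\n", "\\\n")
print(sed_data)
```

The printed text has the same shape as the replacement part of the sed command shown earlier: each line but the last ends in a backslash.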

The second problem is that you're then trying to invoke sed by constructing a 
shell command string and handing that to Popen. This means that you need to 
embed shell syntax in that string to quote things like the sed command. All 
very messy.


It is better to _bypass_ the shell and invoke sed directly by leaving out the 
"shell=True" parameter. All the command line (which is the shell) is doing is 
honouring the shell quoting and constructing a sed invocation as distinct 
strings:


 sed
 -i
 s/this/that/g
 filename

You want to do the equivalent in python, something like this:

 sed_argv = [ 'sed', '-i', 's/'+hash+'/'+data_+'/g', dname ]
 proc=subprocess.Popen(sed_argv, stdout=subprocess.PIPE)

See how you're now unconcerned by any difficulties around shell quoting? You're 
now dealing directly in strings.
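
Spelled out a little further, this might look like the sketch below. The helper names build_sed_argv and run_sed and the file name "test.txt" are made up for illustration, and the code assumes the pattern and replacement are already valid sed text (newlines backslash-escaped, no stray slashes). Note that -i without a suffix is the GNU sed form.

```python
import subprocess

def build_sed_argv(pattern, replacement, filename):
    """Return the argument list for an in-place sed substitution."""
    # Each list element reaches sed as one argument, exactly as written.
    return ["sed", "-i", "s/" + pattern + "/" + replacement + "/g", filename]

def run_sed(pattern, replacement, filename):
    """Run sed directly, without a shell; raises CalledProcessError on failure."""
    subprocess.run(build_sed_argv(pattern, replacement, filename), check=True)

# "one\\\ntwo" is: one, backslash, newline, two.
argv = build_sed_argv("a", "one\\\ntwo", "test.txt")
print(argv)
```

Because sed -i edits the file in place and writes nothing to stdout, run_sed does not capture any output.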


There are a few other questions, such as: if you're using sed's -i option, why 
is stdout a pipe? And what if hash or data_ contain slashes, which you are 
using in sed to delimit them?
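
On the slash question, one possible approach is to escape the replacement text before building the sed command. The helper below is a hypothetical sketch that covers only the replacement side of s/.../.../; the search pattern is a regular expression and would need its own escaping, which is not shown here.

```python
def escape_sed_replacement(text, delim="/"):
    """Escape text for use as the replacement part of a sed s command."""
    # Backslash, "&" (the whole match), newline and the delimiter are all
    # special in a sed replacement and must be preceded by a backslash.
    special = {"\\", "&", "\n", delim}
    out = []
    for ch in text:
        if ch in special:
            out.append("\\")
        out.append(ch)
    return "".join(out)

print(escape_sed_replacement("usr/bin & more\nnext line"))
```

Another option is to pick a delimiter that cannot occur in the data, since sed accepts characters other than "/" after the s.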


Hoping this will help you move forward.

Cheers,
Cameron Simpson 


Re: [Tutor] implementing sed - termination error

2016-11-01 Thread Alan Gauld via Tutor
On 02/11/16 00:18, bruce wrote:

> Trying to do a search/replace over a file, for a given string, and
> replacing the string with a chunk of text that has multiple lines.
> 
> From the cmdline, using sed, no prob. however, implementing sed, runs
> into issues, that result in a "termination error"

I don't understand what you mean by that last paragraph.
"using sed, no prob" implies you know the command you want
to run because you got it to work on the command line?
If that's correct can you share the exact command you
typed at the command line that worked?

"implementing sed" implies you are trying to write the
sed tool in Python. but your code suggests you are trying
to run sed from within a Python script - very different.

> The error gets thrown, due to the "\" of the newline. 

That sounds very odd. What leads you to that conclusion?
For that matter which \ or newline?
In which string - the search string, the replacement
string or the file content?

> The test file contains 6K lines, but, the process requires doing lots
> of search/replace operations, so I'm interested in testing this method
> to see how "fast" the overall process is.

I'm not sure what you are testing? Is it the sed tool itself?
Or is it the Python script that runs sed? Or something else?

> The following psuedo code is what I've used to test. 

Pseudo code is fine to explain complex algorithms but
in this case the actual code is probably more useful.

> The key point
> being changing the "\n" portion to try to resolved the termination
> error.

Again, I don't really understand what you mean by that.


> import subprocess
> 
> ll_="ffdfdfdfg"
> ll2_="12112121212121212"
> hash="a"
> 
> data_=ll_+"\n"+ll2_+"\n"+qq22_
> print data_
> 
> cc='sed -i "s/'+hash+'/'+data_+'/g" '+dname
> print cc

I assume dname is your file?
I'd also use string formatting to construct the command,
simply because sed uses regex and a lot of + signs looks
like a regex so it is confusing (to me at least).
But see the comment below about Popen args.
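
As a rough sketch of the string formatting idea: the values below are placeholders, hash is renamed to hash_ so it does not shadow the built-in, and the newline in data_ is assumed to be backslash-escaped already.

```python
hash_ = "a"  # renamed to avoid shadowing the built-in hash()
data_ = "ffdfdfdfg\\\n12112121212121212"  # backslash + newline between lines
dname = "test.txt"  # placeholder file name

# The template shows the shape of the sed expression at a glance.
script = "s/{}/{}/g".format(hash_, data_)

# The finished expression is a single element of the argument list.
sed_args = ["sed", "-i", script, dname]
print(sed_args)
```

The template keeps the s/.../.../g structure readable, where the version built with + hides it among the quotes.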

> 
> proc=subprocess.Popen(cc, shell=True,stdout=subprocess.PIPE)
> res=proc.communicate()[0].strip()
> 
> 
> 
> ===
> error
> sed: -e expression #1, char 38: unterminated `s' command

My first instinct when dealing with subprocess errors is to set
shell=False to ensure the shell isn't messing about with my inputs.
What happens if you set shell false?

I'd also tend to put the sed arguments into a list rather
than pass a single string.

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos




[Tutor] implementing sed - termination error

2016-11-01 Thread bruce
Hi

Running a test on a linux box, with python.

Trying to do a search/replace over a file, for a given string, and
replacing the string with a chunk of text that has multiple lines.

From the cmdline, using sed, no prob. however, implementing sed, runs
into issues, that result in a "termination error"

The error gets thrown, due to the "\" of the newline. SO, and other
sites have plenty to say about this, but haven't run across any soln.

The test file contains 6K lines, but, the process requires doing lots
of search/replace operations, so I'm interested in testing this method
to see how "fast" the overall process is.

The following pseudo code is what I've used to test. The key point
being changing the "\n" portion to try to resolve the termination
error.


import subprocess


ll_="ffdfdfdfg"
ll2_="12112121212121212"
hash="a"

data_=ll_+"\n"+ll2_+"\n"+qq22_
print data_

cc='sed -i "s/'+hash+'/'+data_+'/g" '+dname
print cc

proc=subprocess.Popen(cc, shell=True,stdout=subprocess.PIPE)
res=proc.communicate()[0].strip()



===
error
sed: -e expression #1, char 38: unterminated `s' command