On 01Nov2016 20:18, bruce <badoug...@gmail.com> wrote:
Running a test on a linux box, with python.
Trying to do a search/replace over a file, for a given string, and
replacing the string with a chunk of text that has multiple lines.

From the cmdline, using sed, no prob. however, implementing sed, runs
into issues, that result in a "termination error"

Just terminology: you're not "implementing sed", which is a nontrivial task that would involve writing a python program that could do everything sed does. You're writing a small python program to call sed to do the work.

Further discussion below.

The error gets thrown, due to the "\" of the newline. SO, and other
sites have plenty to say about this, but haven't run across any soln.

The test file contains 6K lines, but, the process requires doing lots
of search/replace operations, so I'm interested in testing this method
to see how "fast" the overall process is.

The following pseudo code is what I've used to test. The key point
being changing the "\n" portion to try to resolve the termination
error.

import subprocess

ll_="ffdfdfdfghhhh"
ll2_="12112121212121212"
hash="aaaaa"

data_=ll_+"\n"+ll2_+"\n"+qq22_
print data_

I presume qq22_ is defined somewhere and simply not shown here.

cc='sed -i "s/'+hash+'/'+data_+'/g" '+dname
print cc
proc=subprocess.Popen(cc, shell=True,stdout=subprocess.PIPE)
res=proc.communicate()[0].strip()

There are two fairly large problems with this program. The first is your need to embed newlines in the replacement pattern. You have genuine newlines in your string, but a sed command would look like this:

 sed 's/aaaaa/ffdfdfdfghhhh\
 12112121212121212\
 qqqqq/g'

so you need to replace the newlines with "backslash and newline".

Fortunately strings have a .replace() method which you can use for this purpose. Look it up:

 https://docs.python.org/3/library/stdtypes.html#str.replace

You can use it to make data_ how you want it to be for the command.
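As a minimal sketch of that step, using the values from your post (qq22_ was not shown, so the "qqqqq" below is only a stand-in), the conversion might look like this:

```python
# Values from the original post; qq22_ was not shown, so "qqqqq" is a stand-in.
ll_ = "ffdfdfdfghhhh"
ll2_ = "12112121212121212"
qq22_ = "qqqqq"

data_ = ll_ + "\n" + ll2_ + "\n" + qq22_

# Turn every newline into backslash + newline, which is the form sed
# expects inside the replacement part of an s command.
sed_data = data_.replace("\n", "\\\n")

print(sed_data)
```

Note that "\\\n" in Python source is two characters: one backslash followed by one newline.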

The second problem is that you're then trying to invoke sed by constructing a shell command string and handing that to Popen. This means that you need to embed shell syntax in that string to quote things like the sed command. All very messy.

It is better to _bypass_ the shell and invoke sed directly by leaving out the "shell=True" parameter. All the command line (which is the shell) is doing is honouring the shell quoting and constructing a sed invocation as distinct strings:

 sed
 -i
 s/this/that/g
 filename

You want to do the equivalent in python, something like this:

 sed_argv = [ 'sed', '-i', 's/'+hash+'/'+data_+'/g', dname ]
 proc=subprocess.Popen(sed_argv, stdout=subprocess.PIPE)

See how you're now unconcerned by any difficulties around shell quoting? You're now dealing directly in strings.
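Here is a slightly fuller sketch of the same idea. The function names build_sed_argv and run_sed and the file name test.txt are made up for illustration, and I am assuming GNU sed, as found on a typical linux box, where -i takes no argument:

```python
import subprocess

def build_sed_argv(pattern, replacement, filename):
    # One list element per argument; no shell quoting is involved.
    return ["sed", "-i", "s/" + pattern + "/" + replacement + "/g", filename]

def run_sed(pattern, replacement, filename):
    # check_call raises CalledProcessError if sed exits with a nonzero status.
    # No stdout pipe here: with -i, sed rewrites the file in place.
    subprocess.check_call(build_sed_argv(pattern, replacement, filename))

# The replacement already has its newline written as backslash + newline.
argv = build_sed_argv("aaaaa", "ffdfdfdfghhhh\\\n12112121212121212", "test.txt")
print(argv)
```

Calling run_sed("aaaaa", replacement, dname) would then perform the edit; printing the list first is a handy way to see exactly which strings sed will receive.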

There are a few other questions, such as: if you're using sed's -i option, why is stdout a pipe? And what if hash or data_ contain slashes, which you are using in sed to delimit them?
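On the slash question, one possible approach is to backslash-escape the troublesome characters before building the s command. This is only a sketch: the helper names are invented, it assumes sed's default basic regular expressions (no -E option), and it treats the search string as literal text rather than as a pattern:

```python
def sed_escape_pattern(text, delim="/"):
    # Characters that are special in a basic regular expression,
    # plus the delimiter used by the s command.
    special = "\\.*[^$" + delim
    return "".join("\\" + ch if ch in special else ch for ch in text)

def sed_escape_replacement(text, delim="/"):
    # In the replacement, backslash, & and the delimiter are special.
    special = "\\&" + delim
    out = "".join("\\" + ch if ch in special else ch for ch in text)
    # Newlines must become backslash + newline.
    return out.replace("\n", "\\\n")

pattern = sed_escape_pattern("a/b.c")
replacement = sed_escape_replacement("x/y&z\nw")
script = "s/" + pattern + "/" + replacement + "/g"
print(script)
```

Another option is to pick a delimiter that cannot occur in your data, since sed accepts other characters in place of the slash, for example s|this|that|g.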

Hoping this will help you move forward.

Cheers,
Cameron Simpson <c...@zip.com.au>
