Re: I can't understand re.sub

2015-12-01 Thread Erik

On 01/12/15 05:28, Jussi Piitulainen wrote:

> A real solution should be aware of the actual structure of those lines,
> assuming they follow some defined syntax.


I think that we are in violent agreement on this ;)

E.

--
https://mail.python.org/mailman/listinfo/python-list


Re: I can't understand re.sub

2015-11-30 Thread Erik

On 29/11/15 21:36, Mr Zaug wrote:

> I need to use re.sub to replace strings in a text file.


Do you? Is there any other way?


> result = re.sub(pattern, repl, string, count=0, flags=0);
>
> I think I understand that pattern is the regex I'm searching for and
> repl is the thing I want to substitute for whatever pattern finds but
> what is string?


Where do you think the function gets the string you want to transform from?
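A minimal sketch of the three positional arguments: `string` is simply the text to be searched, for example one line read from the file. The replacement path below is a hypothetical value.

```python
import re

# pattern: what to look for
# repl:    what to put in its place
# string:  the text to search (e.g. a line read from the template file)
line = "path = CONTENT_PATH"
result = re.sub("CONTENT_PATH", "/var/www/site", line)  # hypothetical value
print(result)  # path = /var/www/site
```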


> This should be simple, right?


It is. And it could be even simpler if you don't bother with regexes at 
all (if your input is as fixed as you say it is):


>>> foo = "foo bar baz spam CONTENT_PATH bar spam"
>>> ' Substitute '.join(foo.split(' CONTENT_PATH ', 1))
'foo bar baz spam Substitute bar spam'
>>>

E.


Re: I can't understand re.sub

2015-11-30 Thread Erik

On 30/11/15 08:51, Jussi Piitulainen wrote:

> Surely the straight thing to say is:
>
> >>> foo.replace(' CONTENT_PATH ', ' Substitute ')
> 'foo bar baz spam Substitute bar spam'


Not quite the same thing (but yes, with a third argument of 1, it would be).
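A quick sketch of the difference: `split(sep, 1)` followed by `join` touches only the first occurrence, which matches `str.replace` with a count of 1, while `replace` without a count changes every occurrence. The second sample string is made up for illustration.

```python
foo = "foo bar baz spam CONTENT_PATH bar spam"

# split with maxsplit=1, then join: only the first occurrence is replaced
via_split = ' Substitute '.join(foo.split(' CONTENT_PATH ', 1))
# str.replace with count=1 does the same
via_replace = foo.replace(' CONTENT_PATH ', ' Substitute ', 1)
print(via_split == via_replace)  # True

# With two occurrences, leaving out the count makes them differ
bar = "a CONTENT_PATH b CONTENT_PATH c"
first_only = ' Substitute '.join(bar.split(' CONTENT_PATH ', 1))
every_one = bar.replace(' CONTENT_PATH ', ' Substitute ')
print(first_only)  # a Substitute b CONTENT_PATH c
print(every_one)   # a Substitute b Substitute c
```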


> But there was no guarantee of spaces around the target.


I know. It was just an example to show that there might be an option 
that's not a regex for the specific use indicated. It's up to the OP to 
decide whether they think the spaces (or any other, or no, delimiter) 
would actually be required or useful. Or whether they really prefer a 
regex after all.



> If you wish to,
> say, replace "spam" in your foo with "REDACTED" but leave it intact in
> "May the spammer be prosecuted", a regex might be attractive after all.


But that's not what the OP said they wanted to do. They said everything 
was very fixed - they did not want a general purpose human language text 
processing solution ... ;)


E.


Re: I can't understand re.sub

2015-11-30 Thread Jussi Piitulainen
Erik writes:
> On 30/11/15 08:51, Jussi Piitulainen wrote:
[- -]
>> If you wish to,
>> say, replace "spam" in your foo with "REDACTED" but leave it intact in
>> "May the spammer be prosecuted", a regex might be attractive after all.
>
> But that's not what the OP said they wanted to do. They said
> everything was very fixed - they did not want a general purpose human
> language text processing solution ... ;)

Language processing is not what I had in mind here. Merely this, that
there is some sort of word boundary, be it punctuation, whitespace, or
an end of the string:

   >>> re.sub(r'\bspam\b', '', 'spamalot spam')
   'spamalot '

That's not perfect either, but it's simple and might be somewhat
proportional to the problem.
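Applied to the "spammer" case from earlier in the thread, a minimal sketch (the sample sentence is made up): `\b` matches only at a transition between a word character and a non-word character or the string edge, so "spam" inside "spammer" is left alone.

```python
import re

text = "spam and eggs. May the spammer be prosecuted, spam!"
# \b on both sides: "spam" must stand as a whole word
redacted = re.sub(r'\bspam\b', 'REDACTED', text)
print(redacted)
# REDACTED and eggs. May the spammer be prosecuted, REDACTED!
```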

A real solution should be aware of the actual structure of those lines,
assuming they follow some defined syntax.


Re: I can't understand re.sub

2015-11-30 Thread Jussi Piitulainen
Erik writes:

> On 29/11/15 21:36, Mr Zaug wrote:
>> This should be simple, right?
>
> It is. And it could be even simpler if you don't bother with regexes
> at all (if your input is as fixed as you say it is):
>
> >>> foo = "foo bar baz spam CONTENT_PATH bar spam"
> >>> ' Substitute '.join(foo.split(' CONTENT_PATH ', 1))
> 'foo bar baz spam Substitute bar spam'

Surely the straight thing to say is:

   >>> foo.replace(' CONTENT_PATH ', ' Substitute ')
   'foo bar baz spam Substitute bar spam'

But there was no guarantee of spaces around the target. If you wish to,
say, replace "spam" in your foo with "REDACTED" but leave it intact in
"May the spammer be prosecuted", a regex might be attractive after all.


Re: I can't understand re.sub

2015-11-29 Thread Denis McMahon
On Sun, 29 Nov 2015 13:36:57 -0800, Mr Zaug wrote:

> result = re.sub(pattern, repl, string, count=0, flags=0);

re.sub works on a string, not on a file.

Read the file to a string, pass it in as the string.
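A minimal sketch of the read-the-whole-file approach, assuming the file fits comfortably in memory. The function name `render_template` and the replacement value are hypothetical.

```python
import re
from pathlib import Path

def render_template(src, dst):
    # Read the whole file into one string...
    text = Path(src).read_text()
    # ...pass that string as the `string` argument of re.sub...
    text = re.sub(r"CONTENT_PATH", "/content/snakeoil", text)  # hypothetical value
    # ...and write the result to the output file
    Path(dst).write_text(text)
```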

Or pre-compile the search pattern(s) and process the file line by line:

import re

patts = [
    (re.compile("axe"), "hammer"),
    (re.compile("cat"), "dog"),
    (re.compile("tree"), "fence"),
]

with open("input.txt", "r") as inf, open("output.txt", "w") as ouf:
    # Iterate over every line; a single readline() would only read the first
    for line in inf:
        for patt, repl in patts:
            line = patt.sub(repl, line)
        ouf.write(line)

Not tested, but I think it should do the trick.

Or use a single patt and a replacement func:

import re

patt = re.compile("(axe)|(cat)|(tree)")

def replfunc(match):
    # sub() passes a match object, not a string; group(0) is the matched text
    word = match.group(0)
    if word == 'axe':
        return 'hammer'
    if word == 'cat':
        return 'dog'
    if word == 'tree':
        return 'fence'
    return word

with open("input.txt", "r") as inf, open("output.txt", "w") as ouf:
    # Process every line, not just the first
    for line in inf:
        ouf.write(patt.sub(replfunc, line))

(also not tested)
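A variant of the single-pattern idea, sketched here rather than taken from the post: keep the pairs in a dict and build the alternation from its keys, so adding a new replacement means editing only the dict. `re.escape` makes each key match literally, and sorting longest-first stops a shorter key from shadowing a longer one that starts with it. The names `REPLACEMENTS` and `replace_all` are hypothetical.

```python
import re

# Same pairs as above; keys are matched as literal text
REPLACEMENTS = {"axe": "hammer", "cat": "dog", "tree": "fence"}
# Longest keys first so a prefix key cannot win over a longer key
PATTERN = re.compile(
    "|".join(map(re.escape, sorted(REPLACEMENTS, key=len, reverse=True)))
)

def replace_all(line):
    # m.group(0) is the matched key; look up its replacement
    return PATTERN.sub(lambda m: REPLACEMENTS[m.group(0)], line)

print(replace_all("The cat sat by the tree with an axe."))
# The dog sat by the fence with an hammer.
```

Like the original, this also rewrites matches inside longer words ("category" becomes "dogegory"); wrapping the alternation in `\b(?:...)\b` would restrict it to whole words.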

-- 
Denis McMahon, denismfmcma...@gmail.com


Re: I can't understand re.sub

2015-11-29 Thread Mr Zaug
Thanks. That does help quite a lot.


Re: I can't understand re.sub

2015-11-29 Thread Rick Johnson
On Sunday, November 29, 2015 at 3:37:34 PM UTC-6, Mr Zaug wrote:

> The items I'm searching for are few and they do not change. They are 
> "CONTENT_PATH", "ENV" and "NNN". These appear on a few lines in a template 
> file. They do not appear together on any line and they only appear once on 
> each line. This should be simple, right?

Yes. In fact so simple that string methods and a "for loop" will suffice. Using
regexps for this task would be like using a dump truck to haul a teaspoon of
salt.
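A minimal sketch of the string-methods-and-a-for-loop approach for the three placeholders. The values in `VALUES` and the template lines are hypothetical; substitute the real ones.

```python
# Hypothetical placeholder values
VALUES = {"CONTENT_PATH": "snakeoil", "ENV": "pre-prod", "NNN": "087"}

def fill_line(line, values=VALUES):
    # Each placeholder appears at most once per line, so plain
    # str.replace in a loop is enough; no regex required.
    # Note: "ENV" would also match inside a word such as "ENVIRONMENT".
    for placeholder, value in values.items():
        line = line.replace(placeholder, value)
    return line

template = ["docroot /sites/CONTENT_PATH\n", "env ENV\n", "id NNN\n", "unchanged\n"]
print("".join(fill_line(l) for l in template), end="")
```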



Re: I can't understand re.sub

2015-11-29 Thread Mr Zaug
On Sunday, November 29, 2015 at 8:12:25 PM UTC-5, Rick Johnson wrote:
> On Sunday, November 29, 2015 at 3:37:34 PM UTC-6, Mr Zaug wrote:
> 
> > The items I'm searching for are few and they do not change. They are 
> > "CONTENT_PATH", "ENV" and "NNN". These appear on a few lines in a template 
> > file. They do not appear together on any line and they only appear once on 
> > each line. This should be simple, right?
> 
> Yes. In fact so simple that string methods and a "for loop" will suffice.
> Using regexps for this task would be like using a dump truck to haul a
> teaspoon of salt.

I rarely get a chance to do any scripting so yeah, I stink at it.

Ideally I would have a script that will spit out a config file such as 
087_pre-prod_snakeoil_farm.any and not need to manually rename said output file.
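One way to avoid the manual rename is to build the output name from the same values used to fill the template. This is a sketch under an assumed naming scheme `<NNN>_<ENV>_<name>_farm.any`, inferred only from the single example filename; `write_config` and its parameters are hypothetical.

```python
from pathlib import Path

def write_config(lines, nnn, env, name, outdir="."):
    # Assumed scheme, guessed from 087_pre-prod_snakeoil_farm.any:
    # <NNN>_<ENV>_<name>_farm.any
    out = Path(outdir) / f"{nnn}_{env}_{name}_farm.any"
    out.write_text("".join(lines))
    return out
```

The filled lines from the template step would be passed as `lines`, with the same values that replaced NNN and ENV.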