Hello Spir, Alan, and Paul, Thank you for your help. I have been working on the file, but I still have a problem doing what I wanted. As a reminder,
I have

#!/usr/bin/python
import sys

tags = {
    'noun-prop': 'noun_prop null null'.split(),
    'case_def_gen': 'case_def gen null'.split(),
    'dem_pron_f': 'dem_pron f null'.split(),
    'case_def_acc': 'case_def acc null'.split(),
}

TAB = '\t'

def newlyTaggedWord(line):
    # strip the trailing newline, then separate the parts of the line
    (word, tag) = line.strip().split(TAB)
    new_tags = tags[tag]            # read in dict
    tagging = TAB.join(new_tags)    # join with TABs
    return word + TAB + tagging     # formatted result

def replaceTagging(source_name, target_name):
    target_file = open(target_name, "w")
    # replacement loop
    for line in open(source_name, "r"):
        new_line = newlyTaggedWord(line) + '\n'
        target_file.write(new_line)
    target_file.close()

if __name__ == "__main__":
    source_name = sys.argv[1]
    target_name = sys.argv[2]
    replaceTagging(source_name, target_name)

On Mon, May 4, 2009 at 12:38 PM, <tutor-requ...@python.org> wrote:

> Today's Topics:
>
>    1. Re: Iterating over a long list with regular expressions and
>       changing each item? (Paul McGuire)
>    2. Advanced String Search using operators AND, OR etc.. (Alex Feddor)
>    3. Re: Encode problem (Pablo P. F. de Faria)
>    4. Re: Encode problem (Pablo P. F. de Faria)
>    5. Re: Advanced String Search using operators AND, OR etc..
>       (vince spicer)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 4 May 2009 11:17:53 -0500
> From: "Paul McGuire" <pt...@austin.rr.com>
> Subject: Re: [Tutor] Iterating over a long list with regular
>     expressions and changing each item?
> To: <tutor@python.org>
> Message-ID: <99b447f3c7ef4996aa2ed683f1ee6...@awa2>
> Content-Type: text/plain; charset="us-ascii"
>
> Original:
>     'case_def_gen':['case_def','gen','null'],
>     'nsuff_fem_pl':['nsuff','null', 'null'],
>     'abbrev': ['abbrev, null, null'],
>     'adj': ['adj, null, null'],
>     'adv': ['adv, null, null'],}
>
> Note the values for 'abbrev', 'adj' and 'adv' are not three-item
> lists, but one-item lists whose single string contains a
> comma-separated list.
>
> Should be:
>     'case_def_gen':['case_def','gen','null'],
>     'nsuff_fem_pl':['nsuff','null', 'null'],
>     'abbrev': ['abbrev', 'null', 'null'],
>     'adj': ['adj', 'null', 'null'],
>     'adv': ['adv', 'null', 'null'],}
>
> For much of my own code, I find lists of string literals to be
> tedious to enter, and easy to drop a ' character. This style is a
> little easier on the eyes, and harder to screw up. (split() already
> returns a list, so no surrounding brackets are needed.)
>
>     'case_def_gen': 'case_def gen null'.split(),
>     'nsuff_fem_pl': 'nsuff null null'.split(),
>     'abbrev': 'abbrev null null'.split(),
>     'adj': 'adj null null'.split(),
>     'adv': 'adv null null'.split(),}
>
> Since all that your code does at runtime with the value strings is
> "\t".join() them, then you might as well initialize the dict with
> these computed values, for at least some small gain in runtime
> performance:
>
>     T = lambda s : "\t".join(s.split())
>     'case_def_gen' : T('case_def gen null'),
>     'nsuff_fem_pl' : T('nsuff null null'),
>     'abbrev'       : T('abbrev null null'),
>     'adj'          : T('adj null null'),
>     'adv'          : T('adv null null'),}
>     del T
>
> (Yes, I know PEP8 says *not* to add spaces to line up assignments or
> other related values, but I think there are isolated cases where it
> does help to see what's going on.
> You could even write this as:
>
>     T = lambda s : "\t".join(s.split())
>     'case_def_gen' : T('case_def  gen   null'),
>     'nsuff_fem_pl' : T('nsuff     null  null'),
>     'abbrev'       : T('abbrev    null  null'),
>     'adj'          : T('adj       null  null'),
>     'adv'          : T('adv       null  null'),}
>     del T
>
> and the extra spaces help you to see the individual subtags more
> easily, with no change in the resulting values since split() splits
> on multiple whitespace the same as a single space.)
>
> Of course you could simply code as:
>
>     'case_def_gen' : T('case_def\tgen\tnull'),
>     'nsuff_fem_pl' : T('nsuff\tnull\tnull'),
>     'abbrev'       : T('abbrev\tnull\tnull'),
>     'adj'          : T('adj\tnull\tnull'),
>     'adv'          : T('adv\tnull\tnull'),}
>
> But I think readability definitely suffers here, I would probably go
> with the penultimate version.
>
> -- Paul
>
> ------------------------------
>
> Message: 2
> Date: Mon, 4 May 2009 14:45:06 +0200
> From: Alex Feddor <alex.fed...@gmail.com>
> Subject: [Tutor] Advanced String Search using operators AND, OR etc..
> To: tutor@python.org
> Message-ID:
>     <5bf184e30905040545i78bc75b8ic78eabf44a55a...@mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hi
>
> I am looking for a method that enables advanced text string search.
> The string.find() method and the re module do not seem to support
> what I am looking for. The idea is as follows:
>
> Text ="FDA meeting was successful. New drug is approved for whole sale
> distribution!"
>
> I would like to scan the text using AND and OR operators and get -1
> or some other value if the search terms are not found in the text.
>
> Example 01:
> search criteria: "FDA" AND ( "approve*" OR "supported")
> The catch is that in the Text variable the words FDA and approve are
> not adjacent (other words are in between).
>
> Example 02:
> search criteria: "Ben"
> The catch is that the code should find only the exact word Ben, not
> words whose first three letters are Ben, such as Benquick, Benseek,
> etc. Only Ben is the word we are looking for.
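
Inline, on Alex's two examples: here is a minimal sketch using only the standard re module, not a definitive answer. I am assuming that "approve*" means "any word starting with approve", that AND does not care about word order, and that matching is case-sensitive. The helper names has_word and has_prefix are made up for this sketch.

```python
import re

text = ("FDA meeting was successful. New drug is approved "
        "for whole sale distribution!")


def has_word(word, text):
    """Return True if `word` occurs in `text` as a whole word."""
    # \b is a word boundary, so "Ben" will not match inside "Benquick".
    return re.search(r"\b" + re.escape(word) + r"\b", text) is not None


def has_prefix(prefix, text):
    """Return True if some word in `text` starts with `prefix`."""
    return re.search(r"\b" + re.escape(prefix) + r"\w*", text) is not None


# Example 01: "FDA" AND ("approve*" OR "supported"), in any order
example_01 = has_word("FDA", text) and (
    has_prefix("approve", text) or has_word("supported", text))

# Example 02: only the exact word "Ben"
example_02 = has_word("Ben", "Benquick and Benseek met Ben.")
example_02_miss = has_word("Ben", "Benquick and Benseek")

print(example_01, example_02, example_02_miss)
```

Combining the small tests with Python's own and/or keeps the AND/OR logic readable, instead of packing everything into one large pattern.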
>
> I would really appreciate your advice (code samples or links) on how
> the above can be achieved. If possible, I would appreciate a solution
> that uses a free-of-charge module.
>
> Cheers, Alex
> PS:
> A few months ago I discovered Python. I am amazed at what can be done
> with it. Really cool programming language.
>
> ------------------------------
>
> Message: 3
> Date: Mon, 4 May 2009 11:09:25 -0300
> From: "Pablo P. F. de Faria" <pablofa...@gmail.com>
> Subject: Re: [Tutor] Encode problem
> To: Kent Johnson <ken...@tds.net>
> Cc: *tutor python <tutor@python.org>
> Message-ID:
>     <3ea81d4c0905040709m78a45d11j2037943380817...@mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Thanks, Kent, but that doesn't solve my problem. In fact, I need
> ConfigParser to work with non-ascii characters, since my App may run
> in "latin-1" environments (folder and file names). I must find out why
> the str() function in the module ConfigParser doesn't use the encoding
> defined for the application (# -*- coding: utf-8 -*-). The rest of the
> application works properly with utf-8, except for ConfigParser. What I
> found out is that ConfigParser seems to make use of the configuration
> in site.py (which is set to 'ascii'), instead of the configuration
> defined for the App (if I change . But it is very problematic to
> have to change site.py on every computer... So I wonder if there is a
> way to replace the settings in site.py only for my App.
>
> 2009/5/1 Kent Johnson <ken...@tds.net>:
> > On Fri, May 1, 2009 at 4:54 PM, Pablo P. F. de Faria
> > <pablofa...@gmail.com> wrote:
> >> Hi, Kent.
> >>
> >> The stack trace is:
> >>
> >> Traceback (most recent call last):
> >>   File "/home/pablo/workspace/E-Dictor/src/MainFrame.py", line 1057, in OnClose
> >>     self.SavePreferences()
> >>   File "/home/pablo/workspace/E-Dictor/src/MainFrame.py", line 1068, in SavePreferences
> >>     self.cfg.set(u'File Settings',u'Recent files',
> >>         unicode(",".join(self.recent_files)))
> >> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position
> >> 12: ordinal not in range(128)
> >>
> >> The "unicode" function actually doesn't make any difference... The
> >> content of the string being saved is "/home/pablo/Área de
> >> Trabalho/teste.xml".
> >
> > OK, this error is in your code, not the ConfigParser. The problem is with
> > ",".join(self.recent_files)
> >
> > Are the entries in self.recent_files unicode strings? If so, then I
> > think the join is trying to convert to a string using the default
> > codec. Try
> >
> > self.cfg.set('File Settings','Recent files',
> >     ','.join(name.encode('utf-8') for name in self.recent_files))
> >
> > Looking at the ConfigParser.write() code, it wants the values to be
> > strings or convertible to strings by calling str(), so non-ascii
> > unicode values will be a problem there. I would use plain strings for
> > all the interaction with ConfigParser and convert to Unicode yourself.
> >
> > Kent
> >
> > PS Please Reply All to reply to the list.
>
> --
> ---------------------------------
> "We are all in the gutter, but some of us are looking at the stars."
> (Oscar Wilde)
> ---------------------------------
> Pablo Faria
> Master's student in Language Acquisition - IEL/Unicamp
> FAPESP technical fellow on the project Padrões Rítmicos e Mudança
> Lingüística
> (19) 3521-1570
> http://www.tycho.iel.unicamp.br/~pablofaria/
> pablofa...@gmail.com
>
> ------------------------------
>
> Message: 4
> Date: Mon, 4 May 2009 11:11:58 -0300
> From: "Pablo P. F.
>     de Faria" <pablofa...@gmail.com>
> Subject: Re: [Tutor] Encode problem
> To: Kent Johnson <ken...@tds.net>
> Cc: *tutor python <tutor@python.org>
> Message-ID:
>     <3ea81d4c0905040711p62376925n26fb93a8955fe...@mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Here is the traceback, after the last change you suggested:
>
> Traceback (most recent call last):
>   File "/home/pablo/workspace/E-Dictor/src/MainFrame.py", line 1057, in OnClose
>     self.SavePreferences()
>   File "/home/pablo/workspace/E-Dictor/src/MainFrame.py", line 1069, in SavePreferences
>     self.cfg.write(codecs.open(self.properties_file,'w','utf-8'))
>   File "/usr/lib/python2.5/ConfigParser.py", line 373, in write
>     (key, str(value).replace('\n', '\n\t')))
>   File "/usr/lib/python2.5/codecs.py", line 638, in write
>     return self.writer.write(data)
>   File "/usr/lib/python2.5/codecs.py", line 303, in write
>     data, consumed = self.encode(object, self.errors)
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position
> 27: ordinal not in range(128)
>
> So, in "str(value)" the content is a folder name with an accented
> character (Á).
>
> ------------------------------
>
> Message: 5
> Date: Mon, 4 May 2009 10:38:31 -0600
> From: vince spicer <vinces1...@gmail.com>
> Subject: Re: [Tutor] Advanced String Search using operators AND, OR
>     etc..
> To: Alex Feddor <alex.fed...@gmail.com>
> Cc: tutor@python.org
> Message-ID:
>     <1e53c510905040938q25d787f3w17f7a18f65bd0...@mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Advanced string searches are done with regular expressions, via the
> re module.
>
> EX:
>
> import re
>
> # \b marks a word boundary, so "Ben" does not match "Benquick"
> m = re.compile(r"FDA.*?(approve\w*|supported)|\bBen\b")
>
> match = m.search(Text)
> if match:
>     print(match.group())
>
> Vince
>
> ------------------------------
>
> _______________________________________________
> Tutor maillist - Tutor@python.org
> http://mail.python.org/mailman/listinfo/tutor
>
>
> End of Tutor Digest, Vol 63, Issue 8
> ************************************
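
Coming back to the script at the top of this message, here is a minimal sketch of the same replacement with the tab-joined strings computed once, as Paul suggested. It rests on assumptions of mine: every input line is exactly one word, a TAB, and one tag, and a tag missing from the table should pass through unchanged. The names RAW_TAGS, retag_line and retag_lines, and the sample words, are made up for illustration.

```python
TAB = "\t"

# Old tag -> new subtags, written with spaces for readability.
RAW_TAGS = {
    "case_def_gen": "case_def gen null",
    "case_def_acc": "case_def acc null",
    "dem_pron_f": "dem_pron f null",
}

# Join the subtags with TABs once, instead of on every input line.
TAGS = dict((old, TAB.join(new.split())) for old, new in RAW_TAGS.items())


def retag_line(line):
    """Return 'word<TAB>new subtags' for one 'word<TAB>tag' line."""
    # Remove the line ending first; otherwise the tag would be
    # 'case_def_gen\n' and the dictionary lookup would miss.
    word, tag = line.rstrip("\r\n").split(TAB)
    # Assumption: a tag that is not in the table is kept as it is.
    return word + TAB + TAGS.get(tag, tag)


def retag_lines(lines):
    """Apply retag_line to every non-blank line."""
    return [retag_line(line) for line in lines if line.strip()]


# Hypothetical sample input, standing in for lines read from a file.
sample = ["word1\tcase_def_gen\n", "word2\tdem_pron_f\n", "word3\tunknown\n"]
result = retag_lines(sample)
print(result)
```

Using TAGS.get(tag, tag) instead of TAGS[tag] means one unexpected tag does not stop the whole run with a KeyError; if stopping is preferred, the plain lookup is the better choice.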