Hi tutors, I am working on a file and need to replace each occurrence of a certain label (a part-of-speech tag in this case) with a number of sub-labels. The file has the following format:
    word1 \t Tag1
    word2 \t Tag2
    word3 \t Tag3

Now, the tags are complex, and I want to split each one into tab-delimited parts to get this:

    word1 \t Tag1Part1 \t Tag1Part2 \t Tag1Part3

I searched online for a solution and found the code below, which uses a dictionary whose keys are the tags I want to replace and whose values are the sub-tags. The problem is that it sometimes replaces tags that are not surrounded by spaces, which I do not want. Also, I want the new sub-tags to be separated by tabs, so that the new items in my file are tab-delimited. For this, I put tabs between the items of each value in the dictionary.

I started to think that this is not the best solution to the problem, and that a script using regular expressions might be better. Since I am new to Python, I thought I should ask for your thoughts on the best solution. There are about 150 items I want to replace, and I did not know how to iterate over them with regular expressions.

Below is my previous code:

    #!/usr/bin/python
    import sys

    def replace_words(text, word_dic):
        # Plain substring replacement: a key also matches inside longer
        # strings, so tags that are not surrounded by spaces get replaced too.
        for k, v in word_dic.items():
            text = text.replace(k, v)
        return text

    # the dictionary has target_word:replacement_word pairs;
    # the sub-tags in each value are separated by tabs
    word_dic = {
        'abbrev': 'abbrev\tnull\tnull',
        'adj': 'adj\tnull\tnull',
        'adv': 'adv\tnull\tnull',
        'case_def_acc': 'case_def\tacc\tnull',
        'case_def_gen': 'case_def\tgen\tnull',
        'case_def_nom': 'case_def\tnom\tnull',
        'case_indef_acc': 'case_indef\tacc\tnull',
        'verb_part': 'verb_part\tnull\tnull',
    }

    with open(sys.argv[1]) as f:
        readed = f.read()

    # call the function and get the changed text
    myString = replace_words(readed, word_dic)

    with open(sys.argv[2], "w") as fout:
        fout.write(myString)

--dan
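One possible regular-expression approach, sketched under a few assumptions: each tag appears as a whitespace-delimited token, and the dictionary maps each tag to a list of sub-tags rather than to a pre-joined string. `TAG_PARTS`, `build_pattern` and `split_tags` are hypothetical names, and only a few of the ~150 entries are shown. All keys are compiled into a single alternation, so the 150 tags need no explicit loop, and the lookarounds `(?<!\S)` and `(?!\S)` only allow a match when the tag is not glued to other non-space characters.

```python
import re
import sys

# Hypothetical subset of the ~150 tag mappings; each tag maps to its
# sub-tags, which are joined with tabs on output.
TAG_PARTS = {
    'abbrev': ['abbrev', 'null', 'null'],
    'adj': ['adj', 'null', 'null'],
    'adv': ['adv', 'null', 'null'],
    'case_def_acc': ['case_def', 'acc', 'null'],
    'case_def_gen': ['case_def', 'gen', 'null'],
    'case_def_nom': ['case_def', 'nom', 'null'],
    'case_indef_acc': ['case_indef', 'acc', 'null'],
    'verb_part': ['verb_part', 'null', 'null'],
}


def build_pattern(tags):
    """Compile one regex that matches any of the tags as a whole token.

    re.escape protects tags containing regex metacharacters; the
    lookarounds require whitespace (or the start/end of the text)
    on both sides of the matched tag.
    """
    alternatives = '|'.join(re.escape(tag) for tag in tags)
    return re.compile(r'(?<!\S)(?:' + alternatives + r')(?!\S)')


def split_tags(text, tag_parts):
    """Replace every whole-token tag with its tab-separated sub-tags."""
    pattern = build_pattern(tag_parts)
    # The replacement function looks up the matched tag and joins its parts.
    return pattern.sub(lambda m: '\t'.join(tag_parts[m.group(0)]), text)


if __name__ == '__main__':
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as fin:
            text = fin.read()
        with open(sys.argv[2], 'w') as fout:
            fout.write(split_tags(text, TAG_PARTS))
    else:
        print('usage: split_tags.py INPUT OUTPUT', file=sys.stderr)
```

For example, `split_tags('word1\tcase_def_acc\n', TAG_PARTS)` returns `'word1\tcase_def\tacc\tnull\n'`, while `adj` inside `adjective` is left untouched. If every line is strictly word, tab, tag, a simpler alternative is to split each line on the tab and look the tag up in the dictionary directly, which needs no regular expression at all.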
_______________________________________________ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor