On 10/10/2012 09:51, Joon Ki Choi wrote:
Hello Pythonistas,
I have a very large text file with contents like:
@INBOOK{Ackermann1999-b,
author = {Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and
Ackermann,
K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F.
and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and
Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and
Ackermann,
K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F.
and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and
Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and
Ackermann,
K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F.
and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and
Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and
Ackermann},
year = {1980},
timestamp = {1995-12-02}
}
And I want to delete the duplicate rows, except those rows containing the
brackets { or }.
The result should look like:
@INBOOK{Ackermann1999-b,
author = {Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and
Ackermann,
Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and
Ackermann},
year = {1980},
timestamp = {1995-12-02}
}
I came across this Python script:
lines_seen = set() # holds lines already seen
outfile = open("literatur_clean.txt", "w")
Slight aside, you could use this so there's no need to explicitly close
the file.
with open("literatur_dupl.txt", "r") as infile:
    for line in infile:
        if line not in lines_seen: # not a duplicate
            outfile.write(line)
            lines_seen.add(line)
Something like:-
if "{" in line or "}" in line or line not in lines_seen:
outfile.close()
But it also deletes the lines with a closing bracket } and the lines with the
same author data.
Therefore I need a condition for the brackets.
Could someone point out how to add this condition?
Thanks in advance,
Joon
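Putting the two suggestions above together, a minimal sketch might look like
the following. The function names dedupe_lines and clean_file are made up for
illustration; the file names are the ones from the script above. It assumes
that "duplicate" means an exact repeat of an earlier line anywhere in the
file, which is what the original set-based script does.

```python
def dedupe_lines(lines):
    """Yield each line unless it is a brace-free repeat of an earlier line."""
    lines_seen = set()  # holds lines already seen
    for line in lines:
        # Lines containing { or } are always kept; other lines are kept
        # only the first time they appear.
        if "{" in line or "}" in line or line not in lines_seen:
            yield line
            lines_seen.add(line)


def clean_file(src="literatur_dupl.txt", dst="literatur_clean.txt"):
    """Copy src to dst, dropping repeated lines that contain no braces."""
    # Both files are closed automatically when the with block ends.
    with open(src, "r") as infile, open(dst, "w") as outfile:
        outfile.writelines(dedupe_lines(infile))
```

Calling clean_file() with no arguments reads literatur_dupl.txt and writes
literatur_clean.txt. Note that lines_seen covers the whole file, so a
brace-free line that legitimately appears in two different entries would be
dropped from the second one; if that matters, the set could be reset at each
line starting with "@".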
--
Cheers.
Mark Lawrence.