Re: Suggestions for how to approach this problem?

2007-05-10 Thread John Salerno
James Stroud wrote: I included code in my previous post that will parse the entire bib, making use of the numbering and eliminating the most probable, but still fairly rare, potential ambiguity. You might want to check out that code, as my testing showed that it worked with your

Re: Suggestions for how to approach this problem?

2007-05-10 Thread John Salerno
James Stroud wrote:

    import re
    records = []
    record = None
    counter = 1
    regex = re.compile(r'^(\d+)\. (.*)')
    for aline in lines:
        m = regex.search(aline)
        if m is not None:
            recnum, aline = m.groups()
            if int(recnum) == counter:
                if record is not None:

Re: Suggestions for how to approach this problem?

2007-05-09 Thread John Salerno
Necmettin Begiter wrote: Is this what the text looks like: 123 some information 124 some other information 126(tab here)something else If this is the case (the numbers are at the beginning, and after the numbers there is either a newline or a tab), the logic might be this simple:

Re: Suggestions for how to approach this problem?

2007-05-09 Thread John Salerno
Dave Hansen wrote: Questions: 1) Do the citation numbers always begin in column 1? Yes, that's one consistency at least. :) 2) Are the citation numbers always followed by a period and then at least one whitespace character? Yes, it seems to be either one or two whitespaces. find the
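The two answers above (number in column 1, then a period, then one or two whitespace characters) can be turned into a pattern. The following is only a sketch under those assumptions; the name `bullet` and the `* ` marker are made up for illustration and do not come from the thread.

```python
import re

# Assumed layout: digits in column 1, a period, then one or two whitespace characters.
CITATION_START = re.compile(r'^(\d+)\.\s{1,2}')

def bullet(line, marker='* '):
    """Replace a leading citation number with a bullet marker.

    Lines that do not start with a citation number are returned unchanged.
    """
    return CITATION_START.sub(marker, line, count=1)
```

On its own this pattern cannot tell a real citation number from a wrapped line that happens to start with a number and a period, which is the ambiguity discussed elsewhere in the thread.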

Re: Suggestions for how to approach this problem?

2007-05-09 Thread John Salerno
James Stroud wrote: If you can count on the person not skipping any numbers in the citations, you can take an AI approach to hopefully weed out the rare circumstance that a number followed by a period starts a line in the middle of the citation. I don't think any numbers are skipped, but
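One way to read the idea above: accept a leading "N." as the start of a new citation only when N is the next number expected, and treat everything else as a continuation of the current citation. The sketch below assumes no numbers are skipped and that numbering starts at 1; it is not the code posted in the thread, and `parse_citations` is a hypothetical name.

```python
import re

# A number in column 1, a period, whitespace, then the rest of the line.
NUMBERED = re.compile(r'^(\d+)\.\s+(.*)')

def parse_citations(lines):
    """Group wrapped lines into one string per citation.

    A line starts a new citation only if its number equals the next
    expected number; otherwise it is joined to the previous citation.
    """
    records = []
    expected = 1  # assumption: citations are numbered 1, 2, 3, ... with no gaps
    for raw in lines:
        m = NUMBERED.match(raw)  # match the raw line so the number must be in column 1
        if m and int(m.group(1)) == expected:
            records.append(m.group(2).strip())
            expected += 1
            continue
        text = raw.strip()  # drops indenting tabs and the trailing newline
        if text and records:
            records[-1] += ' ' + text
    return records
```

With this check, a wrapped line such as "1964. Some journal." in the middle of citation 2 is kept as a continuation, because 1964 is not the next expected number. The remaining failure case is a wrapped line that starts with exactly the next number, which should be rare.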

Re: Suggestions for how to approach this problem?

2007-05-09 Thread John Salerno
John Salerno wrote: So I need to remove the line breaks too, but of course not *all* of them because each reference still needs a line break between it. After doing a bit of search and replace for tabs with my text editor, I think I've narrowed down the problem to just this: I need to remove

Re: Suggestions for how to approach this problem?

2007-05-09 Thread James Stroud
John Salerno wrote: John Salerno wrote: So I need to remove the line breaks too, but of course not *all* of them because each reference still needs a line break between it. After doing a bit of search and replace for tabs with my text editor, I think I've narrowed down the problem to

Re: Suggestions for how to approach this problem?

2007-05-08 Thread John Salerno
John Salerno wrote: typed, there are often line breaks at the end of each line Also, there are sometimes tabs used to indent the subsequent lines of citation, but I assume with that I can just replace the tab with a space.

Re: Suggestions for how to approach this problem?

2007-05-08 Thread Marc 'BlackJack' Rintsch
John Salerno wrote: I have a large list of publication citations that are numbered. The numbers are simply typed in with the rest of the text. What I want to do is remove the numbers and then put bullets instead. Now, this alone would be easy enough, with a little

Re: Suggestions for how to approach this problem?

2007-05-08 Thread John Salerno
Marc 'BlackJack' Rintsch wrote: I think I have a vague idea of what the input looks like, but it would be helpful if you showed some example input and wanted output. Good idea. Here's what it looks like now: 1. Levy, S.B. (1964) Isologous interference with ultraviolet and X-ray irradiated

Re: Suggestions for how to approach this problem?

2007-05-08 Thread Necmettin Begiter
On Tuesday 08 May 2007 22:23:31 John Salerno wrote: John Salerno wrote: typed, there are often line breaks at the end of each line Also, there are sometimes tabs used to indent the subsequent lines of citation, but I assume with that I can just replace the tab with a space. Is this how the

Re: Suggestions for how to approach this problem?

2007-05-08 Thread Dave Hansen
On May 8, 3:00 pm, John Salerno wrote: Marc 'BlackJack' Rintsch wrote: I think I have a vague idea of what the input looks like, but it would be helpful if you showed some example input and wanted output. Good idea. Here's what it looks like now: 1. Levy, S.B. (1964) Isologous

Re: Suggestions for how to approach this problem?

2007-05-08 Thread James Stroud
John Salerno wrote: Marc 'BlackJack' Rintsch wrote: Here's what it looks like now: 1. Levy, S.B. (1964) Isologous interference with ultraviolet and X-ray irradiated bacteriophage T2. J. Bacteriol. 87:1330-1338. 2. Levy, S.B. and T. Watanabe (1966) Mepacrine and transfer of R