Flynn, Stephen (L & P - IT) wrote:

> Tutors,
>
> Whilst having a play around with reading in textfiles and reformatting
> them I tried to write a python 3.2 script to read a CSV file, looking for
> any records which were short (indicating that the data may well contain an
> embedded CR/LF. I've attached a small sample file with a "split record" at
> line 3, and my code.
>
> Call the code with
>
> Python pipesmoker.py MyFile.txt ,
>
> (first parameter is the file being read, second parameter is the field
> separator... a comma in this case)
>
> I can read the file in, I can determine that I'm looking for records which
> have 13 fields and I can find a record which is too short (line 3).
>
> What I can't do is read the successive line to a short line in order to
> append it onto the end of short line before writing the entire amended
> line out. I'm still thinking about how to persuade the fileinput module to
> leap over the successor line so it doesn't get processed again.
>
> When I run the code as it stands, I get a traceback as I'm obviously not
> using fileinput.FileInput.readline() correctly.
>
> value of file is C:\myfile.txt
> value of the delimiter is ,
> I'm looking for 13 , in each currentLine...
> "1","0000000688 ","ABCD","930020854","34","0","1"," ","930020854
> "," ","0","0","0","0"
>
> "2","0000000688 ","ABCD","930020854","99","0","1"," ","930020854 ","
> ","0","0","0","0"
>
> short line found at line 3
> Traceback (most recent call last):
> File "C:\Documents and
> Settings\flynns\workspace\PipeSmoker\src\pipesmoker\pipesmoker.py", line
> 35, in <module>
> nextLine = fileinput.FileInput.readline(args.file)
> File "C:\Python32\lib\fileinput.py", line 301, in readline
> line = self._buffer[self._bufindex]
> AttributeError: 'str' object has no attribute '_buffer'
>
>
> Can someone explain to me how I am supposed to make use of readline() to
> grab the next line of a text file please? It may be that I should be using
> some other module, but chose fileinput as I was hoping to make the little
> routine as generic as possible; able to spot short lines in tab separated,
> comma separated, pipe separated, ^~~^ separated and anything else which my
> clients feel like sending me.

As you have already learned, the csv module is the best tool for your
problem. However, I'd like to show a generic way to get an extra item
inside a for-loop.

Instead of iterating over the iterable (a list, a FileInput object, or
whatever) directly, you first convert it into an iterator explicitly with
the iter() built-in function and keep a reference to it:

    iterable = ...
    it = iter(iterable)

Then inside the for-loop you get an extra item with the next() function:

    for item in it:
        if some_condition():
            extra = next(it)

Because the for-loop and next() draw from the same iterator, an item
fetched with next() is consumed and the loop will not process it again.

next() also accepts a default value as its second argument; without it you
get a StopIteration exception when you apply it to an exhausted iterator.

Here's a self-contained example:

    >>> items = "alpha- beta gamma- delta- epsilon zeta".split()
    >>> it = iter(items)
    >>> for item in it:
    ...     while item.endswith("-"):
    ...         item += next(it)
    ...     print(item)
    ...
    alpha-beta
    gamma-delta-epsilon
    zeta
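
Applied to your short-record problem, the same idea might look roughly like
the sketch below. The function name merge_split_records and its parameters
are made up for illustration, and it simply counts delimiters the way your
script does, so it will be fooled by a delimiter inside a quoted field
(which is one more reason to prefer the csv module). It uses the default
argument of next() so that a short line at the very end of the input does
not raise StopIteration.

```python
def merge_split_records(lines, delimiter=",", expected_delimiters=13):
    """Yield records, gluing each short line to the line(s) that follow it.

    Hypothetical helper for illustration only: a line counts as "short"
    when it contains fewer than `expected_delimiters` delimiters.
    """
    it = iter(lines)  # keep a reference so next() and the loop share it
    for line in it:
        record = line.rstrip("\r\n")
        while record.count(delimiter) < expected_delimiters:
            extra = next(it, None)  # None signals that the input is exhausted
            if extra is None:
                break  # nothing left to append; emit the short record as-is
            record += extra.rstrip("\r\n")
        yield record


# Tiny demonstration with three fields (two delimiters) per record.
sample = ["a,b,c\n", "d,e\n", ",f\n", "g,h,i\n"]
merged = list(merge_split_records(sample, ",", 2))
print(merged)
```

Since the function only needs something it can iterate over, it should work
just as well with an open file or with the object returned by
fileinput.input() in place of the list.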