Please evaluate your email program. Some of your newlines are being lost when you paste into your email.
Matt Varner <matt.l.var...@gmail.com> Wrote in message:
> TL:DR - Skip to "My Script: "subtrans.py"
>
> <beg>
>
> Optional Links to (perhaps) Helpful Images:
> 1. The SRT download button:
> http://i70.photobucket.com/albums/i82/RavingNoah/Python%20Help/tutor1_zps080f20f7.png
>
> 2. A visual comparison of my current problem (see 'Desire Versus
> Reality' below):
> http://i70.photobucket.com/albums/i82/RavingNoah/Python%20Help/newline_problem_zps307f8cab.jpg
>
> ============
> The SRT File
> ============
>
> The SRT file that you can download for every lesson that has a video
> contains the caption transcript data and is organized according to
> text snippets with some connecting time data.
>
> ========================================
> Reading the SRT File and Outputting Something Useful
> ========================================
>
> There may be a hundred different ways to read one of these file types.
> The reliable method I chose was to use a handy editor for the purpose
> called Aegisub. It will open the SRT file and let me immediately
> export a version of it, without the time data (which I don't
> need...yet). The result of the export is a plain-text file containing
> each string snippet and a newline character.
>
> ==========================
> Dealing with the Text File
> ==========================
>
> One of these text files can be anywhere between 130 to 500 lines or
> longer, depending (obviously) on the length of its attendant video.
> For my purposes, as a springboard for extending my own notes for each
> module, I need to concatenate each string with an acceptable format.
> My desire for this is to interject spaces where I need them and kill
> all the newline characters so that I get just one big lump of properly
> spaced paragraph text. From here, I can divide up the paragraphs how
> I see fit and I'm golden...
>
> ==============================
> My first Python script: Issues
> ==============================
>
> I did my due diligence.
> I have read the tutorial at www.python.org.

But did you actually try out and analyze each concept? There is a difference between reading and studying.

> I went to my local library and have a copy of "Python Programming for
> the Absolute Beginner, 3rd Edition by Michael Dawson." I started
> collecting what seemed like logical little bits here and there from
> examples found using Uncle Google, but none of the examples anywhere
> were close enough, contextually, to be automatically picked up by my
> dense 'noobiosity.' For instance, when discussing string
> methods...almost all operations taught to beginners are done on
> strings generated "on the fly," directly inputted into IDLE, but not
> on strings that are contained in an external file.

While it's in the file, it's not a str. Reading it in produces a string (f.read()) or a list of strings (f.readlines()). And once they are created, you cannot tell whether they came from a file, a literal, or some arbitrary expression.

> There are other
> examples for file operations, but none of them involved doing string
> operations afterward. After many errors about not being able to
> directly edit strings in a file object, I finally figured out that
> lists are used to read and store strings kept in a file like the one
> I'm sourcing from...so I tried using that. Then I spent hours
> unsuccessfully trying to call strings using index numbers from the
> list object (I guess I'm dense). Anyhow, I put together my little
> snippets and have been banging my head against the wall for a couple
> of days now.
>
> After many frustrating attempts, I have NEARLY produced what I'm
> looking to achieve in my test file.
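
On calling strings by index from the list object: here is a minimal sketch, assuming a small throwaway file (the name sample.txt and its three lines are made up for illustration, and utf-8 is an assumed encoding). It shows that readlines() returns a plain list, that indexing it gives an ordinary str, and that this str behaves exactly like one typed as a literal.

```python
import os
import tempfile

# Hypothetical sample data standing in for a few lines of the exported text file.
sample = (
    "In this video, we'll go over the\n"
    "common numeric and textual values\n"
    "that CSS properties can accept.\n"
)

# Write the sample to a temporary file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(sample)

with open(path, "r", encoding="utf-8") as f:
    lns = f.readlines()          # a list of str, each still ending in "\n"

first = lns[0]                   # indexing the list gives an ordinary str
from_literal = "In this video, we'll go over the\n"

print(type(lns).__name__, type(first).__name__)   # list str
print(first == from_literal)                      # True
print(first.rstrip("\n").upper())                 # any string method works
```

Once the line is in hand, every string method from the tutorial applies to it, no matter where the text came from.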
>
> ================
> Example - Source
> ================
>
> My Test file contains just twelve lines of a much larger (but no more
> complex) file that is typical for the SRT subtitle caption file, of
> which I expect to have to process a hundred...or hundreds, depending
> on how many there are in all of the courses I plan to take
> (coincidentally, there is one on Python)
>
> Line 01: # Exported by Aegisub 3.2.1
> Line 02: [Deep Dive]
> Line 03: [CSS Values & Units Numeric and Textual Data Types with
> Guil Hernandez]
> Line 04: In this video, we'll go over the
> Line 05: common numeric and textual values
> Line 06: that CSS properties can accept.
> Line 07: Let's get started.
> Line 08: So, here we have a simple HTML page
> Line 09: containing a div and a paragraph
> Line 10: element nested inside.
> Line 11: It's linked to a style sheet named style.css
> Line 12: and this is where we'll be creating our new CSS rules.
>
> ========================
> My Script: "subtrans.py"
> ========================
>
> # Open the target file, create file object
> f = open('tmp.txt', 'r')
>
> # Create an output file to write the changed strings to
> o = open('result.txt', 'w')
>
> # Create a list object that holds all the strings in the file object
> lns = f.readlines()
>
> # Close the source file you no longer
> # need now that you have your strings
> f.close()
>
> # Import sys to get at stdout (standard output) - "print" results will
> # be written to file
> import sys
>
> # Associate stdout with the output file
> sys.stdout = o
>

No, just use o.write() directly. Redirecting sys.stdout so that you can go through print() is a waste of your energy.

> # Try to print strings to output file using loopback variable (line)
> # and the list object
> for line in lns:
>     if ".\n" in line:
>         a = line.replace('.\n','.  ')
>         print(a.strip('\n'))
>     else:
>         b = line.strip('\n')
>         print(b + " ")
>

Consider joining all the strings in your list with "".join(lns) once each one has been cleaned up, and just do one o.write() of the result.
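
A minimal sketch of that suggestion follows. The spacing rule (two spaces after a line ending in a period, one space otherwise) is taken from the original post. The function names join_caption_lines and convert are hypothetical, skipping blank lines is an assumption, utf-8 is an assumed encoding, and the Aegisub header and bracketed title lines are not treated specially.

```python
def join_caption_lines(lines):
    """Turn caption lines into one paragraph string.

    Rule from the original post: two spaces after a line that ends in a
    period, one space after any other line.
    """
    pieces = []
    for line in lines:
        text = line.rstrip("\n")          # drop the newline read from the file
        if not text:
            continue                      # assumption: blank lines are skipped
        pieces.append(text + ("  " if text.endswith(".") else " "))
    # Join once, then trim the spacing added after the very last line.
    return "".join(pieces).rstrip()


def convert(src_path, dst_path, encoding="utf-8"):
    """Read src_path, join its lines, and write the result with one write()."""
    with open(src_path, "r", encoding=encoding) as f:
        lns = f.readlines()
    with open(dst_path, "w", encoding=encoding) as o:
        o.write(join_caption_lines(lns) + "\n")


# Four lines from the test file quoted above, as readlines() would return them.
lns = [
    "In this video, we'll go over the\n",
    "common numeric and textual values\n",
    "that CSS properties can accept.\n",
    "Let's get started.\n",
]
result = join_caption_lines(lns)
print(result)
```

With the file names from the script above, the whole job becomes convert('tmp.txt', 'result.txt'). Checking text.endswith(".") only looks at the end of each line, so a period inside 'style.css' or 'version 2.3.2' is left alone, and no sys.stdout redirection is needed.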
> # Close your output file
> o.close()
>
> =================
> Desire Versus Reality
> =================
>
> The source file contains a series of strings with newline characters
> directly following whatever the last character in the snippet...with
> absolutely no whitespace. This is a problem for me if I want to
> concatenate it back together into paragraph text to use as the
> jumping off point for my additional notes. I've been literally taking
> four hours to type explicitly the dialogue from the videos I've been
> watching...and I know this is going to save me a lot of time and get
> me interacting with the lessons faster and more efficiently.
> However...
>
> My script succeeds in processing the source file and adding the right
> amount of spaces for each line, the rule being "two spaces added
> following a period, and one space added following a string with no
> period in it (technically, a period/newline pairing (which was the
> only way I could figure out not target the period in 'example.css' or
> 'version 2.3.2'.
>
> But, even though it successfully kills these additional newlines that
> seem to form in the list-making process

They aren't extra, they're in the file. readlines() keeps the newline at the end of each line.

> ...I end up with basically a
> non-concatenated file of strings...with the right spaces I need, but
> not one big chunk of text, like I expect using the s.strip('\n')
> functionality.

That's because you're using print(), which appends a trailing newline by default. The print function has a keyword parameter, end, that suppresses it: print(a, end="").

Note that you haven't explicitly addressed the file encodings for input or output. open() accepts an encoding argument for both.

>

--
DaveA

_______________________________________________
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor