Re: [Tutor] Pythonic way
On Tue, Nov 20, 2018 at 08:22:01PM +, Alan Gauld via Tutor wrote: > I think that's a very deliberate feature of Python going back > to its original purpose of being a teaching language that > can be used beyond the classroom. I don't think that is correct -- everything I've read is that Guido designed Python as a scripting language for use in the "Amoeba" operating system. You might be thinking of Python's major influence, ABC, which was designed as a teaching language -- but not by Guido himself. Guido was heavily influenced by ABC, both in what to do, and what not to do. https://www.artima.com/intv/pythonP.html http://python-history.blogspot.com/2009/02/early-language-design-and-development.html -- Steve ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: https://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] Pythonic way
On 20/11/2018 18:08, Avi Gross wrote:

> We have two completely separate ways to format strings that end up with fairly similar functionality. Actually, there is an implicit third way

You could argue five ways :-)

1. C printf style formatting
   https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting
2. New style string formatting
   https://docs.python.org/3/library/string.html#string-formatting
3. f-strings
   https://docs.python.org/3/reference/lexical_analysis.html#f-strings
4. String templates
   https://docs.python.org/3/library/string.html#template-strings
5. String methods
   https://docs.python.org/3/library/stdtypes.html#string-methods

Any advance on five anybody?

--
My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language.

Mark Lawrence
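For readers who want to see the five approaches listed above side by side, here is a minimal sketch. The variable names and the sample sentence are made up for illustration, and the f-string line assumes Python 3.6 or later.

```python
from string import Template

# Hypothetical sample values, only for illustration
name, count = "Tutor", 5

# 1. printf-style formatting with the % operator
s1 = "%s has %d ways" % (name, count)

# 2. "New style" formatting with str.format()
s2 = "{} has {} ways".format(name, count)

# 3. f-string (needs Python 3.6+)
s3 = f"{name} has {count} ways"

# 4. string.Template with $-placeholders
s4 = Template("$name has $count ways").substitute(name=name, count=count)

# 5. Plain string methods
s5 = " ".join([name, "has", str(count), "ways"])

print(s1 == s2 == s3 == s4 == s5)
```

All five build the same string here; which one reads best tends to depend on where the values come from and which Python versions must be supported.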
Re: [Tutor] Pythonic way
On 20/11/2018 18:08, Avi Gross wrote:

> ... So there isn’t really ONE pythonic way for many things.

That's true and, I think, inevitable for anything developed in the open source world. Compared with a language entirely controlled by a single mind - like Oberon or Eiffel, say - there is much less consistency. But how many people actually use Oberon or Eiffel in the real world these days?

> We have two completely separate ways to format strings

And many options for concurrency and for running external programs. Much of it is history and the need for backward compatibility. And let's not even think about web and GUI frameworks!

> ..you can do much without creating objects or using functional programming
> ...If you come from an OO background, you can have fun making endless classes
> ...If you like functional programming with factories that churn out functions
> ...There are other such paradigms supported including lots of miniature
> sub-languages
> ...effectively means being open to multiple ways

I think that's a very deliberate feature of Python going back to its original purpose of being a teaching language that can be used beyond the classroom. It was always intended to support multiple paradigms. After all, every programmer should be aware of multiple paradigms and when to best use each.

But Python is currently suffering the same fate as C++ in that, as it becomes more mainstream in real-world industry, the feature demands upon it inevitably move it away from some of those original teaching-based ideas. It is certainly a much harder language to learn today than it was when I started in 1998. From a pure academic CS view many changes are good (e.g. iterators and metaprogramming) but to a non-academic beginner (or even a high school student) they are just plain confusing.

It's all part of being a success in the real world. The funding for development comes from the industrial user community, not the high schools or colleges, so their needs come first.

PS. Just back from vacation so still catching up on the last week's discussions!

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos
[Tutor] Pythonic way
This is not a question or reply. Nor is it short. If not interested, feel free to delete.

It is an observation based on recent experiences. We have had quite a few messages that pointed out how some people approach solving a problem using subconscious paradigms inherited from their past. This includes people who never programmed and are thinking of how they might do it manually, as well as people who are proficient in one or more other computer languages and whose first attempt to visualize the solution may lead along paths that are possibly doable in Python but not optimal or even suggested.

I recently had to throw together a routine that would extract info from multiple SAS data files and join them together on one key into a large DataFrame (or data.frame or other such names for a tabular object). Then I needed to write them out to disk either as a CSV or XLSX file for future use.

Since I have studied and used (not to mention abused) many programming languages, my first thought was to do this in R. It has lots of the tools needed to do such things including packages (sort of like modules you can import but not exactly) and I have done many data/graphics programs in it. I then redid it in Python after some thought.

The pseudocode outline is:

* Read in all the files into a set of data.frame objects.
* Trim back the variables/columns of some of them as many are not needed.
* Join them together on a common index using a full or outer join.
* Save the result on disk as a Comma Separated Values file.
* Save the result on disk as a named tab in a new style EXCEL file.

I determined some of what I might use, such as the needed built-in commands, packages and functions, for the early parts but ran into an annoyance as some of the files contained duplicate entries. Luckily, the R function reduce (not the same as map/reduce) is like many things in R and takes a list of items and makes it work.
Also, by default, it renames duplicates so if you have ALPHA in multiple places, it names them ALPHA.x and ALPHA.x.x and other variations.

    df_joined <- reduce(df_list, full_join, by = "COMMON")

Mind you, when I stripped away many of the columns not needed in some of the files, there were fewer duplicates and a much smaller output file.

But I ran into a wall later. Saving into a CSV is trivial. There are multiple packages meant to be used for saving into an XLSX file but they all failed for me. One wanted something in JAVA and another may want PERL and some may want packages I do not have installed. So, rather than bash my head against the wall, I twisted and used the best XLSX maker there is. I opened the CSV file in EXCEL manually and did a SAVE AS …

Then I went to plan C (no, not the language C or its many extensions like C++) as I am still learning Python and have not used it much. As an exercise, I decided to learn how to do this in Python using tools/modules like numpy and pandas that I have not had much need for, as well as additional tools for reading and writing files in other formats.

My first attempts gradually worked, after lots of mistakes and looking at manual pages. It followed an eclectic set of paradigms but worked. Not immediately, as I ran into a problem in that the pandas version of a join did not tolerate duplicate column names when used on a list. I could get it to rename the left or right list (adding a suffix) when used on exactly two DataFrames. So, I needed to take the first df and do a df.join(second, …) then take that and join the third and so on. I also needed to keep telling it to set the index to the common value for each and every df including the newly joined series. And, due to size, I chose to keep deleting df no longer in use but that would not be garbage collected.

I then looked again at how to tighten it up in a more pythonic way.
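As a rough illustration of the pairwise join described above, here is a small sketch using pandas. The three tiny frames, the key column name "ID" and the column names ALPHA and BETA are hypothetical stand-ins for the real SAS data, and setting the index once up front is an assumption about how the frames could be prepared, not necessarily what the original script did.

```python
import pandas as pd

# Hypothetical stand-ins for the frames read from the SAS files
df1 = pd.DataFrame({"ID": [1, 2], "ALPHA": ["a", "b"]})
df2 = pd.DataFrame({"ID": [2, 3], "ALPHA": ["c", "d"]})
df3 = pd.DataFrame({"ID": [1, 3], "BETA": [10.0, 30.0]})

# Assumption: index every frame by the common key once, so that
# DataFrame.join (which joins on the index) lines the rows up
dflist = [df.set_index("ID") for df in (df1, df2, df3)]

current = dflist[0]
for suffix, df in enumerate(dflist[1:], start=1):
    # rsuffix renames columns of the right-hand frame that clash
    # with columns already present in the accumulated result
    current = current.join(df, how="outer", rsuffix="_" + str(suffix))

print(current)
```

Each pass rebinds `current`, so earlier intermediate results are no longer referenced and can be reclaimed without explicit `del` statements.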
In English (my sixth language since we are talking about languages) I did some things linearly then shifted it to a list method. I used lists of file names and lists of the df made from each file after removing unwanted columns. (NOTE: I use “column” but depending on language and context I mean variable or field or axis or many other ways to say a group of related information in a tabular structure that crosses rows or instances.)

So I was able to do my multi-step join more like this:

    join_with_list = dflist[1:]
    current = df1
    suffix = 1
    for df in join_with_list:
        current = current.join(df, how='outer', rsuffix='_'+str(suffix))
        suffix += 1
        current.set_index('ID')

In this formulation, the intermediate DataFrame objects held in current will silently be garbage collected as nothing points to them, for example. Did I mention these were huge files?

The old code was much longer and error prone as I had a df1, df2, … df8 as well as other intermediates and was easy to copy and paste then
Re: [Tutor] how to print lines which contain matching words or strings
Asad,

Thank you for the clarification. I am glad that you stated (albeit at the end) that you wanted a better idea of how to do it than the code you display. I stripped out the earlier parts of the discussion for storage considerations but they can be found in the archives if needed.

There are several ways to look at your code. One is to discuss it the way it is. The other is to discuss how it could be, and there are often many people that champion one style or another. I will work with your style but point out the more compact form many favor first. As has been pointed out, people coming from languages like C may try to write in a similar style even in a language that supports other ways.

So if your goal is what you say, then all you need is doable in very few lines of code. The basic idea is iteration. You can use it several times. You have a file. In Python (at least recent versions) the opened file is an iterator. So the outline of your program can look like:

    for line in open(...):
        process_line(line, re_list)

I snuck in a function called process_line that you need to define or replace by code. I also snuck in a list of regular expressions you would create, perhaps above the loop.

I will not give you a tutorial on regular expressions. Suffice it to say they tend to be strings. You do not search for 123 but rather for "123" or str(123) or anything that becomes a single string. Here is one of many ways to learn how to make proper expressions and use them:

https://docs.python.org/2/howto/regex.html

Since you want to repeatedly use the same expressions for each line, you may want to compile each one and have a list of the compiled versions.
If you have a list like this:

    re_str = [ "ABC", "123", "(and)|(AND)", "[_A-Za-z][_A-Za-z0-9]*" ]

you can use a loop such as a list comprehension like this:

    re_comp = [ re.compile(pattern) for pattern in re_str ]

So in the function above, or in-line, you can loop over the expressions for each line sort of like this:

    for pat in re_comp:
        <> print line and break out.

The latter line is not actual Python code but a place you use whatever matching function you want. The variable "pat" holds each compiled pattern one at a time so pat.search(line) or pat.match(line) and so on can be used depending on your need. Since you actually do not care what matches you have lots of leeway.

There are many other ways but this one is quite simple and broad and adjusts to any number or type of pattern if properly used.

Back to your code. No need to use a raw string on a normal filename but harmless.

    f3 = open(r'file1.txt',r)

Why file1 is read into variable f3 remains a harmless mystery. But then I see you using another style by reading the entire file into memory

    f = f3.readlines()
    d = []

Nothing wrong with that, although the example above shows how to process one line at a time. So far, you seem to want to make a list of lines that match and not print till later.

    for linenum in range(len(f)):

OK, that is valid Python but far from optimal. Yes, you can loop over indices of the list f using the length. But since such a list of strings is an iterable, you could have done something similar to the method I showed above:

    for line in f:

But going with what you have, you decided to create a series of individual if statements.

    if re.search("ERR-1" ,f[linenum])
        print f[linenum]
        break
    if re.search("\d\d\d\d\d\d",f[linenum])   --- > search for a patch number length of six digits for example 123456
        print f[line]
        break

and so on. Ignoring the comment in the code that makes it fail, this is presumably valid but not Pythonic.
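The compiled-pattern loop outlined earlier in this message can be sketched end to end roughly as follows. This is only an illustration: the function name `matching_lines`, the sample lines and the three patterns are hypothetical, and it uses Python 3 `print()` rather than the Python 2 `print line` form quoted in the thread.

```python
import re

# Hypothetical patterns, borrowed from the searches quoted above
re_str = [r"ERR-1", r"\d{6}", r"Good Morning"]
re_comp = [re.compile(pattern) for pattern in re_str]

def matching_lines(lines, patterns):
    """Yield each line that matches at least one compiled pattern."""
    for line in lines:
        # any() stops at the first pattern that matches, so a line is
        # yielded once even if several patterns would match it
        if any(pat.search(line) for pat in patterns):
            yield line

# Stand-in for the lines of a file; an open file object would work too
sample = [
    "ERR-1 disk full\n",
    "nothing to see\n",
    "patch 123456 applied\n",
    "Good Morning, ERR-1 again\n",
]
hits = list(matching_lines(sample, re_comp))
for line in hits:
    print(line, end="")
```

Using `any()` plays the role of "print and break out": the search over patterns ends at the first hit, while the loop over lines carries on to the next line.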
One consideration is that the if statement can look like this:

    if (condition1 and (condition2 or condition3)) ...

So you could do a list of "or" statements in one if. In pseudocode:

    if (matches(line, re1) or matches(line, re2) ... or ...)

The above, if properly written with N parts, will evaluate as true as soon as the first condition matches. You can then print or copy for later printing. No break needed. But note each of the pseudo-code matches() calls must return something Python treats as True or False.

The extended form of "if" is another way:

    if condition1:
        Something
    elif condition2:
        Something else
    elif condition3:
        Have fun
    else:
        whatever

I note you made an empty list with

    d = []

But you never used it. My initial guess was that you wanted to add lines to the list. Since you printed instead, is it needed?

You asked about using dictionaries. Yes, you can store just about anything in dictionaries and iterate over them in a random order. But a list of strings or compiled regular expressions would work fine for this application. Having said that, you can make a dictionary but what would be the key? The key has to be something immutable and is there any obvious advantage?

If you care about efficiency, some final notes. The order of the searches
Re: [Tutor] how to print lines which contain matching words or strings
On 11/19/18 8:15 PM, Asad wrote:
> Hi Avi Gross /All,
>
> Thanks for the reply. Yes you are correct , I would like to
> open a file and process a line at a time from the file and want to select
> just lines that meet my criteria and print them while ignoring the rest. i
> have created the following code :
>
> import re
> import os
>
> f3 = open(r'file1.txt',r)
> f = f3.readlines()
> d = []
> for linenum in range(len(f)):
>     if re.search("ERR-1" ,f[linenum])
>         print f[linenum]
>         break
>     if re.search("\d\d\d\d\d\d",f[linenum])   --- > search for a patch number length of six digits for example 123456
>         print f[line]
>         break
>     if re.search("Good Morning",f[linenum])
>         print f[line]
>         break
>     if re.search("Breakfast",f[linenum])
>         print f[line]
>         break
>     ...
>     further 5 more heterogeneous if conditions I have
>
> ===
> This is beginners approach to print the lines which match the if conditions .
>
> How should I make it better may be create a dictionary of search items or a
> list and then iterate over the lines in a file to print the lines matching
> the condition.

We usually suggest using a context manager for file handling, so that cleanup happens automatically when the context is complete:

    with open('file1.txt', 'r') as f3:
        # do stuff
    # when you get here, f3 is closed

There's no need to do a counting loop, using the count as an index into an array. That's an idiom from other programming languages; in Python you may as well just loop directly over the list (array)... lists are iterable.

    for line in f:
        # search in line

Indeed, there's no real need to read all the lines in with readlines, you can just loop directly over the file object - the f3 opened above:

    for line in f3:
        # search in line

There's no need to use a regular expression search if your pattern is a simple string, you can use the "in" keyword:

    if "Breakfast" in line:
        print line

Keep your REs for more complex matches.

Do those help?
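Putting those suggestions together, a small self-contained sketch might look like the following. It writes a temporary file with made-up contents so that it can run anywhere; the sample text, the `selected` list and the six-digit pattern are assumptions for illustration, and it is written for Python 3 rather than the Python 2 `print` form used above.

```python
import os
import re
import tempfile

# Hypothetical sample data standing in for file1.txt
sample = "Breakfast at 8\nnothing here\npatch 123456 applied\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

# A regular expression only where a plain substring test is not enough
six_digits = re.compile(r"\d{6}")

selected = []
with open(path, "r") as f3:      # f3 is closed when this block ends
    for line in f3:              # iterate over the file, no readlines()
        if "Breakfast" in line or six_digits.search(line):
            selected.append(line.rstrip("\n"))

os.remove(path)                  # tidy up the temporary file
print(selected)
```

The `or` short-circuits, so the regular expression is only tried on lines where the cheap substring test has already failed.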
Re: [Tutor] how to print lines which contain matching words or strings
Hi Avi Gross /All,

Thanks for the reply. Yes you are correct, I would like to open a file and process a line at a time from the file and want to select just lines that meet my criteria and print them while ignoring the rest. I have created the following code:

    import re
    import os

    f3 = open(r'file1.txt',r)
    f = f3.readlines()
    d = []
    for linenum in range(len(f)):
        if re.search("ERR-1" ,f[linenum])
            print f[linenum]
            break
        if re.search("\d\d\d\d\d\d",f[linenum])   --- > search for a patch number length of six digits for example 123456
            print f[line]
            break
        if re.search("Good Morning",f[linenum])
            print f[line]
            break
        if re.search("Breakfast",f[linenum])
            print f[line]
            break
        ...
        further 5 more heterogeneous if conditions I have

===

This is a beginner's approach to printing the lines which match the if conditions.

How should I make it better? Maybe create a dictionary of search items or a list and then iterate over the lines in a file to print the lines matching the condition.

Please advise,

Thanks,

Previous email :
==

Asad,

As others have already pointed out, your request is far from clear. Ignoring the strange use of words, and trying to get the gist of the request, would this be close to what you wanted to say?

You have a file you want to open and process a line at a time. You want to select just lines that meet your criteria and print them while ignoring the rest.

So what are the criteria? It sounds like you have a list of criteria that might be called patterns. Your example shows a heterogenous collection:

    [A ,"B is good" ,123456 , "C "]

A is either an error or the name of a variable that contains something. We might want a hint as searching for any old object makes no sense.

The second and fourth are exact strings. No special regular expression pattern. Searching for them is trivial using normal string functionality. Assuming they can be anywhere in a line:

>>> line1 = "Vitamin B is good for you and so is vitamin C"
>>> line2 = "Currently nonsensical."
>>> line3 = ""
>>> "B is good" in line1
True
>>> "B is good" in line2
False
>>> "B is good" in line3
False
>>> "C" in line1
True
>>> "C" in line2
True

To test everything in a list, you need code like

    for each line:
        for whatever in [A ,"B is good" ,123456 , "C "]
            if whatever in line: print(line)

Actually, the above could print multiple copies so you should break out after any one matches.

123456 is a challenge to match. You could search for str(whatever) perhaps.

Enough. First explain what you really want.

If you want to do a more general search using regular expressions, then the list of things to search for would be all the strings in RE format. You could search multiple times or use the OR operator carefully inside one regular expression. You have not stated any need to tell what was matched or where it is in the line so that would be yet another story.

-Original Message-
From: Tutor On Behalf Of Asad
Sent: Sunday, November 18, 2018 10:19 AM
To: tutor@python.org
Subject: [Tutor] how to print lines which contain matching words or strings

Hi All ,

I have a set of words and strings : like :

    p = [A ,"B is good" ,123456 , "C "]

I have a file in which I need to print only the lines which matches the pattern in p

thanks,

On Tue, Nov 20, 2018 at 6:12 AM wrote:
> Send Tutor mailing list submissions to
>     tutor@python.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
>     https://mail.python.org/mailman/listinfo/tutor
> or, via email, send a message with subject or body 'help' to
>     tutor-requ...@python.org
>
> You can reach the person managing the list at
>     tutor-ow...@python.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Tutor digest..."
> Today's Topics:
>
>    1. Re: seeking beginners tutorial for async (Mats Wichmann)
>    2. Re: seeking beginners tutorial for async (Bob Gailer)
>    3. Re: how to print lines which contain matching words or
>       strings (Avi Gross)
>    4. [Python 3] Threads status, join() and Semaphore queue
>       (Dimitar Ivanov)
>
>
>
> -- Forwarded message --
> From: Mats Wichmann
> To: tutor@python.org
> Cc:
> Bcc:
> Date: Mon, 19 Nov 2018 10:05:35 -0700
> Subject: Re: [Tutor] seeking beginners tutorial for async
> On 11/18/18 4:50 PM, bob gailer wrote:
> > I have yet to find a tutorial that helps me understand and apply async!
> >
> > The ones I have found are either incomplete, or they wrap some other
> > service, or they are immediately so complex that I have no hope of
> > understanding them.
> >
> > I did find a useful javascript tutorial at
> >