Re: [Tutor] Pythonic way

2018-11-20 Thread Steven D'Aprano
On Tue, Nov 20, 2018 at 08:22:01PM +, Alan Gauld via Tutor wrote:

> I think that's a very deliberate feature of Python going back
> to its original purpose of being a teaching language that
> can be used beyond the classroom.

I don't think that is correct -- everything I've read is that Guido 
designed Python as a scripting language for use in the "Amoeba" 
operating system.

You might be thinking of Python's major influence, ABC, which was 
designed as a teaching language -- but not by Guido himself. Guido was 
heavily influenced by ABC, both in what to do, and what not to do.

https://www.artima.com/intv/pythonP.html

http://python-history.blogspot.com/2009/02/early-language-design-and-development.html



-- 
Steve
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Pythonic way

2018-11-20 Thread Mark Lawrence

On 20/11/2018 18:08, Avi Gross wrote:



We have two completely separate ways to format strings that end up with fairly 
similar functionality. Actually, there is an implicit third way 



You could argue five ways :-)

1. C printf style formatting 
https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting
2. New style string formatting 
https://docs.python.org/3/library/string.html#string-formatting
3. f-strings 
https://docs.python.org/3/reference/lexical_analysis.html#f-strings
4. String templates 
https://docs.python.org/3/library/string.html#template-strings
5. String methods 
https://docs.python.org/3/library/stdtypes.html#string-methods
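
For comparison, here is a small sketch of the five approaches producing the same text. The values `name` and `count` are made up for illustration, and the f-string line needs Python 3.6 or later.

```python
from string import Template

name, count = "spam", 3  # hypothetical values, for illustration only

results = [
    "%s x %d" % (name, count),                                       # 1. printf style
    "{} x {}".format(name, count),                                   # 2. str.format
    f"{name} x {count}",                                             # 3. f-string
    Template("$name x $count").substitute(name=name, count=count),   # 4. Template
    " x ".join([name, str(count)]),                                  # 5. string methods
]

for text in results:
    print(text)
```

Each entry builds the same string, so the choice is mostly about readability and about where the template text comes from.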


Any advance on five anybody?

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence



Re: [Tutor] Pythonic way

2018-11-20 Thread Alan Gauld via Tutor
On 20/11/2018 18:08, Avi Gross wrote:
> ... So there isn’t really ONE pythonic way for many things. 

That's true and, I think, inevitable for anything developed
in the open source world. If you compare it to a language
entirely controlled by a single mind - like Oberon or Eiffel
say - then Python has much less consistency. But how many people
actually use Oberon or Eiffel in the real world these days?

> We have two completely separate ways to format strings

And many options for concurrency and for running external
programs. Much of it is history and the need for backward
compatibility.

And let's not even think about web and GUI frameworks!

> ..you can do much without creating objects or using functional programming
> ...If you come from an OO background, you can have fun making endless classes
>...If you like functional programming with factories that churn out functions
> ...There are other such paradigms supported including lots of miniature 
> sub-languages

> ...effectively means being open to multiple ways 

I think that's a very deliberate feature of Python going back
to its original purpose of being a teaching language that
can be used beyond the classroom. It was always intended
to support multiple paradigms. After all, every programmer
should be aware of multiple paradigms and when to best
use each.

But Python is currently suffering the same fate as C++ in
that, as it becomes more mainstream in real-world industry,
the feature demands upon it inevitably move it away from some
of those original teaching-based ideas. It is certainly
a much harder language to learn today than it was when
I started in 1998. From a pure academic CS view many changes
are good (e.g. iterators and metaprogramming) but for a
non-academic beginner (or even a high school student) they are
just plain confusing. It's all part of being a success in
the real world. The funding for development comes from the
industrial user community not the high schools or colleges,
so their needs come first.

PS. Just back from vacation so still catching up on the
last week's discussions!

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos




[Tutor] Pythonic way

2018-11-20 Thread Avi Gross
This is not a question or reply. Nor is it short. If not interested, feel free 
to delete.

 

It is an observation based on recent experiences.

 

We have had quite a few messages that pointed out how some people approach  
solving a problem using subconscious paradigms inherited from their past. This 
includes people who never programmed and are thinking of how they might do it 
manually as well as people who are proficient in one or more other computer 
languages and their first attempt to visualize the solution may lead along 
paths that are possibly doable in Python but not optimal or even suggested.

 

I recently had to throw together a routine that would extract info from 
multiple SAS data files and join them together on one key into a large 
DataFrame (or data.frame or other such names for a tabular object). Then I 
needed to write them out to disk either as a CSV or XLSX file for future use.

 

Since I have studied and used (not to mention abused) many programming 
languages, my first thought was to do this in R. It has lots of the tools 
needed to do such things including packages (sort of like modules you can 
import but not exactly) and I have done many data/graphics programs in it. I 
then redid it in Python after some thought.

 

The pseudocode outline is:

 

*   Read in all the files into a set of data.frame objects.
*   Trim back the variables/columns of some of them as many are not needed.
*   Join them together on a common index using a full or outer join.
*   Save the result on disk as a Comma Separated Values file.
*   Save the result on disk as a named tab in a new style EXCEL file.

 

I determined some of what I might use such as the needed built-in commands, 
packages and functions I could use for the early parts but ran into an 
annoyance as some of the files contained duplicate entries. Luckily, the R 
function reduce (not the same as map/reduce) is like many things in R and takes 
a list of items and makes it work. Also, by default, it renames duplicates so 
if you have ALPHA in multiple places, it names them ALPHA.x and ALPHA.x.x and 
other variations.

 

df_joined <- reduce(df_list, full_join, by = "COMMON")

 

Mind you, when I stripped away many of the columns not needed in some of the 
files, there were fewer duplicates and a much smaller output file.

 

But I ran into a wall later. Saving into a CSV is trivial. There are multiple 
packages meant to be used for saving into a XLSX file but they all failed for 
me. One wanted something in JAVA and another may want PERL and some may want 
packages I do not have installed. So, rather than bash my head against the 
wall, I twisted and used the best XLSX maker there is. I opened the CSV file in 
EXCEL manually and did a SAVE AS …

 

Then I went to plan C (no, not the language C or its many extensions like C++) 
as I am still learning Python and have not used it much. As an exercise, I 
decided to learn how to do this in Python using tools/modules like numpy and 
pandas that I have not had much need for as well as additional tools for 
reading and writing files in other formats.

 

My first attempts gradually worked, after lots of mistakes and looking at 
manual pages. It followed an eclectic set of paradigms but worked.  Not 
immediately, as I ran into a problem in that the pandas version of a join did 
not tolerate duplicate column names when used on a list. I could get it to 
rename the left or right list (adding a suffix) when used on exactly 
two DataFrames. So, I needed to take the first df and do a df.join(second, …) 
then take that and join the third and so on. I also needed to keep telling it 
to set the index to the common value for each and every df including the newly 
joined series. And, due to size, I chose to keep deleting df no longer in use 
but that would not be garbage collected.

 

I then looked again at how to tighten it up in a more pythonic way. In English 
(my sixth language since we are talking about languages) I did some things 
linearly then shifted it to a list method. I used lists of file names and lists 
of the df made from each file after removing unwanted columns. (NOTE: I use 
“column” but depending on language and context I mean variable or field or axis 
or many other ways to say a group of related information in a tabular structure 
that crosses rows or instances.)

 

So I was able to do my multi-step join more like this:

 

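join_with_list = dflist[1:]
current = df1
suffix = 1

for df in join_with_list:
    current = current.join(df, how='outer', rsuffix='_'+str(suffix))
    suffix += 1
    # Note: set_index() returns a new DataFrame, so this result is
    # discarded unless it is assigned back or inplace=True is used.
    current.set_index('ID')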

 

In this formulation, the intermediate DataFrame objects held in current will 
silently be garbage collected as nothing points to them, for example. Did I 
mention these were huge files?
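
The same "fold a list into one result" shape can also be written with functools.reduce, which is the closest Python relative of the R one-liner above. What follows is only a sketch under loud assumptions: plain dictionaries of the form {key: {column: value}} stand in for DataFrames, `outer_join` is a hypothetical helper, and missing cells are simply left out rather than filled in as pandas would.

```python
from functools import reduce

def outer_join(left, right_and_suffix):
    """Outer-join two tables shaped like {key: {column: value}}."""
    right, suffix = right_and_suffix
    left_cols = {col for row in left.values() for col in row}
    joined = {}
    for key in left.keys() | right.keys():      # union of keys = outer join
        row = dict(left.get(key, {}))
        for col, value in right.get(key, {}).items():
            # Rename a clashing column instead of overwriting it.
            new_col = col + suffix if col in left_cols else col
            row[new_col] = value
        joined[key] = row
    return joined

# Hypothetical stand-ins for the per-file tables.
tables = [
    {1: {"ALPHA": "a"}, 2: {"ALPHA": "b"}},
    {2: {"ALPHA": "c"}, 3: {"BETA": "d"}},
    {1: {"ALPHA": "e"}},
]

# Pair every table after the first with its own suffix: "_1", "_2", ...
numbered = [(t, "_" + str(n)) for n, t in enumerate(tables[1:], start=1)]
joined = reduce(outer_join, numbered, tables[0])
print(joined)
```

Each intermediate result is referenced only by the accumulator inside reduce, so it becomes collectable as soon as the next join finishes, which is the same property the loop above relies on.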

 

The old code was much longer and error prone as I had a df1, df2, … df8 as well 
as other intermediates and was easy to copy and paste then 

Re: [Tutor] how to print lines which contain matching words or strings

2018-11-20 Thread Avi Gross
Asad,

Thank you for the clarification. I am glad that you stated (albeit at the
end) that you wanted a better idea of how to do it than the code you
display. I stripped out the earlier parts of the discussion for storage
considerations but they can be found in the archives if needed.

There are several ways to look at your code.

One is to discuss it in the general way it is written. 

The other is to discuss how it could be, and there are often many people
that champion one style or another.

I will work with your style but point out the more compact form many favor
first. As has been pointed out, people coming from languages like C, may try
to write in a similar style even in a language that supports other ways.

So if your goal is what you say, then all you need is doable in very few
lines of code.

The basic idea is iteration. You can use it several times.

You have a file. In Python (at least recent versions) the opened file is an
iterator. So the outline of your program can look like:

for line in open(...):
    process_line(line, re_list)

I snuck in a function called process_line that you need to define or replace
by code. I also snuck in a list of regular expressions you would create,
perhaps above the loop.

I will not give you a tutorial on regular expressions. Suffice it to say
they tend to be strings. You do not search for 123 but rather for "123" or
str(123) or anything that becomes a single string.

Here is one of many ways to learn how to make proper expressions and use
them:

https://docs.python.org/2/howto/regex.html

Since you want to repeatedly use the same expressions for each line, you may
want to compile each one and have a list of the compiled versions. 

If you have a list like this:

re_str = [ "ABC", "123", "(and)|(AND)", "[_A-Za-z][_A-Za-z0-9]*" ]

you can use a loop such as a list comprehension like this:

re_comp = [ re.compile(pattern) for pattern in re_str ]

So in the function above, or in-line, you can loop over the expressions for
each line sort of like this:

for pat in re_comp:
    <if pat matches line>: print line and break out.

The latter line is not actual Python code but a place you use whatever
matching function you want. The variable "pat" holds each compiled pattern
one at a time so pat.search(line) or pat.match(line) and so on can be used
depending on your need. Since you actually do not care what matches you have
lots of leeway.

There are many other ways but this one is quite simple and broad and adjusts
to any number or type of pattern if properly used.
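
Putting those pieces together, a minimal sketch might look like the following. It is written for Python 3, the three patterns are only examples, and io.StringIO stands in for the opened file so the sketch runs without file1.txt being present.

```python
import io
import re

re_str = ["ERR-1", r"\d{6}", "Good Morning"]    # hypothetical patterns
re_comp = [re.compile(pattern) for pattern in re_str]

def process_line(line, patterns):
    """Return True if any compiled pattern is found in the line."""
    for pat in patterns:
        if pat.search(line):
            return True     # stop at the first hit, like "break"
    return False

# Stand-in for open("file1.txt"); a real file object iterates the same way.
sample = io.StringIO("all fine\nERR-1 disk full\npatch 123456 applied\nbye\n")

matched = [line.rstrip("\n") for line in sample if process_line(line, re_comp)]
for line in matched:
    print(line)
```

Here pat.search() finds the pattern anywhere in the line; pat.match() would only accept it at the start.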

Back to your code. No need to use a raw string on a normal filename but
harmless.

f3 = open(r'file1.txt',r)

Why file1 is read into variable f3 remains a harmless mystery.

But then I see you using another style by reading the entire file into
memory

f = f3.readlines()
d = []

Nothing wrong with that, although the example above shows how to process one
line at a time. So far, you seem to want to make a list of lines that match
and not print till later.

for linenum in range(len(f)):

OK, that is valid Python but far from optimal. Yes, you can loop over
indices of the list f using the length. But since such a list of strings is
an iterable, you could have done something similar to the method I showed
above:

for line in f:
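
As a small illustration of the difference (Python 3, with a made-up list standing in for the result of readlines()), the index loop and the direct loop select the same lines, and enumerate() covers the case where a line number is genuinely wanted:

```python
# Stand-in for f = f3.readlines(); the contents are invented.
f = ["first line\n", "ERR-1 second line\n", "third line\n"]

# Index-based loop, as in the original code.
by_index = [f[linenum] for linenum in range(len(f)) if "ERR-1" in f[linenum]]

# Direct iteration: no index needed.
direct = [line for line in f if "ERR-1" in line]

# enumerate() supplies a 1-based line number alongside each line.
numbered = [(num, line) for num, line in enumerate(f, start=1) if "ERR-1" in line]

print(by_index == direct, numbered)
```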

But going with what you have, you decided to create a series of individual
if statements.

if re.search("ERR-1" ,f[linenum])
   print f[linenum]
   break

if re.search("\d\d\d\d\d\d",f[linenum])   --- > seach for a patch
number length of six digits for example 123456
   print f[line]
   break

and so on.

Ignoring the comment in the code that makes it fail, this is presumably
valid but not Pythonic.

One consideration is that the if statement can look like this:

if (condition1 and (condition2 or condition3)): ...

So you could do a list of "or" statements in one if.

In pseudocode:

if (matches(line, re1) or matches(line, re2) ... or ...):

The above, if properly written with N parts, will return true as soon as the
first condition matches. You can then print or copy for later printing. No
break needed. But note each of the pseudo-code matches() must return
something Python treats as True or False.
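
When the conditions all have the same shape, the built-in any() gives the same short-circuit behavior as a chain of "or" without writing out N parts. A sketch for Python 3, with invented patterns and a hypothetical helper name `matches_any`:

```python
import re

# Hypothetical patterns, compiled once.
patterns = [re.compile(p) for p in ("ERR-1", r"\d{6}", "Breakfast")]

def matches_any(line):
    # any() stops at the first truthy result, like a chain of "or".
    # pat.search() returns a match object (truthy) or None (falsy).
    return any(pat.search(line) for pat in patterns)

lines = ["nothing here", "patch 654321", "Breakfast at 8"]
selected = [line for line in lines if matches_any(line)]
print(selected)
```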

The extended form of "if" is another way:

if condition1:
    Something
elif condition2:
    Something else
elif condition3:
    Have fun
else:
    whatever


I note you made an empty list with d = []
But you never used it. My initial guess was that you wanted to add lines to
the list. Since you printed instead, is it needed?

You asked about using dictionaries. Yes, you can store just about anything
in dictionaries and iterate over them (in insertion order as of Python 3.7,
in arbitrary order before that). But a list of strings or compiled regular
expressions would work fine for this application. Having said that, you can
make a dictionary but what would be the key? The key has to be something
immutable and is there any obvious advantage?
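
One possible answer, offered only as a sketch: a dictionary earns its keep if you want to report which kind of thing matched. The labels and the helper name `first_label` below are hypothetical; strings are immutable, so they work as keys.

```python
import re

# Hypothetical labels mapped to compiled patterns.
searches = {
    "error": re.compile("ERR-1"),
    "patch": re.compile(r"\d{6}"),
}

def first_label(line):
    """Return the label of the first pattern found in the line, or None."""
    for label, pat in searches.items():
        if pat.search(line):
            return label
    return None

print(first_label("patch 123456 applied"))
print(first_label("hello"))
```

If the label is never used, a plain list of patterns does the same job with less ceremony.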

If you care about efficiency, some final notes.

The order of the searches 

Re: [Tutor] how to print lines which contain matching words or strings

2018-11-20 Thread Mats Wichmann
On 11/19/18 8:15 PM, Asad wrote:
> Hi Avi Gross /All,
> 
>  Thanks for the reply. Yes you are correct , I would like to to
> open a file and process a line at a time from the file and want to select
> just lines that meet my criteria and print them while ignoring the rest. i
> have created the following code :
> 
> 
>import re
>import os
> 
>f3 = open(r'file1.txt',r)
>f = f3.readlines()
>d = []
>for linenum in range(len(f)):
> if re.search("ERR-1" ,f[linenum])
>print f[linenum]
>break
> if re.search("\d\d\d\d\d\d",f[linenum])   --- > seach for a patch
> number length of six digits for example 123456
>print f[line]
>break
> if re.search("Good Morning",f[linenum])
>print f[line]
>break
> if re.search("Breakfast",f[linenum])
>print f[line]
>break
> ...
> further 5 more hetrogeneus if conditions I have
> 
> ===
> This is beginners approach to print the lines which match the if conditions
> .
> 
> How should I make it better may be create a dictionary of search items or a
> list and then iterate over the lines in a file to print the lines matching
> the condition.

We usually suggest using a context manager for file handling, so that
cleanup happens automatically when the context is complete:

with open('file1.txt', 'r') as f3:
    # do stuff
# when you get here, f3 is closed

There's no need to do a counting loop, using the count as an index into
an array. That's an idiom from other programming languages; in Python you
may as well just loop directly over the list (array)... lists are iterable.

for line in f:
   # search in line

Indeed, there's no real need to read all the lines in with readlines,
you can just loop directly over the file object - the f3 opened above:

for line in f3:
    # search in line

There's no need to use a regular expression search if your pattern is a
simple string, you can use the "in" keyword:

if "Breakfast" in line:
    print line

Keep your REs for more complex matches.
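
A minimal sketch combining these suggestions, written for Python 3. The temporary file and its contents are invented so the example is self-contained; in real use the path would simply be 'file1.txt'.

```python
import os
import re
import tempfile

# Create a small throwaway file so the sketch runs anywhere.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Good Morning\nnothing\npatch 123456\nBreakfast is ready\n")
    path = tmp.name

six_digits = re.compile(r"\d{6}")   # a real pattern, so a regex is justified
found = []

with open(path, "r") as f3:
    for line in f3:                 # iterate over the file object directly
        # Plain substring test for the simple case, regex for the complex one.
        if "Breakfast" in line or six_digits.search(line):
            found.append(line.rstrip("\n"))
# f3 is closed here

os.remove(path)
print(found)
```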

Do those help?



Re: [Tutor] how to print lines which contain matching words or strings

2018-11-20 Thread Asad
Hi Avi Gross /All,

 Thanks for the reply. Yes, you are correct, I would like to
open a file and process a line at a time from the file and want to select
just lines that meet my criteria and print them while ignoring the rest. I
have created the following code:


   import re
   import os

   f3 = open(r'file1.txt',r)
   f = f3.readlines()
   d = []
   for linenum in range(len(f)):
if re.search("ERR-1" ,f[linenum])
   print f[linenum]
   break
if re.search("\d\d\d\d\d\d",f[linenum])   --- > seach for a patch
number length of six digits for example 123456
   print f[line]
   break
if re.search("Good Morning",f[linenum])
   print f[line]
   break
if re.search("Breakfast",f[linenum])
   print f[line]
   break
...
further 5 more hetrogeneus if conditions I have

===
This is beginners approach to print the lines which match the if conditions
.

How should I make it better may be create a dictionary of search items or a
list and then iterate over the lines in a file to print the lines matching
the condition.


Please advise,

Thanks,

Previous email :
==

Asad,

As others have already pointed out, your request is far from clear.

Ignoring the strange use of words, and trying to get the gist of the
request, would this be close to what you wanted to say?

You have a file you want to open and process a line at a time. You want to
select just lines that meet your criteria and print them while ignoring the
rest.

So what are the criteria? It sounds like you have a list of criteria that
might be called patterns. Your example shows a heterogenous collection:

[A ,"B is good" ,123456 , "C "]

A is either an error or the name of a variable that contains something. We
might want a hint as searching for any old object makes no sense.

The second and fourth are exact strings. No special regular expression
pattern. Searching for them is trivial using normal string functionality.
Assuming they can be anywhere in a line:

>>> line1 = "Vitamin B is good for you and so is vitamin C"
>>> line2 = "Currently nonsensical."
>>> line3 = ""
>>> "B is good" in line1
True
>>> "B is good" in line2
False
>>> "B is good" in line3
False
>>> "C" in line1
True
>>> "C" in line2
True
>>> "C" in line3
False

To test everything in a list, you need code like for each line:

for whatever in [A, "B is good", 123456, "C "]:
    if whatever in line: print(line)

Actually, the above could print multiple copies so you should break out
after any one matches.

123456 is a challenge to match. You could search for str(whatever) perhaps.
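
A sketch of that idea in Python 3, with both fixes applied: str() around each item, and a break so a line is printed once. It assumes "A" was meant as a string, since the original list leaves A undefined, and the sample lines are invented.

```python
wanted = ["A", "B is good", 123456, "C "]   # "A" assumed to be a string here

lines = ["Vitamin C and B is good", "patch 123456 applied", "nothing to see"]
printed = []

for line in lines:
    for item in wanted:
        # str() turns 123456 into "123456" so "in" can test it as a substring.
        if str(item) in line:
            printed.append(line)
            break           # one copy per line, even if several items match

for line in printed:
    print(line)
```

The first sample line contains two of the items but still appears only once.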

Enough. First explain what you really want.

If you want to do a more general search using regular expressions, then the
list of things to search for would be all the strings in RE format. You could
search multiple times or use the OR operator carefully inside one regular
expression. You have not stated any need to tell what was matched or where
it is in the line so that would be yet another story.
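
For what it is worth, here is a sketch of the single-expression route in Python 3. The literal strings are escaped with re.escape() so that they are taken as plain text, the "ERR" pattern is an invented example of a true regular expression, and the match object reports what matched and where.

```python
import re

literals = ["B is good", "C ", str(123456)]     # plain text, escaped below
regexes = [r"ERR-\d+"]                          # hypothetical real pattern

# re.escape() protects literal text; "|" is the regex OR operator.
combined = re.compile("|".join([re.escape(s) for s in literals] + regexes))

m = combined.search("log: ERR-17 near patch 123456")
if m:
    print(m.group(), m.start())     # the leftmost match and its position
```

Note that the search reports only the leftmost match in the line; finding every match would need finditer() instead.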

-Original Message-
From: Tutor  On Behalf Of
Asad
Sent: Sunday, November 18, 2018 10:19 AM
To: tutor@python.org
Subject: [Tutor] how to print lines which contain matching words or strings

Hi All ,

   I have a set of words and strings :

like :

p = [A ,"B is good" ,123456 , "C "]

I have a file in which I need to print only the lines which matches the
pattern in p

thanks,


On Tue, Nov 20, 2018 at 6:12 AM  wrote:

> -- Forwarded message --
> From: Mats Wichmann 
> To: tutor@python.org
> Cc:
> Bcc:
> Date: Mon, 19 Nov 2018 10:05:35 -0700
> Subject: Re: [Tutor] seeking beginners tutorial for async
> On 11/18/18 4:50 PM, bob gailer wrote:
> > I have yet to find a tutorial that helps me understand and apply async!
> >
> > The ones I have found are either incomplete, or they wrap some other
> > service, or they are immediately so complex that I have no hope of
> > understanding them.
> >
> > I did find a useful javascript tutorial at
> >