Re: convert script awk in python
"Avi Gross" writes:

> Just to be clear, Cameron, I retired very early and thus have had no reason
> to use AWK in a work situation, and for a while was not using UNIX-based
> machines. I have no doubt I would have continued using AWK as one part of my
> toolkit for years, albeit less often, as I found other tools better for some
> situations, let alone the kind I mentioned earlier that are not text-file
> based, such as databases.
>
> It is, as noted, a great tool, and if you only had one or a few tools like
> it available, it can easily be bent and twisted to do much of what the
> others do, as it is more programmable than most. But following that line of
> reasoning, fairly simple Python scripts can be written with python -c "..."
> or by pointing to a script.
>
> Anyone have a collection of shell scripts that can be used in pipelines
> where each piece is just a call to python to do something simple?

I'm not doing that, but I am trying to replace a longish bash pipeline
with Python code.

Within Emacs, I often use Org mode[1] to generate data via some bash
commands and then visualise the data via Python. Thus, in a single Org
file I run

  /usr/bin/sacct -u $user -o jobid -X -S $start -E $end -s COMPLETED -n | \
    xargs -I {} seff {} | grep 'Efficiency' | sed '$!N;s/\n/ /' | \
    awk '{print $3 " " $9}' | sed 's/%//g'

The raw numbers are formatted by Org into a table

  | cpu_eff | mem_eff |
  |---------+---------|
  |   96.6  |   99.11 |
  |   93.43 |   100.0 |
  |   91.3  |   100.0 |
  |   88.71 |   100.0 |
  |   89.79 |   100.0 |
  |   84.59 |   100.0 |
  |   83.42 |   100.0 |
  |   86.09 |   100.0 |
  |   92.31 |   100.0 |
  |   90.05 |   100.0 |
  |   81.98 |   100.0 |
  |   90.76 |   100.0 |
  |   75.36 |   64.03 |

I then read this into some Python code in the Org file and do something
like

  df = pd.DataFrame(eff_tab[1:], columns=eff_tab[0])
  cpu_data = df.loc[:, "cpu_eff"]
  mem_data = df.loc[:, "mem_eff"]

  ...

  n, bins, patches = axis[0].hist(cpu_data, bins=range(0, 110, 5))
  n, bins, patches = axis[1].hist(mem_data, bins=range(0, 110, 5))

which generates nice histograms.
I decided to rewrite the whole thing as a stand-alone Python program so
that I can run it as a cron job. However, as a novice Python programmer
I am finding translating the bash part slightly clunky. I am in the
middle of doing this and started with the following:

  sacct = subprocess.Popen(["/usr/bin/sacct",
                            "-u", user,
                            "-S", period[0], "-E", period[1],
                            "-o", "jobid", "-X",
                            "-s", "COMPLETED", "-n"],
                           stdout=subprocess.PIPE,
                           )

  jobids = []

  for line in sacct.stdout:
      jobid = str(line.strip(), 'UTF-8')
      jobids.append(jobid)

  for jobid in jobids:
      seff = subprocess.Popen(["/usr/bin/seff", jobid],
                              stdin=sacct.stdout,
                              stdout=subprocess.PIPE,
                              )
      seff_output = []
      for line in seff.stdout:
          seff_output.append(str(line.strip(), "UTF-8"))

  ...

but compared to the bash pipeline, this all seems a bit laboured.

Does anyone have a better approach?

Cheers,

Loris

> -----Original Message-----
> From: Cameron Simpson
> Sent: Wednesday, March 24, 2021 6:34 PM
> To: Avi Gross
> Cc: python-list@python.org
> Subject: Re: convert script awk in python
>
> On 24Mar2021 12:00, Avi Gross wrote:
>> But I wonder how much languages like AWK are still used to make new
>> programs as compared to a time they were really useful.
>
> You mentioned in an adjacent post that you've not used AWK since 2000.
> By contrast, I still use it regularly.
>
> It's great for proof of concept at the command line or in small scripts,
> and as the innards of quite useful scripts. I've a trite "colsum" script
> which does nothing but generate and run a little awk programme to sum a
> column, and routinely type "blah | colsum 2" or the like to get a tally.
>
> I totally agree that once you're processing a lot of data from places
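The sacct/seff plumbing above can be pulled into a couple of small functions around subprocess.run, which keeps the shell-ish parts in one place. This is only a sketch: the seff report format assumed here (lines like "CPU Efficiency: 96.60% ...") is inferred from the grep/sed/awk chain earlier in the thread, and the paths are taken verbatim from the pipeline.

```python
import re
import subprocess

def completed_jobids(user, start, end):
    """Rough equivalent of the sacct call at the head of the pipeline:
    one job id per output line, no header (-n), no job steps (-X)."""
    out = subprocess.run(
        ["/usr/bin/sacct", "-u", user, "-S", start, "-E", end,
         "-o", "jobid", "-X", "-s", "COMPLETED", "-n"],
        capture_output=True, text=True, check=True).stdout
    return out.split()

def parse_efficiencies(seff_text):
    """Pull the CPU and memory efficiency percentages out of one seff
    report, i.e. the job of the grep/sed/awk/sed stages above."""
    percents = re.findall(r"Efficiency:\s*([\d.]+)%", seff_text)
    return tuple(float(p) for p in percents)
```

With those in place, the main loop reduces to calling seff once per job id and feeding the resulting (cpu, mem) pairs straight into the pandas/matplotlib code, with no intermediate text table.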
Re: convert script awk in python
On 25.03.21 at 00:30, Avi Gross wrote:

> It [awk] is, as noted, a great tool and if you only had one or a few
> tools like it available, it can easily be bent and twisted to do much of
> what the others do, as it is more programmable than most. But following
> that line of reasoning, fairly simple python scripts can be written with
> python -c "..." or by pointing to a script

The thing with awk is that lots of useful text processing is directly
built into the main syntax; whereas in Python you can certainly do it as
well, but it requires loading a library. The simple column summation
mentioned before by Cameron would be

  awk '{sum += $2} END {print sum}'

which can be easily typed into a command line, with the benefit that it
skips every line where the 2nd column is not a valid number. This is
important because often there are empty lines, often there is an empty
line at the end, some ASCII headers, whatever.

The closest equivalent I can come up with in Python is this:

=========================
import sys
s = 0
for line in sys.stdin:
    try:
        s += float(line.split()[1])
    except:
        pass
print(s)
=========================

I don't want to cram this into a python -c "..." line, if it even is
possible; how do you handle indentation levels and loops?

Of course, for big fancy programs Python is a much better choice than
awk, no questions asked - but awk has a place for little things which
fit the special programming model, and there are surprisingly many
applications where this is just the easiest and fastest way to do the
job. It's like regexes - a few simple characters can do the job which
otherwise requires a bulky program, but once the parsing gets to a
certain complexity, a true parsing language, or even just hand-coded
Python, is much more maintainable.

	Christian

PS: Exercise - handle lines commented out with a '#', i.e. skip those.
In awk:

  gawk '!/^\s*#/ {sum += $2} END {print sum}'

--
https://mail.python.org/mailman/listinfo/python-list
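For comparison, the exercise in the PS might look like this in Python. A sketch only: the comment-skipping has to be done by hand, since Python has nothing like awk's pattern guard, and the function name is made up for illustration.

```python
import sys

def sum_second_column(lines):
    r"""Sum column 2 of whitespace-separated input, skipping
    '#'-commented lines (the gawk '!/^\s*#/' guard) as well as
    blank lines and non-numeric values."""
    total = 0.0
    for line in lines:
        if line.lstrip().startswith("#"):
            continue  # commented out, as in the gawk version
        fields = line.split()
        try:
            total += float(fields[1])
        except (IndexError, ValueError):
            pass  # too few columns or not a number: skip, as awk would
    return total

# Typical use in a pipeline:  print(sum_second_column(sys.stdin))
```

Note that the explicit except clause also documents exactly which malformed lines get skipped, which the bare `except:` in the version above silently hides.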
Re: convert script awk in python
On 25/03/2021 08:14, Loris Bennett wrote:

> I'm not doing that, but I am trying to replace a longish bash pipeline
> with Python code.
>
> Within Emacs, I often use Org mode[1] to generate data via some bash
> commands and then visualise the data via Python. Thus, in a single Org
> file I run
>
>    /usr/bin/sacct -u $user -o jobid -X -S $start -E $end -s COMPLETED -n | \
>      xargs -I {} seff {} | grep 'Efficiency' | sed '$!N;s/\n/ /' | \
>      awk '{print $3 " " $9}' | sed 's/%//g'
>
> The raw numbers are formatted by Org into a table
>
>    | cpu_eff | mem_eff |
>    |---------+---------|
>    |   96.6  |   99.11 |
>    |   93.43 |   100.0 |
>    |   91.3  |   100.0 |
>    |   88.71 |   100.0 |
>    |   89.79 |   100.0 |
>    |   84.59 |   100.0 |
>    |   83.42 |   100.0 |
>    |   86.09 |   100.0 |
>    |   92.31 |   100.0 |
>    |   90.05 |   100.0 |
>    |   81.98 |   100.0 |
>    |   90.76 |   100.0 |
>    |   75.36 |   64.03 |
>
> I then read this into some Python code in the Org file and do something
> like
>
>    df = pd.DataFrame(eff_tab[1:], columns=eff_tab[0])
>    cpu_data = df.loc[:, "cpu_eff"]
>    mem_data = df.loc[:, "mem_eff"]
>
>    ...
>
>    n, bins, patches = axis[0].hist(cpu_data, bins=range(0, 110, 5))
>    n, bins, patches = axis[1].hist(mem_data, bins=range(0, 110, 5))
>
> which generates nice histograms.
>
> I decided to rewrite the whole thing as a stand-alone Python program so
> that I can run it as a cron job. However, as a novice Python programmer
> I am finding translating the bash part slightly clunky. I am in the
> middle of doing this and started with the following:
>
>    sacct = subprocess.Popen(["/usr/bin/sacct",
>                              "-u", user,
>                              "-S", period[0], "-E", period[1],
>                              "-o", "jobid", "-X",
>                              "-s", "COMPLETED", "-n"],
>                             stdout=subprocess.PIPE,
>                             )
>
>    jobids = []
>
>    for line in sacct.stdout:
>        jobid = str(line.strip(), 'UTF-8')
>        jobids.append(jobid)
>
>    for jobid in jobids:
>        seff = subprocess.Popen(["/usr/bin/seff", jobid],
>                                stdin=sacct.stdout,
>                                stdout=subprocess.PIPE,
>                                )

The statement above looks odd. If seff can read the jobids from stdin
there should be no need to pass them individually, like:

  sacct = ...
  seff = Popen(
      ["/usr/bin/seff"], stdin=sacct.stdout, stdout=subprocess.PIPE,
      universal_newlines=True
  )
  for line in seff.communicate()[0].splitlines():
      ...

>        seff_output = []
>        for line in seff.stdout:
>            seff_output.append(str(line.strip(), "UTF-8"))
>
>    ...
>
> but compared to the bash pipeline, this all seems a bit laboured.
>
> Does anyone have a better approach?
>
> Cheers,
>
> Loris
>
>> -----Original Message-----
>> From: Cameron Simpson
>> Sent: Wednesday, March 24, 2021 6:34 PM
>> To: Avi Gross
>> Cc: python-list@python.org
>> Subject: Re: convert script awk in python
>>
>> On 24Mar2021 12:00, Avi Gross wrote:
>>> But I wonder how much languages like AWK are still used to make new
>>> programs as compared to a time they were really useful.
>>
>> You mentioned in an adjacent post that you've not used AWK since 2000.
>> By contrast, I still use it regularly.
>>
>> It's great for proof of concept at the command line or in small
>> scripts, and as the innards of quite useful scripts. I've a trite
>> "colsum" script which does nothing but generate and run a little awk
>> programme to sum a column, and routinely type "blah | colsum 2" or the
>> like to get a tally.
>>
>> I totally agree that once you're processing a lot of data from places
>> or where a shell script is making long pipelines or many command
>> invocations, if that's a performance issue it is time to recode.
>>
>> Cheers,
>> Cameron Simpson
>
> Footnotes:
> [1] https://orgmode.org/
Re: convert script awk in python
Peter Otten <__pete...@web.de> writes:

> On 25/03/2021 08:14, Loris Bennett wrote:
>
>> I'm not doing that, but I am trying to replace a longish bash pipeline
>> with Python code.
>>
>> Within Emacs, I often use Org mode[1] to generate data via some bash
>> commands and then visualise the data via Python. Thus, in a single Org
>> file I run
>>
>>    /usr/bin/sacct -u $user -o jobid -X -S $start -E $end -s COMPLETED -n | \
>>      xargs -I {} seff {} | grep 'Efficiency' | sed '$!N;s/\n/ /' | \
>>      awk '{print $3 " " $9}' | sed 's/%//g'
>>
>> The raw numbers are formatted by Org into a table
>>
>>    | cpu_eff | mem_eff |
>>    |---------+---------|
>>    |   96.6  |   99.11 |
>>    |   93.43 |   100.0 |
>>    |   91.3  |   100.0 |
>>    |   88.71 |   100.0 |
>>    |   89.79 |   100.0 |
>>    |   84.59 |   100.0 |
>>    |   83.42 |   100.0 |
>>    |   86.09 |   100.0 |
>>    |   92.31 |   100.0 |
>>    |   90.05 |   100.0 |
>>    |   81.98 |   100.0 |
>>    |   90.76 |   100.0 |
>>    |   75.36 |   64.03 |
>>
>> I then read this into some Python code in the Org file and do
>> something like
>>
>>    df = pd.DataFrame(eff_tab[1:], columns=eff_tab[0])
>>    cpu_data = df.loc[:, "cpu_eff"]
>>    mem_data = df.loc[:, "mem_eff"]
>>
>>    ...
>>
>>    n, bins, patches = axis[0].hist(cpu_data, bins=range(0, 110, 5))
>>    n, bins, patches = axis[1].hist(mem_data, bins=range(0, 110, 5))
>>
>> which generates nice histograms.
>>
>> I decided to rewrite the whole thing as a stand-alone Python program
>> so that I can run it as a cron job. However, as a novice Python
>> programmer I am finding translating the bash part slightly clunky. I
>> am in the middle of doing this and started with the following:
>>
>>    sacct = subprocess.Popen(["/usr/bin/sacct",
>>                              "-u", user,
>>                              "-S", period[0], "-E", period[1],
>>                              "-o", "jobid", "-X",
>>                              "-s", "COMPLETED", "-n"],
>>                             stdout=subprocess.PIPE,
>>                             )
>>
>>    jobids = []
>>
>>    for line in sacct.stdout:
>>        jobid = str(line.strip(), 'UTF-8')
>>        jobids.append(jobid)
>>
>>    for jobid in jobids:
>>        seff = subprocess.Popen(["/usr/bin/seff", jobid],
>>                                stdin=sacct.stdout,
>>                                stdout=subprocess.PIPE,
>>                                )
>
> The statement above looks odd. If seff can read the jobids from stdin
> there should be no need to pass them individually, like:
>
>   sacct = ...
>   seff = Popen(
>       ["/usr/bin/seff"], stdin=sacct.stdout, stdout=subprocess.PIPE,
>       universal_newlines=True
>   )
>   for line in seff.communicate()[0].splitlines():
>       ...

Indeed, seff cannot read multiple jobids. That's why I had 'xargs' in
the original bash code. Initially I thought of calling 'xargs' via
Popen, but this seemed very fiddly (I didn't manage to get it working)
and anyway seemed a bit weird to me, as it is really just a loop, which
I can implement perfectly well in Python.

Cheers,

Loris

>>        seff_output = []
>>        for line in seff.stdout:
>>            seff_output.append(str(line.strip(), "UTF-8"))
>>
>>    ...
>>
>> but compared to the bash pipeline, this all seems a bit laboured.
>>
>> Does anyone have a better approach?
>>
>> Cheers,
>>
>> Loris
>>
>>> -----Original Message-----
>>> From: Cameron Simpson
>>> Sent: Wednesday, March 24, 2021 6:34 PM
>>> To: Avi Gross
>>> Cc: python-list@python.org
>>> Subject: Re: convert script awk in python
>>>
>>> On 24Mar2021 12:00, Avi Gross wrote:
>>>> But I wonder how much languages like AWK are still used to make new
>>>> programs as compared to a time they were really useful.
>>>
>>> You mentioned in an adjacent post that you've not used AWK since 2000.
>>> By contrast, I still use it regularly.
>>>
>>> It's great for proof of concept at the command line or in small
>>> scripts, and as the innards of quite useful scripts. I've a trite
>>> "colsum" script which does nothing but generate and run a little awk
>>> programme to sum a column, and routinely type "blah | colsum 2" or
>>> the like to get a tally.
>>>
>>> I totally agree that once you're processing a lot of data from places
>>> or where a shell script is making long pipelines or many command
>>> invocations, if that's a performance issue it is time to recode.
>>>
>>> Cheers,
>>> Cameron Simpson
>>
>> Footnotes:
>> [1] https://orgmode.org/

-- 
Dr. Loris Bennett (Hr./Mr.)
ZEDAT, Freie Universität Berlin           Email loris.benn...@fu-berlin.de
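Since seff only accepts a single job id, the xargs stage really is just a loop, and it can be written as one directly. A sketch, not a definitive implementation: the seff path and the presence of "Efficiency" lines in its output are assumptions taken from the thread, and the `runner` parameter exists only so the function can be exercised without Slurm installed.

```python
import subprocess

def seff_reports(jobids, runner=subprocess.run):
    """Run seff once per job id - the Python equivalent of
    'xargs -I {} seff {}' - and keep just the 'Efficiency' lines
    from each report, keyed by job id."""
    reports = {}
    for jobid in jobids:
        proc = runner(["/usr/bin/seff", jobid],
                      capture_output=True, text=True, check=True)
        reports[jobid] = [line for line in proc.stdout.splitlines()
                          if "Efficiency" in line]
    return reports
```

Injecting the runner also makes the cron-job version easy to unit-test: a stub that returns canned seff output exercises the parsing without ever touching the cluster.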
Re: convert script awk in python
... funny thing is that OP never contributed to this discussion. Several
people provided very valuable inputs but OP did not even bother to say
"thank you". just saying ...

On Wed, Mar 24, 2021 at 11:22:02AM -0400, Avi Gross via Python-list wrote:

> Cameron,
>
> I agree with you. I first encountered AWK in 1982 when I went to work
> for Bell Labs. I have not had any reason to use AWK since before the
> year 2000, so I was not sure that unused variables were initialized to
> zero. The code seemed to assume that. I have learned quite a few
> languages since, and after a while they tend to blend into each other.
>
> I think it would indeed have been more AWKthonic (or should that be
> called AWKward?) to have a BEGIN section in which functions were
> declared and variables clearly initialized, but the language does allow
> some quick and dirty ways to do things, and clearly the original
> programmer used some.
>
> Which brings us back to languages like Python. When I started using AWK
> and a slew of other UNIX programs years ago, what I found interesting
> is how much AWK was patterned a bit on the C language - not a surprise,
> as the K in AWK is Brian Kernighan, who had a hand in C. But unlike C,
> which made me wait around as it compiled, AWK was a bit more of an
> interpreted language, and I could write one-liner shell scripts (well,
> stretched over a few lines if needed) that did things. True, if you
> stuck an entire program in a BEGIN statement and did not actually loop
> over data, it seems a tad wasteful. But sometimes it was handy to use
> it to test out a bit of C code I was writing without waiting for the
> whole compile thing. In a sense, it was a bit like using the Python
> REPL and getting rapid feedback. Of course, when I was an early adopter
> of C++, too many things were not in AWK!
>
> What gets me is the original question, which made it sound a bit like
> asking how you would translate some fairly simple program from language
> A to language B. For some fairly simple programs, the translation
> effort could be minimal. There are often trivial mappings between
> similar constructs. Quite a bit of Python simply takes a block of code
> in another language that is between curly braces and lines it up
> indented below whatever it modifies and after a colon. The reverse may
> be similarly trivial. There are of course many such changes needed for
> some languages, but when some novel twist is used that the language
> does not directly support, you may need to innovate or do a rewrite
> that avoids it. But still, except in complicated expressions, you can
> rewrite x++ as "x += 1" if that is available, or "x = x + 1" or
> "x -> x + 1" or whatever.
>
> What gets me here is that AWK in his program was being used exactly for
> what it was designed for. Python is more general-purpose. Had we been
> asked (not on this forum) to convert that AWK script to PERL, it would
> have been much more straightforward, because PERL was also designed to
> be able to read in lines and break them into parts and act on them. It
> has constructs like the diamond operator or split that make it easy.
> Hence, at the end, I suggested Tomasz may want to do his task not using
> just basic Python but some module others have already shared that
> emulates some of the filter aspects of AWK. That may make it easier to
> just translate the bits of code to Python while largely leaving the
> logic in place, depending on the module.
>
> Just to go way off the rails, was our annoying cross-poster from a
> while back also promising to include a language like AWK into their
> universal translator by just saving some JSON descriptions?
>
> -----Original Message-----
> From: Python-list On Behalf Of Cameron Simpson
> Sent: Tuesday, March 23, 2021 6:38 PM
> To: Tomasz Rola
> Cc: Avi Gross via Python-list
> Subject: Re: convert script awk in python
>
> On 23Mar2021 16:37, Tomasz Rola wrote:
>> On Tue, Mar 23, 2021 at 10:40:01AM -0400, Avi Gross via Python-list
>> wrote:
>> [...]
>>> I am a tad concerned as to where any of the variables x, y or z have
>>> been defined at this point. I have not seen a BEGIN {...}
>>> pattern/action or anywhere these have been initialized, but they are
>>> set in a function that as far as I know has not been called. Weird.
>>> Maybe awk is allowing an uninitialized variable to be tested for in
>>> your code, but if so, you need to be cautious how you do this in
>>> python.
>>
>> As far as I can say, the type of an uninitialised variable is grokked
>> from the first operation on it. I.e., "count += 1" first initializes
>> count to 0 and then adds 1. This might depend on the exact awk being
>> used. There were a few of them during the last 30+ years. I just
>> assume it does as I wrote above.
>
> I'm pretty sure this behaviour's been there since very early times. I
> think it was there when I learnt awk, decades ago.
>
>> Using BEGIN would be in better style, of course.
>
> Aye. Always good to be up front about initial values.
>
>> There is a very nice book, "The AWK Programming Language" by Aho,
>> Kernighan and Weinberger. First printed in 1988, now free and in pdf
>> format. Go search.
>
> Yes, a really nice book.
>
> [...]
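As a footnote to the auto-initialisation discussion: the closest Python idiom to awk's "unset variables start at zero" behaviour is probably collections.defaultdict (or collections.Counter for plain counting). A small sketch; the word-counting example is made up for illustration:

```python
from collections import defaultdict

# awk silently treats an unset numeric variable as 0, so 'count[$1] += 1'
# just works.  defaultdict(int) gives the same effect in Python: a missing
# key springs into existence as 0 the first time it is touched.
counts = defaultdict(int)
for word in "a b a c a b".split():
    counts[word] += 1
```

Plain scalar variables have no such shortcut in Python: an unassigned name raises NameError instead of defaulting to zero, which is arguably the safer behaviour Tomasz and Cameron are circling around.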
Let's celebrate: 20 years of EuroPython
This year's conference will mark the 20th edition of the EuroPython
conference.

  * EuroPython 2021 *
  https://ep2021.europython.eu/

Since we started touring Europe in 2002 in Charleroi, Belgium, we have
come a long way. The conference has grown from the initial 240 to around
1200-1400 attendees every year. The organization started with a small
group of people in a somewhat ad-hoc way and has now grown into a
well-structured team backed by the non-profit EuroPython Society (EPS),
while still keeping the fun spirit of the early days.

EuroPython 2002 was the first major all-volunteer-run event for Python.
A lot of other conferences have since emerged in Europe and we're
actively trying to help them wherever we can, with our grants program,
growing conference knowledge base and our Organizers' Lunch, for which
we regularly invite representatives of all European Python conferences
to get together to network, exchange experience in organizing events and
community building, and understand how we can most effectively help
each other.

To celebrate the anniversary, we took a deep look into our archives and
have put together live website copies of the last few years, going back
all the way to EP2009, which was held in Birmingham, UK. For previous
years, archive.org stored some pages of the websites, so let's do a
quick tour of 20 years of EuroPython ...

https://blog.europython.eu/20th-anniversary-of-europython/

Help spread the word

Please help us spread this message by sharing it on your social networks
as widely as possible. Thank you !

Link to the blog post:
https://blog.europython.eu/20th-anniversary-of-europython/

Tweet:
https://twitter.com/europython/status/1375077502851420160

Enjoy,
--
EuroPython 2021 Team
https://www.europython-society.org/