Re: convert script awk in python

2021-03-25 Thread Loris Bennett
"Avi Gross"  writes:

> Just to be clear, Cameron, I retired very early and thus have had no reason
> to use AWK in a work situation and for a while was not using UNIX-based
> machines. I have no doubt I would have continued using AWK as one part of my
> toolkit for years albeit less often as I found other tools better for some
> situations, let alone the kind I mentioned earlier that are not text-file
> based such as databases.
>
> It is, as noted, a great tool and if you only had one or a few tools like it
> available, it can easily be bent and twisted to do much of what the others
> do as it is more programmable than most. But following that line of
> reasoning, fairly simple python scripts can be written with python -c "..."
> or by pointing to a script
>
> Anyone have a collection of shell scripts that can be used in pipelines
> where each piece is just a call to python to do something simple?

I'm not doing that, but I am trying to replace a longish bash pipeline
with Python code.

Within Emacs, I often use Org mode[1] to generate data via some bash
commands and then visualise the data via Python.  Thus, in a single Org
file I run

  /usr/bin/sacct -u $user -o jobid -X -S $start -E $end -s COMPLETED -n | \
    xargs -I {} seff {} | grep 'Efficiency' | sed '$!N;s/\n/ /' | \
    awk '{print $3 " " $9}' | sed 's/%//g'

The raw numbers are formatted by Org into a table

  | cpu_eff | mem_eff |
  |---------+---------|
  |    96.6 |   99.11 |
  |   93.43 |   100.0 |
  |    91.3 |   100.0 |
  |   88.71 |   100.0 |
  |   89.79 |   100.0 |
  |   84.59 |   100.0 |
  |   83.42 |   100.0 |
  |   86.09 |   100.0 |
  |   92.31 |   100.0 |
  |   90.05 |   100.0 |
  |   81.98 |   100.0 |
  |   90.76 |   100.0 |
  |   75.36 |   64.03 |

I then read this into some Python code in the Org file and do something like

  df = pd.DataFrame(eff_tab[1:], columns=eff_tab[0])
  cpu_data = df.loc[: , "cpu_eff"]
  mem_data = df.loc[: , "mem_eff"]

  ...

  n, bins, patches = axis[0].hist(cpu_data, bins=range(0, 110, 5))
  n, bins, patches = axis[1].hist(mem_data, bins=range(0, 110, 5))

which generates nice histograms.

I decided to rewrite the whole thing as a stand-alone Python program so
that I can run it as a cron job.  However, as a novice Python programmer
I am finding translating the bash part slightly clunky.  I am in the
middle of doing this and started with the following:

sacct = subprocess.Popen(["/usr/bin/sacct",
                          "-u", user,
                          "-S", period[0], "-E", period[1],
                          "-o", "jobid", "-X",
                          "-s", "COMPLETED", "-n"],
                         stdout=subprocess.PIPE,
                         )

jobids = []

for line in sacct.stdout:
    jobid = str(line.strip(), 'UTF-8')
    jobids.append(jobid)

for jobid in jobids:
    seff = subprocess.Popen(["/usr/bin/seff", jobid],
                            stdin=sacct.stdout,
                            stdout=subprocess.PIPE,
                            )
    seff_output = []
    for line in seff.stdout:
        seff_output.append(str(line.strip(), "UTF-8"))

...

but compared to the bash pipeline, this all seems a bit laboured.
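
Something like the following, using subprocess.run with text output, might
at least get rid of the manual decoding (just a sketch, untested;
capture_output needs Python >= 3.7):

    result = subprocess.run(["/usr/bin/sacct",
                             "-u", user,
                             "-S", period[0], "-E", period[1],
                             "-o", "jobid", "-X",
                             "-s", "COMPLETED", "-n"],
                            capture_output=True, text=True, check=True)
    # stdout is already str, so the job IDs can be split off directly
    jobids = result.stdout.split()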

Does anyone have a better approach?

Cheers,

Loris


> -Original Message-
> From: Cameron Simpson  
> Sent: Wednesday, March 24, 2021 6:34 PM
> To: Avi Gross 
> Cc: python-list@python.org
> Subject: Re: convert script awk in python
>
> On 24Mar2021 12:00, Avi Gross  wrote:
>>But I wonder how much languages like AWK are still used to make new 
>>programs as compared to a time they were really useful.
>
> You mentioned in an adjacent post that you've not used AWK since 2000.  
> By contrast, I still use it regularly.
>
> It's great for proof of concept at the command line or in small scripts, and
> as the innards of quite useful scripts. I've a trite "colsum" script which
> does nothing but generate and run a little awk programme to sum a column,
> and routinely type "blah  | colsum 2" or the like to get a tally.
>
> I totally agree that once you're processing a lot of data from places 

Re: convert script awk in python

2021-03-25 Thread Christian Gollwitzer

On 25.03.21 at 00:30, Avi Gross wrote:

It [awk] is, as noted, a great tool and if you only had one or a few tools like it
available, it can easily be bent and twisted to do much of what the others
do as it is more programmable than most. But following that line of
reasoning, fairly simple python scripts can be written with python -c "..."
or by pointing to a script


The thing with awk is that lots of useful text processing is directly 
built into the main syntax; whereas in Python, you can certainly do it 
as well, but it requires loading a library. The simple column summation
mentioned before by Cameron would be


   awk ' {sum += $2 } END {print sum}'

which can be easily typed into a command line, with the benefit that it 
skips every line where the 2nd col is not a valid number. This is 
important because often there are empty lines, often there is an empty 
line at the end, or some ASCII headers, whatever.


The closest equivalent I can come up with in Python is this:

==
import sys

s = 0
for line in sys.stdin:
    try:
        s += float(line.split()[1])
    except:
        pass
print(s)
===


I don't want to cram this into a python -c "..." line, if it even is
possible; how do you handle indentation levels and loops??
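
The closest single-expression workaround I can think of, sketched here
purely for illustration, avoids statements altogether, although it has
none of awk's tolerance for junk lines:

   python3 -c "import sys; print(sum(float(line.split()[1]) for line in sys.stdin if len(line.split()) > 1))"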


Of course, for big fancy programs Python is a much better choice than 
awk, no questions asked - but awk has a place for little things which 
fit the special programming model, and there are surprisingly many 
applications where this is just the easiest and fastest way to do the job.


It's like regexes - a few simple characters can do the job which 
otherwise requires a bulky program, but once the parsing gets to a certain 
complexity, a true parsing language, or even just hand-coded Python, is 
much more maintainable.


Christian

PS: Exercise - handle lines commented out with a '#', i.e. skip those. 
In awk:


gawk '!/^\s*#/ {sum += $2 } END {print sum}'
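
and a sketch of the matching tweak to the Python version above, skipping
lines whose first non-blank character is '#':

==
import sys

s = 0
for line in sys.stdin:
    if line.lstrip().startswith('#'):   # skip commented-out lines
        continue
    try:
        s += float(line.split()[1])
    except (IndexError, ValueError):
        pass
print(s)
===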



Re: convert script awk in python

2021-03-25 Thread Peter Otten

On 25/03/2021 08:14, Loris Bennett wrote:


I'm not doing that, but I am trying to replace a longish bash pipeline
with Python code.

Within Emacs, I often use Org mode[1] to generate data via some bash
commands and then visualise the data via Python.  Thus, in a single Org
file I run

   /usr/bin/sacct -u $user -o jobid -X -S $start -E $end -s COMPLETED -n | \
     xargs -I {} seff {} | grep 'Efficiency' | sed '$!N;s/\n/ /' | \
     awk '{print $3 " " $9}' | sed 's/%//g'

The raw numbers are formatted by Org into a table


   | cpu_eff | mem_eff |
   |---------+---------|
   |    96.6 |   99.11 |
   |   93.43 |   100.0 |
   |    91.3 |   100.0 |
   |   88.71 |   100.0 |
   |   89.79 |   100.0 |
   |   84.59 |   100.0 |
   |   83.42 |   100.0 |
   |   86.09 |   100.0 |
   |   92.31 |   100.0 |
   |   90.05 |   100.0 |
   |   81.98 |   100.0 |
   |   90.76 |   100.0 |
   |   75.36 |   64.03 |

I then read this into some Python code in the Org file and do something like

   df = pd.DataFrame(eff_tab[1:], columns=eff_tab[0])
   cpu_data = df.loc[: , "cpu_eff"]
   mem_data = df.loc[: , "mem_eff"]

   ...

   n, bins, patches = axis[0].hist(cpu_data, bins=range(0, 110, 5))
   n, bins, patches = axis[1].hist(mem_data, bins=range(0, 110, 5))

which generates nice histograms.

I decided to rewrite the whole thing as a stand-alone Python program so
that I can run it as a cron job.  However, as a novice Python programmer
I am finding translating the bash part slightly clunky.  I am in the
middle of doing this and started with the following:

 sacct = subprocess.Popen(["/usr/bin/sacct",
                           "-u", user,
                           "-S", period[0], "-E", period[1],
                           "-o", "jobid", "-X",
                           "-s", "COMPLETED", "-n"],
                          stdout=subprocess.PIPE,
                          )

 jobids = []

 for line in sacct.stdout:
     jobid = str(line.strip(), 'UTF-8')
     jobids.append(jobid)

 for jobid in jobids:
     seff = subprocess.Popen(["/usr/bin/seff", jobid],
                             stdin=sacct.stdout,
                             stdout=subprocess.PIPE,
                             )


The statement above looks odd. If seff can read the jobids from stdin 
there should be no need to pass them individually, like:


sacct = ...
seff = Popen(
  ["/usr/bin/seff"], stdin=sacct.stdout, stdout=subprocess.PIPE,
  universal_newlines=True
)
for line in seff.communicate()[0].splitlines():
    ...



     seff_output = []
     for line in seff.stdout:
         seff_output.append(str(line.strip(), "UTF-8"))

 ...

but compared to the bash pipeline, this all seems a bit laboured.

Does anyone have a better approach?

Cheers,

Loris



-Original Message-
From: Cameron Simpson 
Sent: Wednesday, March 24, 2021 6:34 PM
To: Avi Gross 
Cc: python-list@python.org
Subject: Re: convert script awk in python

On 24Mar2021 12:00, Avi Gross  wrote:

But I wonder how much languages like AWK are still used to make new
programs as compared to a time they were really useful.


You mentioned in an adjacent post that you've not used AWK since 2000.
By contrast, I still use it regularly.

It's great for proof of concept at the command line or in small scripts, and
as the innards of quite useful scripts. I've a trite "colsum" script which
does nothing but generate and run a little awk programme to sum a column,
and routinely type "blah  | colsum 2" or the like to get a tally.

I totally agree that once you're processing a lot of data from places or
where a shell script is making long pipelines or many command invocations,
if that's a performance issue it is time to recode.

Cheers,
Cameron Simpson 


Footnotes:
[1]  https://orgmode.org/






Re: convert script awk in python

2021-03-25 Thread Loris Bennett
Peter Otten <__pete...@web.de> writes:

> On 25/03/2021 08:14, Loris Bennett wrote:
>
>> I'm not doing that, but I am trying to replace a longish bash pipeline
>> with Python code.
>>
>> Within Emacs, I often use Org mode[1] to generate data via some bash
>> commands and then visualise the data via Python.  Thus, in a single Org
>> file I run
>>
>>    /usr/bin/sacct -u $user -o jobid -X -S $start -E $end -s COMPLETED -n | \
>>      xargs -I {} seff {} | grep 'Efficiency' | sed '$!N;s/\n/ /' | \
>>      awk '{print $3 " " $9}' | sed 's/%//g'
>>
>> The raw numbers are formatted by Org into a table
>>
>>    | cpu_eff | mem_eff |
>>    |---------+---------|
>>    |    96.6 |   99.11 |
>>    |   93.43 |   100.0 |
>>    |    91.3 |   100.0 |
>>    |   88.71 |   100.0 |
>>    |   89.79 |   100.0 |
>>    |   84.59 |   100.0 |
>>    |   83.42 |   100.0 |
>>    |   86.09 |   100.0 |
>>    |   92.31 |   100.0 |
>>    |   90.05 |   100.0 |
>>    |   81.98 |   100.0 |
>>    |   90.76 |   100.0 |
>>    |   75.36 |   64.03 |
>>
>> I then read this into some Python code in the Org file and do something like
>>
>>df = pd.DataFrame(eff_tab[1:], columns=eff_tab[0])
>>cpu_data = df.loc[: , "cpu_eff"]
>>mem_data = df.loc[: , "mem_eff"]
>>
>>...
>>
>>n, bins, patches = axis[0].hist(cpu_data, bins=range(0, 110, 5))
>>n, bins, patches = axis[1].hist(mem_data, bins=range(0, 110, 5))
>>
>> which generates nice histograms.
>>
>> I decided to rewrite the whole thing as a stand-alone Python program so
>> that I can run it as a cron job.  However, as a novice Python programmer
>> I am finding translating the bash part slightly clunky.  I am in the
>> middle of doing this and started with the following:
>>
>>   sacct = subprocess.Popen(["/usr/bin/sacct",
>>                             "-u", user,
>>                             "-S", period[0], "-E", period[1],
>>                             "-o", "jobid", "-X",
>>                             "-s", "COMPLETED", "-n"],
>>                            stdout=subprocess.PIPE,
>>                            )
>>
>>   jobids = []
>>
>>   for line in sacct.stdout:
>>       jobid = str(line.strip(), 'UTF-8')
>>       jobids.append(jobid)
>>
>>   for jobid in jobids:
>>       seff = subprocess.Popen(["/usr/bin/seff", jobid],
>>                               stdin=sacct.stdout,
>>                               stdout=subprocess.PIPE,
>>                               )
>
> The statement above looks odd. If seff can read the jobids from stdin
> there should be no need to pass them individually, like:
>
> sacct = ...
> seff = Popen(
>   ["/usr/bin/seff"], stdin=sacct.stdout, stdout=subprocess.PIPE,
>   universal_newlines=True
> )
> for line in seff.communicate()[0].splitlines():
>     ...

Indeed, seff cannot read multiple jobids.  That's why I had 'xargs' in the
original bash code.  Initially I thought of calling 'xargs' via
Popen, but this seemed very fiddly (I didn't manage to get it working)
and anyway seemed a bit weird to me as it is really just a loop, which I
can implement perfectly well in Python.
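
A minimal sketch of such a loop (untested, and assuming seff writes its
report for a single job ID to stdout) might be:

    for jobid in jobids:
        seff = subprocess.run(["/usr/bin/seff", jobid],
                              capture_output=True, text=True, check=True)
        # keep only the 'Efficiency' lines, as grep did in the bash pipeline
        eff_lines = [line for line in seff.stdout.splitlines()
                     if "Efficiency" in line]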

Cheers,

Loris


>>       seff_output = []
>>       for line in seff.stdout:
>>           seff_output.append(str(line.strip(), "UTF-8"))
>>
>>  ...
>>
>> but compared to the bash pipeline, this all seems a bit laboured.
>>
>> Does anyone have a better approach?
>>
>> Cheers,
>>
>> Loris
>>
>>
>>> -Original Message-
>>> From: Cameron Simpson 
>>> Sent: Wednesday, March 24, 2021 6:34 PM
>>> To: Avi Gross 
>>> Cc: python-list@python.org
>>> Subject: Re: convert script awk in python
>>>
>>> On 24Mar2021 12:00, Avi Gross  wrote:
 But I wonder how much languages like AWK are still used to make new
 programs as compared to a time they were really useful.
>>>
>>> You mentioned in an adjacent post that you've not used AWK since 2000.
>>> By contrast, I still use it regularly.
>>>
>>> It's great for proof of concept at the command line or in small scripts, and
>>> as the innards of quite useful scripts. I've a trite "colsum" script which
>>> does nothing but generate and run a little awk programme to sum a column,
>>> and routinely type "blah  | colsum 2" or the like to get a tally.
>>>
>>> I totally agree that once you're processing a lot of data from places or
>>> where a shell script is making long pipelines or many command invocations,
>>> if that's a performance issue it is time to recode.
>>>
>>> Cheers,
>>> Cameron Simpson 
>>
>> Footnotes:
>> [1]  https://orgmode.org/
>>
>
-- 
Dr. Loris Bennett (Hr./Mr.)
ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de


Re: convert script awk in python

2021-03-25 Thread Dan Ciprus (dciprus) via Python-list
... funny thing is that OP never contributed to this discussion. Several people 
provided very valuable inputs but OP did not even bother to say "thank you".


just saying ...

On Wed, Mar 24, 2021 at 11:22:02AM -0400, Avi Gross via Python-list wrote:

Cameron,

I agree with you. I first encountered AWK in 1982 when I went to work for
Bell Labs.

I have not had any reason to use AWK since before the year 2000 so I was not
sure that unused variables were initialized to zero. The code seemed to
assume that. I have learned quite a few languages since and after a while,
they tend to blend into each other.

I think it would indeed have been more AWKthonic (or should that be called
AWKward?) to have a BEGIN section in which functions were declared and
variables clearly initialized but the language does allow some quick and
dirty ways to do things and clearly the original programmer used some.

Which brings us back to languages like python. When I started using AWK and
a slew of other UNIX programs years ago, what I found interesting is how
much AWK was patterned a bit on the C language, not a surprise as the K in
AWK is Brian Kernighan who had a hand in C. But unlike C that made me wait
around as it compiled, AWK was a bit more of an interpreted language and I
could write one-liner shell scripts (well, stretched over a few lines if
needed) that did things. True, if you stuck an entire program in a BEGIN
statement and did not actually loop over data, it seems a tad wasteful. But
sometimes it was handy to use it to test out a bit of C code I was writing
without waiting for the whole compile thing. In a sense, it was a bit like
using the Python REPL and getting rapid feedback. Of course, when I was an
early adopter of C++, too many things were not in AWK!

What gets me is the original question which made it sound a bit like asking
how you would translate some fairly simple program from language A to
language B. For some fairly simple programs, the translation effort could be
minimal. There are often trivial mappings between similar constructs. Quite
a bit of python simply takes a block of code in another language that is
between curly braces, and lines it up indented below whatever it modifies
and after a colon. The reverse may be similarly trivial. There are of course
many such changes needed for some languages but when some novel twist is
used that the language does not directly support, you may need to innovate
or do a rewrite that avoids it. But still, except in complicated
expressions, you can rewrite x++ to "x += 1" if that is available or "x = x
+ 1" or "x -> x + 1" or whatever.

What gets me here is that AWK in his program was being used exactly for
what it was designed for. Python is more general-purpose. Had we been asked (not
on this forum) to convert that AWK script to PERL, it would have been much
more straightforward because PERL was also designed to be able to read in
lines and break them into parts and act on them. It has constructs like the
diamond operator or split that make it easy.

Hence, at the end, I suggested Tomasz may want to do his task not using just
basic python but some module others have already shared that emulates some
of the filter aspects of AWK. That may make it easier to just translate the
bits of code to python while largely leaving the logic in place, depending
on the module.

Just to go way off the rails, was our annoying cross-poster from a while
back also promising to include a language like AWK into their universal
translator by just saving some JSON descriptions?

-Original Message-
From: Python-list  On
Behalf Of Cameron Simpson
Sent: Tuesday, March 23, 2021 6:38 PM
To: Tomasz Rola 
Cc: Avi Gross via Python-list 
Subject: Re: convert script awk in python

On 23Mar2021 16:37, Tomasz Rola  wrote:

On Tue, Mar 23, 2021 at 10:40:01AM -0400, Avi Gross via Python-list wrote:
[...]

I am a tad concerned as to where any of the variables x, y or z have
been defined at this point. I have not seen a BEGIN {...}
pattern/action or anywhere these have been initialized but they are
set in a function that as far as I know has not been called. Weird.
Maybe awk is allowing an uninitialized variable to be tested for in
your code but if so, you need to be cautious how you do this in python.


As far as I can say, the type of an uninitialised variable is grokked from
the first operation on it. I.e., "count += 1" first initializes count
to 0 and then adds 1.

This might depend on the exact awk being used. There have been a few of them
over the last 30+ years. I just assume it does as I wrote above.


I'm pretty sure this behaviour's been there since very early times. I think
it was there when I learnt awk, decades ago.


Using BEGIN would be in better style, of course.


Aye. Always good to be up front about initial values.
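
In Python, by contrast, "count += 1" on an unbound name raises a NameError;
the closest thing to awk's auto-initialised counters is probably
collections.defaultdict, roughly (a sketch):

    import sys
    from collections import defaultdict

    count = defaultdict(int)          # missing keys start at 0, much like awk
    for line in sys.stdin:
        count["lines"] += 1
        count["fields"] += len(line.split())
    print(count["lines"], count["fields"])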


There is a very nice book, "The AWK Programming Language" by Aho,
Kernighan and Weinberger. First printed in 1988, now free and in pdf
format. Go search.


Yes, a really nice book. [...]

Let's celebrate: 20 years of EuroPython

2021-03-25 Thread M.-A. Lemburg
This year's conference will mark the 20th edition of the EuroPython
conference.

* EuroPython 2021 *

   https://ep2021.europython.eu/


Since we started touring Europe in 2002 in Charleroi, Belgium, we have
come a long way. The conference has grown from the initial 240 to around
1200-1400 attendees every year. The organization had started with a
small group of people in a somewhat ad-hoc way and has now grown into a
well structured team backed by the non-profit EuroPython Society (EPS),
while still keeping the fun spirit of the early days.

EuroPython 2002 was the first major all-volunteer-run event for Python.
A lot of other conferences have since emerged in Europe and we're
actively trying to help them wherever we can, with our grants program,
growing conference knowledge base and our Organizers' Lunch, for which
we regularly invite representatives of all European Python conferences
to get together to network, exchange experience in organizing events and
community building, and understand how we can most effectively help each
other.

To celebrate the anniversary, we took a deep look into our archives and
have put together live website copies of the last few years, going back
all the way to EP2009, which was held in Birmingham, UK. For previous
years, archive.org stored some pages of the websites, so let's do a
quick tour of 20 years of EuroPython ...

https://blog.europython.eu/20th-anniversary-of-europython/


Help spread the word


Please help us spread this message by sharing it on your social
networks as widely as possible. Thank you !

Link to the blog post:

https://blog.europython.eu/20th-anniversary-of-europython/

Tweet:

https://twitter.com/europython/status/1375077502851420160


Enjoy,
--
EuroPython 2021 Team
https://www.europython-society.org/
