On Wed, Feb 19, 2020 at 12:52 AM Waldek Kozaczuk <[email protected]> wrote:
> So I think I have a small subset of this patch ready that encompasses
> everything from the original patch but loader.py and the trace-related
> files.
>
> But I just realized a problem with the updated setup.py. It now depends
> on the distro package - https://pypi.org/project/distro/ - which requires
> pip to install. Which means we now require python3-pip to be installed
> first, before running setup.py. Chicken-or-egg kind of problem. This is
> not an issue with docker, as we simply have docker install it first. How
> do we solve it in a non-docker setup?

My answer would be: *don't* depend on some random package that doesn't come
preinstalled on 99% of installations and cause people grief. What I would
have done personally is to just parse /etc/os-release with a few lines of
hand-written code. Not everything needs to come from a package...

Something like (untested):

    with open("/etc/os-release") as f:
        distro = {}
        for line in f:
            k, v = line.rstrip().split("=")
            distro[k] = v.strip('"')

Now distro['NAME'] is Fedora and distro['VERSION_ID'] is 31, for example.

> Ideas?
> Waldek
>
> On Tuesday, February 18, 2020 at 12:04:36 PM UTC-5, Waldek Kozaczuk wrote:
>>
>> Let me split this patch into some smaller ones and take your approach of
>> first checking if anything breaks with python3.
>>
>> I think that loader.py needs to be treated separately. Is it true that
>> gdb comes with its own python runtime, and these days it should be
>> version 3 (we are not relying on externals anymore, right)? The problem
>> with loader.py is that it relies in some places on the tracing-related
>> scripts:
>>
>> from osv.trace import (Trace, Thread, TracePoint, BacktraceFormatter,
>> from osv import trace, debug
>>
>> Does it mean we want to make those trace-related scripts be 2 and 3
>> compatible?
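Nadav's sketch above, written out as a complete, runnable version (still just a sketch, untested on exotic distros; it assumes the standard KEY=value layout of /etc/os-release and skips blank or malformed lines):

```python
def parse_os_release(path="/etc/os-release"):
    # Hand-rolled replacement for the third-party "distro" package:
    # parse KEY=value lines, stripping optional surrounding quotes.
    info = {}
    with open(path) as f:
        for line in f:
            line = line.rstrip()
            if not line or "=" not in line:
                continue  # skip blank or malformed lines
            k, v = line.split("=", 1)
            info[k] = v.strip('"')
    return info
```

On a Fedora 31 box, info['NAME'] would then be 'Fedora' and info['VERSION_ID'] would be '31', with no pip-installed dependency required.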
>>
>> On Tuesday, February 18, 2020 at 11:53:19 AM UTC-5, Nadav Har'El wrote:
>>>
>>> On Tue, Feb 18, 2020 at 3:54 PM Waldek Kozaczuk <[email protected]>
>>> wrote:
>>>
>>> This -
>>> https://stackoverflow.com/questions/27476079/why-does-2to3-change-mydict-keys-to-listmydict-keys
>>> - might explain these "list"-related changes by 2to3.
>>>
>>> Yet it doesn't explain the change - it explains why it is *not* actually
>>> needed ;-)
>>> I know why it's needed in some specific cases where a list is required
>>> and .keys() is no longer a list. But it's not needed here (in a for
>>> loop) and in most other locations where this list() was added :-(
>>>
>>> On Tuesday, February 18, 2020 at 8:42:47 AM UTC-5, Waldek Kozaczuk wrote:
>>>
>>> Hey,
>>>
>>> Thanks for the review.
>>>
>>> Let me first explain what my process was. At first, I just tried to
>>> submit Matt's changes with a proper signoff. But then I decided to test
>>> it a bit and discovered some things were breaking. Now, I do not think
>>> it was because of any problems in his original script, but mostly due
>>> to the fact that his changes were intertwined with other scripts, and I
>>> had to change those. Also, modules/openjdk8-from-host/module.py (it was
>>> not there when Matt made his changes) did not work. And finally,
>>> trace.py somehow came along. So here we go.
>>>
>>> As far as mechanics go, I retained most of Matt's patch as-is, and I
>>> believe he used the "future" module. I used the 2to3 tool (more
>>> specifically, 2to3-2.7 in my case) first and then manually fixed any
>>> problems I discovered. Most painful to fix was trace.py and all related
>>> scripts (scripts/osv/*.py), which I tested quite thoroughly by running
>>> trace.py (I learned more about it, which is great, but probably way
>>> more than what I wanted to know at this point ;-)).
>>> I also tested everything else more implicitly by running scripts/build
>>> with some variations of image= (java is good, as it exercises a lot)
>>> and fs=.
>>>
>>> But you are right that I could not thoroughly test loader.py (I ran
>>> many commands, but not all). Shall we exclude it from this patch?
>>>
>>> Lastly, I am probably not the right person to do this upgrade exercise.
>>> I do not use Python daily, so I am not an expert. The lack of a
>>> compiler (aka 'complainer') did not make me very confident, especially
>>> with those larger scripts. But I did not want Matt's work to go to
>>> waste. So here we go :-)
>>>
>>> On Tuesday, February 18, 2020 at 5:34:58 AM UTC-5, Nadav Har'El wrote:
>>>
>>> Thanks. I commented with a few concerns below. I'm only really worried
>>> about loader.py - the gdb script, which was already supposed to be
>>> working for both Python 2 and Python 3 (although we probably didn't
>>> test it recently) - and I'm worried these changes are breaking it
>>> rather than improving it. It's also difficult to test, because these
>>> changes touch all sorts of "osv" commands in gdb, which I guess you
>>> didn't test individually.
>>>
>>> Shall we leave it out?
>>>
>>> I ask that you please at least try to run the affected scripts in
>>> Python 3 before "fixing" them at all. In particular, some scripts that
>>> had "/usr/bin/python" at the top (and not /usr/bin/python2) already
>>> worked correctly for Python 3 (and Python 2), because some people
>>> already had python 3 as their default python (although it is quite
>>> likely that we haven't tested this in a while, so things broke).
>>>
>>> In most cases, I first ran 2to3 and then applied manual changes if I
>>> found any problems (like the bytes-vs-str type of issue - really
>>> painful to find and fix). Indeed, 2to3 is somewhat suspicious, as it
>>> sometimes added extra parentheses where there already were some (I
>>> manually removed those changes).
Not sure about list-like related changes - all done by 2to3. >>> >>> >>> -- >>> Nadav Har'El >>> [email protected] >>> >>> >>> On Mon, Feb 17, 2020 at 8:22 AM Waldemar Kozaczuk <[email protected]> >>> wrote: >>> >>> --- a/scripts/loader.py >>> +++ b/scripts/loader.py >>> >>> >>> Please make sure to test this file a bit, because it isn't part of the >>> standard build or run. >>> This is the script loaded automatically when you use gdb OSv. >>> >>> Also, I was always under the impression that this script already worked >>> on both Python 2 and >>> Python 3, because we had no idea what gdb runs for us (the "#!" in the >>> top of this script isn't relevant, right?)? >>> >>> Shall we change at least this '#!/usr/bin/python2' to >>> '#!/usr/bin/python' then? >>> >>> >>> @@ -1,4 +1,4 @@ >>> -#!/usr/bin/python2 >>> +#!/usr/bin/python3 >>> >>> import gdb >>> import re >>> @@ -37,8 +37,8 @@ def phys_cast(addr, type): >>> >>> def values(_dict): >>> if hasattr(_dict, 'viewvalues'): >>> - return _dict.viewvalues() >>> - return _dict.values() >>> + return _dict.values() >>> >>> >>> This doesn't look right - you check if there is a viewvalues() function >>> and then call values()? >>> Maybe the whole "hasattr" check isn't needed on Python 2 any more? >>> >>> + return list(_dict.values()) >>> >>> >>> I wonder why you need to convert it to a list - what's wrong with a >>> generator which isn't officially a list? Did this solve a real problem, or >>> some automatic converter suggested it? >>> >>> Changed by 2to3. 
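For reference, this is presumably the shape the values() helper wants if the intent is to keep supporting both runtimes - a sketch that restores the Python 2 branch instead of dropping it:

```python
def values(_dict):
    # Python 2 dicts have viewvalues(); in Python 3 that behaviour moved
    # into values() itself. Keep both branches so the script runs on either.
    if hasattr(_dict, 'viewvalues'):
        return _dict.viewvalues()   # Python 2: a dict view, not a list
    return _dict.values()           # Python 3: also a view
```

If Python 2 support is dropped entirely, the helper can simply go away and callers can use _dict.values() directly.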
>>> >>> >>> >>> def read_vector(v): >>> impl = v['_M_impl'] >>> @@ -426,19 +426,19 @@ class osv_zfs(gdb.Command): >>> >>> print ("\n:: ARC SIZES ::") >>> print ("\tCurrent size: %d (%d MB)" % >>> - (arc_size, arc_size / 1024 / 1024)) >>> + (arc_size, arc_size // 1024 // 1024)) >>> print ("\tTarget size: %d (%d MB)" % >>> - (arc_target_size, arc_target_size / 1024 / 1024)) >>> + (arc_target_size, arc_target_size // 1024 // 1024)) >>> print ("\tMin target size: %d (%d MB)" % >>> - (arc_min_size, arc_min_size / 1024 / 1024)) >>> + (arc_min_size, arc_min_size // 1024 // 1024)) >>> print ("\tMax target size: %d (%d MB)" % >>> - (arc_max_size, arc_max_size / 1024 / 1024)) >>> + (arc_max_size, arc_max_size // 1024 // 1024)) >>> >>> print ("\n:: ARC SIZE BREAKDOWN ::") >>> print ("\tMost recently used cache size: %d (%d MB) (%.2f%%)" >>> % >>> - (arc_mru_size, arc_mru_size / 1024 / 1024, arc_mru_perc)) >>> + (arc_mru_size, arc_mru_size // 1024 // 1024, >>> arc_mru_perc)) >>> print ("\tMost frequently used cache size: %d (%d MB) (%.2f%%)" >>> % >>> - (arc_mfu_size, arc_mfu_size / 1024 / 1024, arc_mfu_perc)) >>> + (arc_mfu_size, arc_mfu_size // 1024 // 1024, >>> arc_mfu_perc)) >>> >>> # Cache efficiency >>> arc_hits = get_stat_by_name(arc_stats_struct, arc_stats_cast, >>> 'arcstat_hits') >>> @@ -618,7 +618,7 @@ class osv_mmap(gdb.Command): >>> end = ulong(vma['_range']['_end']) >>> flags = flagstr(ulong(vma['_flags'])) >>> perm = permstr(ulong(vma['_perm'])) >>> - size = '{:<16}'.format('[%s kB]' % (ulong(end - >>> start)/1024)) >>> + size = '{:<16}'.format('[%s kB]' % (ulong(end - >>> start)//1024)) >>> >>> if 'F' in flags: >>> file_vma = >>> vma.cast(gdb.lookup_type('mmu::file_vma').pointer()) >>> @@ -648,7 +648,7 @@ class osv_vma_find(gdb.Command): >>> if start <= addr and end > addr: >>> flags = flagstr(ulong(vma['_flags'])) >>> perm = permstr(ulong(vma['_perm'])) >>> - size = '{:<16}'.format('[%s kB]' % (ulong(end - >>> start)/1024)) >>> + size = '{:<16}'.format('[%s 
kB]' % (ulong(end - >>> start)//1024)) >>> print('0x%016x -> vma 0x%016x' % (addr, vma_addr)) >>> print('0x%016x 0x%016x %s flags=%s perm=%s' % >>> (start, end, size, flags, perm)) >>> break >>> @@ -671,7 +671,7 @@ def ulong(x): >>> def to_int(gdb_value): >>> if hasattr(globals()['__builtins__'], 'long'): >>> # For GDB with python2 >>> - return long(gdb_value) >>> + return int(gdb_value) >>> >>> >>> Again, this change is wrong, was it automated? >>> The whole point of this code was to support *both* python 2 >>> and python 3, and now we go breaking the old python 2 code, >>> but not even removing the if :-) >>> >>> return int(gdb_value) >>> >>> class osv_syms(gdb.Command): >>> @@ -751,7 +751,7 @@ def get_base_class_offset(gdb_type, base_class_name): >>> name_pattern = re.escape(base_class_name) + "(<.*>)?$" >>> for field in gdb_type.fields(): >>> if field.is_base_class and re.match(name_pattern, field.name): >>> - return field.bitpos / 8 >>> + return field.bitpos // 8 >>> >>> def derived_from(type, base_class): >>> return len([x for x in type.fields() >>> @@ -808,11 +808,8 @@ class intrusive_list: >>> yield node_ptr.cast(self.node_type.pointer()).dereference() >>> hook = hook['next_'] >>> >>> - def __nonzero__(self): >>> - return self.root['next_'] != self.root.address >>> - >>> >>> >>> Any reason to remove this? (I don't know why it was needed, but why >>> remove?) >>> >>> Changed by 2to3. >>> >>> >>> def __bool__(self): >>> - return self.__nonzero__() >>> + return self.root['next_'] != self.root.address >>> >>> class vmstate(object): >>> def __init__(self): >>> @@ -832,7 +829,7 @@ class vmstate(object): >>> self.cpu_list = cpu_list >>> >>> def load_thread_list(self): >>> - threads = map(gdb.Value.dereference, >>> unordered_map(gdb.lookup_global_symbol('sched::thread_map').value())) >>> + threads = list(map(gdb.Value.dereference, >>> unordered_map(gdb.lookup_global_symbol('sched::thread_map').value()))) >>> >>> >>> I don't understand why this change is needed... 
sorted() should work >>> just fine on an interable, it doesn't need a list as far as I know. >>> Did something now work without this change? >>> >>> Changed by 2to3. >>> >>> >>> self.thread_list = sorted(threads, key=lambda x: int(x["_id"])) >>> >>> def cpu_from_thread(self, thread): >>> @@ -896,7 +893,7 @@ def show_thread_timers(t): >>> gdb.write(' timers:') >>> for timer in timer_list: >>> expired = '*' if timer['_state'] == timer_state_expired >>> else '' >>> - expiration = int(timer['_time']['__d']['__r']) / 1.0e9 >>> + expiration = int(timer['_time']['__d']['__r']) // 1.0e9 >>> gdb.write(' %11.9f%s' % (expiration, expired)) >>> gdb.write('\n') >>> >>> @@ -911,7 +908,7 @@ class ResolvedFrame: >>> self.frame = frame >>> self.file_name = file_name >>> self.line = line >>> - self.func_name = func_name >>> + self.__name__ = func_name >>> >>> >>> Why rename this field?? >>> >>> Changed by 2to3. >>> >>> >>> >>> def traverse_resolved_frames(frame): >>> while frame: >>> @@ -989,14 +986,14 @@ class osv_info_threads(gdb.Command): >>> function_whitelist = [sched_thread_join] >>> >>> def is_interesting(resolved_frame): >>> - is_whitelisted = resolved_frame.func_name in >>> function_whitelist >>> + is_whitelisted = resolved_frame.__name__ in >>> function_whitelist >>> is_blacklisted = >>> os.path.basename(resolved_frame.file_name) in file_blacklist >>> return is_whitelisted or not is_blacklisted >>> >>> fr = find_or_give_last(is_interesting, >>> traverse_resolved_frames(newest_frame)) >>> >>> if fr: >>> - location = '%s at %s:%s' % (fr.func_name, >>> strip_dotdot(fr.file_name), fr.line) >>> + location = '%s at %s:%s' % (fr.__name__, >>> strip_dotdot(fr.file_name), fr.line) >>> else: >>> location = '??' 
>>> >>> @@ -1009,7 +1006,7 @@ class osv_info_threads(gdb.Command): >>> ) >>> ) >>> >>> - if fr and fr.func_name == sched_thread_join: >>> + if fr and fr.__name__ == sched_thread_join: >>> gdb.write("\tjoining on %s\n" % >>> fr.frame.read_var("this")) >>> >>> show_thread_timers(t) >>> @@ -1176,6 +1173,7 @@ def all_traces(): >>> max_trace = ulong(trace_buffer['_size']) >>> >>> if not trace_log_base: >>> + print('!!! Could not find any trace data! Make sure >>> "--trace" option matches some tracepoints.') >>> raise StopIteration >>> >>> trace_log = inf.read_memory(trace_log_base, max_trace) >>> @@ -1214,7 +1212,7 @@ def all_traces(): >>> unpacker.align_up(8) >>> yield Trace(tp, Thread(thread, thread_name), time, cpu, >>> data, backtrace=backtrace) >>> >>> - iters = map(lambda cpu: one_cpu_trace(cpu), values(state.cpu_list)) >>> + iters = [one_cpu_trace(cpu) for cpu in values(state.cpu_list)] >>> >>> return heapq.merge(*iters) >>> >>> def save_traces_to_file(filename): >>> @@ -1281,7 +1279,7 @@ def show_leak(): >>> gdb.flush() >>> allocs = [] >>> for i in range(size_allocations): >>> - newpercent = '%2d%%' % round(100.0*i/(size_allocations-1)) >>> + newpercent = '%2d%%' % round(100.0*i//(size_allocations-1)) >>> >>> >>> It seems like *all* divisions in this file were converted to integer >>> divisions. Are you sure this was the intent in all of them? >>> Here in particular, it's pretty clear by the 100.0 that floating point >>> division was intended! >>> >>> Changed by 2to3. But you are right this does not look correct. 
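To illustrate the concern: in Python 3, "/" is always true division and "//" floors, so a blanket 2to3-style rewrite silently changes results wherever a float was intended, as in the percentage line above (the numbers below are made up for illustration):

```python
i, size_allocations = 2, 4  # hypothetical progress-loop values

# Floor division first, then round -- what the converted code does:
floored = '%2d%%' % round(100.0 * i // (size_allocations - 1))

# True division, then round -- what the original code intended:
true_div = '%2d%%' % round(100.0 * i / (size_allocations - 1))

# floored is '66%' while true_div is '67%': the results differ.
```

The `// 1024 // 1024` byte-to-MB conversions elsewhere in the patch are fine, since those really do want integers; only the float-division sites need to keep "/".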
>>> >>> >>> >>> >>> if newpercent != percent: >>> percent = newpercent >>> gdb.write('\b\b\b%s' % newpercent) >>> @@ -1343,10 +1341,10 @@ def show_leak(): >>> allocations=cur_n, >>> minsize=cur_min_size, >>> maxsize=cur_max_size, >>> - avgsize=cur_total_size/cur_n, >>> + avgsize=cur_total_size//cur_n, >>> >>> minbirth=cur_first_seq, >>> maxbirth=cur_last_seq, >>> - avgbirth=cur_total_seq/cur_n, >>> + avgbirth=cur_total_seq//cur_n, >>> callchain=callchain) >>> records.append(r) >>> cur_n = 0 >>> @@ -1538,7 +1536,7 @@ class osv_percpu(gdb.Command): >>> gdb.write('%s\n'%e) >>> return >>> percpu_addr = percpu.address >>> - for cpu in vmstate().cpu_list.values(): >>> + for cpu in list(vmstate().cpu_list.values()): >>> gdb.write("CPU %d:\n" % cpu.id) >>> base = cpu.obj['percpu_base'] >>> addr = base+to_int(percpu_addr) >>> >>> >>> >>> >>> diff --git a/scripts/module.py b/scripts/module.py >>> index 548dd0c7..c253968c 100755 >>> --- a/scripts/module.py >>> +++ b/scripts/module.py >>> @@ -1,4 +1,4 @@ >>> -#!/usr/bin/python >>> +#!/usr/bin/python3 >>> >>> import re >>> import os >>> @@ -226,7 +226,7 @@ def build(args): >>> else: >>> print(prefix) >>> >>> - for module, run_config_name in modules_to_run.items(): >>> + for module, run_config_name in list(modules_to_run.items()): >>> >>> >>> This smells like an automated change - I don't see any reason why this >>> list is needed! >>> >>> Can you please review all these changes? Better yet, just run the code >>> with python3 and see if anything *really* doesn't work... >>> >>> Will do. 
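A quick illustration of why the list() wrapper that 2to3 inserts around .items() is usually redundant in Python 3 (the dict contents here are made up, not the actual modules_to_run data):

```python
modules_to_run = {'httpserver': 'default', 'cli': 'custom'}

# Iterating items() directly is fine in Python 3 -- no list() needed:
for module, run_config_name in modules_to_run.items():
    pass  # e.g. run the module with its run configuration

# list() only matters when the dict is mutated during iteration,
# which would otherwise raise RuntimeError in Python 3:
for module in list(modules_to_run):
    del modules_to_run[module]
```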
>>> >>> >>> >>> >>> diff --git a/scripts/osv/prof.py b/scripts/osv/prof.py >>> index 95db15b8..465a223f 100644 >>> --- a/scripts/osv/prof.py >>> +++ b/scripts/osv/prof.py >>> @@ -51,7 +51,7 @@ time_units = [ >>> ] >>> >>> def parse_time_as_nanos(text, default_unit='ns'): >>> - for level, name in sorted(time_units, key=lambda (level, name): >>> -len(name)): >>> + for level, name in sorted(time_units, key=lambda level_name: >>> -len(level_name[1])): >>> >>> >>> I don't understand the changes in this file, they don't look like simple >>> syntax changes... Did you review them? >>> >>> 2to3. I may have not tested trace.py with min_duration argument. >>> >>> >>> if text.endswith(name): >>> return float(text.rstrip(name)) * level >>> for level, name in time_units: >>> @@ -60,7 +60,7 @@ def parse_time_as_nanos(text, default_unit='ns'): >>> raise Exception('Unknown unit: ' + default_unit) >>> >>> def format_time(time, format="%.2f %s"): >>> - for level, name in sorted(time_units, key=lambda (level, name): >>> -level): >>> + for level, name in sorted(time_units, key=lambda level_name1: >>> -level_name1[0]): >>> if time >= level: >>> return format % (float(time) / level, name) >>> return str(time) >>> @@ -207,10 +207,16 @@ class timed_trace_producer(object): >>> self.last_time = None >>> >>> def __call__(self, sample): >>> + if not sample.time: >>> + return >>> + >>> if not sample.cpu in self.earliest_trace_per_cpu: >>> self.earliest_trace_per_cpu[sample.cpu] = sample >>> >>> - self.last_time = max(self.last_time, sample.time) >>> + if not self.last_time: >>> + self.last_time = sample.time >>> + else: >>> + self.last_time = max(self.last_time, sample.time) >>> >>> matcher = self.matcher_by_name.get(sample.name, None) >>> if not matcher: >>> @@ -239,7 +245,7 @@ class timed_trace_producer(object): >>> return trace.TimedTrace(entry_trace, duration) >>> >>> def finish(self): >>> - for sample in self.open_samples.itervalues(): >>> + for sample in self.open_samples.values(): >>> 
duration = self.last_time - sample.time >>> yield trace.TimedTrace(sample, duration) >>> >>> @@ -275,7 +281,7 @@ def get_idle_profile(traces): >>> >>> def trim_samples(cpu, end_time): >>> if cpu.idle: >>> - for w in cpu.waits.values(): >>> + for w in list(cpu.waits.values()): >>> begin = max(w.time, cpu.idle.time) >>> yield ProfSample(begin, w.cpu, w.thread, w.backtrace, >>> resident_time=end_time - begin) >>> >>> @@ -295,7 +301,7 @@ def get_idle_profile(traces): >>> >>> last = t >>> >>> - for cpu in cpus.values(): >>> + for cpu in list(cpus.values()): >>> for s in trim_samples(cpu, t.time): >>> yield s >>> >>> @@ -402,7 +408,7 @@ def print_profile(samples, symbol_resolver, >>> caller_oriented=False, >>> if not order: >>> order = lambda node: (-node.resident_time, -node.hit_count) >>> >>> - for group, tree_root in sorted(groups.iteritems(), key=lambda >>> (thread, node): order(node)): >>> + for group, tree_root in sorted(iter(groups.items()), key=lambda >>> thread_node: order(thread_node[1])): >>> collapse_similar(tree_root) >>> >>> if max_levels: >>> +++ b/scripts/osv/trace.py >>> @@ -65,7 +65,12 @@ class TimeRange(object): >>> return self.end - self.begin >>> >>> def intersection(self, other): >>> - begin = max(self.begin, other.begin) >>> + if not self.begin: >>> + begin = other.begin >>> + elif not other.begin: >>> + begin = self.begin >>> + else: >>> + begin = max(self.begin, other.begin) >>> >>> if self.end is None: >>> end = other.end >>> @@ -143,11 +148,11 @@ class Trace: >>> class TimedTrace: >>> def __init__(self, trace, duration=None): >>> self.trace = trace >>> - self.duration = duration >>> + self.duration_ = duration >>> >>> @property >>> def duration(self): >>> - return self.duration >>> + return self.duration_ >>> >>> @property >>> def time(self): >>> @@ -183,6 +188,8 @@ def do_split_format(format_str): >>> >>> _split_cache = {} >>> def split_format(format_str): >>> + if not format_str: >>> + return [] >>> result = _split_cache.get(format_str, None) 
>>> if not result: >>> result = list(do_split_format(format_str)) >>> @@ -190,7 +197,7 @@ def split_format(format_str): >>> return result >>> >>> formatters = { >>> - '*': lambda bytes: '{' + ' '.join('%02x' % ord(b) for b in bytes) + >>> '}' >>> + '*': lambda bytes: '{' + ' '.join('%02x' % b for b in bytes) + '}' >>> >>> >>> Hmm, doesn't ord(b) work in Python3 any more? >>> >>> >>> Python 2: >>> >>> f = lambda bytes: '{' + ' '.join('%02x' % ord(b) for b in bytes) + >>> '}' >>> >>> f(b'test') >>> '{74 65 73 74}' >>> >>> Python 3: >>> >>> f = lambda bytes: '{' + ' '.join('%02x' % ord(b) for b in bytes) + >>> '}' >>> >>> f(b'test') >>> Traceback (most recent call last): >>> File "<stdin>", line 1, in <module> >>> File "<stdin>", line 1, in <lambda> >>> File "<stdin>", line 1, in <genexpr> >>> TypeError: ord() expected string of length 1, but int found >>> >>> f2 = lambda bytes: '{' + ' '.join('%02x' % b for b in bytes) + '}' >>> >>> f2(b'test') >>> '{74 65 73 74}' >>> >>> The ord still works but in the end this lambda breaks. 
>>> >>> >>> } >>> >>> def get_alignment_of(fmt): >>> @@ -238,16 +245,15 @@ class SlidingUnpacker: >>> size = struct.calcsize(fmt) >>> val, = struct.unpack_from(fmt, >>> self.buffer[self.offset:self.offset+size]) >>> self.offset += size >>> - values.append(val) >>> + if fmt.startswith('50p'): >>> + values.append(val.decode('utf-8')) >>> + else: >>> + values.append(val) >>> >>> return tuple(values) >>> >>> - def __nonzero__(self): >>> - return self.offset < len(self.buffer) >>> - >>> - # Python3 >>> def __bool__(self): >>> - return self.__nonzero__() >>> + return self.offset < len(self.buffer) >>> >>> class WritingPacker: >>> def __init__(self, writer): >>> @@ -270,7 +276,10 @@ class WritingPacker: >>> if fmt == '*': >>> self.pack_blob(arg) >>> else: >>> - self.writer(struct.pack(fmt, arg)) >>> + if fmt == '50p': >>> + self.writer(struct.pack(fmt, arg.encode('utf-8'))) >>> + else: >>> + self.writer(struct.pack(fmt, arg)) >>> self.offset += struct.calcsize(fmt) >>> >>> def pack_blob(self, arg): >>> @@ -298,7 +307,7 @@ class TraceDumpReaderBase : >>> self.endian = '<' >>> self.file = open(filename, 'rb') >>> try: >>> - tag = self.file.read(4) >>> + tag = self.file.read(4).decode() >>> if tag == "OSVT": >>> endian = '>' >>> elif tag != "TVSO": >>> @@ -347,7 +356,7 @@ class TraceDumpReaderBase : >>> >>> def readString(self): >>> len = self.read('H') >>> - return self.file.read(len) >>> + return self.file.read(len).decode() >>> >>> class TraceDumpReader(TraceDumpReaderBase) : >>> def __init__(self, filename): >>> @@ -378,7 +387,7 @@ class TraceDumpReader(TraceDumpReaderBase) : >>> sig = "" >>> for j in range(0, n_args): >>> arg_name = self.readString() >>> - arg_sig = self.file.read(1) >>> + arg_sig = self.file.read(1).decode() >>> if arg_sig == 'p': >>> arg_sig = '50p' >>> sig += arg_sig >>> @@ -405,7 +414,7 @@ class TraceDumpReader(TraceDumpReaderBase) : >>> >>> backtrace = None >>> if flags & 1: >>> - backtrace = filter(None, unpacker.unpack('Q' * >>> 
self.backtrace_len)) >>> + backtrace = [_f for _f in unpacker.unpack('Q' * >>> self.backtrace_len) if _f] >>> >>> data = unpacker.unpack(tp.signature) >>> unpacker.align_up(8) >>> @@ -414,7 +423,7 @@ class TraceDumpReader(TraceDumpReaderBase) : >>> yield last_trace >>> >>> def traces(self): >>> - iters = map(lambda data: self.oneTrace(data), >>> self.trace_buffers) >>> + iters = [self.oneTrace(data) for data in self.trace_buffers] >>> return heapq.merge(*iters) >>> >>> >>> @@ -523,7 +532,7 @@ def read(buffer_view): >>> >>> while unpacker: >>> tp_key, thread_ptr, thread_name, time, cpu = >>> unpacker.unpack('QQ16sQI') >>> - thread_name = thread_name.rstrip('\0') >>> + thread_name = thread_name.rstrip(b'\0').decode('utf-8') >>> tp = tracepoints[tp_key] >>> >>> backtrace = [] >>> @@ -551,7 +560,7 @@ def write(traces, writer): >>> trace.time, trace.cpu) >>> >>> if trace.backtrace: >>> - for frame in filter(None, trace.backtrace): >>> + for frame in [_f for _f in trace.backtrace if _f]: >>> packer.pack('Q', frame) >>> packer.pack('Q', 0) >>> >>> diff --git a/scripts/osv/tree.py b/scripts/osv/tree.py >>> index 594b00e2..86345157 100644 >>> --- a/scripts/osv/tree.py >>> +++ b/scripts/osv/tree.py >>> @@ -18,11 +18,11 @@ class TreeNode(object): >>> >>> def squash_child(self): >>> assert self.has_only_one_child() >>> - self.children_by_key = >>> next(self.children_by_key.itervalues()).children_by_key >>> + self.children_by_key = >>> next(iter(self.children_by_key.values())).children_by_key >>> >>> @property >>> def children(self): >>> - return self.children_by_key.itervalues() >>> + return iter(self.children_by_key.values()) >>> >>> def has_only_one_child(self): >>> return len(self.children_by_key) == 1 >>> diff --git a/scripts/run.py b/scripts/run.py >>> index 9ab8d86f..f4452345 100755 >>> --- a/scripts/run.py >>> +++ b/scripts/run.py >>> @@ -1,5 +1,5 @@ >>> -#!/usr/bin/env python >>> -from __future__ import print_function >>> +#!/usr/bin/env python3 >>> + >>> import 
subprocess >>> import sys >>> import argparse >>> diff --git a/scripts/setup.py b/scripts/setup.py >>> index 2a7f1c18..a58132ab 100755 >>> --- a/scripts/setup.py >>> +++ b/scripts/setup.py >>> @@ -1,8 +1,8 @@ >>> -#!/usr/bin/python2 >>> +#!/usr/bin/python3 >>> >>> # set up a development environment for OSv. Run as root. >>> >>> -import sys, platform, argparse >>> +import sys, distro, argparse >>> import subprocess >>> >>> standard_ec2_packages = ['python-pip', 'wget'] >>> @@ -319,11 +319,11 @@ parser.add_argument("-t", "--test", >>> action="store_true", >>> help="install packages required by testing tools") >>> cmdargs = parser.parse_args() >>> >>> -(name, version, id) = platform.linux_distribution() >>> +(name, version, id) = distro.linux_distribution() >>> >>> for distro in distros: >>> if type(distro.name) == type([]): >>> - dname = filter(lambda n: name.startswith(n), distro.name) >>> + dname = [n for n in distro.name if name.startswith(n)] >>> if len(dname): >>> distro.name = dname[0] >>> else: >>> @@ -349,5 +349,5 @@ for distro in distros: >>> print ('Your distribution %s version %s is not supported by >>> this script' % (name, version)) >>> sys.exit(1) >>> >>> -print 'Your distribution is not supported by this script.' 
>>> +print('Your distribution is not supported by this script.') >>> sys.exit(2) >>> diff --git a/scripts/test.py b/scripts/test.py >>> index 02eb4b55..0f9c07c1 100755 >>> --- a/scripts/test.py >>> +++ b/scripts/test.py >>> @@ -1,4 +1,4 @@ >>> -#!/usr/bin/env python >>> +#!/usr/bin/env python3 >>> import atexit >>> import subprocess >>> import argparse >>> @@ -81,7 +81,7 @@ def is_not_skipped(test): >>> return test.name not in blacklist >>> >>> def run_tests_in_single_instance(): >>> - run(filter(lambda test: not isinstance(test, TestRunnerTest), >>> tests)) >>> + run([test for test in tests if not isinstance(test, >>> TestRunnerTest)]) >>> >>> blacklist_tests = ' '.join(blacklist) >>> args = run_py_args + ["-s", "-e", "/testrunner.so -b %s" % >>> (blacklist_tests)] >>> @@ -103,7 +103,7 @@ def pluralize(word, count): >>> >>> def make_export_and_conf(): >>> export_dir = tempfile.mkdtemp(prefix='share') >>> - os.chmod(export_dir, 0777) >>> + os.chmod(export_dir, 0o777) >>> (conf_fd, conf_path) = tempfile.mkstemp(prefix='export') >>> conf = os.fdopen(conf_fd, "w") >>> conf.write("%s 127.0.0.1(insecure,rw)\n" % export_dir) >>> @@ -155,12 +155,12 @@ def run_tests(): >>> "/tst-nfs.so --server 192.168.122.1 --share %s" % >>> export_dir) ] >>> >>> - line = proc.stdout.readline() >>> + line = proc.stdout.readline().decode() >>> while line: >>> print(line) >>> if "/tmp" in line: >>> break >>> - line = proc.stdout.readline() >>> + line = proc.stdout.readline().decode() >>> >>> >>> run(tests_to_run) >>> diff --git a/scripts/tests/test_app.py b/scripts/tests/test_app.py >>> index 27112ff7..1738e755 100755 >>> --- a/scripts/tests/test_app.py >>> +++ b/scripts/tests/test_app.py >>> @@ -1,7 +1,6 @@ >>> -#!/usr/bin/python >>> +#!/usr/bin/python3 >>> from testing import * >>> import argparse >>> -import subprocess >>> from time import sleep >>> >>> def run(command, hypervisor_name, image_path=None, line=None, >>> guest_port=None, host_port=None, input_lines=[], kill_app=False): >>> 
diff --git a/scripts/tests/test_app_with_test_script.py >>> b/scripts/tests/test_app_with_test_script.py >>> index 2ab1b731..8a2c7295 100755 >>> --- a/scripts/tests/test_app_with_test_script.py >>> +++ b/scripts/tests/test_app_with_test_script.py >>> @@ -1,4 +1,4 @@ >>> -#!/usr/bin/python >>> +#!/usr/bin/python3 >>> from testing import * >>> import argparse >>> import runpy >>> diff --git a/scripts/tests/test_http_app_with_curl_and_ab.py >>> b/scripts/tests/test_http_app_with_curl_and_ab.py >>> index 067fcc83..9d613d01 100755 >>> --- a/scripts/tests/test_http_app_with_curl_and_ab.py >>> +++ b/scripts/tests/test_http_app_with_curl_and_ab.py >>> @@ -1,11 +1,11 @@ >>> -#!/usr/bin/python >>> +#!/usr/bin/python3 >>> from testing import * >>> import argparse >>> import subprocess >>> from time import sleep >>> >>> def check_with_curl(url, expected_http_line): >>> - output = subprocess.check_output(["curl", "-s", url]) >>> + output = subprocess.check_output(["curl", "-s", >>> url]).decode('utf-8') >>> print(output) >>> if expected_http_line not in output: >>> print("FAILED curl: wrong output") >>> @@ -39,9 +39,9 @@ def run(command, hypervisor_name, host_port, >>> guest_port, http_path, expected_htt >>> check_with_curl(app_url, expected_http_line) >>> >>> if no_keep_alive: >>> - output = subprocess.check_output(["ab", "-l", "-c", >>> str(concurrency), "-n", str(count), app_url]).split('\n') >>> + output = subprocess.check_output(["ab", "-l", "-c", >>> str(concurrency), "-n", str(count), app_url]).decode('utf-8').split('\n') >>> else: >>> - output = subprocess.check_output(["ab", "-l", "-k", "-c", >>> str(concurrency), "-n", str(count), app_url]).split('\n') >>> + output = subprocess.check_output(["ab", "-l", "-k", "-c", >>> str(concurrency), "-n", str(count), app_url]).decode('utf-8').split('\n') >>> >>> failed_requests = 1 >>> complete_requests = 0 >>> @@ -74,11 +74,11 @@ def run(command, hypervisor_name, host_port, >>> guest_port, http_path, expected_htt >>> success = 
>>> False
>>>
>>>     if failed_requests > 0:
>>> -        print("FAILED ab - encountered failed requests: %d" % failed_requests)
>>> +        print("FAILED ab - encountered failed requests: %d" % failed_requests)
>>>         success = False
>>>
>>>     if complete_requests < count:
>>> -        print("FAILED ab - too few complete requests : %d ? %d" % (complete_requests, count))
>>> +        print("FAILED ab - too few complete requests : %d ? %d" % (complete_requests, count))
>>>         success = False
>>>
>>>     if success:
>>> diff --git a/scripts/tests/testing.py b/scripts/tests/testing.py
>>> index c5249753..c3aaf218 100644
>>> --- a/scripts/tests/testing.py
>>> +++ b/scripts/tests/testing.py
>>> @@ -133,7 +133,7 @@ class SupervisedProcess:
>>>             self.cv.release()
>>>
>>>         line = ''
>>> -        ch_bytes = ''
>>> +        ch_bytes = bytes()
>>>         while True:
>>>             ch_bytes = ch_bytes + self.process.stdout.read(1)
>>>             try:
>>> @@ -144,7 +144,7 @@ class SupervisedProcess:
>>>                 if ch == '\n':
>>>                     append_line(line)
>>>                     line = ''
>>> -                    ch_bytes = ''
>>> +                    ch_bytes = bytes()
>>>             except UnicodeError:
>>>                 continue
>>>
>>> diff --git a/scripts/trace.py b/scripts/trace.py
>>> index 34cfb2ab..71dc47d5 100755
>>> --- a/scripts/trace.py
>>> +++ b/scripts/trace.py
>>> @@ -1,4 +1,4 @@
>>> -#!/usr/bin/env python2
>>> +#!/usr/bin/env python3
>>>  import sys
>>>  import errno
>>>  import argparse
>>> @@ -13,6 +13,7 @@ from collections import defaultdict
>>>  from osv import trace, debug, prof
>>>  from osv.client import Client
>>>  import memory_analyzer
>>> +from functools import reduce
>>>
>>>  class InvalidArgumentsException(Exception):
>>>      def __init__(self, message):
>>> @@ -114,7 +115,7 @@ def list_trace(args):
>>>      with get_trace_reader(args) as reader:
>>>          for t in reader.get_traces():
>>>              if t.time in time_range:
>>> -                print t.format(backtrace_formatter, data_formatter=data_formatter)
>>> +                print(t.format(backtrace_formatter, data_formatter=data_formatter))
>>>
>>>  def mem_analys(args):
>>>      mallocs = {}
>>> @@ -164,7 +165,7 @@ def add_profile_options(parser):
>>>      add_time_slicing_options(parser)
>>>      group = parser.add_argument_group('profile options')
>>>      group.add_argument("-r", "--caller-oriented", action='store_true', help="change orientation to caller-based; reverses order of frames")
>>> -    group.add_argument("-g", "--group-by", choices=groupers.keys(), default='none', help="group samples by given criteria")
>>> +    group.add_argument("-g", "--group-by", choices=list(groupers.keys()), default='none', help="group samples by given criteria")
>>>      group.add_argument("--function", action='store', help="use given function as tree root")
>>>      group.add_argument("--min-duration", action='store', help="show only nodes with resident time not shorter than this, eg: 200ms")
>>>      add_backtrace_options(group)
>>> @@ -236,7 +237,7 @@ def show_profile(args, sample_producer):
>>>
>>>      def node_filter(*args):
>>>          for filter in node_filters:
>>> -            if not filter(*args):
>>> +            if not list(filter(*args)):
>>>                  return False
>>>          return True
>>>
>>> @@ -276,7 +277,7 @@ def extract(args):
>>>                                  stderr=subprocess.STDOUT)
>>>          _stdout, _ = proc.communicate()
>>>          if proc.returncode or not os.path.exists(args.tracefile):
>>> -            print(_stdout)
>>> +            print(_stdout.decode())
>>>              sys.exit(1)
>>>      else:
>>>          print("error: %s not found" % (elf_path))
>>> @@ -332,8 +333,10 @@ def write_sample_to_pcap(sample, pcap_writer):
>>>      }
>>>
>>>      pkt = dpkt.ethernet.Ethernet()
>>> -    pkt.data = sample.data[1]
>>>      pkt.type = eth_types[proto]
>>> +    pkt.src = b''
>>> +    pkt.dst = b''
>>> +    pkt.data = sample.data[1]
>>>      pcap_writer.writepkt(pkt, ts=ts)
>>>
>>>  def format_packet_sample(sample):
>>> @@ -343,7 +346,7 @@ def format_packet_sample(sample):
>>>      pcap = dpkt.pcap.Writer(proc.stdin)
>>>      write_sample_to_pcap(sample, pcap)
>>>      pcap.close()
>>> -    assert(proc.stdout.readline() == "reading from file -, link-type EN10MB (Ethernet)\n")
>>> +    assert(proc.stdout.readline().decode() == "reading from file -, link-type EN10MB (Ethernet)\n")
>>>      packet_line = proc.stdout.readline().rstrip()
>>>      proc.wait()
>>>      return packet_line
>>> @@ -361,7 +364,7 @@ def pcap_dump(args, target=None):
>>>      needs_dpkt()
>>>
>>>      if not target:
>>> -        target = sys.stdout
>>> +        target = sys.stdout.buffer
>>>
>>>      pcap_file = dpkt.pcap.Writer(target)
>>>      try:
>>> @@ -439,7 +442,10 @@ def print_summary(args, printer=sys.stdout.write):
>>>          else:
>>>              min_time = min(min_time, t.time)
>>>
>>> -        max_time = max(max_time, t.time)
>>> +        if not max_time:
>>> +            max_time = t.time
>>> +        else:
>>> +            max_time = max(max_time, t.time)
>>>
>>>          if args.timed:
>>>              timed = timed_producer(t)
>>> @@ -450,42 +456,42 @@ def print_summary(args, printer=sys.stdout.write):
>>>          timed_samples.extend((timed_producer.finish()))
>>>
>>>      if count == 0:
>>> -        print "No samples"
>>> +        print("No samples")
>>>          return
>>>
>>> -    print "Collected %d samples spanning %s" % (count, prof.format_time(max_time - min_time))
>>> +    print("Collected %d samples spanning %s" % (count, prof.format_time(max_time - min_time)))
>>>
>>> -    print "\nTime ranges:\n"
>>> -    for cpu, r in sorted(cpu_time_ranges.items(), key=lambda (c, r): r.min):
>>> -        print "  CPU 0x%02d: %s - %s = %10s" % (cpu,
>>> +    print("\nTime ranges:\n")
>>> +    for cpu, r in sorted(list(cpu_time_ranges.items()), key=lambda c_r: c_r[1].min):
>>> +        print("  CPU 0x%02d: %s - %s = %10s" % (cpu,
>>>              trace.format_time(r.min),
>>>              trace.format_time(r.max),
>>> -            prof.format_time(r.max - r.min))
>>> +            prof.format_time(r.max - r.min)))
>>>
>>> -    max_name_len = reduce(max, map(lambda tp: len(tp.name), count_per_tp.iterkeys()))
>>> +    max_name_len = reduce(max, [len(tp.name) for tp in iter(count_per_tp.keys())])
>>>      format = "  %%-%ds %%8s" % (max_name_len)
>>> -    print "\nTracepoint statistics:\n"
>>> -    print format % ("name", "count")
>>> -    print format % ("----", "-----")
>>> +    print("\nTracepoint statistics:\n")
>>> +    print(format % ("name", "count"))
>>> +    print(format % ("----", "-----"))
>>>
>>> -    for tp, count in sorted(count_per_tp.iteritems(), key=lambda (tp, count): tp.name):
>>> -        print format % (tp.name, count)
>>> +    for tp, count in sorted(iter(count_per_tp.items()), key=lambda tp_count: tp_count[0].name):
>>> +        print(format % (tp.name, count))
>>>
>>>      if args.timed:
>>>          format = "  %-20s %8s %8s %8s %8s %8s %8s %8s %15s"
>>> -        print "\nTimed tracepoints [ms]:\n"
>>> +        print("\nTimed tracepoints [ms]:\n")
>>>
>>> -        timed_samples = filter(lambda t: t.time_range.intersection(time_range), timed_samples)
>>> +        timed_samples = [t for t in timed_samples if t.time_range.intersection(time_range)]
>>>
>>>          if not timed_samples:
>>> -            print "  None"
>>> +            print("  None")
>>>          else:
>>> -            print format % ("name", "count", "min", "50%", "90%", "99%", "99.9%", "max", "total")
>>> -            print format % ("----", "-----", "---", "---", "---", "---", "-----", "---", "-----")
>>> +            print(format % ("name", "count", "min", "50%", "90%", "99%", "99.9%", "max", "total"))
>>> +            print(format % ("----", "-----", "---", "---", "---", "---", "-----", "---", "-----"))
>>>
>>> -            for name, traces in get_timed_traces_per_function(timed_samples).iteritems():
>>> +            for name, traces in get_timed_traces_per_function(timed_samples).items():
>>>                  samples = sorted(list((t.time_range.intersection(time_range).length() for t in traces)))
>>> -                print format % (
>>> +                print(format % (
>>>                      name,
>>>                      len(samples),
>>>                      format_duration(get_percentile(samples, 0)),
>>> @@ -494,9 +500,9 @@ def print_summary(args, printer=sys.stdout.write):
>>>                      format_duration(get_percentile(samples, 0.99)),
>>>                      format_duration(get_percentile(samples, 0.999)),
>>>                      format_duration(get_percentile(samples, 1)),
>>> -                    format_duration(sum(samples)))
>>> +                    format_duration(sum(samples))))
>>>
>>> -    print
>>> +    print()
>>>
>>>  def list_cpu_load(args):
>>>      load_per_cpu = {}
>>> @@ -550,7 +556,7 @@ def list_timed(args):
>>>
>>> ...
>>
>> --
> You received this message because you are subscribed to the Google Groups "OSv Development" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> To view this discussion on the web visit https://groups.google.com/d/msgid/osv-dev/b877b770-65d7-4d19-a7e3-71f520083bf8%40googlegroups.com
> <https://groups.google.com/d/msgid/osv-dev/b877b770-65d7-4d19-a7e3-71f520083bf8%40googlegroups.com?utm_medium=email&utm_source=footer>.

--
You received this message because you are subscribed to the Google Groups "OSv Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/osv-dev/CANEVyjufKxoJYKCCsur02CvRrPTqCFaCxEmPrv5FEeRnrB0d7w%40mail.gmail.com.
