Re: tree frobbing facilities in Perl6?

Michael G Schwern Tue, 24 Dec 2002 01:30:40 -0800

I'm going to take a left turn in replying and say that your approach to the
problem is causing the problem.  This is diverging from the question of tree
manipulation, but I don't think that's what you really need.

Anyhow, on with the show...

On Tue, Dec 24, 2002 at 12:02:09AM -0800, Rich Morin wrote:
> Let's say that I've got a daemon which is running ps(1) on a regular
> basis and logging the results.  A brute force approach would be to
> save the raw ASCII output, but these days I'm trying to use XML.  So,
> I write out the output as (informal) XML:
> 
>   <log>
>     <ps time=123456789>
>       <process>
>         <pid>123</>
>         <pcpu>4.6</>
>         <stat>SN+</>
>         ...
>       </process>
>     </ps>
>     ...
>   </log>

So with simple data like this, I'd just use YAML.  This isn't really
important, just a YAML plug. :)  But it does have a better resulting data
structure as we'll see below.

      - time: 123456789
        processes:
          - pid:  123
            pcpu: 4.6
            stat: SN+
          - pid:  234
            pcpu: 2.3
            stat: R
      - time: 234567890
        processes:
          - pid:  123
            pcpu: 2.4
            stat: R
          - pid:  456
            pcpu: 3.4
            stat: SN

(I've eliminated the redundant "log" and "ps" parts)

> A bit bulky, but nicely tagged and serialized.  Now, I want to do
> something with it.  OK, the first thing I do is read it in as a tree.
> I use my own SAX handler, because I want a pure Perl way to load in
> a tree, preserving order.  It loads in something like this:
> 
>   [ 'log', {},
>     [ 'ps', { time => 123456789 },
>       [ 'process', {},
>         [ 'pid',  {}, '123' ],
>         [ 'pcpu', {}, '4.6' ],
>         [ 'stat', {}, 'SN+' ],
>         ...
>       ],
>     ],
>     ...
>   ]
> 
> The problem is that, although the data structure I've loaded in is a
> tree, I generally want to use it as something else.

And there's your problem.  The data structure you've created above is not
really a comfortable one in Perl.  You're trying to create a Tree-like
structure using array references as nodes.  This is awkward.  Instead, use
hashes.  Here's the structure the YAML above loads into, written as Perl:

# A list of snapshots; each snapshot is a hash holding a time and a
# list of per-process hashes.
my @ps_snapshots = (
  {
    'processes' => [
      {
        'stat' => 'SN+',
        'pcpu' => '4.6',
        'pid' => '123'
      },
      {
        'stat' => 'R',
        'pcpu' => '2.3',
        'pid' => '234'
      }
    ],
    'time' => '123456789'
  },
  {
    'processes' => [
      {
        'stat' => 'R',
        'pcpu' => '2.4',
        'pid' => '123'
      },
      {
        'stat' => 'SN',
        'pcpu' => '3.4',
        'pid' => '456'
      }
    ],
    'time' => '234567890'
  }
);

Since YAML itself is made up of hashes and arrays, it maps very well into
Perl.  The XML tree structure comes off awkward because Perl has no native
tree handling.

At this point you've got a fairly straightforward list-of-hashes
structure rather than the oddly put together set of array refs as tree
nodes.

> For example, let's
> say that I want to "boil down" these log files a bit.  This means I
> have to pick up the static values (e.g., pid), tally the distribution
> of the flag values (e.g., stat), and average the numeric snapshots, as:
> 
>   foreach $time (sort(keys(%ps))) {
>     $pid  =  $ps{$time}{pid} unless defined ($pid);
>     $pcpu += $ps{$time}{pcpu};
>     $stat{$ps{$time}{stat}}++;
>     ...
>   }

I'm not sure I follow the code above, but I'll do something similar.  I'll
tally up all the flag values.

    for @ps_snapshots -> $snap {
        for @$snap{processes} -> $process {
            %stats{$process{stat}}++;
        }
    }

> My approach to this, currently, is to walk the tree, creating the data
> structure I'd _like_ to have, before I try to do the actual work.  This
> isn't TOO painful, but it isn't the sort of DWIMitude I'd like to see.

Basically, we're just manipulating a straightforward list of hashes of
lists.  Because the structure YAML loads is already in a natural shape,
there's no need to create the intermediate structure.  Despite my use of
Perl 6, you can do the same in Perl 5.
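
For comparison, here is a minimal Perl 5 sketch of the same tally.  It
assumes the snapshots are already loaded into @ps_snapshots in the
list-of-hashes shape shown earlier; the sample data is repeated inline only
so the sketch runs by itself.

```perl
use strict;
use warnings;

# Sample data in the list-of-hashes shape (repeated here for illustration).
my @ps_snapshots = (
    { time      => 123456789,
      processes => [
          { pid => 123, pcpu => 4.6, stat => 'SN+' },
          { pid => 234, pcpu => 2.3, stat => 'R'   },
      ] },
    { time      => 234567890,
      processes => [
          { pid => 123, pcpu => 2.4, stat => 'R'  },
          { pid => 456, pcpu => 3.4, stat => 'SN' },
      ] },
);

# Count how often each stat flag appears across all snapshots.
my %stats;
for my $snap (@ps_snapshots) {
    for my $process (@{ $snap->{processes} }) {
        $stats{ $process->{stat} }++;
    }
}

# Print one "flag: count" line per flag.
print "$_: $stats{$_}\n" for sort keys %stats;
```

With the sample data, R is counted twice and SN and SN+ once each.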

That sort of loop I've written above can probably better be done using
hyper-operators, but I'll let someone else take a stab at that.  I'm also
not sure what the slicing syntax is, so I made something up.

> More to the point, let's say that I simply want to transform the data
> into a different order.  In a multiply subscripted array, this is just
> a matter of swapping subscripts on the output loop(s).  Turning the tree
> above into something like:
> 
>   <process pid="123">
>     <time>123456789,...</>
>     <pcpu>4.6,...</>
>     <stat>SN+,...</>
>   </process>

Sort of an odd structure, but ok.  Here's how I'd flip around the YAML
structure (again with the caveat about hyperoperators).

    for @ps_snapshots -> $snapshot {
        my $time = $snapshot{time};

        for @$snapshot{processes} -> $proc {
            my $pid = $proc{pid};
            push @%procs{$pid}{time}, $time;

            for qw(stat pcpu) -> $key {
                push @%procs{$pid}{$key}, $proc{$key};
            }
        }
    }

    YAML::Dump(%procs);

This would produce something like:

    123:
        time: [123456789, 234567890]
        pcpu: [4.6, 2.4]
        stat: [SN+, R]
    234:
        time: [123456789]
        pcpu: [2.3]
        stat: [R]
    456:
        time: [234567890]
        pcpu: [3.4]
        stat: [SN]
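
A rough Perl 5 sketch of the same regrouping, for comparison.  It again
assumes @ps_snapshots holds the list-of-hashes structure (sample data
repeated inline) and prints plain lines rather than calling a YAML dumper,
so it needs nothing beyond core Perl.

```perl
use strict;
use warnings;

# Sample data in the list-of-hashes shape (repeated here for illustration).
my @ps_snapshots = (
    { time      => 123456789,
      processes => [
          { pid => 123, pcpu => 4.6, stat => 'SN+' },
          { pid => 234, pcpu => 2.3, stat => 'R'   },
      ] },
    { time      => 234567890,
      processes => [
          { pid => 123, pcpu => 2.4, stat => 'R'  },
          { pid => 456, pcpu => 3.4, stat => 'SN' },
      ] },
);

# Regroup by pid: each pid maps to parallel lists of time, pcpu and stat.
my %procs;
for my $snapshot (@ps_snapshots) {
    my $time = $snapshot->{time};
    for my $proc (@{ $snapshot->{processes} }) {
        my $pid = $proc->{pid};
        push @{ $procs{$pid}{time} }, $time;
        push @{ $procs{$pid}{$_} }, $proc->{$_} for qw(pcpu stat);
    }
}

# Print one line per pid and field.
for my $pid (sort { $a <=> $b } keys %procs) {
    for my $key (qw(time pcpu stat)) {
        print "$pid $key: @{ $procs{$pid}{$key} }\n";
    }
}
```

Autovivification does the work here: pushing onto @{ $procs{$pid}{time} }
creates the inner hash and list the first time a pid is seen.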

> is not something I want to try in XSLT.  I can do it in Perl, of course,
> but I end up writing a lot of code.  Am I missing something?  

I think your external format (XML, which is a tree) is not mapping well to
your internal format (Perl, which uses hashes, arrays and scalars), causing
you to have to shuffle your awkward XML->tree structure into something more
Perlish.  By picking an external format, YAML, which maps better to your
internal format, you can avoid the intermediate step.

Alternatively, I'm sure you can rewrite your XML parser to produce a
structure similar to that which YAML produces.  The point being to pull in
your data in a way which better fits Perl.
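
As a rough illustration of that alternative, here is a Perl 5 sketch that
converts a tree in the quoted [ name, {attributes}, children... ] shape into
the list-of-hashes form.  The helper name snapshots_from_tree is made up for
this example, and it assumes every child of a process node is a simple leaf
like [ 'pid', {}, '123' ]; a real SAX handler could build the hashes
directly instead.

```perl
use strict;
use warnings;

# A tree in the [ name, \%attributes, @children ] shape quoted earlier.
my $tree =
  [ 'log', {},
    [ 'ps', { time => 123456789 },
      [ 'process', {},
        [ 'pid',  {}, '123' ],
        [ 'pcpu', {}, '4.6' ],
        [ 'stat', {}, 'SN+' ],
      ],
    ],
  ];

# Hypothetical helper: each ps node becomes { time, processes }, and
# each process node becomes a hash of its leaf elements.
sub snapshots_from_tree {
    my ($log) = @_;
    my (undef, undef, @ps_nodes) = @$log;
    my @snapshots;
    for my $ps (@ps_nodes) {
        my (undef, $attr, @proc_nodes) = @$ps;
        my @processes;
        for my $proc (@proc_nodes) {
            my (undef, undef, @fields) = @$proc;
            # Each field is assumed to be a leaf: [ name, {}, text ].
            push @processes, { map { ($_->[0] => $_->[2]) } @fields };
        }
        push @snapshots,
            { time => $attr->{time}, processes => \@processes };
    }
    return @snapshots;
}

my @ps_snapshots = snapshots_from_tree($tree);
print "$ps_snapshots[0]{time}: pid $ps_snapshots[0]{processes}[0]{pid}\n";
```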

> And, to bring the posting back on topic, will Perl6 bring anything 
> new to the campfire?

Hyperoperators will help.  A simplified slicing syntax, especially when
dealing with references, will help.  A simplified reference syntax helps,
too.

And, of course, Perl 6 will hopefully ship with a YAML parser. ;)

-- 

Michael G. Schwern   <[EMAIL PROTECTED]>    http://www.pobox.com/~schwern/
Perl Quality Assurance      <[EMAIL PROTECTED]>         Kwalitee Is Job One
My enormous capacity for love is being WASTED on YOU guys
        -- http://www.angryflower.com/497day.gif
