I'm going to take a left turn in replying and say that your approach to the problem is causing the problem. This is diverging from the question of tree manipulation, but I don't think that's what you really need.

Anyhow, on with the show...

On Tue, Dec 24, 2002 at 12:02:09AM -0800, Rich Morin wrote:
> Let's say that I've got a daemon which is running ps(1) on a regular
> basis and logging the results.  A brute force approach would be to
> save the raw ASCII output, but these days I'm trying to use XML.  So,
> I write out the output as (informal) XML:
>
>   <log>
>     <ps time=123456789>
>       <process>
>         <pid>123</>
>         <pcpu>4.6</>
>         <stat>SN+</>
>         ...
>       </process>
>     </ps>
>     ...
>   </log>

So with simple data like this, I'd just use YAML.  This isn't really
important, just a YAML plug. :)  But it does give a better resulting data
structure, as we'll see below.

    - time: 123456789
      processes:
        - pid: 123
          pcpu: 4.6
          stat: SN+
        - pid: 234
          pcpu: 2.3
          stat: R
    - time: 234567890
      processes:
        - pid: 123
          pcpu: 2.4
          stat: R
        - pid: 456
          pcpu: 3.4
          stat: SN

(I've eliminated the redundant "log" and "ps" parts.)

> A bit bulky, bit nicely tagged and serialized.  Now, I want to do
> something with it.  OK, the first thing I do is read it in as a tree.
> I use my own SAX handler, because I want a pure Perl way to load in
> a tree, preserving order.  It loads in something like this:
>
>   [ 'log', {},
>     [ 'ps', { time => 123456789 },
>       [ 'process', {},
>         [ 'pid', {}, '123' ],
>         [ 'pcpu', {}, '4.6' ],
>         [ 'stat', {}, 'SN+' ],
>         ...
>       ],
>     ],
>     ...
>   ]
>
> The problem is that, although the data structure I've loaded in is a
> tree, I generally want to use it as something else.

And there's your problem.  The data structure you've created above is not
really a comfortable one in Perl.  You're trying to create a tree-like
structure using array references as nodes.  This is awkward.  Instead,
use hashes.

Here's what that YAML looks like once it's loaded into Perl:

    my @ps_snapshots = (
        {
            'processes' => [
                { 'stat' => 'SN+', 'pcpu' => '4.6', 'pid' => '123' },
                { 'stat' => 'R',   'pcpu' => '2.3', 'pid' => '234' },
            ],
            'time' => '123456789',
        },
        {
            'processes' => [
                { 'stat' => 'R',  'pcpu' => '2.4', 'pid' => '123' },
                { 'stat' => 'SN', 'pcpu' => '3.4', 'pid' => '456' },
            ],
            'time' => '234567890',
        },
    );

Since YAML itself is made up of hashes and arrays, it maps very well into
Perl.  The XML tree structure comes off awkward because Perl has no native
tree handling.  At this point you've got a fairly straightforward list of
hashes of lists rather than an oddly put-together set of array refs acting
as tree nodes.

> For example, let's say that I want to "boil down" these log files a
> bit.  This means I have to pick up the static values (e.g., pid), tally
> the distribution of the flag values (e.g., stat), and average the
> numeric snapshots, as:
>
>   foreach $time (sort(keys(%ps))) {
>     $pid = $ps{$time}{pid} unless defined ($pid);
>     $pcpu += $ps{$time}{pcpu};
>     $stat{$ps{$time}{stat}}++;
>     ...
>   }

I'm not sure I follow the code above, but I'll do something similar and
tally up all the flag values.

    for @ps_snapshots -> $snap {
        for @$snap{processes} -> $proc {
            %stats{$proc{stat}}++;
        }
    }

> My approach to this, currently, is to walk the tree, creating the data
> structure I'd _like_ to have, before I try to do the actual work.  This
> isn't TOO painful, but it isn't the sort of DWIMitude I'd like to see.

Basically, we're just manipulating a straightforward list of hashes of
lists.  Because YAML already hands you the data in that natural shape,
there's no need to build an intermediate structure.  Despite my use of
Perl 6, you can do the same in Perl 5.  The sort of loop I've written
above can probably be done better using hyper-operators, but I'll let
someone else take a stab at that.  I'm also not sure what the slicing
syntax is, so I made something up.
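
Since the data is already plain hashes and arrays, a Perl 5 version is just nested loops over references.  Below is a rough sketch (not from the original thread) that does the flag tally above plus the "boil down" from the question: per-pid sample count, average pcpu, and stat distribution.  It assumes the snapshots are already in the YAML-loaded shape shown earlier (typed in literally here so it runs without a YAML module), and the `%summary` layout with its `samples`, `pcpu_sum` and `pcpu_avg` keys is a hypothetical choice made for illustration.

```perl
use strict;
use warnings;

# Sample data in the YAML-loaded shape shown above, typed in literally
# so this sketch runs without a YAML module.
my @ps_snapshots = (
    { time => 123456789,
      processes => [ { pid => 123, pcpu => 4.6, stat => 'SN+' },
                     { pid => 234, pcpu => 2.3, stat => 'R'   } ] },
    { time => 234567890,
      processes => [ { pid => 123, pcpu => 2.4, stat => 'R'  },
                     { pid => 456, pcpu => 3.4, stat => 'SN' } ] },
);

my %stats;     # stat flag => count over every snapshot
my %summary;   # hypothetical layout: pid => { samples, pcpu_sum, stat => { flag => count } }

for my $snap (@ps_snapshots) {
    for my $proc (@{ $snap->{processes} }) {
        # Overall tally of flag values, as in the Perl 6 loop above.
        $stats{ $proc->{stat} }++;

        # Per-pid running totals; autocreate the entry on first sight.
        my $s = $summary{ $proc->{pid} } ||= { samples => 0, pcpu_sum => 0, stat => {} };
        $s->{samples}++;
        $s->{pcpu_sum} += $proc->{pcpu};
        $s->{stat}{ $proc->{stat} }++;
    }
}

# Second pass: turn the running sums into averages and print a summary.
for my $pid (sort { $a <=> $b } keys %summary) {
    my $s = $summary{$pid};
    $s->{pcpu_avg} = $s->{pcpu_sum} / $s->{samples};
    printf "%d: %d sample(s), avg pcpu %.2f, stat %s\n",
        $pid, $s->{samples}, $s->{pcpu_avg},
        join ', ', map { "$_=$s->{stat}{$_}" } sort keys %{ $s->{stat} };
}
```

Averages are computed in a separate pass, so the main loop only has to keep running sums per pid while it walks the snapshots.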

> More to the point, let's say that I simply want to transform the data
> into a different order.  In a multiply subscripted array, this is just
> a matter of swapping subscripts on the output loop(s).  Turning the tree
> above into something like:
>
>   <process pid="123">
>     <time>123456789,...</>
>     <pcpu>4.6,...</>
>     <stat>SN+,...</>
>   </process>

Sort of an odd structure, but OK.  Here's how I'd flip around the YAML
structure (again with the caveat about hyperoperators).  The pid becomes
the key, so it isn't repeated inside each entry.

    for @ps_snapshots -> $snapshot {
        my $time = $snapshot{time};
        for @$snapshot{processes} -> $proc {
            my $pid = $proc{pid};
            push @%procs{$pid}{time}, $time;
            for qw(stat pcpu) -> $key {
                push @%procs{$pid}{$key}, $proc{$key};
            }
        }
    }

    YAML::Dump(%procs);

This would produce something like:

    123:
      time: [123456789, 234567890]
      pcpu: [4.6, 2.4]
      stat: [SN+, R]
    234:
      time: [123456789]
      pcpu: [2.3]
      stat: [R]
    456:
      time: [234567890]
      pcpu: [3.4]
      stat: [SN]

> is not something I want to try in XSLT.  I can do it in Perl, of course,
> but I end up writing a lot of code.  Am I missing something?

I think your external format (XML, which is a tree) doesn't map well onto
your internal format (Perl, which uses hashes, arrays and scalars), so you
have to shuffle your awkward XML->tree structure into something more
Perlish.  By picking an external format, YAML, which maps better to your
internal format, you can avoid the intermediate step.  Alternatively, I'm
sure you can rewrite your XML parser to produce a structure similar to the
one YAML produces.  The point is to pull in your data in a way which
better fits Perl.

> And, to bring the posting back on topic, will Perl6 bring anything
> new to the campfire?

Hyperoperators will help.  A simplified slicing syntax, especially when
dealing with references, will help.  A simplified reference syntax helps,
too.  And, of course, Perl 6 will hopefully ship with a YAML parser. ;)

-- 
Michael G. Schwern   <[EMAIL PROTECTED]>    http://www.pobox.com/~schwern/
Perl Quality Assurance      <[EMAIL PROTECTED]>         Kwalitee Is Job One
My enormous capacity for love is being WASTED on YOU guys
    -- http://www.angryflower.com/497day.gif