Rumour has it this thread got warnocked... ;-) My original task from leo was to sort out the PASM and PIR debug segment to handle multiple files. I thought I might try and sort out the HLL debug seg while I was on the job.
From Roger's input and further discussion on IRC, it seems that we need something more clever for the HLL debug seg than for the PASM/PIR one. So I'll back off trying to deal with HLL debug for now (time permitting, I'll come back to it in the not too distant future) and implement something much like what I spec'd for PASM and PIR, which only needs a simple debug segment with file and line number.
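
To make the PASM/PIR case concrete, here is a rough sketch in Python (for brevity only; it is not Parrot code). It assumes one line number per instruction plus a list of mappings saying where each source file starts. The class and method names are hypothetical and do not reflect the actual packfile layout.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class DebugSegment:
    """Hypothetical layout, not Parrot's real packfile format."""
    lines: list = field(default_factory=list)        # one line number per instruction
    file_starts: list = field(default_factory=list)  # instruction index where a file begins
    file_names: list = field(default_factory=list)   # filename for each entry in file_starts

    def add_instruction(self, filename, line):
        # Open a new filename mapping whenever the source file changes.
        if not self.file_names or self.file_names[-1] != filename:
            self.file_starts.append(len(self.lines))
            self.file_names.append(filename)
        self.lines.append(line)

    def lookup(self, pc):
        # The last mapping that starts at or before pc gives the filename.
        idx = bisect_right(self.file_starts, pc) - 1
        return self.file_names[idx], self.lines[pc]
```

With this shape, code assembled from several files only costs one extra mapping entry each time the file changes, and the per-instruction line array stays as simple as it is today.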

"Roger Browne" <[EMAIL PROTECTED]> wrote:
FORMAT PROPOSAL...

Great! Anything that brings parrot closer to being able to report the
HLL filename and line numbers is a good thing!

Seems there will be a slightly longer wait on this one now, but this is very much needed, I agree.

SOURCE SEGMENTS
... the idea would seem to be
that this segment can contain source code.  I suspect the intention of it
was to store the source code of high level languages rather than PASM or
PIR.

I don't think Parrot should care about what languages are in the source
segments. If someone is writing directly in PASM or PIR, that can go in
a source segment. If someone is writing in a high-level language, that
can go in a source segment. If someone is writing data from which HLL
code is generated by some utility (e.g. yacc, a UML tool, or a GUI
designer), that data can go in a source segment too.

Any kind of source code for which there exists some kind of debugging
tool is a candidate to go into a source segment. This implies that there
could be more than one source segment per .pbc file, and more than one
source location for each opcode. It also implies that (eventually)
parrot will have a way of knowing how to call all the candidate
debuggers for a particular bytecode location (according to which source
language the programmer wants to debug in).

[Incidentally, source segments may also meet the needs of those who wish
to distribute source with every application, without burdening those who
just want to run the compiled code.]

Pretty much agree with this.

...
2) Allowing for a reference into the source segment in place of a filename.

Some development tools are still going to want the filename, even if
there is a corresponding source segment in the .pbc file. I think it
should be possible to include both.

I was thinking of putting the filename in the source segment, so you could iterate over the source segments and get the filenames of the source files. So the filenames would be there.
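
Roughly what I have in mind, sketched in Python. The field names are assumptions for illustration and nothing here is an existing Parrot structure: each source segment carries its own filename, so a tool can list the files without touching the source text.

```python
from dataclasses import dataclass


@dataclass
class SourceSegment:
    """Hypothetical source segment; field names are illustrative only."""
    filename: str   # original file the source came from
    language: str   # e.g. "PIR", "PASM", or an HLL name
    text: str       # the source itself


def source_filenames(segments):
    # Walk the segments and collect the filenames they carry.
    return [seg.filename for seg in segments]
```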

COMPATIBILITY
This change is incompatible with the current debug segment format.  But
that's OK, we're still in development.

Sure, but if we're going to change it, let's change it to something
general that won't need to be changed again after version 1.0 is
released.

This is the argument that makes me think we should hold off on the HLL debug seg for a little while, until somebody (maybe myself) can come up with a design that better meets the needs of HLLs.

This is something that Dan Sugalski mooted in his "WCB: Full bytecode
metadata" blog entry:
http://www.sidhe.org/~dan/blog/archives/000419.html

I like the idea that each HLL can store whatever kind of metadata it
wants. In particular, I'd like to have my Amber compiler put column
numbers as well as line numbers into the .pbc file, and perhaps even
information about which optimizations it has applied.

Yeah, though we also have to consider how Parrot will know what metadata to show when an error occurs. I guess we need something per language that gets called with a reference to the appropriate chunk of metadata for the current location and knows how to render an error message for that language. Then we just have a default way to dump the data when this is not supplied. We also need to think about how we can efficiently store such metadata in a packfile.
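
As a sketch of that dispatch idea (Python, with every name hypothetical, including the metadata keys and the "Amber" renderer, which is only a guess at what a compiler might register): a per-language callback renders the metadata chunk, and a default dump is used when no callback is registered.

```python
RENDERERS = {}  # language name -> function that renders a metadata chunk


def register_renderer(language, func):
    RENDERERS[language] = func


def default_renderer(metadata):
    # Fallback: dump every key/value pair in a stable order.
    return ", ".join(f"{key}={metadata[key]}" for key in sorted(metadata))


def describe_location(language, metadata):
    # Use the language's own renderer if one was registered.
    render = RENDERERS.get(language, default_renderer)
    return render(metadata)


def amber_renderer(meta):
    # Illustrative only: a language that also records column numbers.
    return f"{meta['file']}:{meta['line']}:{meta['column']}"


register_renderer("Amber", amber_renderer)
```

The point of the fallback is that Parrot never needs to understand a language's metadata to say something useful about where an error happened.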

3) Still being space-efficient on disk

Source segments should probably be compressed. There's a lot of
repetition and whitespace in most source languages, so they tend to
compress really well. Any reference into the source would be an offset
into the uncompressed source (which would only need to be uncompressed
during debugging runs).

Hadn't thought of this... It may be a good idea, provided we can find a compression algorithm that is cheap to implement and free of legal issues. I'll admit now to not knowing a great deal about this kinda stuff.
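
For illustration only, here is how offsets into the uncompressed source could work, sketched in Python. zlib is used purely as an example of a widely available, permissively licensed algorithm; nothing in this thread settles on it. The sample source text and the function name are made up.

```python
import zlib

# Made-up, repetitive source text standing in for a real program.
source = "sub main\n    print 'hello'\nend\n" * 50
packed = zlib.compress(source.encode("utf-8"))


def source_at(packed_segment, offset, length):
    # Offsets are byte offsets into the *uncompressed* source, so the
    # segment is only inflated when a debugger actually asks for text.
    raw = zlib.decompress(packed_segment)
    return raw[offset:offset + length].decode("utf-8")
```

A normal run never calls `source_at`, so the only cost of carrying the source is the compressed bytes on disk.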

The opcode stream will contain one line number per
bytecode instruction.

You are proposing to use a chain of mappings to record the filename; why
not use the same system for recording all kinds of metadata including
line numbers? Sure, there's a small performance penalty - only during
debugging runs - but there's a worthwhile space saving on disk (because
typical HLLs produce a lot of bytecodes per line of source).

HLLs do, but for PASM/PIR that isn't the case. Thus another reason to do something different for each.
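
A small Python sketch of the trade-off, with made-up line-number data and hypothetical function names. Storing (start, line) ranges saves space when many instructions share a line, as with an HLL, but gains nothing when nearly every instruction sits on its own line, as with PASM/PIR.

```python
from bisect import bisect_right


def to_ranges(per_instruction_lines):
    # Collapse runs of the same line number into (start_pc, line) pairs.
    ranges = []
    for pc, line in enumerate(per_instruction_lines):
        if not ranges or ranges[-1][1] != line:
            ranges.append((pc, line))
    return ranges


def line_for(ranges, pc):
    # Find the last range that starts at or before pc.
    starts = [start for start, _ in ranges]
    return ranges[bisect_right(starts, pc) - 1][1]


hll_lines = [1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3]  # many instructions per line
pir_lines = [1, 2, 3, 4, 5, 6]                    # about one instruction per line
```

Here `to_ranges(hll_lines)` needs three pairs for twelve instructions, while `to_ranges(pir_lines)` needs six pairs for six instructions, which is twice the numbers of the plain per-instruction array.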

Thanks,

Jonathan
