Re: [Haskell-cafe] More Language.C work for Google's Summer of Code

2010-03-31 Thread Malcolm Wallace

Malcolm would have to attest to how complete it is w.r.t. say, gcc's
preprocessor,


cpphs is intended to be as faithful to the CPP standard as possible,
whilst still providing the extra flexibility we want in a non-C
environment, e.g. retaining the operator symbols //, /*, and */.  If
the behaviour of cpphs does not match gcc -E, then it is either a bug
(please report it) or an intentional feature.


Real CPP is rather horribly defined as a lexical analyser for C, so it
has a built-in notion of identifier, operator, etc., which is not so
useful for all the other settings in which we just want to use
conditional inclusion or macros.  Also, CPP fully intermingles
conditionals, file inclusion, and macro expansion, whereas cpphs makes
a strenuous effort to separate those things into logical phases: first
the conditionals and inclusions, then macro expansion.  This
separation makes it possible to run only one or the other of the phases,
which can occasionally be useful.
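For illustration, a minimal sketch of driving that phase separation from
the cpphs library API (runCpphs, defaultCpphsOptions and the BoolOptions
record, as documented on Hackage); the exact option and field names here
are assumptions and may differ between cpphs versions:

import Language.Preprocessor.Cpphs  -- assumed: runCpphs, defaultCpphsOptions,
                                    -- defaultBoolOptions, boolopts, macros, locations

-- Run only the conditional/inclusion phase, leaving macros unexpanded.
conditionalsOnly :: FilePath -> String -> IO String
conditionalsOnly file input = runCpphs opts file input
  where
    opts = defaultCpphsOptions
      { boolopts = defaultBoolOptions
          { macros    = False  -- assumed flag: skip the macro-expansion phase
          , locations = True   -- keep #line markers for error reporting
          }
      }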


 One concern is that Language.C is BSD-licensed (and it would be  
nice to keep it that way), and cpphs is LGPL. However, if cpphs  
remained a separate program, producing C + extra stuff as output, and  
the Language.C parser understood the extra stuff, this could  
accomplish what I'm interested in.


As for licensing: yes, cpphs, as a standalone binary, is GPL.  The
library version is LGPL.  One misconception is that a BSD-licensed  
library cannot use an LGPL'd library - of course it can.  You just  
need to ensure that everyone can update the LGPL'd part if they wish.   
And as I always state for all of my tools, if the licence is a problem  
for any user, contact me to negotiate terms.  I'm perfectly willing to  
allow commercial distribution with exemption from some of the GPL  
obligations.  (And I note in passing that other alternatives like gcc  
are also GPL'd.)


Regards,
Malcolm
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] More Language.C work for Google's Summer of Code

2010-03-30 Thread Serguey Zefirov
I tried to devise a C preprocessor, but then I figured out that I
could write something like this:
---
#define A(arg) A_start (arg) A_end

#define A_start this is A_start definition.
#define A_end this is A_end definition.

A (
#undef A_start
#define A_start A_end
)
---

gcc preprocesses it into the following:
---
# 1 "a.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "a.c"





this is A_end definition. () this is A_end definition.
---

Another woe is filenames in angle brackets for #include. They
require a special case in the tokenizer.

So I gave it (a fully compliant C preprocessor) up. ;)

Other than that, the C preprocessor looks simple.

I hardly qualify as a student, though.

2010/3/30 Aaron Tomb at...@galois.com:
 The first is to integrate preprocessing into the library. Currently, the
 library calls out to GCC to preprocess source files before parsing them.
 This has some unfortunate consequences, however, because comments and macro
 information are lost. A number of program analyses could benefit from
 metadata encoded in comments, because C doesn't have any sort of formal
 annotation mechanism, but in the current state we have to resort to ugly
 hacks (at best) to get at the contents of comments. Also, effective
 diagnostic messages need to be closely tied to original source code. In the
 presence of pre-processed macros, column number information is unreliable,
 so it can be difficult to describe to a user exactly what portion of a
 program a particular analysis refers to. An integrated preprocessor could
 retain comments and remember information about macros, eliminating both of
 these problems.
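For reference, the current GCC-based flow looks roughly like the standard
example from the language-c documentation (parseCFile with newGCC); by the
time the AST comes back, comments and macro structure are already gone:

import Language.C            -- parseCFile, CTranslUnit (re-exported)
import Language.C.System.GCC -- newGCC

-- Preprocess with the external gcc binary, then parse the result.
parseWithGcc :: FilePath -> IO CTranslUnit
parseWithGcc file = do
  result <- parseCFile (newGCC "gcc") Nothing [] file
  case result of
    Left err  -> error (show err)
    Right ast -> return ast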

 The second possible project is to create a nicer interface for traversals
 over Language.C ASTs. Currently, the symbol table is built to include only
 information about global declarations and those other declarations currently
 in scope. Therefore, when performing multiple traversals over an AST, each
 traversal must re-analyze all global declarations and the entire AST of the
 function of interest. A better solution might be to build a traversal that
 creates a single symbol table describing all declarations in a translation
 unit (including function- and block-scoped variables), for easy reference
 during further traversals. It may also be valuable to have this traversal
 produce a slightly-simplified AST in the process. I'm not thinking of
 anything as radical as the simplifications performed by something like CIL,
 however. It might simply be enough to transform variable references into a
 form suitable for easy lookup in a complete symbol table like I've just
 described. Other simple transformations such as making all implicit casts
 explicit, or normalizing compound initializers, could also be good.
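A hypothetical sketch of the "single symbol table per translation unit"
idea: one pass records every declaration keyed by its scope path, so later
traversals can do plain lookups instead of re-running name analysis.  All
names below are invented for illustration; a real version would reuse
Language.C's Ident and declaration types.

import           Data.List (inits)
import qualified Data.Map  as Map

type Name      = String            -- stand-in for Language.C's Ident
type ScopePath = [String]          -- e.g. ["f", "block3"] for a block-local

data DeclInfo = DeclInfo
  { declType  :: String            -- stand-in for a real type representation
  , declScope :: ScopePath
  } deriving Show

newtype TUSymTab = TUSymTab (Map.Map (ScopePath, Name) DeclInfo)

-- Look a name up starting from the innermost scope and walking outwards
-- to file scope, mirroring C's lexical scoping rules.
lookupDecl :: ScopePath -> Name -> TUSymTab -> Maybe DeclInfo
lookupDecl scope n (TUSymTab m) =
  case [ d | s <- reverse (inits scope), Just d <- [Map.lookup (s, n) m] ] of
    (d:_) -> Just d
    []    -> Nothing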

 A third possibility, which would probably depend on the integrated
 preprocessor, would be to create an exact pretty-printer. That is, a
 pretty-printing function such that pretty . parse is the identity.
 Currently, parse . pretty should be the identity, but it's not true the
 other way around. An exact pretty-printer would be very useful in creating
 rich presentations of C source code --- think LXR on steroids.
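A sketch of the two round-trip directions, stated as predicates over
(already preprocessed) source text; parseC, pretty and the helper modules
are assumed from the language-c documentation:

import Language.C                   -- parseC, pretty (re-exported)
import Language.C.Data.InputStream  -- inputStreamFromString
import Language.C.Data.Position     -- initPos

-- parse . pretty: re-parsing the pretty-printed AST succeeds.
-- This is the direction that is expected to hold today.
reparses :: String -> Bool
reparses src =
  case parseC (inputStreamFromString src) (initPos "input.c") of
    Left _    -> True   -- ignore inputs that don't parse at all
    Right ast ->
      either (const False) (const True) $
        parseC (inputStreamFromString (show (pretty ast))) (initPos "pretty.c")

-- pretty . parse: what an *exact* pretty-printer would guarantee --
-- printing the AST reproduces the original text byte for byte.
-- This generally fails with the current pretty-printer.
exactPrint :: String -> Bool
exactPrint src =
  case parseC (inputStreamFromString src) (initPos "input.c") of
    Left _    -> True
    Right ast -> show (pretty ast) == src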

 If you're interested in any combination of these, or anything similar, let
 me know. The deadline is approaching quickly, but I'd be happy to work
 together with a student to flesh any of these out into a full proposal.

 Thanks,
 Aaron

 --
 Aaron Tomb
 Galois, Inc. (http://www.galois.com)
 at...@galois.com
 Phone: (503) 808-7206
 Fax: (503) 350-0833


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] More Language.C work for Google's Summer of Code

2010-03-30 Thread Stephen Tetley
On 30 March 2010 18:55, Serguey Zefirov sergu...@gmail.com wrote:


 Other than that, C preprocessor looks simple.



Ah no - apparently anything but simple.

You might want to see Jean-Marie Favre's (very readable, amusing)
papers on the subject. Much of the behaviour of CPP is not defined and
is often inaccurately described; it certainly wouldn't appear to make an
ideal one-summer student project.


http://megaplanet.org/jean-marie-favre/papers/CPPDenotationalSemantics.pdf

There are some others as well from his home page.

Best wishes

Stephen
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] More Language.C work for Google's Summer of Code

2010-03-30 Thread austin seipp
(sorry for the dupe aaron! forgot to add haskell-cafe to senders list!)

Perhaps the best course of action would be to try to extend cpphs to
do things like this? From the looks of the interface, it can already
do some of these things, e.g. not stripping comments from a file:

http://hackage.haskell.org/packages/archive/cpphs/1.11/doc/html/Language-Preprocessor-Cpphs.html#t%3ABoolOptions

Malcolm would have to attest to how complete it is w.r.t. say, gcc's
preprocessor, but if this were to be an SoC project, extending cpphs to
include the needed functionality would probably be much more realistic
than writing a new preprocessor from scratch.
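As a rough sketch (assuming the cpphs library API from the page above --
runCpphs, defaultCpphsOptions, and the stripC89/stripEol flags in
BoolOptions; names may differ between versions), keeping comments would
look something like:

import Language.Preprocessor.Cpphs  -- assumed: runCpphs, defaultCpphsOptions,
                                    -- defaultBoolOptions, boolopts, stripC89, stripEol

-- Preprocess but leave both comment styles in the output, so a
-- comment-aware parser can pick them up later.
preprocessKeepingComments :: FilePath -> String -> IO String
preprocessKeepingComments file input = runCpphs opts file input
  where
    opts = defaultCpphsOptions
      { boolopts = defaultBoolOptions
          { stripC89 = False  -- assumed flag: keep /* ... */ comments
          , stripEol = False  -- assumed flag: keep // comments
          }
      }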

On Tue, Mar 30, 2010 at 12:30 PM, Aaron Tomb at...@galois.com wrote:
 Hello,

 I'm wondering whether there's anyone on the list with an interest in doing
 additional work on the Language.C library for the Summer of Code. There are
 a few enhancements that I'd be very interested in seeing, and I'd love to be a
 mentor for such a project if there's a student interested in working on
 them.

 The first is to integrate preprocessing into the library. Currently, the
 library calls out to GCC to preprocess source files before parsing them.
 This has some unfortunate consequences, however, because comments and macro
 information are lost. A number of program analyses could benefit from
 metadata encoded in comments, because C doesn't have any sort of formal
 annotation mechanism, but in the current state we have to resort to ugly
 hacks (at best) to get at the contents of comments. Also, effective
 diagnostic messages need to be closely tied to original source code. In the
 presence of pre-processed macros, column number information is unreliable,
 so it can be difficult to describe to a user exactly what portion of a
 program a particular analysis refers to. An integrated preprocessor could
 retain comments and remember information about macros, eliminating both of
 these problems.

 The second possible project is to create a nicer interface for traversals
 over Language.C ASTs. Currently, the symbol table is built to include only
 information about global declarations and those other declarations currently
 in scope. Therefore, when performing multiple traversals over an AST, each
 traversal must re-analyze all global declarations and the entire AST of the
 function of interest. A better solution might be to build a traversal that
 creates a single symbol table describing all declarations in a translation
 unit (including function- and block-scoped variables), for easy reference
 during further traversals. It may also be valuable to have this traversal
 produce a slightly-simplified AST in the process. I'm not thinking of
 anything as radical as the simplifications performed by something like CIL,
 however. It might simply be enough to transform variable references into a
 form suitable for easy lookup in a complete symbol table like I've just
 described. Other simple transformations such as making all implicit casts
 explicit, or normalizing compound initializers, could also be good.

 A third possibility, which would probably depend on the integrated
 preprocessor, would be to create an exact pretty-printer. That is, a
 pretty-printing function such that pretty . parse is the identity.
 Currently, parse . pretty should be the identity, but it's not true the
 other way around. An exact pretty-printer would be very useful in creating
 rich presentations of C source code --- think LXR on steroids.

 If you're interested in any combination of these, or anything similar, let
 me know. The deadline is approaching quickly, but I'd be happy to work
 together with a student to flesh any of these out into a full proposal.

 Thanks,
 Aaron

 --
 Aaron Tomb
 Galois, Inc. (http://www.galois.com)
 at...@galois.com
 Phone: (503) 808-7206
 Fax: (503) 350-0833





-- 
- Austin
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] More Language.C work for Google's Summer of Code

2010-03-30 Thread Edward Amsden
I'd be very much interested in working on this library for GSoC. I'm
currently working on an idea for another project, but I'm not certain
how widely beneficial it would be. The preprocessor and
pretty-printing projects sound especially intriguing.

On Tue, Mar 30, 2010 at 1:30 PM, Aaron Tomb at...@galois.com wrote:
 Hello,

 I'm wondering whether there's anyone on the list with an interest in doing
 additional work on the Language.C library for the Summer of Code. There are
 a few enhancements that I'd be very interested in seeing, and I'd love to be a
 mentor for such a project if there's a student interested in working on
 them.

 The first is to integrate preprocessing into the library. Currently, the
 library calls out to GCC to preprocess source files before parsing them.
 This has some unfortunate consequences, however, because comments and macro
 information are lost. A number of program analyses could benefit from
 metadata encoded in comments, because C doesn't have any sort of formal
 annotation mechanism, but in the current state we have to resort to ugly
 hacks (at best) to get at the contents of comments. Also, effective
 diagnostic messages need to be closely tied to original source code. In the
 presence of pre-processed macros, column number information is unreliable,
 so it can be difficult to describe to a user exactly what portion of a
 program a particular analysis refers to. An integrated preprocessor could
 retain comments and remember information about macros, eliminating both of
 these problems.

 The second possible project is to create a nicer interface for traversals
 over Language.C ASTs. Currently, the symbol table is built to include only
 information about global declarations and those other declarations currently
 in scope. Therefore, when performing multiple traversals over an AST, each
 traversal must re-analyze all global declarations and the entire AST of the
 function of interest. A better solution might be to build a traversal that
 creates a single symbol table describing all declarations in a translation
 unit (including function- and block-scoped variables), for easy reference
 during further traversals. It may also be valuable to have this traversal
 produce a slightly-simplified AST in the process. I'm not thinking of
 anything as radical as the simplifications performed by something like CIL,
 however. It might simply be enough to transform variable references into a
 form suitable for easy lookup in a complete symbol table like I've just
 described. Other simple transformations such as making all implicit casts
 explicit, or normalizing compound initializers, could also be good.

 A third possibility, which would probably depend on the integrated
 preprocessor, would be to create an exact pretty-printer. That is, a
 pretty-printing function such that pretty . parse is the identity.
 Currently, parse . pretty should be the identity, but it's not true the
 other way around. An exact pretty-printer would be very useful in creating
 rich presentations of C source code --- think LXR on steroids.

 If you're interested in any combination of these, or anything similar, let
 me know. The deadline is approaching quickly, but I'd be happy to work
 together with a student to flesh any of these out into a full proposal.

 Thanks,
 Aaron

 --
 Aaron Tomb
 Galois, Inc. (http://www.galois.com)
 at...@galois.com
 Phone: (503) 808-7206
 Fax: (503) 350-0833


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] More Language.C work for Google's Summer of Code

2010-03-30 Thread Aaron Tomb
Yes, that would definitely be one productive way forward. One concern  
is that Language.C is BSD-licensed (and it would be nice to keep it  
that way), and cpphs is LGPL. However, if cpphs remained a separate  
program, producing C + extra stuff as output, and the Language.C  
parser understood the extra stuff, this could accomplish what I'm  
interested in. It would be interesting, even, to just extend the  
Language.C parser to support comments, and to tell cpphs to leave them  
in.


There's also another pre-processor, mcpp [1], that is quite featureful  
and robust, and which supports an output mode with special syntax  
describing the origin of the code resulting from macro expansion.


Aaron

[1] http://mcpp.sourceforge.net/

On Mar 30, 2010, at 12:14 PM, austin seipp wrote:

(sorry for the dupe aaron! forgot to add haskell-cafe to senders  
list!)


Perhaps the best course of action would be to try to extend cpphs to
do things like this? From the looks of the interface, it can already
do some of these things, e.g. not stripping comments from a file:

http://hackage.haskell.org/packages/archive/cpphs/1.11/doc/html/Language-Preprocessor-Cpphs.html#t%3ABoolOptions

Malcolm would have to attest to how complete it is w.r.t. say, gcc's
preprocessor, but if this were to be an SoC project, extending cpphs to
include the needed functionality would probably be much more realistic
than writing a new preprocessor from scratch.






--
- Austin
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe



Re: [Haskell-cafe] More Language.C work for Google's Summer of Code

2010-03-30 Thread Nick Bowler
On 19:54 Tue 30 Mar , Stephen Tetley wrote:
 On 30 March 2010 18:55, Serguey Zefirov sergu...@gmail.com wrote:
  Other than that, C preprocessor looks simple.
 
 Ah no - apparently anything but simple.

I would describe it as simple but somewhat annoying.  This means that
guessing at its specification will not result in anything resembling a
correct implementation, but reading the specification and implementing
accordingly is straightforward.

Probably the hardest part is expression evaluation.
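To make that concrete, here is a tiny sketch of the *evaluation* side only
(parsing omitted): identifiers left after macro expansion evaluate to 0,
defined(NAME) to 0 or 1, and any nonzero value counts as true.  The AST
and names are invented for illustration:

import qualified Data.Map as Map

data PPExpr
  = Lit Integer                 -- an integer literal
  | Ident String                -- an identifier surviving macro expansion
  | Defined String              -- defined(NAME)
  | Not PPExpr
  | And PPExpr PPExpr
  | Or  PPExpr PPExpr
  | Rel Ordering PPExpr PPExpr  -- <, ==, > comparisons
  deriving Show

type Macros = Map.Map String String

eval :: Macros -> PPExpr -> Integer
eval _ (Lit n)     = n
eval _ (Ident _)   = 0            -- unknown identifiers evaluate to 0
eval m (Defined n) = if Map.member n m then 1 else 0
eval m (Not e)     = if eval m e == 0 then 1 else 0
eval m (And a b)   = fromBool (truthy m a && truthy m b)
eval m (Or  a b)   = fromBool (truthy m a || truthy m b)
eval m (Rel o a b) = fromBool (compare (eval m a) (eval m b) == o)

truthy :: Macros -> PPExpr -> Bool
truthy m e = eval m e /= 0

fromBool :: Bool -> Integer
fromBool b = if b then 1 else 0

-- e.g. eval (Map.fromList [("FOO","1")])
--           (And (Defined "FOO") (Rel GT (Lit 2) (Lit 1)))  ==  1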

 You might want to see Jean-Marie Favre's (very readable, amusing)
 papers on subject. Much of the behaviour of CPP is not defined and
 often inaccurately described, certainly it wouldn't appear to make an
 ideal one summer, student project.

The only specification of the C preprocessor that matters is the one
contained in the specification of the C programming language.  The
accuracy of any other description of it is not relevant.  C is quite
possibly the language with the greatest quantity of inaccurate
descriptions in existence (scratch that, C++ is likely worse).

As with most of the C programming language, a lot of the behaviour is
implementation-defined or even undefined, as you suggest.  For example:

/* implementation-defined */
#pragma launch_missiles

/* undefined */
#define explosion defined
#if explosion
# pragma launch_missiles
#endif

This makes a preprocessor /easier/ to implement, because in these cases
the implementer can do /whatever she wants/, including doing nothing or
starting the missile launch procedure.  In the implementation-defined
case, the implementer must additionally write the decision down
somewhere, i.e. "Upon execution of a #pragma launch_missiles directive,
all missiles are launched."

 http://megaplanet.org/jean-marie-favre/papers/CPPDenotationalSemantics.pdf

If this paper had criticised the actual C standard as opposed to a
working draft, it would have been easier to take it seriously.  I find
the published standard quite clear about the requirements of a C
preprocessor.

Nevertheless, assuming that the complaints of the paper remain valid, it
appears to boil down to "The C preprocessor is weird, and one must
read its whole specification to understand all of it."  It also seems to
contain a bit of "The C standard does not precisely describe the GNU C
preprocessor."

This work is certainly within the scope of a summer project.

-- 
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] More Language.C work for Google's Summer of Code

2010-03-30 Thread Aaron Tomb

That's very good to hear!

When it comes to preprocessing and exact printing, I think that there  
are various stages of completeness that we could support.


  1) Add support for parsing comments to the Language.C parser. Keep
using an external pre-processor but tell it to leave comments in the
source code. The cpphs pre-processor can do this. The trickiest bit
here would be deciding where to record the comments in the AST: what
AST node is a given comment associated with? We could probably
come up with some general rules, and perhaps certain comments, in
weird locations, would still be ignored. (One possible scheme is
sketched below, after item 4.)


  2) Support correct column numbers for source locations. This falls  
short of complete macro support, but covers one of the key problems  
that macros introduce. The mcpp preprocessor [1] has a special  
diagnostic mode where it adds special comments describing the origin  
of code that resulted from macro expansion. If the parser retained  
comments, we could use this information to help with exact pretty- 
printing.


  3) Modify the pretty-printer to take position information into  
account when pretty-printing (at least optionally). As long as macro  
definitions themselves (as well as #ifdef, etc.) are not in the AST,  
the output will still not be exactly the same as the input, but it'll  
come closer.


  4) Add full support for parsing and expanding macros internally, so  
that both macro definitions and expansions appear in the Language.C  
AST. This is probably a huge project, partly because macros do not  
have to obey the tree structure of the C language in any way. This is  
perhaps beyond the scope of a summer project, but the other steps  
could help prepare for it in the future, and still fully address some  
of the problems caused by the preprocessor along the way.
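Regarding item 1, here is a hypothetical sketch of one attachment scheme:
give every retained comment to the nearest AST node that starts at or
after it (comments trailing the last node would need separate handling).
The types are simplified stand-ins invented for illustration; a real
version would key off the positions in Language.C's NodeInfo.

data Pos = Pos { posLine :: Int, posCol :: Int } deriving (Eq, Ord, Show)

data Comment = Comment { cmtPos :: Pos, cmtText :: String } deriving Show

-- A node is represented only by its start position for this sketch.
attach :: [Comment] -> [Pos] -> [(Pos, [Comment])]
attach comments nodes =
  [ (n, [ c | c <- comments, cmtPos c <= n, not (claimedEarlier c n) ])
  | n <- nodes ]
  where
    -- a comment belongs to the first node at or after its own position
    claimedEarlier c n = any (\n' -> n' < n && cmtPos c <= n') nodes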


Do you think you'd be interested in some subset or variation of 1, 2,  
and 3? Are there other ideas you have? Things I've missed? Things  
you'd do differently?


Thanks,
Aaron


[1] http://mcpp.sourceforge.net/


On Mar 30, 2010, at 1:46 PM, Edward Amsden wrote:


I'd be very much interested in working on this library for GSoC. I'm
currently working on an idea for another project, but I'm not certain
how widely beneficial it would be. The preprocessor and
pretty-printing projects sound especially intriguing.


Re: [Haskell-cafe] More Language.C work for Google's Summer of Code

2010-03-30 Thread Tom Hawkins
On Tue, Mar 30, 2010 at 7:30 PM, Aaron Tomb at...@galois.com wrote:
 Hello,

 I'm wondering whether there's anyone on the list with an interest in doing
 additional work on the Language.C library for the Summer of Code. There are
 a few enhancements that I'd be very interested in seeing, and I'd love to be a
 mentor for such a project if there's a student interested in working on
 them.

Here's another suggestion: A transformer to convert Language.C's AST
to RTL, thus hiding a lot of tedious details like structures, case
statements, variable declarations, typedefs, etc.
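A hypothetical sketch of what such an RTL-like target could look like: a
flat list of labelled instructions over scalar temporaries, with
structures, switch statements and typedefs already lowered away.  All
names here are invented for illustration.

type Label = String
type Temp  = Int

data Operand = Reg Temp | Imm Integer
  deriving Show

data Instr
  = Move   Temp Operand                -- t := operand
  | BinOp  String Temp Operand Operand -- t := a `op` b
  | Load   Temp Operand                -- t := mem[addr]
  | Store  Operand Operand             -- mem[addr] := value
  | Branch Operand Label Label         -- if value /= 0 then l1 else l2
  | Jump   Label
  | Mark   Label                       -- a jump target
  deriving Show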

I started writing a model checker [1] based on Language.C, but got so
bogged down in all the details of C I lost interest.

-Tom

[1] http://hackage.haskell.org/package/afv
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] More Language.C work for Google's Summer of Code

2010-03-30 Thread Aaron Tomb

On Mar 30, 2010, at 3:16 PM, Tom Hawkins wrote:



Here's another suggestion: A transformer to convert Language.C's AST
to RTL, thus hiding a lot of tedious details like structures, case
statements, variable declarations, typedefs, etc.

I started writing a model checker [1] based on Language.C, but got so
bogged down in all the details of C I lost interest.


I would also love to have something along these lines, and would be  
happy to mentor such a project.


On a related note, I have some code sitting around that converts  
Language.C ASTs into a variant of Guarded Commands, and I expect I'll  
release that at some point. For the moment, it's a little too  
intimately tied to the program it's part of, though.
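For readers unfamiliar with the term, here is a purely illustrative sketch
of a Dijkstra-style guarded-command language; the actual variant used in
that code is not public, so every name below is invented.

data Expr
  = Var String
  | IntLit Integer
  | BinOp String Expr Expr
  deriving Show

data GC
  = Skip
  | Assign String Expr
  | Assume Expr            -- proceed only if the guard holds
  | Assert Expr            -- proof obligation to discharge
  | Seq GC GC
  | Choice GC GC           -- nondeterministic choice between guarded branches
  deriving Show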


Aaron
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] More Language.C work for Google's Summer of Code

2010-03-30 Thread Edward Amsden
On Tue, Mar 30, 2010 at 5:14 PM, Aaron Tomb at...@galois.com wrote:
 That's very good to hear!

 When it comes to preprocessing and exact printing, I think that there are
 various stages of completeness that we could support.

  1) Add support for parsing comments to the Language.C parser. Keep using an
 external pre-processor but tell it to leave comments in the source code. The
 cpphs pre-processor can do this. The trickiest bit here would have to do
 with where to record the comments in the AST. What AST node is a given
 comment associated with? We could probably come up with some general rules,
 and perhaps certain comments, in weird locations, would still be ignored.


  2) Support correct column numbers for source locations. This falls short of
 complete macro support, but covers one of the key problems that macros
 introduce. The mcpp preprocessor [1] has a special diagnostic mode where it
 adds special comments describing the origin of code that resulted from macro
 expansion. If the parser retained comments, we could use this information to
 help with exact pretty-printing.

  3) Modify the pretty-printer to take position information into account when
 pretty-printing (at least optionally). As long as macro definitions
 themselves (as well as #ifdef, etc.) are not in the AST, the output will
 still not be exactly the same as the input, but it'll come closer.

  4) Add full support for parsing and expanding macros internally, so that
 both macro definitions and expansions appear in the Language.C AST. This is
 probably a huge project, partly because macros do not have to obey the tree
 structure of the C language in any way. This is perhaps beyond the scope of
 a summer project, but the other steps could help prepare for it in the
 future, and still fully address some of the problems caused by the
 preprocessor along the way.

I haven't looked at the C spec on macros, but I'm pretty motivated and
would like to shoot for a big project.


 Do you think you'd be interested in some subset or variation of 1, 2, and 3?
 Are there other ideas you have? Things I've missed? Things you'd do
 differently?

I'm very interested in all 3 of them, and actually somewhat in #4,
though I'll have to do some reading to understand why you're saying
it's such a big undertaking.


 Thanks,
 Aaron


 [1] http://mcpp.sourceforge.net/



Re: [Haskell-cafe] More Language.C work for Google's Summer of Code

2010-03-30 Thread wren ng thornton

Stephen Tetley wrote:

Much of the behaviour of CPP is not defined and is often inaccurately
described; it certainly wouldn't appear to make an ideal one-summer
student project.


But to give Language.C integrated support for preprocessing, one needn't
implement CPP; one only needs to implement the right API for a
preprocessor to communicate with the parser/analyzer.


Considering all the folks outside of C who use the CPP
*cough*Haskell*cough*, having a stand-alone CPP would be good in its own
right. In fact, I seem to recall there's already one of those floating
around somewhere... ;)

I think it'd be far cooler and more useful to give Language.C integrated
preprocessor support without hard-wiring it to the CPP, especially given
that there are divergent semantics for different CPP implementations, and
given that we could easily imagine wanting to use another preprocessor
(e.g., for annotations, documentation, etc.).
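A hypothetical sketch of that kind of interface: the parser depends only
on a small class, and cpphs, mcpp, gcc -E, or a documentation
preprocessor are just instances.  All names below are invented for
illustration (language-c's existing GCC hook reportedly already goes
through a similar Preprocessor class).

-- The parser/analyzer would be written against this class only.
class CPreprocessor p where
  -- Run the preprocessor on a file, returning preprocessed text plus any
  -- side information (comments, expansion records) it can preserve.
  preprocess :: p -> [CppArg] -> FilePath -> IO (Either String PPOutput)

data CppArg
  = DefineMacro String String
  | IncludeDir FilePath
  deriving Show

data PPOutput = PPOutput
  { ppText     :: String            -- the text handed to the parser
  , ppComments :: [(Int, String)]   -- (line, comment) pairs, if retained
  } deriving Show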

--
Live well,
~wren
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe