Re: Corpus of Java source code with JUnit

Sebastian Lohmeier Fri, 25 May 2012 05:47:13 -0700

>> I'm looking for a corpus/collection of Java source code. The corpus
>
> This is one of the better ones:
> http://qualitascorpus.com/

Thanks, Derek!

>> should comprise multiple projects that come with JUnit test cases that
>> pass and have good test coverage.
>
> This is the flying pig part of your request.

Wouldn't it be possible in theory?

>> I want to test a new programming construct that is supposed to shorten
>> programs without making them harder to understand. In the first instance
>
> How do you plan to measure understanding?

That requires some info on the programming construct: I'm adding indirect anaphora to an extension of Java. Anaphora is a backward relation to a referent previously mentioned in the text, e.g. "He" in "James Gosling invented Java. He does not work for Sun anymore." Indirect anaphora is a backward relation to a referent that has not yet been mentioned in the text but is related to a previously mentioned referent. The relation can be a semantic or a conceptual one. In "An if-then-statement is executed by first evaluating the Expression.", "the Expression" is an indirect anaphor that refers to the expression that is part of an if-then-statement. The semantic information that if-then-statements contain expressions is used to resolve the indirect anaphor.
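To make the resolution step concrete, here is a minimal sketch of how a part-whole schema could be used to find the referent of an indirect anaphor. Everything in it is hypothetical and only for illustration: the class name `IndirectAnaphoraSketch`, the `PARTS` table and the "most recently mentioned whole wins" rule are assumptions, not the syntax or the algorithm of the actual Java extension.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Toy resolver, assumed for illustration only.
final class IndirectAnaphoraSketch {
    // Hypothetical part-whole schema: which parts each kind of whole contains.
    static final Map<String, List<String>> PARTS = Map.of(
        "IfThenStatement", List.of("Expression", "Statement"),
        "MethodDeclaration", List.of("Name", "Body"));

    // Referents mentioned so far, most recent first.
    private final Deque<String> mentioned = new ArrayDeque<>();

    // Record that a referent has been mentioned in the text.
    void mention(String referent) {
        mentioned.push(referent);
    }

    // Find the most recently mentioned whole that contains the requested part.
    Optional<String> resolve(String part) {
        for (String whole : mentioned) {
            if (PARTS.getOrDefault(whole, List.of()).contains(part)) {
                return Optional.of(whole);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        IndirectAnaphoraSketch text = new IndirectAnaphoraSketch();
        text.mention("IfThenStatement");
        // "the Expression" has not been mentioned, but its whole has.
        System.out.println(text.resolve("Expression"));
    }
}
```

A part with no previously mentioned whole yields an empty result, which corresponds to an anaphor that cannot be resolved.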

I used an account of indirect anaphora resolution from cognitive linguistics as a kind of blueprint for implementing indirect anaphora in an extension of Java. The underlying assumption is that the so-called text world model used in the cognitive account to resolve an indirect anaphor is equivalent to an AST constructed by a Java compiler. Also, conceptual schemata are assumed to be similar to class declarations, e.g. with respect to the part-whole relations that both specify. Since text understanding is described in cognitive linguistics as the construction of a text world model, and I treat the AST as if it were a text world model, one way to measure understanding would then be to measure how many nodes/relations the compiler creates in the AST.
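As a rough sketch of that measure, the following counts the nodes and parent-child relations of a small tree. The `AstNode` class is a hypothetical stand-in, not the tree representation of any real Java compiler, and counting only parent-child edges as "relations" is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy AST node; a real compiler's tree classes would differ.
final class AstNode {
    final String kind;
    final List<AstNode> children = new ArrayList<>();

    AstNode(String kind, AstNode... kids) {
        this.kind = kind;
        for (AstNode k : kids) {
            children.add(k);
        }
    }

    // Number of nodes in the subtree rooted here, including this node.
    int countNodes() {
        int n = 1;
        for (AstNode c : children) {
            n += c.countNodes();
        }
        return n;
    }

    // Number of parent-child relations (edges) in the subtree.
    int countRelations() {
        return countNodes() - 1;
    }
}

final class AstSizeDemo {
    // Hand-built tree for: if (x > 0) { y = x; }
    static AstNode sampleIfStatement() {
        return new AstNode("IfStatement",
            new AstNode("BinaryExpression",
                new AstNode("Name:x"),
                new AstNode("Literal:0")),
            new AstNode("Block",
                new AstNode("Assignment",
                    new AstNode("Name:y"),
                    new AstNode("Name:x"))));
    }

    public static void main(String[] args) {
        AstNode tree = sampleIfStatement();
        System.out.println("nodes=" + tree.countNodes()
            + " relations=" + tree.countRelations());
    }
}
```

For the hand-built tree this gives 8 nodes and 7 relations. Comparing such counts for a program written with and without indirect anaphors would be one way to put numbers on the measure.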

I.e., if a compiler is constructed according to a cognitive theory of text understanding, and both implementation and theory match human performance, then source code that the compiler processes without error will also be understood by a programmer.

To figure out whether the implementation of the compiler matches the theory as well as how humans understand text/source code, a controlled experiment could be used. IDEs provide functions like "go to declaration" to allow a programmer to get more info on a program element. One could count how often a programmer uses such functions for indirect anaphors, i.e. how often a programmer asks the IDE to present the referent of an indirect anaphor because he is not able to resolve it himself. The more often a programmer asks for the resolution of a referent, the lower his understanding of indirect anaphors in source code.
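The counting itself could look roughly like the sketch below. It assumes the IDE interactions have already been logged in some form; the `LookupEvent` record layout, the participant labels and the `AnaphorLookupCounter` class are all hypothetical, and no real IDE logging API is used.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical log entry: one "go to declaration" request by one participant.
final class LookupEvent {
    final String participant;
    final boolean onIndirectAnaphor; // true if the target element was an indirect anaphor

    LookupEvent(String participant, boolean onIndirectAnaphor) {
        this.participant = participant;
        this.onIndirectAnaphor = onIndirectAnaphor;
    }
}

final class AnaphorLookupCounter {
    // Count, per participant, the requests that targeted an indirect anaphor.
    static Map<String, Integer> countLookups(List<LookupEvent> log) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (LookupEvent e : log) {
            if (e.onIndirectAnaphor) {
                counts.merge(e.participant, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<LookupEvent> log = List.of(
            new LookupEvent("P1", true),
            new LookupEvent("P1", false), // lookup on an ordinary identifier, ignored
            new LookupEvent("P2", true),
            new LookupEvent("P1", true));
        System.out.println(countLookups(log));
    }
}
```

Under the hypothesis above, a higher count for a participant would indicate lower understanding of indirect anaphors. Normalising by the number of anaphors shown to each participant would probably be needed before comparing participants.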


Sebastian

--
The Open University is incorporated by Royal Charter (RC 000391), an exempt charity 
in England & Wales and a charity registered in Scotland (SC 038302).
