Re: Corpus of Java source code with JUnit

2012-05-25 Thread Derek M Jones

Sebastian,


>>> should comprise multiple projects that come with JUnit test cases that
>>> pass and have good test coverage.
>>
>> This is the flying pig part of your request.
>
> Wouldn't it be possible in theory?


I'm sure you can find plenty of Java software that comes with some
kind of test suite.  Unit test level and/or good coverage, possible
in theory.

As for flying pigs, I'm sure they could be genetically engineered to
grow wings.  Power to weight ratio is the big problem.  Perhaps they
could be taught to climb trees and throw themselves off like flying
foxes.


> That requires some info on the programming construct: I'm adding
> indirect anaphora to an extension of Java. Anaphora is a backward
> relation to a referent previously mentioned in the text, e.g. "He" in
> "James Gosling invented Java. He does not work for Sun anymore."
> Indirect anaphora is a backward relation to a referent that has not yet
> been mentioned in the text but is related to a previously mentioned
> referent. The relation can be a semantic or a conceptual one. In "An


Sounds a bit like name binding in lambda calculus.


> I used an account of indirect anaphora resolution from cognitive
> linguistics as a kind of blueprint for implementing indirect anaphora
> in an extension of Java. The underlying assumption is that the so-called


There is also a big underlying assumption that there is enough locality
of reference to make a new construct supporting anaphora worthwhile.
This might apply in some domains; scientific computing springs to mind.

Too much use of anaphora will create lots of ambiguity.
"Jim killed the man with the telescope" (who had the telescope? Was it
the murder weapon?)


> To figure out whether the implementation of the compiler matches the
> theory as well as how humans understand text/source code, a controlled
> experiment could be used. IDEs provide functions like "go to
> declaration" to allow a programmer to get more info on a program
> element. One could count how often a programmer uses such functions for
> indirect anaphors, i.e. how often a programmer asks the IDE to present
> the referent of an indirect anaphor because he is not able to resolve it
> himself. The more often a programmer asks for the resolution of a
> referent, the lower his understanding of indirect anaphors in source code.


or the more ambiguous the anaphora were, or because other information
was required, or that option was easier to use, or the programmer did
not understand the language construct, ...

--
Derek M. Jones  tel: +44 (0) 1252 520 667
Knowledge Software Ltd  blog: shape-of-code.coding-guidelines.com
Source code analysis    http://www.knosof.co.uk

--
The Open University is incorporated by Royal Charter (RC 000391), an exempt charity 
in England & Wales and a charity registered in Scotland (SC 038302).


Re: Corpus of Java source code with JUnit

2012-05-25 Thread Lorin Hochstein

On May 25, 2012, at 7:56 AM, Sebastian Lohmeier wrote:

> I'm looking for a corpus/collection of Java source code. The corpus should 
> comprise multiple projects that come with JUnit test cases that pass and have 
> good test coverage.
> 
> I want to test a new programming construct that is supposed to shorten 
> programs without making them harder to understand. In the first instance I 
> want to make sure that it doesn't break the code at compile- or runtime.
> 
> If anyone knows of a good corpus/collection, please let me know.
> 
> Thanks in advance!
> 
> Sebastian
> 

The Software-artifact Infrastructure Repository (SIR) hosted at University of 
Nebraska-Lincoln might have some content that could help you: 
http://sir.unl.edu/

Take care,

Lorin Hochstein



Re: Corpus of Java source code with JUnit

2012-05-25 Thread Sebastian Lohmeier

>> I'm looking for a corpus/collection of Java source code. The corpus
>
> This is one of the better ones:
> http://qualitascorpus.com/


Thanks, Derek!


>> should comprise multiple projects that come with JUnit test cases that
>> pass and have good test coverage.
>
> This is the flying pig part of your request.


Wouldn't it be possible in theory?


>> I want to test a new programming construct that is supposed to shorten
>> programs without making them harder to understand. In the first instance
>
> How do you plan to measure understanding?


That requires some info on the programming construct: I'm adding 
indirect anaphora to an extension of Java. Anaphora is a backward 
relation to a referent previously mentioned in the text, e.g. "He" in 
"James Gosling invented Java. He does not work for Sun anymore." 
Indirect anaphora is a backward relation to a referent that has not yet 
been mentioned in the text but is related to a previously mentioned 
referent. The relation can be a semantic or a conceptual one. In "An 
if-then-statement is executed by first evaluating the Expression.", "the 
Expression" is an indirect anaphor that refers to the expression that is 
part of an if-then-statement. The semantic information that
if-then-statements contain expressions is used to resolve the indirect
anaphor.
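
A minimal sketch of that resolution step, assuming a hand-written
part-whole table. The class name, the table contents and the
most-recent-first search order are all illustrative assumptions; this is
not the actual implementation in the Java extension.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class IndirectAnaphorSketch {
    // Hypothetical semantic knowledge: which parts each kind of construct contains.
    private static final Map<String, List<String>> PARTS = new HashMap<>();
    static {
        PARTS.put("if-then-statement", Arrays.asList("Expression", "Statement"));
        PARTS.put("method-declaration", Arrays.asList("Identifier", "Block"));
    }

    // Resolve "the <part>" to the most recently mentioned referent that,
    // according to the part-whole table, contains such a part.
    public static Optional<String> resolve(String part, List<String> mentioned) {
        for (int i = mentioned.size() - 1; i >= 0; i--) {
            String anchor = mentioned.get(i);
            if (PARTS.getOrDefault(anchor, Collections.emptyList()).contains(part)) {
                return Optional.of(anchor);
            }
        }
        return Optional.empty(); // no previously mentioned referent has this part
    }

    public static void main(String[] args) {
        // "An if-then-statement is executed by first evaluating the Expression."
        List<String> mentioned = Arrays.asList("if-then-statement");
        System.out.println(resolve("Expression", mentioned));
    }
}
```

Here "the Expression" resolves to the if-then-statement only because the
table says that if-then-statements contain expressions; a part that no
mentioned referent contains stays unresolved.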


I used an account of indirect anaphora resolution from cognitive
linguistics as a kind of blueprint for implementing indirect anaphora
in an extension of Java. The underlying assumption is that the so-called
text world model used in the cognitive account to resolve an indirect
anaphor is equivalent to an AST constructed by a Java compiler. Also,
conceptual schemata are assumed to be similar to class declarations, e.g.
with respect to the part-whole relations that both specify. Since
cognitive linguistics describes text understanding as the construction of
a text world model, and I treat the AST as if it were a text world model,
one way to measure understanding would be to measure how many
nodes/relations the compiler creates in the AST.
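
As a rough illustration of such a count, the following sketch parses a
source string with the JDK's own compiler Tree API and counts the tree
nodes visited. It assumes a plain JDK with javac available, and it counts
parse-tree nodes only; relations that a compiler adds later (e.g. during
name resolution) are not included. The class and helper names are made up
for this example.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.Tree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.Collections;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class AstNodeCount {
    // In-memory source file, so the sketch needs no files on disk.
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), JavaFileObject.Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    // Parse one compilation unit with javac and count every non-null tree node.
    public static int countNodes(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null,
                Collections.singletonList(new StringSource(className, code)));
        final int[] count = {0};
        TreeScanner<Void, Void> counter = new TreeScanner<Void, Void>() {
            @Override
            public Void scan(Tree node, Void unused) {
                if (node != null) {
                    count[0]++;
                }
                return super.scan(node, unused); // descend into child nodes
            }
        };
        try {
            for (CompilationUnitTree unit : task.parse()) {
                counter.scan(unit, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println(countNodes("A", "class A { int f(int x) { return x; } }"));
    }
}
```

Comparing the count for a program written with and without indirect
anaphors would be one way to use it, under the assumption that the node
count is a meaningful stand-in for the text world model.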


I.e. if a compiler is constructed according to a cognitive theory of
text understanding, and both implementation and theory match human
performance, then source code that the compiler processes without error
will also be understood by a programmer.


To figure out whether the implementation of the compiler matches the 
theory as well as how humans understand text/source code, a controlled 
experiment could be used. IDEs provide functions like "go to 
declaration" to allow a programmer to get more info on a program 
element. One could count how often a programmer uses such functions for 
indirect anaphors, i.e. how often a programmer asks the IDE to present 
the referent of an indirect anaphor because he is not able to resolve it 
himself. The more often a programmer asks for the resolution of a 
referent, the lower his understanding of indirect anaphors in source code.
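
The counting itself could look roughly like the sketch below. The log
format (one entry per "go to declaration" request, flagged by whether the
target was an indirect anaphor) and all names are assumptions made for
illustration; a real IDE would need instrumentation to produce such a
log, and the resulting rate is only a proxy for understanding.

```java
import java.util.Arrays;
import java.util.List;

public class LookupCounter {
    // Hypothetical log entry: one "go to declaration" request made in the IDE.
    public static final class Lookup {
        final String participant;
        final boolean onIndirectAnaphor;

        public Lookup(String participant, boolean onIndirectAnaphor) {
            this.participant = participant;
            this.onIndirectAnaphor = onIndirectAnaphor;
        }
    }

    // Number of requests a participant made on indirect anaphors.
    public static long countAnaphorLookups(List<Lookup> log, String participant) {
        return log.stream()
                .filter(e -> e.participant.equals(participant) && e.onIndirectAnaphor)
                .count();
    }

    // Requests per indirect anaphor shown to the participant; assumes anaphorsShown > 0.
    public static double lookupRate(List<Lookup> log, String participant, int anaphorsShown) {
        return (double) countAnaphorLookups(log, participant) / anaphorsShown;
    }

    public static void main(String[] args) {
        List<Lookup> log = Arrays.asList(
                new Lookup("p1", true),
                new Lookup("p1", false), // request on an ordinary identifier
                new Lookup("p1", true),
                new Lookup("p2", true));
        System.out.println(lookupRate(log, "p1", 10));
    }
}
```

Normalising by the number of indirect anaphors shown keeps participants
who read different amounts of code comparable.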


Sebastian



Re: Corpus of Java source code with JUnit

2012-05-25 Thread Derek M Jones

Sebastian,


> I'm looking for a corpus/collection of Java source code. The corpus


This is one of the better ones:
http://qualitascorpus.com/


> should comprise multiple projects that come with JUnit test cases that
> pass and have good test coverage.


This is the flying pig part of your request.


> I want to test a new programming construct that is supposed to shorten
> programs without making them harder to understand. In the first instance


How do you plan to measure understanding?

--
Derek M. Jones  tel: +44 (0) 1252 520 667
Knowledge Software Ltd  blog: shape-of-code.coding-guidelines.com
Source code analysis    http://www.knosof.co.uk
