Re: Redefining the word language

2011-03-04 Thread Thomas Green
And where do people want to put Inform 7, the interactive fiction  
language that is a subset of English and has some semantic inference  
built in?


http://inform7.com/

Is it a programming language? a restricted natural language? both?  
neither?


Thomas

On 4 Mar 2011, at 17:17, Kari Laitinen wrote:



 The definition of 'language' depends on who you are talking to.

I think that based on this discussion and earlier
discussions, it is not always clear what the term
programming language means. From the compilation
point of view the term is clear, i.e., the lexical
rules and the syntax of the language specify
accurately what the language is.
Kernighan and Ritchie used the term in that sense
in their famous book, thus excluding printf from the language.

From a human point of view, the term is less clear.
A computer program usually contains names (identifiers) that
must be understood by a person who wants to know how
that particular program works. If a program contains
definitions such as

  int nwhite, nother;
  int ndigit[10];

can we say that the names nwhite, nother, and ndigit
belong to the programming language being used?

 If the answer is 'yes', all lexically correct names
 belong to the programming language, and the programming
 language becomes a huge set of symbols.

 If the answer is 'no', one might ask to which language
 these names belong, if they do not belong to the
 programming language being used. They are not
 English words, if English words are those that can
 be found in an English dictionary.

As both of these answers are somehow 'not good', I ended
up proposing the idea that each computer program or any
other document could be seen as containing its own
language. Thus the above names would belong to the
language of the program in which they were used.

A name such as 'nwhite' can mean different things
in different programs. In the program from which I
copied it, it was used to count white space characters.
In a chess-playing application it might be used to
count white chess pieces. A symbol can have a
different meaning depending on the language in which
it is used.
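A minimal C sketch makes the point concrete by using the same name in two programs. The first function uses the white-space test from the character-counting example in Chapter 1 of Kernighan and Ritchie, where the declarations quoted above appear. The second is a hypothetical chess helper. Its 64-character board, with uppercase letters for white pieces, is an assumption made for illustration. Both function names are invented here.

```c
/* In the K&R-style counting program, nwhite counts blanks, tabs and newlines. */
int count_white_space(const char *text)
{
    int nwhite = 0;
    for (; *text != '\0'; text++)
        if (*text == ' ' || *text == '\n' || *text == '\t')
            nwhite++;
    return nwhite;
}

/* Hypothetical chess helper: the same name now counts white pieces.
   Assumed board layout: 64 characters, uppercase letters = white pieces. */
int count_white_pieces(const char board[64])
{
    int nwhite = 0;
    for (int i = 0; i < 64; i++)
        if (board[i] >= 'A' && board[i] <= 'Z')
            nwhite++;
    return nwhite;
}
```

Nothing in C itself says what nwhite counts. Only the surrounding program gives the name its meaning.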

In the paper
http://www.naturalprogramming.com/to_read/estimating_understandability_etc.pdf
I have shown that these 'new' languages can be used
to compare different naming styles in computer programs, and
to show how computer programs relate to other
software documents.

Writing computer programs is a creative activity.
If a computer program is considered to contain
its own language, part of the programming project
is then to create the language that is used in
the program.



--
The Open University is incorporated by Royal Charter (RC 000391), an
exempt charity in England & Wales and a charity registered in
Scotland (SC 038302).




73 Huntington Rd, York YO31 8RL
01904-673675
http://homepage.ntlworld.com/greenery/





Re: Redefining the word language

2011-03-03 Thread Derek M Jones

Richard,


What I *don't* see here is any practical relevance to the question
of whether 'printf' is part of the C language or not, except for


I would come back to my final sentence of my original reply:

The definition of 'language' depends on who you are talking to.

--
Derek M. Jones tel: +44 (0) 1252 520 667
Knowledge Software Ltd mailto:de...@knosof.co.uk
Source code analysis   http://www.knosof.co.uk




Re: Redefining the word language

2011-03-03 Thread Richard O'Keefe

On 4/03/2011, at 12:44 AM, Derek M Jones wrote:

 Richard,
 
 What I *don't* see here is any practical relevance to the question
 of whether 'printf' is part of the C language or not, except for
 
 I would come back to my final sentence of my original reply:
 
 The definition of 'language' depends on who you are talking to.

The original poster claimed that it was *USEFUL* to regard
each program and each document as being in its own language.

I'm still waiting to be shown how that is *USEFUL*.






Re: Redefining the word language

2011-03-02 Thread Derek M Jones

Kari,


By that criterion, printf is definitely part of the C language.


printf is not part of the C syntax or semantics; it is a function
defined in a library.

Fortran and Pascal are examples of languages where I/O is
defined to be part of the syntax/semantics of the language and
not as functions defined in a library (although the I/O constructs are
likely to be mapped to some sequence of internal library calls).

The language/library distinction is important for the compiler,
which in one case has to recognise certain character sequences and
perform special processing of them and in the other just handles
a construct the same as any other function call.
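Here is a small C illustration of the "ordinary function call" side of this distinction. It is a sketch, and format_with and formatter are hypothetical names. snprintf is declared in <stdio.h> like any other function, so it can be stored in a function pointer and called indirectly. An I/O statement that is part of a language's syntax, such as Fortran's WRITE, is not a value that can be handled this way.

```c
#include <stdio.h>

/* Pointer type matching snprintf's parameters (restrict qualifiers on
   parameters do not affect type compatibility). */
typedef int (*formatter)(char *, size_t, const char *, ...);

/* Calls whatever formatting function it is given; the compiler treats
   this like any other call through a function pointer. */
int format_with(formatter f, char *buf, size_t size, int value)
{
    return f(buf, size, "value = %d", value);
}
```

Usage: format_with(snprintf, buf, sizeof buf, 42) leaves "value = 42" in buf.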

As a compiler writer I don't regard printf as being part of the language
but as part of the library.  The average user is unlikely to make
this distinction, and is likely to view the language as being whatever
they can be guaranteed to get out of the box when they obtain a
conforming implementation.

The definition of 'language' depends on who you are talking to.





Re: Redefining the word language

2011-03-02 Thread Richard O'Keefe

On 3/03/2011, at 6:14 AM, Derek M Jones wrote:
 As a compiler writer I don't regard printf as being part of the language
 but as part of the library.

Note however that there are C compilers which, given a call to
{sn,s,f,}printf() with a string literal for the format, will
check that the following arguments conform in number and type
to what the format expects.  This is enormously helpful for the
programmer because there _are_ often bugs of that kind and it
is 100% OK by the standards for C compilers to do checks like
that.
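The following minimal sketch shows the kind of mismatch these checks catch. format_total is a hypothetical name. With gcc, -Wformat (enabled by -Wall) flags the commented-out call, because %d expects an int and total is a double. The ISO C prototype of snprintf says nothing about how the format relates to the remaining arguments. The compiler gets that knowledge from the standard's description of the function.

```c
#include <stdio.h>

/* Formats a total into buf; returns what snprintf returns. */
int format_total(char *buf, size_t size, double total)
{
    // return snprintf(buf, size, "total: %d", total);  /* mismatched: %d needs an int */
    return snprintf(buf, size, "total: %.1f", total);    /* matches the double argument */
}
```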

There are things in the library, notably type generic math,
which a compiler that knows only what is defined elsewhere in the
standard cannot process correctly; they require *something* more.
It could be some special magic non-standard syntax or it could
simply be special knowledge built into the compiler that is
activated by including a particular header, but the existence of
*interfaces* in the library that cannot be defined in the
language means that compiler writers CANNOT treat (all of)
the library as not part of the language.
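Type-generic math can be illustrated with a minimal sketch, assuming a C99 or later implementation that provides <tgmath.h>. The three function names are hypothetical. Under <tgmath.h>, sqrt selects the float, double or long double version from its argument's type, so the result type follows the argument. C99 gave programmers no portable way to write such a macro themselves. C11 later added _Generic, which makes this kind of dispatch writable in ordinary code.

```c
#include <stddef.h>
#include <tgmath.h>

/* sizeof does not evaluate its operand, so no square root is computed;
   only the type chosen by the type-generic sqrt is inspected. */
size_t sqrt_result_size_float(void)       { return sizeof(sqrt(2.0f)); }
size_t sqrt_result_size_double(void)      { return sizeof(sqrt(2.0)); }
size_t sqrt_result_size_long_double(void) { return sizeof(sqrt(2.0L)); }
```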

This is not unlike the way that certain fossilised phrases in
English use non-standard syntax, like "court martial", the
structure "noun adjective" not being the way English normally
does things.

I suppose I don't need to point out, but I shall anyway, that
unlike Pascal or Ada, where keywords are unavailable to the
programmer, C lets you use keywords as variable names:

#define else ElSe
int else = 42;
else++;
printf("%d\n", else);
#undef else







Re: Redefining the word language

2011-03-02 Thread Derek M Jones

Richard,


On 3/03/2011, at 6:14 AM, Derek M Jones wrote:

As a compiler writer I don't regard printf as being part of the language
but as part of the library.


Note however that there are C compilers which, given a call to
{sn,s,f,}printf() with a string literal for the format, will
check that the following arguments conform in number and type
to what the format expects.  This is enormously helpful for the


For the last 20 years or so my company has sold a tool that
allows developers to specify the name of a function (user defined
or otherwise) and various properties about its arguments and
return value; these are used to check the source during compilation.

Over the last few years data mining tools have started to be used
to extract ordering dependencies for user defined functions which
can then be checked at compile time.

The only reason that C library functions tend to be checked before
user defined functions is that it requires less work from the
compiler implementor.
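Derek's point that user-defined functions can be checked as well can be illustrated with a GCC and Clang extension, which is not ISO C. The format function attribute tells the compiler to apply its printf format checks to a user-defined function. This is a minimal sketch, and log_message and PRINTF_LIKE are hypothetical names.

```c
#include <stdarg.h>
#include <stdio.h>

/* format(printf, 3, 4): parameter 3 is a printf-style format and the
   variable arguments start at parameter 4. Expands to nothing elsewhere. */
#if defined(__GNUC__)
#define PRINTF_LIKE(fmt_index, first_arg) \
    __attribute__((format(printf, fmt_index, first_arg)))
#else
#define PRINTF_LIKE(fmt_index, first_arg)
#endif

int log_message(char *buf, size_t size, const char *fmt, ...) PRINTF_LIKE(3, 4);

/* Writes "[log] " followed by the formatted message; returns the total
   length that would have been written, as snprintf does. */
int log_message(char *buf, size_t size, const char *fmt, ...)
{
    int prefix = snprintf(buf, size, "[log] ");
    if (prefix < 0 || (size_t)prefix >= size)
        return prefix;
    va_list args;
    va_start(args, fmt);
    int rest = vsnprintf(buf + prefix, size - (size_t)prefix, fmt, args);
    va_end(args);
    return rest < 0 ? rest : prefix + rest;
}
```

With the attribute in place, a call such as log_message(buf, sizeof buf, "%d doors", "two") draws the same -Wformat warning that the equivalent snprintf call would.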





Re: Redefining the word language

2011-03-02 Thread Richard O'Keefe

On 3/03/2011, at 2:41 PM, Derek M Jones wrote:
 For the last 20 years or so my company has sold a tool that
 allows developers to specify the name of a function (user defined
 or otherwise) and various properties about its arguments and
 return value; these are used to check the source during compilation.

That's good to know.  But the properties for the standard library
functions come from the 'Programming language - C' standard, not
from the user.  That is the point I was making.  They are not
PRIVATE vocabulary.

Look, if I look at a book about a human language, such as
An Introduction to Persian (revised 3rd edition)
W. M. Thackston
IranBooks 1993
you find the sections go
Phonology and script   -- lexical structure
Grammar  -- what it says, 25 lessons
Classical and Archaic Usages
Colloquial Transformations
Appendices
Examples
Dictionary (English-Persian, Persian-English)
Does this mean that the vocabulary is separable from the
language?  No.  Every lesson introduces vocabulary.
This is common practice in language books.  If you had a grammar
without vocabulary, what could you say?

In C, 'int' is a reserved word.  In Algol 68, it was part of the
standard prelude.  And so what?

What I *don't* see here is any practical relevance to the question
of whether 'printf' is part of the C language or not, except for
someone who is writing a parser and needs to know whether it has
special syntax.  In C, 'return' is a keyword, but most C programmers
treat it syntactically like a function call.  There are languages
in which return *is* a function call.  When I'm writing *in* a
language, if I ask the question "what is the normal way to do X in
this language?", it makes no practical difference to me whether it
involves special syntax or not.  I've used languages where
"absolute value of X" is written |X|, or abs X, or abs(X).  The last
appears to use a normal function call, the middle one uses a kind of
syntax that is common (unary operators like + and -) even if this
one happens to be less usual, and the first requires a special kind
of syntax (outfix operators) that is rare.  But they are all
equally part of the language for someone *using* that language.
They are defined in the report or standard or whatever it's called.






Re: Redefining the word language

2011-03-01 Thread Kari Laitinen


Richard O'Keefe wrote:

If you came across a sentence written using English syntax and
closed-class words but Japanese open class words, would it
still be English?


According to the theory that I have presented in
http://www.naturalprogramming.com/to_read/estimating_understandability_etc.pdf
such a sentence could be seen as containing a language of its
own. That language would contain symbols (words) that might
be found in Japanese and English dictionaries.



By that criterion, printf is definitely part of the C language.


In my earlier post I said that it has been said that the printf
function does not belong to the C language. In the paper
http://www.alcatel-lucent.com/bstj/vol57-1978/articles/bstj57-6-1991.pdf
Ritchie et al. discuss the C language.
They say on page 2015 (page 25 in the PDF file) that "The C language
provides no facility for I/O.", and they continue discussing how
things would be if the printf function were built into the language.
I take this to mean that Ritchie et al. think that printf does
not belong to the C language. Obviously these computer scientists
think that only those textual symbols (keywords) that are built into
the compiler belong to the programming language.

As it seems to be so difficult to define what does or does
not belong to a (programming) language, I thought that it might
be useful to think that each computer program or any other document
contains a language of its own.







Re: Redefining the word language

2011-03-01 Thread Richard O'Keefe

On 1/03/2011, at 11:34 PM, Kari Laitinen wrote:
 In my earlier post I said that it has been said that the printf
 function does not belong to the C language. In the paper
 http://www.alcatel-lucent.com/bstj/vol57-1978/articles/bstj57-6-1991.pdf
 Ritchie et al. discuss the C language.
 They say on page 2015 (page 25 in the PDF file) that "The C language
 provides no facility for I/O.", and they continue discussing how
 things would be if the printf function were built into the language.

For one thing, look at the date.  That's 1978.  That's back when
different structs in C did not define different name spaces -- which is
why to this day structs in the POSIX interface often have otherwise
pointless prefixes on the member names.  It's when the compiler had to
fit into a 16-bit address space.
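Such prefixes survive in standard interfaces. For example, struct tm in <time.h> uses tm_, and POSIX's struct stat uses st_. The following minimal sketch fills in a struct tm and formats it with strftime. format_date is a hypothetical name.

```c
#include <stddef.h>
#include <time.h>

/* Formats year/month/day as YYYY-MM-DD; returns strftime's result
   (characters written, or 0 if buf is too small). */
size_t format_date(char *buf, size_t size, int year, int month, int day)
{
    struct tm t = {0};
    t.tm_year = year - 1900;  /* years since 1900 */
    t.tm_mon = month - 1;     /* months since January, 0-11 */
    t.tm_mday = day;          /* day of the month, 1-31 */
    return strftime(buf, size, "%Y-%m-%d", &t);
}
```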

I think you have misunderstood what Ritchie was talking about.
Programming languages like Fortran and COBOL and PL/I and Pascal
and Visual Basic have special *SYNTAX* for I/O, like

WRITE (UNIT = 6, FMT = '1XI12') (KOUNT(I), I = 1,N)

which would be expressed in C as

for (i = 0; i < n; i++) printf(" %12d", kount[i]);
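For readers who want to run the C side of this comparison, the following sketch wraps the loop above in a function and then ends the line. print_counts is a hypothetical name.

```c
#include <stdio.h>

/* Writes each count as one space followed by the value right-justified
   in a 12-character field, then a newline. */
void print_counts(FILE *out, const int *kount, int n)
{
    int i;
    for (i = 0; i < n; i++)
        fprintf(out, " %12d", kount[i]);
    putc('\n', out);
}
```

Usage: print_counts(stdout, kount, n) prints the n counts on a single line.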

 I take this to mean that Ritchie et al. think that printf does
 not belong to the C language.

Wrong tense.  That is what Ritchie *thought* 33 years ago, if indeed
he thought it.  He certainly talks about the STANDARD library in
that paper.  Try to tell a C programmer that 'malloc' is not part of
C and s/he will laugh you to scorn.  I don't know that he ever *did*
think that printf didn't belong to the C language in the sense
relevant here.  The *parser* doesn't have to be aware of it, and in
a 16-bit environment that's a huge advantage.  But it isn't *optional*;
a C programmer does have to be aware of it.

To put it another way,
The C Programming Language, 2nd edition
Kernighan and Ritchie
is available on line at
http://cgip.inf.unideb.hu/eng/rtornai/Kernighan_Ritchie_Language_C.pdf

Chapter 7 starts with the sentence "Input and output are not part of
the C language itself", true.  But that doesn't mean what you want it
to mean.  It means ONLY that the compiler doesn't HAVE to know
anything special about the transput functions other than what the
<stdio.h> header tells it.  It doesn't mean, and the C standard
explicitly denies that it means, that the compiler isn't ALLOWED to
know all about the standard library functions.  It does not mean,
and the C standard explicitly denies that it means, that these names
may be freely used for other purposes.  On the contrary, the space
of reserved special names in C is actually extremely large.
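The following C sketch gives a partial illustration of how large that space is. It checks a name against a few of the rules in ISO C's reserved-identifier and future-library-directions clauses. might_be_reserved is a hypothetical name, and the rule list is deliberately incomplete. It omits, for example, the <signal.h>, <locale.h> and <stdint.h> patterns, the f/l-suffixed <math.h> names, and file-scope names that begin with an underscore.

```c
#include <stdbool.h>
#include <string.h>

static bool is_lower(char c) { return c >= 'a' && c <= 'z'; }
static bool is_upper(char c) { return c >= 'A' && c <= 'Z'; }
static bool is_digit(char c) { return c >= '0' && c <= '9'; }

bool might_be_reserved(const char *name)
{
    static const char *const prefixes[] = { "is", "to", "str", "mem", "wcs" };
    size_t i;

    /* Always reserved: underscore followed by uppercase or another underscore. */
    if (name[0] == '_' && (is_upper(name[1]) || name[1] == '_'))
        return true;
    /* <errno.h>: E followed by a digit or an uppercase letter. */
    if (name[0] == 'E' && (is_digit(name[1]) || is_upper(name[1])))
        return true;
    /* <ctype.h> is/to and <string.h> str/mem/wcs, each followed by a lowercase letter. */
    for (i = 0; i < sizeof prefixes / sizeof prefixes[0]; i++) {
        size_t len = strlen(prefixes[i]);
        if (strncmp(name, prefixes[i], len) == 0 && is_lower(name[len]))
            return true;
    }
    return false;
}
```

By these rules, ordinary-looking names such as total or strip are reserved as external function names. The helpers above are called is_lower and so on, not islower-style names, for the same reason.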

After all, if I/O isn't part of the language, what's it doing in a
book whose whole reason for existence is to describe that language?

If you look at Algol 68, which rather tepidly inspired some of the
aspects of C, you will discover that it too has an I/O library that
uses no special syntax.  But I/O is spelled out as a required part
of the Algol 68 language in the defining report, and it is spelled
out as a required part of the language in the C standards too.

 Obviously these computer scientists
 think that only those textual symbols (keywords) that are built into
 the compiler belong to the programming language.

Kernighan and Ritchie were using 'language' in the sense of 'that
which MUST be specially known by a parser', i.e., syntax.  If that is
all you mean, it is uncontentious and unilluminating.  If you mean
anything else, you cannot appeal to them for support.

I note that one of the features which they relegated to the library
(setjmp/longjmp) is still part of the library in C89 and C99 *but*
has special syntactic restrictions on where it can be used because
it *requires* very special support from a compiler.  Just because
something is defined in the 'library' chapter instead of the 'syntax'
chapter doesn't mean that the compiler can afford to be ignorant of it.
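The following minimal sketch shows those restrictions in practice. checked_sum and check_nonnegative are hypothetical names. The standard allows a setjmp invocation only in a few contexts. These include the entire controlling expression of an if, or one side of a comparison with an integer constant where that comparison forms the controlling expression. An ordinary initialisation such as int r = setjmp(env); is not among them.

```c
#include <setjmp.h>

static jmp_buf env;

/* Hypothetical helper: abandons the computation on negative input. */
static void check_nonnegative(int x)
{
    if (x < 0)
        longjmp(env, 1);   /* setjmp appears to return 1 */
}

/* Returns the sum of a[0..n-1], or -1 if any element is negative. */
int checked_sum(const int *a, int n)
{
    int sum = 0;           /* not read after longjmp, so need not be volatile */
    if (setjmp(env) == 0) {          /* a permitted context for setjmp */
        for (int i = 0; i < n; i++) {
            check_nonnegative(a[i]);
            sum += a[i];
        }
        return sum;
    }
    return -1;
}
```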
 
 As it seems to be so difficult to define what does or does
 not belong to a (programming) language,

That is part of a programming language which is specified in its
defining document.  Period.  We've been writing language definitions
for about 60 years.  We learned how to be pretty darned good at it,
and then the industry forgot what had been learned.  If something
is in the defining document, you are entitled to expect it to be
_there_ and can demand your money back if it isn't.  It's part of
the language.

 I thought that it might
 be useful to think that each computer program or any other document
 contains a language of its own.

But this is inconsistent with the way you used 'language' above.
You defined the language of C in effect to be the *syntax* of C,
which C programs share.  It is the *vocabulary* which varies from
program to program or document to document.  

Are the mechanisms by which programmers learn the meanings of identifiers
in programs at all similar to the mechanisms by which people learn the
meanings of new words?  Is learning what a Horcrux is 

Re: Redefining the word language

2011-02-24 Thread Adam Smith
Kari and Richard's attention to symbols, definition, and meaning is
highly appropriate, but there's another angle at play here which I
think is more central to the language-ness of programming languages.
I'd like to share an analogy that has stuck with me for some time when
thinking about the distinction between "programming" and "natural" as
modifiers for the concept of language (particularly one that tries to
side-step discussion of symbols).

Consider the representation of some physical value such as the
temperature of a room. We can make a programming representation of
this value with some configuration of a fixed number of bits (an int
in a machine word). Likewise, we can make a natural representation
of the value with, um, another physical value such as a voltage or
height of a mercury column in a thermometer. The programming
representation has a very chunky, discrete range of expression which
delivers a set number of bits of information. Meanwhile, the natural
representation smoothly ranges over some continuous domain, conveying
a fuzzy/undefined amount of information bounded by a complex
interaction of external noise sources and sampling error.

In programming languages, we've got these very discrete sequences of,
say, ASCII characters, some subset of which are blessed by a
particular context-free grammar to be valid. Program code isn't
bounded in the same way as a 32-bit unsigned int, but it has a similar
discrete feeling. Meanwhile, natural language encompasses a seemingly
infinite domain of expression, but the deeper we dig in extracting
information from an utterance, the more we need to make assumptions
about where the message came from and how well we heard it.

Programming representations invite us to interpret
(parse/compile/execute/etc.) them once and be confident that we got
everything important on the first try. Natural representations invite
us to repeatedly ask refining questions, allowing subsequent samples
to change our mind about inferences from the first question. An ADC
(analog-to-digital converter) can convert a noisy (at some level)
electrical signal into a discrete digital symbol by repeatedly
checking whether the signal is currently
above or below certain reference voltages. Indeed any amount of
digital information can be packed into some real value
(http://en.wikipedia.org/wiki/Arithmetic_coding), but for practical
purposes you will stop after some number of iterations. Natural
language can be subject to a similar process where seemingly limitless
information can be sucked out of a fixed natural language input via
close reading and the asking of a lot of tiny questions
(http://en.wikipedia.org/wiki/Deconstruction seems to experimentally
probe for the practical limits here).
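The ADC process described above, which repeatedly asks whether the signal is above or below a reference, can be sketched as a successive-approximation conversion. The sketch assumes an ideal, noise-free comparator, and sar_convert is a hypothetical name. Each iteration decides one bit, most significant first.

```c
#include <stdint.h>

/* Finds the nbits-bit code for an input in [0, vref) by testing one bit at
   a time whether the input is at or above the trial reference level.
   Inputs at or above vref give all ones; nbits outside 1..31 gives 0. */
uint32_t sar_convert(double input, double vref, int nbits)
{
    uint32_t code = 0;
    if (nbits < 1 || nbits > 31)
        return 0;
    for (int bit = nbits - 1; bit >= 0; bit--) {
        uint32_t trial = code | (UINT32_C(1) << bit);
        /* reference level for the trial code: vref * trial / 2^nbits */
        double level = vref * (double)trial / (double)(UINT32_C(1) << nbits);
        if (input >= level)
            code = trial;   /* keep this bit */
    }
    return code;
}
```

For the room-temperature example, sar_convert(21.3, 40.0, 8) gives 136 on an assumed 0 to 40 degree scale. Reading bits back out of a real value works the same way: 0.6875 is binary 0.1011, and sar_convert(0.6875, 1.0, 4) recovers 11 (binary 1011). Each further bit costs one more comparison, which is why in practice you stop after some number of iterations.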

This analogy between languages and simple physical values gives us a
way to talk about particular non-natural-nesses of programming
languages without reference to symbols and semantics. (And if code is
data and you can store any data in a big fat bitstring, then it
shouldn't be saying anything too controversial.) But I'm somewhat of
an advocate for programming languages as languages, so now I want to
show how practical programming representations regularly find
themselves creeping in the natural direction.

Consider the generation of HTML documentation from Java code using
javadoc. The official Java compiler immediately throws away a lot of
information when it reads your source (comments, indentation, the
order of certain declarations, etc.). The javadoc code analyser, on
the other hand, makes a few assumptions about where the code came from
(that the programmer followed certain common practices). This allows
it to slurp up and save many comments and tie them to the constructs
they describe (conventionally, the declaration on the next line). The
tool remembers enough of your (not-so-superfluous) code formatting to
provide click-through links to particular locations in the source.
Certainly human Java programmers read into the text a lot more than
the official compiler does, but this extended, extra-grammatical
interpretation process is not exclusive to humans.
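The convention described above attaches a doc comment to the declaration on the next line, and it can be imitated in a few lines of C. This toy is only a sketch. first_doc_comment and copy_span are hypothetical names, and javadoc itself builds on the Java compiler's parser rather than on text searches like these.

```c
#include <stddef.h>
#include <string.h>

/* Copies [from, to) into dst, truncating to fit; dstsize must be >= 1. */
static void copy_span(char *dst, size_t dstsize, const char *from, const char *to)
{
    size_t len = (size_t)(to - from);
    if (len >= dstsize)
        len = dstsize - 1;
    memcpy(dst, from, len);
    dst[len] = '\0';
}

/* Finds the first comment opened with slash-star-star, copies its body
   into doc, and copies the following line (indentation skipped), taken to
   be the documented declaration, into decl. Returns 1 on success, 0 if no
   such comment or following line exists. */
int first_doc_comment(const char *src, char *doc, size_t docsize,
                      char *decl, size_t declsize)
{
    const char *open = strstr(src, "/**");
    if (open == NULL)
        return 0;
    const char *close = strstr(open + 3, "*/");
    if (close == NULL)
        return 0;
    const char *line = strchr(close + 2, '\n');
    if (line == NULL)
        return 0;
    line++;                                /* start of the next line */
    while (*line == ' ' || *line == '\t')  /* skip indentation */
        line++;
    const char *end = strchr(line, '\n');
    if (end == NULL)
        end = line + strlen(line);
    copy_span(doc, docsize, open + 3, close);
    copy_span(decl, declsize, line, end);
    return 1;
}
```

A compiler would discard the comment entirely. This toy keeps it because it assumes the programmer followed the convention.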

Because I'm a fan of Richard's, this next example uses Prolog.
Metaprogramming regularly involves reading deeper and deeper meanings
from a snippet of object language (with the meta-language being
something we are comfortable calling a programming language). The
Prolog snippet "connect_via(kitchen,dining_room,west)." is a 100%
complete and valid program, but it doesn't do much to execute it
(other than populate a conceptual table). By piling on more and more
assumptions in the surrounding code, we can infer from this snippet
(1) an instruction in the larger process of building a house, (2) a
specification for how some existing house was built, (3) a description
of (query for) houses out of some external database, or many other
things. It's not that we can get these meanings because the snippet ended with
a full stop, making it a complete statement; just knowing that a loose
term of that shape

Re: Redefining the word language

2011-02-23 Thread Richard O'Keefe

On 24/02/2011, at 3:11 AM, Kari Laitinen wrote:
 A classic book about the C programming language
 begins with a program that contains the statement
 
   printf("hello, world");
 
 It has been said, however, that the printf function
 that is used in the above statement does not
 belong to the language itself but it is a library
 function.

We can draw a parallel to open class and closed class
words in a natural language.  Pronouns, determiners,
conjunctions, prepositions; these are not wholly
unlike keywords in a language.  (Indeed, keywords are
often prepositions, and I do not believe this is accidental.)
Variable and function names are not wholly unlike open class
words.

Would you say that open class words are not part of a language?
If you came across a sentence written using English syntax and
closed-class words but Japanese open class words, would it
still be English?

I say that a type, variable, or function belongs to the
language if it is specified in the best available description
of the language.

By that criterion, printf is definitely part of the C language.
A C compiler (like gcc) is fully within its rights to look at
a call of printf() and exploit information in the standard but
not in the type of the identifier to check that the format
string makes sense and agrees with the actual parameters.


 To me, this raises questions: Why
 begin a book about a programming language by
 showing something that does not belong to the language?

Because the claim that it does not is false.


At a minimum, a programming language must provide
 - constants
 - variables
 - user defined functions
 - built in functions
 - combining forms

One important criterion is whether an operation could be provided in
the language if it were not in the standard.  Take fopen() as an
example.  Given only the rest of C, fopen() would not be definable.
Out of putc(), fwrite(), and fprintf(), at least one of them must
be provided in the language; given any one of them the others are
definable.
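Richard's remark that the other output functions are definable from putc can be sketched for fputs and fwrite. my_fputs and my_fwrite are hypothetical names, and the real library functions may be implemented quite differently. Defining fprintf from putc is also possible, but it needs a full format interpreter, so it is omitted here.

```c
#include <stddef.h>
#include <stdio.h>

/* Like fputs: writes s without its terminating NUL; returns 0 on
   success or EOF on error. */
int my_fputs(const char *s, FILE *stream)
{
    for (; *s != '\0'; s++)
        if (putc((unsigned char)*s, stream) == EOF)
            return EOF;
    return 0;
}

/* Like fwrite: writes nmemb objects of the given size byte by byte and
   returns the number of complete objects written. */
size_t my_fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream)
{
    const unsigned char *bytes = ptr;
    for (size_t i = 0; i < nmemb; i++)
        for (size_t j = 0; j < size; j++)
            if (putc(bytes[i * size + j], stream) == EOF)
                return i;
    return nmemb;
}
```

fopen has no such definition in terms of the rest of C, which is Richard's point about why it must be provided.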


