>> On 6/04/16, Linda McIver wrote:

> A couple of reasons. 1. It is widely used in real data science.


"It is widely used by experts" does not mean
"it is suitable for beginners."  All sorts of things are used
in real data science, including the seriously weird SQL.

> 2. nice clean human-readable syntax
> (this is where R seems to hit the wall)

Python has, um, quirks of its own.  Which is why Python 3
exists, but Python 2 isn't going away like it was supposed
to.  So we have the unfortunate case of "Python" programs
that don't work in "Python" (both ways around).
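
A small sketch of what "both ways around" can look like. This is my own
illustration (the function name `mean` is made up for the example). It is
written for Python 3, with comments on what Python 2 would do with the same
text.

```python
# Written for Python 3; comments describe the Python 2 behaviour.

def mean(values):
    # Python 3: "/" is true division, so mean([1, 2]) is 1.5.
    # Python 2: "/" on two ints floors, so the same call quietly gives 1.
    return sum(values) / len(values)

result = mean([1, 2])

# Python 3: print is a function, and the end= keyword works.
# Python 2 (without a __future__ import): this line is a SyntaxError.
print(result, end="\n")

# The other direction: the Python 2 statement form
#     print result
# is a SyntaxError in Python 3.
```

The division case is the nastier one for a beginner, because there is no
error message at all, just a different answer.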

The sadly amusing thing here is that R syntax is a straight
steal from S, and S syntax was largely adapted from C
*in order to have something programmers would be comfortable
with*.  And I must say that compared with GLIM, GenStat, and
SAS, it was a success.  R syntax really isn't as bad as it
seems.  You could call it "C: the good bits" + keyword
parameters.  I think the main barrier for experienced
programmers is the heavy use of "vectorised" forms with
subscripting overloaded in a number of very useful ways,
and the whole-object thinking you have to do.  So maybe
the semantics is the key issue.
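
For readers who have not met this style, here is a rough Python analogue of
subscripting that means different things depending on what you put inside the
brackets. The helper `select` is hypothetical, written for this example. It is
not how R is implemented, and it uses Python's 0-based positions where R uses
1-based ones.

```python
def select(xs, index):
    """Imitate overloaded subscripting on a plain list.

    If `index` is all booleans, treat it as a mask (keep where True).
    Otherwise treat it as a list of 0-based positions.
    """
    if all(isinstance(i, bool) for i in index):
        return [x for x, keep in zip(xs, index) if keep]
    return [xs[i] for i in index]

heights = [150, 172, 181, 165]

# Whole-object thinking: build a mask for the entire list, then subscript
# with it, instead of writing a loop with an if inside.
is_tall = [h > 170 for h in heights]
tall = select(heights, is_tall)        # [172, 181]

# Same "subscript" operation, different meaning: pick by position.
first_last = select(heights, [0, 3])   # [150, 165]
```

The point is only that one notation carries several meanings, and you reason
about the whole collection at once rather than one element at a time.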

Which has me wondering if you evaluated Octave?
(I have a copy of Octave, but almost never used it, so I
have no idea of the quality of its error messages.)

> 3. We
> will do some spreadsheet stuff, but they won't all have access to Excel,

They COULD all have access to LibreOffice.

Spreadsheets are interesting because of Ray Panko's
work on spreadsheet errors, e.g.
"What we know about spreadsheet errors."
http://panko.shidler.hawaii.edu/My%20Publications/Whatknow.htm
"there has long been ample evidence that errors in spreadsheets are
pandemic. Spreadsheets, even after careful development, contain errors in
one percent or more of all formula cells. In large spreadsheets with
thousands of formulas, there will be dozens of undetected errors. Even
significant errors may go undetected because formal testing in spreadsheet
development is rare and because even serious errors may not be apparent."

So there's something worse than poor error messages,
and that's *missing* error messages.
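
A sketch of the difference, in Python. The assumption here is that a typical
spreadsheet SUM over a range skips cells holding text instead of complaining,
which is how I understand Excel to behave. The data and both function names
are invented for the example.

```python
# One cell was typed as "11O" (letter O) instead of 110.
cells = [120, 95, "11O", 101]

def lenient_sum(cells):
    """Spreadsheet-style: silently skip anything that is not a number."""
    return sum(c for c in cells if isinstance(c, (int, float)))

def strict_sum(cells):
    """Complain about the first cell that is not a number."""
    total = 0
    for position, c in enumerate(cells):
        if not isinstance(c, (int, float)):
            raise TypeError(
                "cell %d holds %r, which is not a number" % (position, c)
            )
        total += c
    return total

print(lenient_sum(cells))   # 316, with no hint that a value was dropped

try:
    strict_sum(cells)
except TypeError as err:
    print("error:", err)
```

The lenient version gives a plausible-looking total that is simply wrong.
Even a clumsy error message would be an improvement on that.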

This is one reason why I wish DrRacket included a statistics
(sub)language.
Scheme, like Python and R, is a dynamically typed language, but
DrRacket works hard to try to detect some type errors at
compile time.  (This was one of the strengths of ABC.  It did
not require or even allow the programmer to provide type
annotations, but it did do quite strong type inference/checking,
and this was because it was meant for beginner use.  It's a real
pity that Guido van Rossum abandoned that for Python.)  The
problem with run-time errors is that they are reported a long
time after you wrote the mistake that caused them.
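
A minimal Python illustration of that delay (the function and the typo are
invented for the example). The misspelt name sits in a branch that ordinary
test data never reaches, so nothing is reported until the day an empty list
turns up.

```python
def summarise(readings):
    total = sum(readings)
    if not readings:
        # Typo: "totl" is never defined. Python accepts the function
        # definition anyway, because names are only looked up when the
        # line actually runs.
        return totl
    return total / len(readings)

print(summarise([3.0, 4.0]))   # 3.5, so everything looks fine

try:
    summarise([])              # only now does the mistake surface
except NameError as err:
    print("reported at run time:", err)
```

A checker that resolves names before the program runs would flag `totl` as
soon as it was typed.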

Which leads me to F#.
F# is an interesting candidate for doing data science
because
 - it is an interactive programming language
 - that runs under .NET and Mono
 - which compiles via the .NET intermediate language to
   tolerably good native code, making it practical for
   large amounts of data
 - which has kit available for "data science"
   (search for F# for data science) e.g. http://fslab.org/
 - having an indentation-based syntax these days
 - and is as concise as Python or more so
 - but is strongly typed so more errors are caught early.

I don't know how well novices would cope with its error
messages.  What I'm suggesting here is that the question
of WHEN an error is reported may be as important as HOW
it is reported, and that F#-for-data-science vs Python-
for-data-science might be one way of exploring that.
(The F# compiler is open source, so it should be possible
to reword its error messages.)

There's another issue, which is that syntax-colouring
editors (which I personally hate with a visceral hatred)
may help novices.  For example, Squeak and Pharo have
editors where when you type an identifier, it is red if
the compiler doesn't recognise it, black if it does, so
most spelling mistakes are instantly reported.  Bracket
matching is another way of unobtrusively reporting some
errors very early.

So how are your students writing Python?
Inside IDLE?  Notepad?  Some other way?

> and I want them to learn some basic programming as well, in particular for
> data extraction/cleaning type tasks that rather stretch excel's
> capabilities. I know you can program macros to do most of this stuff, but
> hey presto we are suddenly teaching programming in a
> less-than-beginner-friendly syntax and system.

Indeed.

>
> The main thing is that I want to teach them real skills that they can
> continue to lose.
              ^^^^
Ah, aging.

> Even worse, many of them
> are outright frightened by computation.

There is something worse than a scientist frightened by
computation, and that's one who isn't.  It's OK when it's
an MSc or PhD student, but any project with serious
funding that has nontrivial coding to do would do well to
hire someone who is good at it.  I've got some climate models
on my machine, and I honestly don't know whether to laugh or
scream.  Generally, I end up doing a bit of both.

For what it's worth, there's a Python e-book for schools
that was written here.  Would you like me to have a word
with the people who wrote it to see if they would like to
talk about Python error messages?

