On 6/04/16, Linda McIver wrote:
> A couple of reasons.
> 1. It is widely used in real data science.

"It is widely used by experts" does not mean "it is suitable for beginners." All sorts of things are used in real data science, including the seriously weird SQL.

> 2. nice clean human-readable syntax
> (this is where R seems to hit the wall)

Python has, um, quirks of its own. Which is why Python 3 exists, but Python 2 isn't going away like it was supposed to. So we have the unfortunate case of "Python" programs that don't work in "Python" (both ways around).

The sadly amusing thing here is that R syntax is a straight steal from S, and S syntax was largely adapted from C *in order to have something programmers would be comfortable with*. And I must say that compared with GLIM, GenStat, and SAS, it was a success.

R syntax really isn't as bad as it seems. You could call it "C: the good bits" + keyword parameters. I think the main barrier for experienced programmers is the heavy use of "vectorised" forms with subscripting overloaded in a number of very useful ways, and the whole-object thinking you have to do. So maybe the semantics is the key issue.

Which has me wondering if you evaluated Octave? (I have a copy of Octave, but almost never used it, so I have no idea of the quality of its error messages.)

> 3. We
> will do some spreadsheet stuff, but they won't all have access to Excel,

They COULD all have access to LibreOffice.

Spreadsheets are interesting because of Ray Panko's work on errors and work on spreadsheets.
"What we know about spreadsheet errors."
http://panko.shidler.hawaii.edu/My%20Publications/Whatknow.htm

"there has long been ample evidence that errors in spreadsheets are pandemic. Spreadsheets, even after careful development, contain errors in one percent or more of all formula cells. In large spreadsheets with thousands of formulas, there will be dozens of undetected errors. Even significant errors may go undetected because formal testing in spreadsheet development is rare and because even serious errors may not be apparent."
So there's something worse than poor error messages, and that's *missing* error messages.

This is one reason why I wish DrRacket included a statistics (sub)language. Scheme, like Python and R, is a dynamically typed language, but DrRacket works hard to try to detect some type errors at compile time. (This was one of the strengths of ABC. It did not require or even allow the programmer to provide type annotations, but it did do quite strong type inference/checking, and this was because it was meant for beginner use. It's a real pity that Guido van Rossum abandoned that for Python.) The problem with run-time errors is that they are reported a long time after you wrote the mistake that caused them.

Which leads me to F#. F# is an interesting candidate for doing data science because
 - it is an interactive programming language
 - that runs under .NET and Mono
 - which compiles via the .NET intermediate language to tolerably good native code, making it practical for large amounts of data
 - which has kit available for "data science" (search for F# for data science) e.g. http://fslab.org/
 - having an indentation-based syntax these days
 - and is as concise as Python or more so
 - but is strongly typed so more errors are caught early.

I don't know how well novices would cope with its error messages.

What I'm suggesting here is that the question of WHEN an error is reported may be as important as HOW it is reported, and that F#-for-data-science vs Python-for-data-science might be one way of exploring that. (The F# compiler is open source, so it should be possible to reword its error messages.)

There's another issue, which is that syntax-colouring editors (which I personally hate with a visceral hatred) may help novices. For example, Squeak and Pharo have editors where when you type an identifier, it is red if the compiler doesn't recognise it, black if it does, so most spelling mistakes are instantly reported.
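
A rough Python sketch of the timing point: the misspelled name below sits on a branch that the sample data never reaches, so running the program reports nothing, while a crude look at the syntax tree flags it before anything runs. The names `undefined_names` and `SAMPLE` are hypothetical, invented for this illustration, and the check ignores scoping, so it is only an illustration and not a substitute for a real linter or editor.

```python
import ast
import builtins

# Hypothetical sample program: "totl" is a misspelling of "total",
# hidden on a branch that this data never reaches.
SAMPLE = '''
total = 0
for value in [1, 2, 3]:
    total = total + value
if total > 100:
    print(totl)
print(total)
'''


def undefined_names(source):
    """Return names that are read but never bound anywhere in the source.

    Assumption: scoping is ignored, so this is a crude early check only.
    """
    known = set(dir(builtins))
    used = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            # Store context means the name is being assigned to.
            (known if isinstance(node.ctx, ast.Store) else used).add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            known.add(node.name)
        elif isinstance(node, ast.arg):
            known.add(node.arg)
        elif isinstance(node, ast.alias):
            known.add((node.asname or node.name).split(".")[0])
    return used - known


if __name__ == "__main__":
    # Early report: found by inspecting the code, before running it.
    print("flagged before running:", sorted(undefined_names(SAMPLE)))
    # Late (here: missing) report: the run finishes without any complaint.
    exec(compile(SAMPLE, "<sample>", "exec"), {})
```

If the threshold is lowered so that the branch does run, Python raises a NameError, but only at that moment, which may be long after the typo was made.
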
Bracket matching is another way of unobtrusively reporting some errors very early.

So how are your students writing Python? Inside IDLE? Notepad? Some other way?

> and I want them to learn some basic programming as well, in particular for
> data extraction/cleaning type tasks that rather stretch excel's
> capabilities. I know you can program macros to do most of this stuff, but
> hey presto we are suddenly teaching programming in a
> less-than-beginner-friendly syntax and system.

Indeed.

> The main thing is that I want to teach them real skills that they can
> continue to lose.
              ^^^^
Ah, aging.

> Even worse, many of them
> are outright frightened by computation.

There is something worse than a scientist frightened by computation, and that's one who isn't. It's OK when it's an MSc or PhD student, but any project with serious funding that has nontrivial coding to do would do well to hire someone who is good at it. I've got some climate models on my machine, and I honestly don't know whether to laugh or scream. Generally, I end up doing a bit of both.

For what it's worth, there's a Python e-book for schools that was written here. Would you like me to have a word with the people who wrote it to see if they would like to talk about Python error messages?