Re: [ccp4bb] Off-topic: Best Scripting Language
On Thu, 2012-09-13 at 13:21 -0500, Jacob Keller wrote:

> Java anyone?!

Your subject line asks about scripting languages. I would think that even the most inveterate Java advocate would hesitate before recommending Java for the kind of informal scripting that you were asking about :-)

> I've actually noticed a bit of a dearth of Java in the crystallography world, with some exceptions...

Writing good Java can be quite demanding. Many people seem to find it easier to make a quick start with interpreted languages like Python and Perl, especially if they are dynamically typed. There are also other well-respected scripting languages around that don't have much traction in the crystallography world, like Rexx, Lua and Pike, which might be worth a look. However, using a language that is familiar to your colleagues has some value.

IMHO Java is good for large codebases that are developed over a long period of time. The static typing, and the ability of development tools to do fairly deep code analysis, mean that the cost of maintaining and refactoring in Java doesn't increase with code size as quickly as with a language like Python, even if the cost of writing the code in the first place is higher. It also means that fewer test cases need to be written (if I had a penny for every runtime bug in Python code that I have seen that would have been picked up at compile time in Java...).

You only get those benefits in full when using a heavyweight development environment like Eclipse or NetBeans though. Learning to get the best out of those adds to the up-front time that needs to be spent before you produce anything useful. I think that a lot of frustration with Java comes from that. I also don't think that traditional developers' text editors are really suitable for Java development once the amount of code involved becomes large: the traditional write, compile, fix, compile, fix... approach starts to break down then. Syntax highlighting is nice but only gets you so far.

My 2p worth Regards, Peter. -- Peter Keller Tel.: +44 (0)1223 353033 Global Phasing Ltd., Fax.: +44 (0)1223 366889 Sheraton House, Castle Park, Cambridge CB3 0AX United Kingdom
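To make the point about runtime bugs concrete, here is a minimal Python sketch of a mistake that a static type checker would report before the program runs, but that plain Python only reports when the faulty line is executed. The function names and data are invented for illustration, and mypy is mentioned only as one example of such a checker.

```python
from typing import List


def mean_intensity(values: List[float]) -> float:
    """Average a list of numbers."""
    return sum(values) / len(values)


def summarise(raw_fields: List[str]) -> float:
    # Bug: the fields are still strings. A static checker (mypy, for example)
    # reports the List[str] vs List[float] mismatch without running anything;
    # plain Python only fails with a TypeError when this line is executed.
    return mean_intensity(raw_fields)


def summarise_fixed(raw_fields: List[str]) -> float:
    # Convert explicitly before doing arithmetic.
    return mean_intensity([float(field) for field in raw_fields])


print(summarise_fixed(["1.0", "2.0"]))
```

The annotations cost a little typing up front, which is the same trade-off described above for Java, only on a smaller scale.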
Re: [ccp4bb] Off-topic: Best Scripting Language
On 09/14/2012 12:30 AM, Eric Bennett wrote:

> Actually it's a bit of a hindrance. In Perl I can call the int function on anything and get a sensible answer. In python if you call int on a string that contains a floating point number the default behavior is that it will crash:

The sensible answer you describe may be considered buggy behavior. If you use the int() converter in your code, my guess is that you anticipate it will be processing a bunch of strings that are expected to represent integers. If you have a string and you want it to be converted to an integer even if it actually looks like a float, you can do this

    number = int(float(example_string))

or this

    import math
    number = math.floor(float(example_string))

or perhaps this (more sensible)

    number = round(float(example_string))

I wonder what situation you have in mind in which forcing non-integer string data to become integers with a slightly shorter expression gives an advantage.

On a broader point, I am sure that perl-bashers can come up with examples of said language behavior they may want to ridicule. Different computer languages will exhibit different behavior because they are, well, different.

> That's brain dead. IMHO of course.

Name-calling is not an argument. It's not quite Godwin's rule, but still.

Cheers, Ed.
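The three conversions suggested above do not always agree, which is arguably why Python asks you to choose one. The following sketch wraps them in a single helper; the name to_int and its mode argument are invented here for illustration. The results are wrapped in int() because math.floor() and round() return floats on Python 2.

```python
import math


def to_int(text, mode="truncate"):
    """Convert a numeric string such as '10.3' to an integer."""
    value = float(text)
    if mode == "truncate":
        return int(value)                # drops the fraction (rounds toward zero)
    if mode == "floor":
        return int(math.floor(value))    # always rounds down
    if mode == "round":
        return int(round(value))         # rounds to the nearest integer
    raise ValueError("unknown mode: %r" % mode)


# Compare the three modes; they differ for 10.7 and for negative input.
for text in ("10.3", "10.7", "-10.7"):
    results = [to_int(text, mode) for mode in ("truncate", "floor", "round")]
    print("%s -> %s" % (text, results))
```

For "-10.7" truncation gives -10 while floor and round give -11, so the choice matters as soon as negative values can appear in the data.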
Re: [ccp4bb] Off-topic: Best Scripting Language
I'd just use a decent shell scripting language (like zsh) in conjunction with a unix tool like awk. But the gnuplot option sounds ideal. Bill William G. Scott Professor Department of Chemistry and Biochemistry and The Center for the Molecular Biology of RNA 228 Sinsheimer Laboratories University of California at Santa Cruz Santa Cruz, California 95064 USA On Sep 12, 2012, at 7:32 AM, Jacob Keller j-kell...@fsm.northwestern.edu wrote: Dear List, since this probably comes up a lot in manipulation of pdb/reflection files and so on, I was curious what people thought would be the best language for the following: I have some huge (100s MB) tables of tab-delimited data on which I would like to do some math (averaging, sigmas, simple arithmetic, etc) as well as some sorting and rejecting. It can be done in Excel, but this is exceedingly slow even in 64-bit, so I am looking to do it through some scripting. Just as an example, a sort which takes 10 min in Excel takes ~10 sec max with the unix command sort (seems crazy, no?). Any suggestions? Thanks, and sorry for being off-topic, Jacob -- *** Jacob Pearson Keller Northwestern University Medical Scientist Training Program email: j-kell...@northwestern.edu ***
Re: [ccp4bb] Off-topic: Best Scripting Language
After this religious debate concludes, I propose we return to the old standby - vi versus emacs.

-----Original Message-----
From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Tim Gruene
Sent: Thursday, September 13, 2012 5:25 AM
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] Off-topic: Best Scripting Language

Hi James,

I don't read blaming in George's words, just reasoning for a personal decision. Maybe I suffer from similar prejudice: I have the impression that python programmers spend a lot of effort in trying to convince others that python is a good choice. Why bother rather than let people make their own decision?

Cheers, Tim

On 09/12/2012 09:49 PM, James Stroud wrote:

> On Sep 12, 2012, at 1:00 PM, George Sheldrick wrote:
>
> > It is the lack of compatibility between different versions mentioned by Ethan that really put me off learning PYTHON.
>
> Python is backwards compatible. I have reams of code I wrote in python 2.3 that still works in 2.7 without modification. Also, python (aka python 2) and python 3000 (aka python 3) are considered two different languages. It's not reasonable to consider them one language and then complain that they are incompatible. Python 3 was created as a new language (and should be treated as such) precisely because it breaks compatibility with python 2. That was the intent of the language authors. You blame the authors for recognizing limitations of a language and inventing a new one to overcome those limitations. If the FORTRAN authors had done that about 30 years ago, we all might be programming in FORTRAN.
>
> James

--
Dr Tim Gruene Institut fuer anorganische Chemie Tammannstr. 4 D-37077 Goettingen GPG Key ID = A46BEE1A

Notice: This e-mail message, together with any attachments, contains information of Merck & Co., Inc. (One Merck Drive, Whitehouse Station, New Jersey, USA 08889), and/or its affiliates Direct contact information for affiliates is available at http://www.merck.com/contact/contacts.html) that may be confidential, proprietary copyrighted and/or legally privileged. It is intended solely for the use of the individual or entity named on this message. If you are not the intended recipient, and have received this message in error, please notify us immediately by reply e-mail and then delete it from your system.
Re: [ccp4bb] Off-topic: Best Scripting Language
On Sep 13, 2012, at 3:24 AM, Tim Gruene wrote: I have the impression that python programmers spend a lot of effort in trying to convince others that python is a good choice. Why bother rather than let people make their own decision? Someone asked. Plus, python programmers put no more effort than any other programmer. It's just that python has more advocates (for good reason) so the apparent effort is amplified. Don't hate us because our preferred programming language is beautiful. James -- James Stroud http://www.jamesstroud.com
Re: [ccp4bb] Off-topic: Best Scripting Language
Like most computer users and many scientists I don't write scripts to organize or analyse my data unless I get desperate. I used both Python and Perl a few years ago, but it would take quite a lot of time and effort and staring at on-line tutorials to get back into either of them right now. So I end up using massive Excel files that kind of work, but are a pain. I've noticed that quite a few structural biologists have the same problem.

I've never understood why there can't be a simple programming language that is completely self-explanatory because it uses English sentences. Our robot scripting language uses syntax like

    Dispense 0.5 * DropVol ul to TargetWells using ProteinSyringe

That is pretty obvious. So why can't I have a language where I can write

    Carry_out_a_sequence_where x is 1 to 10 with_step_size 1 :
        if age of person(x) is_greater_than 50 then print name of person(x) "is an old man (or woman)" .
    Repeat_for_next x .

?

I don't care if it's efficient (anything is efficient compared to Excel) or if it's easy to write big programs in. All I care about is that it's easy to get going. Later on I can learn to write simply Sequence instead of Carry_out_a_sequence_where. I could click a button that would make the replacement to make my code more compact and readable to a trained eye. And of course is_greater_than could be written as >. Any intelligent school-child could understand it too, which would be fantastic here in the UK where kids aren't taught to program any more.

Does such a language exist?

On 13 September 2012 17:08, James Stroud xtald...@gmail.com wrote:

> On Sep 13, 2012, at 3:24 AM, Tim Gruene wrote:
>
> > I have the impression that python programmers spend a lot of effort in trying to convince others that python is a good choice. Why bother rather than let people make their own decision?
>
> Someone asked. Plus, python programmers put no more effort than any other programmer. It's just that python has more advocates (for good reason) so the apparent effort is amplified. Don't hate us because our preferred programming language is beautiful.
>
> James -- James Stroud http://www.jamesstroud.com

-- patr...@douglas.co.uk Douglas Instruments Ltd. Douglas House, East Garston, Hungerford, Berkshire, RG17 7HD, UK Directors: Peter Baldock, Patrick Shaw Stewart http://www.douglas.co.uk Tel: 44 (0) 148-864-9090 US toll-free 1-877-225-2034 Regd. England 2177994, VAT Reg. GB 480 7371 36
Re: [ccp4bb] Off-topic: Best Scripting Language
In my opinion, the Python equivalent of your pseudo-code is fairly close to how you would write the instructions logically. But then maybe not everyone thinks in the same way that I do :-)

    # range() excludes its upper bound, so 11 covers x = 1 to 10
    for x in range(1, 11):
        if age_of_person(x) > 50:
            print(name_of_person(x), "is an old man (or woman)")

Of course you would have to define the functions age_of_person() and name_of_person() in order for this to work, or you could write it in a more object-oriented way so you have a Person object which has attributes name, age, gender, etc. and the code would be even more readable.

    for person in list_of_people:
        if person.age > 50:
            print(person.name, "is an old", person.gender)

Disclaimer: I mainly write in Python, so obviously am naturally biased towards Python, however I have yet to see another widely used language that is as readable or intuitive to learn (to me).

Cheers, Richard

-- Richard Gildea Software Developer Physical Biosciences Division Lawrence Berkeley National Laboratory 1 Cyclotron Rd Mail Stop 64R0121 Berkeley CA 94720-8118

On 13 September 2012 10:02, Patrick Shaw Stewart patr...@douglas.co.uk wrote:

> Like most computer users and many scientists I don't write scripts to organize or analyse my data unless I get desperate. I used both Python and Perl a few years ago, but it would take quite a lot of time and effort and staring at on-line tutorials to get back into either of them right now. So I end up using massive Excel files that kind of work, but are a pain. I've noticed that quite a few structural biologists have the same problem.
>
> I've never understood why there can't be a simple programming language that is completely self-explanatory because it uses English sentences. Our robot scripting language uses syntax like
>
>     Dispense 0.5 * DropVol ul to TargetWells using ProteinSyringe
>
> That is pretty obvious. So why can't I have a language where I can write
>
>     Carry_out_a_sequence_where x is 1 to 10 with_step_size 1 :
>         if age of person(x) is_greater_than 50 then print name of person(x) "is an old man (or woman)" .
>     Repeat_for_next x .
>
> ?
>
> I don't care if it's efficient (anything is efficient compared to Excel) or if it's easy to write big programs in. All I care about is that it's easy to get going. Later on I can learn to write simply Sequence instead of Carry_out_a_sequence_where. I could click a button that would make the replacement to make my code more compact and readable to a trained eye. And of course is_greater_than could be written as >. Any intelligent school-child could understand it too, which would be fantastic here in the UK where kids aren't taught to program any more.
>
> Does such a language exist?
>
> On 13 September 2012 17:08, James Stroud xtald...@gmail.com wrote:
>
> > On Sep 13, 2012, at 3:24 AM, Tim Gruene wrote:
> >
> > > I have the impression that python programmers spend a lot of effort in trying to convince others that python is a good choice. Why bother rather than let people make their own decision?
> >
> > Someone asked. Plus, python programmers put no more effort than any other programmer. It's just that python has more advocates (for good reason) so the apparent effort is amplified. Don't hate us because our preferred programming language is beautiful.
> >
> > James -- James Stroud http://www.jamesstroud.com
>
> -- patr...@douglas.co.uk Douglas Instruments Ltd. Douglas House, East Garston, Hungerford, Berkshire, RG17 7HD, UK Directors: Peter Baldock, Patrick Shaw Stewart http://www.douglas.co.uk Tel: 44 (0) 148-864-9090 US toll-free 1-877-225-2034 Regd. England 2177994, VAT Reg. GB 480 7371 36
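For anyone who wants to try the object-oriented version described in the reply above, here is a self-contained sketch. The Person record, the sample names and ages, and the helper describe_older_than are all invented for illustration; only the loop-and-test logic comes from the thread.

```python
from collections import namedtuple

# Hypothetical record type and sample data, invented for illustration.
Person = namedtuple("Person", ["name", "age", "gender"])

people = [
    Person("Alice", 62, "woman"),
    Person("Bob", 35, "man"),
    Person("Carol", 51, "woman"),
]


def describe_older_than(people, limit=50):
    """Return one sentence for each person whose age exceeds the limit."""
    return ["%s is an old %s" % (p.name, p.gender)
            for p in people if p.age > limit]


for sentence in describe_older_than(people):
    print(sentence)
```

Returning the sentences rather than printing inside the loop keeps the filtering testable on its own, which becomes useful once the list is read from a file instead of typed in.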
Re: [ccp4bb] Off-topic: Best Scripting Language
On Sep 13, 2012, at 11:02 AM, Patrick Shaw Stewart wrote:

> Like most computer users and many scientists I don't write scripts to organize or analyse my data unless I get desperate. I used both Python and Perl a few years ago, but it would take quite a lot of time and effort and staring at on-line tutorials to get back into either of them right now. So I end up using massive Excel files that kind of work, but are a pain. I've noticed that quite a few structural biologists have the same problem. I've never understood why there can't be a simple programming language that is completely self-explanatory because it uses English sentences.

Yeah. They tried that. It's called AppleScript and is a complete disaster for programmers simply because of its vague resemblance to natural language. There are essays on this issue [1, 2], but other than the message "stay away from programming languages that try to be natural languages", these essays are mostly academic.

It turns out that the syntax and semantics of all reasonable programming languages are very similar, or fall into only a few classes (e.g. C-like, S-expressions, etc.), so once you are fluent in one from a class, it's easy to pick up the others. This can't be said of natural languages, which are full of idioms and grammatical exceptions, even in closely related dialects.

James

[1] http://www.codinghorror.com/blog/2006/08/computer-languages-arent-human-languages.html
[2] http://daringfireball.net/2005/09/englishlikeness_monster
Re: [ccp4bb] Off-topic: Best Scripting Language
It turns out that the syntax and semantics of all reasonable programming languages are very similar, or fall into only a few classes (e.g. C-like, S-expressions, etc.), so once you are fluent in one from a class, it's easy to pick up the others. This can't be said of natural languages, which are full of idioms and grammatical exceptions, even in closely related dialects. This is more opinions than I can shake a stick at! Don't we all have other fish to fry (or for the French, other cats to whip? Other national equivalents?) Anyway, I was nervous as a long-tailed cat in a room full of rocking chairs to ask this question, and look at the Pandora's box that this has opened! Java anyone?! I've actually noticed a bit of a dearth of Java in the crystallography world, with some exceptions... Thanks everybody for your suggestions--I will mull them over, since you all make such good arguments (not meant in the programming sense, but probably that's true too!), Jacob James [1] http://www.codinghorror.com/blog/2006/08/computer-languages-arent-human-languages.html [2] http://daringfireball.net/2005/09/englishlikeness_monster -- *** Jacob Pearson Keller Northwestern University Medical Scientist Training Program email: j-kell...@northwestern.edu ***
Re: [ccp4bb] Off-topic: Best Scripting Language
Another option that could be cheap or free (if your university offers license deals, as mine does) is SPSS. It has a lot of the quick and dirty spreadsheet functionality of Excel, is much faster than Excel with large tables, has lots of good analysis tools, has its own scripting language, and is compatible with R and Python. Eric On Thu, Sep 13, 2012 at 2:21 PM, Jacob Keller j-kell...@fsm.northwestern.edu wrote: It turns out that the syntax and semantics of all reasonable programming languages are very similar, or fall into only a few classes (e.g. C-like, S-expressions, etc.), so once you are fluent in one from a class, it's easy to pick up the others. This can't be said of natural languages, which are full of idioms and grammatical exceptions, even in closely related dialects. This is more opinions than I can shake a stick at! Don't we all have other fish to fry (or for the French, other cats to whip? Other national equivalents?) Anyway, I was nervous as a long-tailed cat in a room full of rocking chairs to ask this question, and look at the Pandora's box that this has opened! Java anyone?! I've actually noticed a bit of a dearth of Java in the crystallography world, with some exceptions... Thanks everybody for your suggestions--I will mull them over, since you all make such good arguments (not meant in the programming sense, but probably that's true too!), Jacob James [1] http://www.codinghorror.com/blog/2006/08/computer-languages-arent-human-languages.html [2] http://daringfireball.net/2005/09/englishlikeness_monster -- *** Jacob Pearson Keller Northwestern University Medical Scientist Training Program email: j-kell...@northwestern.edu ***
Re: [ccp4bb] Off-topic: Best Scripting Language
On Sep 12, 2012, at 2:28 PM, Ethan Merritt wrote:

> > Why are you dis-ing python? Seems everybody loves it...
>
> I'm sure you can google for many "reasons I hate Python" lists. Mine would start
> 1) sensitive to white space == fail
> 2) dynamic typing makes it nearly impossible to verify program correctness, and very hard to debug problems that arise from unexpected input or a mismatch between caller and callee.
> 3) the language developers don't care about backward compatibility; it seems version 2.n+1 always breaks code written for version 2.n, and let's not even talk about version 3
> 4) slow unless you use it simply as a wrapper for C++, in which case why not just use C++ or C to begin with?
> 5) not thread-safe
>
> you did ask...
>
> Ethan

While I agree generally with your points and try to avoid python if at all possible, I'm not sure about what you mean with point 5, since it's certainly possible to write threaded python scripts. Another point that is purely personal taste is the language philosophy that there is one official way to do something in Python, as contrasted with Perl (which is my choice) where the language philosophy is that there are many ways of doing any given task and the language is not designed to force you into a particular way of doing it.

Ed adds:

> While indeed 1/3=0 (but so it will be in C), I think it's a bit of an overstatement that python code execution is nearly impossible to verify. Another goal of python is to accelerate implementation, and dynamic/duck typing supposedly helps that. The argument is simply that weak typing favours strong testing, which should be a good thing.

Actually it's a bit of a hindrance. In Perl I can call the int function on anything and get a sensible answer. In python if you call int on a string that contains a floating point number the default behavior is that it will crash:

    [woz:~] bennette% cat pytest.py
    example_string = "10.3"
    number = int(example_string)
    [woz:~] bennette% /Library/Frameworks/Python.framework/Versions/2.6/bin/python pytest.py
    Traceback (most recent call last):
      File "pytest.py", line 2, in <module>
        number = int(example_string)
    ValueError: invalid literal for int() with base 10: '10.3'

That's brain dead. IMHO of course.

Cheers, Eric
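If the forgiving behaviour is what a script really needs, it takes only a few lines to ask for it explicitly. This is a sketch, not a recommendation from anyone in the thread; the name parse_int and its default argument are invented for illustration.

```python
def parse_int(text, default=None):
    """Return text as an int, accepting float-like strings such as '10.3'."""
    try:
        return int(text)                 # plain integers, e.g. '42'
    except ValueError:
        pass
    try:
        return int(float(text))          # float-like strings, truncated toward zero
    except (ValueError, OverflowError):
        return default                   # not a finite number at all


print(parse_int("42"), parse_int("10.3"), parse_int("n/a"))
```

The caller decides what a non-numeric field should become (None here, or 0 via the default argument), so the lossy conversion is visible in the code rather than implicit in the language.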
[ccp4bb] Off-topic: Best Scripting Language
Dear List, since this probably comes up a lot in manipulation of pdb/reflection files and so on, I was curious what people thought would be the best language for the following: I have some huge (100s MB) tables of tab-delimited data on which I would like to do some math (averaging, sigmas, simple arithmetic, etc) as well as some sorting and rejecting. It can be done in Excel, but this is exceedingly slow even in 64-bit, so I am looking to do it through some scripting. Just as an example, a sort which takes 10 min in Excel takes ~10 sec max with the unix command sort (seems crazy, no?). Any suggestions? Thanks, and sorry for being off-topic, Jacob -- *** Jacob Pearson Keller Northwestern University Medical Scientist Training Program email: j-kell...@northwestern.edu ***
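As one possible starting point for the task described above, here is a minimal Python sketch that computes the count, mean and sample sigma of one column of a tab-delimited table in a single pass (Welford's method), so the whole file never has to sit in memory. The function name, the column layout and the tiny demo table are assumptions made for illustration; it uses only the standard library.

```python
import csv
import io
import math


def column_stats(lines, column, delimiter="\t"):
    """One-pass count, mean and sample sigma for a single column."""
    count, mean, m2 = 0, 0.0, 0.0
    for row in csv.reader(lines, delimiter=delimiter):
        try:
            x = float(row[column])
        except (ValueError, IndexError):
            continue                      # reject headers, blanks and short rows
        count += 1
        delta = x - mean
        mean += delta / count             # running mean
        m2 += delta * (x - mean)          # running sum of squared deviations
    sigma = math.sqrt(m2 / (count - 1)) if count > 1 else 0.0
    return count, mean, sigma


# Tiny made-up table standing in for a real file.
demo = io.StringIO("h\tk\tI\n1\t0\t10.0\n1\t1\t12.0\n2\t0\t14.0\n")
print(column_stats(demo, column=2))       # (3, 12.0, 2.0)
```

With a real file the same function can be fed an open file handle instead of the StringIO object. Rows that fail to parse are silently skipped here, which doubles as a crude rejection step; a real script would probably want to count them.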
Re: [ccp4bb] Off-topic: Best Scripting Language
Try R. :) http://www.r-project.org/ Eric On Wed, Sep 12, 2012 at 10:32 AM, Jacob Keller j-kell...@fsm.northwestern.edu wrote: Dear List, since this probably comes up a lot in manipulation of pdb/reflection files and so on, I was curious what people thought would be the best language for the following: I have some huge (100s MB) tables of tab-delimited data on which I would like to do some math (averaging, sigmas, simple arithmetic, etc) as well as some sorting and rejecting. It can be done in Excel, but this is exceedingly slow even in 64-bit, so I am looking to do it through some scripting. Just as an example, a sort which takes 10 min in Excel takes ~10 sec max with the unix command sort (seems crazy, no?). Any suggestions? Thanks, and sorry for being off-topic, Jacob -- *** Jacob Pearson Keller Northwestern University Medical Scientist Training Program email: j-kell...@northwestern.edu ***
Re: [ccp4bb] Off-topic: Best Scripting Language
I always use FORTRAN for such tasks, especially if speed is important. George On 09/12/2012 04:32 PM, Jacob Keller wrote: Dear List, since this probably comes up a lot in manipulation of pdb/reflection files and so on, I was curious what people thought would be the best language for the following: I have some huge (100s MB) tables of tab-delimited data on which I would like to do some math (averaging, sigmas, simple arithmetic, etc) as well as some sorting and rejecting. It can be done in Excel, but this is exceedingly slow even in 64-bit, so I am looking to do it through some scripting. Just as an example, a sort which takes 10 min in Excel takes ~10 sec max with the unix command sort (seems crazy, no?). Any suggestions? Thanks, and sorry for being off-topic, Jacob -- *** Jacob Pearson Keller Northwestern University Medical Scientist Training Program email: j-kell...@northwestern.edu mailto:j-kell...@northwestern.edu *** -- Prof. George M. Sheldrick FRS Dept. Structural Chemistry, University of Goettingen, Tammannstr. 4, D37077 Goettingen, Germany Tel. +49-551-39-3021 or -3068 Fax. +49-551-39-22582
Re: [ccp4bb] Off-topic: Best Scripting Language
On Wednesday 12 September 2012 16:40 CEST, George M. Sheldrick gshe...@shelx.uni-ac.gwdg.de wrote: May I add a little personal joke to the serious remark by George. This reminds me of a discussion I had with Jorge Navaza, let's say 15 years ago, about the programming language of the future. (To a good approximation, 15 years ago, the future was now.) The answer by Jorge was: "I don't know what it will be, but I know its name will be FORTRAN." I hope he will confirm the statement... Philippe Dumas I always use FORTRAN for such tasks, especially if speed is important. George On 09/12/2012 04:32 PM, Jacob Keller wrote: Dear List, since this probably comes up a lot in manipulation of pdb/reflection files and so on, I was curious what people thought would be the best language for the following: I have some huge (100s MB) tables of tab-delimited data on which I would like to do some math (averaging, sigmas, simple arithmetic, etc) as well as some sorting and rejecting. It can be done in Excel, but this is exceedingly slow even in 64-bit, so I am looking to do it through some scripting. Just as an example, a sort which takes 10 min in Excel takes ~10 sec max with the unix command sort (seems crazy, no?). Any suggestions? Thanks, and sorry for being off-topic, Jacob -- *** Jacob Pearson Keller Northwestern University Medical Scientist Training Program email: j-kell...@northwestern.edu mailto:j-kell...@northwestern.edu *** -- Prof. George M. Sheldrick FRS Dept. Structural Chemistry, University of Goettingen, Tammannstr. 4, D37077 Goettingen, Germany Tel. +49-551-39-3021 or -3068 Fax. +49-551-39-22582
Re: [ccp4bb] Off-topic: Best Scripting Language
One thing to keep in mind is that there's usually a trade-off between setup (writing and testing) and execution time. For one-off data processing, I'd focus on implementation speed rather than execution speed (in other words, FORTRAN might not be ideal unless you're already fluent with it). That said, I'd take a look at python, octave or R. Python's relatively easy to learn, and more flexible than octave/R; but it doesn't have the built-in statistical functions that octave and R do. One other tip which you've probably already thought of: depending on your runtimes (I don't think 100s MB of data is usually considered an enormous amount, but it'll depend on what you're doing) it may be worth getting things working on a small subset of the data first. Pete Jacob Keller wrote: Dear List, since this probably comes up a lot in manipulation of pdb/reflection files and so on, I was curious what people thought would be the best language for the following: I have some huge (100s MB) tables of tab-delimited data on which I would like to do some math (averaging, sigmas, simple arithmetic, etc) as well as some sorting and rejecting. It can be done in Excel, but this is exceedingly slow even in 64-bit, so I am looking to do it through some scripting. Just as an example, a sort which takes 10 min in Excel takes ~10 sec max with the unix command sort (seems crazy, no?). Any suggestions? Thanks, and sorry for being off-topic, Jacob
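The tip about developing against a small subset first is easy to follow in Python without making a second copy of the data. The helper below is a sketch (the name head and its arguments are invented here) that hands back only the first few data lines of any line-based input, such as an open file.

```python
import itertools


def head(lines, n_rows, skip_header=True):
    """Return an iterator over at most n_rows data lines."""
    iterator = iter(lines)
    if skip_header:
        next(iterator, None)          # discard the header line, if there is one
    return itertools.islice(iterator, n_rows)


# Made-up lines standing in for an open file.
sample = ["h\tk\tI", "1\t0\t10.0", "1\t1\t12.0", "2\t0\t14.0"]
for line in head(sample, 2):
    print(line)
```

Because islice is lazy, only the requested rows are ever read, so the trial run stays fast however large the full table is. Once the script works, the call to head can simply be removed.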
Re: [ccp4bb] Off-topic: Best Scripting Language
On Wed, Sep 12, 2012 at 7:32 AM, Jacob Keller j-kell...@fsm.northwestern.edu wrote: since this probably comes up a lot in manipulation of pdb/reflection files and so on, I was curious what people thought would be the best language for the following: I have some huge (100s MB) tables of tab-delimited data on which I would like to do some math (averaging, sigmas, simple arithmetic, etc) as well as some sorting and rejecting. It can be done in Excel, but this is exceedingly slow even in 64-bit, so I am looking to do it through some scripting. Just as an example, a sort which takes 10 min in Excel takes ~10 sec max with the unix command sort (seems crazy, no?). Any suggestions? Anything but Fortran. Seriously, there are probably a dozen (or more) good solutions, and it depends on whose syntax you prefer, what external libraries you need, whether you want to someday apply your new programming skills to another project, and whether you want anyone else to be able to read your code. For me, Python wins easily, but the suggestions of Octave or R are probably just as good for a one-time script of the sort you describe. -Nat
Re: [ccp4bb] Off-topic: Best Scripting Language
A similar remark was made to me by David Blow, while he was on sabbatical at UNC in the 1980s, working with the UNC Computer Science Department and in a moment of intense frustration with the overpowering ignorance of fortran and the enthusiasm for Unix exhibited by that department. Charlie On Sep 12, 2012, at 10:58 AM, DUMAS Philippe (UDS) wrote: This reminds me of a discussion I had with Jorge Navaza, let's say 15 years ago, about the programming language of the future. (To a good approximation, 15 years ago, the future was now.) The answer by Jorge was: "I don't know what it will be, but I know its name will be FORTRAN." I hope he will confirm the statement... Philippe Dumas
Re: [ccp4bb] Off-topic: Best Scripting Language
On Wednesday, September 12, 2012 07:32:54 am Jacob Keller wrote: Dear List, since this probably comes up a lot in manipulation of pdb/reflection files and so on, I was curious what people thought would be the best language for the following: I have some huge (100s MB) tables of tab-delimited data on which I would like to do some math (averaging, sigmas, simple arithmetic, etc) as well as some sorting and rejecting. It can be done in Excel, but this is exceedingly slow even in 64-bit, so I am looking to do it through some scripting. Just as an example, a sort which takes 10 min in Excel takes ~10 sec max with the unix command sort (seems crazy, no?). Any suggestions? For the specific purpose you list - input from tab-delimited data, output to simple statistical summaries and (I assume) plots - it sounds like gnuplot could do the job nicely. Otherwise I'd recommend perl, and dis-recommend python. Ethan -- Ethan A Merritt Biomolecular Structure Center, K-428 Health Sciences Bldg University of Washington, Seattle 98195-7742
Re: [ccp4bb] Off-topic: Best Scripting Language
For the specific purpose you list - input from tab-delimited data, output to simple statistical summaries and (I assume) plots - it sounds like gnuplot could do the job nicely. I wasn't aware that gnuplot can do calculations--can it? I was probably going to use it somewhere as a plotting option. Otherwise I'd recommend perl, and dis-recommend python. Why are you dis-ing python? Seems everybody loves it... JPK Ethan -- Ethan A Merritt Biomolecular Structure Center, K-428 Health Sciences Bldg University of Washington, Seattle 98195-7742 -- *** Jacob Pearson Keller Northwestern University Medical Scientist Training Program email: j-kell...@northwestern.edu ***
Re: [ccp4bb] Off-topic: Best Scripting Language
All you need is the scipy library to get those pesky statistical functions :) On 09/12/2012 11:11 AM, Pete Meyer wrote: Python's relatively easy to learn, and more flexible than octave/R; but it doesn't have the built-in statistical functions that octave and R do.
Re: [ccp4bb] Off-topic: Best Scripting Language
Why are you dis-ing python? Seems everybody loves it... Depends on if you like the object model, some don't. In the end it really boils down to what you're used to and what you've learned to use.
Re: [ccp4bb] Off-topic: Best Scripting Language
Now is the time when I start waxing nostalgic about the old days when there used to be entire threads on this bulletin board about Fortran format statement syntax for parsing various files... and I read them with great interest. How did I get to be such a geezer? -----Original Message----- From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Carter, Charlie Sent: Wednesday, September 12, 2012 11:17 AM To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] Off-topic: Best Scripting Language A similar remark was made to me by David Blow, while he was on sabbatical at UNC in the 1980s, working with the UNC Computer Science Department and in a moment of intense frustration with the overpowering ignorance of fortran and the enthusiasm for Unix exhibited by that department. Charlie On Sep 12, 2012, at 10:58 AM, DUMAS Philippe (UDS) wrote: This reminds me of a discussion I had with Jorge Navaza, let's say 15 years ago, about the programming language of the future. (To a good approximation, 15 years ago, the future was now.) The answer by Jorge was: "I don't know what it will be, but I know its name will be FORTRAN." I hope he will confirm the statement... Philippe Dumas
Re: [ccp4bb] Off-topic: Best Scripting Language
On Sep 12, 2012, at 9:11 AM, Pete Meyer wrote: That said, I'd take a look at python, octave or R. Python's relatively easy to learn, and more flexible than octave/R; but it doesn't have the built-in statistical functions that octave and R do.

import scipy

Now it does!
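The quip above points at SciPy, whose scipy.stats module covers far more than this. As a minimal sketch of the same idea that runs without installing anything, the example below uses the standard-library statistics module instead (an assumption on my part: Python 3.4 or later). The helper name summarize and the sample values are made up for illustration.

```python
import statistics


def summarize(values):
    """Return basic descriptive statistics for a sequence of numbers."""
    values = list(values)
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (n - 1)
    }


# Hypothetical measurements, for illustration only.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(summarize(sample))
```

For anything beyond summaries like these (distributions, fits, significance tests), scipy.stats is probably the better place to look.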
Re: [ccp4bb] Off-topic: Best Scripting Language
It is the lack of compatibility between different versions mentioned by Ethan that really put me off learning PYTHON. In contrast, the FORTRAN-66 program SHELX76 still compiles and runs correctly with any modern FORTRAN compiler. The only significant 'new' features that I now use are dynamic array allocation (introduced in FORTRAN-90) and OpenMP support for multiple CPUs, but even programs using OpenMP would still work with older compilers because the OpenMP instructions would be treated as comments. George

On 09/12/2012 08:28 PM, Ethan Merritt wrote: On Wednesday, September 12, 2012 09:52:09 am Jacob Keller wrote:

For the specific purpose you list - input from tab-delimited data output to simple statistical summaries and (I assume) plots - it sounds like gnuplot could do the job nicely.

I wasn't aware that gnuplot can do calculations--can it? I was probably going to use it somewhere as a plotting option.

Here's a simple-minded example using a dump of the current contents of the PDB from www.pdb.org as a comma-separated file with ~65000 entries. The input file was previously filtered to contain only X-ray structures between 1 and 4 Angstroms resolution.
gnuplot> !head -3 PDB.csv
PDB ID,R Observed,R All,R Work,R Free,Refinement Resolution
100D,0.145,,0.145,,1.90
101D,0.163,,,0.252,2.25
gnuplot> set datafile separator ","
gnuplot> set datafile nofpe_trap   # trap handling greatly slows large data sets
gnuplot> stats 'PDB.csv' using "R Observed" prefix "Robs"
* FILE:
  Records: 63029
  Out of range: 0
  Invalid: 0
  Blank: 2
  Data Blocks: 2
* COLUMN:
  Mean: 0.1982
  Std Dev: 0.0334
  Sum: 12494.6900
  Sum Sq.: 2547.3068
  Minimum: 0.0450 [24518]
  Maximum: 0.9700 [45024]
  Quartile: 0.1770
  Median: 0.1970
  Quartile: 0.2180
gnuplot> print Robs_mean
0.198237160672072
gnuplot> # calculate correlation of Robs with Resolution
gnuplot> stats 'PDB.csv' using "R Observed":"Refinement Resolution" nooutput
gnuplot> print STATS_correlation
0.595763711910418

I've attached graphical output of the same data following some sorting, filtering, binning, etc, with output to a PDF file. You can do all this in R also. R has a larger collection of statistics options, but is not as good at dealing with really large data sets. IMHO gnuplot has more flexible options for graphical output.

Otherwise I'd recommend perl, and dis-recommend python.

Why are you dis-ing python? Seems everybody loves it...

I'm sure you can google for many "reasons I hate Python" lists. Mine would start
1) sensitive to white space == fail
2) dynamic typing makes it nearly impossible to verify program correctness, and very hard to debug problems that arise from unexpected input or a mismatch between caller and callee.
3) the language developers don't care about backward compatibility; it seems version 2.n+1 always breaks code written for version 2.n, and let's not even talk about version 3
4) slow unless you use it simply as a wrapper for C++, in which case why not just use C++ or C to begin with?
5) not thread-safe
you did ask... Ethan

-- Prof. George M. Sheldrick FRS Dept. Structural Chemistry, University of Goettingen, Tammannstr. 4, D37077 Goettingen, Germany Tel. +49-551-39-3021 or -3068 Fax. +49-551-39-22582
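For comparison, the same two calculations (a column mean that skips blank cells, and a correlation between two columns) can be sketched in plain Python. This is only an illustration under stated assumptions: the first two data rows are the ones shown in the head of PDB.csv, the rows labelled HYP1 and HYP2 are hypothetical and exist only so that the correlation is not trivial, and the helper names read_columns, mean and pearson are mine, not part of any library.

```python
import csv
import io
import math

# The 100D and 101D rows come from the head of PDB.csv quoted above.
# HYP1 and HYP2 are hypothetical rows, added for illustration only.
SAMPLE = """PDB ID,R Observed,R All,R Work,R Free,Refinement Resolution
100D,0.145,,0.145,,1.90
101D,0.163,,,0.252,2.25
HYP1,0.210,,,,2.80
HYP2,,,0.190,,2.00
"""


def read_columns(text, *names):
    """Return float tuples for rows where every requested column is non-blank."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        cells = [record[name].strip() for name in names]
        if all(cells):
            rows.append(tuple(float(cell) for cell in cells))
    return rows


def mean(xs):
    """Arithmetic mean of a non-empty list."""
    return sum(xs) / len(xs)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


robs = [row[0] for row in read_columns(SAMPLE, "R Observed")]
pairs = read_columns(SAMPLE, "R Observed", "Refinement Resolution")
print("Robs mean:", mean(robs))
print("correlation:", pearson([p[0] for p in pairs], [p[1] for p in pairs]))
```

To run it on a real file, the io.StringIO wrapper would be replaced by an open file handle; for a table of 100s of MB a streaming or NumPy-based approach would likely be preferable to building Python lists.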
Re: [ccp4bb] Off-topic: Best Scripting Language
Python sorting 10,000 records of 10,000 floats for each record, finding the max, min, and mean of the entire 100,000,000-element 32 bit float array (400 MB) on a 6 year old white iMac: 11.6 seconds. This doesn't include the time to generate the 400 MB of random (normal) data. Try it on your own computer. Here's the copy-paste from mine:

py> import timeit
py> timeit.timeit('big_data.sort(axis=0), big_data.mean(); big_data.max(); big_data.min();', 'import numpy; big_data=numpy.random.normal(10, size=1e8).reshape((1e4,1e4)); print "random data made, starting..."', number=1)
random data made, starting...
11.597978115081787

James

On Sep 12, 2012, at 8:32 AM, Jacob Keller wrote: Dear List, since this probably comes up a lot in manipulation of pdb/reflection files and so on, I was curious what people thought would be the best language for the following: I have some huge (100s MB) tables of tab-delimited data on which I would like to do some math (averaging, sigmas, simple arithmetic, etc) as well as some sorting and rejecting. It can be done in Excel, but this is exceedingly slow even in 64-bit, so I am looking to do it through some scripting. Just as an example, a sort which takes 10 min in Excel takes ~10 sec max with the unix command sort (seems crazy, no?). Any suggestions? Thanks, and sorry for being off-topic, Jacob

-- *** Jacob Pearson Keller Northwestern University Medical Scientist Training Program email: j-kell...@northwestern.edu ***
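The session above is Python 2 with a 2012-era NumPy. Below is a hedged sketch of the same measurement for Python 3, assuming a reasonably recent NumPy is installed. Two things differ from the original: recent NumPy releases expect integer sizes (so 1e8 and 1e4 will not be accepted), and numpy.random.normal returns 64-bit floats, which would make the full array about 800 MB rather than 400 MB, so this version asks for float32 explicitly. The function names make_data, summarize and time_summary are mine, and the default size is kept small so that it runs anywhere.

```python
import timeit

import numpy as np


def make_data(n, seed=0):
    """Return an n x n float32 array of normal deviates centred on 10."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((n, n), dtype=np.float32) + np.float32(10)


def summarize(data):
    """Sort each column in place, then return (mean, max, min) of the whole array."""
    data.sort(axis=0)
    return float(data.mean()), float(data.max()), float(data.min())


def time_summary(n):
    """Time one sort-and-summarize pass, excluding data generation."""
    data = make_data(n)
    return timeit.timeit(lambda: summarize(data), number=1)


if __name__ == "__main__":
    # n = 10_000 reproduces the 100,000,000-element case (about 400 MB).
    print(time_summary(1000), "seconds for a 1000 x 1000 array")
```

Timings will of course depend on the machine, so no expected figure is given here.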
Re: [ccp4bb] Off-topic: Best Scripting Language
Hi, Python, of course (if you know some basic math). Otherwise, Python and a good math text book -:) Pavel On Wed, Sep 12, 2012 at 7:32 AM, Jacob Keller j-kell...@fsm.northwestern.edu wrote: Dear List, since this probably comes up a lot in manipulation of pdb/reflection files and so on, I was curious what people thought would be the best language for the following: I have some huge (100s MB) tables of tab-delimited data on which I would like to do some math (averaging, sigmas, simple arithmetic, etc) as well as some sorting and rejecting. It can be done in Excel, but this is exceedingly slow even in 64-bit, so I am looking to do it through some scripting. Just as an example, a sort which takes 10 min in Excel takes ~10 sec max with the unix command sort (seems crazy, no?). Any suggestions? Thanks, and sorry for being off-topic, Jacob -- *** Jacob Pearson Keller Northwestern University Medical Scientist Training Program email: j-kell...@northwestern.edu ***
Re: [ccp4bb] Off-topic: Best Scripting Language
On Sep 12, 2012, at 1:00 PM, George Sheldrick wrote: It is the lack of compatibility between different versions mentioned by Ethan that really put me off learning PYTHON. Python is backwards compatible. I have reams of code I wrote in python 2.3 that still works in 2.7 without modification. Also, python (aka python 2) and python 3000 (aka python 3) are considered two different languages. It's not reasonable to consider them one language and then complain that they are incompatible. Python 3 was created as a new language (and should be treated as such) precisely because it breaks compatibility with python 2. That was the intent of the language authors. You blame the authors for recognizing limitations of a language and inventing a new one to overcome those limitations. If the FORTRAN authors would have done that about 30 years ago, we all might be programming in FORTRAN. James
Re: [ccp4bb] Off-topic: Best Scripting Language
Ethan, I think the majority of your complaints about python result from its very purpose - to be readable/portable for the sake of facilitating rapid implementation. There are many other languages that provide tools to accomplish what Jacob wants to do (well, I would stay away from P''), but python definitely is a good option for casual calculations.

On 09/12/2012 02:28 PM, Ethan Merritt wrote: I'm sure you can google for many "reasons I hate Python" lists. Mine would start

1) sensitive to white space == fail
Every language has a way to group lines of code. Curly brackets are fine, but python is designed to force code readability, and leading white space (btw, everywhere else it is ignored) does that.

2) dynamic typing makes it nearly impossible to verify program correctness, and very hard to debug problems that arise from unexpected input or a mismatch between caller and callee.
While indeed 1/3=0 (but so it will be in C), I think it's a bit of an overstatement that python code execution is nearly impossible to verify. Another goal of python is to accelerate implementation, and dynamic/duck typing supposedly helps that. The argument is simply that weak typing favours strong testing, which should be a good thing.

3) the language developers don't care about backward compatibility; it seems version 2.n+1 always breaks code written for version 2.n, and let's not even talk about version 3
I don't think that's entirely true either; why would they then backport certain features from v3? The decision not to provide backward compatibility was well explained. While the 2to3 converter may potentially fail on complex code, the very fact that it was implemented confirms that python developers do care about the issue to some extent. While I definitely agree that it is annoying when a module you rely on is deprecated, there is a strong argument that a clean break is sometimes better than continuous patching of code that has outlived its initial design.
4) slow unless you use it simply as a wrapper for C++, in which case why not just use C++ or C to begin with?
Native python is not meant for number-crunching, but wrappers such as scipy allow one to combine python flexibility/readability with the speed of compiled binaries. One reason to use python over C/C++ is portability.

5) not thread-safe
I am definitely not an expert on this (or anything else), but afaiu this is not unique to python.

Cheers, Ed.
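Two of the points in this exchange are easy to make concrete. The 1/3=0 behaviour belongs to Python 2 (and C); in Python 3 the / operator always performs true division and // is the floor division that gives 0. The second half of the sketch illustrates the "unexpected input" worry: with dynamic typing nothing stops a caller from passing strings read straight from a file, so an early check plus a test is one way to fail clearly. This is only a sketch; the function names are hypothetical.

```python
def third(n):
    """True division: in Python 3 this returns a float, even for ints."""
    return n / 3


def third_floor(n):
    """Floor division: what n / 3 gave for ints in Python 2."""
    return n // 3


def mean(values):
    """Average a sequence of numbers, rejecting non-numeric input early."""
    values = list(values)
    if not values:
        raise ValueError("mean() of an empty sequence")
    for v in values:
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(v, bool) or not isinstance(v, (int, float)):
            raise TypeError("expected a number, got {!r}".format(v))
    return sum(values) / len(values)


print(third(1), third_floor(1))
print(mean([1, 2, 3]))
```

Calling mean(["0.145", "0.163"]) raises TypeError here instead of failing somewhere deeper in the calculation, which is roughly the kind of check a compiler would have made in a statically typed language.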
Re: [ccp4bb] Off-topic: Best Scripting Language
On Wed, Sep 12, 2012 at 12:49 PM, James Stroud xtald...@gmail.com wrote: Also, python (aka python 2) and python 3000 (aka python 3) are considered two different languages. It's not reasonable to consider them one language and then complain that they are incompatible. Python 3 was created as a new language (and should be treated as such) precisely because it breaks compatibility with python 2. That was the intent of the language authors.

Actually, despite having endorsed Python, I have to agree with the complaints about Python 3, for several reasons:
1) It doesn't actually introduce many fundamentally new features that would have changed how we code for it. (Like getting rid of self or the Global Interpreter Lock, or writing the interpreter in C++ and improving the API for writing extensions.) The only really huge change is Unicode support, which is probably good but doesn't really make it a different programming language.
2) The changes that really break code compatibility - like getting rid of the print statement - seem to have been done on a whim rather than because of any pressing need. Maybe this was done to try to force everyone to migrate immediately (since module developers couldn't easily maintain code that works with 2.x and 3.x), but it has had the opposite effect.
3) Development on Python 2 is being shut down.
Despite all this, I would still choose Python over nearly anything else for scripting (and most other purposes, but eventually C++ will be necessary too).

You blame the authors for recognizing limitations of a language and inventing a new one to overcome those limitations. If the FORTRAN authors would have done that about 30 years ago, we all might be programming in FORTRAN.

I think this is what Fortran 90 was supposed to do (unsuccessfully, at least in the world of crystallography) - but F77 code is still valid F90 code, just like ANSI C is still valid C++. -Nat
Re: [ccp4bb] Off-topic: Best Scripting Language
Colleagues: Another country is heard from: Since no one has mentioned MATLAB, let me mention it.
--Can easily do any math from 2+2 to matrix SVD etc.
--Statistics toolbox does most of what anyone would want.
--Lots of easy quick graphics that can be prettied up if needed.
--If you know FORTRAN, you already know most of the syntax.
--Reasonably easy to write quickies, can also run large calculations quite fast, can even run some stuff on GPUs now.
--Largely, not entirely, backwards compatible to earlier versions.
--Available for Linux, Mac, Windows.
--Great tech support and online help.
Disadvantages:
--It is not free.
--Not so great for text manipulation operations.
--Takes a while to learn, so not good for one quickie, but well worth the time to learn it in the long run.
George Reeke
Re: [ccp4bb] Off-topic: Best Scripting Language
I encourage trainees to learn a programming language that will help their careers beyond their short time in my lab. Many or most of them will not continue in structural biology or even science. For the moment, I am pushing python even though I am minimally literate in it myself. They should learn a modern programming language that is widely used beyond my subdiscipline. Python will probably help them get a job more than Fortran. Ho