Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-14 Thread Peter Keller
On Thu, 2012-09-13 at 13:21 -0500, Jacob Keller wrote:


 Java anyone?! 

Your subject line asks about scripting languages. I would think that
even the most inveterate Java advocate would hesitate before
recommending Java for the kind of informal scripting that you were
asking about :-)

 I've actually noticed a bit of a dearth of Java in the crystallography
 world, with some exceptions...

Writing good Java can be quite demanding. Many people seem to find it
easier to make a quick start with interpreted languages like Python and
Perl, especially if they are dynamically typed.

There are also other well-respected scripting languages around that
don't have much traction in the crystallography world, like rexx, lua
and pike, which might be worth a look. However, using a language that is
familiar to your colleagues has some value.

IMHO Java is good for large codebases that are developed over a long
period of time. Static typing, and the ability of development tools to do
fairly deep code analysis, mean that the cost of maintaining and
refactoring in Java doesn't increase with code size as quickly as with a
language like Python, even if the cost of writing the code in the first
place is higher. It also means that fewer test cases need to be written
(if I had a penny for every runtime bug in Python code that I have seen
that would have been picked up at compile time in Java...).
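
A minimal sketch of the kind of runtime bug meant here, assuming Python 3.
The function and variable names are hypothetical, invented only for
illustration. The misspelled name sits in a rarely-taken branch, so the
script runs fine until that branch is actually executed; a Java compiler
would reject the equivalent code before it ever ran.

```python
def scale_intensity(value, sigma):
    """Return value/sigma, or a zero placeholder when sigma is not positive."""
    if sigma > 0:
        return value / sigma
    # Rarely-taken branch: 'vaule' is a typo for 'value'.
    # Python only notices when this line is actually executed.
    return vaule * 0.0


# The common path works without complaint.
print(scale_intensity(10.0, 2.0))

# The rare path hits the typo and fails at runtime.
try:
    scale_intensity(10.0, 0.0)
except NameError as err:
    print("runtime failure:", err)
```

A test case that exercises the sigma <= 0 branch would catch this, which is
the sense in which dynamic typing shifts work from the compiler to the tests.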

You only get those benefits in full when using a heavyweight development
environment like Eclipse or NetBeans though. Learning to get the best
out of those adds to the up-front time that needs to be spent before you
produce anything useful. I think that a lot of frustration with Java
comes from that. I also don't think that traditional developers' text
editors are really suitable for Java development once the amount of code
involved becomes large: the traditional write, compile, fix, compile,
fix... approach starts to break down then. Syntax highlighting is nice
but only gets you so far.

My 2p worth

Regards,
Peter.


-- 
Peter Keller Tel.: +44 (0)1223 353033
Global Phasing Ltd., Fax.: +44 (0)1223 366889
Sheraton House,
Castle Park,
Cambridge CB3 0AX
United Kingdom


Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-14 Thread Edwin Pozharski

On 09/14/2012 12:30 AM, Eric Bennett wrote:

Actually it's a bit of a hindrance.  In Perl I can call the int function on 
anything and get a sensible answer.  In python if you call int on a string that 
contains a floating point number the default behavior is that it will crash:


The sensible answer you describe may be considered a buggy behavior.  If 
you use int() converter in your code, my guess is that you anticipate it 
will be processing a bunch of strings that are expected to represent 
integers.  If you have a string and you want it to be converted to 
integer even if it actually looks like a float, you can do this


number = int(float(example_string))

or this

import math
number = math.floor(float(example_string))

or perhaps this (more sensible)

number = round(float(example_string))

I wonder what situation you have in mind when forcing non-integer string 
data to become integers with a slightly shorter expression gives an 
advantage. On a broader point, I am sure that perl-bashers can come up 
with examples of said language behavior they may want to ridicule.  
Different computer languages will exhibit different behavior because 
they are, well, different.
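
The three conversions above do not always agree, which is arguably why
int() refuses to guess. A small sketch, assuming Python 3 (under Python 2.x,
math.floor() and round() return floats rather than ints); the helper name
to_int_variants is made up for this example:

```python
import math


def to_int_variants(text):
    """Return (truncated, floored, rounded) integers for a numeric string."""
    value = float(text)  # accepts "10", "10.7", "-10.7", "1e3", ...
    return int(value), math.floor(value), round(value)


for s in ("10.3", "10.7", "-10.7"):
    print(s, to_int_variants(s))
```

int() truncates toward zero, math.floor() rounds down, and round() goes to
the nearest integer, so the results differ for "10.7" and for negative
values.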

That's brain dead.  IMHO of course.

Name-calling is not an argument.  It's not quite Godwin's law, but still.

Cheers,

Ed.


Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-13 Thread William G. Scott
I'd just use a decent shell scripting language (like zsh) in conjunction with a 
unix tool like awk.  But the gnuplot option sounds ideal.

Bill


William G. Scott
Professor
Department of Chemistry and Biochemistry
and The Center for the Molecular Biology of RNA
228 Sinsheimer Laboratories
University of California at Santa Cruz
Santa Cruz, California 95064
USA




Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-13 Thread Soisson, Stephen M
After this religious debate concludes, I propose we return to the old standby - 
vi versus emacs. 

-Original Message-
From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Tim Gruene
Sent: Thursday, September 13, 2012 5:25 AM
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] Off-topic: Best Scripting Language

Hi James,

I don't read blaming in George's words, just reasoning for a
personal decision.

Maybe I suffer from similar prejudice: I have the impression that
python programmers spend a lot of effort in trying to convince others
that python is a good choice. Why bother rather than let people make
their own decision?

Cheers,
Tim

On 09/12/2012 09:49 PM, James Stroud wrote:
 
 On Sep 12, 2012, at 1:00 PM, George Sheldrick wrote:
 
 It is the lack of compatibility between different versions 
 mentioned by Ethan that really put me off learning PYTHON.
 
 
 Python is backwards compatible. I have reams of code I wrote in 
 python 2.3 that still works in 2.7 without modification.
 
 Also, python (aka python 2) and python 3000 (aka python 3) are 
 considered two different languages. It's not reasonable to
 consider them one language and then complain that they are
 incompatible. Python 3 was created as a new language (and should be
 treated as such) precisely because it breaks compatibility with
 python 2. That was the intent of the language authors.
 
 You blame the authors for recognizing limitations of a language
 and inventing a new one to overcome those limitations.
 
 If the FORTRAN authors would have done that about 30 years ago, we 
 all might be programming in FORTRAN.
 
 James
 
 

-- 
Dr Tim Gruene
Institut fuer anorganische Chemie
Tammannstr. 4
D-37077 Goettingen

GPG Key ID = A46BEE1A



Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-13 Thread James Stroud

On Sep 13, 2012, at 3:24 AM, Tim Gruene wrote:
 I have the impression that
 python programmers spend a lot of effort in trying to convince others
 that python is a good choice. Why bother rather than let people make
 their own decision?


Someone asked.

Plus, python programmers put in no more effort than any other programmers.
It's just that python has more advocates (for good reason) so the apparent
effort is amplified.

Don't hate us because our preferred programming language is beautiful.

James

--
James Stroud

http://www.jamesstroud.com



Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-13 Thread Patrick Shaw Stewart
Like most computer users and many scientists I don't write scripts to
organize or analyse my data unless I get desperate.  I used both Python
and Perl a few years ago, but it would take quite a lot of time and effort
and staring at on-line tutorials to get back into either of them right now.
 So I end up using massive Excel files that kind of work, but are a pain.
 I've noticed that quite a few structural biologists have the same problem.

I've never understood why there can't be a simple programming language that
is completely self-explanatory because it uses English sentences.  Our
robot scripting language uses syntax like

Dispense 0.5 * DropVol ul to TargetWells using ProteinSyringe

That is pretty obvious.


So why can't I have a language where I can write


Carry_out_a_sequence_where

x is 1 to 10

with_step_size 1 :

if
age of person(x) is_greater_than 50
then
print name of person(x) is an old man (or woman) .

Repeat_for_next x .


?


I don't care if it's efficient (anything is efficient compared to Excel) or
if it's easy to write big programs in.  All I care about is that it's easy
to get going.

Later on I can learn to write simply Sequence instead of
Carry_out_a_sequence_where.  I could click a button that would make the
replacement to make my code more compact and readable to a trained eye.
 And of course  is_greater_than  could be written as  > .

Any intelligent school-child could understand it too, which would be
fantastic here in the UK where kids aren't taught to program any more.

Does such a language exist?





-- 
 patr...@douglas.co.uk    Douglas Instruments Ltd.
 Douglas House, East Garston, Hungerford, Berkshire, RG17 7HD, UK
 Directors: Peter Baldock, Patrick Shaw Stewart

 http://www.douglas.co.uk
 Tel: 44 (0) 148-864-9090    US toll-free 1-877-225-2034
 Regd. England 2177994, VAT Reg. GB 480 7371 36


Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-13 Thread Richard Gildea
In my opinion, the Python equivalent of your pseudo-code is fairly close to
how you would write the instructions logically. But then maybe not everyone
thinks in the same way that I do :-)

for x in range(1, 11):
  if age_of_person(x) > 50:
    print name_of_person(x), "is an old man (or woman)"

Of course you would have to define the functions age_of_person() and
name_of_person() in order for this to work, or you could write it in a more
object-oriented way so you have a Person object which has attributes
name, age, gender, etc. and the code would be even more readable.

for person in list_of_people:
  if person.age > 50:
    print person.name, "is an old", person.gender
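
For anyone who wants to run the object-oriented version, here is one
possible self-contained sketch in Python 3 syntax (print is a function
there). The Person class, the describe_old() helper and the sample data are
all assumptions made up for illustration; only the attribute names name,
age and gender come from the text above.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    age: int
    gender: str  # e.g. "man" or "woman"


# Made-up sample data, purely for illustration.
list_of_people = [
    Person("Alice", 62, "woman"),
    Person("Bob", 35, "man"),
    Person("Carol", 51, "woman"),
]


def describe_old(people, threshold=50):
    """Return one sentence per person older than the threshold."""
    return [f"{p.name} is an old {p.gender}" for p in people if p.age > threshold]


for line in describe_old(list_of_people):
    print(line)
```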

Disclaimer: I mainly write in Python, so obviously am naturally biased
towards Python, however I have yet to see another widely used language that
is as readable or intuitive to learn (to me).

Cheers,

Richard

--

Richard Gildea

Software Developer
Physical Biosciences Division
Lawrence Berkeley National Laboratory
1 Cyclotron Rd
Mail Stop 64R0121
Berkeley
CA 94720-8118






Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-13 Thread James Stroud
On Sep 13, 2012, at 11:02 AM, Patrick Shaw Stewart wrote:
 Like most computer users and many scientists I don't write scripts to 
 organize or analyse my data unless I get desperate.  I've used both Python 
 and Perl a few years ago, but it would take quite a lot of time and effort 
 and staring at on-line tutorials to get back into either of them right now.  
 So I end up using massive Excel files that kind of work, but are a pain.  
 I've noticed that quite a few structural biologists have the same problem.
 
 I've never understood why there can't be a simple programming language that 
 is completely self-explanatory because it uses English sentences.

Yeah. They tried that. It's called AppleScript and is a complete disaster for 
programmers simply because of its vague resemblance to natural language. There 
are essays on this issue [1, 2], but other than the message "stay away from 
programming languages that try to be natural languages", these essays are 
mostly academic.

It turns out that the syntax and semantics of all reasonable programming 
languages are very similar, or fall into only a few classes (e.g. C-like, 
S-expressions, etc.), so once you are fluent in one from a class, it's easy 
to pick up the others. This can't be said of natural languages, which are full 
of idioms and grammatical exceptions, even in closely related dialects.

James


[1] 
http://www.codinghorror.com/blog/2006/08/computer-languages-arent-human-languages.html
[2] http://daringfireball.net/2005/09/englishlikeness_monster

Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-13 Thread Jacob Keller

 It turns out that the syntax and semantics of all reasonable programming
 languages are very similar, or fall into only a few classes (e.g. C-like,
 S-expressions, etc.), so once you are fluent in one from a class, it's
 easy to pick up the others. This can't be said of natural languages, which
 are full of idioms and grammatical exceptions, even in closely related
 dialects.


This is more opinions than I can shake a stick at! Don't we all have other
fish to fry (or for the French, other cats to whip? Other national
equivalents?) Anyway, I was nervous as a long-tailed cat in a room full of
rocking chairs to ask this question, and look at the Pandora's box that
this has opened!

Java anyone?! I've actually noticed a bit of a dearth of Java in the
crystallography world, with some exceptions...

Thanks everybody for your suggestions--I will mull them over, since you all
make such good arguments (not meant in the programming sense, but probably
that's true too!),

Jacob









-- 
***
Jacob Pearson Keller
Northwestern University
Medical Scientist Training Program
email: j-kell...@northwestern.edu
***


Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-13 Thread Eric Williams
Another option that could be cheap or free (if your university offers
license deals, as mine does) is SPSS. It has a lot of the quick and dirty
spreadsheet functionality of Excel, is much faster than Excel with large
tables, has lots of good analysis tools, has its own scripting language,
and is compatible with R and Python.

Eric




Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-13 Thread Eric Bennett
On Sep 12, 2012, at 2:28 PM, Ethan Merritt wrote:


 Why are you dis-ing python? Seems everybody loves it...
 
 I'm sure you can google for many "reasons I hate Python" lists.
 
 Mine would start
 1) sensitive to white space == fail
 2) dynamic typing makes it nearly impossible to verify program correctness,
   and very hard to debug problems that arise from unexpected input or
   a mismatch between caller and callee.   
 3) the language developers don't care about backward compatibility;
   it seems version 2.n+1 always breaks code written for version 2.n, 
   and let's not even talk about version 3
 4) slow unless you use it simply as a wrapper for C++,
   in which case why not just use C++ or C to begin with?
 5) not thread-safe
 
you did ask...
   
   Ethan
 


While I agree generally with your points and try to avoid python if at all 
possible, I'm not sure about what you mean with point 5, since it's certainly 
possible to write threaded python scripts.
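
As a rough illustration of a threaded Python script, here is a sketch using
only the standard library of recent Python 3 versions. The column data and
the helper name column_sum are made up. Note that in the standard CPython
interpreter the global interpreter lock means threads like these mostly help
with I/O-bound work rather than pure number crunching, which may be part of
what point 5 was getting at.

```python
from concurrent.futures import ThreadPoolExecutor


def column_sum(column):
    """Sum one column of numbers."""
    return sum(column)


# Made-up columns standing in for data read from a file.
columns = [[1.0, 2.0, 3.0], [10.0, 20.0], [0.5, 0.25, 0.25]]

with ThreadPoolExecutor(max_workers=3) as pool:
    # map() returns results in the same order as the inputs.
    totals = list(pool.map(column_sum, columns))

print(totals)
```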

Another point that is purely personal taste is the language philosophy that 
there is one official way to do something in Python, as contrasted with Perl 
(which is my choice) where the language philosophy is that there are many ways 
of doing any given task and the language is not designed to force you into a 
particular way of doing it.




Ed adds:

 While indeed 1/3=0 (but so it will be in C), I think it's a bit of an 
 overstatement that python code execution is nearly impossible to verify.
 Another goal of python is to accelerate implementation, and dynamic/duck 
 typing supposedly helps that.  The argument is simply that weak typing 
 favours strong testing, which should be a good thing.


Actually it's a bit of a hindrance.  In Perl I can call the int function on 
anything and get a sensible answer.  In python if you call int on a string that 
contains a floating point number the default behavior is that it will crash:


[woz:~] bennette% cat pytest.py
example_string = "10.3"
number = int(example_string)

[woz:~] bennette% /Library/Frameworks/Python.framework/Versions/2.6/bin/python pytest.py
Traceback (most recent call last):
  File "pytest.py", line 2, in <module>
    number = int(example_string)
ValueError: invalid literal for int() with base 10: '10.3'





That's brain dead.  IMHO of course.

Cheers,
Eric


[ccp4bb] Off-topic: Best Scripting Language

2012-09-12 Thread Jacob Keller
Dear List,

since this probably comes up a lot in manipulation of pdb/reflection files
and so on, I was curious what people thought would be the best language for
the following: I have some huge (100s MB) tables of tab-delimited data on
which I would like to do some math (averaging, sigmas, simple arithmetic,
etc) as well as some sorting and rejecting. It can be done in Excel, but
this is exceedingly slow even in 64-bit, so I am looking to do it through
some scripting. Just as an example, a sort which takes 10 min in Excel
takes ~10 sec max with the unix command sort (seems crazy, no?). Any
suggestions?

Thanks, and sorry for being off-topic,

Jacob

-- 
***
Jacob Pearson Keller
Northwestern University
Medical Scientist Training Program
email: j-kell...@northwestern.edu
***


Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-12 Thread Eric Williams
Try R. :)

http://www.r-project.org/

Eric




Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-12 Thread George M. Sheldrick
I always use FORTRAN for such tasks, especially if speed is important.

George


-- 
Prof. George M. Sheldrick FRS
Dept. Structural Chemistry,
University of Goettingen,
Tammannstr. 4,
D37077 Goettingen, Germany
Tel. +49-551-39-3021 or -3068
Fax. +49-551-39-22582


Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-12 Thread DUMAS Philippe (UDS)

On Wednesday 12 September 2012 16:40 CEST, George M. Sheldrick 
gshe...@shelx.uni-ac.gwdg.de wrote:

May I add a little personal joke to the serious remark by George?

This reminds me of a discussion I had with Jorge Navaza, let's say 15 years ago, 
about the programming language of the future.
(To a good approximation, 15 years ago, the future was now.)
Jorge's answer was: "I don't know what it will be, but I know its name 
will be FORTRAN."
I hope he will confirm the statement...
Philippe Dumas







Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-12 Thread Pete Meyer
One thing to keep in mind is that there's usually a trade-off between 
setup (writing and testing) and execution time.  For one-off data 
processing, I'd focus on implementation speed rather than execution 
speed (in other words, FORTRAN might not be ideal unless you're already 
fluent with it).


That said, I'd take a look at python, octave or R.  Python's relatively 
easy to learn, and more flexible than octave/R; but it doesn't have the 
built-in statistical functions that octave and R do.


One other tip which you've probably already thought of - depending on 
your runtimes (I don't think 100s MB of data is usually considered an 
enormous amount, but it'll depend on what you're doing) it may be worth 
getting things working on a small subset of the data first.
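
One way to try things on a small subset, sketched in Python 3 with the
standard library only. The helper name head_rows and the sample numbers are
made up; in real use the in-memory text would be replaced by an open file
handle.

```python
import csv
import io
from itertools import islice


def head_rows(handle, n):
    """Return the first n rows of a tab-delimited file as lists of strings."""
    reader = csv.reader(handle, delimiter="\t")
    # islice stops after n rows, so the rest of the file is never read.
    return list(islice(reader, n))


# Stand-in for something like open("data.tsv", newline=""); the values
# below are invented so the sketch runs on its own.
sample = io.StringIO("1\t2.5\n2\t3.5\n3\t9.0\n4\t1.0\n")
subset = head_rows(sample, 2)
print(subset)
```

Because only the first n rows are read, this stays fast however large the
full table is.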


Pete




Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-12 Thread Nat Echols
On Wed, Sep 12, 2012 at 7:32 AM, Jacob Keller
j-kell...@fsm.northwestern.edu wrote:
 since this probably comes up a lot in manipulation of pdb/reflection files
 and so on, I was curious what people thought would be the best language for
 the following: I have some huge (100s MB) tables of tab-delimited data on
 which I would like to do some math (averaging, sigmas, simple arithmetic,
 etc) as well as some sorting and rejecting. It can be done in Excel, but
 this is exceedingly slow even in 64-bit, so I am looking to do it through
 some scripting. Just as an example, a sort which takes 10 min in Excel
 takes ~10 sec max with the unix command sort (seems crazy, no?). Any
 suggestions?

Anything but Fortran.

Seriously, there are probably a dozen (or more) good solutions, and it
depends on whose syntax you prefer, what external libraries you need,
whether you want to someday apply your new programming skills to
another project, and whether you want anyone else to be able to read
your code.  For me, Python wins easily, but the suggestions of Octave
or R are probably just as good for a one-time script of the sort you
describe.
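
For what it's worth, a one-time Python script for the task described
(averaging, sigmas, rejecting, sorting) might look roughly like the sketch
below. It assumes Python 3 and a two-column, tab-delimited layout of label
and value; the function names, the 1.5-sigma cutoff and the sample data are
all assumptions made up for illustration, not anything from the original
question.

```python
import csv
import io
import statistics


def load_table(handle):
    """Read tab-delimited (label, value) rows into a list of tuples."""
    reader = csv.reader(handle, delimiter="\t")
    return [(label, float(value)) for label, value in reader]


def reject_and_sort(rows, n_sigma=1.5):
    """Drop rows more than n_sigma sample standard deviations from the
    mean, then sort the survivors by value."""
    values = [v for _, v in rows]
    mean = statistics.mean(values)
    sigma = statistics.stdev(values)  # sample standard deviation
    kept = [r for r in rows if abs(r[1] - mean) <= n_sigma * sigma]
    return mean, sigma, sorted(kept, key=lambda r: r[1])


# Invented sample data; a real script would pass an open file instead.
sample = io.StringIO("a\t10.0\nb\t12.0\nc\t11.0\nd\t50.0\ne\t9.0\n")
mean, sigma, kept = reject_and_sort(load_table(sample))
print(mean, sigma)
print(kept)
```

With this sample, the outlier row "d" is rejected and the remaining rows
come back sorted by value. This version loads the whole table into memory,
which should be workable for files of a few hundred MB on a typical
workstation, though that depends on the machine.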

-Nat


Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-12 Thread Carter, Charlie
A similar remark was made to me by David Blow, while he was on sabbatical at 
UNC in the 1980s, working with the UNC Computer Science Department and in a 
moment of intense frustration with the overpowering ignorance of fortran and 
the enthusiasm for Unix exhibited by that department.

Charlie 
On Sep 12, 2012, at 10:58 AM, DUMAS Philippe (UDS) wrote:

 This reminds me of a discussion I had with Jorge Navaza, let's say 15 years 
 ago, about the programming language of the future.
 (To a good approximation, 15 years ago, the future was now.)
 Jorge's answer was: "I don't know what it will be, but I know its name 
 will be FORTRAN."
 I hope he will confirm the statement...
 Philippe Dumas


Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-12 Thread Ethan Merritt
On Wednesday, September 12, 2012 07:32:54 am Jacob Keller wrote:
 Dear List,
 
 since this probably comes up a lot in manipulation of pdb/reflection files
 and so on, I was curious what people thought would be the best language for
 the following: I have some huge (100s MB) tables of tab-delimited data on
 which I would like to do some math (averaging, sigmas, simple arithmetic,
 etc) as well as some sorting and rejecting. It can be done in Excel, but
 this is exceedingly slow even in 64-bit, so I am looking to do it through
 some scripting. Just as an example, a sort which takes 10 min in Excel
 takes ~10 sec max with the unix command sort (seems crazy, no?). Any
 suggestions?

For the specific purpose you list -
input from tab-delimited data
output to simple statistical summaries and (I assume) plots
- it sounds like gnuplot could do the job nicely.

Otherwise I'd recommend perl, and dis-recommend python.

Ethan


-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-12 Thread Jacob Keller

 For the specific purpose you list -
 input from tab-delimited data
 output to simple statistical summaries and (I assume) plots
 - it sounds like gnuplot could do the job nicely.


I wasn't aware that gnuplot can do calculations--can it? I was probably
going to use it somewhere as a plotting option.


 Otherwise I'd recommend perl, and dis-recommend python.


Why are you dis-ing python? Seems everybody loves it...

JPK





 Ethan


 --
 Ethan A Merritt
 Biomolecular Structure Center,  K-428 Health Sciences Bldg
 University of Washington, Seattle 98195-7742




-- 
***
Jacob Pearson Keller
Northwestern University
Medical Scientist Training Program
email: j-kell...@northwestern.edu
***


Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-12 Thread Edwin Pozharski

All you need is scipy library to get those pesky statistic functions :)

On 09/12/2012 11:11 AM, Pete Meyer wrote:
Python's relatively easy to learn, and more flexible than octave/R; 
but it doesn't have the built-in statistic functions that octave and R 
do. 
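
For the kind of job described in the original question (averaging, sigmas, rejecting and sorting on tab-delimited data), a minimal standard-library sketch might look like the following. It assumes Python 3.4 or later, where a `statistics` module is available without scipy. The column name `value`, the 2-sigma cutoff, the function name and the inline sample data are all made up for illustration.

```python
import csv
import io
import statistics

# Hypothetical tab-delimited input; a real script would open a file instead.
SAMPLE = "id\tvalue\na\t1.0\nb\t1.2\nc\t0.9\nd\t1.1\ne\t9.0\nf\t1.0\n"

def summarize_and_reject(handle, column="value", n_sigma=2.0):
    """Return (mean, sigma, kept_rows).

    kept_rows are sorted by `column`; rows further than n_sigma sample
    standard deviations from the mean are dropped.
    """
    rows = list(csv.DictReader(handle, delimiter="\t"))
    values = [float(r[column]) for r in rows]
    mean = statistics.mean(values)
    sigma = statistics.stdev(values)  # sample standard deviation
    kept = [r for r in rows if abs(float(r[column]) - mean) <= n_sigma * sigma]
    kept.sort(key=lambda r: float(r[column]))
    return mean, sigma, kept

mean, sigma, kept = summarize_and_reject(io.StringIO(SAMPLE))
print("mean=%.3f sigma=%.3f kept=%s" % (mean, sigma, [r["id"] for r in kept]))
```

This version holds every row in memory, which is probably acceptable for a few hundred MB on a modern machine but is not a streaming solution.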


Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-12 Thread Sabuj Pattanayek
 Why are you dis-ing python? Seems everybody loves it...

Depends on if you like the object model, some don't. In the end it
really boils down to what you're used to and what you've learned to
use.


Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-12 Thread Soisson, Stephen M
Now is the time when I start waxing nostalgic about the old days when there 
used to be entire threads on this bulletin board about Fortran format statement 
syntax for parsing various files... and I read them with great interest.

How did I get to be such a geezer? 

-Original Message-
From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Carter, 
Charlie
Sent: Wednesday, September 12, 2012 11:17 AM
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] Off-topic: Best Scripting Language

A similar remark was made to me by David Blow, while he was on sabbatical at 
UNC in the 1980s, working with the UNC Computer Science Department and in a 
moment of intense frustration with the overpowering ignorance of fortran and 
the enthusiasm for Unix exhibited by that department.

Charlie 
On Sep 12, 2012, at 10:58 AM, DUMAS Philippe (UDS) wrote:

 This reminds me of a discussion I had with Jorge Navaza, let's say 15 years 
 ago, about the programming language of the future.
 (To a good approximation, 15 years ago, the future was now)
 The answer by Jorge was: "I don't know what it will be, but I know its name 
 will be FORTRAN."
 I hope he will confirm the statement...
 Philippe Dumas


Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-12 Thread James Stroud

On Sep 12, 2012, at 9:11 AM, Pete Meyer wrote:

 That said, I'd take a look at python, octave or R.  Python's relatively easy 
 to learn, and more flexible than octave/R; but it doesn't have the built-in 
 statistic functions that octave and R do.


import scipy


Now it does!



Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-12 Thread George Sheldrick
It is the lack of compatibility between different versions mentioned by 
Ethan that really put me off learning PYTHON. In contrast, the 
FORTRAN-66 program SHELX76 still compiles and runs correctly with any 
modern FORTRAN compiler. The only significant 'new' features that I now 
use are dynamic array allocation (introduced in FORTRAN-90) and OpenMP 
support for multiple CPUs, but even programs using OpenMP would still 
work with older compilers because the OpenMP instructions would be 
treated as comments.


George

On 09/12/2012 08:28 PM, Ethan Merritt wrote:

On Wednesday, September 12, 2012 09:52:09 am Jacob Keller wrote:

For the specific purpose you list -
input from tab-delimited data
output to simple statistical summaries and (I assume) plots
- it sounds like gnuplot could do the job nicely.


I wasn't aware that gnuplot can do calculations--can it? I was probably
going to use it somewhere as a plotting option.

Here's a simple-minded example using a dump of the current contents
of the PDB from www.pdb.org as a comma-separated file with ~65000 entries.
The input file was previously filtered to contain only X-ray structures
between 1 and 4 Angstroms resolution.

gnuplot> !head -3 PDB.csv
PDB ID,R Observed,R All,R Work,R Free,Refinement Resolution
100D,0.145,,0.145,,1.90
101D,0.163,,,0.252,2.25

gnuplot> set datafile separator ","
gnuplot> set datafile nofpe_trap   # trap handling greatly slows large data sets
gnuplot> stats 'PDB.csv' using "R Observed" prefix "Robs"

* FILE:
   Records:      63029
   Out of range: 0
   Invalid:      0
   Blank:        2
   Data Blocks:  2

* COLUMN:
   Mean:          0.1982
   Std Dev:       0.0334
   Sum:       12494.6900
   Sum Sq.:    2547.3068

   Minimum:       0.0450 [24518]
   Maximum:       0.9700 [45024]
   Quartile:      0.1770
   Median:        0.1970
   Quartile:      0.2180

gnuplot> print Robs_mean
  0.198237160672072

gnuplot> # calculate correlation of Robs with Resolution
gnuplot> stats 'PDB.csv' using "R Observed":"Refinement Resolution" nooutput
gnuplot> print STATS_correlation
  0.595763711910418

I've attached graphical output of the same data following some sorting,
filtering, binning, etc, with output to a PDF file.

You can do all this in R also.  R has a larger collection of statistics
options, but is not as good at dealing with really large data sets.  IMHO
gnuplot has more flexible options for graphical output.
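
For readers who want the same summary without gnuplot, here is a rough Python sketch of the mean and correlation steps from the session above. It uses only the standard library and is not what was run to produce the numbers quoted. The header line matches the one shown above, but the data rows, the IDs and the helper names `column_pairs` and `pearson` are invented for illustration.

```python
import csv
import io
import math

# Hypothetical stand-in for PDB.csv: real header, invented rows.
SAMPLE = """PDB ID,R Observed,R All,R Work,R Free,Refinement Resolution
X001,0.150,,0.150,,1.50
X002,0.200,,,0.250,2.00
X003,,,0.210,0.260,2.20
X004,0.250,,0.250,0.300,3.00
"""

def column_pairs(handle, xname, yname):
    """Yield (x, y) floats for rows where both named columns are non-blank."""
    for row in csv.DictReader(handle):
        if row[xname] and row[yname]:
            yield float(row[xname]), float(row[yname])

def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

pairs = list(column_pairs(io.StringIO(SAMPLE), "R Observed", "Refinement Resolution"))
robs_mean = sum(x for x, _ in pairs) / len(pairs)
correlation = pearson(pairs)
print("records used:", len(pairs))
print("mean R Observed: %.4f" % robs_mean)
print("correlation: %.4f" % correlation)
```

Rows with a blank in either column are skipped, which is one reasonable reading of how blank fields should be treated; other conventions are possible.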


Otherwise I'd recommend perl, and dis-recommend python.


Why are you dis-ing python? Seems everybody loves it...

I'm sure you can google for many "reasons I hate Python" lists.

Mine would start
1) sensitive to white space == fail
2) dynamic typing makes it nearly impossible to verify program correctness,
and very hard to debug problems that arise from unexpected input or
a mismatch between caller and callee.
3) the language developers don't care about backward compatibility;
it seems version 2.n+1 always breaks code written for version 2.n,
and let's not even talk about version 3
4) slow unless you use it simply as a wrapper for C++,
in which case why not just use C++ or C to begin with?
5) not thread-safe

 you did ask...

Ethan




--
Prof. George M. Sheldrick FRS
Dept. Structural Chemistry,
University of Goettingen,
Tammannstr. 4,
D37077 Goettingen, Germany
Tel. +49-551-39-3021 or -3068
Fax. +49-551-39-22582


Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-12 Thread James Stroud
Python sorting 10,000 records of 10,000 floats for each record, finding the max, 
min, and mean of the entire 100,000,000-element 32 bit float array (400 MB) on a 
6 year old white iMac:

11.6 seconds.

This doesn't include the time to generate the 400 MB of random (normal) data.

Try it on your own computer. Here's the copy-paste from mine:

py> import timeit
py> timeit.timeit('big_data.sort(axis=0); big_data.mean(); big_data.max(); big_data.min()',
...     'import numpy; big_data = numpy.random.normal(10, size=10**8).reshape((10**4, 10**4)); print("random data made, starting...")',
...     number=1)
random data made, starting...
11.597978115081787
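
A scaled-down variant of the same timing experiment that needs only the standard library, for machines without numpy. This is a sketch: the array size (10**5 floats), the seed and the helper names `make_data` and `sort_and_stats` are arbitrary choices for illustration, and the timings it prints are not comparable to the numpy figure above.

```python
import random
import timeit

def make_data(n, seed=0):
    """Return n normally distributed floats (mean 10, sigma 1)."""
    rng = random.Random(seed)
    return [rng.gauss(10.0, 1.0) for _ in range(n)]

def sort_and_stats(data):
    """Sort in place, then return (min, max, mean)."""
    data.sort()
    return data[0], data[-1], sum(data) / len(data)

data = make_data(10**5)
# Data generation is deliberately kept outside the timed call.
elapsed = timeit.timeit(lambda: sort_and_stats(data), number=1)
lo, hi, mean = sort_and_stats(data)
print("min=%.3f max=%.3f mean=%.3f" % (lo, hi, mean))
print("sort + stats took %.3f s" % elapsed)
```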

James







Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-12 Thread Pavel Afonine
Hi,

Python, of course (if you know some basic math). Otherwise, Python and a
good math text book -:)

Pavel




Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-12 Thread James Stroud

On Sep 12, 2012, at 1:00 PM, George Sheldrick wrote:

 It is the lack of compatibility between different versions mentioned by Ethan 
 that really put me off learning PYTHON.


Python is backwards compatible. I have reams of code I wrote in python 2.3 that 
still works in 2.7 without modification.

Also, python (aka python 2) and python 3000 (aka python 3) are considered two 
different languages. It's not reasonable to consider them one language and then 
complain that they are incompatible. Python 3 was created as a new language 
(and should be treated as such) precisely because it breaks compatibility with 
python 2. That was the intent of the language authors.

You blame the authors for recognizing limitations of a language and inventing a 
new one to overcome those limitations.

If the FORTRAN authors would have done that about 30 years ago, we all might be 
programming in FORTRAN.

James



Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-12 Thread Edwin Pozharski

Ethan,

I think the majority of your complaints about python result from its very 
purpose - to be readable/portable for the sake of facilitating rapid 
implementation.  There are many other languages that provide tools to 
accomplish what Jacob wants to do (well, I would stay away from P''), 
but python definitely is a good option for casual calculations.


On 09/12/2012 02:28 PM, Ethan Merritt wrote:

I'm sure you can google for many "reasons I hate Python" lists.

Mine would start
1) sensitive to white space == fail

Every language has a way to group lines of code.  Curly brackets are 
fine, but python is designed to force code readability, and preceding 
white space (btw, everywhere else it is ignored) does that.



2) dynamic typing makes it nearly impossible to verify program correctness,
and very hard to debug problems that arise from unexpected input or
a mismatch between caller and callee.


While indeed 1/3=0 (but so it will be in C), I think it's a bit of an 
overstatement that python code execution is nearly impossible to verify.
Another goal of python is to accelerate implementation, and dynamic/duck 
typing supposedly helps that.  The argument is simply that weak typing 
favours strong testing, which should be a good thing.
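
To make the caller/callee mismatch concrete, here is a minimal sketch, assuming Python 3, of how a wrongly typed argument slips through until the code actually runs, and how a small test catches it. The function name `mean_intensity` is hypothetical. Note that in Python 3 `1/3` is true division; `1//3` is the integer division that gives 0.

```python
def mean_intensity(values):
    """Average a sequence of numbers; nothing checks element types up front."""
    return sum(values) / len(values)

# Correct call: a list of floats.
ok = mean_intensity([1.0, 2.0, 3.0])

# Mismatched call: strings read from a text file and never converted.
# The error only appears when this line actually executes.
try:
    mean_intensity(["1.0", "2.0", "3.0"])
    failed_at_runtime = False
except TypeError:
    failed_at_runtime = True

# A tiny test of the kind the "strong testing" argument relies on.
assert ok == 2.0
assert failed_at_runtime

# Division semantics in Python 3: "/" is true division, "//" floors.
print(1 / 3, 1 // 3)
```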



3) the language developers don't care about backward compatibility;
it seems version 2.n+1 always breaks code written for version 2.n,
and let's not even talk about version 3


I don't think that's entirely true either, why would they then backport 
certain features from v3?  The decision to not provide backward 
compatibility was well explained.  While the 2to3 converter may potentially 
fail on complex code, the very fact that it was implemented confirms 
that python developers do care about the issue to some extent. While I 
definitely agree that it is annoying when a module you rely on is 
deprecated, there is a strong argument that a clean break is sometimes 
better than continuous patching of a code that outlived its initial design.



4) slow unless you use it simply as a wrapper for C++,
in which case why not just use C++ or C to begin with?


Native python is not meant for number-crunching, but wrappers such as 
scipy allow one to combine python flexibility/readability with speed of 
compiled binaries.  One reason to use python over C/C++ is portability.
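
The same idea can be seen without leaving the standard library: in CPython, built-ins such as `sum` run their loop in compiled C code, so handing the loop to them is typically faster than an explicit Python-level loop. A rough sketch; the array size and repeat count are arbitrary, and the timings are whatever your machine gives.

```python
import timeit

data = [float(i) for i in range(200000)]

def python_loop(values):
    """Sum with an explicit interpreted loop."""
    total = 0.0
    for v in values:
        total += v
    return total

def builtin_sum(values):
    """Sum using the built-in, whose loop runs in compiled code in CPython."""
    return sum(values)

t_loop = timeit.timeit(lambda: python_loop(data), number=20)
t_builtin = timeit.timeit(lambda: builtin_sum(data), number=20)
print("explicit loop: %.4f s, built-in sum: %.4f s" % (t_loop, t_builtin))
```

Libraries such as numpy and scipy apply the same principle to whole arrays rather than to single built-ins.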



5) not thread-safe

I am definitely not an expert on this (or anything else), but afaiu this 
is not unique to python.
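
As an illustration of what thread safety means in practice (in any language with shared mutable state), here is a minimal sketch using Python's `threading` module. An unsynchronised read-modify-write on shared data can lose updates, and a lock is the usual remedy. The class name `Counter` and the thread and iteration counts are illustrative only.

```python
import threading

class Counter:
    """A shared counter whose increment is protected by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # The lock makes the read-modify-write a single atomic step.
        with self._lock:
            self.value += 1

def worker(counter, n):
    """Increment the shared counter n times."""
    for _ in range(n):
        counter.increment()

counter = Counter()
threads = [threading.Thread(target=worker, args=(counter, 10000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)
```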


Cheers,

Ed.


Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-12 Thread Nat Echols
On Wed, Sep 12, 2012 at 12:49 PM, James Stroud xtald...@gmail.com wrote:
 Also, python (aka python 2) and python 3000 (aka python 3) are considered
 two different languages. It's not reasonable to consider them one language
 and then complain that they are incompatible. Python 3 was created as a new
 language (and should be treated as such) precisely because it breaks
 compatibility with python 2. That was the intent of the language authors.

Actually, despite having endorsed Python, I have to agree with the
complaints about Python 3, for several reasons:

1) It doesn't actually introduce many fundamentally new features that
would have changed how we code for it.  (Like getting rid of self or
the Global Interpreter Lock, or writing the interpreter in C++ and
improving the API for writing extensions.)  The only really huge
change is Unicode support, which is probably good but doesn't really
make it a different programming language.
2) The changes that really break code compatibility - like getting rid
of the print statement - seem to have been done on a whim rather than
because of any pressing need.  Maybe this was done to try to force
everyone to migrate immediately (since module developers couldn't
easily maintain code that works with 2.x and 3.x), but it has had the
opposite effect.
3) Development on Python 2 is being shut down.
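
On the practical side of point 2: one documented way to write print calls that run unchanged under Python 2.6+ and Python 3 is the `print_function` future import. A minimal sketch; the helper name `report` and the values passed to it are made up.

```python
from __future__ import print_function  # harmless on Python 3, enables print() on 2.6+

import sys

def report(label, value, stream=None):
    """Print 'label: value' with the print function and return the text."""
    text = "%s: %.3f" % (label, value)
    print(text, file=stream if stream is not None else sys.stdout)
    return text

line = report("mean", 0.198237)
# The same call can be pointed at another stream, e.g. stderr.
report("example only", 0.0, stream=sys.stderr)
```

The future import has to be the first statement in the module, which is why it sits above the other imports.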

Despite all this, I would still choose Python over nearly anything
else for scripting (and most other purposes, but eventually C++ will
be necessary too).

 You blame the authors for recognizing limitations of a language and
 inventing a new one to overcome those limitations.
 If the FORTRAN authors would have done that about 30 years ago, we all might
 be programming in FORTRAN.

I think this is what Fortran 90 was supposed to do (unsuccessfully, at
least in the world of crystallography) - but F77 code is still valid
F90 code, just like ANSI C is still valid C++.

-Nat


Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-12 Thread George Reeke
Colleagues:  Another country is heard from:
Since no one has mentioned MATLAB, let me mention it.
--Can easily do any math from 2+2 to matrix SVD etc.
--Statistics toolbox does most of what anyone would want.
--Lots of easy quick graphics that can be prettied up if needed.
--If you know FORTRAN, you already know most of the syntax.
--Reasonably easy to write quickies, can also run large
  calculations quite fast, can even run some stuff on GPUs now.
--Largely, not entirely, backwards compatible to earlier versions.
--Available for Linux, Mac, Windows.
--Great tech support and online help.
Disadvantages:
--It is not free.
--Not so great for text manipulation operations.
--Takes a while to learn, so not good for one quickie, but well
  worth the time to learn it in the long run.

George Reeke


Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-12 Thread Ho Leung Ng
 I encourage trainees to learn a programming language that
will help their careers beyond their short time in my lab. Many or
most of them will not continue in structural biology or even science.
For the moment, I am pushing python even though I am minimally
literate in it myself. They should learn a modern programming
language that is widely used beyond my subdiscipline. Python will
probably help them get a job more than Fortran.


Ho