Re: [ccp4bb] Refmac executables - win vs linux in RHEL VM

2012-04-08 Thread Nicholas M Glykos
Hi Bernhard,

> Maybe the paranoia-checkers in windows slow everything down
> although I did not see any resources overwhelmed...

I wonder whether the windoze refmac binaries can be used through wine in a 
GNU/Linux environment. If yes, then you could possibly differentiate 
between the operating-system-dependent and compiler-specific hypotheses.

Nicholas


-- 


Nicholas M. Glykos, Department of Molecular Biology
 and Genetics, Democritus University of Thrace, University Campus,
  Dragana, 68100 Alexandroupolis, Greece, Tel/Fax (office) +302551030620,
Ext.77620, Tel (lab) +302551030615, http://utopia.duth.gr/~glykos/


Re: [ccp4bb] Refmac executables - win vs linux in RHEL VM

2012-04-08 Thread Nicholas M Glykos
Hi Nat,

> one of my colleagues found (on Linux) that the exp() function provided
> by g77 was 20-fold slower than the equivalent in the Intel math library.

I do not know whether this has recently been changed, but the license for 
icc-produced executables used to be rather restrictive. If I remember 
correctly, you were not allowed to distribute the binaries, full stop. 
This together with the fact that until recently (icc v.11.0.074) the 
icc-produced executables would not run on specific AMD-based hardware, had 
made me return to the safety of gcc.

My two cents,
Nicholas

-- 


Nicholas M. Glykos, Department of Molecular Biology
 and Genetics, Democritus University of Thrace, University Campus,
  Dragana, 68100 Alexandroupolis, Greece, Tel/Fax (office) +302551030620,
Ext.77620, Tel (lab) +302551030615, http://utopia.duth.gr/~glykos/


Re: [ccp4bb] Refmac executables - win vs linux in RHEL VM

2012-04-08 Thread Ian Tickle
> I do not know whether this has recently been changed, but the license for
> icc-produced executables used to be rather restrictive. If I remember
> correctly, you were not allowed to distribute the binaries, full stop.

Nicholas, this restriction applies (and has always applied) only to
Intel's 'evaluation' licence: i.e. you get to try the Intel compilers
free for 1 month, but you're not allowed to redistribute any
executables you create with them.  I don't know if this means that the
software actually stops working after a month; I guess it does
- they're not as trusting as they used to be!

Intel's EULA for all their Software Development Products (including
all their compilers) states:

Subject to all of the terms and conditions of this Agreement and any
specific restrictions which may appear in the Redistributables text
files, Intel grants to you a non-exclusive, non-assignable, fully-paid
copyright license to distribute (except if you received the Materials
under an Evaluation License as specified below) the Redistributables,
including any modifications pursuant to Section 2.B, or any portions
thereof, as part of the product or application you developed using the
Materials..

I had our lawyers check this ~10 years ago when the compiler was at
version ~7 (it's now at 11), since we are commercial and wanted to
distribute our own sources & executables, and the conditions on
redistribution of user-created executables have not changed in essence
since then (obviously redistribution of the compiler executables
themselves has never been allowed).  What has changed is that the
licence conditions have become somewhat more restrictive in the sense
that academic institutional users are no longer eligible for free
licences! - though they do get a discount off the fully paid-up
commercial licence.  A personal non-commercial licence (which does not
cover use by academics) is still free.  In all cases (except
evaluation) executables can be freely distributed, along with any of
Intel's DLLs that are required to run it.

Please note that I have no financial interest in Intel ;).

Cheers

-- Ian


Re: [ccp4bb] Refmac executables - win vs linux in RHEL VM

2012-04-08 Thread Harry Powell

Hi

I suspect that this is more to do with the amount of memory required,
size of arrays etc; refinement will (in general) be more demanding in
terms of these than an integration program like Mosflm. The last time
I compared the Mosflm performance (which was a few years ago),
running the same batch job on OSX 10.4 (Tiger), and on Windows XP and
Linux Feisty Fawn (so you can tell how long ago this was; both the
latter running under virtual machines on the same 32-bit Intel Mac
that the OSX job ran on), there was essentially no difference in
performance (though I have a vague memory of Ubuntu being a little
faster, maybe ~3%).


Some caveats -

* I used a gfortran build for OSX and Linux, g77 build for Windows
* I didn't spend too much time on this
* I wasn't running a GUI - all three as foregrounded jobs, nothing  
else running on the machine (I tried to make sure only the OS and  
essential services were running). So this wasn't a batch job in the  
traditional sense...
* gfortran builds these days are considerably faster (and compare  
well to ifort builds)


On 7 Apr 2012, at 17:50, Roger Rowlett wrote:

I don't know the state of current software, because I haven't tried  
recently, but when I set up my student crystallography workstations  
a few years back I noticed many packages (e.g. EPMR, Phaser) that  
had potentially long run times (where it is really noticeable)  
would run on the identical hardware about 2-3 times faster in Linux  
than in Windows XP. Memory swapping wasn't the issue. I was  
astounded there could be that much overhead in Windows. A Linux VM  
on a windows machine being faster than native Win7 is pretty weird,  
though.


Cheers,



On 4/7/2012 11:42 AM, Bernhard Rupp (Hofkristallrat a.D.) wrote:


Something the developers might be interested in:

The Refmac_5.6.0117 32-bit windows binaries run native on a win64 3-4x
slower than those from the linux distribution run
**in a RHEL6.2-64 VMware virtual machine hosted on the same windows7/64
system.**
VM/RHEL:
Refmac_5.6.0117:  End of Refmac_5.6.0117
Times: User:1015.3s System:  135.0s Elapsed:19:17
Win native
Refmac_5.6.0117:  End of Refmac_5.6.0117
Times: User:   0.0s System:0.0s Elapsed:67:49
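
As a quick arithmetic check of the "3-4x" figure, the two elapsed times above can be compared directly. This is only a sketch: the helper name `to_seconds` is hypothetical, and it assumes the Elapsed field is minutes:seconds (which is consistent with User + System on the VM run adding up to roughly the elapsed time).

```python
def to_seconds(elapsed: str) -> int:
    """Convert an 'MM:SS' elapsed-time string to whole seconds."""
    minutes, seconds = elapsed.split(":")
    return int(minutes) * 60 + int(seconds)

# Elapsed times copied from the two Refmac logs quoted above.
vm_rhel_elapsed = to_seconds("19:17")      # Linux binary in the RHEL VM
win_native_elapsed = to_seconds("67:49")   # 32-bit Windows binary, native

# CPU time reported by the VM run (User + System), in seconds.
vm_cpu_seconds = 1015.3 + 135.0

slowdown = win_native_elapsed / vm_rhel_elapsed

print(f"VM/RHEL elapsed:    {vm_rhel_elapsed} s")
print(f"Win native elapsed: {win_native_elapsed} s")
print(f"Slowdown factor:    {slowdown:.2f}x")
print(f"VM CPU time / elapsed: {vm_cpu_seconds / vm_rhel_elapsed:.2f}")
```

The Windows log reports 0.0s for both User and System, so only the elapsed times can be compared between the two runs.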

Most peculiar... although I think (but do not know) that the linux
binaries are 64 bit, I don't think that address space is the issue here
if they are.

Maybe the paranoia-checkers in windows slow everything down
although I did not see any resources overwhelmed...

Best regards, BR
-
Bernhard Rupp
001 (925) 209-7429
+43 (676) 571-0536
b...@ruppweb.org
hofkristall...@gmail.com
http://www.ruppweb.org/
-
No animals were hurt or killed during the
production of this email.
-


Harry
--
Dr Harry Powell, MRC Laboratory of Molecular Biology, MRC Centre,  
Hills Road, Cambridge, CB2 0QH






Re: [ccp4bb] Refmac executables - win vs linux in RHEL VM

2012-04-08 Thread Nikolaos Glykos

Hi Ian,


> Nicholas, this restriction applies (and has always applied) only to
> Intel's 'evaluation' licence


That's right. With a cost of $9,997.00 for a 3-years/2-seats academic
license, I couldn't have been talking about anything else ... :-)))

All the best,
Nicholas


--
Nicholas M. Glykos, Department of Molecular Biology
 and Genetics, Democritus University of Thrace, University Campus,
  Dragana, 68100 Alexandroupolis, Greece, Tel/Fax (office) +302551030620,
Ext.77620, Tel (lab) +302551030615, http://utopia.duth.gr/~glykos/


Re: [ccp4bb] Refmac executables - win vs linux in RHEL VM

2012-04-08 Thread Ben Eisenbraun
On Sun, Apr 08, 2012 at 03:59:22PM +0300, Nikolaos Glykos wrote:
> > Nicholas, this restriction applies (and has always applied) only to
> > Intel's 'evaluation' licence
>
> That's right. With a cost of $9,997.00 for a 3-years/2-seats academic
> license, I couldn't have been talking about anything else ... :-)))

Is that a joke? Or did I miss something? We pay about $900 USD/year for our
single seat, academic license that includes both the Linux and OS X
versions of the Intel Compilers.

And if you're an active scientific software developer, we'll let you use
them for free:

http://www.sbgrid.org/wiki/developers/support

-ben

--
| Ben Eisenbraun
| SBGrid Consortium  | http://sbgrid.org   |
| Harvard Medical School | http://hms.harvard.edu  |


Re: [ccp4bb] Refmac executables - win vs linux in RHEL VM

2012-04-08 Thread Ian Tickle
> That's right. With a cost of $9,997.00 for a 3-years/2-seats academic
> license, I couldn't have been talking about anything else ... :-)))

Hi Nicholas

That sounds like way more than it should be, in fact it sounds like
you've been quoted the cost of the commercial licence and then some!
From Intel's website the academic licence for icc (Linux/2 seats) is
$570 incl 1 year's support.  Renewal of support for subsequent years
will be less than this, probably around $250/year.  I have ifort + icc
(Linux/single user) & we paid about $1200 for the 1st year, and $500
for subsequent years' support.

Cheers

-- Ian


Re: [ccp4bb] Refmac executables - win vs linux in RHEL VM

2012-04-08 Thread Nikolaos Glykos

Hi Ian,


> That sounds like way more than it should be, in fact it sounds like
> you've been quoted the cost of the commercial licence and then some!
> From Intel's website the academic licence for icc (Linux/2 seats) is
> $570 incl 1 year's support.  Renewal of support for subsequent years
> will be less than this, probably around $250/year.  I have ifort + icc
> (Linux/single user) & we paid about $1200 for the 1st year, and $500
> for subsequent years' support.


The $9,997.00 price I quoted is for the XE parallel studio versions
(C,C++,Fortran,...) as given at

http://softwarestore.ispfulfillment.com/store/Product.aspx?skupart=I23S74

(which is where the page at http://software.intel.com/en-us/intel-sdp-home/
directs to if you select the C++ compiler for linux).

For the XE version of C++ the price for 3-year/2-seat academic is $6,499.00
(http://softwarestore.ispfulfillment.com/store/Product.aspx?skupart=I23S76)
and for Fortran alone is $7,800.00
(http://softwarestore.ispfulfillment.com/store/Product.aspx?skupart=I23S91)

I do not doubt that the prices you quote are also correct for a different
product line (and I do not have anything against Intel :-)

Nicholas


--
Nicholas M. Glykos, Department of Molecular Biology
 and Genetics, Democritus University of Thrace, University Campus,
  Dragana, 68100 Alexandroupolis, Greece, Tel/Fax (office) +302551030620,
Ext.77620, Tel (lab) +302551030615, http://utopia.duth.gr/~glykos/


Re: [ccp4bb] very informative - Trends in Data Fabrication

2012-04-08 Thread James Holton

On 4/2/2012 6:03 AM, herman.schreu...@sanofi.com wrote:

If James Holton had been involved, the fabrication would not have been
discovered.
Herman


Uhh.  Thanks.  I think?

Apologies for remaining uncharacteristically quiet.  I have been keeping 
up with the discussion, but not sure how much difference one more vote 
would make on the various issues.  Especially since most of this has 
come up before.  I agree that fraud is sick and wrong.  I think backing 
up your data is a good idea, etc. etc.  However, I seem to have been 
declared a leading expert on fake data, so I suppose I ought to say 
something about that.  Not quite sure I want to volunteer to be the 
Defense Against The Dark Arts Teacher (they always seem to end badly).  
But, here goes:


I think the core of the fraud problem lies in our need for models, and 
I mean models in the general scientific sense not just PDB files.  
Fundamental to the practice of science is coming up with a model that 
explains the observations you made, preferably to within experimental 
error.  One is also generally expected to estimate what the experimental 
error was.  That is, if you plot a bunch of points on a graph, you need 
to fit some sort of curve to them, and that curve had better fit to 
within the error bars, or you have some explaining to do.  Protein 
structures are really nothing more than a ~50,000 parameter curve fit to 
~50,000 data points.  So, given that the technology for constructing 
models is widely available (be it gnuplot or refmac), as is the 
technology for estimating errors and generating random numbers, all the 
hard work a would-be fraud needs to make a plausible forgery has already 
been done.  This is not something unique to crystallography!  It is a 
general property of any mature science.
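
To make the curve-fit picture concrete, here is a minimal sketch in Python under assumptions that are not in the post itself: a straight line stands in for the model, and the slope, intercept, noise level and random seed are made-up illustrative values. It simulates points with a known Gaussian error, fits them by ordinary least squares, and checks that the fit agrees with the data to within the error bars (a reduced chi-square near 1).

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope*x + intercept (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def reduced_chi_square(xs, ys, sigma, slope, intercept):
    """Chi-square per degree of freedom; about 1 when the fit is within error."""
    chi2 = sum(((y - (slope * x + intercept)) / sigma) ** 2
               for x, y in zip(xs, ys))
    return chi2 / (len(xs) - 2)  # two fitted parameters

# Hypothetical "true" model and error level, chosen only for illustration.
TRUE_SLOPE, TRUE_INTERCEPT, SIGMA = 2.0, 1.0, 0.5

rng = random.Random(2012)  # fixed seed so the run is repeatable
xs = [i * 0.05 for i in range(200)]
# Simulated observations: model value plus Gaussian noise of known sigma.
ys = [TRUE_SLOPE * x + TRUE_INTERCEPT + rng.gauss(0.0, SIGMA) for x in xs]

slope, intercept = fit_line(xs, ys)
chi2_red = reduced_chi_square(xs, ys, SIGMA, slope, intercept)

print(f"fitted slope={slope:.3f} intercept={intercept:.3f}")
print(f"reduced chi-square={chi2_red:.2f}")
```

Nothing in the fitted numbers says whether `ys` was measured or simulated; only the label on the data does.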


Indeed, fake data is not only a common tool in science but an 
inextricable part of it.  Simulated diffraction images appear in the 
literature at least as early as Arndt and Wonacott (1976), and I'm sure 
even Moseley and Darwin (1913) made some fake data when trying to 
figure out all the sources of systematic error they were dealing with 
measuring reflected x-ray beams.  At its heart, fake data is a 
control.  Remember controls from science class?  They come in two 
flavors: positive and negative, and you are supposed to have both.  In 
fact, all a fraud really is is someone who in some way, shape or form 
takes a positive control and calls it their experiment.  Pasting gel 
lanes together is an example of this.  I think this is why fraud is so 
hard to prevent in science.  You can't do science without controls, but 
anyone who has access to the technology for doing a control can also 
use it for evil.  The labels are everything.


Personally, I classify fraud as an intentionally incorrect result.
This separates it from unintentionally incorrect results (mistakes),
which are far more common.  Validation is meant to catch the incorrect
part, but can never be expected to establish intent!  In fact, I expect
a mildly clever fraud might actually plan to hide behind the "we made a
mistake in the deposition/figure/paper but now can't find the original
data" defense.  The case at hand (Zaborsky et al. 2010) may be a very
good example of this.  A new validation procedure (Rupp 2012) drew
attention to the fabricated 3k78 structure as well as real structures
where Fcalc was accidentally deposited instead of Fobs (there are a number
of these).  Rupp's follow-up on 3k78 found troubling irregularities, but
could it still be a mistake?  If there is a combination of buttons in
some GUI somewhere that lets you do this then I imagine at least one
idiot may have discovered it.  Perhaps even pleased with themselves
for finding a new way to get their R factor down. The best evidence
that Fobs simply does not exist for 3k78 was in the response (Zaborsky
et al. 2012).


The same validation procedure also drew attention to other cases.  Two
of them, 1n0r and 1n0q (Mosavi et al. 2002), were from my beamline (ALS
8.3.1), so finding the original images was simply a matter of flipping 
through the books of old DVDs I have in my office.  They cost us $0.25 
each in 2002.  Yes, I do back up every image, primarily because figuring 
out which ones were worth backing up was actually a more expensive 
proposition.  Even in adjusted dollars, I think the cost of the whole 
archive is still cheaper than what it would have cost Dan to re-grow his 
crystals and collect the data again in 2012.  It is also nice to be able 
to say that the data for 1n0r were collected on Jan 30 2002 from 9:47 pm 
to 11:48 pm and 1n0q was collected on Mar 15 2002 from 12:52 pm until 
3:48 pm.  I was there!  I saw the whole thing!  Yes, I know, since I am 
the guy who can fake images I am not the best witness (the Defense 
Against the Dark Arts Teacher never is), but for whatever it is worth I 
DO recommend keeping your old images around.  You never know when a 
forgotten slip of 

[ccp4bb] [OT] to CCP admin - CCP14 - who's in charge?

2012-04-08 Thread Bernhard Rupp (Hofkristallrat a.D.)
Dear CCPx administrators:

I just noticed that on

/www.ccp14.ac.uk/ccp/web-mirrors/llnlrupp/cvs/Rupp/rupp.html

a deprecated web page from the early 2000s (!) that causes confusion exists 
on a mirror of the LLNL site dead since 2005.

I cannot find a responsible contact for CCP14 since Lachlan's unfortunate
demise. The last person in charge of the CCP14 website was William Bisson,
but the contact link in the page goes to an uninformative site.

Who might be in charge there or a responsive contact?

Best regards, BR


Re: [ccp4bb] very informative - Trends in Data Fabrication

2012-04-08 Thread aaleshin
Since I was the person who started a public outcry to do something, I shall
explain myself to my critics. Similarly to all of you, I do not care much about
those few instances of structure fabrication. I might have put too much emphasis
on them to initiate the discussion, but they are, indeed, only tiny blips on the
ocean of science. But, could they be tips of a huge iceberg? That was my
concern. I believe that the enormous competition in science that we experience
nowadays makes many of us desperate, and desperation forces people to cheat.
Is the current validation system at PDB good enough to catch various aspects of
data cheating? Is there a simple but efficient way to make it more difficult
and, hence, less desirable?

Good sportsmen (in terms of sport abilities) sometimes get caught taking
performance enhancers. I bet everyone would do it if the drug control did not
exist. Many sportsmen would do it against their will, just because there was no
other way to win. Don't you think a similar situation can develop in science?

> I suppose as social animals we like to think we can trust and be trusted
Well, I suppose that these two antagonistic abilities of social animals (trust
and cheating) developed in parallel as means to promote evolution. In a
very hierarchical society with no legal means to change a social status,
cheating has been an important tool to contribute one's genes to a society. The
socially unjust societies still exist and their members may have a slightly
different view on the morality of cheating than those from just societies.
Moreover, the ability to cheat often correlates with intellect. Could it not be
called cheating when someone is told to do something in one way, but he does it
in his own way, because he believes it is more efficient? When a scientist
feels that he is right about the validity of his results, but they do not look
good enough to be sold to validators, he is supposed to do more research. But he
is out of time, so why not hide the weak spots of the work if he knows that the
major conclusions are RIGHT? Even if someone redoes the work later, they
will be reproduced, right? In my opinion, this is the major motive for cheating
in science.

What I suggested with respect to the PDB data validation was adding some
additional information that would allow independent validation of such
parameters as the resolution and data quality (catching model fabrications
would be a byproduct of this process). Does the current system allow one to
overestimate those parameters? I believe so (but I might be wrong, correct
me!). Periodically, people ask at ccp4bb how to determine the resolution of
their data, but some idiots may decide to do it on their own and add 30% of
noise to their structure factors. As James mentioned, one does not need to be
extremely smart to do so; moreover, such an idiot would have fewer restraints
than an educated crystallographer, because the idiot believes that nobody
would notice his cheating. His moral principles are not corrupted, because he
thinks that the model is correct and no harm is done. But the harm is still
there, because people are forced to believe the model more than it deserves.

The question is still open to me about what percentage of PDB structures
overestimate data quality in terms of resolution. Is it possible to make it
less dependent on the opinion of persons submitting the data? We all have such
different opinions about everything...

People invented laws to create conditions when they can trust each other. 
Sociopaths who do not follow the rules get caught and excluded from a society, 
which maintains the trust. But when the trust is abused, it quickly disappears. 
Many of those who wrote on the matter expressed a strong opinion that the 
system is not broken and we should continue trusting each other. Great! I do 
not mind the status quo. 

Regards,
Alex Aleshin


Re: [ccp4bb] [OT] to CCP admin - CCP14 - who's in charge?

2012-04-08 Thread Harry

Hi Bernhard

CCP14 is (to all intents and purposes) defunct. It lost funding a  
couple of years after Lachlan left in the early 2000s.


I'll supply William's e-mail off-board (or at least the last recent  
address I have)



Harry
--
Dr Harry Powell, MRC Laboratory of Molecular Biology, MRC Centre,  
Hills Road, Cambridge, CB2 0QH


Re: [ccp4bb] very informative - Trends in Data Fabrication

2012-04-08 Thread Bernhard Rupp (Hofkristallrat a.D.)
> You never know when a forgotten slip of the mouse when using AutoDep ten
> years ago will come back to haunt you.

On the paper James refers to and found the data for, an added mystery was that
the postdoc who may have slipped disappeared w/o much of a trace and the PI died.
Dan was the only survivor. Still they found the data.

BR