Re: [ccp4bb] Refmac executables - win vs linux in RHEL VM
Hi Bernhard, Maybe the paranoia-checkers in windows slow everything down although I did not see any resources overwhelmed... I wonder whether the windoze refmac binaries can be used through wine in a GNU/Linux environment. If yes, then you could possibly differentiate between the operating-system-dependent and compiler-specific hypotheses. Nicholas -- Nicholas M. Glykos, Department of Molecular Biology and Genetics, Democritus University of Thrace, University Campus, Dragana, 68100 Alexandroupolis, Greece, Tel/Fax (office) +302551030620, Ext.77620, Tel (lab) +302551030615, http://utopia.duth.gr/~glykos/
Re: [ccp4bb] Refmac executables - win vs linux in RHEL VM
Hi Nat, one of my colleagues found (on Linux) that the exp() function provided by g77 was 20-fold slower than the equivalent in the Intel math library. I do not know whether this has recently been changed, but the license for icc-produced executables used to be rather restrictive. If I remember correctly, you were not allowed to distribute the binaries, full stop. This, together with the fact that until recently (icc v.11.0.074) the icc-produced executables would not run on specific AMD-based hardware, had made me return to the safety of gcc. My two cents, Nicholas
Re: [ccp4bb] Refmac executables - win vs linux in RHEL VM
I do not know whether this has recently been changed, but the license for icc-produced executables used to be rather restrictive. If I remember correctly, you were not allowed to distribute the binaries, full stop. Nicholas, this restriction applies (and has always applied) only to Intel's 'evaluation' licence: i.e. you get to try the Intel compilers free for 1 month, but you're not allowed to redistribute any executables you create with them. I don't know if this means that the software actually stops working after a month, I guess it does; they're not as trusting as they used to be! Intel's EULA for all their Software Development Products (including all their compilers) states: "Subject to all of the terms and conditions of this Agreement and any specific restrictions which may appear in the Redistributables text files, Intel grants to you a non-exclusive, non-assignable, fully-paid copyright license to distribute (except if you received the Materials under an Evaluation License as specified below) the Redistributables, including any modifications pursuant to Section 2.B, or any portions thereof, as part of the product or application you developed using the Materials." I had our lawyers check this ~10 years ago when the compiler was at version ~7 (it's now at 11), since we are commercial and wanted to distribute our own sources executables, and the conditions on redistribution of user-created executables have not changed in essence since then (obviously redistribution of the compiler executables themselves has never been allowed). What has changed is that the licence conditions have become somewhat more restrictive in the sense that academic institutional users are no longer eligible for free licences, though they do get a discount off the fully paid-up commercial licence. A personal non-commercial licence (which does not cover use by academics) is still free.
In all cases (except evaluation) executables can be freely distributed, along with any of Intel's DLLs that are required to run it. Please note that I have no financial interest in Intel ;). Cheers -- Ian
Re: [ccp4bb] Refmac executables - win vs linux in RHEL VM
Hi

I suspect that this is more to do with the amount of memory required, size of arrays etc; refinement will (in general) be more demanding in terms of these than an integration program like Mosflm. The last time I compared Mosflm performance (which was a few years ago), I ran the same batch job on OSX 10.4 (Tiger), on Windows XP and on Linux Feisty Fawn (so you can tell how long ago this was), with both the latter running under virtual machines on the same 32-bit Intel Mac that the OSX job ran on. There was essentially no difference in performance (though I have a vague memory of Ubuntu being a little faster, maybe ~3%).

Some caveats -
* I used a gfortran build for OSX and Linux, g77 build for Windows
* I didn't spend too much time on this
* I wasn't running a GUI - all three as foregrounded jobs, nothing else running on the machine (I tried to make sure only the OS and essential services were running). So this wasn't a batch job in the traditional sense...
* gfortran builds these days are considerably faster (and compare well to ifort builds)

On 7 Apr 2012, at 17:50, Roger Rowlett wrote:
I don't know the state of current software, because I haven't tried recently, but when I set up my student crystallography workstations a few years back I noticed many packages (e.g. EPMR, Phaser) that had potentially long run times (where it is really noticeable) would run on the identical hardware about 2-3 times faster in Linux than in Windows XP. Memory swapping wasn't the issue. I was astounded there could be that much overhead in Windows. A Linux VM on a windows machine being faster than native Win7 is pretty weird, though. Cheers,

On 4/7/2012 11:42 AM, Bernhard Rupp (Hofkristallrat a.D.) wrote:
Something the developers might be interested in: The Refmac_5.6.0117 32-bit windows binaries run native on a win64 3-4x slower than those from the linux distribution run **in a RHEL6.2-64 VMware virtual machine hosted on the same windows7/64 system.**

VM/RHEL: Refmac_5.6.0117: End of Refmac_5.6.0117 Times: User:1015.3s System: 135.0s Elapsed:19:17
Win native Refmac_5.6.0117: End of Refmac_5.6.0117 Times: User: 0.0s System:0.0s Elapsed:67:49

Most peculiar. Although I think (but do not know) that the linux binaries are 64 bit, I don't think that address space is the issue here if they are. Maybe the paranoia-checkers in windows slow everything down although I did not see any resources overwhelmed...

Best regards, BR
- Bernhard Rupp 001 (925) 209-7429 +43 (676) 571-0536 b...@ruppweb.org hofkristall...@gmail.com http://www.ruppweb.org/
- No animals were hurt or killed during the production of this email.
-
Harry -- Dr Harry Powell, MRC Laboratory of Molecular Biology, MRC Centre, Hills Road, Cambridge, CB2 0QH
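For anyone who wants to check the "3-4x slower" figure against the two Refmac timing lines quoted above, here is a small Python sketch of the arithmetic. It assumes the "Elapsed" field is minutes:seconds, which seems consistent with the Linux run's user plus system CPU time; the helper name elapsed_to_seconds is made up for illustration and is not part of Refmac or CCP4.

```python
# Sketch only: compares the two wall-clock times quoted in the thread.
# Assumption: Refmac's "Elapsed" field is MM:SS (minutes may exceed 59).

def elapsed_to_seconds(stamp: str) -> int:
    """Convert an 'MM:SS' elapsed-time string to whole seconds."""
    minutes, seconds = stamp.strip().split(":")
    return int(minutes) * 60 + int(seconds)

vm_linux = elapsed_to_seconds("19:17")    # Linux binary in the RHEL 6.2 VMware guest
win_native = elapsed_to_seconds("67:49")  # 32-bit Windows binary, run natively

slowdown = win_native / vm_linux          # wall-clock ratio, Windows vs. Linux VM
cpu_total = 1015.3 + 135.0                # user + system seconds from the Linux run

print(f"Linux VM : {vm_linux} s elapsed, {cpu_total:.1f} s CPU (user + system)")
print(f"Windows  : {win_native} s elapsed")
print(f"Slowdown : {slowdown:.2f}x")
```

The Linux CPU total comes out within a few seconds of its elapsed time, so that run was essentially CPU-bound. The Windows binary reports 0.0s for both user and system time, so only the elapsed times can be compared between the two runs.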
Re: [ccp4bb] Refmac executables - win vs linux in RHEL VM
Hi Ian, Nicholas, this restriction applies (and has always applied) only to Intel's 'evaluation' licence That's right. With a cost of $9,997.00 for a 3-years/2-seats academic license, I couldn't have been talking for anything else ... :-))) All the best, Nicholas
Re: [ccp4bb] Refmac executables - win vs linux in RHEL VM
On Sun, Apr 08, 2012 at 03:59:22PM +0300, Nikolaos Glykos wrote: Nicholas, this restriction applies (and has always applied) only to Intel's 'evaluation' licence That's right. With a cost of $9,997.00 for a 3-years/2-seats academic license, I couldn't have been talking for anything else ... :-))) Is that a joke? Or did I miss something? We pay about $900 USD/year for our single seat, academic license that includes both the Linux and OS X versions of the Intel Compilers. And if you're an active scientific software developer, we'll let you use them for free: http://www.sbgrid.org/wiki/developers/support -ben -- | Ben Eisenbraun | SBGrid Consortium | http://sbgrid.org | | Harvard Medical School | http://hms.harvard.edu |
Re: [ccp4bb] Refmac executables - win vs linux in RHEL VM
That's right. With a cost of $9,997.00 for a 3-years/2-seats academic license, I couldn't have been talking for anything else ... :-))) Hi Nicholas That sounds like way more than it should be, in fact it sounds like you've been quoted the cost of the commercial licence and then some! From Intel's website the academic licence for icc (Linux/2 seats) is $570 incl 1 year's support. Renewal of support for subsequent years will be less than this, probably around $250/year. I have ifort + icc (Linux/single user) we paid about $1200 for the 1st year, and $500 for subsequent year's support. Cheers -- Ian
Re: [ccp4bb] Refmac executables - win vs linux in RHEL VM
Hi Ian, That sounds like way more than it should be, in fact it sounds like you've been quoted the cost of the commercial licence and then some! From Intel's website the academic licence for icc (Linux/2 seats) is $570 incl 1 year's support. Renewal of support for subsequent years will be less than this, probably around $250/year. I have ifort + icc (Linux/single user) we paid about $1200 for the 1st year, and $500 for subsequent year's support. The $9,997.00 price I quoted is for the XE parallel studio versions (C,C++,Fortran,...) as given at http://softwarestore.ispfulfillment.com/store/Product.aspx?skupart=I23S74 (which is where the page at http://software.intel.com/en-us/intel-sdp-home/ directs to if you select the C++ compiler for linux). For the XE version of C++ the price for 3-year/2-seat academic is $6,499.00 (http://softwarestore.ispfulfillment.com/store/Product.aspx?skupart=I23S76) and for Fortran alone it is $7,800.00 (http://softwarestore.ispfulfillment.com/store/Product.aspx?skupart=I23S91). I do not doubt that the prices you quote are also correct for a different product line (and I do not have anything against Intel :-) Nicholas
Re: [ccp4bb] very informative - Trends in Data Fabrication
On 4/2/2012 6:03 AM, herman.schreu...@sanofi.com wrote: If James Holton had been involved, the fabrication would not have been discovered. Herman Uhh. Thanks. I think? Apologies for remaining uncharacteristically quiet. I have been keeping up with the discussion, but not sure how much difference one more vote would make on the various issues. Especially since most of this has come up before. I agree that fraud is sick and wrong. I think backing up your data is a good idea, etc. etc. However, I seem to have been declared a leading expert on fake data, so I suppose I ought to say something about that. Not quite sure I want to volunteer to be the Defense Against The Dark Arts Teacher (they always seem to end badly). But, here goes: I think the core of the fraud problem lies in our need for models, and I mean models in the general scientific sense, not just PDB files. Fundamental to the practice of science is coming up with a model that explains the observations you made, preferably to within experimental error. One is also generally expected to estimate what the experimental error was. That is, if you plot a bunch of points on a graph, you need to fit some sort of curve to them, and that curve had better fit to within the error bars, or you have some explaining to do. Protein structures are really nothing more than a ~50,000 parameter curve fit to ~50,000 data points. So, given that the technology for constructing models is widely available (be it gnuplot or refmac), as is the technology for estimating errors and generating random numbers, all the hard work a would-be fraud needs to make a plausible forgery has already been done. This is not something unique to crystallography! It is a general property of any mature science. Indeed, fake data is not only a common tool in science but an inextricable part of it.
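To make the curve-fitting picture concrete, here is a toy Python sketch (nothing to do with refmac or gnuplot internals, and not from the original post): it simulates points from a known straight line with Gaussian error bars, fits a line back to them, and checks via the reduced chi-square that the fit agrees with the data to within the stated errors. The model, the parameter values, and all function names are illustrative assumptions; the simulated data here is labelled as simulated and serves as a positive control for the fitting code.

```python
import random

def make_fake_data(slope, intercept, sigma, xs, rng):
    """Simulate observations from a known straight line plus Gaussian noise."""
    return [slope * x + intercept + rng.gauss(0.0, sigma) for x in xs]

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b (equal error bars assumed)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def reduced_chi_square(xs, ys, a, b, sigma):
    """Chi-square per degree of freedom; close to 1 if the fit matches the error bars."""
    chi2 = sum(((y - (a * x + b)) / sigma) ** 2 for x, y in zip(xs, ys))
    return chi2 / (len(xs) - 2)  # two fitted parameters

rng = random.Random(0)              # fixed seed so the control is repeatable
xs = [i * 0.1 for i in range(200)]  # 200 hypothetical measurement positions
true_slope, true_intercept, sigma = 2.0, -1.0, 0.5

ys = make_fake_data(true_slope, true_intercept, sigma, xs, rng)
a, b = fit_line(xs, ys)
chi2_red = reduced_chi_square(xs, ys, a, b, sigma)

print(f"recovered slope={a:.3f}, intercept={b:.3f}, reduced chi^2={chi2_red:.2f}")
```

If the recovered parameters land near the true ones and the reduced chi-square is near 1, the fitting procedure passes its control.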
Simulated diffraction images appear in the literature at least as early as Arndt and Wonacott (1976), and I'm sure even Moseley and Darwin (1913) made some fake data when trying to figure out all the sources of systematic error they were dealing with measuring reflected x-ray beams. At its heart, fake data is a control. Remember controls from science class? They come in two flavors: positive and negative, and you are supposed to have both. In fact, all a fraud really is is someone who in some way, shape or form takes a positive control and calls it their experiment. Pasting gel lanes together is an example of this. I think this is why fraud is so hard to prevent in science. You can't do science without controls, but anyone who has access to the technology for doing a control can also use it for evil. The labels are everything. Personally, I classify fraud as an intentionally incorrect result. This separates it from unintentionally incorrect results (mistakes), which are far more common. Validation is meant to catch the incorrect part, but can never be expected to establish intent! In fact, I expect a mildly clever fraud might actually plan to hide behind the "we made a mistake in the deposition/figure/paper but now can't find the original data" defense. The case at hand (Zaborsky et al. 2010) may be a very good example of this. A new validation procedure (Rupp 2012) drew attention to the fabricated 3k78 structure as well as real structures where Fcalc was accidentally deposited instead of Fobs (there are a number of these). Rupp's follow-up on 3k78 found troubling irregularities, but could it still be a mistake? If there is a combination of buttons in some GUI somewhere that lets you do this then I imagine at least one idiot may have discovered it. Perhaps even pleased with themselves for finding a new way to get their R factor down. The best evidence that Fobs simply does not exist for 3k78 was in the response (Zaborsky et al. 2012).
The same validation procedure also drew attention to other cases. Two of them 1n0r and 1n0q (Mosavi et al. 2002) were from my beamline (ALS 8.3.1), so finding the original images was simply a matter of flipping through the books of old DVDs I have in my office. They cost us $0.25 each in 2002. Yes, I do back up every image, primarily because figuring out which ones were worth backing up was actually a more expensive proposition. Even in adjusted dollars, I think the cost of the whole archive is still cheaper than what it would have cost Dan to re-grow his crystals and collect the data again in 2012. It is also nice to be able to say that the data for 1n0r were collected on Jan 30 2002 from 9:47 pm to 11:48 pm and 1n0q was collected on Mar 15 2002 from 12:52 pm until 3:48 pm. I was there! I saw the whole thing! Yes, I know, since I am the guy who can fake images I am not the best witness (the Defense Against the Dark Arts Teacher never is), but for whatever it is worth I DO recommend keeping your old images around. You never know when a forgotten slip of
[ccp4bb] [OT] to CCP admin - CCP14 - who's in charge?
Dear CCPx administrators: I just noticed that a deprecated web page from the early 2000s (!) still exists at /www.ccp14.ac.uk/ccp/web-mirrors/llnlrupp/cvs/Rupp/rupp.html, on a mirror of the LLNL site that has been dead since 2005, and that it causes confusion. I cannot find a responsible contact for CCP14 since Lachlan's unfortunate demise. The last person in charge of the CCP14 website was William Bisson, but the contact link in the page goes to an uninformative site. Who might be in charge there, or who would be a responsive contact? Best regards, BR
Re: [ccp4bb] very informative - Trends in Data Fabrication
Since I was the person who started a public outcry to do something, I shall explain myself to my critics. Like all of you, I do not care much about those few instances of structure fabrication. I might have put too much emphasis on them to initiate the discussion, but they are, indeed, only tiny blips on the ocean of science. But could they be the tips of a huge iceberg? That was my concern. I believe that the enormous competition in science that we experience nowadays makes many of us desperate, and desperation forces people to cheat. Is the current validation system at the PDB good enough to catch various aspects of data cheating? Is there a simple but efficient way to make cheating more difficult and, hence, less desirable? Good sportsmen (in terms of sporting ability) sometimes get caught taking performance enhancers. I bet everyone would do it if drug control did not exist. Many sportsmen would do it against their will, just because there was no other way to win. Don't you think a similar situation can develop in science? "I suppose as social animals we like to think we can trust and be trusted" Well, I suppose that these two antagonistic abilities of social animals (trust and cheating) developed in parallel as means to promote evolution. In a very hierarchical society with no legal means to change one's social status, cheating has been an important tool for contributing one's genes to a society. Socially unjust societies still exist, and their members may have a slightly different view on the morality of cheating than those from just societies. Moreover, the ability to cheat often correlates with intellect. Could it not be called cheating when someone is told to do something one way, but he does it his own way, because he believes it is more efficient? When a scientist feels that he is right about the validity of his results, but they do not look good enough to be sold to validators, he is supposed to do more research.
But he is out of time, so why not hide the weak spots of the work if he knows that the major conclusions are RIGHT? Even if someone redoes the work later, the conclusions will be reproduced, right? In my opinion, this is the major motive for cheating in science. What I suggested with respect to PDB data validation was adding some additional information that would allow independent validation of such parameters as the resolution and data quality (catching model fabrications would be a byproduct of this process). Does the current system allow those parameters to be overestimated? I believe so (but I might be wrong, correct me!). Periodically, people ask on ccp4bb how to determine the resolution of their data, but some idiots may decide to do it on their own and add 30% of noise to their structure factors. As James mentioned, one does not need to be extremely smart to do so; moreover, such an idiot would have fewer restraints than an educated crystallographer, because the idiot believes that nobody would notice his cheating. His moral principles are not corrupted, because he thinks that the model is correct and no harm is done. But the harm is still there, because people are led to believe the model more than it deserves. The question is still open to me as to what percentage of PDB structures overestimate data quality in terms of resolution. Is it possible to make it less dependent on the opinion of the persons submitting the data? We all have such different opinions about everything... People invented laws to create conditions under which they can trust each other. Sociopaths who do not follow the rules get caught and excluded from society, which maintains the trust. But when the trust is abused, it quickly disappears. Many of those who wrote on the matter expressed a strong opinion that the system is not broken and we should continue trusting each other. Great! I do not mind the status quo.
Regards, Alex Aleshin
Re: [ccp4bb] [OT] to CCP admin - CCP14 - who's in charge?
Hi Bernhard, CCP14 is (to all intents and purposes) defunct. It lost funding a couple of years after Lachlan left in the early 2000s. I'll supply William's e-mail off-board (or at least the most recent address I have). Harry
Re: [ccp4bb] very informative - Trends in Data Fabrication
You never know when a forgotten slip of the mouse when using AutoDep ten years ago will come back to haunt you. For the paper James refers to, and for which he found the data, the added mystery was that the postdoc who may have slipped disappeared without much of a trace, and the PI died. Dan was the only survivor. Still, they found the data. BR