Re: [ccp4bb] Crystallization at low pH
Hi, I'm sure there are proteins that were crystallized at low pH but I can't remember which ones. The best thing is to go to the BMCD database: http://xpdb.nist.gov:8060/BMCD4/index.faces and query it with the key pH (look into advanced search). Cheers, Boaz Boaz Shaanan, Ph.D. Dept. of Life Sciences Ben-Gurion University of the Negev Beer-Sheva 84105 Israel E-mail: bshaa...@bgu.ac.il Phone: 972-8-647-2220 Skype: boaz.shaanan Fax: 972-8-647-2992 or 972-8-646-1710 From: CCP4 bulletin board [CCP4BB@JISCMAIL.AC.UK] on behalf of Sam Arnosti [meisam.nosr...@gmail.com] Sent: Monday, November 07, 2011 7:19 AM To: CCP4BB@JISCMAIL.AC.UK Subject: [ccp4bb] Crystallization at low pH Hi everyone, I have a protein that is extraordinarily stable at pH 3.0 or even 2.0. I want to crystallize it at low pH and compare the differences between the crystals at regular pH and at low pH. I was wondering how people set up the boxes at low pH, as the usual buffers are mostly less acidic. Regards, Sam
Re: [ccp4bb] Crystallization at low pH
Tendamistat (1OK0) was crystallized at pH 1.3 and diffracted to 0.93 Å. George On Mon, Nov 07, 2011 at 05:19:29 AM, Sam Arnosti wrote: Hi everyone, I have a protein that is extraordinarily stable at pH 3.0 or even 2.0. I want to crystallize it at low pH and compare the differences between the crystals at regular pH and at low pH. I was wondering how people set up the boxes at low pH, as the usual buffers are mostly less acidic. Regards, Sam -- Prof. George M. Sheldrick FRS Dept. Structural Chemistry, University of Goettingen, Tammannstr. 4, D37077 Goettingen, Germany Tel. +49-551-39-3021 or -3068 Fax. +49-551-39-22582
[ccp4bb] Announcement of MX workshop and invitation to the annual HZB-BESSY users meeting
Announcement: MX-Satellite workshop "New developments in macromolecular crystallography using synchrotron radiation". This workshop will take place on Nov 30, 2011 as a satellite to the annual HZB users meeting at BESSY-II Berlin, and we would like to cordially invite you to participate. The following speakers have already confirmed their participation: - Andrew Thompson (Soleil) - Martin Fuchs (PSI) - Juan Sanchez-Weatherby (DIAMOND) - Alke Meents (DESY) - Manfred S. Weiss (HZB) - Karthik Paithankar (HZB) - Sandra Pühringer (HZB) - Gerd Weber (FU-Berlin) Registration and further information can be found here: http://www.helmholtz-berlin.de/user/usersmeetings/users-meeting-2011/index_en.html The registration deadline is November 21, 2011. Additionally, we would like to invite you to participate in the annual HZB users meeting, which will take place from Dec 01-02, 2011 in Berlin-Adlershof in the WISTA main building. As every year, we will reward the best MX-beamline-related poster presentation with the valuable BESSY-MX poster award. Please register: http://www.helmholtz-berlin.de/user/usersmeetings/users-meeting-2011/index_en.html The registration deadline is November 21, 2011. We are very much looking forward to seeing you. Uwe Mueller Manfred Weiss Dr. Uwe Mueller Soft Matter and Functional Materials Macromolecular Crystallography (BESSY-MX) | Group leader Elektronenspeicherring BESSY II Albert-Einstein-Str. 15, D-12489 Berlin, Germany Phone: +49 30 8062 14974 Fax: +49 30 8062 14975 URL: www.helmholtz-berlin.de/bessy-mx E-mail: u...@helmholtz-berlin.de Helmholtz-Zentrum Berlin für Materialien und Energie GmbH, Hahn-Meitner-Platz 1, D-14109 Berlin http://www.helmholtz-berlin.de
Re: [ccp4bb] how to refine a ligand containing a heavy atom
Something must be wrong. If you are using REFMAC it will give you a list of bad contacts etc. in the log file. Check those and try to correct them. Eleanor On 11/06/2011 05:04 PM, Zhipu Luo wrote: Dear all, I have a protein soaked in a coordination compound containing platinum. For some reason I could not collect anomalous data at 1.072 Å and only got a data set at 0.973 Å. I have solved the phases through molecular replacement and refined the model to R-factor = 0.2204, R-free = 0.2447 before modelling the coordination compound. However, the R-factor rose to 0.3345 and R-free rose to 0.3425 after refining the model with the coordination compound. How should I deal with this problem? Hoping for help! Thank you for your time. Zhipu, Fuzhou, China
Re: [ccp4bb] Crystallization at low pH
I'm not convinced that you need a conventional buffer at pH 2 or 3. At pH 2, the hydrogen ion concentration is 10 mM. If you want to use something else, the second pKa for sulfuric acid is around 2. The first pKa for phosphoric acid is slightly higher than 2. Lactic acid has a pKa close to 3. Formic acid has a pKa just under 4. Most of these numbers were in an appendix in the first chemistry text you ever used. (wink) These numbers imply pretty strongly that most crystallization screens emphasizing common salts will require determined modification to hit these low pH values, because many stabilizing anions in the Hofmeister series will be partially or completely protonated at these pH values. PEG and organic screens will require a smaller hammer to retrofit. On Nov 6, 2011, at 11:19 PM, Sam Arnosti wrote: Hi everyone, I have a protein that is extraordinarily stable at pH 3.0 or even 2.0. I want to crystallize it at low pH and compare the differences between the crystals at regular pH and at low pH. I was wondering how people set up the boxes at low pH, as the usual buffers are mostly less acidic. Regards, Sam
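[Editorial note] To make the protonation point above concrete, here is a small, hedged sketch (an editorial addition, not part of the thread); the pKa values are approximate textbook figures, and the fraction follows directly from the Henderson-Hasselbalch relation.

# Hedged illustration: fraction of a monoprotic acid that is protonated at a given pH,
# from pH = pKa + log10([A-]/[HA]). pKa values are approximate textbook figures.
def protonated_fraction(pKa, pH):
    """Fraction present as the neutral (protonated) HA form."""
    return 1.0 / (1.0 + 10.0 ** (pH - pKa))

pKa_values = {"formate": 3.75, "acetate": 4.76,
              "citrate (pK1)": 3.13, "phosphate (pK1)": 2.15}

for pH in (2.0, 3.0):
    for name, pKa in pKa_values.items():
        print(f"pH {pH}: {name} is {100 * protonated_fraction(pKa, pH):.0f}% protonated")

At pH 2 most of these anions are largely protonated, which is exactly why salt-based screens need determined modification at these pH values.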
Re: [ccp4bb] Crystallization at low pH
I have crystallized in PEG with citrate at pH 3. If you want to go lower I would suggest maleate:

effective pH range   pKa (25 °C)   buffer
1.2-2.6              1.97          maleate (pK1)
2.2-6.5              3.13          citrate (pK1)

Enrico. On Mon, 07 Nov 2011 14:15:02 +0100, Craig A. Bingman cbing...@biochem.wisc.edu wrote: I'm not convinced that you need a conventional buffer at pH 2 or 3. At pH 2, the hydrogen ion concentration is 10 mM. If you want to use something else, the second pKa for sulfuric acid is around 2. The first pKa for phosphoric acid is slightly higher than 2. Lactic acid has a pKa close to 3. Formic acid has a pKa just under 4. Most of these numbers were in an appendix in the first chemistry text you ever used. (wink) These numbers imply pretty strongly that most crystallization screens emphasizing common salts will require determined modification to hit these low pH values, because many stabilizing anions in the Hofmeister series will be partially or completely protonated at these pH values. PEG and organic screens will require a smaller hammer to retrofit. On Nov 6, 2011, at 11:19 PM, Sam Arnosti wrote: Hi everyone, I have a protein that is extraordinarily stable at pH 3.0 or even 2.0. I want to crystallize it at low pH and compare the differences between the crystals at regular pH and at low pH. I was wondering how people set up the boxes at low pH, as the usual buffers are mostly less acidic. Regards, Sam -- Enrico A. Stura D.Phil. (Oxon), Tel: 33 (0)1 69 08 4302 Office, Room 19, Bat.152, Tel: 33 (0)1 69 08 9449 Lab, LTMB, SIMOPRO, IBiTec-S, CE Saclay, 91191 Gif-sur-Yvette, FRANCE http://www-dsv.cea.fr/en/institutes/institute-of-biology-and-technology-saclay-ibitec-s/unites-de-recherche/department-of-molecular-engineering-of-proteins-simopro/molecular-toxinology-and-biotechnology-laboratory-ltmb/crystallogenesis-e.-stura http://www.chem.gla.ac.uk/protein/mirror/stura/index2.html e-mail: est...@cea.fr Fax: 33 (0)1 69 08 90 71
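[Editorial note] As a worked illustration of how one might actually mix such a low-pH buffer (an editorial sketch, not a recipe from the thread; the pKa values are the approximate ones quoted in the table above), the Henderson-Hasselbalch equation gives the base/acid ratio needed to hit a target pH:

# Hedged sketch: base/acid ratio for a simple buffer at a target pH,
# using pH = pKa + log10([base]/[acid]); pKa values as quoted above (approximate).
def base_to_acid_ratio(pKa, target_pH):
    return 10.0 ** (target_pH - pKa)

for name, pKa, pH in [("maleate (pK1)", 1.97, 2.0), ("citrate (pK1)", 3.13, 3.0)]:
    r = base_to_acid_ratio(pKa, pH)
    frac_base = r / (1.0 + r)
    print(f"{name} at pH {pH}: [base]/[acid] = {r:.2f} "
          f"({100 * frac_base:.0f}% deprotonated species)")

In practice one would of course check the final pH of the mixed reservoir with a meter rather than rely on the calculation alone.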
Re: [ccp4bb] Crystallization at low pH
I remember that people have crystallized a series of streptavidin-2-iminobiotin structures at low pH. In case it helps, check the following PDB IDs: 2RTD, 2RTE, 2RTI, 2RTK, 2RTL. Hi everyone, I have a protein that is extraordinarily stable at pH 3.0 or even 2.0. I want to crystallize it at low pH and compare the differences between the crystals at regular pH and at low pH. I was wondering how people set up the boxes at low pH, as the usual buffers are mostly less acidic. Regards, Sam
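[Editorial note] If it is useful, here is a small, hedged sketch (an editorial addition, not from the thread) for pulling those entries down for local inspection; it assumes the current RCSB file-download URL scheme, which may change over time.

# Hedged sketch: download the PDB entries listed above for local inspection.
# The RCSB "files" download URL scheme is an assumption, not stated in the thread.
import urllib.request

pdb_ids = ["2RTD", "2RTE", "2RTI", "2RTK", "2RTL"]
for pdb_id in pdb_ids:
    url = f"https://files.rcsb.org/download/{pdb_id}.pdb"
    with urllib.request.urlopen(url) as response, open(f"{pdb_id}.pdb", "wb") as out:
        out.write(response.read())
    print(f"fetched {pdb_id}")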
Re: [ccp4bb] Crystallization at low pH
On Mon, 2011-11-07 at 05:19, Sam Arnosti wrote: Hi everyone, I have a protein that is extraordinarily stable at pH 3.0 or even 2.0. I want to crystallize it at low pH and compare the differences between the crystals at regular pH and at low pH. I was wondering how people set up the boxes at low pH, as the usual buffers are mostly less acidic. Regards, Sam It is not clear whether you already have crystals at regular pH, but if you do, you may consider direct transfer to lower pH. Of course, the crystals may dissolve, which you could possibly prevent by cross-linking with glutaraldehyde. Three caveats: a) If the lattice is incompatible with the lower pH, even with cross-linking the resolution may sink to essentially useless levels. b) I have no idea whether the cross-linking would be disrupted at really low pH; perhaps someone else can comment on that. c) The third reviewer can always say that lattice forces could have prevented a conformational change. But the same goes for direct crystallization at low pH (although there it carries less weight). -- I'd jump in myself, if I weren't so good at whistling. Julian, King of Lemurs
[ccp4bb] vacancy for a research scientist at the ILL
Anita Schober Institut LAUE-LANGEVIN Service Ressources Humaines BP 156 - 38042 GRENOBLE Cedex 9 Tl.: + 33 (0)4 76 20 72 36 Fax : + 33 (0)4 76 20 77 99 E-mail : schob...@ill.fr Hello, please find below the advertisement of a vacancy for a research scientist at the ILL. Human Resources Service Institut Laue Langevin INSTITUT MAX VON LAUE PAUL LANGEVIN DA/SRH/GRI-07/11/2011 ILL rf. 11/24 www.ill.fr VACANCY The Institut Laue-Langevin (ILL), situated in Grenoble, France, is Europe's leading research facility for fundamental research using neutrons. The ILL operates the brightest neutron source in the world, reliably delivering intense neutron beams to 40 unique scientific instruments. The Institute welcomes more than 2000 visiting scientists per year to carry out world-class research in solid-state physics, crystallography, soft matter, biology, chemistry and fundamental physics. Funded primarily by its three founder members: France, Germany and the United Kingdom, the ILL has also signed scientific collaboration agreements with 12 other countries. The Science Division currently has a vacancy : RESEARCH SCIENTIST - m/f - (small angle neutron scattering) Small angle neutron scattering (SANS) has become a major component of structural biology at the ILL to study interactions and low resolution structures in large biological macromolecular complexes. Furthermore the PSB (Partnership for Structural Biology) provides additional outstanding facilities for structural biology, including a Deuteration Laboratory for isotope labeling of biological molecules, and a wide variety of complementary biophysical methods. With the presence of the European Molecular Biology Laboratory, the Institute of Structural Biology and the European Synchrotron Radiation Facility, the campus provides an exciting research environment for biology. Duties: The ILL is inviting applications for a scientist to take charge of the biological aspects of the SANS instrument D22 in the Large Scale Structures group. Duties would include: instrument maintenance and development, running data collection and analysis software, local contact for biology experiments both on D22 and on the other SANS instruments of the group, and to coordinate the system of beamtime block allocation to experimenters. The candidate will also be encouraged to develop strong collaborations for her/his own scientific work. Qualifications and experience: Ph.D. in physical or life sciences. We are particularly interested in highly motivated candidates with an active research interest in biology, experience in neutron or X-ray small angle scattering. The post represents an excellent opportunity for a young postdoctoral scientist to develop expertise, broaden their experience and interact with leading scientists from around the world. Applications from more experienced scientists who are able to obtain a secondment period from their home institute will also be considered. Language skills: As an international research centre, we are particularly keen to ensure that we also attract applicants from outside France. You must have a sound knowledge of English and be willing to learn French (a language course will be paid for by the ILL). Knowledge of German would be an advantage. Notes: Fixed-term contract of 5 years. Further information can be obtained by contacting the head of the Large Scale Structures Group: Dr. R Cubitt, Tel. +33(0)4.76.20.72.15, e-mail: cub...@ill.fr (please do not send your application to this address) or via http://www.ill.fr/lss. Benefits:
[ccp4bb] Two postdoctoral positions at Department of Molecular Drug Research, University of Copenhagen
1. Position Job description A postdoctoral position is available from 01.01.2012-31.12.2013. The successful candidate will focus on the identification of interaction partners of the histone demethylase PLU-1 and on structural studies of stable PLU-1 complexes involved in gene regulation. The project will be conducted within the framework of the University of Copenhagen Programme of Excellence on Epigenetics, a programme involving research groups at the Biotech Research and Innovation Centre (BRIC) and the Department of Medicinal Chemistry. Experience with recombinant protein expression, protein purification and biochemical characterisation of mammalian intracellular proteins will be an advantage. Interest in the application of scattering methods such as X-ray Crystallography and Small Angle X-ray Scattering (SAXS) is also required, but no prior experience with these techniques is necessary. Terms of employment Terms of appointment and payment are in accordance with the agreement between the Danish Ministry of Finance and the Danish Federation of Professional Associations. Applying Send your application via http://www.ku.dk/stillinger/VIP (select 'send ansøgning' at the bottom of the page). The deadline for applying is 1 December 2011. Questions Further information about the position and the project are available from Professor Michael Gajhede, Biostructural Research (www.farma.ku.dk/BRhttp://www.farma.ku.dk/BR), Department of Molecular Drug Research, University of Copenhagen, tel. 35336407, email m...@farma.ku.dkmailto:m...@farma.ku.dk. The University of Copenhagen wishes to reflect the diversity of society and welcomes applications from all qualified candidates regardless of personal background. 2. Position Job description A postdoctoral position is available from 15.12.2011-31.6.2013. The successful candidate will focus on in-vitro characterisation, structural studies and inhibitor identification of the chromatin modifying TET proteins. The project will be conducted within the framework of the University of Copenhagen Programme of Excellence on Epigenetics, a programme involving research groups at the Biotech Research and Innovation Centre (BRIC) and the Department of Medicinal Chemistry. Experience with recombinant protein expression, protein purification and biochemical characterisation of mammalian intracellular proteins will be an advantage. Terms of employment Terms of appointment and payment are in accordance with the agreement between the Danish Ministry of Finance and the Danish Federation of Professional Associations. Applying Send your application via http://www.ku.dk/stillinger/VIP (select 'send ansøgning' at the bottom of the page). The deadline for applying is 1 December 2011. Questions Further information about the position and the project are available from Professor Michael Gajhede, Biostructural Research (www.farma.ku.dk/BRhttp://www.farma.ku.dk/BR), Department of Molecular Drug Research, University of Copenhagen, tel. 35336407, email m...@farma.ku.dkmailto:m...@farma.ku.dk. The University of Copenhagen wishes to reflect the diversity of society and welcomes applications from all qualified candidates regardless of personal background. Professor Michael Gajhede Department of Medicinal Chemistry University of Copenhagen Jagtvej 162 DK-2100 Copenhagen Ø Denmark Phone: +45 35336407 Email: m...@farma.ku.dk.dkmailto:m...@farma.ku.dk.dk
Re: [ccp4bb] Crystallization at low pH
Glutaraldehyde works best at low pH On Mon, Nov 7, 2011 at 8:40 AM, Ed Pozharski epozh...@umaryland.edu wrote: On Mon, 2011-11-07 at 05:19 +, Sam Arnosti wrote: Hi everyone I have a protein that is extraordinarily stable at PH=3.0 or even 2.0. I want to crystallize it in the low PH and compare the differences between the crystals in regular PH and low PH. I was wondering how people set up the boxes in low PH, as usual buffers are mostly less acidic. Regards Sam Not clear if you already have crystals at regular pH, but if you do, you may consider direct transfer to lower pH. Of course, crystals may dissolve, which you could possibly prevent by cross-linking with glutaraldehyde. Three caveats: a) If lattice is incompatible with lower pH, even with cross-linking the resolution may sink to essentially useless levels b) I have no idea if the cross-linking will not be disrupted at really low pH, perhaps someone else can comment on that c) the 3rd reviewer can always say that lattice forces could have prevented a conformational change. But same goes for direct crystallization at low pH (but caries less weight). -- I'd jump in myself, if I weren't so good at whistling. Julian, King of Lemurs
[ccp4bb] Posting
I would like to see electron density maps (2Fo-Fc, Fo-Fc, omit map) for ligands sitting on a 2-fold symmetry axis in a protein structure. If any of you can send some images, I would appreciate it. Thanks, Debasish Debasish Chattopadhyay, Ph.D. University of Alabama at Birmingham
[ccp4bb] about .ins file for SHELXD
Hi, I am trying to use SHELXD to solve a peptide structure, but I got stuck on the input .ins file and need some advice. The .ins file contains: TITLE CELL ZERR LATT SYMM SFAC C H N O UNIT FIND PLOP NTRY HKLF END As a rough estimate, there will be 62 C, 122 H, 14 N, 32 O in one unit cell. 1. For the UNIT command, should the following numbers be the number of atoms (C, H, N, O) per unit cell multiplied by 4, i.e. 62x4 for C, 122x4 for H, etc.? 2. For FIND, the manual says 'estimated number of sites within 20% of true number', so should it be 20% of the total number of atoms in one unit cell, i.e. (62+122+14+32) x 20%? 3. For PLOP, the manual says 'number of peaks to start with in each cycle; peaks are then eliminated one at a time until either the correlation coefficient cannot be increased any more or 50% of the peaks have been eliminated' and 'one should specify more than the expected number of atoms because this procedure involves the elimination of the wrong atoms', which I don't fully understand. Should the following numbers be bigger than the total number of atoms in one unit cell? Thanks in advance! Lu
Re: [ccp4bb] phaser
The new Phaser GUI does not seem to let me reset the number of clashes for the packing search. Is there something I have missed? Eleanor
Re: [ccp4bb] phaser
Hi Eleanor, I think you should find it in the Additional Parameters section, second line, labelled Packing criterion. The default (chosen largely because you had been asking for something like this!) is to allow a number of clashes equal to 5% of the number of residues. Let me know if it isn't there. It's possible I've got a different version of the GUI on my machine... Randy On 7 Nov 2011, at 16:42, Eleanor Dodson wrote: The new Phaser GUI does not seem to let me reset the number of clashes for the packing search? Is there something I have missed? Eleanor -- Randy J. Read Department of Haematology, University of Cambridge Cambridge Institute for Medical Research Tel: + 44 1223 336500 Wellcome Trust/MRC Building Fax: + 44 1223 336827 Hills RoadE-mail: rj...@cam.ac.uk Cambridge CB2 0XY, U.K. www-structmed.cimr.cam.ac.uk
Re: [ccp4bb] about .ins file for SHELXD
UNIT specifies the number of atoms of each type in the unit-cell. For such 'small-molecule' problems you should try to get the numbers of heavier atoms correct; if only CHNO are present any numbers will do. For such problems I recommend setting FIND to about 70% of the number of atoms (excluding H) in the asymmetric unit. The first PLOP number should be approximately the number of atoms (excluding H) in the asymmetric unit. The second PLOP number should be about 1.2 times this and the third about 1.4 times it (three PLOP cycles are enough). This allows the 'peaklist optimization' algorithm to throw out some of the atoms. You will need data to 1.2A or better (1.0 is much better than 1.2!). The data should be as complete as possible. The following comments apply to larger problems; your structure is probably small enough to ignore them (but you may still need a large NTRY). I strongly recommend using the beta-test multiple-CPU version of shelxd. Direct methods can be very computationally intensive for large structures. I recently used it to solve an RNA with 1.0A data; on an 8-CPU machine it produced one solution in a week. This version is faster even with one CPU; the standard version would have taken about 3 months. I have seen several cases that had one solution in 5 or more trials (NTRY). If you expect a partly helical structure or have a good partial model, and have data to 2.1A or better, I strongly recommend ARCIMBOLDO. If all else fails I would be happy to look at the data for you. George On Mon, Nov 07, 2011 at 11:31:54AM -0500, Lu Yu wrote: Hi, I am trying to use SHELXD to solve a peptide structure, but I got stuck on the input .ins file and need some advice. The .ins file contains: TITLE CELL ZERR LATT SYMM SFAC C H N O UNIT FIND PLOP NTRY HKLF END As a rough estimate, there will be 62 C, 122 H, 14 N, 32 O in one unit cell. 1. For the UNIT command, should the following numbers be the number of atoms (C, H, N, O) per unit cell multiplied by 4, i.e. 62x4 for C, 122x4 for H, etc.? 2. For FIND, the manual says 'estimated number of sites within 20% of true number', so should it be 20% of the total number of atoms in one unit cell, i.e. (62+122+14+32) x 20%? 3. For PLOP, the manual says 'number of peaks to start with in each cycle; peaks are then eliminated one at a time until either the correlation coefficient cannot be increased any more or 50% of the peaks have been eliminated' and 'one should specify more than the expected number of atoms because this procedure involves the elimination of the wrong atoms', which I don't fully understand. Should the following numbers be bigger than the total number of atoms in one unit cell? Thanks in advance! Lu -- Prof. George M. Sheldrick FRS Dept. Structural Chemistry, University of Goettingen, Tammannstr. 4, D37077 Goettingen, Germany Tel. +49-551-39-3021 or -3068 Fax. +49-551-39-22582
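[Editorial note] For concreteness, here is a rough, hedged sketch of how the numbers quoted in the question might translate into an .ins file following the advice above. It is an editorial illustration only: it assumes four asymmetric units per cell (as the "multiplied by 4" in the question suggests), giving roughly 27 non-hydrogen atoms per asymmetric unit, and the title, wavelength, cell, error and symmetry lines are pure placeholders (written here for a hypothetical P2(1)2(1)2(1) cell) that must come from your own data processing.

TITL peptide in P2(1)2(1)2(1)
CELL 1.54178 10.00 15.00 20.00 90 90 90
ZERR 4 0.005 0.005 0.005 0 0 0
LATT -1
SYMM 0.5-X, -Y, 0.5+Z
SYMM -X, 0.5+Y, 0.5-Z
SYMM 0.5+X, 0.5-Y, -Z
SFAC C H N O
UNIT 62 122 14 32
FIND 19
PLOP 27 32 38
NTRY 1000
HKLF 4
END

Here UNIT lists the atoms of each type per unit cell exactly as estimated in the question (62 C, 122 H, 14 N, 32 O, with no extra factor of 4), FIND 19 is roughly 70% of the ~27 non-hydrogen atoms expected in one asymmetric unit, and PLOP 27 32 38 is about 1.0, 1.2 and 1.4 times that number, following the rules of thumb given above.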
[ccp4bb] image compression
At the risk of sounding like another poll, I have a pragmatic question for the methods development community: Hypothetically, assume that there was a website where you could download the original diffraction images corresponding to any given PDB file, including early datasets that were from the same project, but because of smeary spots or whatever, couldn't be solved. There might even be datasets with unknown PDB IDs because that particular project never did work out, or because the relevant protein sequence has been lost. Remember, few of these datasets will be less than 5 years old if we try to allow enough time for the original data collector to either solve it or graduate (and then cease to care). Even for the final dataset, there will be a delay, since the half-life between data collection and coordinate deposition in the PDB is still ~20 months. Plenty of time to forget. So, although the images were archived (probably named test and in a directory called john) it may be that the only way to figure out which PDB ID is the right answer is by processing them and comparing to all deposited Fs. Assume this was done. But there will always be some datasets that don't match any PDB. Are those interesting? What about ones that can't be processed? What about ones that can't even be indexed? There may be a lot of those! (hypothetically, of course). Anyway, assume that someone did go through all the trouble to make these datasets available for download, just in case they are interesting, and annotated them as much as possible. There will be about 20 datasets for any given PDB ID. Now assume that for each of these datasets this hypothetical website has two links, one for the raw data, which will average ~2 GB per wedge (after gzip compression, taking at least ~45 min to download), and a second link for a lossy compressed version, which is only ~100 MB/wedge (2 min download). When decompressed, the images will visually look pretty much like the originals, and generally give you very similar Rmerge, Rcryst, Rfree, I/sigma, anomalous differences, and all other statistics when processed with contemporary software. Perhaps a bit worse. Essentially, lossy compression is equivalent to adding noise to the images. Which one would you try first? Does lossy compression make it easier to hunt for interesting datasets? Or is it just too repugnant to have modified the data in any way shape or form ... after the detector manufacturer's software has corrected it? Would it suffice to simply supply a couple of example images for download instead? -James Holton MAD Scientist
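[Editorial note] As a back-of-the-envelope check on the download times quoted above (an editorial sketch; the sustained link speed is an assumed value, not something stated in the thread):

# Hedged sketch: download times for the file sizes quoted above,
# assuming a sustained link speed of ~6 Mbit/s (an assumption, not from the thread).
link_Mbit_per_s = 6.0
bytes_per_s = link_Mbit_per_s * 1e6 / 8.0

for label, size_bytes in [("raw wedge (~2 GB)", 2e9), ("lossy wedge (~100 MB)", 100e6)]:
    minutes = size_bytes / bytes_per_s / 60.0
    print(f"{label}: ~{minutes:.0f} min")

At that assumed rate the raw wedge takes on the order of 45 minutes and the lossy one a couple of minutes, consistent with the figures in the message.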
Re: [ccp4bb] image compression
This is a very good question. I would suggest that both versions of the old data are useful. If what is being done is simple validation and regeneration of what was done before, then the lossy compression should be fine in most instances. However, when what is being done hinges on the really fine details -- looking for lost faint spots just peeking out from the background, looking at detailed peak profiles -- then the lossless compression version is the better choice. The annotation for both sets should be the same. The difference is in storage and network bandwidth. Hopefully the fraud issue will never again rear its ugly head, but if it should, then having saved the losslessly compressed images might prove to have been a good idea. To facilitate experimentation with the idea, if there is agreement on the particular lossy compression to be used, I would be happy to add it as an option in CBFlib. Right now all the compressions we have are lossless. Regards, Herbert = Herbert J. Bernstein, Professor of Computer Science Dowling College, Kramer Science Center, KSC 121 Idle Hour Blvd, Oakdale, NY, 11769 +1-631-244-3035 y...@dowling.edu = On Mon, 7 Nov 2011, James Holton wrote: At the risk of sounding like another poll, I have a pragmatic question for the methods development community: Hypothetically, assume that there was a website where you could download the original diffraction images corresponding to any given PDB file, including early datasets that were from the same project, but because of smeary spots or whatever, couldn't be solved. There might even be datasets with unknown PDB IDs because that particular project never did work out, or because the relevant protein sequence has been lost. Remember, few of these datasets will be less than 5 years old if we try to allow enough time for the original data collector to either solve it or graduate (and then cease to care). Even for the final dataset, there will be a delay, since the half-life between data collection and coordinate deposition in the PDB is still ~20 months. Plenty of time to forget. So, although the images were archived (probably named test and in a directory called john) it may be that the only way to figure out which PDB ID is the right answer is by processing them and comparing to all deposited Fs. Assume this was done. But there will always be some datasets that don't match any PDB. Are those interesting? What about ones that can't be processed? What about ones that can't even be indexed? There may be a lot of those! (hypothetically, of course).
Or is it just too repugnant to have modified the data in any way shape or form ... after the detector manufacturer's software has corrected it? Would it suffice to simply supply a couple of example images for download instead? -James Holton MAD Scientist
[ccp4bb] Job IRC2749: Postdoc position at LANL in neutron protein crystallography
Detailed Description Immediate postdoctoral positions are available at the neutron Protein Crystallography Station of the Los Alamos National Laboratory. We are looking for crystallographers and/or biochemists to conduct research in the protein structure-function field, with a focus on using joint neutron and X-ray crystallography approaches. Projects include, but are not limited to, studies in mechanistic enzymology, rational protein engineering, and computational biology (QM, QM/MM, MD). We are particularly interested in individuals with a strong background in protein expression, purification and crystallization of membrane proteins as part of a collaborative effort aimed at determining neutron structures of these targets. An incumbent should be proficient in all aspects of macromolecular crystallography, including protein production, crystallization, data collection and processing, structure refinement and analysis. The postdoctoral associates will be trained in neutron protein crystallography techniques and joint X-ray/neutron refinement approaches. The postdoctoral researchers will be expected to participate in the PCS user program part-time as well as the collaborative efforts of the team. A Ph.D. in biochemistry or a related discipline is required. Knowledge of mechanistic biochemistry and enzymology is desired. Eligible persons should be within 5 years of receiving their Ph.D. degree. Responsibilities: 1. Work on projects to support the PCS user program and DOE-OBER mission areas. 2. Develop and work on an independent research project in the field of joint X-ray and neutron crystallography. 3. Work as part of a collaborative team of interdisciplinary scientists. 4. Publish and present results at both internal and external scientific meetings. 5. Foster and establish project collaborations within and outside LANL. 6. Incorporate and develop new scientific advances into research and processes. Job Requirements Minimum Job Requirements: 1. Expertise in all aspects of single crystal macromolecular diffraction studies, including a proven track record of developing and optimizing crystallization for X-ray structure determination. 2. Experience with protein production, characterization and biochemical methods (protein expression, purification, quantification, enzymatic assays). 3. Expertise in some solution biophysical methods such as differential scanning calorimetry, mass spectrometry, enzyme kinetics assays, UV-Vis and fluorescence spectroscopy. 4. Familiarity with protein modeling. 5. A proven track record of problem solving, good organization, strong communication, multi-tasking and teamwork skills, and the ability to collaborate within multidisciplinary teams. 6. Commitment to high quality work with a strong user focus. 7. A proven track record of strong written and oral communication and presentation skills. Education: Ph.D. in chemistry, biochemistry or macromolecular crystallography within the last 5 years. Additional Details Notes to Applicants: Interested scientists should contact Dr. A. Kovalevsky, Bioscience Division, Los Alamos National Laboratory, e-mail: a...@lanl.gov, or Dr. S.Z. Fisher, Bioscience Division, Los Alamos National Laboratory, e-mail: zfis...@lanl.gov. Send your CV and a list of publications. Pre-Employment Drug Test: The Laboratory requires successful applicants to complete a pre-employment drug test and maintains a substance abuse policy that includes random drug testing.
Candidates may be considered for a Director's Fellowship, and outstanding candidates may be considered for the prestigious Marie Curie, Richard P. Feynman, J. Robert Oppenheimer, or Frederick Reines Fellowships. For general information refer to the Postdoctoral Program page: http://www.lanl.gov/science/postdocs/ Equal Opportunity: Los Alamos National Laboratory is an equal opportunity employer and supports a diverse and inclusive workforce. We welcome and encourage applications from the broadest possible range of qualified candidates. The Laboratory is also committed to making our workplace accessible to individuals with disabilities and will provide reasonable accommodations, upon request, for individuals to participate in the application and hiring process. To request such an accommodation, please send an email to applyh...@lanl.gov or call 1-505-665-5627.
Re: [ccp4bb] Archiving Images for PDB Depositions
Reluctantly I am going to add my 2 cents to the discussion, with various aspects in one e-mail. - It is easy to overlook that our business is to answer biological/biochemical questions. This is what you (generally) get grants to do (showing that these questions are of critical importance in your ability to do science). Crystallography is one tool that we use to acquire evidence to answer questions. The time when you could get a Nobel prize for doing a structure, or a PhD for doing a structure, is gone. Even writing a publication with just a structure is now not as common as it used to be. So the biochemistry drives the crystallography. It is not reasonable to say that once you have collected data and you don't publish the data for 5 years, you are no longer interested. What that generally means is that the rest of the science is not cooperating. In short: I would be against a strict rule for mandatory deposition of raw data, even after a long time. An example: I have data sets here with low resolution data (~10A), presumably of protein structures that have known structures for prokaryotes but not for eukaryotes, and it would be exciting if we could prove (or disprove) that they look the same. The problem, apart from resolution, is that the spots are so few and fuzzy that I cannot index the images. The main reason why I save the images is that if/when someone comes to me to say that they think they have made better crystals, we have something to compare. (Thanks to Gerard B. for encouragement to write this item :-) - For those who think that we have come to the end of development in crystallography, James Holton (thank you) has described nicely why we should not think this. We are all happy if our model generates an R-factor of 20%. Even small molecule crystallographers would wave that away in an instant as inadequate. However, everybody has come to accept that this is fine for protein crystallography. It would be better if our models were more consistent with the experimental data. How could we make such models without access to lots of data? As a student I was always taught (when asking why 20% is actually good) that we don't (for example) model solvent. Why not? It is not easy. If we did, would the 20% go down to 3%? I am guessing not; there are other errors that come into play. - Gerard K. has eloquently spoken about cost and effort. Since I maintain a small (local) archive of images, I can affirm his words: a large-capacity disk is inexpensive ($100). A box that the disk sits in is inexpensive ($1000). A second box that holds the backup and sits in a different building (away for security reasons) is inexpensive ($1400, with 4 disks). The infrastructure to run these boxes (power, fiber optics, boxes in between) is slightly more expensive. What is *really* expensive is people maintaining everything. It was a huge surprise to me (and my boss) how much time and effort it takes to annotate all data sets, rename them appropriately and file them away in a logical place so that anyone (who understands the scheme) can find them back. Therefore (!) the reason why this should be centralized is that the cost per data set stored goes down - it is more efficient. One person can process several (many, if largely automated) data sets per day. It is also of interest that we locally (2-5 people for a project) may not agree on what exactly should be stored. Therefore there is no hope that we can find consensus in the world, but we CAN get a reasonable compromise.
But it is tough: I have heard the argument that data for published structures should be kept in case someone wants to see and/or go back, while I have also heard the argument that once published it is signed, sealed and delivered and it can go, while UNpublished data should be preserved because eventually it hopefully will get to publication. Each argument is reasonably sensible, but the conclusions are opposite. (I maintain both classes of data sets.) - Granting agencies in the US generally require that you archive scientific data. What is not yet clear is whether they would be willing to pay for a centralized facility that would do that. After all, it is more exciting to NIH to give money for the study of a disease than it is to store data. But if the argument were made that each grant(ee) would be more efficient and could apply more money towards the actual problem, this might convince them. For that we would need a reasonable consensus what we want and why. More power to John. H and The Committee. Thanks to complete silence on the BB today I am finally caught up reading! Mark van der Woerd -Original Message- From: James Holton jmhol...@lbl.gov To: CCP4BB CCP4BB@JISCMAIL.AC.UK Sent: Tue, Nov 1, 2011 11:07 am Subject: Re: [ccp4bb] Archiving Images for PDB Depositions On general scientific principles the reasons for archiving raw data all boil down to one thing: there
Re: [ccp4bb] image compression
So far, all I really have is a proof of concept compression algorithm here: http://bl831.als.lbl.gov/~jamesh/lossy_compression/ Not exactly portable since you need ffmpeg and the x264 libraries set up properly. The latter seems to be constantly changing things and breaking the former, so I'm not sure how future proof my algorithm is. Something that caught my eye recently was fractal compression, particularly since FIASCO has been part of the NetPBM package for about 10 years now. Seems to give comparable compression vs quality as x264 (to my eye), but I'm presently wondering if I'd be wasting my time developing this further? Will the crystallographic world simply turn up its collective nose at lossy images? Even if it means waiting 6 years for Nielsen's Law to make up the difference in network bandwidth? -James Holton MAD Scientist On Mon, Nov 7, 2011 at 10:01 AM, Herbert J. Bernstein y...@bernstein-plus-sons.com wrote: This is a very good question. I would suggest that both versions of the old data are useful. If was is being done is simple validation and regeneration of what was done before, then the lossy compression should be fine in most instances. However, when what is being done hinges on the really fine details -- looking for lost faint spots just peeking out from the background, looking at detailed peak profiles -- then the lossless compression version is the better choice. The annotation for both sets should be the same. The difference is in storage and network bandwidth. Hopefully the fraud issue will never again rear its ugly head, but if it should, then having saved the losslessly compressed images might prove to have been a good idea. To facilitate experimentation with the idea, if there is agreement on the particular lossy compression to be used, I would be happy to add it as an option in CBFlib. Right now all the compressions we have are lossless. Regards, Herbert = Herbert J. Bernstein, Professor of Computer Science Dowling College, Kramer Science Center, KSC 121 Idle Hour Blvd, Oakdale, NY, 11769 +1-631-244-3035 y...@dowling.edu = On Mon, 7 Nov 2011, James Holton wrote: At the risk of sounding like another poll, I have a pragmatic question for the methods development community: Hypothetically, assume that there was a website where you could download the original diffraction images corresponding to any given PDB file, including early datasets that were from the same project, but because of smeary spots or whatever, couldn't be solved. There might even be datasets with unknown PDB IDs because that particular project never did work out, or because the relevant protein sequence has been lost. Remember, few of these datasets will be less than 5 years old if we try to allow enough time for the original data collector to either solve it or graduate (and then cease to care). Even for the final dataset, there will be a delay, since the half-life between data collection and coordinate deposition in the PDB is still ~20 months. Plenty of time to forget. So, although the images were archived (probably named test and in a directory called john) it may be that the only way to figure out which PDB ID is the right answer is by processing them and comparing to all deposited Fs. Assume this was done. But there will always be some datasets that don't match any PDB. Are those interesting? What about ones that can't be processed? What about ones that can't even be indexed? There may be a lot of those! (hypothetically, of course). 
Anyway, assume that someone did go through all the trouble to make these datasets available for download, just in case they are interesting, and annotated them as much as possible. There will be about 20 datasets for any given PDB ID. Now assume that for each of these datasets this hypothetical website has two links, one for the raw data, which will average ~2 GB per wedge (after gzip compression, taking at least ~45 min to download), and a second link for a lossy compressed version, which is only ~100 MB/wedge (2 min download). When decompressed, the images will visually look pretty much like the originals, and generally give you very similar Rmerge, Rcryst, Rfree, I/sigma, anomalous differences, and all other statistics when processed with contemporary software. Perhaps a bit worse. Essentially, lossy compression is equivalent to adding noise to the images. Which one would you try first? Does lossy compression make it easier to hunt for interesting datasets? Or is it just too repugnant to have modified the data in any way shape or form ... after the detector manufacturer's software has corrected it? Would it suffice to simply supply a couple of example images for download instead? -James Holton MAD Scientist
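[Editorial note] A quick sanity check on the bandwidth argument above (an editorial sketch; it assumes the usual statement of Nielsen's Law, roughly 50% growth in end-user bandwidth per year, and the ~20:1 size ratio quoted earlier in the thread):

# Hedged sketch: years for network bandwidth growth to absorb a ~20x size difference,
# assuming Nielsen's Law growth of ~50% per year (an assumed figure, not from the thread).
import math

compression_ratio = 2e9 / 100e6   # ~2 GB raw vs ~100 MB lossy, as quoted above
growth_per_year = 1.5             # Nielsen's Law, ~50%/year
years = math.log(compression_ratio) / math.log(growth_per_year)
print(f"~{years:.1f} years for bandwidth growth to cover a {compression_ratio:.0f}x gap")

This gives roughly 7 years, broadly consistent with the ~6-year figure quoted above.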
Re: [ccp4bb] image compression
Dear James, You are _not_ wasting your time. Even if the lossy compression ends up only being used to stage preliminary images forward on the net while full images slowly work their way forward, having such a compression that preserves the crystallography in the image will be an important contribution to efficient workflows. Personally I suspect that such images will have more important uses, e.g. facilitating real-time monitoring of experiments using detectors providing full images at data rates that simply cannot be handled without major compression. We are already in that world. The reason that the Dectris images use Andy Hammersley's byte-offset compression, rather than going uncompressed or using CCP4 compression, is that in January 2007 we were sitting right on the edge of a nasty CPU-performance/disk bandwidth tradeoff, and the byte-offset compression won the competition. In that round a lossless compression was sufficient, but just barely. In the future, I am certain some amount of lossy compression will be needed to sample the dataflow while the losslessly compressed images work their way through a very back-logged queue to the disk. In the longer term, I can see people working with lossy compressed images for analysis of massive volumes of images to select the 1% to 10% that will be useful in a final analysis, and may need to be used in a lossless mode. If you can reject 90% of the images with a fraction of the effort needed to work with the resulting 10% of good images, you have made a good decision. And then there is the inevitable need to work with images on portable devices with limited storage over cell and WIFI networks. ... I would not worry about upturned noses. I would worry about the engineering needed to manage experiments. Lossy compression can be an important part of that engineering. Regards, Herbert At 4:09 PM -0800 11/7/11, James Holton wrote: So far, all I really have is a proof of concept compression algorithm here: http://bl831.als.lbl.gov/~jamesh/lossy_compression/ Not exactly portable since you need ffmpeg and the x264 libraries set up properly. The latter seems to be constantly changing things and breaking the former, so I'm not sure how future proof my algorithm is. Something that caught my eye recently was fractal compression, particularly since FIASCO has been part of the NetPBM package for about 10 years now. Seems to give comparable compression vs quality as x264 (to my eye), but I'm presently wondering if I'd be wasting my time developing this further? Will the crystallographic world simply turn up its collective nose at lossy images? Even if it means waiting 6 years for Nielsen's Law to make up the difference in network bandwidth? -James Holton MAD Scientist On Mon, Nov 7, 2011 at 10:01 AM, Herbert J. Bernstein y...@bernstein-plus-sons.com wrote: This is a very good question. I would suggest that both versions of the old data are useful.
Hopefully the fraud issue will never again rear its ugly head, but if it should, then having saved the losslessly compressed images might prove to have been a good idea. To facilitate experimentation with the idea, if there is agreement on the particular lossy compression to be used, I would be happy to add it as an option in CBFlib. Right now all the compressions we have are lossless. Regards, Herbert = Herbert J. Bernstein, Professor of Computer Science Dowling College, Kramer Science Center, KSC 121 Idle Hour Blvd, Oakdale, NY, 11769 +1-631-244-3035 y...@dowling.edu = On Mon, 7 Nov 2011, James Holton wrote: At the risk of sounding like another poll, I have a pragmatic question for the methods development community: Hypothetically, assume that there was a website where you could download the original diffraction images corresponding to any given PDB file, including early datasets that were from the same project, but because of smeary spots or whatever, couldn't be solved. There might even be datasets with unknown PDB IDs because that particular project never did work out, or because the relevant protein sequence has been lost. Remember, few of these datasets will be less than 5 years old if we try to allow enough time for the original data collector to either solve it or graduate (and then cease to care). Even for the
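[Editorial note] Since the byte-offset compression mentioned above keeps coming up, here is a highly simplified, hedged sketch of the idea behind it (an editorial addition; it illustrates only the delta-plus-escape-code principle and is not the exact CBF byte_offset byte layout):

# Hedged sketch of the idea behind byte-offset compression: store the difference
# between successive pixel values, using 1 byte when the delta is small and an
# escape code followed by a wider integer when it is not. Simplified illustration,
# not the exact CBF byte_offset specification.
import struct

def byte_offset_encode(pixels):
    out = bytearray()
    previous = 0
    for value in pixels:
        delta = value - previous
        if -127 <= delta <= 127:
            out += struct.pack("b", delta)                              # 1-byte signed delta
        elif -32767 <= delta <= 32767:
            out += b"\x80" + struct.pack("<h", delta)                   # escape, 2-byte delta
        else:
            out += b"\x80" + struct.pack("<h", -32768) + struct.pack("<i", delta)
        previous = value
    return bytes(out)

# Toy example: mostly small deltas with two large jumps; real detector images,
# where neighbouring pixels usually differ by little, compress well this way.
print(len(byte_offset_encode([10, 12, 11, 500, 502, 501, 65000])))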
Re: [ccp4bb] image compression
I'll second that... can't remember anybody on the barricades about corrected CCD images, but they've been just so much more practical. Different kind of problem, I know, but equivalent situation: the people to ask are not the purists, but the ones struggling with the huge volumes of data. I'll take the lossy version any day if it speeds up real-time evaluation of data quality, helps me browse my datasets, and allows me to do remote but intelligent data collection.

phx.

On 08/11/2011 02:22, Herbert J. Bernstein wrote:

Dear James,

You are _not_ wasting your time. Even if the lossy compression ends up only being used to stage preliminary images forward on the net while full images slowly work their way forward, having such a compression that preserves the crystallography in the image will be an important contribution to efficient workflows. Personally I suspect that such images will have more important uses, e.g. facilitating real-time monitoring of experiments using detectors providing full images at data rates that simply cannot be handled without major compression.

We are already in that world. The reason that the Dectris images use Andy Hammersley's byte-offset compression, rather than going uncompressed or using CCP4 compression, is that in January 2007 we were sitting right on the edge of a nasty CPU-performance/disk-bandwidth tradeoff, and the byte-offset compression won the competition. In that round a lossless compression was sufficient, but just barely. In the future, I am certain some amount of lossy compression will be needed to sample the dataflow while the losslessly compressed images work their way through a very back-logged queue to the disk.

In the longer term, I can see people working with lossy compressed images for analysis of massive volumes of images to select the 1% to 10% that will be useful in a final analysis, and may need to be used in a lossless mode. If you can reject 90% of the images with a fraction of the effort needed to work with the resulting 10% of good images, you have made a good decision. And then there is the inevitable need to work with images on portable devices with limited storage over cell and WiFi networks. ...

I would not worry about upturned noses. I would worry about the engineering needed to manage experiments. Lossy compression can be an important part of that engineering.

Regards, Herbert

At 4:09 PM -0800 11/7/11, James Holton wrote:

So far, all I really have is a proof-of-concept compression algorithm here: http://bl831.als.lbl.gov/~jamesh/lossy_compression/ Not exactly portable, since you need ffmpeg and the x264 libraries set up properly. The latter seems to be constantly changing things and breaking the former, so I'm not sure how future-proof my algorithm is. Something that caught my eye recently was fractal compression, particularly since FIASCO has been part of the NetPBM package for about 10 years now. It seems to give comparable compression-versus-quality to x264 (to my eye), but I'm presently wondering if I'd be wasting my time developing this further? Will the crystallographic world simply turn up its collective nose at lossy images? Even if it means waiting 6 years for Nielsen's Law to make up the difference in network bandwidth?

-James Holton
MAD Scientist

On Mon, Nov 7, 2011 at 10:01 AM, Herbert J. Bernstein y...@bernstein-plus-sons.com wrote:

This is a very good question. I would suggest that both versions of the old data are useful. If what is being done is simple validation and regeneration of what was done before, then the lossy compression should be fine in most instances. However, when what is being done hinges on the really fine details -- looking for lost faint spots just peeking out from the background, looking at detailed peak profiles -- then the lossless compression version is the better choice. The annotation for both sets should be the same. The difference is in storage and network bandwidth. [...]
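As an aside for readers who have not met it, the byte-offset scheme Herbert mentions can be sketched in a few lines: each pixel is stored as its difference from the previous one, with an escape code for the rare large jumps, so smooth detector background costs roughly one byte per pixel. The sketch below illustrates the principle only; it is not the exact CBFlib implementation, and the escape values and byte order shown here are simplifying assumptions.

    import struct

    def byte_offset_encode(pixels):
        # Illustrative delta encoder (not the exact CBFlib byte_offset format):
        # 1 byte for small pixel-to-pixel deltas, escaping to 2 or 4 bytes
        # only for the occasional big jump (e.g. a Bragg spot).
        out = bytearray()
        prev = 0
        for value in pixels:
            delta = value - prev
            if -127 <= delta <= 127:
                out += struct.pack('<b', delta)        # common case: one byte
            elif -32767 <= delta <= 32767:
                out += struct.pack('<b', -128)         # escape to 16 bits
                out += struct.pack('<h', delta)
            else:
                out += struct.pack('<b', -128)         # escape to 32 bits
                out += struct.pack('<h', -32768)
                out += struct.pack('<i', delta)
            prev = value
        return bytes(out)

    # A flat background with one strong spot: 6 pixels cost 10 bytes
    # instead of 24 bytes as raw 32-bit integers.
    print(len(byte_offset_encode([10, 12, 11, 4000, 13, 12])))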
[ccp4bb] weight matrix and R-FreeR gap optimization
Dear ccp4bbers,

I wonder if someone can help me define a proper weight matrix term in Refmac5 to lower the R-free/R gap. The log file indicates a weight matrix of 1.98, with a gap of 7.

Thanks for suggestions in advance.

James
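For what it's worth, the term in question is set with the WEIGHT keyword in the Refmac5 keyword input. A minimal keyword set might look like the sketch below; the value 0.3 is purely an illustrative starting point (the best value is dataset- and resolution-dependent), so it is worth scanning a few values and watching what happens to R-free:

    NCYC 10
    WEIGHT MATRIX 0.3
    END

With WEIGHT AUTO, Refmac chooses the value itself (here it settled on about 1.98). Setting it explicitly to something smaller down-weights the X-ray term relative to the geometry restraints, which usually reduces overfitting and narrows the R-free/R gap, typically at the cost of a slightly higher R.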
Re: [ccp4bb] image compression
So the purists of speed seem to be more relevant than the purists of images. We complain all the time about how many errors we have out there in our experiments that we seemingly cannot account for. Yet would we add another source? Sorry if I'm missing something serious here, but I cannot understand this artificial debate. You can do useful remote data collection without having to look at *each* image.

Miguel

On 08/11/2011 06:27, Frank von Delft wrote: [...]
Re: [ccp4bb] image compression
I think that real universal image deposition will not take off without a newish type of compression that speeds things up and eases them. The compression discussion is therefore highly relevant - I would even suggest going to mathematicians and software engineers to provide a highly efficient compression format for our type of data. Our data sets have some very typical repetitive features, so they can very likely be compressed as a whole set without losing information (differential compression in the series), but this needs experts.

Jan Dohnalek

On Tue, Nov 8, 2011 at 8:19 AM, Miguel Ortiz Lombardia miguel.ortiz-lombar...@afmb.univ-mrs.fr wrote: [...]
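Jan's "differential compression in the series" idea can be sketched in a few lines of Python: keep the first frame, then store only frame-to-frame residuals and let a generic lossless compressor squeeze them. This is an illustration of the principle only, not a proposed format; how much it gains over per-image compression depends entirely on how reproducible the background really is from frame to frame.

    import zlib
    import numpy as np

    def compress_series(frames):
        # Store the first frame as-is, then only the differences between
        # successive frames; small residuals compress much better than the
        # raw frames, and the whole scheme stays lossless.
        chunks = [zlib.compress(frames[0].tobytes())]
        for prev, cur in zip(frames, frames[1:]):
            residual = cur.astype(np.int32) - prev.astype(np.int32)
            chunks.append(zlib.compress(residual.tobytes()))
        return chunks

    # Toy series: ten nearly identical "frames" of background plus a little noise.
    rng = np.random.default_rng(0)
    base = rng.integers(90, 110, size=(512, 512), dtype=np.int32)
    frames = [base + rng.integers(0, 3, size=base.shape, dtype=np.int32) for _ in range(10)]
    print(sum(f.nbytes for f in frames), sum(len(c) for c in compress_series(frames)))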