Re: [ccp4bb] Crystalization in low PH

2011-11-07 Thread Boaz Shaanan
Hi,

I'm sure there are proteins that were crystallized at low pH but I can't 
remember which. The best thing is to go to the BMCD database: 
http://xpdb.nist.gov:8060/BMCD4/index.faces
and query it with pH as the key (look under the advanced search).

 Cheers,

  Boaz


Boaz Shaanan, Ph.D.
Dept. of Life Sciences
Ben-Gurion University of the Negev
Beer-Sheva 84105
Israel

E-mail: bshaa...@bgu.ac.il
Phone: 972-8-647-2220  Skype: boaz.shaanan
Fax:   972-8-647-2992 or 972-8-646-1710






From: CCP4 bulletin board [CCP4BB@JISCMAIL.AC.UK] on behalf of Sam Arnosti 
[meisam.nosr...@gmail.com]
Sent: Monday, November 07, 2011 7:19 AM
To: CCP4BB@JISCMAIL.AC.UK
Subject: [ccp4bb] Crystalization in low PH

Hi everyone

I have a protein that is extraordinarily stable at pH 3.0 or even 2.0.

I want to crystallize it at low pH and compare the differences between the
crystals at regular pH and at low pH.

I was wondering how people set up crystallization boxes at low pH, since the
usual buffers are mostly less acidic.

Regards

Sam


Re: [ccp4bb] Crystalization in low PH

2011-11-07 Thread George M. Sheldrick
Tendamistat (1OK0) was crystallized at pH 1.3 and diffracted to 0.93A.
George

On Mon, Nov 07, 2011 at 05:19:29AM +, Sam Arnosti wrote:
 Hi everyone
 
 I have a protein that is extraordinarily stable at PH=3.0 or even 2.0.
 
 I want to crystallize it in the  low PH and compare the differences between 
 the crystals in regular PH and low PH.
 
 I was wondering how people set up the boxes in low PH, as usual buffers are 
 mostly less acidic.
 
 Regards
 
 Sam
 

-- 
Prof. George M. Sheldrick FRS
Dept. Structural Chemistry, 
University of Goettingen,
Tammannstr. 4,
D37077 Goettingen, Germany
Tel. +49-551-39-3021 or -3068
Fax. +49-551-39-22582


[ccp4bb] Announcement to MX-workshop and invitation to annual HZB-BESSY users meeting

2011-11-07 Thread Müller , Uwe
Announcement: MX-Satellite workshop "New developments in macromolecular
crystallography using synchrotron radiation"

This workshop will take place on Nov 30, 2011 as a satellite to the annual HZB
users meeting at BESSY-II Berlin, and we would like to cordially invite you to
participate in this workshop.
The following speakers have already confirmed their participation:
- Andrew Thompson (Soleil)
- Martin Fuchs (PSI)
- Juan Sanchez-Weatherby (DIAMOND)
- Alke Meents (DESY)
- Manfred S. Weiss (HZB)
- Karthik Paithankar (HZB)
- Sandra Pühringer (HZB)
- Gerd Weber (FU-Berlin)

Registration and further information can be found here:
http://www.helmholtz-berlin.de/user/usersmeetings/users-meeting-2011/index_en.html
The registration deadline is November 21, 2011.

Additionally, we would hereby like to invite you to participate in the
annual HZB Users Meeting, which will take place from Dec 01-02, 2011
in Berlin-Adlershof in the WISTA main building. As every year, we will reward 
the best MX-beamline-related poster presentation with the valuable BESSY-MX 
poster award.

Please register:
http://www.helmholtz-berlin.de/user/usersmeetings/users-meeting-2011/index_en.html
The registration deadline is November 21, 2011.

We are very much looking forward to seeing you.

Uwe Mueller and Manfred Weiss


Dr. Uwe Mueller
Soft Matter and Functional Materials
Macromolecular Crystallography (BESSY-MX) | Group leader
Elektronenspeicherring BESSY II
Albert-Einstein-Str. 15, D-12489 Berlin, Germany

Fon: +49 30 8062 14974
Fax: +49 30 8062 14975
url: www.helmholtz-berlin.de/bessy-mx
email: u...@helmholtz-berlin.de

Helmholtz-Zentrum Berlin für Materialien und Energie GmbH
Hahn-Meitner-Platz 1, 14109 Berlin
Vorsitzender des Aufsichtsrats: Prof. Dr. Dr. h.c. mult. Joachim Treusch
Stellvertretende Vorsitzende: Dr. Beatrix Vierkorn-Rudolph
Geschäftsführer: Prof. Dr. Anke Rita Kaysser-Pyzalla, Dr. Ulrich Breuer
Sitz der Gesellschaft: Berlin
Handelsregister: AG Charlottenburg, 89 HRB 5583


Re: [ccp4bb] how to use refine ligand containing heavry atom

2011-11-07 Thread Eleanor Dodson

Something must be wrong..

If you are using REFMAC it will give you a list of bad contacts etc in 
the log file..


Check those and try to correct them..
Eleanor


On 11/06/2011 05:04 PM, Zhipu Luo wrote:

Dear all

I have a protein soaked with a coordination compound containing platinum. For 
some reason I could not collect anomalous data at 1.072 angstrom and only got 
a dataset at 0.973 angstrom. I solved the phases by molecular replacement and 
refined the model to Rfactor = 0.2204, Rfree = 0.2447 before modelling the 
coordination compound. However, the Rfactor rose to 0.3345 and Rfree rose to 
0.3425 after refining the model with the coordination compound. How should I 
deal with this problem? Hoping for help!


thank you for your time
Zhipu
fuzhou
China


Re: [ccp4bb] Crystalization in low PH

2011-11-07 Thread Craig A. Bingman
I'm not convinced that you need a conventional buffer at pH 2 or 3.  At pH 2, 
the hydrogen ion concentration is 10 mM.  If you want to use something else, 
the second pKa for sulfuric acid is around 2.  The first pKa for phosphoric 
acid is slightly higher than 2.  Lactic acid has a pKa close to 3.  Formic acid 
has a pKa just under 4.  Most of these numbers were in an appendix in the first 
chemistry text you ever used.  (wink)  These numbers imply pretty strongly that 
most crystallization screens emphasizing common salts will require determined 
modification to hit these low pH values, because many stabilizing anions in the 
Hofmeister series will be partially or completely protonated at these pH 
values.  PEG and organic screens will require a smaller hammer to retrofit.
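
A minimal Python sketch of that arithmetic (illustrative only; the pKa values
are approximate textbook numbers in line with the ones quoted above):

# Sketch: [H+] and the protonated fraction of a few acid groups at low pH,
# via the Henderson-Hasselbalch relation.  The pKa values are approximate
# textbook numbers, not authoritative data.

def protonated_fraction(pH, pKa):
    """Fraction HA/(HA + A-) for a group with the given pKa at the given pH."""
    ratio = 10.0 ** (pKa - pH)          # [HA]/[A-]
    return ratio / (1.0 + ratio)

if __name__ == "__main__":
    pKas = {
        "sulfate (pKa2)":   2.0,
        "phosphate (pKa1)": 2.1,
        "lactate":          3.1,
        "formate":          3.8,
    }
    for pH in (2.0, 3.0):
        print(f"pH {pH:.1f}: [H+] = {1e3 * 10.0 ** -pH:.1f} mM")
        for name, pKa in pKas.items():
            frac = protonated_fraction(pH, pKa)
            print(f"  {name:18s} ~{100.0 * frac:.0f}% protonated")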

On Nov 6, 2011, at 11:19 PM, Sam Arnosti wrote:

 Hi everyone
 
 I have a protein that is extraordinarily stable at PH=3.0 or even 2.0.
 
 I want to crystallize it in the  low PH and compare the differences between 
 the crystals in regular PH and low PH.
 
 I was wondering how people set up the boxes in low PH, as usual buffers are 
 mostly less acidic.
 
 Regards
 
 Sam


Re: [ccp4bb] Crystalization in low PH

2011-11-07 Thread Enrico Stura

I have crystallized in PEG with citrate at pH 3. If you want to go lower
I would suggest maleate:

effective pH range   pKa (25 °C)   buffer
1.2-2.6              1.97          maleate (pK1)
2.2-6.5              3.13          citrate (pK1)
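
A minimal Python sketch of the corresponding Henderson-Hasselbalch arithmetic,
using the pK1 values in the table (illustrative only; any real buffer mix
should of course be checked with a pH meter):

# Sketch: what fraction of the buffering group has to be titrated to its
# conjugate base to reach a target pH, given the pK1 values tabulated above.

def base_fraction(target_pH, pKa):
    """Fraction of the group in the deprotonated (base) form at target_pH."""
    ratio = 10.0 ** (target_pH - pKa)   # [A-]/[HA]
    return ratio / (1.0 + ratio)

if __name__ == "__main__":
    buffers = {"maleate (pK1 = 1.97)": 1.97, "citrate (pK1 = 3.13)": 3.13}
    for name, pKa in buffers.items():
        for pH in (2.0, 3.0):
            frac = base_fraction(pH, pKa)
            print(f"{name}: pH {pH:.1f} -> ~{100.0 * frac:.0f}% deprotonated")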

Enrico.


On Mon, 07 Nov 2011 14:15:02 +0100, Craig A. Bingman  
cbing...@biochem.wisc.edu wrote:


I'm not convinced that you need a conventional buffer at pH 2 or 3.  At  
pH 2, the hydrogen ion concentration is 10 mM.  If you want to use  
something else, the second pKa for sulfuric acid is around 2.  The first  
pKa for phosphoric acid is slightly higher than 2.  Lactic acid has a  
pKa close to 3.  Formic acid has a pKa just under 4.  Most of these  
numbers were in an appendix in the first chemistry text you ever used.   
wink  These numbers imply pretty strongly that most crystallization  
screens emphasizing common salts will require determined modification to  
hit these low pH values, because many stabilizing anions in the  
Hoffmeister series will be partially or completely protonated at these  
pH values.  PEG and organic screens will require a smaller hammer to  
retrofit.


On Nov 6, 2011, at 11:19 PM, Sam Arnosti wrote:


Hi everyone

I have a protein that is extraordinarily stable at PH=3.0 or even 2.0.

I want to crystallize it in the  low PH and compare the differences  
between the crystals in regular PH and low PH.


I was wondering how people set up the boxes in low PH, as usual buffers  
are mostly less acidic.


Regards

Sam



--
Enrico A. Stura D.Phil. (Oxon) ,Tel: 33 (0)1 69 08 4302 Office
Room 19, Bat.152,   Tel: 33 (0)1 69 08 9449Lab
LTMB, SIMOPRO, IBiTec-S, CE Saclay, 91191 Gif-sur-Yvette,   FRANCE
http://www-dsv.cea.fr/en/institutes/institute-of-biology-and-technology-saclay-ibitec-s/unites-de-recherche/department-of-molecular-engineering-of-proteins-simopro/molecular-toxinology-and-biotechnology-laboratory-ltmb/crystallogenesis-e.-stura
http://www.chem.gla.ac.uk/protein/mirror/stura/index2.html
e-mail: est...@cea.fr Fax: 33 (0)1 69 08 90 71


Re: [ccp4bb] Crystalization in low PH

2011-11-07 Thread Heping Zheng
I remember that people have crystallized a series of
streptavidin-2-iminobiotin structures at low pH. If it might help, check
the following PDB IDs:

2RTD
2RTE
2RTI
2RTK
2RTL



 Hi everyone

 I have a protein that is extraordinarily stable at PH=3.0 or even 2.0.

 I want to crystallize it in the  low PH and compare the differences
 between the crystals in regular PH and low PH.

 I was wondering how people set up the boxes in low PH, as usual buffers
 are mostly less acidic.

 Regards

 Sam



Re: [ccp4bb] Crystalization in low PH

2011-11-07 Thread Ed Pozharski
On Mon, 2011-11-07 at 05:19 +, Sam Arnosti wrote:
 Hi everyone
 
 I have a protein that is extraordinarily stable at PH=3.0 or even 2.0.
 
 I want to crystallize it in the  low PH and compare the differences between 
 the crystals in regular PH and low PH.
 
 I was wondering how people set up the boxes in low PH, as usual buffers are 
 mostly less acidic.
 
 Regards
 
 Sam

It is not clear whether you already have crystals at regular pH, but if you do,
you may consider direct transfer to lower pH.  Of course, the crystals may
dissolve, which you could possibly prevent by cross-linking with
glutaraldehyde.  Three caveats:
a) If the lattice is incompatible with the lower pH, the resolution may sink to
essentially useless levels even with cross-linking.
b) I have no idea whether the cross-linking itself survives at really low pH;
perhaps someone else can comment on that.
c) The third reviewer can always say that lattice forces could have prevented a
conformational change.  The same argument applies to direct crystallization at
low pH, but there it carries less weight.

-- 
I'd jump in myself, if I weren't so good at whistling.
   Julian, King of Lemurs


[ccp4bb] vacancy for a research scientist at the ILL

2011-11-07 Thread Schober Anita

  
  

Anita Schober
Institut LAUE-LANGEVIN
Service Ressources Humaines
BP 156 - 38042 GRENOBLE Cedex 9
Tel.: +33 (0)4 76 20 72 36
Fax: +33 (0)4 76 20 77 99
E-mail: schob...@ill.fr

Hello,
please find below the advertisement of a vacancy for a research
scientist at the ILL.

Human Resources Service
Institut Laue Langevin

INSTITUT MAX VON LAUE PAUL LANGEVIN
DA/SRH/GRI-07/11/2011
ILL ref. 11/24
www.ill.fr

VACANCY

The Institut
Laue-Langevin (ILL), situated in Grenoble, France, is
Europe's leading research facility for fundamental
research using neutrons. The ILL operates the brightest
neutron source in the world, reliably delivering intense
neutron beams to 40 unique scientific instruments. The
Institute welcomes more than 2000 visiting scientists
per year to carry out world-class research in
solid-state physics, crystallography, soft matter,
biology, chemistry and fundamental physics. Funded
primarily by its three founder members: France, Germany
and the United Kingdom, the ILL has also signed
scientific collaboration agreements with 12 other
countries. The Science Division currently has a vacancy:

RESEARCH SCIENTIST - m/f - (small angle neutron scattering)

Small angle neutron scattering (SANS) has become a major component of
structural biology at the ILL to study interactions and low resolution
structures in large biological macromolecular complexes. Furthermore the PSB
(Partnership for Structural Biology) provides additional outstanding
facilities for structural biology, including a Deuteration Laboratory for
isotope labeling of biological molecules, and a wide variety of complementary
biophysical methods. With the presence of the European Molecular Biology
Laboratory, the Institute of Structural Biology and the European Synchrotron
Radiation Facility, the campus provides an exciting research environment for
biology.
Duties:
The ILL is inviting applications for a scientist to take
charge of the biological aspects of the SANS instrument
D22 in the Large Scale Structures group. Duties would
include: instrument maintenance and development, running
data collection and analysis software, acting as local contact for
biology experiments both on D22 and on the other SANS
instruments of the group, and coordinating the system
of beamtime block allocation to experimenters. The
candidate will also be encouraged to develop strong
collaborations for her/his own scientific work. 
Qualifications and experience:
Ph.D. in physical or life sciences. We are particularly
interested in highly motivated candidates with an active
research interest in biology and experience in neutron or
X-ray small angle scattering.
The post represents an excellent opportunity for a young
postdoctoral scientist to develop expertise, broaden
their experience and interact with leading scientists
from around the world. Applications from more
experienced scientists who are able to obtain a
secondment period from their home institute will also be
considered. 
Language skills:
As an international research centre, we are particularly
keen to ensure that we also attract applicants from
outside France. You must have a sound knowledge of
English and be willing to learn French (a language
course will be paid for by the ILL). Knowledge of German
would be an advantage.
Notes:
Fixed-term contract of 5 years.
Further information can be obtained by contacting the
head of the Large Scale Structures Group: Dr. R Cubitt, Tel.
+33(0)4.76.20.72.15, e-mail: cub...@ill.fr (please
do not send your application to this address) or via http://www.ill.fr/lss.
Benefits: 
  

[ccp4bb] Two postdoctoral positions at Department of Molecular Drug Research, University of Copenhagen

2011-11-07 Thread Michael Gajhede
1. Position

Job description
A postdoctoral position is available from 01.01.2012-31.12.2013. The successful 
candidate will focus on the identification of interaction partners of the 
histone demethylase PLU-1 and on structural studies of stable PLU-1 complexes 
involved in gene regulation. The project will be conducted within the framework 
of the University of Copenhagen Programme of Excellence on Epigenetics, a 
programme involving research groups at the Biotech Research and Innovation 
Centre (BRIC) and the Department of Medicinal Chemistry.  Experience with 
recombinant protein expression, protein purification and biochemical 
characterisation of mammalian intracellular proteins will be an advantage. 
Interest in the application of scattering methods such as X-ray Crystallography 
and Small Angle X-ray Scattering (SAXS) is also required, but no prior 
experience with these techniques is necessary.

Terms of employment
Terms of appointment and payment are in accordance with the agreement between 
the Danish Ministry of Finance and the Danish Federation of Professional 
Associations.

Applying
Send your application via http://www.ku.dk/stillinger/VIP (select 'send 
ansøgning' at the bottom of the page). The deadline for applying is 1 December 
2011.

Questions
Further information about the position and the project is available from 
Professor Michael Gajhede, Biostructural Research (www.farma.ku.dk/BR), 
Department of Molecular Drug Research, University of Copenhagen, 
tel. 35336407, email m...@farma.ku.dk.

The University of Copenhagen wishes to reflect the diversity of society and 
welcomes applications from all qualified candidates regardless of personal 
background.

2. Position

Job description
A postdoctoral position is available from 15.12.2011-31.6.2013. The successful 
candidate will focus on in-vitro characterisation, structural studies and 
inhibitor identification of the chromatin modifying TET proteins. The project 
will be conducted within the framework of the University of Copenhagen 
Programme of Excellence on Epigenetics, a programme involving research groups 
at the Biotech Research and Innovation Centre (BRIC) and the Department of 
Medicinal Chemistry. Experience with recombinant protein expression, protein 
purification and biochemical characterisation of mammalian intracellular 
proteins will be an advantage.

Terms of employment
Terms of appointment and payment are in accordance with the agreement between 
the Danish Ministry of Finance and the Danish Federation of Professional 
Associations.

Applying
Send your application via http://www.ku.dk/stillinger/VIP (select 'send 
ansøgning' at the bottom of the page). The deadline for applying is 1 December 
2011.

Questions
Further information about the position and the project is available from 
Professor Michael Gajhede, Biostructural Research (www.farma.ku.dk/BR), 
Department of Molecular Drug Research, University of Copenhagen, 
tel. 35336407, email m...@farma.ku.dk.

The University of Copenhagen wishes to reflect the diversity of society and 
welcomes applications from all qualified candidates regardless of personal 
background.


Professor Michael Gajhede
Department of Medicinal Chemistry
University of Copenhagen
Jagtvej 162
DK-2100 Copenhagen Ø
Denmark
Phone: +45 35336407
Email: m...@farma.ku.dk



Re: [ccp4bb] Crystalization in low PH

2011-11-07 Thread Gloria Borgstahl
Glutaraldehyde works best at low pH

On Mon, Nov 7, 2011 at 8:40 AM, Ed Pozharski epozh...@umaryland.edu wrote:
 On Mon, 2011-11-07 at 05:19 +, Sam Arnosti wrote:
 Hi everyone

 I have a protein that is extraordinarily stable at PH=3.0 or even 2.0.

 I want to crystallize it in the  low PH and compare the differences between 
 the crystals in regular PH and low PH.

 I was wondering how people set up the boxes in low PH, as usual buffers are 
 mostly less acidic.

 Regards

 Sam

 Not clear if you already have crystals at regular pH, but if you do,
 you may consider direct transfer to lower pH.  Of course, crystals may
 dissolve, which you could possibly prevent by cross-linking with
 glutaraldehyde.  Three caveats:
 a) If lattice is incompatible with lower pH, even with cross-linking the
 resolution may sink to essentially useless levels
 b) I have no idea if the cross-linking will not be disrupted at really
 low pH, perhaps someone else can comment on that
 c) the 3rd reviewer can always say that lattice forces could have
 prevented a conformational change.  But same goes for direct
 crystallization at low pH (but caries less weight).

 --
 I'd jump in myself, if I weren't so good at whistling.
                               Julian, King of Lemurs



[ccp4bb] Posting

2011-11-07 Thread Debasish Chattopadhyay
I would like to see electron density maps (2Fo-Fc, Fo-Fc, omit map) for
ligands sitting on a 2-fold symmetry axis in a protein structure.  If any of
you can send some images, I would appreciate it.

Thanks


Debasish

Debasish Chattopadhyay, Ph.D.
University of Alabama at Birmingham


[ccp4bb] about .ins file for SHELXD

2011-11-07 Thread Lu Yu
Hi,
I am trying to use SHELXD to solve a peptide structure, but I am stuck on the
input .ins file and need some advice.

In the .ins file,
TITLE
CELL
ZERR
LATT
SYMM
SFAC C H N O
UNIT
FIND
PLOP
NTRY
HKLF
END

As a rough estimate, there will be 62 C, 122 H, 14 N and 32 O in one unit cell.

1. For the UNIT command, should the following numbers be the number of atoms
(C, H, N, O) per unit cell multiplied by 4? That is, 62x4 for C, 122x4 for H,
etc.?

2. For FIND, the manual says 'estimated number of sites within 20% of the true
number', so should it be 20% of the total number of atoms in one unit cell,
i.e. (62+122+14+32) x 20%?

3. For PLOP, the manual says 'number of peaks to start with in each cycle;
peaks are then eliminated one at a time until either the correlation
coefficient cannot be increased any more or 50% of the peaks have been
eliminated' and 'one should specify more than the expected number of atoms
because this procedure involves the elimination of the wrong atoms', which I
don't fully understand. Should the following numbers be bigger than the total
number of atoms in one unit cell?

Thanks in advance!

Lu


Re: [ccp4bb] phaser

2011-11-07 Thread Eleanor Dodson
The new Phaser GUI does not seem to let me reset the number of clashes 
for the packing search?

Is there something I have missed?
Eleanor


Re: [ccp4bb] phaser

2011-11-07 Thread Randy Read
Hi Eleanor,

I think you should find it in the Additional Parameters section, second line, 
labelled Packing criterion.  The default (chosen largely because you had been 
asking for something like this!) is to allow a number of clashes equal to 5% of 
the number of residues.

Let me know if it isn't there.  It's possible I've got a different version of 
the GUI on my machine...

Randy

On 7 Nov 2011, at 16:42, Eleanor Dodson wrote:

 The new Phaser GUI does not seem to let me reset the number of clashes for 
 the packing search?
 Is there something I have missed?
 Eleanor

--
Randy J. Read
Department of Haematology, University of Cambridge
Cambridge Institute for Medical Research  Tel: + 44 1223 336500
Wellcome Trust/MRC Building   Fax: + 44 1223 336827
Hills RoadE-mail: rj...@cam.ac.uk
Cambridge CB2 0XY, U.K.   www-structmed.cimr.cam.ac.uk


Re: [ccp4bb] about .ins file for SHELXD

2011-11-07 Thread George M. Sheldrick
UNIT specifies the number of atoms of each type in the unit-cell. For such
'small-molecule' problems you should try to get the numbers of heavier atoms
correct; if only CHNO are present, any numbers will do.

For such problems I recommend setting FIND to about 70% of the number of atoms
(excluding H) in the asymmetric unit.

The first PLOP number should be approximately the number of atoms (excluding H)
in the asymmetric unit. The second PLOP number should be about 1.2 times this
and the third about 1.4 times it (three PLOP cycles are enough). This allows
the 'peaklist optimization algorithm' to throw out some of the atoms.

You will need data to 1.2A or better (1.0 is much better than 1.2!). The data
should be as complete as possible.
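
A small Python helper (not a SHELX tool; it only turns the rules of thumb
above into numbers, and the atom counts and Z below are hypothetical) might
look like this:

# Sketch: turn the rules of thumb above into UNIT/FIND/PLOP numbers.
# Not part of SHELX; the atom counts and Z are placeholders.

def shelxd_cards(atoms_per_asu, z):
    """atoms_per_asu: dict element -> count in the asymmetric unit (incl. H)."""
    non_h = sum(n for el, n in atoms_per_asu.items() if el.upper() != "H")
    unit = {el: n * z for el, n in atoms_per_asu.items()}   # UNIT is per unit cell
    find = round(0.7 * non_h)                               # ~70% of non-H atoms
    plop = [non_h, round(1.2 * non_h), round(1.4 * non_h)]  # three PLOP cycles
    return unit, find, plop

if __name__ == "__main__":
    # Hypothetical peptide with Z = 4; element order matches SFAC C H N O
    unit, find, plop = shelxd_cards({"C": 16, "H": 31, "N": 4, "O": 8}, z=4)
    print("UNIT", *(unit[el] for el in ("C", "H", "N", "O")))
    print("FIND", find)
    print("PLOP", *plop)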

The following comments apply to larger problems; your structure is probably 
small enough to ignore them (but you may still need a large NTRY).

I strongly recommend using the beta-test multiple-CPU version of shelxd. Direct
methods can be very computationally intensive for large structures. I recently
used it to solve an RNA with 1.0A data; on an 8-CPU machine it produced one
solution in a week. This version is faster even with one CPU; the standard 
version would have taken about 3 months. I have seen several cases that had one 
solution in 5 or more trials (NTRY).

If you expect a partly helical structure or have a good partial model and have
data to 2.1A or better I strongly recommend ARCIMBOLDO. If all else fails I 
would be happy to look at the data for you.

George
 




On Mon, Nov 07, 2011 at 11:31:54AM -0500, Lu Yu wrote:
 Hi,
 I was trying to use SHELXD to solve peptide structure. But I got stuck in the
 input .ins file, and I need some advice.
 
 In the .ins file,
 TITLE
 CELL
 ZERR
 LATT
 SYMM
 SFAC C H N O
 UNIT
 FIND
 PLOP
 NTRY
 HKLF
 END
 
 A rough estimate, there will be 62 C, 122 H, 14 N, 32 O in one unit cell.
 
 1 for UNIT command, should the following nums be the num of atoms(C H N O) per
 unit cell and multiplied by 4? 62x4 for C, 122x4 for H, etc?
 
 2 for FIND, the manual says 'estimated num of sites within 20% of true 
 number',
 so should it be 20% of the total number of atoms in one unit cell? (62+122+14
 +32)x 20%?
 
 3 for PLOP, the manual says '# of peaks to start with in each cycle, Peaks are
 then eliminated one at a time until either the correlation coefficient cannot
 be increased any more or 50% of the peaks have been eliminated'  'one should
 specify more than the expected number of atoms because this procedure involves
 the elimination of the 'wrong' atoms'  which I don't fully understand, should
 the following nums be bigger than the total num of atoms in one unit cell?
 
 Thanks in advance!
 
 Lu
 

-- 
Prof. George M. Sheldrick FRS
Dept. Structural Chemistry, 
University of Goettingen,
Tammannstr. 4,
D37077 Goettingen, Germany
Tel. +49-551-39-3021 or -3068
Fax. +49-551-39-22582


[ccp4bb] image compression

2011-11-07 Thread James Holton
At the risk of sounding like another poll, I have a pragmatic question 
for the methods development community:


Hypothetically, assume that there was a website where you could download 
the original diffraction images corresponding to any given PDB file, 
including early datasets that were from the same project, but because 
of smeary spots or whatever, couldn't be solved.  There might even be 
datasets with unknown PDB IDs because that particular project never 
did work out, or because the relevant protein sequence has been lost.  
Remember, few of these datasets will be less than 5 years old if we try 
to allow enough time for the original data collector to either solve it 
or graduate (and then cease to care).  Even for the final dataset, 
there will be a delay, since the half-life between data collection and 
coordinate deposition in the PDB is still ~20 months.  Plenty of time to 
forget.  So, although the images were archived (probably named "test" 
and in a directory called "john") it may be that the only way to figure 
out which PDB ID is the right answer is by processing them and 
comparing to all deposited Fs.  Assume this was done.  But there will 
always be some datasets that don't match any PDB.  Are those 
interesting?  What about ones that can't be processed?  What about ones 
that can't even be indexed?  There may be a lot of those!  
(hypothetically, of course).


Anyway, assume that someone did go through all the trouble to make these 
datasets available for download, just in case they are interesting, 
and annotated them as much as possible.  There will be about 20 datasets 
for any given PDB ID.


Now assume that for each of these datasets this hypothetical website has 
two links, one for the raw data, which will average ~2 GB per wedge 
(after gzip compression, taking at least ~45 min to download), and a 
second link for a lossy compressed version, which is only ~100 
MB/wedge (2 min download).  When decompressed, the images will visually 
look pretty much like the originals, and generally give you very similar 
Rmerge, Rcryst, Rfree, I/sigma, anomalous differences, and all other 
statistics when processed with contemporary software.  Perhaps a bit 
worse.  Essentially, lossy compression is equivalent to adding noise to 
the images.
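
One way to put a number on that added noise is to compare an original frame
with its lossy round-trip; below is a minimal Python sketch (not from any
existing pipeline) in which a synthetic frame stands in for the real decoded
images:

import numpy as np

def added_noise(original, lossy):
    """RMS difference (in detector counts) and peak signal-to-noise ratio."""
    a = original.astype(np.float64)
    b = lossy.astype(np.float64)
    rms = np.sqrt(np.mean((a - b) ** 2))
    psnr = 20.0 * np.log10(a.max() / rms) if rms > 0 else float("inf")
    return rms, psnr

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in "original" frame: Poisson counts around a 50-ADU background.
    original = rng.poisson(50.0, size=(1024, 1024)).astype(np.uint16)
    # Stand-in "lossy round-trip": the original plus a little extra noise.
    lossy = original + rng.normal(0.0, 2.0, original.shape)
    rms, psnr = added_noise(original, lossy)
    print(f"RMS difference: {rms:.2f} ADU   PSNR: {psnr:.1f} dB")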


Which one would you try first?  Does lossy compression make it easier to 
hunt for interesting datasets?  Or is it just too repugnant to have 
modified the data in any way shape or form ... after the detector 
manufacturer's software has corrected it?  Would it suffice to simply 
supply a couple of example images for download instead?


-James Holton
MAD Scientist


Re: [ccp4bb] image compression

2011-11-07 Thread Herbert J. Bernstein

This is a very good question.  I would suggest that both versions
of the old data are useful.  If what is being done is simply validation
and regeneration of what was done before, then the lossy compression
should be fine in most instances.  However, when what is being
done hinges on the really fine details -- looking for lost faint
spots just peeking out from the background, looking at detailed
peak profiles -- then the lossless compression version is the
better choice.  The annotation for both sets should be the same.
The difference is in storage and network bandwidth.

Hopefully the fraud issue will never again rear its ugly head,
but if it should, then having saved the losslessly compressed
images might prove to have been a good idea.

To facilitate experimentation with the idea, if there is agreement
on the particular lossy compression to be used, I would be happy
to add it as an option in CBFlib.  Right now all the compressions
we have are lossless.

Regards,
  Herbert


=
 Herbert J. Bernstein, Professor of Computer Science
   Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769

 +1-631-244-3035
 y...@dowling.edu
=

On Mon, 7 Nov 2011, James Holton wrote:

At the risk of sounding like another poll, I have a pragmatic question for 
the methods development community:


Hypothetically, assume that there was a website where you could download the 
original diffraction images corresponding to any given PDB file, including 
early datasets that were from the same project, but because of smeary spots 
or whatever, couldn't be solved.  There might even be datasets with unknown 
PDB IDs because that particular project never did work out, or because the 
relevant protein sequence has been lost.  Remember, few of these datasets 
will be less than 5 years old if we try to allow enough time for the original 
data collector to either solve it or graduate (and then cease to care).  Even 
for the final dataset, there will be a delay, since the half-life between 
data collection and coordinate deposition in the PDB is still ~20 months. 
Plenty of time to forget.  So, although the images were archived (probably 
named test and in a directory called john) it may be that the only way to 
figure out which PDB ID is the right answer is by processing them and 
comparing to all deposited Fs.  Assume this was done.  But there will always 
be some datasets that don't match any PDB.  Are those interesting?  What 
about ones that can't be processed?  What about ones that can't even be 
indexed?  There may be a lot of those!  (hypothetically, of course).


Anyway, assume that someone did go through all the trouble to make these 
datasets available for download, just in case they are interesting, and 
annotated them as much as possible.  There will be about 20 datasets for any 
given PDB ID.


Now assume that for each of these datasets this hypothetical website has two 
links, one for the raw data, which will average ~2 GB per wedge (after gzip 
compression, taking at least ~45 min to download), and a second link for a 
lossy compressed version, which is only ~100 MB/wedge (2 min download). 
When decompressed, the images will visually look pretty much like the 
originals, and generally give you very similar Rmerge, Rcryst, Rfree, 
I/sigma, anomalous differences, and all other statistics when processed with 
contemporary software.  Perhaps a bit worse.  Essentially, lossy compression 
is equivalent to adding noise to the images.


Which one would you try first?  Does lossy compression make it easier to hunt 
for interesting datasets?  Or is it just too repugnant to have modified 
the data in any way shape or form ... after the detector manufacturer's 
software has corrected it?  Would it suffice to simply supply a couple of 
example images for download instead?


-James Holton
MAD Scientist



[ccp4bb] Job IRC2749: Postdoc position at LANL in neutron protein crystallography

2011-11-07 Thread Suzanne Zoe Fisher
Detailed Description

Immediate postdoctoral positions are available at the neutron Protein
Crystallography Station of the Los Alamos National Laboratory.  We are
looking for crystallographers and/or biochemists to conduct research in the
protein structure-function field, with a focus on using joint neutron and
X-ray crystallography approaches.  Projects include, but are not limited
to, studies in mechanistic enzymology, rational protein engineering, and
computational biology (QM, QM/MM, MD).  We are particularly interested in
individuals with a strong background in protein expression, purification
and crystallization of membrane proteins as part of a collaborative effort
aimed at determining neutron structures of these targets.  An incumbent
should be proficient in all aspects of macromolecular crystallography,
including protein production, crystallization, data collection and
processing, structure refinement and  analysis.  The postdoctoral
associates will be trained in neutron protein crystallography techniques
and joint X-ray/neutron refinement approaches.  The postdoctoral
researchers will be expected to participate in the PCS user program
part-time as well as the collaborative efforts of the team.  A Ph.D. in
biochemistry or a related discipline is required. Knowledge of mechanistic
biochemistry and enzymology is desired. Eligible persons should be within 5
years of receiving their Ph.D. degree.

Responsibilities:

1.  Work on projects to support the PCS user program and DOE-OBER
mission areas.

2.  Develop and work on an independent research project in the field of
joint X-ray and neutron crystallography.

3.  Work as part of a collaborative team of interdisciplinary
scientists.

4.  Publish and present results at both internal and external
scientific meetings.

5.  Foster and establish project collaborations within and outside LANL.

6.  Incorporate and develop new scientific advances into research and
processes.


Job Requirements

Minimum Job Requirements:



1. Expertise in all aspects of single crystal macromolecular
diffraction studies, including a proven track record of developing and
optimizing crystallization for X-ray structure determination.

2. Experience with protein production, characterization and biochemical
methods (protein expression, purification, quantification, enzymatic
assays).

3. Expertise in some solution biophysical methods, such as
differential scanning calorimetry, mass spectrometry, enzyme kinetics
assays, UV-Vis and fluorescence spectroscopy.

4. Familiarity with protein modeling.

5. Have a proven track record of problem solving, good organization,
strong communication, multi-tasking and teamwork skills and the ability to
collaborate within multidisciplinary teams.

6. Committed to high quality work with a strong user focus.

7. Have a proven track record of strong written, oral communication,
and presentation skills.

Education: Ph.D. in chemistry, biochemistry or macromolecular
crystallography in the last 5 years.
Additional Details

Notes to Applicants: Interested scientists should contact Dr. A.
Kovalevsky, Bioscience Division, Los Alamos National Laboratory, e-mail:
a...@lanl.gov, or Dr. S.Z. Fisher, Bioscience Division, Los Alamos National
Laboratory, e-mail: zfis...@lanl.gov.  Send your CV and the
list of publications.

Pre-Employment Drug Test: The Laboratory requires successful applicants
to complete a pre-employment drug test and maintains a substance abuse
policy that includes random drug testing.

Candidates may be considered for a Director's Fellowship and outstanding
candidates may be considered for the prestigious Marie Curie, Richard P.
Feynman, J. Robert Oppenheimer,  or Frederick Reines Fellowships.



For general information refer to the Postdoctoral Program page:
http://www.lanl.gov/science/postdocs/

Equal Opportunity: Los Alamos National Laboratory is an equal
opportunity employer and supports a diverse and inclusive workforce.  We
welcome and encourage applications from the broadest possible range of
qualified candidates.  The Laboratory is also committed to making our
workplace accessible to individuals with disabilities and will provide
reasonable accommodations, upon request, for individuals to participate in
the application and hiring process.  To request such an accommodation,
please send an email to applyh...@lanl.gov or call 1-505-665-5627.


Re: [ccp4bb] Archiving Images for PDB Depositions

2011-11-07 Thread mjvdwoerd

 Reluctantly I am going to add my 2 cents to the discussion, with various 
aspects in one e-mail.

- It is easy to overlook that our business is to answer 
biological/biochemical questions. This is what you (generally) get grants to 
do (showing that these questions are of critical importance in your ability 
to do science). Crystallography is one tool that we use to acquire evidence to 
answer questions. The time that you could get a Nobel prize for doing a 
structure or a PhD for doing a structure is gone. Even writing a publication 
with just a structure is now not as common anymore as it used to be. So the 
biochemistry drives crystallography. It is not reasonable to say that once 
you have collected data and you don't publish the data for 5 years, you are no 
longer interested. What that generally means is that the rest of science is 
not cooperating. In short: I would be against a strict rule for mandatory 
deposition of raw data, even after a long time. An example: I have data sets 
here with low resolution data (~10A) presumably of protein structures that have 
known structures for prokaryotes, but not for eukaryotes and it would be 
exciting if we could prove (or disprove) that they look the same. The problem, 
apart from resolution, is that the spots are so few and fuzzy that I cannot 
index the images. The main reason why I save the images is that if/when someone 
comes to me to say that they think they have made better crystals, we have 
something to compare. (Thanks to Gerard B. for encouragement to write this item 
:-)

- For those that think that we have come to the end of development in 
crystallography, James Holton (thank you) has described nicely why we should 
not think this. We are all happy if our model generates an R-factor of 20%. 
Even small molecule crystallographers would wave that away in an instant as 
inadequate. However, everybody has come to accept that this is fine for 
protein crystallography. It would be better if our models were more consistent 
with the experimental data. How could we make such models without access to 
lots of data? As a student I was always taught (when asking why 20% is actually 
good) that we don't (for example) model solvent. Why not? It is not easy. If 
we did, would the 20% go down to 3%? I am guessing not, there are other errors 
that come into play. 

- Gerard K. has eloquently spoken about cost and effort. Since I maintain a 
small (local) archive of images, I can affirm his words: a large-capacity disk 
is inexpensive ($100). A box that the disk sits in is inexpensive ($1000). A 
second box that sits in a different building, away for security reasons) that 
holds the backup, is inexpensive ($1400, with 4 disks). The infrastructure to 
run these boxes (power, fiber optics, boxes in between) is slightly more 
expensive. What is *really* expensive is people maintaining everything. It was 
a huge surprise to me (and my boss) how much time and effort it takes to 
annotate all data sets, rename them appropriately and file them away in a 
logical place so that anyone (who understands the scheme) can find them back. 
Therefore (!) the reason why this should be centralized is that the cost per 
data set stored goes down - it is more efficient. One person can process 
several (many, if largely automated) data sets per day. It is also of interest 
that we locally (2-5 people for a project) may not agree on what exactly should 
be stored. Therefore there is no hope that we can find consensus in the world, 
but we CAN get a reasonable compromise. But it is tough: I have heard the 
argument that data for published structures should be kept in case someone 
wants to see and/or go back, while I have also heard the argument that once 
published it is signed, sealed and delivered and it can go, while UNpublished 
data should be preserved because eventually it hopefully will get to 
publication. Each argument is reasonably sensible, but the conclusions are 
opposite. (I maintain both classes of data sets.)

- Granting agencies in the US generally require that you archive scientific 
data. What is not yet clear is whether they would be willing to pay for a 
centralized facility that would do that. After all, it is more exciting to NIH 
to give money for the study of a disease than it is to store data. But if the 
argument were made that each grant(ee) would be more efficient and could apply 
more money towards the actual problem, this might convince them. For that we 
would need a reasonable consensus on what we want and why. More power to John 
H. and The Committee.

Thanks to complete silence on the BB today I am finally caught up reading!

Mark van der Woerd
 



 

 

-Original Message-
From: James Holton jmhol...@lbl.gov
To: CCP4BB CCP4BB@JISCMAIL.AC.UK
Sent: Tue, Nov 1, 2011 11:07 am
Subject: Re: [ccp4bb] Archiving Images for PDB Depositions


On general scientific principles the reasons for archiving raw data 
all boil down to one thing: there 

Re: [ccp4bb] image compression

2011-11-07 Thread James Holton
So far, all I really have is a proof of concept compression algorithm here:
http://bl831.als.lbl.gov/~jamesh/lossy_compression/

Not exactly portable since you need ffmpeg and the x264 libraries
set up properly.  The latter seems to be constantly changing things
and breaking the former, so I'm not sure how future-proof my
algorithm is.
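
For anyone who wants to experiment, a rough Python sketch of the kind of x264
round-trip involved (this is not the lossy_compression code above; it assumes
ffmpeg with libx264 is installed and that the frames have already been
rescaled to 8-bit PGM files named frame_0001.pgm etc.; handling the full
16-bit dynamic range of real detector images is the part it ignores):

import subprocess

def x264_roundtrip(in_pattern="frame_%04d.pgm",
                   out_pattern="lossy_%04d.pgm", crf=20):
    # Encode the frame sequence with a lossy quality setting (higher crf = lossier).
    subprocess.run(["ffmpeg", "-y", "-i", in_pattern,
                    "-c:v", "libx264", "-preset", "slow",
                    "-crf", str(crf), "-pix_fmt", "yuv420p",
                    "lossy.mp4"], check=True)
    # Decode back to individual frames so they can be compared with the originals.
    subprocess.run(["ffmpeg", "-y", "-i", "lossy.mp4", out_pattern], check=True)

if __name__ == "__main__":
    x264_roundtrip()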

Something that caught my eye recently was fractal compression,
particularly since FIASCO has been part of the NetPBM package for
about 10 years now.  Seems to give comparable compression vs quality
as x264 (to my eye), but I'm presently wondering if I'd be wasting my
time developing this further?  Will the crystallographic world simply
turn up its collective nose at lossy images?  Even if it means waiting
6 years for Nielsen's Law to make up the difference in network
bandwidth?

-James Holton
MAD Scientist

On Mon, Nov 7, 2011 at 10:01 AM, Herbert J. Bernstein
y...@bernstein-plus-sons.com wrote:
 This is a very good question.  I would suggest that both versions
 of the old data are useful.  If was is being done is simple validation
 and regeneration of what was done before, then the lossy compression
 should be fine in most instances.  However, when what is being
 done hinges on the really fine details -- looking for lost faint
 spots just peeking out from the background, looking at detailed
 peak profiles -- then the lossless compression version is the
 better choice.  The annotation for both sets should be the same.
 The difference is in storage and network bandwidth.

 Hopefully the fraud issue will never again rear its ugly head,
 but if it should, then having saved the losslessly compressed
 images might prove to have been a good idea.

 To facilitate experimentation with the idea, if there is agreement
 on the particular lossy compression to be used, I would be happy
 to add it as an option in CBFlib.  Right now all the compressions
 we have are lossless.

 Regards,
  Herbert


 =
  Herbert J. Bernstein, Professor of Computer Science
   Dowling College, Kramer Science Center, KSC 121
        Idle Hour Blvd, Oakdale, NY, 11769

                 +1-631-244-3035
                 y...@dowling.edu
 =

 On Mon, 7 Nov 2011, James Holton wrote:

 At the risk of sounding like another poll, I have a pragmatic question
 for the methods development community:

 Hypothetically, assume that there was a website where you could download
 the original diffraction images corresponding to any given PDB file,
 including early datasets that were from the same project, but because of
 smeary spots or whatever, couldn't be solved.  There might even be datasets
 with unknown PDB IDs because that particular project never did work out,
 or because the relevant protein sequence has been lost.  Remember, few of
 these datasets will be less than 5 years old if we try to allow enough time
 for the original data collector to either solve it or graduate (and then
 cease to care).  Even for the final dataset, there will be a delay, since
 the half-life between data collection and coordinate deposition in the PDB
 is still ~20 months. Plenty of time to forget.  So, although the images were
 archived (probably named test and in a directory called john) it may be
 that the only way to figure out which PDB ID is the right answer is by
 processing them and comparing to all deposited Fs.  Assume this was done.
  But there will always be some datasets that don't match any PDB.  Are those
 interesting?  What about ones that can't be processed?  What about ones that
 can't even be indexed?  There may be a lot of those!  (hypothetically, of
 course).

 Anyway, assume that someone did go through all the trouble to make these
 datasets available for download, just in case they are interesting, and
 annotated them as much as possible.  There will be about 20 datasets for any
 given PDB ID.

 Now assume that for each of these datasets this hypothetical website has
 two links, one for the raw data, which will average ~2 GB per wedge (after
 gzip compression, taking at least ~45 min to download), and a second link
 for a lossy compressed version, which is only ~100 MB/wedge (2 min
 download). When decompressed, the images will visually look pretty much like
 the originals, and generally give you very similar Rmerge, Rcryst, Rfree,
 I/sigma, anomalous differences, and all other statistics when processed with
 contemporary software.  Perhaps a bit worse.  Essentially, lossy compression
 is equivalent to adding noise to the images.

 Which one would you try first?  Does lossy compression make it easier to
 hunt for interesting datasets?  Or is it just too repugnant to have
 modified the data in any way shape or form ... after the detector
 manufacturer's software has corrected it?  Would it suffice to simply
 supply a couple of example images for download instead?

 -James Holton
 MAD Scientist




Re: [ccp4bb] image compression

2011-11-07 Thread Herbert J. Bernstein

Dear James,

  You are _not_ wasting your time.  Even if the lossy compression ends
up only being used to stage preliminary images forward on the net while
full images slowly work their way forward, having such a compression
that preserves the crystallography in the image will be an important
contribution to efficient workflows.  Personally I suspect that
such images will have more important uses, e.g. facilitating
real-time monitoring of experiments using detectors providing
full images at data rates that simply cannot be handled without
major compression.  We are already in that world.  The reason that
the Dectris images use Andy Hammersley's byte-offset compression,
rather than going uncompressed or using CCP4 compression is that
in January 2007 we were sitting right on the edge of a nasty 
CPU-performance/disk bandwidth tradeoff, and the byte-offset
compression won the competition.  In that round a lossless
compression was sufficient, but just barely.  In the future,
I am certain some amount of lossy compression will be
needed to sample the dataflow while the losslessly compressed
images work their way through a very back-logged queue to the disk.

  In the longer term, I can see people working with lossy compressed
images for analysis of massive volumes of images to select the
1% to 10% that will be useful in a final analysis, and may need
to be used in a lossless mode.  If you can reject 90% of the images
with a fraction of the effort needed to work with the resulting
10% of good images, you have made a good decision.

  And then there is the inevitable need to work with images on
portable devices with limited storage over cell and WIFI networks. ...

  I would not worry about upturned noses.  I would worry about
the engineering needed to manage experiments.  Lossy compression
can be an important part of that engineering.

  Regards,
Herbert


At 4:09 PM -0800 11/7/11, James Holton wrote:

So far, all I really have is a proof of concept compression algorithm here:
http://bl831.als.lbl.gov/~jamesh/lossy_compression/

Not exactly portable since you need ffmpeg and the x264 libraries
set up properly.  The latter seems to be constantly changing things
and breaking the former, so I'm not sure how future proof my
algorithm is.

Something that caught my eye recently was fractal compression,
particularly since FIASCO has been part of the NetPBM package for
about 10 years now.  Seems to give comparable compression vs quality
as x264 (to my eye), but I'm presently wondering if I'd be wasting my
time developing this further?  Will the crystallographic world simply
turn up its collective nose at lossy images?  Even if it means waiting
6 years for Nielsen's Law to make up the difference in network
bandwidth?

-James Holton
MAD Scientist

On Mon, Nov 7, 2011 at 10:01 AM, Herbert J. Bernstein
y...@bernstein-plus-sons.com wrote:

 This is a very good question.  I would suggest that both versions
 of the old data are useful.  If was is being done is simple validation
 and regeneration of what was done before, then the lossy compression
 should be fine in most instances.  However, when what is being
 done hinges on the really fine details -- looking for lost faint
 spots just peeking out from the background, looking at detailed
 peak profiles -- then the lossless compression version is the
 better choice.  The annotation for both sets should be the same.
 The difference is in storage and network bandwidth.

 Hopefully the fraud issue will never again rear its ugly head,
 but if it should, then having saved the losslessly compressed
 images might prove to have been a good idea.

 To facilitate experimentation with the idea, if there is agreement
 on the particular lossy compression to be used, I would be happy
 to add it as an option in CBFlib.  Right now all the compressions

  we have are lossless.


 Regards,
  Herbert


 =
  Herbert J. Bernstein, Professor of Computer Science
   Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769

 +1-631-244-3035
 y...@dowling.edu
 =

 On Mon, 7 Nov 2011, James Holton wrote:


 At the risk of sounding like another poll, I have a pragmatic question
 for the methods development community:

 Hypothetically, assume that there was a website where you could download
 the original diffraction images corresponding to any given PDB file,
 including early datasets that were from the same project, but because of
 smeary spots or whatever, couldn't be solved.  There might even be datasets
 with unknown PDB IDs because that particular project never did work out,
 or because the relevant protein sequence has been lost.  Remember, few of
 these datasets will be less than 5 years old if we try to allow enough time
 for the original data collector to either solve it or graduate (and then
 cease to care).  Even for the 

Re: [ccp4bb] image compression

2011-11-07 Thread Frank von Delft
I'll second that...  can't remember anybody on the barricades about 
corrected CCD images, but they've been just so much more practical.


Different kind of problem, I know, but equivalent situation:  the people 
to ask are not the purists, but the ones struggling with the huge 
volumes of data.  I'll take the lossy version any day if it speeds up 
real-time evaluation of data quality, helps me browse my datasets, and 
allows me to do remote but intelligent data collection.


phx.



On 08/11/2011 02:22, Herbert J. Bernstein wrote:

Dear James,

You are _not_ wasting your time.  Even if the lossy compression ends
up only being used to stage preliminary images forward on the net while
full images slowly work their way forward, having such a compression
that preserves the crystallography in the image will be an important
contribution to efficient workflows.  Personally I suspect that
such images will have more important uses, e.g. facilitating
real-time monitoring of experiments using detectors providing
full images at data rates that simply cannot be handled without
major compression.  We are already in that world.  The reason that
the Dectris images use Andy Hammersley's byte-offset compression,
rather than going uncompressed or using CCP4 compression is that
in January 2007 we were sitting right on the edge of a nasty
CPU-performance/disk bandwidth tradeoff, and the byte-offset
compression won the competition.   In that round a lossless
compression was sufficient, but just barely.  In the future,
I am certain some amount of lossy compression will be
needed to sample the dataflow while the losslessly compressed
images work their way through a very back-logged queue to the disk.
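
For readers who have not met it, byte-offset compression is essentially
delta coding with escape codes: store the difference from the previous
pixel in one signed byte when it fits, and fall back to wider integers
behind escape values when it does not.  A toy sketch from memory (the
normative definition is in the imgCIF/CBF documentation and in CBFlib
itself, not here):

    # Toy byte-offset-style encoder: 1-byte deltas with 2- and 4-byte escapes.
    import struct
    import numpy as np

    def byte_offset_encode(pixels):
        out = bytearray()
        last = 0
        for value in map(int, pixels):
            delta = value - last
            last = value
            if -127 <= delta <= 127:
                out += struct.pack("<b", delta)                # 1-byte delta
            elif -32767 <= delta <= 32767:
                out += b"\x80" + struct.pack("<h", delta)      # escape + 2 bytes
            else:
                out += b"\x80" + struct.pack("<h", -32768) + struct.pack("<i", delta)
        return bytes(out)

    pixels = np.array([10, 12, 11, 500, 498, 70000, 69990])
    print(len(byte_offset_encode(pixels)), "bytes for", pixels.size, "pixels")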

In the longer term, I can see people working with lossy compressed
images for analysis of massive volumes of images to select the
1% to 10% that will be useful in a final analysis, and may need
to be used in a lossless mode.  If you can reject 90% of the images
with a fraction of the effort needed to work with the resulting
10% of good images, you have made a good decision.

And then there is the inevitable need to work with images on
portable devices with limited storage over cell and WIFI networks. ...

I would not worry about upturned noses.  I would worry about
the engineering needed to manage experiments.  Lossy compression
can be an important part of that engineering.

Regards,
  Herbert


At 4:09 PM -0800 11/7/11, James Holton wrote:

So far, all I really have is a proof of concept compression algorithm here:
http://bl831.als.lbl.gov/~jamesh/lossy_compression/

Not exactly portable since you need ffmpeg and the x264 libraries
set up properly.  The latter seems to be constantly changing things
and breaking the former, so I'm not sure how future proof my
algorithm is.

Something that caught my eye recently was fractal compression,
particularly since FIASCO has been part of the NetPBM package for
about 10 years now.  Seems to give comparable compression vs quality
as x264 (to my eye), but I'm presently wondering if I'd be wasting my
time developing this further?  Will the crystallographic world simply
turn up its collective nose at lossy images?  Even if it means waiting
6 years for Nielsen's Law to make up the difference in network
bandwidth?

-James Holton
MAD Scientist

On Mon, Nov 7, 2011 at 10:01 AM, Herbert J. Bernstein
y...@bernstein-plus-sons.com  wrote:

  This is a very good question.  I would suggest that both versions
  of the old data are useful.  If what is being done is simple validation
  and regeneration of what was done before, then the lossy compression
  should be fine in most instances.  However, when what is being
  done hinges on the really fine details -- looking for lost faint
  spots just peeking out from the background, looking at detailed
  peak profiles -- then the lossless compression version is the
  better choice.  The annotation for both sets should be the same.
  The difference is in storage and network bandwidth.

  Hopefully the fraud issue will never again rear its ugly head,
  but if it should, then having saved the losslessly compressed
  images might prove to have been a good idea.

  To facilitate experimentation with the idea, if there is agreement
  on the particular lossy compression to be used, I would be happy
  to add it as an option in CBFlib.  Right now all the compressions
we have are lossless.

  Regards,
   Herbert


  =
   Herbert J. Bernstein, Professor of Computer Science
Dowling College, Kramer Science Center, KSC 121
 Idle Hour Blvd, Oakdale, NY, 11769

  +1-631-244-3035
  y...@dowling.edu
  =

  On Mon, 7 Nov 2011, James Holton wrote:


  At the risk of sounding like another poll, I have a pragmatic question
  for the methods development community:

  Hypothetically, assume that there was a 

[ccp4bb] weight matrix and R-FreeR gap optimization

2011-11-07 Thread james09 pruza
Dear ccp4bbers,

I wonder if someone can help me define a proper weight matrix term in
Refmac5 to lower the gap between R and R-free. The log file indicates a
weight matrix of 1.98 with a gap of 7. Thanks in advance for any suggestions.
James
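
A sketch of one common way to experiment with this, with placeholder file
names and an arbitrary trial value (scanning a fixed matrix weight well
below the auto-chosen 1.98, or using "weight auto", is the usual thing to
try; none of this is a definitive recipe):

    # Run Refmac5 with an explicit, smaller matrix weight and inspect the
    # resulting R/R-free gap.  File names are placeholders; 0.3 is just one
    # trial value -- repeat for a few weights and keep the best compromise.
    import subprocess

    keywords = "\n".join([
        "ncyc 10",
        "weight matrix 0.3",
        "end",
    ])

    subprocess.run(
        ["refmac5", "HKLIN", "data.mtz", "HKLOUT", "refined.mtz",
         "XYZIN", "model.pdb", "XYZOUT", "refined.pdb"],
        input=keywords, text=True, check=True,
    )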


Re: [ccp4bb] image compression

2011-11-07 Thread Miguel Ortiz Lombardia
So the purists of speed seem to be more relevant than the purists of images.

We complain all the time about how many errors we have out there in our
experiments that we seemingly cannot account for. Yet, would we add
another source?

Sorry if I'm missing something serious here, but I cannot understand
this artificial debate. You can do useful remote data collection without
having to look at *each* image.


Miguel


Le 08/11/2011 06:27, Frank von Delft a écrit :
 I'll second that...  can't remember anybody on the barricades about
 corrected CCD images, but they've been just so much more practical.
 
 Different kind of problem, I know, but equivalent situation:  the people
 to ask are not the purists, but the ones struggling with the huge
 volumes of data.  I'll take the lossy version any day if it speeds up
 real-time evaluation of data quality, helps me browse my datasets, and
 allows me to do remote but intelligent data collection.
 
 phx.
 
 
 
 On 08/11/2011 02:22, Herbert J. Bernstein wrote:
 Dear James,

 You are _not_ wasting your time.  Even if the lossy compression ends
 up only being used to stage preliminary images forward on the net while
 full images slowly work their way forward, having such a compression
 that preserves the crystallography in the image will be an important
 contribution to efficient workflows.  Personally I suspect that
 such images will have more important uses, e.g. facilitating
 real-time monitoring of experiments using detectors providing
 full images at data rates that simply cannot be handled without
 major compression.  We are already in that world.  The reason that
 the Dectris images use Andy Hammersley's byte-offset compression,
 rather than going uncompressed or using CCP4 compression is that
 in January 2007 we were sitting right on the edge of a nasty
 CPU-performance/disk bandwidth tradeoff, and the byte-offset
 compression won the competition.   In that round a lossless
 compression was sufficient, but just barely.  In the future,
 I am certain some amount of lossy compression will be
 needed to sample the dataflow while the losslessly compressed
 images work their way through a very back-logged queue to the disk.

 In the longer term, I can see people working with lossy compressed
 images for analysis of massive volumes of images to select the
 1% to 10% that will be useful in a final analysis, and may need
 to be used in a lossless mode.  If you can reject 90% of the images
 with a fraction of the effort needed to work with the resulting
 10% of good images, you have made a good decision.

 And then there is the inevitable need to work with images on
 portable devices with limited storage over cell and WIFI networks. ...

 I would not worry about upturned noses.  I would worry about
 the engineering needed to manage experiments.  Lossy compression
 can be an important part of that engineering.

 Regards,
   Herbert


 At 4:09 PM -0800 11/7/11, James Holton wrote:
 So far, all I really have is a proof of concept compression
 algorithm here:
 http://bl831.als.lbl.gov/~jamesh/lossy_compression/

 Not exactly portable since you need ffmpeg and the x264 libraries
 set up properly.  The latter seems to be constantly changing things
 and breaking the former, so I'm not sure how future proof my
 algorithm is.

 Something that caught my eye recently was fractal compression,
 particularly since FIASCO has been part of the NetPBM package for
 about 10 years now.  Seems to give comparable compression vs quality
 as x264 (to my eye), but I'm presently wondering if I'd be wasting my
 time developing this further?  Will the crystallographic world simply
 turn up its collective nose at lossy images?  Even if it means waiting
 6 years for Nielsen's Law to make up the difference in network
 bandwidth?

 -James Holton
 MAD Scientist

 On Mon, Nov 7, 2011 at 10:01 AM, Herbert J. Bernstein
 y...@bernstein-plus-sons.com  wrote:
   This is a very good question.  I would suggest that both versions
   of the old data are useful.  If what is being done is simple
 validation
   and regeneration of what was done before, then the lossy compression
   should be fine in most instances.  However, when what is being
   done hinges on the really fine details -- looking for lost faint
   spots just peeking out from the background, looking at detailed
   peak profiles -- then the lossless compression version is the
   better choice.  The annotation for both sets should be the same.
   The difference is in storage and network bandwidth.

   Hopefully the fraud issue will never again rear its ugly head,
   but if it should, then having saved the losslessly compressed
   images might prove to have been a good idea.

   To facilitate experimentation with the idea, if there is agreement
   on the particular lossy compression to be used, I would be happy
   to add it as an option in CBFlib.  Right now all the compressions
 we have are lossless.
   Regards,
Herbert


   

Re: [ccp4bb] image compression

2011-11-07 Thread Jan Dohnalek
I think that truly universal image deposition will not take off without a
new type of compression that speeds things up and makes them easier to handle.
The compression discussion is therefore highly relevant - I would even
suggest going to mathematicians and software engineers to provide
a highly efficient compression format for our type of data. Our data sets
have some very typical repetitive features, so a whole series can very
likely be compressed without losing information (differential
compression across the series), but this needs experts ..
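
A toy illustration of the differential idea: subtract consecutive frames of
a synthetic series and hand both versions to an ordinary lossless
compressor.  Real diffraction frames would want geometry-aware prediction
rather than a plain subtraction, so the numbers are only indicative:

    # Compare lossless compression of independent frames vs. a differenced series.
    import zlib
    import numpy as np

    rng = np.random.default_rng(0)
    background = rng.poisson(50, size=(512, 512)).astype(np.int32)   # shared pattern
    frames = [background + rng.poisson(2, size=(512, 512)).astype(np.int32)
              for _ in range(10)]                                    # per-frame noise

    independent = sum(len(zlib.compress(f.tobytes())) for f in frames)

    differenced = len(zlib.compress(frames[0].tobytes()))            # first frame as-is
    for prev, cur in zip(frames, frames[1:]):
        differenced += len(zlib.compress((cur - prev).tobytes()))

    print("independent:", independent, "bytes   differenced:", differenced, "bytes")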


Jan Dohnalek


On Tue, Nov 8, 2011 at 8:19 AM, Miguel Ortiz Lombardia 
miguel.ortiz-lombar...@afmb.univ-mrs.fr wrote:

 So the purists of speed seem to be more relevant than the purists of
 images.

 We complain all the time about how many errors we have out there in our
 experiments that we seemingly cannot account for. Yet, would we add
 another source?

 Sorry if I'm missing something serious here, but I cannot understand
 this artificial debate. You can do useful remote data collection without
  having to look at *each* image.


 Miguel


 Le 08/11/2011 06:27, Frank von Delft a écrit :
  I'll second that...  can't remember anybody on the barricades about
  corrected CCD images, but they've been just so much more practical.
 
  Different kind of problem, I know, but equivalent situation:  the people
  to ask are not the purists, but the ones struggling with the huge
  volumes of data.  I'll take the lossy version any day if it speeds up
  real-time evaluation of data quality, helps me browse my datasets, and
  allows me to do remote but intelligent data collection.
 
  phx.
 
 
 
  On 08/11/2011 02:22, Herbert J. Bernstein wrote:
  Dear James,
 
  You are _not_ wasting your time.  Even if the lossy compression ends
  up only being used to stage preliminary images forward on the net while
  full images slowly work their way forward, having such a compression
  that preserves the crystallography in the image will be an important
  contribution to efficient workflows.  Personally I suspect that
  such images will have more important uses, e.g. facilitating
  real-time monitoring of experiments using detectors providing
  full images at data rates that simply cannot be handled without
  major compression.  We are already in that world.  The reason that
  the Dectris images use Andy Hammersley's byte-offset compression,
  rather than going uncompressed or using CCP4 compression is that
  in January 2007 we were sitting right on the edge of a nasty
  CPU-performance/disk bandwidth tradeoff, and the byte-offset
  compression won the competition.   In that round a lossless
  compression was sufficient, but just barely.  In the future,
  I am certain some amount of lossy compression will be
  needed to sample the dataflow while the losslessly compressed
  images work their way through a very back-logged queue to the disk.
 
  In the longer term, I can see people working with lossy compressed
  images for analysis of massive volumes of images to select the
  1% to 10% that will be useful in a final analysis, and may need
  to be used in a lossless mode.  If you can reject 90% of the images
  with a fraction of the effort needed to work with the resulting
  10% of good images, you have made a good decision.
 
  And then there is the inevitable need to work with images on
  portable devices with limited storage over cell and WIFI networks. ...
 
  I would not worry about upturned noses.  I would worry about
  the engineering needed to manage experiments.  Lossy compression
  can be an important part of that engineering.
 
  Regards,
Herbert
 
 
  At 4:09 PM -0800 11/7/11, James Holton wrote:
  So far, all I really have is a proof of concept compression
  algorithm here:
  http://bl831.als.lbl.gov/~jamesh/lossy_compression/
 
  Not exactly portable since you need ffmpeg and the x264 libraries
  set up properly.  The latter seems to be constantly changing things
  and breaking the former, so I'm not sure how future proof my
  algorithm is.
 
  Something that caught my eye recently was fractal compression,
  particularly since FIASCO has been part of the NetPBM package for
  about 10 years now.  Seems to give comparable compression vs quality
  as x264 (to my eye), but I'm presently wondering if I'd be wasting my
  time developing this further?  Will the crystallographic world simply
  turn up its collective nose at lossy images?  Even if it means waiting
  6 years for Nielsen's Law to make up the difference in network
  bandwidth?
 
  -James Holton
  MAD Scientist
 
  On Mon, Nov 7, 2011 at 10:01 AM, Herbert J. Bernstein
  y...@bernstein-plus-sons.com  wrote:
This is a very good question.  I would suggest that both versions
of the old data are useful.  If what is being done is simple
  validation
and regeneration of what was done before, then the lossy compression
should be fine in most instances.  However, when what is being
done hinges