Re: [ccp4bb] image compression

2011-11-08 Thread Graeme Winter
Hi James,

Regarding the suggestion of lossy compression, it is really hard to
comment without having a good idea of the real cost of doing this. So,
I have a suggestion:

 - grab a bag of JCSG data sets, which we know should all be essentially OK.
 - you squash then unsquash them with your MacGuffin, perhaps
randomizing whether copy A or B is squashed.
 - process them with Elves / xia2 / autoPROC (something which is reproducible)
 - pop the results into pdb_redo

Then compare what comes out. Ultimately, adding noise may (or may
not) make a measurable difference to the final refinement - this may
be a way of telling whether it does. Why would I have any reason to
worry? Because the noise being added is not really random - it will
be compression artifacts. These could have a subtle effect on how the
errors are estimated and so on. However, you can hum and haw about
this for a decade without reaching a conclusion.
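
To make the squash/unsquash step concrete, something along these lines
would do it (an untested Python sketch; "squash" and "unsquash" are
stand-ins for whatever James's codec turns out to be, not real programs):

import json, os, random, shutil, subprocess

def round_trip(src, dst):
    # Lossy round trip through the hypothetical codec: compress then
    # decompress, so dst differs from src only by the compression loss.
    subprocess.run(["squash", src, dst + ".sq"], check=True)
    subprocess.run(["unsquash", dst + ".sq", dst], check=True)
    os.remove(dst + ".sq")

def make_blind_pair(dataset_dir, out_root):
    # Make copies A and B of a dataset, one pristine, one round-tripped,
    # with the assignment randomized and recorded so the xia2 / pdb_redo
    # statistics can be compared blind and unblinded afterwards.
    squashed = random.choice(["A", "B"])
    for label in ("A", "B"):
        out = os.path.join(out_root, label)
        os.makedirs(out, exist_ok=True)
        for name in sorted(os.listdir(dataset_dir)):
            src = os.path.join(dataset_dir, name)
            if label == squashed:
                round_trip(src, os.path.join(out, name))
            else:
                shutil.copy(src, os.path.join(out, name))
    with open(os.path.join(out_root, "key.json"), "w") as f:
        json.dump({"squashed": squashed}, f)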

Here is something which in all honesty we can actually evaluate, so
is it worth giving it a go? If the results were persuasive (i.e. a
report on the use of lossy compression in transmission and storage
of X-ray diffraction data was actually read and endorsed by the
community) this would make it much more worthwhile to consider for
inclusion in e.g. cbflib.

I would however always encourage (if possible) that the original raw
data be kept somewhere on disk in unmodified form - I am not a fan
of one-way computational processes with unique data.

Thoughts anyone?

Cheerio,

Graeme

On 7 November 2011 17:30, James Holton jmhol...@lbl.gov wrote:
 At the risk of sounding like another poll, I have a pragmatic question for
 the methods development community:

 Hypothetically, assume that there was a website where you could download the
 original diffraction images corresponding to any given PDB file, including
 early datasets that were from the same project, but because of "smeary
 spots" or whatever, couldn't be solved.  There might even be datasets with
 unknown PDB IDs because that particular project never did work out, or
 because the relevant protein sequence has been lost.  Remember, few of these
 datasets will be less than 5 years old if we try to allow enough time for
 the original data collector to either solve it or graduate (and then cease
 to care).  Even for the final dataset, there will be a delay, since the
 half-life between data collection and coordinate deposition in the PDB is
 still ~20 months.  Plenty of time to forget.  So, although the images were
 archived (probably named "test" and in a directory called "john") it may be
 that the only way to figure out which PDB ID is the right answer is by
 processing them and comparing to all deposited Fs.  Assume this was done.
  But there will always be some datasets that don't match any PDB.  Are those
 interesting?  What about ones that can't be processed?  What about ones that
 can't even be indexed?  There may be a lot of those!  (hypothetically, of
 course).

 Anyway, assume that someone did go through all the trouble to make these
 datasets available for download, just in case they are interesting, and
 annotated them as much as possible.  There will be about 20 datasets for any
 given PDB ID.

 Now assume that for each of these datasets this hypothetical website has two
 links, one for the raw data, which will average ~2 GB per wedge (after
 gzip compression, taking at least ~45 min to download), and a second link
 for a lossy compressed version, which is only ~100 MB/wedge (2 min
 download).  When decompressed, the images will visually look pretty much
 like the originals, and generally give you very similar Rmerge, Rcryst,
 Rfree, I/sigma, anomalous differences, and all other statistics when
 processed with contemporary software.  Perhaps a bit worse.  Essentially,
 lossy compression is equivalent to adding noise to the images.

 Which one would you try first?  Does lossy compression make it easier to
 hunt for interesting datasets?  Or is it just too repugnant to have
 modified the data in any way shape or form ... after the detector
 manufacturer's software has corrected it?  Would it suffice to simply
 supply a couple of example images for download instead?

 -James Holton
 MAD Scientist



Re: [ccp4bb] weight matrix and R-FreeR gap optimization

2011-11-08 Thread Robbie Joosten
Hi James,

 

That is not exactly a lot of info to decide the best weight. The optimal
weight is (very loosely) resolution dependent. At normal resolutions the
optimal matrix weight is usually well below 1.0. Start at 0.3 and try a few
weights to see what works best for your data. To close the R-free gap you
can also try to optimize other refinement parameters such as NCS restraints,
B-factor model (and restraint weight). Jelly body restraints sometimes work
really well to keep the R-free gap sensible, especially at low resolution.
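
If you want to be systematic about it, the scan is easy to script (a
rough Python sketch; in.pdb/in.mtz are placeholders, and the log parsing
may need adjusting for your Refmac version):

import subprocess

KEYWORDS = "ncyc 10\nweight matrix {w}\nend\n"

for w in (0.1, 0.3, 0.5, 1.0, 2.0):
    args = ["refmac5", "XYZIN", "in.pdb", "HKLIN", "in.mtz",
            "XYZOUT", "out_%s.pdb" % w, "HKLOUT", "out_%s.mtz" % w]
    log = subprocess.run(args, input=KEYWORDS.format(w=w),
                         capture_output=True, text=True).stdout
    # Pull the final R and R-free out of the log; the exact labels
    # differ between Refmac versions, so adjust as needed.
    rwork = rfree = "?"
    for line in log.splitlines():
        s = line.strip()
        if s.startswith("R factor"):
            rwork = s.split()[-1]
        elif s.startswith("R free"):
            rfree = s.split()[-1]
    print("weight %-4s R %s Rfree %s" % (w, rwork, rfree))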

 

Cheers,

Robbie

 

From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of
james09 pruza
Sent: Tuesday, November 08, 2011 06:40
To: CCP4BB@JISCMAIL.AC.UK
Subject: [ccp4bb] weight matrix and R-FreeR gap optimization

 


Dear ccp4bbers,

 

I wonder if someone can help me define a proper weight matrix term in
Refmac5 to lower the gap between R and R-free. The log file indicates a
weight matrix of 1.98 with a gap of 7. Thanks for suggestions in advance.

James

 



Re: [ccp4bb] image compression

2011-11-08 Thread Kay Diederichs

Hi James,

I see no real need for lossy compression datasets. They may be useful 
for demonstration purposes, and to follow synchrotron data collection 
remotely. But for processing I need the real data. It is my experience 
that structure solution, at least in the difficult cases, depends on 
squeezing out every bit of scattering information from the data, as much 
as is possible with the given software. Using a lossy-compressed 
dataset in this situation would give me the feeling that, if structure 
solution did not work out, I'd have to re-do everything with the 
original data - and that would be double work. Better not to start going 
down that route.


The CBF byte compression puts even a 20-bit detector pixel into a single 
byte, on average. These frames can be further compressed, in the case of 
Pilatus fine-slicing frames, using bzip2, almost down to the level of 
entropy in the data (since there are so many zero pixels). And that 
would be lossless.
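
For the curious, the idea is simply to store pixel-to-pixel differences,
which on typical frames fit into one byte; a toy Python version
(simplified - the real CBF byte-offset layout differs in detail):

import struct

def byte_offset_pack(pixels):
    # Store the difference to the previous pixel: one byte when it fits,
    # otherwise an escape code followed by 2 or 4 more bytes.
    out, prev = bytearray(), 0
    for p in pixels:
        d = p - prev
        if -127 <= d <= 127:
            out += struct.pack("<b", d)
        elif -32767 <= d <= 32767:
            out += struct.pack("<b", -128) + struct.pack("<h", d)
        else:
            out += (struct.pack("<b", -128) + struct.pack("<h", -32768)
                    + struct.pack("<i", d))
        prev = p
    return bytes(out)

# The flat background pixels cost one byte each; only the strong
# pixel (and the step back down) needs the escape codes:
print(len(byte_offset_pack([10, 11, 12, 500000, 13, 12])))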


Storing lossily-compressed datasets would of course not double the 
disk space needed, but would significantly raise the administrative burden.


Just to point out my standpoint in this whole discussion about storage 
of raw data:
I've been storing our synchrotron datasets on disks, since 1999. The 
amount of money we spend per year for this purpose is constant (less 
than 1000€). This is possible because the price of a GB disk space drops 
faster than the amount of data per synchrotron trip rises. So if the 
current storage is full (about every 3 years), we set up a bigger RAID 
(plus a backup RAID); the old data, after copying over, always consumes 
only a fraction of the space on the new RAID.
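
The arithmetic, with made-up but plausible rates (price per GB halving
every two years, data volume growing 30% per year), illustrates why:

# Illustrative assumptions only, not measurements.
price_per_gb = 1.0   # currency units per GB in year 0
gb_per_year = 500.0  # GB collected in year 0
for year in range(0, 13, 3):
    price = price_per_gb * 0.5 ** (year / 2.0)
    volume = gb_per_year * 1.3 ** year
    cost = volume * price
    print("year %2d: %7.0f GB x %.4f/GB = %5.0f" % (year, volume, price, cost))
# The yearly cost falls even though the volume of data explodes.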


So I think the storage cost is actually not the real issue - rather, the 
real issue has a strong psychological component. People a) may not 
realize that the software they use is constantly being improved, and 
that needs data which cover all the corner cases; b) often do not wish 
to give away something because they feel it might help their 
competitors, or expose their faults.


best,

Kay (XDS co-developer)





Re: [ccp4bb] image compression

2011-11-08 Thread Harry Powell
Hi

 I am not a fan
 of one-way computational processes with unique data.
 
 Thoughts anyone?
 
 Cheerio,
 
 Graeme


I agree.

Harry
--
Dr Harry Powell, MRC Laboratory of Molecular Biology, MRC Centre, Hills Road, 
Cambridge, CB2 0QH

http://www.iucr.org/resources/commissions/crystallographic-computing/schools/mieres2011


Re: [ccp4bb] image compression

2011-11-08 Thread Miguel Ortiz Lombardía
Le 08/11/11 10:15, Kay Diederichs a écrit :
 Hi James,
 
 I see no real need for lossy compression datasets. They may be useful
 for demonstration purposes, and to follow synchrotron data collection
 remotely. But for processing I need the real data. It is my experience
 that structure solution, at least in the difficult cases, depends on
 squeezing out every bit of scattering information from the data, as much
 as is possible with the given software. Using a lossy-compressed
 dataset in this situation would give me the feeling that, if structure
 solution did not work out, I'd have to re-do everything with the
 original data - and that would be double work. Better not to start going
 down that route.
 
 The CBF byte compression puts even a 20-bit detector pixel into a single
 byte, on average. These frames can be further compressed, in the case of
 Pilatus fine-slicing frames, using bzip2, almost down to the level of
 entropy in the data (since there are so many zero pixels). And that
 would be lossless.
 
 Storing lossily-compressed datasets would of course not double the
 disk space needed, but would significantly raise the administrative burden.
 
 Just to point out my standpoint in this whole discussion about storage
 of raw data:
 I've been storing our synchrotron datasets on disks, since 1999. The
 amount of money we spend per year for this purpose is constant (less
 than 1000€). This is possible because the price of a GB disk space drops
 faster than the amount of data per synchrotron trip rises. So if the
 current storage is full (about every 3 years), we set up a bigger RAID
 (plus a backup RAID); the old data, after copying over, always consumes
 only a fraction of the space on the new RAID.
 
 So I think the storage cost is actually not the real issue - rather, the
 real issue has a strong psychological component. People a) may not
 realize that the software they use is constantly being improved, and
 that needs data which cover all the corner cases; b) often do not wish
 to give away something because they feel it might help their
 competitors, or expose their faults.
 
 best,
 
 Kay (XDS co-developer)
 

Hi Kay and others,

I completely agree with you.

Datalove, <3
:-)

-- 
Miguel

Architecture et Fonction des Macromolécules Biologiques (UMR6098)
CNRS, Universités d'Aix-Marseille I & II
Case 932, 163 Avenue de Luminy, 13288 Marseille cedex 9, France
Tel: +33(0) 491 82 55 93
Fax: +33(0) 491 26 67 20
mailto:miguel.ortiz-lombar...@afmb.univ-mrs.fr
http://www.afmb.univ-mrs.fr/Miguel-Ortiz-Lombardia


Re: [ccp4bb] image compression

2011-11-08 Thread Herbert J. Bernstein

Um, but isn't Crystallography based on a series of
one-way computational processes:
 photons -> images
 images -> {structure factors, symmetry}
 {structure factors, symmetry, chemistry} -> solution
 {structure factors, symmetry, chemistry, solution}
  -> refined solution

At each stage we tolerate a certain amount of noise
in going backwards.  Certainly it is desirable to
have the original data to be able to go forwards,
but until the arrival of pixel array detectors, we
were very far from having the true original data,
and even pixel array detectors don't capture every
single photon.

I am not recommending lossy compressed images as
a perfect replacement for lossless compressed images,
any more than I would recommend structure factors
are a replacement for images.  It would be nice
if we all had large budgets, huge storage capacity
and high network speeds and if somebody would repeal
the speed of light and other physical constraints, so that
engineering compromises were never necessary, but as
James has noted, accepting such engineering compromises
has been of great value to our colleagues who work
with the massive image streams of the entertainment
industry.  Without lossy compression, we would not
have the _higher_ image quality we now enjoy in the
less-than-perfectly-faithful HDTV world that has replaced
the highly faithful, but lower capacity, NTSC/PAL world.

Please, in this, let us not allow the perfect to be
the enemy of the good.  James is proposing something
good.

Regards,
  Herbert
=
  Herbert J. Bernstein
Professor of Mathematics and Computer Science
   Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769

 +1-631-244-3035
 y...@dowling.edu
=

On Tue, 8 Nov 2011, Harry Powell wrote:


Hi


I am not a fan
of one-way computational processes with unique data.

Thoughts anyone?

Cheerio,

Graeme



I agree.

Harry
--
Dr Harry Powell, MRC Laboratory of Molecular Biology, MRC Centre, Hills Road, 
Cambridge, CB2 0QH

http://www.iucr.org/resources/commissions/crystallographic-computing/schools/mieres2011



Re: [ccp4bb] Installation of CCP4 under Windows 7

2011-11-08 Thread Robert Oeffner
Hi,
I have encountered this problem with CCP4 6.2.0 as well as for the past few 
years. Whenever I run the CCP4 installer for all users on my Windows Vista PC 
as administrator it only creates desktop icons and a Start Menu item for the 
administrator account, not for other users. 

It appears that the CCP4 desktop icon shortcut is put in 
C:\Users\Administrator\Desktop and the Start Menu CCP4 folder is put in 
C:\Users\Administrator\AppData\Roaming\Microsoft\Windows\Start 
Menu\Programs\CCP4-Packages-6.2.0. Both of these paths are incorrect as they 
are invisible when logging in as another user.

I'm not sure where the CCP4 desktop icon shortcut should go, but placing the 
Start Menu folder in C:\ProgramData\Microsoft\Windows\Start Menu\Programs makes 
it accessible to other users.
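
The move is easy to script as a one-off fix, e.g. in Python run from the 
administrator account (paths as above):

import shutil

# Move the per-user Start Menu folder the installer created into the
# all-users location, so every account sees the CCP4 entries.
src = (r"C:\Users\Administrator\AppData\Roaming\Microsoft\Windows"
       r"\Start Menu\Programs\CCP4-Packages-6.2.0")
dst = r"C:\ProgramData\Microsoft\Windows\Start Menu\Programs"
shutil.move(src, dst)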

On a separate note, it also appears that there is a bug in the ActiveTcl 
installer, which is recommended for installation on Vista platforms: it only 
makes file associations with tcl files for the user who installed the program. 
Consequently, when double-clicking the CCP4 icon as a user other than the 
administrator, Windows prompts the user to choose which program the file should 
be opened with. Again, this can be overcome by making the wish.exe program from 
the ActiveTcl folder the default program to use.


Regards,

Robert Oeffner, Ph.D.
Research Associate, Read group
Department of Haematology, University of Cambridge
Cambridge Institute of Medical Research
Wellcome Trust / MRC Building, Hills Road, Cambridge, CB2 0XY
www-structmed.cimr.cam.ac.uk, tel:01223763234


Re: [ccp4bb] image compression

2011-11-08 Thread Graeme Winter
Dear Herbert,

Sorry, the point I was getting at was that the process is one way, but
if it is also *destructive*, i.e. the original master is not
available, then I would not be happy. If the master copy of what was
actually recorded is available from a tape someplace, perhaps not all
that quickly, then to my mind that's fine.

When we go from images to intensities, the images still exist. And by
and large the intensities are useful enough that you don't go back to
the images again. This is worth investigating I believe, which is why
I made that proposal.

Mostly I listen to MP3s as they're convenient, but I still buy CDs
rather than buying direct off e.g. iTunes, and yes, an H.264-compressed
video stream is much nicer to watch than VHS.

Best wishes,

Graeme

On 8 November 2011 12:17, Herbert J. Bernstein
y...@bernstein-plus-sons.com wrote:
 Um, but isn't Crystallography based on a series of
 one-way computational processes:
     photons -> images
     images -> {structure factors, symmetry}
  {structure factors, symmetry, chemistry} -> solution
  {structure factors, symmetry, chemistry, solution}
      -> refined solution

 At each stage we tolerate a certain amount of noise
 in going backwards.  Certainly it is desirable to
 have the original data to be able to go forwards,
 but until the arrival of pixel array detectors, we
 were very far from having the true original data,
 and even pixel array detectors don't capture every
 single photon.

 I am not recommending lossy compressed images as
 a perfect replacement for lossless compressed images,
 any more than I would recommend structure factors
 are a replacement for images.  It would be nice
 if we all had large budgets, huge storage capacity
 and high network speeds and if somebody would repeal
 the speed of light and other physical constraints, so that
 engineering compromises were never necessary, but as
 James has noted, accepting such engineering compromises
 has been of great value to our colleagues who work
 with the massive image streams of the entertainment
 industry.  Without lossy compression, we would not
 have the _higher_ image quality we now enjoy in the
 less-than-perfectly-faithful HDTV world that has replaced
 the highly faithful, but lower capacity, NTSC/PAL world.

 Please, in this, let us not allow the perfect to be
 the enemy of the good.  James is proposing something
 good.

 Regards,
  Herbert
 =
              Herbert J. Bernstein
    Professor of Mathematics and Computer Science
   Dowling College, Kramer Science Center, KSC 121
        Idle Hour Blvd, Oakdale, NY, 11769

                 +1-631-244-3035
                 y...@dowling.edu
 =

 On Tue, 8 Nov 2011, Harry Powell wrote:

 Hi

 I am not a fan
 of one-way computational processes with unique data.

 Thoughts anyone?

 Cheerio,

 Graeme


 I agree.

 Harry
 --
 Dr Harry Powell, MRC Laboratory of Molecular Biology, MRC Centre, Hills
 Road, Cambridge, CB2 0QH


 http://www.iucr.org/resources/commissions/crystallographic-computing/schools/mieres2011




[ccp4bb] 4th Winter School on soft X-rays in Macromolecular Crystallography

2011-11-08 Thread Gordon Leonard


4th Winter School on soft X-rays in Macromolecular
Crystallography

We are pleased to announce that the 4th Winter school on soft X-rays in
Macromolecular Crystallography will take place at the European
Synchrotron Radiation Facility, Grenoble, France 6th – 8th February 2012.


An increasing number of new crystal structures of biological
macromolecules are solved by exploiting anomalous signals
available at longer wavelengths. The main advantage of using longer
wavelengths is that the anomalous signal from innate sulphur, phosphorus
or other light atoms (e.g. Ca, K, Cl) can be used to obtain phasing
information, thus obviating the need to prepare the derivative crystals
usually used in macromolecular crystal structure solution. However, at the
longer wavelengths currently routinely accessible, such anomalous signals
are rather small (usually ~ 1%). Special care must thus be given to
experimental setups and data collection protocols. The Winter School
brings together experts in the hardware and software required to
successfully perform such experiments. The following topics will be
covered:

• Optimised experimental setups for diffraction data collection at longer
wavelengths
• Advanced data collection strategies, taking into account radiation
damage
• Optimised protocols of data processing for longer wavelength
experiments
• Anomalous scattering substructure determination using longer wavelength
X-rays

The Winter School will be held under the auspices of the ESRF Users'
Meeting and the number of participants is limited to 20. Confirmed speakers
at the workshop (see

http://www.esrf.fr/events/conferences/users-meeting-2012-workshops/mx-school

for the provisional program) include:

 M. Cianci, EMBL Hamburg, Germany
 K. Djinovic-Carugo, Vienna Biocenter, Austria
 D. de Sanctis, ESRF, Grenoble, France
 P. Johansson, AstraZeneca Structural Chemistry
Laboratory, Molndal, Sweden
 J. Liu, National Laboratory of Biomacromolecules,
Beijing, China
 R. Giordano, ESRF, Grenoble, France
 G. Leonard, ESRF, Grenoble, France
 C. Mueller-Dieckmann, ESRF Grenoble
 J. Pflugrath, Rigaku, USA
 A. Popov, ESRF, Grenoble, France
 I. Uson, IBMB-CSIC, Barcelona, Spain
 A. Wagner, Diamond Light Source, UK
 B.C. Wang, University of Georgia, USA
 M. Wang, Swiss Light Source Villigen, Switzerland
 M.S. Weiss, Helmholtz-Zentrum für Materialien und
Energie, Germany


The Winter School registration fee of 150 € includes all meals, 4 nights’
accommodation in the ESRF Guesthouse as well as registration for the ESRF
Users' Meeting
(
http://www.esrf.fr/events/conferences/users-meeting-2012-workshops/preliminary-programme
).

Applications for participation in the Winter School should include a
C.V., a letter of motivation, a poster abstract and (where appropriate) a
recommendation letter from a PhD supervisor and be sent by Monday 12th
December to the Winter School organisers at
mx-winterschool2...@esrf.fr

Further information concerning the workshop and application procedures
can be found at

http://www.esrf.fr/events/conferences/users-meeting-2012-workshops/mx-school

On behalf of the organizers,

Gordon Leonard 




Re: [ccp4bb] Installation of CCP4 under Windows 7

2011-11-08 Thread Marcin Wojdyr

Hi,
Thanks for listing all the problems.
We are testing a new installer for Windows now. It's written from scratch (we 
switched from InstallShield to WiX) and if we don't find any issues with it 
we'll put it on ftp tomorrow.

ActiveTcl should not be necessary with this and future versions. In September 
we updated our build of Tcl and friends and it seems to work well with ccp4i 
and imosflm. We'll remove the recommendation of ActiveTcl from the website soon.

Cheers
Marcin




Re: [ccp4bb] weight matrix and R-FreeR gap optimization

2011-11-08 Thread Eleanor Dodson

On 11/08/2011 05:39 AM, james09 pruza wrote:

Dear ccp4bbers,

I wonder if someone can help me define a proper weight matrix term in
Refmac5 to lower the gap between R and R-free. The log file indicates a
weight matrix of 1.98 with a gap of 7. Thanks for suggestions in advance.
James




What is your resolution? The gap is usually wider at lower resolution.
Eleanor


Re: [ccp4bb] weight matrix and R-FreeR gap optimization

2011-11-08 Thread Bernhard Rupp (Hofkristallrat a.D.)
 What is your resolution? The gap is usually wider at lower resolution.

Here is a figure displaying gap distribution stats:

http://www.ruppweb.org/garland/gallery/Ch12/pages/Biomolecular_Crystallography_Fig_12-24.htm

Cheers, BR

-Original Message-
From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of
Eleanor Dodson
Sent: Tuesday, November 08, 2011 8:11 AM
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] weight matrix and R-FreeR gap optimization

On 11/08/2011 05:39 AM, james09 pruza wrote:
 Dear ccp4bbers,

 I wonder if someone can help me define a proper weight matrix term in
 Refmac5 to lower the gap between R and R-free. The log file indicates a
 weight matrix of 1.98 with a gap of 7. Thanks for suggestions in advance.
 James



What is your resolution? The gap is usually wider at lower resolution.
Eleanor


[ccp4bb] phaser openmp

2011-11-08 Thread Ed Pozharski
Could anyone point me towards instructions on how to get/build a
parallelized phaser binary on Linux?  I searched around but so far found
nothing.  The latest updated phaser binary doesn't seem to be
parallelized.  

Apologies if this has been resolved before - just point at the relevant
thread, please.

-- 
I'd jump in myself, if I weren't so good at whistling.
   Julian, King of Lemurs


Re: [ccp4bb] phaser openmp

2011-11-08 Thread Dr G. Bunkoczi

Hi Ed,

in the CCP4 distribution, OpenMP is not enabled by default, and there
seems to be no easy way to enable it (i.e. by setting a flag at the
configure stage).

On the other hand, you can easily create a separate build for phaser
that is openmp enabled and use phaser from there. To do this, create a
new folder, say phaser-build, cd into it, and issue the following
commands (this assumes you are using bash):

$ python $CCP4/lib/cctbx/cctbx_sources/cctbx_project/libtbx/configure.py
--repository=$CCP4/src/phaser/source phaser
--build-boost-python-extensions=False --enable-openmp-if-possible=True

$ . ./setpaths.sh   (source ./setpaths.csh with csh)
$ libtbx.scons   (if you have several CPUs, add -jX where X is the number
of CPUs you want to use for compilation)


This will build a phaser binary that is OpenMP-enabled. You can also try passing
the --static-exe flag (to configure.py), in which case the executable is
static and can be relocated without any headaches. This works with
certain compilers.
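
To check that the resulting binary really is multi-threaded, you can time
an identical job with different thread counts (a sketch; phaser.inp stands
for your own input script):

import os, subprocess, time

# OMP_NUM_THREADS is the standard OpenMP control variable; an
# OpenMP-enabled phaser should speed up as it increases.
for n in (1, 2, 4):
    env = dict(os.environ, OMP_NUM_THREADS=str(n))
    start = time.time()
    with open("phaser.inp") as inp:
        subprocess.run(["phaser"], stdin=inp,
                       stdout=subprocess.DEVNULL, env=env)
    print("%d thread(s): %.1f s" % (n, time.time() - start))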

Let me know if there are any problems!

BW, Gabor

On Nov 8 2011, Ed Pozharski wrote:


Could anyone point me towards instructions on how to get/build
parallelized phaser binary on linux?  I searched around but so far found
nothing.  The latest updated phaser binary doesn't seem to be
parallelized.  


Apologies if this has been resolved before - just point at the relevant
thread, please.




Re: [ccp4bb] image compression

2011-11-08 Thread James Holton
At the risk of putting this thread back on-topic, my original question 
was not "should I just lossfully compress my images and throw away the 
originals?".  My question was:

"would you download the compressed images first?"

So far, no one has really answered it.

I think it is obvious that of course we would RATHER have the original 
data, but if access to the original data is slow (by a factor of 30 at 
best) then can the mp3 version of diffraction data play a useful role 
in YOUR work?


Taking Graeme's request from a different thread as an example, he would 
like to see stuff in P21 with a 90 degree beta angle.  There are 
currently ~609 examples of this in the PDB.  So, I ask again: which one 
would you download first?  1aip? (It is first alphabetically.)  Then 
again, if you just email the corresponding authors of all 609 papers, 
the response rate alone might whittle the number of datasets to deal 
with down to less than 10.  Perhaps even less than 1.


-James Holton
MAD Scientist


On 11/8/2011 5:17 AM, Graeme Winter wrote:

Dear Herbert,

Sorry, the point I was getting at was that the process is one way, but
if it is also *destructive*, i.e. the original master is not
available, then I would not be happy. If the master copy of what was
actually recorded is available from a tape someplace, perhaps not all
that quickly, then to my mind that's fine.

When we go from images to intensities, the images still exist. And by
and large the intensities are useful enough that you don't go back to
the images again. This is worth investigating I believe, which is why
I made that proposal.

Mostly I listen to MP3s as they're convenient, but I still buy CDs
rather than buying direct off e.g. iTunes, and yes, an H.264-compressed
video stream is much nicer to watch than VHS.

Best wishes,

Graeme

On 8 November 2011 12:17, Herbert J. Bernstein
y...@bernstein-plus-sons.com  wrote:

Um, but isn't Crystallography based on a series of
one-way computational processes:
 photons -> images
 images -> {structure factors, symmetry}
  {structure factors, symmetry, chemistry} -> solution
  {structure factors, symmetry, chemistry, solution}
  -> refined solution

At each stage we tolerate a certain amount of noise
in going backwards.  Certainly it is desirable to
have the original data to be able to go forwards,
but until the arrival of pixel array detectors, we
were very far from having the true original data,
and even pixel array detectors don't capture every
single photon.

I am not recommending lossy compressed images as
a perfect replacement for lossless compressed images,
any more than I would recommend structure factors
are a replacement for images.  It would be nice
if we all had large budgets, huge storage capacity
and high network speeds and if somebody would repeal
the speed of light and other physical constraints, so that
engineering compromises were never necessary, but as
James has noted, accepting such engineering compromises
has been of great value to our colleagues who work
with the massive image streams of the entertainment
industry.  Without lossy compression, we would not
have the _higher_ image quality we now enjoy in the
less-than-perfectly-faithful HDTV world that has replaced
the highly faithful, but lower capacity, NTSC/PAL world.

Please, in this, let us not allow the perfect to be
the enemy of the good.  James is proposing something
good.

Regards,
  Herbert
=
  Herbert J. Bernstein
Professor of Mathematics and Computer Science
   Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769

 +1-631-244-3035
 y...@dowling.edu
=

On Tue, 8 Nov 2011, Harry Powell wrote:


Hi


I am not a fan
of one-way computational processes with unique data.

Thoughts anyone?

Cheerio,

Graeme


I agree.

Harry
--
Dr Harry Powell, MRC Laboratory of Molecular Biology, MRC Centre, Hills
Road, Cambridge, CB2 0QH


http://www.iucr.org/resources/commissions/crystallographic-computing/schools/mieres2011



Re: [ccp4bb] Archiving Images for PDB Depositions

2011-11-08 Thread Deacon, Ashley M.
All,



We have been following the CCP4BB discussion with interest. As has been 
mentioned on several occasions, 
the JCSG has maintained, for several years now, an open archive of all 
diffraction datasets associated with 
our deposited structures. Overall this has been a highly positive experience 
and many developers, researchers, 
teachers and students have benefited from our archive. We currently have close 
to 100 registered users of our archive and we seem to receive a new batch of 
users each time our archive is 
acknowledged in a paper or is 
mentioned at a conference. Building on this initial success, we are currently 
extending our archive to include 
unsolved datasets, which will help us more readily share data and collaborate 
with methods developers on some 
of our less tractable datasets. We are also planning to include screening 
images for all crystals evaluated as part 
of the JCSG pipeline (largely as a feedback tool to help improve crystal 
quality).



At JCSG, we benefit tremendously from our central database, which already 
tracks all required metadata associated 
with any crystal. Thus I agree with other comments that the cost of such an 
undertaking should not be underestimated. 
The cost of the hardware may be modest; however, people and resources are 
needed to develop and maintain a robust 
and reliable archive.



To date we have not assigned DOIs to our datasets, but we certainly feel this 
would be of value going forward and are considering this option for our revised 
archive, which is currently in development.



If successful, this may form a good prototype system, which could be opened 
up to a broader community outside 
of JCSG.



We (JCSG) have already shared much of our experience with the IUCr working 
group and we would be happy to participate and contribute to any ongoing 
efforts.



Sincerely,
Ashley Deacon

JCSG Structure Determination Core Leader


Re: [ccp4bb] image compression

2011-11-08 Thread Miguel Ortiz Lombardia
Le 08/11/2011 19:19, James Holton a écrit :
 At the risk of putting this thread back on-topic, my original question
 was not should I just lossfully compress my images and throw away the
 originals.  My question was:
 
  would you download the compressed images first?
 
 So far, no one has really answered it.
 
 I think it is obvious that of course we would RATHER have the original
 data, but if access to the original data is slow (by a factor of 30 at
 best) then can the mp3 version of diffraction data play a useful role
 in YOUR work?
 
 Taking Graeme's request from a different thread as an example, he would
 like to see stuff in P21 with a 90 degree beta angle.  There are
 currently ~609 examples of this in the PDB.  So, I ask again: which one
 would you download first?  1aip? (It is first alphabetically.)  Then
 again, if you just email the corresponding authors of all 609 papers,
 the response rate alone might whittle the number of datasets to deal
 with down to less than 10.  Perhaps even less than 1.
 
 -James Holton
 MAD Scientist
 

Hmm, I thought I had been clear. I will try to be more direct:

Given the option, I would *only* download the original,
non-lossy-compressed data. At the expense of time, yes. I don't think
Graeme's example is very representative of our work, sorry.

As long as the option between the two is warranted, I don't care. I just
don't see the point for the very same reasons Kay has very clearly exposed.

Best regards,

-- 
Miguel

Architecture et Fonction des Macromolécules Biologiques (UMR6098)
CNRS, Universités d'Aix-Marseille I & II
Case 932, 163 Avenue de Luminy, 13288 Marseille cedex 9, France
Tel: +33(0) 491 82 55 93
Fax: +33(0) 491 26 67 20
mailto:miguel.ortiz-lombar...@afmb.univ-mrs.fr
http://www.afmb.univ-mrs.fr/Miguel-Ortiz-Lombardia


Re: [ccp4bb] image compression

2011-11-08 Thread mjvdwoerd

Hmmm, so you would, when collecting large data images - say 4 images, 100MB in 
size, per second - in the middle of the night, from home, reject seeing 
compressed images in your data collection software, while the real thing 
lingers behind somewhere, to be downloaded and stored later? As opposed to 
not seeing the images (because your home internet access cannot keep up) and 
only inspecting 1 in 100 images to see progress?

I think there are instances where compressed (lossy or not) images will be 
invaluable. I know the above situation was not the context, but (y'all may gasp 
about this) I still have some friends (in the US) who live so far out in the 
wilderness that only dial-up internet is available. That while synchrotrons and 
the detectors used get better all the time, which means more MB/s produced. 

James has already said (and I agree) that the original images (with all 
information) should not necessarily be thrown away. Perhaps a better question 
would be "which would you use for what purpose?", since I am convinced that 
compressed images are useful. 

I would want to process the real thing, unless I have been shown by 
scientific evidence that the compressed thing works equally well. It seems 
reasonable to assume that such evidence can be acquired and/or that we can be 
shown by evidence what we gain and lose by lossy-compressed images. Key might 
be to be able to choose the best thing for your particular 
application/case/location etc. 

So yes, James, of course this is useful and not a waste of time.

Mark

 

 



Re: [ccp4bb] image compression

2011-11-08 Thread Miguel Ortiz Lombardia
Le 08/11/2011 20:46, mjvdwo...@netscape.net a écrit :
 Hmmm, so you would, when collecting large data images, say 4 images,
 100MB in size, per second, in the middle of the night, from home, reject
 seeing compressed images on your data collection software, while the
 real thing is lingering behind somewhere, to be downloaded and stored
 later? As opposed to not seeing the images (because your home internet
 access cannot keep up) and only inspecting 1 in a 100 images to see
 progress?
 

1. I don't need to *see* all images to verify whether the collection is
going all right. If I collect remotely, I process remotely; no need to
transfer images. Data is collected so fast today that you may, even
while collecting at the synchrotron, finish the collection without a)
actually seeing all the images (cf. Pilatus detectors) or b) keeping
pace with your data processing. The crystal died or was not collected
properly? You try to understand why, you recollect it if possible or you
try a new crystal. It's always been like this; it's called trial and error.

2. The ESRF in Grenoble produces thumbnails of the images. If all you
want to see is whether there is diffraction, they are good enough and
they are useful. They are extremely lossy and useless for anything else.

3. Please, compare contemporary facts. Today's bandwidth is what it is;
today's images are *not* 100 MB (yet). When they get there, let us know
what the bandwidth is.

 I think there are instances where compressed (lossy or not) images will
 be invaluable. I know the above situation was not the context, but
 (y'all may gasp about this) I still have some friends (in the US) who
 live so far out in the wilderness that only dial-up internet is
 available. That while synchrotrons and the detectors used get better all
 the time, which means more MB/s produced.

I would understand a situation like the one you describe for a poor, or
an embargoed country where unfortunately there is no other way to
connect to a synchrotron. Still, that should be solved by the community
in a different way: by gracious cooperation with our colleagues in those
countries. Your example is actually quite upsetting, given the current
state of affairs in the world.

 
 James has already said (and I agree) that the original images (with all
 information) should not necessarily be thrown away. Perhaps a better
 question would be which would you use for what purpose, since I am
 convinced that compressed images are useful.
 

I think I was clear: as long as we have access to the original data, I
don't care. I would only use the original data.

 I would want to process the real thing, unless I have been shown by
 scientific evidence that the compressed thing works equally well. It
 seems reasonable to assume that such evidence can be acquired and/or
 that we can be shown by evidence what we gain and lose by
 lossy-compressed images. Key might be to be able to choose the best
 thing for your particular application/case/location etc.
 

This still assumes that future software will not be able to detect the
differences that you cannot see today. This may or may not be true, the
consequences may or may not be important. But there is, I think,
reasonable doubt on both questions.

 So yes, James, of course this is useful and not a waste of time.
 

I have said to James, off the list, that he should go on if he's
convinced of the usefulness of his approach. For a very scientific
reason: I could be wrong. Yet, if we need to go down the compression
path, I think we should prefer lossless options.

Best regards,


-- 
Miguel

Architecture et Fonction des Macromolécules Biologiques (UMR6098)
CNRS, Universités d'Aix-Marseille I & II
Case 932, 163 Avenue de Luminy, 13288 Marseille cedex 9, France
Tel: +33(0) 491 82 55 93
Fax: +33(0) 491 26 67 20
mailto:miguel.ortiz-lombar...@afmb.univ-mrs.fr
http://www.afmb.univ-mrs.fr/Miguel-Ortiz-Lombardia


Re: [ccp4bb] image compression

2011-11-08 Thread Phil Evans
It would be a good start to get all images written now with lossless 
compression, instead of the uncompressed images we still get from the ADSC 
detectors - something that we've been promised for many years.

Phil


Re: [ccp4bb] phaser openmp

2011-11-08 Thread Nat Echols
On Tue, Nov 8, 2011 at 4:22 PM, Francois Berenger beren...@riken.jp wrote:
 In the past I have been quite badly surprised by
 the no-acceleration I gained when using OpenMP
 with some of my programs... :(

Amdahl's law is cruel:

http://en.wikipedia.org/wiki/Amdahl's_law

This is the same reason why GPU acceleration isn't very useful for
most crystallography software.
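
Concretely, the speedup for parallel fraction p on n cores is
1 / ((1 - p) + p / n), so a 90%-parallel program never beats 10x:

def amdahl(p, n):
    # Overall speedup when a fraction p of the runtime parallelizes
    # perfectly over n cores and the rest stays serial.
    return 1.0 / ((1.0 - p) + p / n)

for p in (0.5, 0.9, 0.99):
    print(p, ["%.1f" % amdahl(p, n) for n in (2, 4, 8, 1000)])
# p = 0.9 tops out just under 10x, however many cores you add.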

-Nat


Re: [ccp4bb] image compression

2011-11-08 Thread Herbert J. Bernstein

ADSC has been a leader in supporting compressed CBFs.
=
  Herbert J. Bernstein
Professor of Mathematics and Computer Science
   Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769

 +1-631-244-3035
 y...@dowling.edu
=

On Tue, 8 Nov 2011, Phil Evans wrote:

It would be a good start to get all images written now with lossless 
compression, instead of the uncompressed images we still get from the 
ADSC detectors. Something that we've been promised for many years


Phil



Re: [ccp4bb] image compression

2011-11-08 Thread William G. Scott
The mp3/music analogy might be quite appropriate.

On some commercial music download sites, there are several options for 
purchase, ranging from audiophool-grade 24-bit, 192kHz sampled music, to 
CD-quality (16-bit, 44.1kHz), to mp3 compression and various lossy bit-rates.  
I am told that the resampling and compression is actually done on the fly by 
the server, from a single master, and the purchaser chooses what files to 
download based on cost, ability to play high-res data, degree of canine-like 
hearing, intolerance for lossy compression with its limited dynamic range, etc.

Perhaps that would be the best way to handle it from a central repository, 
allowing the end-user to decide on the fly. The lossless files could somehow be 
tagged as such, to avoid confusion.
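
Server-side, that could be as little as the sketch below (lossy_compress
is a placeholder for a real codec, not any existing archive's API):

from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

def lossy_compress(data, quality):
    # Placeholder for a real lossy codec; returns the input unchanged.
    return data

class ImageHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # A 'quality' query parameter opts in to lossy compression;
        # without it, the single lossless master is served as-is.
        query = parse_qs(urlparse(self.path).query)
        with open("master.cbf", "rb") as f:
            data = f.read()
        if "quality" in query:
            data = lossy_compress(data, int(query["quality"][0]))
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.end_headers()
        self.wfile.write(data)

HTTPServer(("", 8000), ImageHandler).serve_forever()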


Bill




William G. Scott
Professor
Department of Chemistry and Biochemistry
and The Center for the Molecular Biology of RNA
228 Sinsheimer Laboratories
University of California at Santa Cruz
Santa Cruz, California 95064
USA

phone:  +1-831-459-5367 (office)
  +1-831-459-5292 (lab)
fax:    +1-831-459-3139 (fax) 


Re: [ccp4bb] phaser openmp

2011-11-08 Thread Ed Pozharski
See page 3 of this

http://www-structmed.cimr.cam.ac.uk/phaser/ccp4-sw2011.pdf



On Wed, 2011-11-09 at 09:22 +0900, Francois Berenger wrote:
 Hello,
 
 How faster is the OpenMP version of Phaser
 versus number of cores used?
 
 In the past I have been quite badly surprised by
 the no-acceleration I gained when using OpenMP
 with some of my programs... :(
 
 Regards,
 F.
 
 On 11/09/2011 02:59 AM, Dr G. Bunkoczi wrote:
  Hi Ed,
 
  in the CCP4 distribution, OpenMP is not enabled by default, and there
  seems to be no easy way to enable it (i.e. by setting a flag at the
  configure stage).
 
  On the other hand, you can easily create a separate build for phaser
  that is openmp enabled and use phaser from there. To do this, create a
  new folder, say phaser-build, cd into it, and issue the following
  commands (this assumes you are using bash):
 
  $ python $CCP4/lib/cctbx/cctbx_sources/cctbx_project/libtbx/configure.py
  --repository=$CCP4/src/phaser/source phaser
  --build-boost-python-extensions=False --enable-openmp-if-possible=True
 
  $ . ./setpaths.sh   (source ./setpaths.csh with csh)
  $ libtbx.scons   (if you have several CPUs, add -jX where X is the number
  of CPUs you want to use for compilation)
 
  This will build a phaser binary that is OpenMP-enabled. You can also try passing
  the --static-exe flag (to configure.py), in which case the executable is
  static and can be relocated without any headaches. This works with
  certain compilers.
 
  Let me know if there are any problems!
 
  BW, Gabor
 
  On Nov 8 2011, Ed Pozharski wrote:
 
  Could anyone point me towards instructions on how to get/build a
  parallelized phaser binary on Linux? I searched around but so far found
  nothing. The latest updated phaser binary doesn't seem to be
  parallelized.
  Apologies if this has been resolved before - just point at the relevant
  thread, please.