Re: [ccp4bb] Search for a particular motif [off-topic]

2021-10-19 Thread Robbie Joosten
You can analyse your hits with DSSP to see the secondary structure afterwards.

Cheers,
Robbie

On 19 Oct 2021 21:58, Guillaume Gaullier wrote:
Hello,


ScanProsite almost does what you want, but since it only searches sequence databases, it has no notion of secondary structures and therefore cannot exclude motifs found in a secondary structure.
https://prosite.expasy.org/scanprosite/


I don’t know of any tool that would search structures and not only sequences. But if your motif isn’t too short or simple, ScanProsite should return a small enough number of hits that you could then inspect structures (or AlphaFold models) manually.


I hope this helps,

Guillaume

On 19 Oct 2021, at 20:43, Jan van Agthoven wrote:



Dear all,
I apologize for the off-topic question. I’d like to search for a particular amino-acid sequence motif in the protein sequence databases (Swiss-Prot, UniProt, etc.) with the following criterion:

  *   It should not be inside a secondary structure element.

Does anyone know a program that could do that?
Thanks,
Jan



To unsubscribe from the CCP4BB list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB=1

E-mailing Uppsala University means that we will process your personal data. For more information on how this is performed, please read here: http://www.uu.se/en/about-uu/data-protection-policy

[ccp4bb] Scientist - CMCF Beamlines, Canadian Light Source

2021-10-19 Thread Michel Fodje
The Canadian Light Source Inc. (CLSI) is the national synchrotron research 
facility on the University of Saskatchewan campus in Saskatoon. This facility 
serves national and international users from academia, industry, and 
government institutions. The CLS facilitates scientific research aimed at 
finding solutions for global challenges in agriculture, health, advanced 
materials, and the environment.

The CLSI is now accepting applications for a Scientist at the Canadian 
Macromolecular Crystallography Facility (CMCF) within the Bio/Life Sciences 
department. The CMCF operates two beamlines, CMCF-ID and CMCF-BM, which enable 
high-resolution structural studies of proteins, nucleic acids, and other 
macromolecules, satisfying the requirements of challenging and diverse 
structural biology experiments. It is a unique national facility that supports 
Canadian macromolecular crystallography labs.

Responsibilities:

  *   Provides user support to general users, Beam Team members, 
fee-for-service clients and CLS staff. Provides the following functions:
      *   Beamline and endstation setup
      *   Assistance with sample preparation
      *   User training and support, including supervision
      *   Technical troubleshooting
  *   Solves difficult, complex or unusual problems relating to the beamline. 
Maintains and defines specifications for beamline software, contributes to 
its development, and carries out technical troubleshooting.
  *   Prepares beamline documentation in the form of written reports, 
guidelines, and logbooks, ensuring CLS standards, and other relevant codes, 
standards and practices are met.
  *   Participates in the installation, maintenance, periodic upgrades and 
operation of specific components or endstations.
  *   Maintains beamline Quality Assurance and preventative maintenance 
procedures for the safe and efficient operation of the beamline.
  *   Undertakes and oversees medium to large projects of high complexity on 
the beamline, for example, the operation, maintenance and upgrade of a 
scientific endstation, providing scientific leadership within a project team.
  *   May provide expert technical review of beamtime proposals and input to 
the Peer Review Committee (PRC) concerning general user proposals.
  *   Develops recognized scientific expertise related to the beamline or 
equipment. Develops an approved research program (independently or 
collaboratively) that positions the CLS as a solutions provider in one of the 
four key sectors of health, agriculture, environment, and advanced materials, 
utilizing up to 20% of assigned CLS resources (time, equipment, materials, 
beamtime).

Required Qualifications:

  *   All applicants are expected to have completed a relevant Ph.D.
  *   Experience: A minimum of 1-2 years of directly related experience in a 
synchrotron or scientific laboratory.
  *   Experience in X-ray crystallography is a requirement for this position
  *   A background with hard X-ray synchrotron-based techniques is an asset
  *   Python programming experience would be considered an asset
  *   This position will be filled by an individual with a clear aptitude for 
science and instrumentation, and with a desire to participate in an active and 
exciting scientific program

Remuneration:
Remuneration will be commensurate with qualifications and experience. A 
comprehensive benefits package, including supplemental health & dental, life 
insurance, pension plan, and four weeks’ vacation is part of a competitive 
compensation package.

To Apply:
Submit a resume along with references, in confidence, online at  
https://www.lightsource.ca/careers. Applications will be considered as of 
November 5, 2021. While all applicants are thanked for their interest, only 
short-listed candidates will be contacted.

Canadian Light Source Inc. is an equal opportunity employer and encourages 
members of designated groups (women, Indigenous people, people with 
disabilities and visible minorities) to self-identify on their applications.




This message was issued to members of www.jiscmail.ac.uk/CCP4BB, a mailing list 
hosted by www.jiscmail.ac.uk, terms & conditions are available at 
https://www.jiscmail.ac.uk/policyandsecurity/


Re: [ccp4bb] Search for a particular motif [off-topic]

2021-10-19 Thread Guillaume Gaullier
Hello,

ScanProsite almost does what you want, but since it only searches sequence 
databases, it has no notion of secondary structures and therefore cannot 
exclude motifs found in a secondary structure.
https://prosite.expasy.org/scanprosite/

I don’t know of any tool that would search structures and not only sequences. 
But if your motif isn’t too short or simple, ScanProsite should return a small 
enough number of hits that you could then inspect structures (or AlphaFold 
models) manually.
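
Once ScanProsite hits are in hand, the secondary-structure check could in principle be scripted rather than done by eye. The sketch below is only an illustration. It assumes that for each hit you already have the sequence and a per-residue DSSP-style string of the same length (for example taken from DSSP run on a PDB entry or an AlphaFold model). The function name, the choice of which DSSP codes count as "inside a secondary structure" (H, G, I, E, B here), and the toy sequence are assumptions, not part of ScanProsite or DSSP.

```python
import re

# DSSP codes treated here as "inside a secondary structure" (an assumption):
# H, G, I = helices; E = extended strand; B = isolated beta-bridge.
STRUCTURED_CODES = frozenset("HGIEB")


def motifs_outside_secondary_structure(sequence, dssp_string, pattern,
                                       structured=STRUCTURED_CODES):
    """Return (start, matched_text) for motif hits with no structured residue."""
    if len(sequence) != len(dssp_string):
        raise ValueError("sequence and DSSP string must have the same length")
    hits = []
    # re.finditer reports non-overlapping matches, scanning left to right.
    for match in re.finditer(pattern, sequence):
        window = dssp_string[match.start():match.end()]
        if not any(code in structured for code in window):
            hits.append((match.start(), match.group()))
    return hits


# Toy input: two RGD motifs, the first inside a helix, the second in a loop.
toy_sequence = "MKRGDLLAARGDPS"
toy_dssp = "--HHHHHHH-----"
print(motifs_outside_secondary_structure(toy_sequence, toy_dssp, r"RGD"))
```

With the toy input only the second RGD (start index 9) is reported, because the first overlaps residues marked H. A PROSITE pattern would first have to be translated into a Python regular expression by hand.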

I hope this helps,

Guillaume


On 19 Oct 2021, at 20:43, Jan van Agthoven wrote:

Dear all,
I apologize for the off-topic question. I’d like to search for a particular 
amino-acid sequence motif in the protein sequence databases (Swiss-Prot, 
UniProt, etc.) with the following criterion:

  *   It should not be inside a secondary structure element.

Does anyone know a program that could do that?
Thanks,
Jan





[ccp4bb] (off topic) Job posting - Associate Director, Protein Science

2021-10-19 Thread Artem Evdokimov
https://boards.greenhouse.io/roivantsciences/jobs/3549863

Finally posted! Thank you for looking.

Artem

- Cosmic Cats approve of this message





[ccp4bb] Search for a particular motif [off-topic]

2021-10-19 Thread Jan van Agthoven
Dear all,
I apologize for the off-topic question. I’d like to search for a particular 
amino-acid sequence motif in the protein sequence databases (Swiss-Prot, 
UniProt, etc.) with the following criterion:

  *   It should not be inside a secondary structure element.

Does anyone know a program that could do that?
Thanks,
Jan




[ccp4bb] two years (24 months) position for a postdoc c/o Free University of Bolzano

2021-10-19 Thread Benini Stefano
Dear All job seekers et al.,

We have a call for a two-year (24-month) postdoc position for a candidate with 
experience in protein purification, characterization and structure 
determination by X-ray crystallography for the project:

Discovering Apple Quality Markers for a Sustainable Conservation in Dynamic 
Controlled Atmosphere (DAMaSCo)

The sustainable reduction of post-harvest losses can be achieved by regulating 
apple metabolism during storage. A decrease of oxygen to concentrations ranging 
from 2-3 % to 1 % is used to extend the storage life of apples (Malus domestica) 
in controlled atmosphere (CA). In dynamic CA (DCA), oxygen levels 
are set at concentrations as low as 0.4 %. However, in DCA oxygen reduction 
reaches the lowest level tolerated by the fruit, with a high risk of severe 
quality loss.

The aims of the “DAMaSCo” project are:

  *   identification of quality markers (genes and metabolites) to prevent 
decay of apples during storage in DCA (University of Padua);
  *   characterization of the M. domestica enzymes differentially expressed in 
low oxygen conditions (University of Bolzano);
  *   elucidation of their reaction mechanisms and substrate specificity 
(University of Bolzano).

The “DAMaSCo” project brings together, in a multidisciplinary collaborative 
effort, Dr Stefano Benini (Free University of Bozen-Bolzano, UNIBZ), Prof. 
Benedetto Ruperti (University of Padua, UNIPD), and Prof. Francesco Musiani 
(University of Bologna, UNIBO).

The salary is €24,000 per year (€48,000 over the 24 months).
For the call and to apply online please visit:

https://www.unibz.it/en/home/position-calls/positions-for-academic-staff/5485-chimica-organica-dr-benini?group=18

Best regards

Stefano Benini, Ph.D. Assistant Professor

https://sbenini.people.unibz.it/bioorganic-chemistry-bio-crystallography-laboratory/

“And money wasn't what I had in mind. Oh God, no, what I wanted was to do good. 
I was dying to do something good.” Saul Bellow

“Article 21 of the Italian Constitution: Everyone has the right to freely 
express their thoughts in speech, in writing, and by any other means of 
dissemination.”
*
Bioorganic chemistry and Bio-Crystallography laboratory (B2Cl)
Faculty of Science and Technology, Libera Università di Bolzano
Piazza Università, 5
39100 Bolzano, Italy
Office (room K2.14):  +39 0471 017128
Laboratory (room E.021): +39 0471 017910
Fax: +39 0471 017009
https://sbenini.people.unibz.it/
orcid.org/-0001-6299-888X
Scopus: https://www.scopus.com/authid/detail.uri?authorId=7004187955







Re: [ccp4bb] am I doing this right?

2021-10-19 Thread James Holton

Thank you Gergely,

Oh, don't worry, I am not concerned about belief. Neither the model nor 
the data care what I believe.


What I am really asking is: what is the proper way to combine weak 
observations?


Right now, in pretty much all structural sciences we are not used to 
doing this, but we are entering an era where we will have to.


I was trying to ask a simple question with the 10x10 pixel patch because 
(as Graeme, Ian and others pointed out) it highlights how the solution 
must also apply to two patches of 50 pixels.  In reality, unfortunately, 
those two patches might not be next to each other and will have 
different Lorentz factors, polarization factors, absorption factors, and 
probably different partiality as well. These values are knowable, but 
they are not integers. The way we currently deal with all this is to 
first convert patches of pixels into an expectation and variance, then 
apply all the corrections, and finally "merge" everything with error 
propagation into a simple list of h,k,l,Iobs,sigIobs that we can compare 
to a PDB file.
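
For readers less familiar with the conventional pipeline described above, here is a minimal sketch of the "correct, then merge with error propagation" step. It is not taken from any data-reduction program. It assumes each observation arrives as a raw intensity, a raw sigma, and a single multiplicative correction factor (standing in for Lorentz, polarization, absorption and partiality), and it merges with inverse-variance weights. The function name and the tuple layout are made up for illustration.

```python
import math


def merge_observations(observations):
    """Merge (raw_intensity, raw_sigma, correction) tuples for one h,k,l.

    Each intensity and sigma is scaled by its correction factor, then the
    corrected values are combined with inverse-variance weights.
    Returns (Iobs, sigIobs).
    """
    weight_sum = 0.0
    weighted_intensity_sum = 0.0
    for raw_intensity, raw_sigma, correction in observations:
        corrected_sigma = raw_sigma * correction
        if corrected_sigma <= 0.0:
            # A zero sigma (e.g. a naive sqrt(0) for a zero-count patch) would
            # give an infinite weight, which is the weak-data problem at issue.
            raise ValueError("sigma must be positive for inverse-variance weighting")
        weight = 1.0 / corrected_sigma ** 2
        weight_sum += weight
        weighted_intensity_sum += weight * raw_intensity * correction
    return weighted_intensity_sum / weight_sum, math.sqrt(1.0 / weight_sum)


# Two observations of the same reflection with equal sigmas and no correction.
print(merge_observations([(10.0, 2.0, 1.0), (14.0, 2.0, 1.0)]))
```

The sketch deliberately refuses a sigma of zero, since that is where Gaussian-style propagation stops making sense for near-empty patches.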


You are absolutely right that the best thing to do would be fitting a 
model of the whole diffractometer and crystal, structure factors 
included, directly and pixel-by-pixel to the image data.  Some 
colleagues and I managed to do this recently 
(https://doi.org/10.1107/s2052252520013007). It is rather 
computationally expensive, but seems to be working.


I hope this will be a useful tool, but I don't think such an approach 
will ever completely supplant data reduction, as there are many 
advantages to the latter.  But only if you do the statistics right!  
This is why I asked the community so that folks cleverer and more 
experienced than I in such matters (such as yourself) can correct me if 
I'm getting something wrong.  And the community benefits from the 
discussion.


Thank you for your thoughtful and thought-provoking insights!

-James Holton
MAD Scientist


On 10/19/2021 2:05 AM, Gergely Katona wrote:

Dear James,

I am sorry to nitpick, but this is the answer to "what is my belief of 
expectation and variance if I observe a 10x10 patch of pixels with zero 
counts?" This will heavily depend on my model.
When I make predictions like this, my intention is not to replace the data with 
"new and improved" data that is closer to the Truth and deposit it in some 
database from a position of authority.

I would simply use it to validate my model. Well, my model expects the Iobs to 
be 0.01, but in fact it is 0. This may make me slightly worried, but then I 
look at the posterior distribution and I see 0 with the highest posterior 
probability, so I relax a bit: I do not have to throw out my model outright. 
Still, a better model may be out there.
For a Bayesian the data is fixed and holy; the model may change. And the 
question rarely manifests like that, so one does not have to spend a lot of time 
pondering whether a uniform distribution of the rate is compatible with my 
belief in some quantum process. Bayesian folks are pragmatic. Your question 
about "what is my belief about the slope and intercept of a line that is the 
basis of some time-dependent random process given my observations" is more 
relevant. It is straightforward to implement a Bayesian network to answer 
this question, and it will give you predictions that look deceptively like the 
data. Here, you only care about your prior belief about the magnitude of slope 
and intercept; the belief about what the rate may be independent of time is 
quite irrelevant, and so are the predictions it may make. And I guess you 
would not intend to deposit images that were generated by the predictions of 
these posterior models as the "new and improved data".

Best wishes,

Gergely


Gergely Katona, Professor, Chairman of the Chemistry Program Council
Department of Chemistry and Molecular Biology, University of Gothenburg
Box 462, 40530 Göteborg, Sweden
Tel: +46-31-786-3959 / M: +46-70-912-3309 / Fax: +46-31-786-3910
Web: http://katonalab.eu, Email: gergely.kat...@gu.se

-Original Message-
From: CCP4 bulletin board  On Behalf Of James Holton
Sent: 18 October, 2021 21:41
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] am I doing this right?

Thank you very much for this Kay!

So, to summarize, you are saying the answer to my question "what is the expectation 
and variance if I observe a 10x10 patch of pixels with zero counts?" is:
Iobs = 0.01
sigIobs = 0.01 (defining sigIobs = sqrt(variance(Iobs)))

And for the one-pixel case:
Iobs = 1
sigIobs = 1

but in both cases the distribution is NOT Gaussian, but rather exponential. And 
that means adding variances may not be the way to propagate error.

Is that right?

-James Holton
MAD Scientist



On 10/18/2021 7:00 AM, Kay Diederichs wrote:

Hi James,

I'm a bit behind ...

My answer about the basic question ("a patch of 100 pixels each with zero counts - 
what is the variance?") you ask is the following:

1) we all know the Poisson PDF 

[ccp4bb] webinar gene-to-structure workflow for membrane proteins

2021-10-19 Thread Hans Raaijmakers
Dear all,

Please note our October 20th webinar that showcases a gene-to-structure 
workflow for membrane proteins. Our speakers are

* Michael Liss (GeneArt/Thermo Fisher),
* Jens Frauenfeld (Salipro Biotech) 
* Ieva Drulyte (Thermo Fisher).

The webinar format is interactive, giving you an opportunity to find out about 
every aspect of this workflow. We’re particularly excited to show you a 
structure of apo-CXCR4 and look forward to seeing you on Wednesday:

https://www.labroots.com/webinar/membrane-proteins-gene-cryo-em-structure-thermo-fisher-geneart-salipro-biotech

Best wishes
Hans





Re: [ccp4bb] am I doing this right?

2021-10-19 Thread Kay Diederichs
James, 

I am saying that my answer to "what is the expectation and variance if I 
observe a 10x10 patch of pixels with zero
counts?" is Iobs=0.01 sigIobs=0.01 (and Iobs=sigIobs=1 if there is only one 
pixel) IF the uniform prior applies. I agree with Gergely and others that this 
prior (with its high expectation value and variance) appears unrealistic.

In your posting of Sat, 16 Oct 2021 12:00:30 -0700 you make a calculation of 
Ppix that appears like a more suitable expectation value of a prior to me. A 
suitable prior might then be 1/Ppix * e^(-l/Ppix) (Agostini §7.7.1). The 
Bayesian argument is IIUC that the prior plays a minor role if you do repeated 
measurements of the same value, because you use the posterior of the first 
measurement as the prior for the second, and so on. What this means is that 
your Ppix must play the role of a scale factor if you consider the 100-pixel 
experiment.
However, for the 1-pixel experiment, having a more suitable prior should be 
more important.
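
To make the role of the prior concrete: with the exponential prior 1/Ppix * e^(-l/Ppix), the posterior after n pixels has a closed form, namely the standard gamma-Poisson conjugate update. The sketch below only illustrates that update. The function name is invented, and the Ppix value of 0.01 is a placeholder, not the number computed in the earlier posting.

```python
def posterior_with_exponential_prior(counts, prior_mean):
    """Posterior mean and variance of the Poisson rate l (per pixel).

    Prior: exponential with mean prior_mean, i.e. Gamma(shape=1, rate=1/prior_mean).
    Data:  independent Poisson counts, one per pixel, sharing the same l.
    Posterior: Gamma(shape = 1 + sum(counts), rate = n_pixels + 1/prior_mean).
    """
    shape = 1 + sum(counts)
    rate = len(counts) + 1.0 / prior_mean
    return shape / rate, shape / rate ** 2


placeholder_ppix = 0.01  # hypothetical prior expectation, photons per pixel

# 100 pixels, all zero: the data and the prior contribute equally to the rate term.
print(posterior_with_exponential_prior([0] * 100, placeholder_ppix))

# 1 pixel, zero counts: the result stays close to the prior mean.
print(posterior_with_exponential_prior([0], placeholder_ppix))
```

For the single-pixel case the posterior mean is 1/101 with this placeholder, hardly moved from the prior, which is the sense in which the prior matters more there than in the 100-pixel experiment.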

best,
Kay




On Mon, 18 Oct 2021 12:40:45 -0700, James Holton  wrote:

>Thank you very much for this Kay!
>
>So, to summarize, you are saying the answer to my question "what is the
>expectation and variance if I observe a 10x10 patch of pixels with zero
>counts?" is:
>Iobs = 0.01
>sigIobs = 0.01 (defining sigIobs = sqrt(variance(Iobs)))
>
>And for the one-pixel case:
>Iobs = 1
>sigIobs = 1
>
>but in both cases the distribution is NOT Gaussian, but rather
>exponential. And that means adding variances may not be the way to
>propagate error.
>
>Is that right?
>
>-James Holton
>MAD Scientist
>
>
>
>On 10/18/2021 7:00 AM, Kay Diederichs wrote:
>> Hi James,
>>
>> I'm a bit behind ...
>>
>> My answer about the basic question ("a patch of 100 pixels each with zero 
>> counts - what is the variance?") you ask is the following:
>>
>> 1) we all know the Poisson PDF (Probability Distribution Function)  P(k|l) = 
>> l^k*e^(-l)/k!  (where k stands for an integer >=0 and l is lambda) which 
>> tells us the probability of observing k counts if we know l. The PDF is 
>> normalized: SUM_over_k (P(k|l)) for k=0...infinity is 1.
>> 2) you don't know before the experiment what l is, and you assume it is some 
>> number x with 0<=x<=xmax (the xmax limit can be calculated by looking at the 
>> physics of the experiment; it is finite and less than the overload value of 
>> the pixel, otherwise you should do a different experiment). Since you don't 
>> know that number, all the x values are equally likely - you use a uniform 
>> prior.
>> 3) what is the PDF P(l|k) of l if we observe k counts?  That can be found 
>> with Bayes theorem, and it turns out that (due to the uniform prior) the 
>> right hand side of the formula looks the same as in 1) : P(l|k) = 
>> l^k*e^(-l)/k! (again, the ! stands for the factorial, it is not a semantic 
>> exclamation mark). This is eqs. 7.42 and 7.43 in Agostini "Bayesian 
>> Reasoning in Data Analysis".
>> 3a) side note: if we calculate the expectation value for l, by multiplying 
>> with l and integrating over l from 0 to infinity, we obtain E(P(l|k))=k+1, 
>> and similarly for the variance (Agostini eqs 7.45 and 7.46)
>> 4) for k=0 (zero counts observed in a single pixel), this reduces to 
>> P(l|0)=e^(-l) for a single observation (pixel). (this is basic math; see 
>> also §7.4.1 of Agostini.)
>> 5) since we have 100 independent pixels, we must multiply the individual 
>> PDFs to get the overall PDF f, and also normalize to make the integral over 
>> that PDF to be 1: the result is f(l|all 100 pixels are 0)=n*e^(-n*l), where 
>> n=100 is the number of pixels (basic math). A more Bayesian procedure would 
>> be to realize that the posterior PDF P(l|0)=e^(-l) of the first pixel should 
>> be used as the prior for the second pixel, and so forth until the 100th 
>> pixel. This has the same result f(l|all 100 pixels are 0)=n*e^(-n*l) 
>> (Agostini § 7.7.2)!
>> 6) the expectation value INTEGRAL_0_to_infinity over l*n*e^(-n*l) dl is 1/n 
>> .  This is 1 if n=1 as we know from 3a), and 1/100 for 100 pixels with 0 
>> counts.
>> 7) the variance is then INTEGRAL_0_to_infinity over (l-1/n)^2*n*e^(-n*l) dl 
>> . This is 1/n^2
>>
>> I find these results quite satisfactory. Please note that they deviate from 
>> the MLE result: expectation value=0, variance=0 . The problem appears to be 
>> that a Maximum Likelihood Estimator may give wrong results for small n; 
>> something that I've read a couple of times but which appears not to be 
>> universally known/taught. Clearly, the result in 6) and 7) for large n 
>> converges towards 0, as it should be.
>> What this also means is that one should really work out the PDF instead of 
>> just adding expectation values and variances (and arriving at 100 if all 100 
>> pixels have zero counts) because it is contradictory to use a uniform prior 
>> for all the pixels if OTOH these agree perfectly in being 0!
>>
>> What this means for zero-dose extrapolation I have not thought about. At 
>> least it prevents 
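
Steps 5) to 7) of the quoted derivation can be checked numerically. The sketch below integrates f(l) = n*e^(-n*l) with a plain trapezoid rule and compares the result with 1/n and 1/n^2. The function name, the integration cutoff and the step count are arbitrary choices made for this illustration.

```python
import math


def posterior_moments(n_pixels, steps=200_000):
    """Mean and variance of f(l) = n*exp(-n*l) by trapezoid integration.

    This is the posterior for n pixels that all show zero counts under a
    uniform prior. The integral is cut off at l = 50/n, where exp(-n*l) is
    negligible.
    """
    upper = 50.0 / n_pixels
    h = upper / steps
    norm = first = second = 0.0
    for i in range(steps + 1):
        l = i * h
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end-point weights
        f = n_pixels * math.exp(-n_pixels * l)
        norm += weight * f * h
        first += weight * l * f * h
        second += weight * l * l * f * h
    mean = first / norm
    return mean, second / norm - mean ** 2


for n in (1, 100):
    mean, variance = posterior_moments(n)
    print(n, mean, 1.0 / n, variance, 1.0 / n ** 2)
```

The numerical values agree with 1/n and 1/n^2 to within the integration error, in contrast to the value of 100 obtained by adding per-pixel expectations of 1.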

Re: [ccp4bb] am I doing this right?

2021-10-19 Thread Gergely Katona
Dear James,

I am sorry to nitpick, but this is the answer to "what is my belief of 
expectation and variance if I observe a 10x10 patch of pixels with zero 
counts?" This will heavily depend on my model.
When I make predictions like this, my intention is not to replace the data with 
"new and improved" data that is closer to the Truth and deposit it in some 
database from a position of authority.

I would simply use it to validate my model. Well, my model expects the Iobs to 
be 0.01, but in fact it is 0. This may make me slightly worried, but then I 
look at the posterior distribution and I see 0 with the highest posterior 
probability, so I relax a bit: I do not have to throw out my model outright. 
Still, a better model may be out there.
For a Bayesian the data is fixed and holy; the model may change. And the 
question rarely manifests like that, so one does not have to spend a lot of time 
pondering whether a uniform distribution of the rate is compatible with my 
belief in some quantum process. Bayesian folks are pragmatic. Your question 
about "what is my belief about the slope and intercept of a line that is the 
basis of some time-dependent random process given my observations" is more 
relevant. It is straightforward to implement a Bayesian network to answer 
this question, and it will give you predictions that look deceptively like the 
data. Here, you only care about your prior belief about the magnitude of slope 
and intercept; the belief about what the rate may be independent of time is 
quite irrelevant, and so are the predictions it may make. And I guess you 
would not intend to deposit images that were generated by the predictions of 
these posterior models as the "new and improved data".

Best wishes,

Gergely


Gergely Katona, Professor, Chairman of the Chemistry Program Council
Department of Chemistry and Molecular Biology, University of Gothenburg
Box 462, 40530 Göteborg, Sweden
Tel: +46-31-786-3959 / M: +46-70-912-3309 / Fax: +46-31-786-3910
Web: http://katonalab.eu, Email: gergely.kat...@gu.se

-Original Message-
From: CCP4 bulletin board  On Behalf Of James Holton
Sent: 18 October, 2021 21:41
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] am I doing this right?

Thank you very much for this Kay!

So, to summarize, you are saying the answer to my question "what is the 
expectation and variance if I observe a 10x10 patch of pixels with zero 
counts?" is:
Iobs = 0.01
sigIobs = 0.01 (defining sigIobs = sqrt(variance(Iobs)))

And for the one-pixel case:
Iobs = 1
sigIobs = 1

but in both cases the distribution is NOT Gaussian, but rather exponential. And 
that means adding variances may not be the way to propagate error.

Is that right?

-James Holton
MAD Scientist



On 10/18/2021 7:00 AM, Kay Diederichs wrote:
> Hi James,
>
> I'm a bit behind ...
>
> My answer about the basic question ("a patch of 100 pixels each with zero 
> counts - what is the variance?") you ask is the following:
>
> 1) we all know the Poisson PDF (Probability Distribution Function)  P(k|l) = 
> l^k*e^(-l)/k!  (where k stands for an integer >=0 and l is lambda) which 
> tells us the probability of observing k counts if we know l. The PDF is 
> normalized: SUM_over_k (P(k|l)) for k=0...infinity is 1.
> 2) you don't know before the experiment what l is, and you assume it is some 
> number x with 0<=x<=xmax (the xmax limit can be calculated by looking at the 
> physics of the experiment; it is finite and less than the overload value of 
> the pixel, otherwise you should do a different experiment). Since you don't 
> know that number, all the x values are equally likely - you use a uniform 
> prior.
> 3) what is the PDF P(l|k) of l if we observe k counts?  That can be found 
> with Bayes theorem, and it turns out that (due to the uniform prior) the 
> right hand side of the formula looks the same as in 1) : P(l|k) = 
> l^k*e^(-l)/k! (again, the ! stands for the factorial, it is not a semantic 
> exclamation mark). This is eqs. 7.42 and 7.43 in Agostini "Bayesian Reasoning 
> in Data Analysis".
> 3a) side note: if we calculate the expectation value for l, by 
> multiplying with l and integrating over l from 0 to infinity, we 
> obtain E(P(l|k))=k+1, and similarly for the variance (Agostini eqs 
> 7.45 and 7.46)
> 4) for k=0 (zero counts observed in a single pixel), this reduces to 
> P(l|0)=e^(-l) for a single observation (pixel). (this is basic math; see also 
> §7.4.1 of Agostini.)
> 5) since we have 100 independent pixels, we must multiply the individual PDFs 
> to get the overall PDF f, and also normalize to make the integral over that 
> PDF to be 1: the result is f(l|all 100 pixels are 0)=n*e^(-n*l), where n=100 
> is the number of pixels (basic math). A more Bayesian procedure would be to 
> realize that the posterior PDF 
> P(l|0)=e^(-l) of the first pixel should be used as the prior for the second 
> pixel, and so forth until the 100th pixel. This has the same result f(l|all 
> 100 pixels are 0)=n*e^(-n*l) (Agostini § 

Re: [ccp4bb] am I doing this right?

2021-10-19 Thread Rasmus Fogh

Dear All,

This is way over my head, but did Jaynes not say something about getting 
different results depending on whether you assume both outcomes are 
possible? I think his example was that if you observe the sun rise for 
100 days in a row, the probability that it will rise on day 101 is no 
greater than about 99% - *provided* you can assume that there may be 
some days where the sun does not rise. In your case could it make a 
difference if you need to consider the possibility that the synchrotron 
is powered down or the shutter is closed, so there will never be any 
photons?
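
The "about 99%" figure is what Laplace's rule of succession gives under a uniform prior on the unknown probability: after s successes in n trials, the probability of success on the next trial is (s+1)/(n+2). Whether this is exactly the passage of Jaynes being recalled is an assumption. The short sketch below just evaluates the formula, and the function name is made up.

```python
from fractions import Fraction


def rule_of_succession(successes, trials):
    """P(next trial succeeds) under a uniform prior on the success probability."""
    return Fraction(successes + 1, trials + 2)


# 100 sunrises observed on 100 days.
p_next = rule_of_succession(100, 100)
print(p_next, float(p_next))
```

For 100 sunrises in 100 days this gives 101/102, roughly 0.990. If "no sunrise" is assumed impossible from the outset, the prior is no longer uniform and the formula does not apply.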


Yours,
Rasmus

On 18/10/2021 20:48, James Holton wrote:
HDF5 is still "framing", but using better compression than the "byte 
offset" one implemented in Pilatus CBFs, which has a minimum of one byte 
per pixel. Very fast, but not designed for near-blank images.


Assuming entropy-limited compression the ultimate data rate is the 
number of photons/s hitting the detector multiplied by log2(Npix) where 
Npix is the number of pixels. The reason it's log2() is because that's 
the number of bits needed to store the address of which pixel got the 
photon, and since the arrival of each photon is basically random further 
compression is generally not possible without loss of information.  
There might be some additional bits about the time interval, but it 
might be more efficient to store that implicitly in the framing. As long 
as storing "no photons" only takes up one bit that would probably be 
more efficient.


So, for a 100 micron thick sample, flux = 1e12 photons/s and ~4000 
pixels you get ~3.4 GB/s of perfectly and losslessly compressed data.  
Making it smaller than that requires throwing away information.
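
The entropy-limited estimate above reduces to a one-line formula, sketched here. The photon rate that actually reaches the detector depends on the scattered fraction, which the message does not state, so the numbers in the example are placeholders and are not an attempt to reproduce the ~3.4 GB/s figure. The function name is invented.

```python
import math


def entropy_limited_rate_bytes(photons_on_detector_per_s, n_pixels):
    """Bytes per second needed to store one pixel address per photon."""
    bits_per_photon = math.log2(n_pixels)  # address of the pixel that was hit
    return photons_on_detector_per_s * bits_per_photon / 8.0


# Placeholder inputs: 1e9 photons/s reaching a detector with 4096 pixels.
print(entropy_limited_rate_bytes(1e9, 4096))
```

Each photon costs log2(Npix) bits, so the rate scales linearly with the photon rate but only logarithmically with the pixel count.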


I'm starting to think this might be the best prior. If you start out 
assuming nothing (not even uniform), then the variance of 0 photons may 
well be infinite. However, it is perhaps safe to assume that the dataset 
as a whole has at least one photon in it. And then if you happen to know 
the whole data set contains N photons and you have F images of Q pixels, 
then maybe a reasonable prior distribution is Poissonian with 
mean=variance= N/F/Q photons/pixel ?


-James Holton
MAD Scientist

On 10/17/2021 11:30 PM, Frank von Delft wrote:
Thanks, I learnt two things now - one of which being that I'm credited 
with coining that word!  Stap me vittals...


If it's single photon events you're after, isn't it quantum statistics 
where you need to go find that prior?  (Or is that what you're doing 
in this thread - I wouldn't be able to tell.)


Also:  should the detectors change how they read out things, then?  
Just write out the events with timestamp, rather than dumping all 
pixels all the time into these arbitrary containers called "image".  
Or is that what's already happening in HDF5 (which I don't understand 
one bit, I should add).


Frank




On 17/10/2021 18:12, James Holton wrote:


Well Frank, I think it comes down to something I believe you were the 
first to call "dose slicing".


Like fine phi slicing, collecting a larger number of weaker images 
records the same photons, but with more information about the sample 
before it dies. In fine phi slicing the extra information allows you 
to do better background rejection, and in "dose slicing" the extra 
information is about radiation damage. We lose that information when 
we use longer exposures per image, and if you burn up the entire 
useful life of your crystal in one shot, then all information about 
how the spots decayed during the exposure is lost. Your data are also 
rather incomplete.


How much information is lost? Well, how much more disk space would be 
taken up, even after compression, if you collected only 1 photon per 
image?  And kept collecting all the way out to 30 MGy in dose? That's 
about 1 million photons (images) per cubic micron of crystal.  So, 
I'd say the amount of information lost is "quite a bit".


But what makes matters worse is that if you did collect this data set 
and preserved all information available from your crystal you'd have 
no way to process it. This is not because it's impossible, it's just 
that we don't have the software. Your only choice would be to go find 
images with the same "phi" value and add them together until you have 
enough photons/pixel to index it. Once you've got an indexing 
solution you can map every photon hit to a position in reciprocal 
space as well as give it a time/dose stamp. What do you do with 
that?  You can do zero-dose extrapolation, of course!  Damage-free 
data! Wouldn't that be nice. Or can you?  The data you will have in 
hand for each reciprocal-space pixel might look something like:
tic tic .. tic . tic ... tic tictic ... 
tictic.


So. Eight photons.  With time-of-arrival information.  How do you fit 
a straight line to that?  You could "bin" the data or do some kind of 
smoothing thing, but then you are losing information again. Perhaps 
also 
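
One way to avoid binning, offered here only as a sketch and not as the method the thread settles on, is to treat the arrival times as an inhomogeneous Poisson process with rate(t) = intercept + slope*t on [0, T] and maximise the unbinned log-likelihood, sum_i log(rate(t_i)) minus the integral of rate(t) from 0 to T. The function names, the crude grid search and the example arrival times are all assumptions made for illustration.

```python
import math


def log_likelihood(times, intercept, slope, t_end):
    """Unbinned log-likelihood for rate(t) = intercept + slope*t on [0, t_end]."""
    # A linear rate is non-negative on the interval if it is so at both ends.
    if intercept < 0 or intercept + slope * t_end < 0:
        return -math.inf
    rates = [intercept + slope * t for t in times]
    if any(r <= 0 for r in rates):
        return -math.inf  # a photon cannot arrive where the rate is zero
    expected_total = intercept * t_end + 0.5 * slope * t_end ** 2
    return sum(math.log(r) for r in rates) - expected_total


def fit_line(times, t_end, steps=200):
    """Grid search over the rate at t=0 and the rate at t=t_end."""
    # At the maximum the expected photon count equals len(times), so neither
    # end-point rate can exceed 2*len(times)/t_end.
    max_rate = 2.0 * len(times) / t_end
    best = (-math.inf, 0.0, 0.0)
    for i in range(steps + 1):
        start_rate = max_rate * i / steps
        for j in range(steps + 1):
            end_rate = max_rate * j / steps
            slope = (end_rate - start_rate) / t_end
            ll = log_likelihood(times, start_rate, slope, t_end)
            if ll > best[0]:
                best = (ll, start_rate, slope)
    return best[1], best[2]  # (intercept, slope)


# Hypothetical arrival times of eight photons, thinning out as the dose builds up.
example_times = [0.2, 0.5, 0.9, 1.4, 2.0, 2.9, 4.1, 6.0]
print(fit_line(example_times, t_end=8.0))
```

The fitted intercept plays the role of the zero-dose rate. With only eight photons the likelihood surface is broad, so a real analysis would need the uncertainty on the intercept as well as its best value.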

[ccp4bb] Faculty position (W3) in cryo-EM of membrane proteins

2021-10-19 Thread Roy Lancaster

Dear all,

we have a new tenured faculty position at the W3 level (full 
professor) available at Saarland University’s Faculty of Medicine in 
Homburg, Germany. This has been funded through a new initiative to 
further strengthen the cooperative ties between Saarland University 
and the Helmholtz Institute for Pharmaceutical Research Saarland 
(HIPS). In particular, we are looking for someone with an established 
record in the cryo-EM of membrane proteins yielding information 
relevant to structure-based drug design. The University is prepared 
to invest in new cryo-EM instrumentation and laboratory space 
to accommodate the new appointee.


Details of the position are available here:
https://www.uni-saarland.de/fileadmin/upload/verwaltung/stellen/Wissenschaftler/W1972_EN_W3-Structural_Biology_.pdf

Please feel free to contact me, if you would like to discuss any  
aspect of the position.


Thank you and best wishes

Roy Lancaster
--
Prof. Dr. C. Roy D. Lancaster
Lehrstuhl für Strukturbiologie
Medizinische Fakultät
Universität des Saarlandes
Gebäude 60
D-66421 Homburg (Saar) Germany
Tel: +49 6841 1626235
Fax: +49 6841 1626251
E-mail: roy.lancas...@structural-biology.eu
http://strukturbiologie.uni-saarland.de/

Zentrum für Human- und Molekularbiologie (ZHMB)
http://zhmb.uni-saarland.de


