Re: [ccp4bb] [RANT] Publication Data Formats
On Nov 16, 2010, at 10:57 PM, Ethan Merritt wrote:

Bleah. Virtually none of those are human-readable, no matter what the Wikipedia page may choose to put as a heading title. What kind of data are you dealing with? PDF would indeed be an odd format for diffraction images, but it would be miles better than most of the formats on the list you point to.

The operative word is dataset, which is a subset of all things data. A dataset should be in a format that

1. can be validated,
2. is structured, and
3. is machine readable.

A PDF file guarantees none of the above. It is a presentation format and is not optimized for validating, structuring, or ensuring the machine readability of the data that it might contain.

I'm not advocating for any particular serialization format, so this isn't about JSON v. XML religion wars. This is JSON or XML versus a file format that is basically designed to ferry presentation information between printers or computer screens.

James
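To make the three criteria concrete, here is a minimal Python sketch (standard library only) of a small tabular dataset serialized as JSON and checked against a schema. The field names, the `SCHEMA` dict and the `validate` helper are hypothetical illustrations, not any journal's or repository's actual requirement; a real deposition would more likely rely on an established schema language.

```python
import json

# Hypothetical schema for illustration: field name -> expected Python type.
SCHEMA = {"accession": str, "swissprot_id": str, "score": float}


def validate(records):
    """Return a list of problems found; an empty list means every record passes."""
    problems = []
    for i, record in enumerate(records):
        for field, expected in SCHEMA.items():
            if field not in record:
                problems.append(f"record {i}: missing field {field!r}")
            elif not isinstance(record[field], expected):
                problems.append(
                    f"record {i}: {field!r} should be {expected.__name__}"
                )
    return problems


# Placeholder rows standing in for a supplementary table.
records = [
    {"accession": "ACC001", "swissprot_id": "P12345", "score": 87.5},
    {"accession": "ACC002", "swissprot_id": "Q67890", "score": 42.0},
]

text = json.dumps(records, indent=2)  # structured, machine-readable serialization
restored = json.loads(text)           # any consumer can parse it back losslessly
print(validate(restored))
```

Running it prints an empty list; changing a `score` to a string such as `"high"` would instead report a type problem for that record, which is the kind of check a PDF table cannot support.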
Re: [ccp4bb] expression of Cys-rich small protein
We are trying to express for structural studies a 257 AA eukaryotic intracellular [...]

From your description, I guess your protein may require disulfide bonds to fold correctly

As Laurie described, her protein is intracellular, so it will not need disulfide bonds, unless it has some really weird compartmentalization. To add to the nice tricks about E. coli strains that form disulfide bonds: if your institute is well equipped for cell culture, just secrete it from HEK293 cells. It's easier and cheaper than most people think. All you need is a good person to show you the basic tricks, a hood, and an incubator, which most places have anyway.

A.

On Nov 16, 2010, at 18:22, van dat nguyen wrote: Hi Laurie, What E. coli strain did you use? From your description, I guess your protein may require disulfide bonds to fold correctly. The E. coli cytoplasm is a reducing environment, which is not suitable for making disulfide-bonded proteins. To solve this problem, it is recommended to use an E. coli strain with a less reducing cytoplasm, e.g. the SHuffle® T7 strain (from NEB) or the Rosetta-gami strain, or to express cysteine-rich proteins in the periplasm of E. coli. Using those strains with co-expression of protein disulfide isomerase would be worth trying. In addition, our group has recently developed a very good system for expressing disulfide-bonded proteins in the E. coli cytoplasm; please have a look at this paper: http://www.ncbi.nlm.nih.gov/pubmed/20836848 A better system will be published soon. Best Wishes, Dat

On Tue, Nov 16, 2010 at 6:13 PM, Laurie Betts laurie.betts0...@gmail.com wrote: All - We are trying to express for structural studies a 257 AA eukaryotic intracellular (also possibly nuclear) protein (predicted to be single-domain, all-helical) that has 12 cysteines. No known metal-binding function, not that it couldn't happen. So far (E. coli) it expressed solubly as an MBP fusion (with an N-terminal region, predicted to be disordered, deleted) until cleavage of MBP; after that it is not soluble, even with detergents added. The MBP fusion is usually a soluble aggregate, so we assume that our part is not folded right. We have so far assumed it needs a lot of reducing agent (5 mM DTT or TCEP). Thinking of trying chaperones and insect cells next. Any experience out there that might help? Mostly I wonder about all the cysteines. Don't really know if that is the problem. Laurie Betts

Anastassis (Tassos) Perrakis, Principal Investigator / Staff Member Department of Biochemistry (B8) Netherlands Cancer Institute, Dept. B8, 1066 CX Amsterdam, The Netherlands Tel: +31 20 512 1951 Fax: +31 20 512 1954 Mobile / SMS: +31 6 28 597791
Re: [ccp4bb] [RANT] Publication Data Formats
Dear James,

On Wed, Nov 17, 2010 at 12:12:10AM -0800, James Stroud wrote: [...] The operative word is dataset, which is a subset of all things data. A dataset should be in a format that
1. can be validated
2. is structured
3. is machine readable

What do these items have to do with a journal or an article therein? Why should a journal be concerned with conserving data?

Tim [...]

-- 
Tim Gruene Institut fuer anorganische Chemie Tammannstr. 4 D-37077 Goettingen phone: +49 (0)551 39 22149 GPG Key ID = A46BEE1A
[ccp4bb] Heavy atom salt at low pH
Hi All, Sorry for a non-CCP4 question. I have been trying to phase a protein structure using different heavy-atom derivatives. The problem is that the crystallization pH is very low (2.8 to 3.5). I would be grateful if anybody could suggest possible heavy-atom salts to try. Sincerely, Debajyoti
Re: [ccp4bb] [RANT] Publication Data Formats
On Nov 17, 2010, at 1:24 AM, Tim Gruene wrote: [...] What do these items have to do with a journal or an article therein? Why should a journal be concerned with conserving data?

For posterity.

I did a 5-minute search for an example, and the best I could do with the patience I had was this: http://onlinelibrary.wiley.com/doi/10.1002/pmic.200700038/suppinfo

You'll see Tables S1-S3 in the available PDF file. Were I to look for any significant amount of time, I could find much more egregious examples. For this particular example, your eyes may deceive you into thinking that the PDF file can be parsed and the data in the tables extracted with a script of some sort. But, if you have the patience, go to Table S3 and start selecting text at Accession Number in the heading. You'll find that the selection goes down that column only about halfway and then begins selecting at the next column, Swissprot Identifier.

So basically, the data represented in these tables are useless for any computational analysis by the end user except for (1) those who wish to type the data in by hand or (2) individuals like Dr. Merritt who can presumably just read the data and do the analysis in cranio.

James
Re: [ccp4bb] autoSHARP/SHELXD : Scatter plot program for OS X
Dear Raspudin, you should already have that binary in /where/ever/sharp/helpers/darwin/plotmtv

Cheers

Clemens

On Tue, Nov 16, 2010 at 08:52:10PM +0100, Raspudin wrote: Dear all, Could anyone please suggest a program to open scatter plots (.mtv file format) from a SHELXD/autoSHARP run on the OS X platform? Thank you, Raspudin

-- 
Clemens Vonrhein, Ph.D. vonrhein AT GlobalPhasing DOT com, Global Phasing Ltd., Sheraton House, Castle Park, Cambridge CB3 0AX, UK, BUSTER Development Group (http://www.globalphasing.com)
Re: [ccp4bb] [RANT] Publication Data Formats
On Nov 17, 2010, at 1:42 AM, James Stroud wrote: You'll find that the selection goes down that column only about half way and then begins selecting at the next column, Swissprot Identifier.

I forgot to mention that the point of this sentence is that the semantics of the data, which are conveyed by the visual appearance of the document, are scrambled relative to the structure of the data within the document, rendering the data useless to any reasonable parser. This scrambling is the rule rather than the exception for PDF files. James
Re: [ccp4bb] [RANT] Publication Data Formats
Hi,

On Wed, Nov 17, 2010 at 01:42:40AM -0800, James Stroud wrote: [...] But, if you have the patience, go to Table S3 and start selecting text at Accession Number in the heading. You'll find that the selection goes down that column only about half way and then begins selecting at the next column, Swissprot Identifier.

Pick a better PDF viewer: with my version of xpdf (on Ubuntu 10.04) I can easily select that table over three pages and get a reasonably good-looking ASCII representation of it. Takes about 10 seconds... Acrobat Reader is not very good at selecting text in PDF files. I don't know about others, but xpdf is really good at it.

Cheers

Clemens
Re: [ccp4bb] [RANT] Publication Data Formats
On Nov 17, 2010, at 2:01 AM, Tim Gruene wrote: the supplement you are referring to does not seem to be an appropriate example. If I use pdftotext the result is the attached file pro200700038_s.txt, from which I could extract the information of table S3 column-wise (except for the protein names, which have no field separators and contain an uncertain number of words per entry)

So basically, if you manually adjust the field separators after carefully inspecting the data, after trying several different ways to convert it, etc., you can get the data out. I concede. It's better for numerous people who want to use the data to spend effort reverse-engineering the tables than for the one guy writing the paper to engineer them properly in the first place. What was I thinking? James
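For what it's worth, the column-wise extraction Tim describes can be scripted. The sketch below assumes the text was produced with `pdftotext -layout` (so columns are padded with runs of spaces) and that words inside a field, such as a multi-word protein name, are separated by single spaces only. `split_layout_line` is a hypothetical helper and the sample lines are invented placeholders, not data from the cited supplement.

```python
import re

# Assumption: in layout-preserved text, columns are separated by runs of
# two or more spaces, while words within one field are separated by one.
COLUMN_GAP = re.compile(r" {2,}")


def split_layout_line(line):
    """Split one line of layout-preserved text into fields on runs of >= 2 spaces."""
    return [field for field in COLUMN_GAP.split(line.strip()) if field]


sample = [
    "ACC001    P12345    Heat shock protein 70      12",
    "ACC002    Q67890    Actin                      7",
]
rows = [split_layout_line(line) for line in sample]
for row in rows:
    print(row)
```

The heuristic breaks as soon as two columns are separated by a single space, or a name itself contains a double space, which is exactly the kind of manual inspection James objects to.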
Re: [ccp4bb] [RANT] Publication Data Formats
Dear all,

Irrespective of whether one program or another can extract the information James was looking for, I suppose everybody agrees that the suggested methods are fairly awkward and cumbersome. And after all, this was not the main question posed by James.

The 'torch and pitchfork march' you were referring to would probably consist of gathering people to sign a list with which you can approach the journal of your choice in order to change its policy. If that does not work, you can try to convince people not to publish in that journal until it changes its mind.

In the long term, though, it should not be the journals' task to archive data, but that of a central database analogous to the PDB, the (I)CSD, http://arxiv.org/, etc. So you would have to sit down with people from the corresponding field and discuss how to set up such a data bank. Once this is established, you can still make use of the second paragraph of my email in order to press journals to accept articles for publication only after the data have been deposited in the corresponding database.

Tim

On Wed, Nov 17, 2010 at 12:12:10AM -0800, James Stroud wrote: [...]
[ccp4bb] - subtracting diffraction images?
Hi there, I am trying to compare two diffraction images from a Mar detector, collected before and after irradiation. So far I have used the marcombine program to subtract the diffraction images, but this gives me no statistical information. Can anyone suggest either another good program for examining the differences between the images or a way to obtain some statistics on the subtracted image? Cheers, Mark
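One way to get numbers out of a subtraction is to do it yourself on the pixel arrays. The NumPy sketch below assumes both images have already been read into same-shaped arrays (for example with a detector-image reader such as fabio; reading Mar files is not shown). The function name, the returned keys and the Poisson-based significance count are illustrative assumptions, not features of marcombine.

```python
import numpy as np


def difference_stats(before, after, mask=None, k=3.0):
    """Summary statistics for (after - before) of two same-sized detector images.

    mask: optional boolean array selecting pixels to include (e.g. to exclude
    the beamstop shadow or overloads). k: threshold in estimated sigmas.
    """
    before = np.asarray(before, dtype=np.float64)
    after = np.asarray(after, dtype=np.float64)
    if before.shape != after.shape:
        raise ValueError("images must have the same dimensions")
    if mask is not None:
        before, after = before[mask], after[mask]
    diff = after - before
    # Crude per-pixel sigma assuming Poisson counting statistics; this is an
    # approximation, since image-plate and CCD counts include a gain factor.
    sigma = np.sqrt(np.clip(before + after, 1.0, None))
    total = before.sum()
    return {
        "mean": float(diff.mean()),
        "std": float(diff.std()),
        "min": float(diff.min()),
        "max": float(diff.max()),
        "rel_total_change": float(diff.sum() / total) if total else float("nan"),
        "n_significant": int(np.count_nonzero(np.abs(diff) > k * sigma)),
    }


# Synthetic 4x4 example: one pixel gains 100 counts after irradiation.
before = np.full((4, 4), 100.0)
after = before.copy()
after[0, 0] = 200.0
print(difference_stats(before, after))
```

The relative change in total counts gives a crude overall decay indicator, while the sigma-thresholded count points to how many pixels changed beyond what counting noise alone would plausibly explain.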
Re: [ccp4bb] Heavy atom salt at low pH
Hi Debajyoti, For my low-pH phasing (pH 3), the thing that worked for me was the Magic Triangle. It works best at Cu Kα (i.e. a home source). There is a MAD Triangle that works at synchrotron sources as well. You can get it all made up from Hampton or Jena Bioscience, but you can also buy the chemical directly from Fisher or Sigma and make it up in LiOH or KOH for super cheap. You can check out the papers here: http://www.ncbi.nlm.nih.gov/pubmed/19020356 or here: http://www.ncbi.nlm.nih.gov/pubmed/19851024 Best of luck, Katherine

On Wed, Nov 17, 2010 at 4:31 AM, Debajyoti Dutta debajyoti_dutt...@rediffmail.com wrote: [...]
Re: [ccp4bb] Digital microscope camera
On 11/16/10 19:44, Julian Nomme wrote: Dear all, We are investigating the possibility of upgrading from an old 35mm film camera to a digital camera for taking crystal pictures through a microscope...

We are using an internet webcam for that purpose, the IQinvision IQeye3 (probably no longer available; it must be about 8 years old). It has its own network port and embedded web server. This allows us to connect to it from any computer on the lab network equipped with a browser. This circumvents the whole argument over which platforms are supported, which is nice because usually that question revolves around Windows vs. Mac, and some of us use Linux.

Another possibility you might consider is a digital microscope. In this setup, a microscope is built around a camera, with no eyepieces. The image is viewed on a computer monitor. That means the optics can be better optimised for the camera, and new features may be enabled, such as a scale bar overlaid on the images (since the camera knows the zoom setting). I don't know if any manufacturer has yet ventured into digital microscopes with stereovision, or ones that are suitable for crystallography.

Cheers, -- === All Things Serve the Beam === David J. Schuller modern man in a post-modern world MacCHESS, Cornell University schul...@cornell.edu
[ccp4bb] CCP4 Study Weekend - January 5th-7th 2011
Dear All, A quick reminder about this coming January's CCP4 Study Weekend, entitled "Model Building, Refinement and Validation" (January 5th-7th 2011). The deadline for the first round of registrations is Monday the 22nd of November, after which the registration price will increase from £210 to £260, so please get your registration in before then. This time the meeting will be held at Warwick University in the UK. For more details and to register, please see the Study Weekend website at: http://www.cse.scitech.ac.uk/events/CCP4_2011/ We hope to see you there. Best wishes, Ronan Ronan Keegan CCP4 Group
Re: [ccp4bb] Heavy atom salt at low pH
Crystals of tendamistat were grown from hydrochloric acid and solved by MIR. I do not recall anything special about the heavy-atom soaks, so try everything in your heavy-atom closet. What have you tried that has not worked?

From: CCP4 bulletin board [mailto:ccp...@jiscmail.ac.uk] On Behalf Of Debajyoti Dutta Sent: Wednesday, November 17, 2010 3:32 AM To: CCP4BB@JISCMAIL.AC.UK Subject: [ccp4bb] Heavy atom salt at low pH [...]
Re: [ccp4bb] [RANT] Publication Data Formats
Dear Colleagues,

In trying to see some virtue in the PNAS approach, one can imagine that not all deposited data can be characterised well enough for computers to parse them automatically. In such circumstances, a deposited PDF may be better than nothing at all. As yet, not all journal publishing platforms can or will serve a variety of different file formats, which is probably in part why PDFs are used, since they are easy to generate.

That said, I agree with previous postings today that journals should encourage authors to supply data in well-characterised, machine-readable formats, i.e. to the extent that this is feasible. For small-molecule crystal structures within IUCr Journal articles, and the associated crystal structure data sets, this is straightforward, since variants of the IUCr's CIF standard cover diffraction images, structure factors, and refined coordinates and ADPs. For protein crystal structures, as this CCP4bb well knows, articles are accompanied by RCSB deposition of coordinates and structure factors. Nevertheless, it would be good to see research scientists increasing the pressure on journals to deposit and disseminate supplementary data in machine-readable formats, since that would in the long run greatly increase the value of the deposited material.

An open-access paper I recently published with a colleague from the IUCr office discusses the importance of fully integrating experimental data with the finished research analysis, to complete the scientific record. See: Helliwell, J. R. & McMahon, B. (2010). The record of experimental science: archiving data with literature. Information Services and Use 30, 31-37; DOI: 10.3233/ISU-2010-0609. Many of the things we discuss in that article are equally relevant to supplementary information as discussed in this thread.
Yours sincerely, John Professor John R Helliwell DSc

On Wed, Nov 17, 2010 at 6:39 AM, James Stroud xtald...@gmail.com wrote: I was reading the PNAS author guidelines and I came across this gem: Datasets: Supply Excel (.xls), RTF, or PDF files. This file type will be published in raw format and will not be edited or composed. Did I read those last two file formats correctly? I have actually come across a dataset in supplementary information that was several dozen pages of PDF. It was effectively impossible to extract the data from this document. (I can dig it up if pressed, probably.) I had no idea that the authors may have been encouraged to submit their data like that. Does a premier scientific journal actually request data in PDF format? I can think of dozens of other formats that would be more fitting. They are summarized here: http://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats What is the scholarly equivalent of a torch and pitchfork march, and how can we organize such a march to encourage journals to require proper serialization formats for datasets in supplementary info? James P.S. I am aware that it is better to submit data to a dedicated repository, but let's consider those cases where research produces data for which there is not yet a dedicated repository.
[ccp4bb] Citations in supplementary material
Dear All, I would like to bring to your attention the recent Editorial in Acta Cryst D (http://journals.iucr.org/d/issues/2010/12/00/issconts.html), which highlights the long-standing issue of under-citation of papers published in the IUCr journals. The Editorial, having looked at the papers published in 2009 in Nature, Science, Cell and PNAS, concluded: 'almost half of all references to publications in IUCr journals end up being published in the supplementary material only... Not only does this mean that the impact factor of IUCr journals should be higher, but also that the real overall numbers of citations of methods papers are much higher than what is reported, for instance, by the Web of Science' Although this topic may seem to concern mostly methods developers, I think the whole research community can only benefit from giving fairer credit to our colleagues by citing their publications. What do you think? Victor
Re: [ccp4bb] Heavy atom salt at low pH
You may want to try the Heavy-Atom Database System (HATODAS): http://hatodas.harima.riken.go.jp/ Version 2 (http://hatodas.harima.riken.go.jp/hatodas_v2/query/queryheavy.jsp) allows pH to be used as a search option. Hope that helps, Sean
Re: [ccp4bb] Citations in supplementary material
Thank you, Victor! This has been my complaint for many years, and I have pointed it out orally at a number of conferences, concerning both publications and results presented in posters. Very often we see a structure presented as given, with no mention of which (often numerous and sophisticated!) computational and theoretical tools were used to obtain it. After this, it is no surprise that:
- methods developers are poorly supported, except for a few big factories (my impression; maybe I'm wrong?);
- scientific administrators believe that everything in crystallography has already been done, and that it is no longer a science but pure routine.

Obviously, there are limits beyond which citations are useless, but at least some basic references must always be given. I think that group leaders should play the major role in correcting the situation described by Victor, by always making proper citations and by no longer sawing off the branch we are all sitting on.

With best wishes, Sacha Urzhumtsev

-Original Message- From: CCP4 bulletin board [mailto:ccp...@jiscmail.ac.uk] On Behalf Of Victor Lamzin Sent: Wednesday, 17 November 2010 17:06 To: CCP4BB@JISCMAIL.AC.UK Subject: [ccp4bb] Citations in supplementary material [...]
Re: [ccp4bb] Citations in supplementary material
Given that an increasing amount of material is going into supplementary data, it would be better if the citation indexers could be persuaded to count references in supplementary material. I see no reason why they shouldn't. Phil

On 17 Nov 2010, at 16:06, Victor Lamzin wrote: [...]
[ccp4bb] EMDataBank.org News -- Change in EMDB Hold Policy for Maps
17-November-2010 Announcement - Change in EMDB Hold Policy for Maps. Effective January 2011, the option to hold EM map volumes for two years before release to the public will no longer be available to EM Data Bank depositors. This policy change implements a recent recommendation of the EMDB Advisory Committee and reflects the strong support in the community for eliminating long hold periods for scientific data. EMDB depositors will continue to have the following options for release of maps and associated masks, structure factors, and/or layer-line data: release immediately, hold until publication, or hold for 1 year. These are the same options that apply to atomic models and associated experimental data deposited in the Protein Data Bank (PDB), ensuring consistent hold policies across these closely related core resources of biomacromolecular structure data. The full policy can be viewed at http://emdatabank.org/hold_policy.html. Please contact us at h...@emdatabank.org with any questions or comments about this policy change. The EMDataBank.org Team
[ccp4bb] Off Topic - Nickel Column
I have a His-tagged protein which I am coexpressing with its binding partner to prevent proteolysis. Once the complex is on the nickel column, I can remove 80% of the partner by flushing 2 L of 1.3 M NaCl solution buffered at pH 8.5 overnight. However, the last 20% is difficult to remove, even if I reload the nickel column and flush a further 2 L of salt solution. I am wondering if I can increase the pH to 9.0 or 9.5. It should not affect binding of the His-tag to the nickel, as the His-tag has to be deprotonated to bind, but will it cause the nickel to be stripped? Thanks Dan
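A quick Henderson-Hasselbalch check of the protonation argument, assuming a textbook side-chain pKa of about 6.0 for free histidine (the effective value in a tag bound to Ni-NTA will differ): the imidazole is already almost entirely deprotonated at pH 8.5, so raising the pH gains little on that front. This sketch says nothing about whether the resin tolerates higher pH, which remains the open question.

```python
def fraction_deprotonated(pH, pKa=6.0):
    """Henderson-Hasselbalch: fraction of imidazole in the neutral (deprotonated) form.

    pKa=6.0 is an assumed textbook value for a free histidine side chain.
    """
    return 1.0 / (1.0 + 10 ** (pKa - pH))


for pH in (7.5, 8.5, 9.0, 9.5):
    print(f"pH {pH}: {fraction_deprotonated(pH):.4f}")
```

Between pH 8.5 and 9.5 the computed fraction rises by well under one percentage point.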
Re: [ccp4bb] Citations in supplementary material
Dear Victor, I strongly support the stance taken in the Acta D Editorial. Manfred Weiss worked very hard, over quite some time, assembling those details; he deserves our thanks. Greetings, John

On Wed, Nov 17, 2010 at 4:06 PM, Victor Lamzin vic...@embl-hamburg.de wrote: [...]

-- 
Professor John R Helliwell DSc
Re: [ccp4bb] Citations in supplementary material
Another unfortunate aspect of this sort of editorial policy is that many of these papers contain almost no technical information at all, except in the supplement. I've started to avoid using Nature papers for class discussions because they leave the students so puzzled, and with a glossiness-is-all-that-matters idea of science. = Phoebe A. Rice Dept. of Biochemistry & Molecular Biology The University of Chicago phone 773 834 1723 http://bmb.bsd.uchicago.edu/Faculty_and_Research/01_Faculty/01_Faculty_Alphabetically.php?faculty_id=123 http://www.rsc.org/shop/books/2008/9780854042722.asp

Original message Date: Wed, 17 Nov 2010 17:12:26 + From: CCP4 bulletin board CCP4BB@JISCMAIL.AC.UK (on behalf of John R Helliwell jrhelliw...@gmail.com) Subject: Re: [ccp4bb] Citations in supplementary material To: CCP4BB@JISCMAIL.AC.UK [...]
Re: [ccp4bb] Citations in supplementary material
Supplementary info seems to me to be a double-edged sword: I just read a Nature article that had 45 pages of supplementary info. This means that you get a lot more for your money, but all of the methods and their references go by the wayside if you are reading just the paper. Why not let papers be as long as the authors want, now that almost everything is internet-based? It would make the papers much more organized overall, and would obviate the reference issue mentioned in this thread. To keep them from being too long, reviewers could object to long-windedness, etc. But it would definitely make for a more complete lab notebook of the scientific community, assuming that that is what we are after.

Incidentally, I have been curious in the past why journals themselves are not going out the window: why not have individual labs just post their most recent data and interpretations on their own websites, perhaps with a comments section? (I know there are about a thousand cynical reasons why not...) One could even have a place for a reliability rating or impact rating on each new chunk of data. Anyway, it would be much more like a real-time, public lab notebook, would make interaction much faster, and would cut out the publishing middlemen. JPK

On Wed, Nov 17, 2010 at 11:48 AM, Phoebe Rice pr...@uchicago.edu wrote: [...]
[ccp4bb] Postdoc position at the University of Bayreuth, Germany
Postdoc Position in Protein Crystallography A postdoctoral research position in protein crystallography is available in the group of Clemens Steegborn at the University of Bayreuth, Germany. Research in the laboratory is focused on understanding the molecular signaling mechanisms involved in aging processes and disease. In particular, we study deacetylases of the Sirtuin family and the cyclic nucleotide signaling network. We use protein crystallography, combined with biochemical and enzymological methods and bioinformatics approaches, to obtain a molecular understanding of these interconnected signaling systems and to develop compounds for their modulation. We are looking for a Postdoc with experience in protein x-ray crystallography to join our team studying the regulation of Sirtuins. Our laboratory at the University of Bayreuth offers access to state-of-the-art equipment for protein biochemistry and crystallography, and a stimulating environment for research in structural biology, with two collaborating protein crystallography units (Steegborn lab and Blankenfeldt lab). We also have access to state-of-the-art synchrotron beamlines, and strongly interact with other labs within the Research Institute for Biomacromolecules (spectroscopy, NMR, bioinformatics) and the Department of Chemistry and Biology (cell biology, genetics, synthetic chemistry). Our lab offers excellent research opportunities and a stimulating environment for research in Structural Biology. The ideal candidate is a highly motivated Ph.D. with an interest in medically relevant questions. Experience in protein purification, crystallization and x-ray structure analysis is ABSOLUTELY MANDATORY. Additional experience in molecular biology would be an asset.
Applications (including CV, research experience, and the names and contact information of at least two references) should be sent, preferably by email as a PDF attachment, to clemens.steegb...@uni-bayreuth.de. Evaluation of applications will start by December 1st. --- Prof. Dr. Clemens Steegborn University of Bayreuth Dept. Biochemistry, NW I Universitaetsstr. 30 95447 Bayreuth, Germany phone: ++49 0921 / 55 - 2421 fax: ++49 0921 / 55 - 2432 email: clemens.steegb...@uni-bayreuth.de web: http://www.biochemie.uni-bayreuth.de
Re: [ccp4bb] [RANT] Publication Data Formats
Alas, 95% of science seems to be converting data from one file format to another. I hate all file formats. There are FAR FAR too many of them! For example, the suffix PDF can also mean Powder Diffraction File, which, believe it or not, IS a very common machine-readable scientific data format. Do you have a program that can read it? I thought so. Every time I encounter a new file format (which seems to happen every time I download a new computer program), I have to then go and figure out how to convert it into text and write an awk program for parsing it into something I can use. Strangely, this used to bother me a lot more than it does now. Perhaps this is because I have resigned myself to the fact that there is nothing I can do to stop the process of file format proliferation. That said, as file formats go, I don't think Adobe PDF is so bad. It has the advantage of being widespread enough that it probably won't go away for at least a few more decades, and it is a general way to represent anything that can go onto a printed page. Yes, it is a type-setting file format (every letter or word having a 2D coordinate, plus a font), and yes, such files are a pain to parse! I once burned up an entire week trying to extract author, title, journal, etc. from a pile of 300 sdarticle.pdf files. It is NOT easy! It was after this highly painful experience that I realized PDF files are not documents. They are annotated 2D images. I think the right way to think about PDF files is to consider them equivalent to a hard copy. For the younger readers: a hard copy is like a PDF file, but after you have killed a tree to print it out. Believe it or not, once upon a time there were no PDF files, and hard copy was the ONLY format for long-term archival storage. Your university probably still has a large number of hard-copy journals. They are usually located in that great big building called The Library, that you may or may not have been to.
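Holton's awk programs are not shown in the thread. As a hypothetical illustration of the "convert it into text, then parse it into something I can use" step, here is the same idea in Python (swapped in for awk), assuming the text is whitespace-separated numbers mixed with stray non-numeric lines such as captions or page labels. The function name `numeric_rows_to_csv` is made up for this sketch.

```python
import csv
import io


def numeric_rows_to_csv(text):
    """Return the all-numeric lines of `text` as CSV text.

    Hypothetical helper: assumes values are separated by whitespace and
    treats any line containing a non-numeric token (caption, heading,
    "Page 231") as something other than table data.
    """
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        try:
            for tok in tokens:
                float(tok)  # validate only; the original digits are kept
        except ValueError:
            continue  # not a data line
        rows.append(tokens)

    # A ragged table usually means a value was lost or split while copying.
    widths = {len(r) for r in rows}
    if len(widths) > 1:
        raise ValueError(f"inconsistent column counts: {sorted(widths)}")

    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()
```

Keeping the original token strings (rather than re-printing parsed floats) avoids silently changing trailing zeros or precision in the copied table.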
Fortunately, the technology for converting hard copy (or a PDF) into something useful is maturing rapidly. Not so long ago I was faced with trying to get a large table of numbers out of the International Tables for Crystallography (of which I only have a hard copy) and into a computer program for doing absorption corrections. After spending an hour or so typing in 5-digit numbers, I remembered that several years ago I had bought an $80 HP print/scan/fax machine that can produce a searchable PDF. I scanned in the table (after masking off the caption, etc., with blank sheets of paper), which produced a PDF file. I then easily selected the numbers in Acrobat, and pasted them into a text file. One awk script later, and I was done! I suppose what James Stroud would really like, however, is a way to select the area of the PDF digitally and then right-click for a "convert to ..." option. I don't think such a feature is available just yet. Google, however, has done a great deal of work on this, and they seem to be doing PDF-to-HTML conversion automatically now if you gmail yourself a PDF file. The best scriptable programs for getting data out of PDFs I have found so far are the Linux command-line converters from poppler and Ghostscript: pdftotext, pdf2ps, ps2ascii. Usually some combination of them will give you text with the formatting close to what you want. Or, if all else fails, print it, cut out the table you want with scissors, and then scan it back in with OCR turned on. You might even be able to outsource this by giving an undergrad the gift of their first trip to the Library. One day, when they become a scientist, they might think more carefully about how they format their supplementary documents. -James Holton MAD Scientist On 11/17/2010 7:41 AM, John R Helliwell wrote: Dear Colleagues, In trying to perhaps see some level of virtue in the PNAS approach one can imagine that not all deposited data can be well characterised in a way that is easy for computers to parse automatically.
In such circumstances, a deposited PDF may be better than nothing at all. As yet, not all journal publishing platforms can or will serve a variety of different file formats, which is probably in part why PDFs might be used, since they are easy to generate. That said, I agree with previous postings today that journals should encourage authors to supply data in well-characterised machine-readable formats, i.e., to the extent that this is feasible. For small molecule crystal structures within IUCr Journal articles, and associated crystal structure data sets, this is straightforward, since variants of the IUCr's CIF standard cover diffraction images, structure factors and refined coordinates and ADPs. For protein crystal structures, as this CCP4bb well knows, articles are accompanied by RCSB deposition of coordinates and structure factors. Nevertheless, it would be good to see research scientists increasing pressure on journals to deposit and disseminate
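To illustrate why a tag-value format like CIF is machine-readable in a way a PDF table is not, here is a deliberately minimal Python sketch that pulls single-line `_tag value` items out of a CIF data block. It is not a real CIF parser (a dedicated CIF library should be used for actual data); the function name is hypothetical, and the example tags follow mmCIF naming.

```python
def read_cif_items(text):
    """Return {tag: value} for single-line `_tag value` items in a CIF block.

    Minimal sketch: it skips loop_ tables, semicolon-delimited multi-line
    values and trailing comments, all of which a real CIF parser must handle.
    """
    items = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("_"):
            continue  # data_ header, loop_ keyword, loop rows, comments
        parts = line.split(None, 1)
        if len(parts) < 2:
            continue  # tag without an inline value (e.g. a loop_ column name)
        tag, value = parts
        value = value.strip()
        # Strip one pair of matching single or double quotes.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        items[tag] = value
    return items
```

The point is that each value arrives already labelled (`_cell.length_a`, `_symmetry.space_group_name_H-M`), so no guessing about column positions is needed.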
Re: [ccp4bb] [RANT] Publication Data Formats
On Nov 17, 2010, at 2:30 PM, James Holton wrote: I once burned up an entire week trying to extract author, title, journal, etc. from a pile of 300 sdarticle.pdf files. It is NOT easy! Drag & drop, it is that easy :-) And you can even export it into text afterwards. James, you should have invested in this program: http://mekentosj.com/papers/ The time (money/salary) you spent could have been very well invested in this program :-) And I know you have a Mac somewhere. Jürgen - Jürgen Bosch Johns Hopkins Bloomberg School of Public Health Department of Biochemistry & Molecular Biology Johns Hopkins Malaria Research Institute 615 North Wolfe Street, W8708 Baltimore, MD 21205 Phone: +1-410-614-4742 Lab: +1-410-614-4894 Fax: +1-410-955-3655 http://web.mac.com/bosch_lab/ http://web.me.com/bosch_lab/
Re: [ccp4bb] [RANT] Publication Data Formats
On Wednesday, November 17, 2010 01:42:40 am James Stroud wrote: I did a 5-minute search for an example, and the best I could do with the patience I had was this: http://onlinelibrary.wiley.com/doi/10.1002/pmic.200700038/suppinfo You'll see in the available PDF file Tables S1-S3. Were I to look for any significant amount of time, I could find much more egregious examples. For this particular example, your eyes may deceive you into thinking that the PDF file can be parsed and the data represented in the tables extracted with a script of some sort. But, if you have the patience, go to Table S3 and start selecting text at Accession Number in the heading. You'll find that the selection goes down that column only about half way and then begins selecting at the next column, Swissprot Identifier. So basically, the data represented in these tables is useless for any computational analysis by the end user except for (1) those who wish to type the data in by hand or (2) individuals like Dr. Merritt who can presumably just read the data and do the analysis in cranio.

merritt [36] which in_cranio
in_cranio: aliased to pdftotext -layout
merritt [37] in_cranio pro200700038_s.pdf

The result is a set of nicely formatted ASCII tables with column headings maintained correctly. OK, the sequence alignment is mangled, but that's not the data part. cheers, Ethan "40 years of augmenting brain cycles with code" Merritt -- Ethan A Merritt Biomolecular Structure Center, K-428 Health Sciences Bldg University of Washington, Seattle 98195-7742
Re: [ccp4bb] [RANT] Publication Data Formats
On Nov 17, 2010, at 11:46 AM, Bosch, Juergen wrote: On Nov 17, 2010, at 2:30 PM, James Holton wrote: I once burned up an entire week trying to extract author, title, journal, etc. from a pile of 300 sdarticle.pdf files. It is NOT easy! drag & drop, it is that easy :-) And you can even export it into text afterwards. James, you should have invested into this program: The program you suggest might be able to do author, title, and journal for many of the articles, but would likely bonk terribly on the etc. part. Anyone with a certain level of programming ability can dredge through PDF files. That's not the point. The point is that computers are unlike people in that they cannot yet decode semantics in the absence of structured context. Humans are good at this task; computers are not. For example, you understand the meaning of this paragraph, but a computer would just see a bunch of words...

<clause type="conditional" language="English">
  <condition>unless</condition>
  <subject>I</subject>
  <predicate>followed</predicate>
  <qualification>
    <adjective>certain</adjective>
    <adjective>prescribed</adjective>
    <object type="indirect">rules</object>
  </qualification>
</clause>

(And still the computer would have a lot of trouble identifying exactly who I was in that clause because the structure does not extend to broader context--I have conveniently not added the 'author' attribute to the clause.) Unlike the above XML*, the PDF file format is not required to give any indication of which of its bytes represent a certain type of data. James *Again, not advocating for XML or any other specific structured data format.
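To make the contrast concrete, here is a short Python sketch that runs the standard-library `xml.etree.ElementTree` module over the toy clause from the message above (a made-up markup, not a real schema). Because every value is wrapped in a tag, a program can ask for "the subject" or "the indirect object" directly instead of guessing from positions on a page.

```python
import xml.etree.ElementTree as ET

# The example clause from the message, with attribute values quoted so it is well-formed XML.
CLAUSE = """
<clause type="conditional" language="English">
  <condition>unless</condition>
  <subject>I</subject>
  <predicate>followed</predicate>
  <qualification>
    <adjective>certain</adjective>
    <adjective>prescribed</adjective>
    <object type="indirect">rules</object>
  </qualification>
</clause>
"""

root = ET.fromstring(CLAUSE.strip())

# The tags say what each piece of text *is*, so extraction needs no layout heuristics.
subject = root.find("subject").text
adjectives = [a.text for a in root.iter("adjective")]
indirect = root.find("qualification/object[@type='indirect']").text

print(subject, adjectives, indirect)  # I ['certain', 'prescribed'] rules
```

As the message notes, even this does not tell the program who "I" is; that would need broader context (e.g. an author attribute) in the markup.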
Re: [ccp4bb] [RANT] Publication Data Formats
Adobe Acrobat Pro should convert any PDF file (text or image) into a format that a text-editing program can read. --Chun From: CCP4 bulletin board [mailto:ccp...@jiscmail.ac.uk] On Behalf Of James Stroud Sent: Wednesday, November 17, 2010 12:36 PM To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] [RANT] Publication Data Formats
Re: [ccp4bb] [RANT] Publication Data Formats
On Nov 17, 2010, at 12:10 PM, Ethan Merritt wrote:

merritt [36] which in_cranio
in_cranio: aliased to pdftotext -layout
merritt [37] in_cranio pro200700038_s.pdf

This has just taken the data from one visual format to another, purely text-based, visual format. You still have to split it into text files manually and then import the data manually. Table S3 is split, so it will have to be imported into the spreadsheet program in three steps and then merged. We have to be careful when we split Best Ion Score, Best Ion C.I. %, and Coverage. These headings are separated by only a single space, so we can't use that spacing as an indicator. Try this: start your stopwatch, begin to convert all of these tables to spreadsheets or your favorite database format, validate that the import was correct and that the data types are what you expect, clean up extraneous information like (page n/N), save all the files, stop your stopwatch, and then tell us how long it took. That is the real task, not simply reformatting the data to pure text. If you think that this is an unreasonable request, then you are starting to get my point. James
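As a minimal sketch of the split, validate and save steps Stroud describes (not a complete solution, and not anything posted in the thread), the Python below infers column boundaries from runs of character positions that are blank in every data row of `pdftotext -layout`-style text, converts each cell to an expected type so that a bad import fails loudly, and writes CSV. It assumes header lines are handled separately and that at least `min_gap=2` blank positions separate columns; both are guesses a real table can violate, which rather supports his point. All function names are hypothetical.

```python
import csv
import io


def column_spans(rows, min_gap=2):
    """Infer (start, end) character ranges of columns in fixed-width rows.

    Assumption: a column boundary is a run of at least `min_gap` positions
    that are blank in *every* row. Header lines should be left out, since
    some headings are separated by only one space.
    """
    width = max(len(r) for r in rows)
    padded = [r.ljust(width) for r in rows]
    occupied = [any(r[i] != " " for r in padded) for i in range(width)]

    spans, start = [], None
    for i, occ in enumerate(occupied):
        if occ and start is None:
            start = i
        elif not occ and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, width))
    if not spans:
        return []

    # Merge spans separated by gaps narrower than min_gap (e.g. "Homo sapiens").
    merged = [spans[0]]
    for a, b in spans[1:]:
        if a - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], b)
        else:
            merged.append((a, b))
    return merged


def split_and_validate(rows, types, min_gap=2):
    """Split rows into cells and convert each cell with the matching type.

    Raises ValueError if the column count is wrong or a cell cannot be
    converted, i.e. if the import is not what we expected.
    """
    spans = column_spans(rows, min_gap)
    if len(spans) != len(types):
        raise ValueError(f"found {len(spans)} columns, expected {len(types)}")
    width = spans[-1][1]
    table = []
    for r in rows:
        padded = r.ljust(width)
        cells = [padded[a:b].strip() for a, b in spans]
        table.append([t(c) for t, c in zip(types, cells)])
    return table


def to_csv(table, header):
    """Serialize a header plus validated rows as CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(table)
    return out.getvalue()
```

A page-broken table like Table S3 would still need this run on each fragment (after removing "page n/N" lines) and the results concatenated, so the stopwatch keeps running.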
Re: [ccp4bb] how to optimize small rod-shaped crystals
Agree. Two comments for your reference: 1. When you have glycerol in your protein buffer, always add the same % glycerol to your reservoir solution. You have 10% glycerol in your protein buffer but not in your reservoir solution; the glycerol will outweigh all the other factors and draw water from the reservoir into your drop, diluting your protein and making it hard to grow big crystals. Try adding 10% glycerol to your reservoir solution to balance the glycerol effect, so that vapor moves from the drop to the reservoir as normal; this may give you another lucky direction. 2. Nickel phosphate crystals would be colored and are very hard to form under normal conditions. If you are worried about Ni, treat your sample with EDTA and then dialyze it against your protein buffer. Deqian --- On Tue, 11/16/10, Clement Angkawidjaja clem...@bio.mls.eng.osaka-u.ac.jp wrote: From: Clement Angkawidjaja clem...@bio.mls.eng.osaka-u.ac.jp Subject: Re: [ccp4bb] how to optimize small rod-shaped crystals To: CCP4BB@JISCMAIL.AC.UK Date: Tuesday, November 16, 2010, 8:19 PM I strongly agree with Eric Larson’s suggestion to try to see the diffraction of your crystal. It is the most straightforward solution. Other suggestions may work too, but there is a chance they will still give you false positives. If you need bigger crystals, try to slow down the nucleation (use a lower temperature, a different ratio of protein:crystallant, etc). Clement From: yybbll Sent: Wednesday, November 17, 2010 2:42 AM To: CCP4BB@JISCMAIL.AC.UK Subject: [ccp4bb] how to optimize small rod-shaped crystals Hi, everybody, I am trying to crystallize a membrane protein. All crystals were grown by hanging-drop vapor diffusion at 20 °C. A protein solution containing about 8-10 mg/ml protein in 20 mM Tris (pH 7.5), 0.017% DDM, 100 mM NaCl, 10% glycerol, 2 mM DTT was mixed with an equal volume of a reservoir solution containing 45% PEG200, 0.1 M phosphate/citrate (pH 4.2). The first crystals appeared in the drop within 4 days.
After one week, a lot of crystals had appeared in the drops. Our problem is that all of these crystals are too small to check by X-ray diffraction and SDS-PAGE, so we are not sure whether they are protein crystals or salt crystals. Our condition seems unlikely to produce salt crystals. But I am a little wary because we reloaded our sample onto a small Ni-resin column to reduce the concentration of detergent. Maybe some nickel ions leached off, and our protein sample then contained some of this ion. Nickel ions may react with phosphate and produce nickel phosphate crystals. Could somebody tell me if this is possible? I attach some photos of our crystals. Could somebody give me some suggestions on how to optimize this type of crystal to get bigger crystals? Thanks a lot! Yibin
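For reference when weighing the glycerol suggestion: mixing equal volumes halves each component's concentration at set-up. A small Python sketch with the numbers from Yibin's message gives the starting drop composition; the dictionary keys are labels made up for this example, and concentrations are treated as simply additive (percentages mixed as-is), which is an approximation.

```python
def mix(protein, reservoir, v_protein=1.0, v_reservoir=1.0):
    """Concentration of each component right after mixing two solutions.

    `protein` and `reservoir` map component names to concentrations in any
    consistent unit; a component missing from one solution counts as zero.
    """
    total = v_protein + v_reservoir
    names = set(protein) | set(reservoir)
    return {
        n: (protein.get(n, 0.0) * v_protein + reservoir.get(n, 0.0) * v_reservoir) / total
        for n in names
    }


# Values from the message (1:1 drop); key names are illustrative only.
protein_buffer = {"glycerol_%": 10.0, "NaCl_mM": 100.0, "Tris_mM": 20.0, "DDM_%": 0.017}
reservoir = {"PEG200_%": 45.0, "phosphate_citrate_M": 0.1}

drop = mix(protein_buffer, reservoir)
# drop: 5% glycerol, 22.5% PEG200, 0.05 M phosphate/citrate, 50 mM NaCl, ...
```

Adding 10% glycerol to the reservoir, as Deqian suggests, would make the drop 10% glycerol at set-up, the same as the reservoir for that component.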
[ccp4bb] [Fwd: Re: [ccp4bb] Graphics for notebook]
Kay Diederichs wrote: Eric Karg harvard...@yahoo.com Date: Sun, 14 Nov 2010 21:37:10 + Dear all, Thanks for your suggestions. From what I learned, new GPUs from NVIDIA are using the Optimus technology, which does not support Linux, meaning that only the dedicated graphics on the system will be used in Linux. Does it still make sense to go for NVIDIA instead of ATI? No, the right way is to contact NVIDIA and pressure them to support Linux. Just sending a mail to customer support saying what you just wrote is enough. Also, Eric suggests a smart way. But even if it works, you should bother NVIDIA so that in the future things will evolve in the right way. Many people did this several years ago, so at some point NVIDIA started providing quality Linux drivers. In fact, people should bother NVIDIA so much that it even becomes possible for people outside of NVIDIA to support the Linux driver when NVIDIA is no longer interested in supporting it. Eric Eric, Optimus is a technology for fast switching between the slow internal graphics unit and a fast, but power-hungry, NVidia chip. Unfortunately, it is currently only supported by Windows 7. If the notebook's BIOS offers to permanently disable, or permanently enable, the NVidia graphics then, from the Linux view, this would be equivalent to a conventional notebook with slow/fast graphics. If it just defaults to one of those states then, using Linux, you are at the mercy of the decision of the BIOS developers. So I'd say: before you buy, investigate what the BIOS offers. HTH, Kay
Re: [ccp4bb] Off Topic - Nickel Column
Dan, You could try a denaturing purification to get rid of the binding partner. First, take your cell lysate and spin it down at high speed to remove insoluble contaminants. (You probably do this already.) Then take your clarified lysate and dialyze it into buffer containing 6M Guanidine HCl. This will unfold everything, including proteases (no proteolysis under these conditions), and break the interaction between your protein and its binding partner. Then you can spin again at high speed to remove any additional aggregates and load this denatured sample onto the nickel column. The His-tag will still stick if the protein is unfolded in the presence of 6M GdHCl. Elute the protein in the same buffer with 6M GdHCl plus a high imidazole concentration. Then you can take your eluate and dialyze out the GdHCl to refold the protein. Refolding (in the test tube, anyway) is not possible (practical?) for all proteins, but it often works very well. Good luck, Mike Thompson - Original Message - From: Daniel Bonsor bon...@bbri.org To: CCP4BB@JISCMAIL.AC.UK Sent: Wednesday, November 17, 2010 8:49:37 AM GMT -08:00 US/Canada Pacific Subject: [ccp4bb] Off Topic - Nickel Column I have a His-tagged protein which I am coexpressing with its binding partner to prevent proteolysis. Once it is on the nickel column, I can remove 80% of the partner by flushing overnight with 2 L of 1.3 M NaCl solution buffered at pH 8.5. However, the last 20% is difficult to remove, even if I reload the nickel column and flush a further 2 L of salt solution. I am wondering if I can increase the pH to 9.0 or 9.5. It should not affect the binding of His to the nickel, as the His-tag has to be deprotonated to bind, but will it cause stripping of the nickel? Thanks Dan -- Michael C. Thompson Graduate Student Biochemistry & Molecular Biology Division Department of Chemistry & Biochemistry University of California, Los Angeles mi...@chem.ucla.edu
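If you make up the 6 M GdHCl buffer Mike describes from solid, the mass is simple molarity arithmetic (moles per litre × litres × g/mol). A small Python sketch, assuming a molar mass of 95.53 g/mol for guanidine hydrochloride (CH6ClN3); check the value on your reagent's label.

```python
GDN_HCL_MW = 95.53  # g/mol, guanidine hydrochloride (CH6ClN3); assumed value


def grams_needed(molarity, volume_l, mw):
    """Mass of solute in grams for `volume_l` litres of a `molarity` M solution."""
    return molarity * volume_l * mw


print(round(grams_needed(6.0, 1.0, GDN_HCL_MW), 1))  # 573.2
```

At this concentration the solid takes up a large fraction of the final volume, so dissolve it in less buffer and then bring the solution up to the final volume, rather than adding 1 L of buffer to the powder.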
Re: [ccp4bb] [RANT] Publication Data Formats
James Stroud wrote: On Nov 16, 2010, at 10:57 PM, Ethan Merritt wrote: Bleah. Virtually none of those are human-readable, no matter what the wikipedia page may choose to put as a heading title. What kind of data are you dealing with? PDF would indeed be an odd format for diffraction images, but it would be miles better than most of the formats on the list you point to. The operative word is dataset, which is a subset of all things data. A dataset should be in a format that 1. can be validated 2. is structured 3. is machine readable Hello, They should allow YAML: http://en.wikipedia.org/wiki/YAML Then they will keep all the above and win an extra: 4. is human readable, which makes it way better than the ugly and verbose XML. Regards, F. A pdf file *guarantees* none of the above. It is a presentation format and is not optimized for validating, structuring, or ensuring the machine readability of the data that it might contain. I'm not advocating for any particular serialization format. So this isn't about JSON v. XML religion wars. This is JSON or XML versus a file format that is basically designed to ferry presentation information between printers or computer screens. James
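Whichever serialization wins (YAML, JSON or XML), the first criterion in Stroud's list is that the data can be validated. A minimal Python sketch of what that buys you, using the standard-library `json` module (chosen only because it ships with Python; the thread does not favor it) and a made-up three-field schema: a record with a missing field or a wrong type is rejected instead of being silently imported.

```python
import json

# Hypothetical schema: required field -> expected Python type after parsing.
SCHEMA = {"accession": str, "score": int, "coverage_percent": float}


def load_and_validate(text):
    """Parse a JSON list of records and check each one against SCHEMA.

    Returns the records; raises ValueError describing the first problem found.
    """
    records = json.loads(text)  # structured and machine readable
    if not isinstance(records, list):
        raise ValueError("top level must be a list of records")
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            raise ValueError(f"record {i}: not an object")
        for field, ftype in SCHEMA.items():
            if field not in rec:
                raise ValueError(f"record {i}: missing {field!r}")
            value = rec[field]
            # JSON has a single number type, so accept ints where floats are expected;
            # reject booleans, which Python treats as a subclass of int.
            ok = isinstance(value, ftype) or (ftype is float and isinstance(value, int))
            if not ok or isinstance(value, bool):
                raise ValueError(f"record {i}: {field!r} should be {ftype.__name__}")
    return records
```

A real project would more likely use a formal schema language, but the principle is the same: the file itself says which bytes are which field, so a program can check them.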