[ccp4bb] US Distributed Computing Summer School (fully funded)
[Forwarded on behalf of the organizers. You may need to be US-based to participate. They are specifically interested in contacting life sciences researchers.]

Hello!

We invite you and your colleagues to apply now for the 2011 OSG Summer School, where you can learn to harness the power of distributed computing while spending a week at the beautiful University of Wisconsin-Madison.

During the school, you will learn to use high-throughput computing (HTC) systems -- at your own campus or using the national Open Science Grid (OSG) -- to run large-scale computing applications that are at the heart of today's cutting-edge science. Through lectures, discussions, and lots of hands-on activities with experienced OSG staff, you will learn how HTC systems work, how to run and manage lots of jobs and huge datasets to implement a scientific computing workflow, and where to turn for more information and help.

The school is ideal for graduate students in computer science or other sciences where large-scale computing is a vital part of the research process, but any qualified and interested applicant will be considered. In the past, we've had students from diverse backgrounds including genetics, geographic information systems, and physics.

Successful applicants will have all travel and school expenses paid by the OSG. Furthermore, as part of a collaboration with TeraGrid, students will attend the annual TeraGrid Conference (TG11, July 18-21, 2011 in Salt Lake City, Utah) with all travel and conference expenses paid.

Important dates:

  Applications open:  Now
  Applications close: Friday, April 1, 2011
  School session:     June 26-30, 2011

For more information, please visit http://www.opensciencegrid.org/GridSchool or email us at gridschool-2011-i...@opensciencegrid.org. We hope to hear from you soon.

Sincerely,
Tim Cartwright and Alain Roy
2011 OSG Summer School Organizers
Re: [ccp4bb] PDB data mining
On 3/8/11 5:44 PM, Cale Dakwar wrote:

Hello all, For any given structure in the PDB, I want to identify all the histidine ND1 atoms. I then want to consider these atoms in pairs, measure the distance in Angstroms between the ND1 atoms in each pair, and compile these distances (along with the residue numbers of the pair) in a table. I then want to repeat this procedure for each unique structure in the PDB and generate a table containing all occurrences of His ND1 pairs with their corresponding separation distances. Among other things, I want to, e.g., generate a histogram from this table and determine the shortest His ND1 pair distance observed and the structure in which this occurs. Does anyone have any suggestions for any tools I might be able to use to perform this search?

You'll have to more-or-less write something to do this yourself, as others have suggested. Ideally you should use some kind of library that gives you a more usable representation of the PDB files, rather than having to parse the PDB files yourself. Something like PROSS, Phenix iotbx.pdb, or pdb-tools (all Python), or ParsePDB (Perl) should make your life easier.

If you don't already have a local mirror of the RCSB archive, you'll need that too. Some info on that process is here: http://www.wwpdb.org/downloads.html

If you are only interested in a small subset of all PDB entries, and you can identify that subset based exclusively on sequence comparisons, then you can do searches on the per-chain FASTA entries for all PDB entries, which are available in this file: ftp://ftp.wwpdb.org/pub/pdb/derived_data/pdb_seqres.txt

Hope that helps,
Ian
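For a sense of scale, the per-structure step is small enough to sketch in plain Python. This is my own illustration (not from the thread), parsing ATOM records directly by the fixed columns of the PDB format; one of the libraries mentioned above would additionally handle altLocs, multiple models, insertion codes, and other corner cases for you:

```python
import itertools
import math

def his_nd1_pairs(pdb_lines):
    """Return (residue_i, residue_j, distance_in_angstroms) for every
    pair of histidine ND1 atoms found in an iterable of PDB-format lines."""
    atoms = []
    for line in pdb_lines:
        # Fixed-column PDB format: atom name in cols 13-16, residue name
        # in cols 18-20, chain ID in col 22, residue number in cols 23-26,
        # x/y/z coordinates in cols 31-54 (three 8.3f fields).
        if (line.startswith("ATOM")
                and line[17:20] == "HIS"
                and line[12:16].strip() == "ND1"):
            res_id = line[21] + line[22:26].strip()   # e.g. "A10"
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((res_id, xyz))
    # All unique pairs within the structure, with Euclidean distances.
    return [(ri, rj, math.dist(a, b))
            for (ri, a), (rj, b) in itertools.combinations(atoms, 2)]
```

Run something like this over every file in a local mirror, appending (pdb_id, residue_i, residue_j, distance) rows to one big table, and the histogram and the global minimum fall out with a sort or a single pass.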
Re: [ccp4bb] CCP4 for iphones
In a sentence: primarily due to cost and power constraints, mobile devices don't (currently) have the horsepower to do any serious *generic* number crunching, as would be required for anything of interest to this community.

On the topic of using otherwise-idle compute time, our group has a publicly available service for doing molecular replacement which accesses a federation of computing centers across the US (through Open Science Grid): https://portal.nebiogrid.org/secure/apps/wsmr/

We regularly secure 50,000-150,000 hours per day of computing time from OSG. We're in the process of improving this and adding additional services. Watch this space. For those with more of an interest in this topic, you can read on below.

Regards,
Ian

This thread raises some interesting questions, but indicates a lack of understanding of the difference between what a mobile device like an iPhone, iPad, or Android can do compared to a rack-mounted server, desktop computer, or even laptop. The number crunching mobile devices are capable of is for specific sorts of data, like audio and video codecs, and is offloaded to specialized hardware which can't (currently) be reused for other applications (like protein structure studies). GPUs are showing how this can change, but I wouldn't hold your breath. I think power and battery life will continue to be challenges for mobile devices for a long time, so even if generic computing ability catches up with conventional desktop/server capabilities, few people will want their batteries drained by their device continuously running an MD simulation or a structure refinement.

On 2/25/11 5:01 PM, Xiaoguang Xue wrote:

Well, maybe building a distributed computing network (like Folding@home) out of iPhones would be an improvement over clusters. Consider a common phenomenon: the most frequent uses of our iPhones are calling, playing music, and maybe gaming, so most of the time the phone is idle.
Why don't we try to use this idle computing time to help us do more important and interesting things, like determining protein structures?

US-based non-commercial researchers can access Open Science Grid (http://www.opensciencegrid.org/), which consists of a federation of about 80,000 compute cores, by registering for a certificate and joining (or forming) a Virtual Organization. We host a Virtual Organization in OSG called SBGrid which is open to all SBGrid consortium members (http://sbgrid.org/). We regularly get 2,000-4,000 compute cores from OSG for extended periods (12-96 hours), so it is a very powerful resource.

Another alternative for structural biologists who could benefit from thousands of compute cores is to get an allocation at a national supercomputing center. In the US, NERSC or TeraGrid are good routes for this, and many options exist. In Europe, EGI and DEISA provide a similar one-stop shop for federated grid computing and supercomputing center access.

http://www.nersc.gov/
https://www.teragrid.org/
http://www.egi.eu/
http://www.deisa.eu/

Finally, you can benefit from the millions of desktop computers out there, with super-powerful compute cores and GPUs, that spend most of their time (often 90%) completely idle, using screen-saver computing. Here there is really only one option, which is BOINC, developed by the group that created SETI@home. Rosetta is (sort of) available this way through Rosetta@home, developed by the Baker Lab.

http://boinc.berkeley.edu/
http://boinc.bakerlab.org/

I also noticed that there is some progress in grid computing on the iPhone and PS3, so I think it's possible to apply this technique to structural biology. http://www.sciencedaily.com/releases/2010/04/100413072040.htm

I think adding "iPhone" to the title of that article was just to attract readers. They are only using the standard web-browsing features available on pretty much any smartphone or mobile device to view web-portal views of computational infrastructure.
All the actual computing was done on PS3s (and only 16 of them). In other words, if you consider browsing to EBI or RCSB to access a sequence alignment program or view protein structures to be grid computing, then you can say you've used an iPhone for grid computing. Most people, however, would question the accuracy of that association.
Re: [ccp4bb] CCP4 for iphones
On 2/25/11 5:41 PM, Nat Echols wrote:

On Fri, Feb 25, 2011 at 2:10 PM, Sean Seaver s...@p212121.com wrote:

I've been curious whether there has been discussion about moving data processing and refinement to a software-as-a-service (SaaS) deployment. If programs were web accessible, then it might save researchers time and trouble (maintaining and installing software). In turn, one could then process data via their iPhone. The computational demand would be enormous, and I personally have a hard time even doing a back-of-the-envelope calculation. The demand could be offset, for example, by limiting jobs or the number of users, etc. It will be interesting to see how mobile plays a role in crystallography.

SBGrid has done something like this for massively parallel MR searches: https://portal.nebiogrid.org/secure/apps/wsmr/ But that's a massively parallel and highly distributed calculation, which isn't what crystallographers do most of the time. Nor do they need to be particularly mobile in an era of remote synchrotron data collection.

Nat, thanks for commenting on this. As the person who developed it, I'm glad someone has noticed the connection between the web-based application (well, really just an application wrapper, since it uses CCP4 software underneath) and what it is actually doing behind the scenes.

It seems to us (within SBGrid) that there are quite a few applications that can benefit from access to large-scale computational infrastructure. Sometimes having that resource available will allow people to ask new questions or pose old questions in a new way. We're always happy to talk to people who have ideas for new computational workflows or applications that could benefit from tens of thousands of compute cores or that process terabytes of data. And of course the underlying resources are available for others to access themselves (see another post I made in this same thread about an hour ago).
I have a lot of other objections to the idea of doing everything as a webapp, but that's a separate rant. I do, however, like the idea of using multi-touch interfaces for model-building, but you need something at least the size of an iPad for that to be more productive than using a traditional computer with a mouse.

I agree that not everything should be done as a web app. When high-functionality UI features are required, developing them with CSS, jQuery, AJAX, HTML5, Java, etc. is extremely time consuming compared with conventional integrated UI toolkits (Tcl/Tk, Qt, Cocoa, .NET, etc.). Similarly, when significant "real-time" data processing is required, or when multiple applications are interacting with the same data, the UI (graphical or otherwise) needs to be "close" to the user's data, not stuck messing around with web browsers (which can't really be scripted) and web forms.

I got a 21" HP multi-touch screen last year to explore improved touch-based interfaces for structural biology applications; however, it doesn't work (properly) under OS X, and I'm not inclined to shift to a Windows-based environment to develop for it. Hopefully some standard USB interfaces/drivers/libraries (events) will appear soon, so that the iPad and other tablets aren't the exclusive domain of touch-based applications.

Ian
Re: [ccp4bb] brute force MR
Arnon,

We have developed an MR search mechanism which may be helpful in this scenario. It is web accessible and available to any public or academic researcher: https://portal.nebiogrid.org/secure/apps/wsmr/

It can use up to the full set of SCOP domains (about 100,000) to attempt a Phaser MR placement of each domain, and it then ranks the results, allowing you to identify a single well-placed domain. The web-based system does not allow you to fix that domain and continue the search for subsequent domains, but we can do this from the command-line interface. If you have a particular set of domains you'd like to search against (RCSB or SCOP PDB codes), then we can limit the search to that set.

If you decide to use this, please contact us once your first domain search has completed. These searches take 2-3 years of serial computing time but will finish in 1-3 days, depending on how many other computations are ahead of yours in our queue, the complexity/resolution of your data set, the space group, and the unit cell size.

Regards,
Ian Stokes-Rees
Re: [ccp4bb] brute force MR
For anyone who is interested, I meant to include a reference to the PNAS paper that has just come out (web-only early release) describing the wide-search MR strategy we've developed:

Stokes-Rees and Sliz, "Protein structure determination by exhaustive search of Protein Data Bank derived databases," Proc. Natl. Acad. Sci. USA, doi:10.1073/pnas.1012095107
http://www.pnas.org/content/early/2010/11/17/1012095107

Ian