[ccp4bb] US Distributed Computing Summer School (fully funded)

2011-03-17 Thread Ian Stokes-Rees
[Forwarded on behalf of organizers.  You may need to be US-based to
participate.  They are specifically interested in contacting life
sciences researchers.]

Hello!

We invite you and your colleagues to apply now for the 2011 OSG Summer
School where you can learn to harness the power of distributed computing
while spending a week at the beautiful University of Wisconsin-Madison.

During the school, you will learn to use high-throughput computing (HTC)
systems--at your own campus or using the national Open Science Grid
(OSG)--to run large-scale computing applications that are at the heart
of today’s cutting-edge science. Through lectures, discussions, and lots
of hands-on activities with experienced OSG staff, you will learn how
HTC systems work, how to run and manage lots of jobs and huge datasets
to implement a scientific computing workflow, and where to turn for more
information and help.

The school is ideal for graduate students in computer science or other
sciences where large-scale computing is a vital part of the research
process, but any qualified and interested applicant will be considered.
In the past, we've had students from diverse backgrounds including
genetics, geographic information systems, and physics. Successful
applicants will have all travel and school expenses paid for by the OSG.
Furthermore, as part of a collaboration with TeraGrid, students will go
to the annual TeraGrid Conference (TG11, July 18–21, 2011 in Salt Lake
City, Utah) with all travel and conference expenses paid.

Important dates:
  Applications Open: Now
  Applications Close: Friday, April 1, 2011
  School Session: June 26-30, 2011

For more information, please visit
http://www.opensciencegrid.org/GridSchool or email us at
gridschool-2011-i...@opensciencegrid.org.
We hope to hear from you soon.

Sincerely,
Tim Cartwright and Alain Roy
2011 OSG Summer School Organizers


Re: [ccp4bb] PDB data mining

2011-03-08 Thread Ian Stokes-Rees


On 3/8/11 5:44 PM, Cale Dakwar wrote:

 Hello all,

 For any given structure in the PDB, I want to identify all the
 Histidine ND1 atoms.  I then want to consider these atoms in pairs,
 measure the distance in Angstroms between the ND1 atoms in each pair,
 and compile these distances (along with residue numbers of the pair)
 in a table.  I then want to repeat this procedure for each unique
 structure in the PDB and generate a table containing all occurrences
 of HisND1 pairs with their corresponding separation distance.  Amongst
 other things, I want e.g. generate a histogram from this table and
 determine e.g. the shortest HisND1 pair distance observed and the
 structure in which this happens.  Does anyone have any suggestions for
 any tools I might be able to use to perform this search?

You'll more or less have to write something to do this yourself, as
others have suggested.  Ideally you should use some kind of library that
gives you a more usable representation of the PDB files, rather than
having to parse the PDB files yourself.  Something like PROSS, Phenix
iotbx.pdb, pdb-tools (all Python), or ParsePDB (Perl) should make your
life easier.
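For illustration, here is a minimal Python sketch of the ND1 extraction
and pairwise-distance step, hand-parsing the fixed-column ATOM records of
PDB-format text (in practice one of the libraries above is a better
choice; the function names here are just mine):

```python
import itertools
import math

def his_nd1_atoms(pdb_text):
    """Extract (chain, resseq, x, y, z) for every His ND1 ATOM record."""
    atoms = []
    for line in pdb_text.splitlines():
        # PDB fixed columns: atom name 13-16, resName 18-20,
        # chainID 22, resSeq 23-26, x/y/z 31-54 (1-indexed)
        if (line.startswith("ATOM")
                and line[12:16].strip() == "ND1"
                and line[17:20] == "HIS"):
            atoms.append((line[21], int(line[22:26]),
                          float(line[30:38]),
                          float(line[38:46]),
                          float(line[46:54])))
    return atoms

def nd1_pair_distances(atoms):
    """All unique ND1-ND1 pairs with their distance in Angstroms."""
    rows = []
    for a, b in itertools.combinations(atoms, 2):
        d = math.dist(a[2:], b[2:])           # Euclidean distance
        rows.append((a[0], a[1], b[0], b[1], round(d, 2)))
    return rows
```

Run over every entry in a local PDB mirror, the accumulated rows give
exactly the table described above, ready for a histogram or a minimum
search.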

If you don't already have a local mirror of RCSB, you'll need that too. 
Some info on that process is here:

http://www.wwpdb.org/downloads.html

If you are only interested in a small subset of all PDBs, and you can
identify that subset based exclusively on sequence comparisons, then you
can do searches on the per-chain FASTA entries for all PDBs which are
available in this file:

ftp://ftp.wwpdb.org/pub/pdb/derived_data/pdb_seqres.txt
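As a sketch of that approach, the per-chain entries can be parsed and
filtered with a few lines of Python.  This assumes the usual
pdb_seqres.txt header layout (">1abc_A mol:protein length:154
DESCRIPTION"); the function names and the "at least n histidines" filter
are just illustrative:

```python
def parse_pdb_seqres(text):
    """Yield (pdb_id, chain, sequence) from pdb_seqres.txt-style FASTA."""
    entry_id, chain, seq = None, None, []
    for line in text.splitlines():
        if line.startswith(">"):
            if entry_id is not None:
                yield entry_id, chain, "".join(seq)
            ident = line[1:].split()[0]          # e.g. '1abc_A'
            entry_id, _, chain = ident.partition("_")
            seq = []
        else:
            seq.append(line.strip())
    if entry_id is not None:
        yield entry_id, chain, "".join(seq)

def chains_with_min_his(text, n=2):
    """(pdb_id, chain) for every chain with at least n histidines."""
    return [(pid, ch) for pid, ch, s in parse_pdb_seqres(text)
            if s.count("H") >= n]
```

The resulting (pdb_id, chain) list then defines the subset of structures
worth downloading and analysing in full.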

Hope that helps,

Ian

Re: [ccp4bb] CCP4 for iphones

2011-02-28 Thread Ian Stokes-Rees
In a sentence: primarily due to cost and power constraints, mobile
devices don't (currently) have the horsepower to do any serious
*generic* number crunching, as would be required for anything of
interest to this community.

On the topic of using otherwise-idle compute time, our group has a
publicly available service for doing molecular replacement which
accesses a federation of computing centers across the US (through Open
Science Grid):

https://portal.nebiogrid.org/secure/apps/wsmr/

We regularly secure 50,000-150,000 CPU-hours per day of computing time
from OSG.  We're in the process of improving this and adding additional
services.  Watch this space.  For those with more of an interest in this
topic, you can read on below.

Regards,

Ian



This thread raises some interesting questions, but it suggests some
confusion about what a mobile device like an iPhone, iPad, or Android
handset can do compared with a rack-mounted server, desktop computer, or
even a laptop.  The number crunching mobile devices are capable of is
for specific kinds of data, such as audio and video codecs, which are
offloaded to specialized hardware and which can't (currently) be reused
for other applications (like protein structure studies).  GPUs are
showing how this could change, but I wouldn't hold your breath.  I think
power and battery life will remain challenges for mobile devices for a
long time, so even if generic computing ability catches up with
conventional desktop/server capabilities, few people will want their
batteries drained by a device continuously running an MD simulation or a
structure refinement.

On 2/25/11 5:01 PM, Xiaoguang Xue wrote:
 Well, maybe building a distributed computing network (Like Fold@Home)
 by iphone is an improvement of the clusters. Let's think about a
 phenomenon, the most common functions of our iphone are calling,
 playing music, and maybe gaming, so most of the time the phone is
 idle. Why don't we try to use these idle computing time to help us
 doing some more important and interesting things, like determining the
 proteins structures

US-based non-commercial researchers can access Open Science Grid
(http://www.opensciencegrid.org/), which consists of a federation of
about 80,000 compute cores, by registering for a certificate and joining
(or forming) a Virtual Organization.  We host a Virtual Organization in
OSG called SBGrid which is open to all SBGrid consortium members
(http://sbgrid.org/).  We regularly get 2000-4000 compute cores from OSG
for extended periods (12-96 hours), so it is a very powerful resource.

Another alternative for structural biologists who could benefit from
thousands of compute cores is to get an allocation at a national
supercomputing center.  In the US, NERSC and TeraGrid are good routes
for this, and many options exist.  In Europe, EGI and DEISA provide a
similar one-stop shop for federated grid computing and supercomputing
center access.

http://www.nersc.gov/
https://www.teragrid.org/
http://www.egi.eu/
http://www.deisa.eu/

Finally, through "screen saver" computing you can benefit from the
millions of desktop computers out there, with powerful compute cores and
GPUs, that sit completely idle most of the time (often 90%).  Here there
is really only one option, BOINC, developed by the group that created
SETI@home.  Rosetta is (sort of) available this way through
Rosetta@home, developed by the Baker Lab.

http://boinc.berkeley.edu/
http://boinc.bakerlab.org/

 I also noticed that there is some progress in grid computing on iphone
 and PS3. So I think it's possible to apply this technique to
 structural biology.
 http://www.sciencedaily.com/releases/2010/04/100413072040.htm

I think adding "iPhone" to the title of that article was just to attract
readers.  They are only using the standard web-browsing features
available on pretty much any smart phone or mobile device to view
web-portal front ends to computational infrastructure.  All the actual
computing was done on PS3s (and only 16 of them).  In other words, if
you count browsing to EBI or RCSB to access a sequence alignment program
or view protein structures as grid computing, then you could say you've
used an iPhone for grid computing.  Most people, however, would question
the accuracy of that association.


Re: [ccp4bb] CCP4 for iphones

2011-02-28 Thread Ian Stokes-Rees
On 2/25/11 5:41 PM, Nat Echols wrote:
 On Fri, Feb 25, 2011 at 2:10 PM, Sean Seaver s...@p212121.com wrote:

  I've been curious whether there has been discussion about moving data
  processing and refinement to a software-as-a-service (SaaS)
  deployment.  If programs were web accessible, it could save
  researchers time and trouble (maintaining and installing software).
  In turn, one could then process data via their iPhone.

  The computational demand would be enormous, and I personally have a
  hard time even doing a back-of-the-envelope calculation.  The demand
  could be offset by, for example, limiting jobs or the number of users.
  It will be interesting to see how mobile plays a role in
  crystallography.

 SBGrid has done something like this for massively parallel MR searches:

 https://portal.nebiogrid.org/secure/apps/wsmr/

 But that's a massively parallel and highly distributed calculation,
 which isn't what crystallographers do most of the time.  Nor do they
 need to be particularly mobile in an era of remote synchrotron data
 collection.

Nat, thanks for commenting on this.  As the person who developed it, I'm
glad someone has noticed the connection between the web-based
application (well, really just an application wrapper, since it uses
CCP4 software underneath) and what it is actually doing behind the
scenes.  It seems to us (within SBGrid) that there are quite a few
applications that could benefit from access to large-scale computational
infrastructure.  Sometimes having that resource available will allow
people to ask new questions or pose old questions in a new way.  We're
always happy to talk to people who have ideas for new computational
workflows or applications that could benefit from tens of thousands of
compute cores or that process terabytes of data.  And of course the
underlying resources are available for others to access themselves (see
another post I made on this same thread about an hour ago).


 I have a lot of other objections to the idea of doing everything as a
 webapp, but that's a separate rant.  I do, however, like the idea of
 using multi-touch interfaces for model-building, but you need something
 at least the size of an iPad for that to be more productive than using
 a traditional computer with a mouse.


I agree that not everything should be done as a web app.  When
high-functionality UI features are required, developing them with CSS,
jQuery, AJAX, HTML5, Java, etc. is very time consuming compared with
conventional integrated UI toolkits (Tcl/Tk, Qt, Cocoa, .NET, etc.).
Similarly, when significant "real-time" data processing is required, or
when multiple applications interact with the same data, the UI
(graphical or otherwise) needs to be "close" to the user's data, not
stuck messing around with web browsers (which can't really be scripted)
and web forms.

I got a 21" HP multi-touch screen last year to explore improved
touch-based interfaces for structural biology applications; however, it
doesn't work (properly) under OS X, and I'm not inclined to shift to a
Windows-based environment to develop for it.  Hopefully some standard
USB interfaces/drivers/libraries (events) will appear soon so the iPad
and other tablets aren't the exclusive domain of touch-based
applications.

Ian



Re: [ccp4bb] brute force MR

2010-12-10 Thread Ian Stokes-Rees
Arnon,

We have developed an MR search service which may be helpful in this
scenario.  It is web accessible and available to any public or academic
researcher:

https://portal.nebiogrid.org/secure/apps/wsmr/

It can search up to the full set of SCOP domains (~100,000), attempting
a Phaser MR placement of each domain and then ranking the results,
allowing you to identify a single well-placed domain.  The web-based
system does not allow you to fix that domain and continue the search for
subsequent domains, but we can do this from the command-line interface.
If you have a particular set of domains you'd like to search against
(RCSB or SCOP PDB codes), then we can limit the search to that set.

If you decide to use this, please contact us once your first domain search has 
been completed (these take 2-3 years of serial computing time and will finish 
in 1-3 days, depending on how many other computations are ahead of it in our 
queue, the complexity/resolution of your data set, space group, and unit cell 
size).

Regards,

Ian Stokes-Rees


Re: [ccp4bb] brute force MR

2010-12-10 Thread Ian Stokes-Rees
For anyone who is interested, I meant to include a reference to the PNAS paper 
that has just come out (web-only early release) describing the wide search MR 
strategy we've developed:

Stokes-Rees and Sliz,
"Protein structure determination by exhaustive search of Protein Data
Bank derived databases,"
Proceedings of the National Academy of Sciences,
doi:10.1073/pnas.1012095107
http://www.pnas.org/content/early/2010/11/17/1012095107

Ian