On Mon, Feb 27, 2012 at 3:54 AM, Rhoda Kinsella <[email protected]> wrote:
> Hi Dan
> I have consulted our resident Cloud expert. Please find his comments below
> in between your text. I hope this helps, but please don't hesitate to get in
> contact should you require more information.
Thank you. Please see my further questions below.

> Regards
> Rhoda
>
> On 23 Feb 2012, at 19:52, Dan Tenenbaum wrote:
>
> Hello,
>
> I am interested in exploring ENSEMBL / Biomart datasets that are made
> available on Amazon EC2.
>
> I'm wondering what is available and how to use it.
>
> If I search for "ENSEMBL" in "public datasets", I see three results:
> http://aws.amazon.com/search?searchQuery=ensembl&searchPath=datasets&x=0&y=0
>
> Ensembl Annotated Human Genome Data (FASTA Release 65)
> Ensembl Annotated Human Genome Data (MySQL Release 65)
> Ensembl - FASTA Database Files
>
> However, none of the snapshot IDs associated with these three data
> sets show up in the list of datasets available when I try and create a
> new volume in the EC2 web console. Instead, I see the following
> datasets:
>
> What Region are you trying to use these in? Amazon public datasets are
> available only in the us-east region.
>
> I want to use them in the us-east region.
>
> Ensembl BioMart (Linux)
> Main Ensembl (Linux)
> Ensembl-53 (Linux)
> Ensembl - FASTA Database Files (Linux)
> Ensembl-54 (Linux)
> Ensembl-54b (Linux)
> Ensembl-55-FASTA-DB (Linux)
> Ensembl-55 (Linux)
> Ensembl-56 (Linux)
> Ensembl-56-FASTA-DB (Linux)
> Ensembl 57 for MySQL (Linux)
> Ensembl 57 for FASTA (Linux)
> Ensembl 59 FASTA dump
> Ensembl 59 MySQL flat file dumps
> Ensembl 60 MySQL flat file dumps
> Ensembl 60 fasta dumps
> Ensembl Release 61 FASTA dumps
> Ensembl Release 61 MySQL...at file dumps
> Ensembl 62 Fasta Data
> Ensembl 62 MySQL Data
> Ensembl Release 63 MySQL...t file dumps
> Ensembl Release 63 FASTA Dumps
> Ensembl 64 MySQL flat file dumps
> Ensembl 64 FASTA dumps
> Ensembl 64 FASTA dumps
> Ensembl 64 MySQL flat file dumps
> Ensembl Release 65 FASTA dumps
> Ensembl Release 65 MySQL dumps
> ensembl release 65 binary MySQL
>
> I mounted snap-c48360ad, referred to above as "Ensembl BioMart".
> Inspecting the contents of the disk, I see what look like MySQL
> database files (.MYI, .MYD,
> and .frm files).
>
> I would like to create the corresponding databases but I can't find
> any documentation about doing so.
> The page on amazon for one of the datasets
> (http://aws.amazon.com/datasets/2315?_encoding=UTF8&queryArg=searchQuery&x=0&fromSearch=1&y=0&searchPath=datasets&searchQuery=ensembl)
> tells me to look here for documentation:
> http://www.ensembl.org/info/docs/webcode/install/ensembl-data.html
>
> These are quite old versions of our data in the Public Datasets program, and
> some represent early experiments of ours with Amazon public datasets.
> The sets you have listed - whilst originating from us at a point in the past
> - are actually currently owned and controlled by the public dataset program.
>
> We do not currently submit separate biomart dumps to the public dataset
> program (although we tried it out once in 2008). What we do currently submit
> are the FASTA dumps and MySQL text dumps for all of our databases. After
> some early tweaking, we settled on this format after discussion, agreement
> and arrangement with the public dataset program. In the future there may
> be an opportunity for further changes - but not at the minute.
>
> In this dataset you will find the MySQL text dumps for biomart (amongst all
> of our databases)
>
> http://aws.amazon.com/datasets/2315?_encoding=UTF8&jiveRedirect=1
>
> And the instructions on how to turn this into a MySQL database are here:
>
> http://www.ensembl.org/info/docs/webcode/install/ensembl-data.html
>
> You will obviously need to filter for the mart databases first.
>
> But that page has instructions which assume that you have gzipped .sql
> and .txt files (as far as I can tell).
> Where can I find documentation for creating MySQL databases from the
> MYI and MYD files?
>
> Also, is there any further/more accurate documentation about which
> ENSEMBL datasets are available on EC2/AWS and how to use them?
>
> Yes.
>
> We have pre-baked AMIs that will boot into MySQL database servers here:-
> (however there are no biomart databases in this set.)
>
> http://www.ensembl.org/info/data/amazon_aws.html
> http://www.ensembl.info/blog/2011/07/12/run-a-private-ensembl-mysql-in-the-cloud/
>

Thank you. Are there plans to make the biomart data available in this way?
That's what I am really looking for.

Thanks,
Dan

>
> Thanks!
> Dan
> _______________________________________________
> Users mailing list
> [email protected]
> https://lists.biomart.org/mailman/listinfo/users
>
> Rhoda Kinsella Ph.D.
> Ensembl Production Project Leader,
> European Bioinformatics Institute (EMBL-EBI),
> Wellcome Trust Genome Campus,
> Hinxton
> Cambridge CB10 1SD,
> UK.
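
For readers of the archive, here is a rough sketch of the text-dump route mentioned above (filter for the mart databases, then load the gzipped .sql and .txt files). It only builds the shell commands as strings and runs nothing. The `_mart_` name filter, the one-directory-per-database layout, the paths and the user name are all assumptions for illustration. The mysql/mysqlimport invocations are my reading of the usual text-dump procedure, so please check them against the ensembl-data.html page before relying on them.

```python
from pathlib import PurePosixPath


def is_mart_db(name):
    """Heuristic (an assumption): mart databases have '_mart_' in their name."""
    return "_mart_" in name


def load_commands(dump_dir, user):
    """Build the shell commands that would load one dump directory.

    Assumes dump_dir is named after the database and contains
    <db>.sql.gz (schema) plus one <table>.txt.gz file per table.
    Nothing is executed; the commands are returned as strings.
    """
    db = PurePosixPath(dump_dir).name
    return [
        # Unpack the schema and the per-table data files.
        f"gunzip {dump_dir}/*.gz",
        # Create an empty database with the same name as the dump.
        f'mysql -u {user} -p -e "CREATE DATABASE {db}"',
        # Load the table definitions.
        f"mysql -u {user} -p {db} < {dump_dir}/{db}.sql",
        # Bulk-load the tab-delimited table data from the client side (-L).
        rf"mysqlimport -u {user} -p --fields_escaped_by=\\ {db} -L {dump_dir}/*.txt",
    ]


if __name__ == "__main__":
    # Hypothetical mount points for a volume created from the MySQL dump snapshot.
    dump_dirs = ["/data/homo_sapiens_core_65_37", "/data/ensembl_mart_65"]
    for d in dump_dirs:
        if is_mart_db(PurePosixPath(d).name):
            print("\n".join(load_commands(d, "dan")))
```

Printing the commands rather than running them makes it easy to review them (or paste them into a shell on the EC2 instance) before touching a MySQL server. This does not cover the binary .MYI/.MYD/.frm files on the older snapshot.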
