Hi Dan
I have consulted our resident Cloud expert. Please find his comments below in between your text. I hope this helps, but please don't hesitate to get in contact should you require more information.
Regards
Rhoda

On 23 Feb 2012, at 19:52, Dan Tenenbaum wrote:

Hello,

I am interested in exploring ENSEMBL / Biomart datasets that are made
available on Amazon EC2.

I'm wondering what is available and how to use it.

If I search for "ENSEMBL" in "public datasets", I see three results:
http://aws.amazon.com/search?searchQuery=ensembl&searchPath=datasets&x=0&y=0

Ensembl Annotated Human Genome Data (FASTA Release 65)
Ensembl Annotated Human Genome Data (MySQL Release 65)
Ensembl - FASTA Database Files

However, none of the snapshot IDs associated with these three data
sets show up in the list of datasets available when I try and create a
new volume in the EC2 web console. Instead, I see the following
datasets:

Which region are you trying to use these in? Amazon public datasets are available only in the US East region.


Ensembl BioMart (Linux)
Main Ensembl (Linux)
Ensembl-53 (Linux)
Ensembl - FASTA Database Files (Linux)
Ensembl-54 (Linux)
Ensembl-54b (Linux)
Ensembl-55-FASTA-DB (Linux)
Ensembl-55 (Linux)
Ensembl-56 (Linux)
Ensembl-56-FASTA-DB (Linux)
Ensembl 57 for MySQL (Linux)
Ensembl 57 for FASTA (Linux)
Ensembl 59 FASTA dump
Ensembl 59 MySQL flat file dumps
Ensembl 60 MySQL flat file dumps
Ensembl 60 fasta dumps
Ensembl Release 61 FASTA dumps
Ensembl Release 61 MySQL...at file dumps
Ensembl 62 Fasta Data
Ensembl 62 MySQL Data   
Ensembl Release 63 MySQL...t file dumps
Ensembl Release 63 FASTA Dumps
Ensembl 64 MySQL flat file dumps
Ensembl 64 FASTA dumps
Ensembl 64 FASTA dumps
Ensembl 64 MySQL flat file dumps
Ensembl Release 65 FASTA dumps
Ensembl Release 65 MySQL dumps
ensembl release 65 binary MySQL

I mounted snap-c48360ad, referred to above as "Ensembl BioMart".
Inspecting the contents of the disk, I see what look like MySQL
database files (.MYI, .MYD, and .frm files).

I would like to create the corresponding databases but I can't find
any documentation about doing so.
The page on amazon for one of the datasets
(http://aws.amazon.com/datasets/2315?_encoding=UTF8&queryArg=searchQuery&x=0&fromSearch=1&y=0&searchPath=datasets&searchQuery=ensembl )
tells me to look here for documentation:
http://www.ensembl.org/info/docs/webcode/install/ensembl-data.html



These are quite old versions of our data in the Public Datasets program, and some represent early experiments of ours with Amazon public datasets. The sets you have listed, whilst originating from us at some point in the past, are currently owned and controlled by the public dataset program.

We do not currently submit separate BioMart dumps to the public dataset program (although we tried it out once in 2008). What we do currently submit are the FASTA dumps and MySQL text dumps for all of our databases. After some early tweaking, we settled on this format following discussion and agreement with the public dataset program. In the future there may be an opportunity for further changes, but not at the minute.

In this dataset you will find the MySQL text dumps for BioMart (amongst all of our databases):

http://aws.amazon.com/datasets/2315?_encoding=UTF8&jiveRedirect=1

And the instructions on how to turn these into a MySQL database are here:

http://www.ensembl.org/info/docs/webcode/install/ensembl-data.html

You will obviously need to filter for the mart databases first.
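
As a rough illustration of that filtering and loading step, here is a minimal Python sketch. It only builds the shell commands and does not run them. It assumes that mart databases can be recognised by "_mart_" in the directory name, that each dump directory holds gzipped <dbname>.sql and per-table .txt files as the installation page describes, and that the volume is mounted at the hypothetical path /mnt/ensembl. The function names are made up for this example, and the mysql/mysqlimport invocations should be checked against the installation page before use.

```python
import shlex
from pathlib import PurePosixPath


def select_mart_dirs(names):
    """Keep only directory names that look like mart databases.

    Assumption: mart dumps carry "_mart_" in their name
    (for example "ensembl_mart_65").
    """
    return [n for n in names if "_mart_" in n]


def load_commands(db_dir, user="root"):
    """Return the shell commands (as strings) to load one dump directory.

    The database name is taken from the last component of db_dir.
    Nothing is executed here; the caller decides how to run the commands.
    """
    db = PurePosixPath(db_dir).name
    d = shlex.quote(str(db_dir))
    q_db = shlex.quote(db)
    q_user = shlex.quote(user)
    return [
        # 1. Unpack the gzipped schema and table files in place.
        f"gunzip {d}/*.gz",
        # 2. Create an empty database named after the directory.
        f"mysql -u {q_user} -e {shlex.quote('CREATE DATABASE ' + db)}",
        # 3. Load the table definitions from <dbname>.sql.
        f"mysql -u {q_user} {q_db} < {d}/{q_db}.sql",
        # 4. Bulk-load every table from its tab-delimited .txt file.
        rf"mysqlimport -u {q_user} --fields_escaped_by=\\ {q_db} -L {d}/*.txt",
    ]
```

For example, select_mart_dirs(["homo_sapiens_core_65_37", "ensembl_mart_65"]) keeps only "ensembl_mart_65", and load_commands("/mnt/ensembl/ensembl_mart_65") returns four command strings that can be reviewed and then pasted into a shell.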



But that page has instructions which assume that you have gzipped .sql
and .txt files (as far as I can tell).
Where can I find documentation for creating MySQL databases from the
MYI and MYD files?

Also, is there any further/more accurate documentation about which
ENSEMBL datasets are available on EC2/AWS and how to use them?


Yes.

We have pre-baked AMIs that boot into MySQL database servers (however, there are no BioMart databases in this set). Details are here:

http://www.ensembl.org/info/data/amazon_aws.html
http://www.ensembl.info/blog/2011/07/12/run-a-private-ensembl-mysql-in-the-cloud/




Thanks!
Dan
_______________________________________________
Users mailing list
[email protected]
https://lists.biomart.org/mailman/listinfo/users

Rhoda Kinsella Ph.D.
Ensembl Production Project Leader,
European Bioinformatics Institute (EMBL-EBI),
Wellcome Trust Genome Campus,
Hinxton
Cambridge CB10 1SD,
UK.
