Re: [galaxy-dev] Fwd: pass more information on a dataset merge
Hi John, Do you think it's possible to create a test for your 'm:' format? I couldn't find how to specify multiple input files for a test. -Alex

-----Original Message----- From: jmchil...@gmail.com [mailto:jmchil...@gmail.com] On Behalf Of John Chilton Sent: Tuesday, 23 October 2012 7:59 AM To: Jorrit Boekel Cc: Khassapov, Alex (CSIRO IMT, Clayton) Subject: Re: Fwd: [galaxy-dev] pass more information on a dataset merge

Hello again Jorrit, Great, I am glad we are largely on the same page here. I don't know when I will get a chance to look at this particular aspect; if you get there first, that will be great, and if not I will get there eventually. -John

On Mon, Oct 22, 2012 at 2:51 AM, Jorrit Boekel jorrit.boe...@scilifelab.se wrote: IIRC, I implemented the task_X suffix (Galaxy does so as well, but on the split subdirectories) to ensure that jobs containing multiple split datasets would run in sync. Files from two datasets that belong together then get analysed together in subsequent steps. It would however be much nicer to retain original file names through a pipeline, or at least have the possibility to retrieve them. Since the split/merge runs actively look for and match files with identical 'task_X' suffixes, it may be an option to do:

fraction1.raw - fraction1.raw_dataset_43.dat_task_0 - fraction1.raw_dataset_44.dat_task_0
fraction2.raw - fraction2.raw_dataset_43.dat_task_1 - fraction2.raw_dataset_44.dat_task_1

(Note that Python starts counting at 0, while most researchers number their first fraction 1.) I wouldn't mind looking more into that as well, since it would be a big improvement UI-wise. cheers, jorrit

On 10/19/2012 04:40 PM, John Chilton wrote: Jorrit, I meant to cc you on this response to Alex.
-- Forwarded message -- From: John Chilton chil0...@umn.edu Date: Fri, Oct 19, 2012 at 9:40 AM Subject: Re: [galaxy-dev] pass more information on a dataset merge To: alex.khassa...@csiro.au

Hey Alex, I think the idea here is that your initially uploaded files would have different names, but after Jorrit's tool split/merge step they will all just be named after the dataset id (see screenshot), so you need the task_X at the end so they don't all end up with the same name. I have not thought a whole lot about the naming thing; in general it seems like a tough problem, and one that Galaxy itself doesn't do a particularly good job at. Jorrit, have you given any thought to this? I wonder if it would be feasible to use the initially uploaded name as a sort of prefix going forward. So if I upload, say:

fraction1.RAW fraction2.RAW fraction3.RAW

and run a conversion step, maybe I could get:

fraction1_dataset567.ms2 fraction2_dataset567.ms2 fraction3_dataset567.ms2

instead of:

dataset567.dat_task_0 dataset567.dat_task_1 dataset567.dat_task_2

Jorrit, do you mind if I give implementing that a shot? It seems like it would be a win to me. Am I going to hit some problem I don't see now (presumably we have to send some data from the split to the merge, and that might be tricky)? -John

On Thu, Oct 18, 2012 at 7:00 PM, alex.khassa...@csiro.au wrote: Thanks John, I wonder what's the reason for appending _task_XX to the file names; why can't we just keep the original file names? Alex

-----Original Message----- From: jmchil...@gmail.com [mailto:jmchil...@gmail.com] On Behalf Of John Chilton Sent: Friday, 19 October 2012 6:16 AM To: Khassapov, Alex (CSIRO IMT, Clayton) Subject: Re: [galaxy-dev] pass more information on a dataset merge

On Tue, Oct 16, 2012 at 11:11 PM, alex.khassa...@csiro.au wrote: Hi John, I am definitely interested in this idea, not only me: we are currently working on moving a few scientific tools (not related to genomics) into the cloud using Galaxy. Great.
My interests in Galaxy are mostly outside of genomics as well; it is good to have more people utilizing Galaxy in this way because it will force the platform to become more generic and address broader use cases. We will try it further and see if we need any changes. For now one improvement would be nice: make dataset_id.dat contain a list of paths to the locations of the uploaded files, so that by displaying an HTML page the user could just click on a link and download the file. Code that attempted to do this was in there, but obviously didn't work. I have now fixed it up. Thanks for beta testing. -John

We are pretty new to Galaxy, so our understanding of it is pretty limited. Thanks again, Alex

-----Original Message----- From: jmchil...@gmail.com [mailto:jmchil...@gmail.com] On Behalf Of John Chilton Sent: Wednesday, 17 October 2012 3:21 AM To: Khassapov, Alex (CSIRO IMT, Clayton) Subject: Re: [galaxy-dev] pass more information on a dataset merge

Wow, thanks for the rapid feedback! I have made the changes you have suggested. It seems you must be interested in this idea/implementation. Let me know if you have specific use
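The prefix-naming idea John proposes above could look something like the following; a minimal illustration only, with an invented helper name (make_part_name), not Galaxy's actual split/merge code. It keeps Jorrit's task_X suffix for synchronization while carrying the uploaded name forward as a prefix:

```python
import os

def make_part_name(upload_name, dataset_id, ext, task_number):
    # e.g. "fraction1.RAW", 567, "ms2", 0 -> "fraction1_dataset567.ms2_task_0"
    prefix = os.path.splitext(upload_name)[0]
    return "%s_dataset%d.%s_task_%d" % (prefix, dataset_id, ext, task_number)

# Three uploaded fractions converted in one split job would then keep
# recognizable names instead of all collapsing to dataset567.dat_task_N:
names = [make_part_name("fraction%d.RAW" % (i + 1), 567, "ms2", i) for i in range(3)]
```

The split and merge steps would still match parts by the trailing task_N, so the in-sync behaviour Jorrit describes is preserved.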
Re: [galaxy-dev] Dataset upload fail
Hi Vladimir, On 11/01/2012 03:46 AM, Vladimir Yamshchikov wrote: Local install of Galaxy on SciLinux55. It fails to upload a 5.2 GB fastq file from a local HD, while normally loading smaller fastq and fasta datasets (less than 1 GB). Chunks of 1.2 GB remain in */database/tmp, all of which represent the beginning of the file that fails to upload. Several upload attempts were made, and several chunks of the same size are present. Can I just copy the file dataset to the database directory instead of uploading through the web interface?

Make sure you provide a directory for library_import_dir in the universe_wsgi.ini file, and copy your files to this location (or a subdirectory of it). This will give you an extra option in the admin menu, 'Upload directory of files'; then use the 'Link files without copying to Galaxy' option. Alternatively, set allow_library_path_paste to 'True' (see also the comments and warnings in the universe_wsgi.ini file). Regards, Hans-Rudolf ___ Please keep all replies on the list by using reply all in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/
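For reference, the two settings Hans-Rudolf mentions live in universe_wsgi.ini; a hedged example follows, where the directory path is purely illustrative:

```ini
# universe_wsgi.ini -- options referenced above; the path is an example only
[app:main]
# Directory whose contents (or subdirectories) appear under the admin menu's
# 'Upload directory of files' option
library_import_dir = /data/galaxy/library_import
# Allows admins to paste filesystem paths directly when adding library datasets;
# see the warnings in the sample universe_wsgi.ini before enabling
allow_library_path_paste = True
```

With library_import_dir set, large files can be linked in place ('Link files without copying to Galaxy') rather than uploaded through the browser.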
Re: [galaxy-dev] which .loc file for SAM to BAM?
Hi, thank you for the pointer. I was only looking at this wiki page: http://wiki.g2.bx.psu.edu/Admin/Data%20Integration Maybe this should point to your page? regards, Andreas

On 31.10.2012 17:50, Carlos Borroto wrote: On Wed, Oct 31, 2012 at 11:30 AM, Andreas Kuntzagk andreas.kuntz...@mdc-berlin.de wrote: Hi, I'm still setting up a local Galaxy. Currently I'm testing the setup of NGS tools. If I try SAM to BAM for a BAM file that has hg18 set as its build, I get a message that Sequences are not currently available for the specified build. I guess that I either have to edit one of the .loc files (but which?) or have to download additional data from the rsync server. (I already have tool-data/shared/hg18 completely.)

The .loc file you want to modify is 'tool-data/sam_fa_indices.loc'. You can find information about this subject in the wiki[1]. Although the table there is not complete, you can always find the right XML under 'tools' and poke inside to find a line like this one:

<validator type="dataset_metadata_in_file" filename="sam_fa_indices.loc" metadata_name="dbkey" metadata_column="1" message="Sequences are not currently available for the specified build." line_startswith="index" />

[1] http://wiki.g2.bx.psu.edu/Admin/NGS%20Local%20Setup

And I agree, dealing with .loc files is quite cumbersome. Hope it helps, Carlos

-- Andreas Kuntzagk SystemAdministrator Berlin Institute for Medical Systems Biology at the Max-Delbrueck-Center for Molecular Medicine Robert-Roessle-Str. 10, 13125 Berlin, Germany http://www.mdc-berlin.de/en/bimsb/BIMSB_groups/Dieterich
Re: [galaxy-dev] Parallelism tag and job splitter
On Thu, Nov 1, 2012 at 1:48 AM, Edward Hills ehills...@gmail.com wrote: Hi Peter, thanks again. Turns out that it has been implemented, by the looks of it, in lib/galaxy/datatypes/tabular.py under class Vcf.

Yes, looking at the Vcf class, it lacks a merge method (the Sam class earlier in the file defines its own; do something similar).

However, despite this, it is always the Text class in data.py that is loaded and not the proper Vcf one.

Python inheritance means that if the Vcf class lacks a merge method, the call goes to the parent class's method (the Tabular class, if it had one), or the grandparent class's method (the Text class). So it is falling back on the Text merge, which doesn't know about headers. (As an aside, I would like the Tabular merge to be a bit more clever about #header lines, but this isn't trivial as some tabular files contain #comment lines too.)

Can you point me in the direction of where the type is chosen?

Your tool's XML should specify the output format, although there could be complications if, for example, you are doing dynamic format selection based on one of the parameters. Peter
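A header-aware merge along the lines Peter suggests (analogous in spirit to the Sam class defining its own merge) could be sketched as below. This is an illustration only, not Galaxy's actual Vcf implementation: it concatenates the split parts while keeping '#' header lines from the first part only.

```python
def merge(split_files, output_file):
    """Concatenate VCF parts; keep '#' header/comment lines from the first part only."""
    with open(output_file, "w") as out:
        for index, path in enumerate(split_files):
            with open(path) as part:
                for line in part:
                    # Later parts repeat the same header block; drop it
                    if index > 0 and line.startswith("#"):
                        continue
                    out.write(line)
```

In a real datatype class this would be a method (likely a staticmethod, as elsewhere in datatypes code) rather than a free function, and VCF's guarantee that all header lines start with '#' is what makes this simple filter sufficient.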
[galaxy-dev] hg18.fa on rsync? [was: which .loc file for SAM to BAM?]
Hi, It's still not working. I just noticed that the sam_index dir only contains links to some files in ../seq, which is mostly empty except for some 2bit files. I could not find any documentation on how to obtain these data files. regards, Andreas

On 31.10.2012 17:50, Carlos Borroto wrote: On Wed, Oct 31, 2012 at 11:30 AM, Andreas Kuntzagk andreas.kuntz...@mdc-berlin.de wrote: Hi, I'm still setting up a local Galaxy. Currently I'm testing the setup of NGS tools. If I try SAM to BAM for a BAM file that has hg18 set as its build, I get a message that Sequences are not currently available for the specified build. I guess that I either have to edit one of the .loc files (but which?) or have to download additional data from the rsync server. (I already have tool-data/shared/hg18 completely.)

The .loc file you want to modify is 'tool-data/sam_fa_indices.loc'. You can find information about this subject in the wiki[1]. Although the table there is not complete, you can always find the right XML under 'tools' and poke inside to find a line like this one:

<validator type="dataset_metadata_in_file" filename="sam_fa_indices.loc" metadata_name="dbkey" metadata_column="1" message="Sequences are not currently available for the specified build." line_startswith="index" />

[1] http://wiki.g2.bx.psu.edu/Admin/NGS%20Local%20Setup

And I agree, dealing with .loc files is quite cumbersome. Hope it helps, Carlos

-- Andreas Kuntzagk SystemAdministrator Berlin Institute for Medical Systems Biology at the Max-Delbrueck-Center for Molecular Medicine Robert-Roessle-Str. 10, 13125 Berlin, Germany http://www.mdc-berlin.de/en/bimsb/BIMSB_groups/Dieterich
[galaxy-dev] Stop/Start/Reboot/Terminate
Last night I used the Amazon console to stop my working instance. Today I started up the instance using the Amazon console, waited an appropriate amount of time, and used the newly assigned public IP address: no response. I also did a reboot; no response. I can ssh to the instance but do not know what to check for errors. Should you be able to stop/start an instance in the Amazon console and have it work correctly? I am trying to avoid the power-down option in the Galaxy web interface, since I had the problem of new instances being started in a different availability zone from where the EBS volume was located. Looks like I will be leaving the master instance running and contributing to Amazon's profit margins!
[galaxy-dev] Unable to fetch the sequence ERROR
Hello, I am trying to fetch sequences using an interval file with columns 'chr', start, end and name. The names in the chr column are like chr1, chr2 ... chrY, plus chrMT and the HSCHR* contigs. I could fetch sequences for chr1-Y, but it does not fetch for MT, HSCHR*, or GL000*.1. I get the error: 4536 warnings, 1st is: Unable to fetch the sequence from '33529672' to '329' for chrom 'HSCHR6_MHC_MCF'. Skipped 4536 invalid lines, 1st is #775, HSCHR6_MHC_MCF 33529672 33530001 KIFC1. I locally cached the database as Human Hg19. Now, does your Hg19 contain MT and the other GL000*.1 and haplotype sequences in it? Or should I change the headers of those lines in the interval file accordingly? I'd be glad if you could get back to me asap. Thanks
Re: [galaxy-dev] Stop/Start/Reboot/Terminate
Galaxy Cloudman does not support Stop/Start through the AWS interface; this is known to cause problems and should be avoided. The persistence design allows for complete termination and restart. The issue with your startup zone can currently be worked around by launching through the AWS console (following the instructions at usegalaxy.org/cloud) instead of cloudlaunch. That said, I'm currently updating cloudlaunch to support zone detection and launch, to avoid any issues with that moving forward, but this won't be available on main until Monday, most likely.

That out of the way, the issue you're experiencing right now is likely caused by an error with cm_boot.py causing duplicated nginx max_client_body_size directives. This problem has been fixed on the back end and won't happen with any new clusters going forward, but for existing clusters experiencing the problem you can probably fix it in a few short steps:

- ssh in to the instance.
- Edit /opt/galaxy/pkg/nginx/conf/nginx.conf, removing any redundant max_client_body_size directives. There should be exactly one in the file.
- Download a copy of the newest https://s3.amazonaws.com/cloudman/cm_boot.py (save it to your desktop or something).
- In the AWS console, go to S3 and find your cluster's bucket (it'll have a file called <your cluster name>.clusterName, for easy identification). Now upload the new cm_boot.py you just saved, replacing the one currently in the bucket.

On Nov 1, 2012, at 10:49 AM, Scooter Willis hwil...@scripps.edu wrote: Last night I used the Amazon console to stop my working instance. Today started up the instance using the Amazon console. Waited an appropriate time, used the newly assigned public IP address, and got no response. Also did a reboot and no response. I can ssh to the instance but do not know what to check for errors.
Should you be able to stop/start an instance in the Amazon console and have it work correctly? Trying to avoid the power-down option in the Galaxy web interface since I had the problem of new instances being started in a different availability zone from where the EBS volume was located. Looks like I will be leaving the master instance running and contributing to Amazon's profit margins!
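The nginx.conf step Dannon describes (exactly one max_client_body_size directive should remain) can also be scripted; below is a hedged sketch, with the directive name taken verbatim from the email, that keeps only the first occurrence of the directive:

```python
def dedupe_directive(conf_text, directive="max_client_body_size"):
    """Return conf_text with only the first occurrence of `directive` kept."""
    kept = []
    seen = False
    for line in conf_text.splitlines(True):  # True: keep line endings intact
        if line.strip().startswith(directive):
            if seen:
                continue  # drop redundant copies of the directive
            seen = True
        kept.append(line)
    return "".join(kept)
```

One would read /opt/galaxy/pkg/nginx/conf/nginx.conf, pass its contents through this function, and write the result back, then test the config before restarting.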
Re: [galaxy-dev] hg18.fa on rsync? [was: which .loc file for SAM to BAM?]
Dave, In the meantime I found out by myself how to generate the FASTA file and how to run samtools faidx on it. The info about all_fasta.loc was missing. But it's still not working. Let me summarize what I did so far (fields in the .loc lines are tab-separated):

- tool-data/shared/ucsc/hg18/seq/ contains these files: hg18.2bit hg18.fa hg18.fa.fai, where hg18.2bit was downloaded from the rsync server and the other two were generated from it.
- tool-data/shared/ucsc/builds.txt contains this line: hg18  Human Mar. 2006 (NCBI36/hg18) (hg18)
- tool-data/all_fasta.loc contains this line: hg18  hg18  Human (Homo sapiens): hg18  tool-data/shared/ucsc/hg18/seq/hg18.fa
- tool-data/sam_fa_indices.loc contains this line: index  hg18  tool-data/shared/ucsc/hg18/sam_index/hg18.fa
- tool-data/srma_index.loc contains this line: hg18  hg18  hg18  tool-data/shared/ucsc/hg18/srma_index/hg18.fa

So, any ideas where to look further? regards, Andreas

On 01.11.2012 15:36, Dave Bouvier wrote: Andreas, When setting up the rsync server, we decided that .fa files would be excluded from the listing, since the 2bit format contains the same data but takes up to 75% less space. I would recommend downloading the relevant .2bit file and converting it back to FASTA with twoBitToFa, then updating your all_fasta.loc file to point to the resulting .fa file. --Dave B.

On 11/1/12 06:27:49.000, Andreas Kuntzagk wrote: Hi, It's still not working. I just noticed that the sam_index dir only contains links to some files in ../seq, which is mostly empty except for some 2bit files. I could not find any documentation on how to obtain these data files. regards, Andreas

On 31.10.2012 17:50, Carlos Borroto wrote: On Wed, Oct 31, 2012 at 11:30 AM, Andreas Kuntzagk andreas.kuntz...@mdc-berlin.de wrote: Hi, I'm still setting up a local Galaxy. Currently I'm testing the setup of NGS tools. If I try SAM to BAM for a BAM file that has hg18 set as its build, I get a message that Sequences are not currently available for the specified build.
I guess that I either have to edit one of the .loc files (but which?) or have to download additional data from the rsync server. (I already have tool-data/shared/hg18 completely.)

The .loc file you want to modify is 'tool-data/sam_fa_indices.loc'. You can find information about this subject in the wiki[1]. Although the table there is not complete, you can always find the right XML under 'tools' and poke inside to find a line like this one:

<validator type="dataset_metadata_in_file" filename="sam_fa_indices.loc" metadata_name="dbkey" metadata_column="1" message="Sequences are not currently available for the specified build." line_startswith="index" />

[1] http://wiki.g2.bx.psu.edu/Admin/NGS%20Local%20Setup

And I agree, dealing with .loc files is quite cumbersome. Hope it helps, Carlos

-- Andreas Kuntzagk SystemAdministrator Berlin Institute for Medical Systems Biology at the Max-Delbrueck-Center for Molecular Medicine Robert-Roessle-Str. 10, 13125 Berlin, Germany http://www.mdc-berlin.de/en/bimsb/BIMSB_groups/Dieterich
Re: [galaxy-dev] which .loc file for SAM to BAM?
Sorry for my unclear sentence, I meant the page you pointed me to. regards, Andreas

On 01.11.2012 14:50, Carlos Borroto wrote: Well, it's not my page; I'm not directly affiliated with Galaxy. Still, I think the Galaxy Project would be happy to receive updates to the wiki; maybe a complete table can be added to the page you are mentioning.

On Thu, Nov 1, 2012 at 4:27 AM, Andreas Kuntzagk andreas.kuntz...@mdc-berlin.de wrote: Hi, thank you for the pointer. I was only looking at this wiki page: http://wiki.g2.bx.psu.edu/Admin/Data%20Integration Maybe this should point to your page? regards, Andreas

On 31.10.2012 17:50, Carlos Borroto wrote: On Wed, Oct 31, 2012 at 11:30 AM, Andreas Kuntzagk andreas.kuntz...@mdc-berlin.de wrote: Hi, I'm still setting up a local Galaxy. Currently I'm testing the setup of NGS tools. If I try SAM to BAM for a BAM file that has hg18 set as its build, I get a message that Sequences are not currently available for the specified build. I guess that I either have to edit one of the .loc files (but which?) or have to download additional data from the rsync server. (I already have tool-data/shared/hg18 completely.)

The .loc file you want to modify is 'tool-data/sam_fa_indices.loc'. You can find information about this subject in the wiki[1]. Although the table there is not complete, you can always find the right XML under 'tools' and poke inside to find a line like this one:

<validator type="dataset_metadata_in_file" filename="sam_fa_indices.loc" metadata_name="dbkey" metadata_column="1" message="Sequences are not currently available for the specified build." line_startswith="index" />

[1] http://wiki.g2.bx.psu.edu/Admin/NGS%20Local%20Setup

And I agree, dealing with .loc files is quite cumbersome. Hope it helps, Carlos

-- Andreas Kuntzagk SystemAdministrator Berlin Institute for Medical Systems Biology at the Max-Delbrueck-Center for Molecular Medicine Robert-Roessle-Str.
10, 13125 Berlin, Germany http://www.mdc-berlin.de/en/bimsb/BIMSB_groups/Dieterich -- Andreas Kuntzagk SystemAdministrator Berlin Institute for Medical Systems Biology at the Max-Delbrueck-Center for Molecular Medicine Robert-Roessle-Str. 10, 13125 Berlin, Germany http://www.mdc-berlin.de/en/bimsb/BIMSB_groups/Dieterich
Re: [galaxy-dev] Stop/Start/Reboot/Terminate
Dannon, Thanks for the update. I was able to power through getting the EC2 instance to start, using the CloudMan interface, in the same zone as the volume. I will let it run until next week. Amazon was not cooperating and kept launching instances in the opposite zone from the one my yaml file pointed to for the storage volume. Finally got lucky after a couple of attempts. Let me know if you want me to do any testing. Scooter

On 11/1/12 11:20 AM, Dannon Baker dannonba...@me.com wrote: Galaxy Cloudman does not support Stop/Start through the AWS interface; this is known to cause problems and should be avoided. The persistence design allows for complete termination and restart. The issue with your startup zone can currently be worked around by launching through the AWS console (following the instructions at usegalaxy.org/cloud) instead of cloudlaunch. That said, I'm currently updating cloudlaunch to support zone detection and launch, to avoid any issues with that moving forward, but this won't be available on main until Monday, most likely.

That out of the way, the issue you're experiencing right now is likely caused by an error with cm_boot.py causing duplicated nginx max_client_body_size directives. This problem has been fixed on the back end and won't happen with any new clusters going forward, but for existing clusters experiencing the problem you can probably fix it in a few short steps:

- ssh in to the instance.
- Edit /opt/galaxy/pkg/nginx/conf/nginx.conf, removing any redundant max_client_body_size directives. There should be exactly one in the file.
- Download a copy of the newest https://s3.amazonaws.com/cloudman/cm_boot.py (save it to your desktop or something).
- In the AWS console, go to S3 and find your cluster's bucket (it'll have a file called <your cluster name>.clusterName, for easy identification). Now upload the new cm_boot.py you just saved, replacing the one currently in the bucket.
Once the modified file is in your bucket, simply restart the instance via the AWS console and everything *should* come up fine. Sorry for any inconvenience! -Dannon

On Nov 1, 2012, at 10:49 AM, Scooter Willis hwil...@scripps.edu wrote: Last night I used the Amazon console to stop my working instance. Today started up the instance using the Amazon console. Waited an appropriate time, used the newly assigned public IP address, and got no response. Also did a reboot and no response. I can ssh to the instance but do not know what to check for errors. Should you be able to stop/start an instance in the Amazon console and have it work correctly? Trying to avoid the power-down option in the Galaxy web interface since I had the problem of new instances being started in a different availability zone from where the EBS volume was located. Looks like I will be leaving the master instance running and contributing to Amazon's profit margins!
Re: [galaxy-dev] hg18.fa on rsync? [was: which .loc file for SAM to BAM?]
Andreas, I recommend moving hg18.fa.fai into tool-data/shared/ucsc/hg18/sam_index/, and then samtools should work. Also, if you're using Picard tools, you'll want hg18.fa.fai and hg18.dict in tool-data/shared/ucsc/hg18/srma_index/, as well as links to hg18.fa in both directories. --Dave B.

On 11/1/12 11:59:24.000, Andreas Kuntzagk wrote: Dave, In the meantime I found out by myself how to generate the FASTA file and how to run samtools faidx on it. The info about all_fasta.loc was missing. But it's still not working. Let me summarize what I did so far (fields in the .loc lines are tab-separated):

- tool-data/shared/ucsc/hg18/seq/ contains these files: hg18.2bit hg18.fa hg18.fa.fai, where hg18.2bit was downloaded from the rsync server and the other two were generated from it.
- tool-data/shared/ucsc/builds.txt contains this line: hg18  Human Mar. 2006 (NCBI36/hg18) (hg18)
- tool-data/all_fasta.loc contains this line: hg18  hg18  Human (Homo sapiens): hg18  tool-data/shared/ucsc/hg18/seq/hg18.fa
- tool-data/sam_fa_indices.loc contains this line: index  hg18  tool-data/shared/ucsc/hg18/sam_index/hg18.fa
- tool-data/srma_index.loc contains this line: hg18  hg18  hg18  tool-data/shared/ucsc/hg18/srma_index/hg18.fa

So, any ideas where to look further? regards, Andreas

On 01.11.2012 15:36, Dave Bouvier wrote: Andreas, When setting up the rsync server, we decided that .fa files would be excluded from the listing, since the 2bit format contains the same data but takes up to 75% less space. I would recommend downloading the relevant .2bit file and converting it back to FASTA with twoBitToFa, then updating your all_fasta.loc file to point to the resulting .fa file. --Dave B.

On 11/1/12 06:27:49.000, Andreas Kuntzagk wrote: Hi, It's still not working. I just noticed that the sam_index dir only contains links to some files in ../seq, which is mostly empty except for some 2bit files. I could not find any documentation on how to obtain these data files.
regards, Andreas

On 31.10.2012 17:50, Carlos Borroto wrote: On Wed, Oct 31, 2012 at 11:30 AM, Andreas Kuntzagk andreas.kuntz...@mdc-berlin.de wrote: Hi, I'm still setting up a local Galaxy. Currently I'm testing the setup of NGS tools. If I try SAM to BAM for a BAM file that has hg18 set as its build, I get a message that Sequences are not currently available for the specified build. I guess that I either have to edit one of the .loc files (but which?) or have to download additional data from the rsync server. (I already have tool-data/shared/hg18 completely.)

The .loc file you want to modify is 'tool-data/sam_fa_indices.loc'. You can find information about this subject in the wiki[1]. Although the table there is not complete, you can always find the right XML under 'tools' and poke inside to find a line like this one:

<validator type="dataset_metadata_in_file" filename="sam_fa_indices.loc" metadata_name="dbkey" metadata_column="1" message="Sequences are not currently available for the specified build." line_startswith="index" />

[1] http://wiki.g2.bx.psu.edu/Admin/NGS%20Local%20Setup

And I agree, dealing with .loc files is quite cumbersome. Hope it helps, Carlos
Re: [galaxy-dev] JBrowse direct export to Galaxy
I just noticed the part: files_0|url_paste: http://jbrowse-server.org/path/to/file.vcf JBrowse is fully client-side, so there is no JBrowse server we can use. Is there a way to pass data to Galaxy directly, without a server? This functionality could also be useful for desktop applications.

On Tue, Oct 23, 2012 at 9:59 PM, Brad Chapman chapm...@50mail.com wrote: Erik and Jeremy;

I am a JBrowse dev hoping to add the ability to export data directly from JBrowse (JavaScript) to Galaxy.

The API could be used for this. Specifically, you could do an upload for the user from a URL via the tools API. I know that Brad Chapman (cc'd) has done this successfully recently. Brad, can you share the parameters used to do this?

Definitely. You want to do a POST to: /api/tools?key=YOURAPIKEY with a JSON payload of:

{"tool_id": "upload1", "history_id": "identifier of history to use", "params": {"file_type": "vcf", "dbkey": "hg19", "files_0|url_paste": "http://jbrowse-server.org/path/to/file.vcf", "files_0|NAME": "file_name_for_history.vcf"}}

This is all new since the last galaxy-dist release, so you'd need a recent galaxy-central server. Passing it back to Jeremy, there is also a JavaScript wrapper around the tools API which might help: https://bitbucket.org/galaxy/galaxy-central/src/tip/static/scripts/mvc/tools.js but I haven't used it myself. Hope this helps, Brad

Thanks, Erik Derohanian
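Brad's POST can be assembled in a few lines; here is a sketch with placeholder server URL, API key, and history id, using only the payload fields quoted above (the commented-out send uses the third-party requests library, but any HTTP client works):

```python
import json

def build_upload_request(galaxy_url, api_key, history_id, file_url, file_name):
    """Build the endpoint URL and JSON body for a tools-API upload from a URL."""
    endpoint = "%s/api/tools?key=%s" % (galaxy_url, api_key)
    payload = {
        "tool_id": "upload1",          # Galaxy's built-in upload tool
        "history_id": history_id,      # identifier of the history to use
        "params": {
            "file_type": "vcf",
            "dbkey": "hg19",
            "files_0|url_paste": file_url,   # Galaxy fetches this URL itself
            "files_0|NAME": file_name,       # display name in the history
        },
    }
    return endpoint, json.dumps(payload)

# endpoint, body = build_upload_request("http://localhost:8080", "YOURAPIKEY",
#                                       "some_history_id",
#                                       "http://example.org/file.vcf", "file.vcf")
# requests.post(endpoint, data=body, headers={"Content-Type": "application/json"})
```

Note that because Galaxy pulls the data from the URL itself, a purely client-side JBrowse has nothing to serve from, which is exactly Erik's question above.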
Re: [galaxy-dev] Empty TopHat output
Thanks, Jeremy. I updated my instance of Galaxy from the repository you provided (https://bitbucket.org/galaxy/galaxy-dist/) and re-ran my analysis. I see that TopHat is generating data again (by monitoring the disk usage on the Cloudman interface), but this time the output file is in the error state and the message returned is: Job output not returned from cluster. Any thoughts? Thanks in advance! Cheers, Mo Heydarian PhD candidate The Johns Hopkins School of Medicine Department of Biological Chemistry 725 Wolfe Street 414 Hunterian Baltimore, MD 21205

On Wed, Oct 31, 2012 at 9:03 PM, Jeremy Goecks jeremy.goe...@emory.edu wrote: In this case, it's useful to differentiate between (i) the AMI that Galaxy Cloud uses and (ii) the Galaxy code running on the cloud. I suspect that (ii) is out of date for you; this is not (yet) automatically updated, even when starting a new instance. Try using the admin console to update to the most recent Galaxy dist using this URL: https://bitbucket.org/galaxy/galaxy-dist/ (not galaxy-central, as is the default). Best, J.

On Oct 31, 2012, at 8:36 PM, Mohammad Heydarian wrote: We are running galaxy-cloudman-2011-03-22 (ami-da58aab3). Our latest instance was loaded up just last week. Thanks! Cheers, Mo Heydarian PhD candidate The Johns Hopkins School of Medicine Department of Biological Chemistry 725 Wolfe Street 414 Hunterian Baltimore, MD 21205

On Wed, Oct 31, 2012 at 8:30 PM, Jeremy Goecks jeremy.goe...@emory.edu wrote: Given that this doesn't seem to be happening on our public server or on local instances, my best guess is that the issue is old code. Are you running the most recent dist? J.

On Oct 31, 2012, at 7:37 PM, Mohammad Heydarian wrote: We are still getting empty TopHat output files on our Galaxy instance on the cloud. We see that TopHat is generating data while the tool is running (by monitoring our disk usage on the Amazon cloud), but the output is empty files. Is anyone else having this issue?
Does anyone have any suggestions? Many thanks in advance! Cheers, Mo Heydarian

On Mon, Oct 15, 2012 at 4:53 AM, Joachim Jacob joachim.ja...@vib.be wrote: The same here. Cheers, Joachim -- Joachim Jacob, PhD, Rijvisschestraat 120, 9052 Zwijnaarde. Tel: +32 9 244.66.34. Bioinformatics Training and Services (BITS) http://www.bits.vib.be @bitsatvib

___ Please keep all replies on the list by using reply all in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/
[galaxy-dev] Dynamic job runner status? Could I use it in combination with job splitting?
Hi, I've been researching the possibility of using the dynamic job runner in combination with job splitting for BLAST jobs. My main interest is to create a rule where both the size of the query and the size of the database are taken into consideration in order to select DRMAA and splitting options. My first question is: what is the status of the dynamic job runner? I found these two threads, but it is not clear to me whether this feature is part of galaxy-dist already: http://lists.bx.psu.edu/pipermail/galaxy-dev/2011-October/007160.html http://lists.bx.psu.edu/pipermail/galaxy-dev/2012-June/010080.html My second question is whether there is any documentation, other than the threads above, for configuring something like what I describe. In any case, there is very good information from John in those emails and I think that should get me started at least. Cheers, Carlos
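As a rough illustration of the kind of rule described above, here is a minimal hypothetical sketch of a dynamic-rule function that picks a DRMAA runner URL based on input sizes. The function name, size thresholds, and queue names are all assumptions made for illustration; consult the threads linked above for the actual hook Galaxy expects:

```python
# Hypothetical dynamic-rule sketch: route small BLAST jobs to a short queue
# and large ones to a parallel environment. Thresholds and queue names are
# placeholders, not values from this thread.
ONE_MB = 1024 * 1024

def choose_runner(query_size_bytes, db_size_bytes,
                  small_url="drmaa://-q short.q/",
                  big_url="drmaa://-q long.q -pe smp 8/"):
    """Return a DRMAA runner URL chosen from the query and database sizes."""
    if query_size_bytes < ONE_MB and db_size_bytes < 100 * ONE_MB:
        return small_url  # small query against a small database
    return big_url        # anything bigger goes to the parallel queue
```

In a real rule you would read the sizes from the job's input datasets rather than pass them in directly.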
[galaxy-dev] Where should I put a tool's required binaries?
I installed the bedtools tool from the public toolshed to my galaxy cluster. The jobs were failing because I hadn't installed the required binaries: An error occurred running this job: /opt/sge/default/spool/execd/ip-10-194-50-118/job_scripts/2: line 13: genomeCoverageBed: command not found Taking a closer look at the toolshed's page, I noticed there were a handful of required binaries (genomeCoverageBed, intersectBed, etc). So I downloaded BEDTools and compiled all the binaries. I've put them in well-organized directories using symlinks to specific versions, etc. The final executable directory containing symlinks to these binaries is /mnt/galaxyTools/shed_tool_binaries/bin/. I can make a snapshot of the tools volume so that these compiled binaries are always available to me when I bring up my cluster, but how do I integrate them into the PATH in a way that lets the bedtools galaxy tool see them, and that also survives cluster shutdowns? Thanks, -Joel
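One way to think about the PATH integration asked about above: whatever mechanism ends up setting the environment (an env.sh generator, a job wrapper, etc.), the prepend should be idempotent so repeated sourcing doesn't grow PATH. A small sketch, using the directory from the question; the helper name is hypothetical:

```python
# Hypothetical helper: prepend the shed_tool_binaries directory to a PATH
# string unless it is already present. The directory is the one named in the
# question; ":" is the POSIX PATH separator.
def with_tool_bin(path_value, tool_bin="/mnt/galaxyTools/shed_tool_binaries/bin"):
    """Return path_value with tool_bin prepended, idempotently."""
    parts = path_value.split(":") if path_value else []
    if tool_bin in parts:
        return path_value  # already on PATH, leave untouched
    return ":".join([tool_bin] + parts)
```

The same idempotency check can be written in a few lines of shell in an env.sh that lives on the snapshotted tools volume, so it survives cluster shutdowns.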
Re: [galaxy-dev] JBrowse direct export to Galaxy
JBrowse is fully client-side, so there is no JBrowse server we can use. Is there a way to pass data to Galaxy directly, without a server? This functionality could also be useful for desktop applications.

This isn't possible right now, though you could imagine using FTP with the API to achieve this. J.
[galaxy-dev] Filter by quality error
I have a new workflow set up that runs Filter By Quality on an input fastqsanger file. I get the following error message, whereas the original elements worked correctly. The IP address is a compute node; I have one compute node running with autoscaling on. Any reason why the /mnt filesystem is not working?

/opt/sge/default/spool/execd/ip-10-140-4-84/job_scripts/39: line 13: /mnt/galaxyTools/tools/fastx_toolkit/0.0.13/env.sh: No such file or directory
/opt/sge/default/spool/execd/ip-10-140-4-84/job_scripts/39: line 13: cd: /mnt/galaxyTools/galaxy-central: No such file or directory
/opt/sge/default/spool/execd/ip-10-140-4-84/job_scripts/39: line 13: /mnt/galaxyTools/galaxy-central/set_metadata.sh: No such file or directory
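The errors above all point at paths under /mnt/galaxyTools being absent on the autoscaled node, which suggests the shared tools volume was not mounted there before SGE dispatched the job. A small hypothetical diagnostic (the function name is made up) that checks a node's mount table for the expected mount point:

```python
# Hypothetical diagnostic: given the text of /proc/mounts from the compute
# node, report whether the shared galaxyTools volume is mounted there.
def is_mounted(proc_mounts_text, mount_point="/mnt/galaxyTools"):
    """True if any entry's mount target (second field) equals mount_point."""
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == mount_point:
            return True
    return False
```

Running this against the failing node's /proc/mounts (or simply `mount | grep galaxyTools`) would confirm whether the NFS mount is missing when the job script runs.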