Re: [galaxy-dev] Fwd: pass more information on a dataset merge

2012-11-01 Thread Alex.Khassapov
Hi John,

Do you think it's possible to create a test for your 'm:' format? I couldn't 
find out how to specify multiple input files for the test.

-Alex

-Original Message-
From: jmchil...@gmail.com [mailto:jmchil...@gmail.com] On Behalf Of John Chilton
Sent: Tuesday, 23 October 2012 7:59 AM
To: Jorrit Boekel
Cc: Khassapov, Alex (CSIRO IMT, Clayton)
Subject: Re: Fwd: [galaxy-dev] pass more information on a dataset merge

Hello again Jorrit,

Great, I am glad we are largely on the same page here. I don't know when I will 
get a chance to look at this particular aspect; if you get there first, that 
will be great, and if not, I will get there eventually.

-John

On Mon, Oct 22, 2012 at 2:51 AM, Jorrit Boekel jorrit.boe...@scilifelab.se 
wrote:
 IIRC, I implemented the task_X suffix (galaxy does so as well, but on
 the split subdirectories) to ensure that jobs containing multiple split
 datasets would be run in sync. Files from two datasets that belong
 together then get analysed together in subsequent steps.

 It would however be much nicer to retain original file names through a
 pipeline, or at least the possibility to retrieve them. Since the
 split/merge runs now actively look for and match files with identical
 'task_x' suffixes, it may be an option to do:

 fraction1.raw -> fraction1.raw_dataset_43.dat_task_0 ->
 fraction1.raw_dataset_44.dat_task_0
 fraction2.raw -> fraction2.raw_dataset_43.dat_task_1 ->
 fraction2.raw_dataset_44.dat_task_1

 (Note that python starts counting at 0, while most researchers number
 their first fraction 1.)

 I wouldn't mind looking more into that as well, since it would be a
 big improvement UI-wise.

 cheers,
 jorrit






 On 10/19/2012 04:40 PM, John Chilton wrote:

 Jorrit I meant to cc you on this response to Alex.

 -- Forwarded message --
 From: John Chilton chil0...@umn.edu
 Date: Fri, Oct 19, 2012 at 9:40 AM
 Subject: Re: [galaxy-dev] pass more information on a dataset merge
 To: alex.khassa...@csiro.au


 Hey Alex,

 I think the idea here is that your initially uploaded files would
 have different names, but after Jorrit's split/merge step they
 will all just be named after the dataset id (see screenshot), so you
 need the task_X at the end so they don't all end up with the same name.

 I have not thought a whole lot about the naming issue; in general it
 seems like a tough problem, and one that Galaxy itself doesn't do a
 particularly good job at.

 Jorrit have you given any thought to this?

 I wonder if it would be feasible to use the initial uploaded name as
 a sort of prefix going forward. So if I upload say

 fraction1.RAW
 fraction2.RAW
 fraction3.RAW

 and run a conversion step, maybe I could get:

 fraction1_dataset567.ms2
 fraction2_dataset567.ms2
 fraction3_dataset567.ms2

 instead of

 dataset567.dat_task_0
 dataset567.dat_task_1
 dataset567.dat_task_2
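John's proposed prefix naming could be sketched with a small helper like the following (a hypothetical illustration only; the function name and signature are not part of Galaxy):

```python
import os

def prefixed_output_name(uploaded_name, dataset_id, new_ext):
    """Keep the uploaded file's stem as a prefix for downstream outputs.

    Hypothetical helper sketching the naming proposal above; not Galaxy code.
    """
    stem = os.path.splitext(os.path.basename(uploaded_name))[0]
    return "%s_dataset%s.%s" % (stem, dataset_id, new_ext)

print(prefixed_output_name("fraction1.RAW", 567, "ms2"))
# fraction1_dataset567.ms2
```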

 Jorrit do you mind if I give implementing that a shot? It seems like
 it would be a win to me. Am I going to hit some problem I don't
 see now (presumably we have to send some data from the split to the
 merge, and that might be tricky)?

 -John

 On Thu, Oct 18, 2012 at 7:00 PM,  alex.khassa...@csiro.au wrote:

 Thanks John,

 I wonder what's the reason for appending _task_XX to the file names,
 why can't we just keep original file names?

 Alex

 -Original Message-
 From: jmchil...@gmail.com [mailto:jmchil...@gmail.com] On Behalf Of
 John Chilton
 Sent: Friday, 19 October 2012 6:16 AM
 To: Khassapov, Alex (CSIRO IMT, Clayton)
 Subject: Re: [galaxy-dev] pass more information on a dataset merge

 On Tue, Oct 16, 2012 at 11:11 PM,  alex.khassa...@csiro.au wrote:

 Hi John,

 I am definitely interested in this idea, and not only me - we are
 currently working on moving a few scientific tools (not related to
 genomics) into the cloud using Galaxy.

 Great. My interests in Galaxy are mostly outside of genomics as
 well. It is good to have more people utilizing Galaxy in this way,
 because it will force the platform to become more generic and
 address broader use cases.

 We will try it further and see if we need any changes. For now one
 improvement would be nice: make dataset_id.dat contain a list of
 paths to the locations of the uploaded files, so that by displaying the
 HTML page the user could just click on a link and download the file.

 Code that attempted to do this was in there, but obviously it didn't
 work. I have now fixed it up.

 Thanks for beta testing.

 -John

 We are pretty new to Galaxy, so our understanding of it is
 pretty limited.

 Thanks again,

 Alex


 -Original Message-
 From: jmchil...@gmail.com [mailto:jmchil...@gmail.com] On Behalf Of
 John Chilton
 Sent: Wednesday, 17 October 2012 3:21 AM
 To: Khassapov, Alex (CSIRO IMT, Clayton)
 Subject: Re: [galaxy-dev] pass more information on a dataset merge

 Wow, thanks for the rapid feedback! I have made the changes you
 have suggested. It seems you must be interested in this
 idea/implementation. Let me know if you have specific use

Re: [galaxy-dev] Dataset upload fail

2012-11-01 Thread Hans-Rudolf Hotz

Hi Vladimir

On 11/01/2012 03:46 AM, Vladimir Yamshchikov wrote:

Local install of Galaxy on SciLinux55 fails to upload a 5.2 GB fastq file
from the local HD, while normally loading smaller fastq and fasta datasets
(less than 1 GB). Chunks of 1.2 GB remain in */database/tmp, all of which
represent the beginning of the file that fails to upload. Several upload
attempts were made, and several chunks of the same size are present.
Can I just copy the file dataset to the database directory instead of
uploading through the web interface?




Make sure you provide a directory for library_import_dir in the 
universe_wsgi.ini file, and copy your files to this location (or a 
subdirectory of it). This will give you an extra option in the admin menu, 
'Upload directory of files'; then use the 'Link files without copying 
to Galaxy' option.



Alternatively, set allow_library_path_paste to 'True' (see also the 
comments and warnings in the universe_wsgi.ini file )
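For reference, the two settings Hans-Rudolf mentions look like this in universe_wsgi.ini (the import path below is a placeholder; pick any directory Galaxy can read):

```ini
# universe_wsgi.ini -- example values only
# Enables Admin -> 'Upload directory of files'
library_import_dir = /data/galaxy/library_import

# Enables pasting filesystem paths directly (see the warnings in the
# sample universe_wsgi.ini before turning this on)
allow_library_path_paste = True
```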



Regards, Hans-Rudolf









___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

 http://lists.bx.psu.edu/


Re: [galaxy-dev] which .loc file for SAM to BAM?

2012-11-01 Thread Andreas Kuntzagk

Hi,

thank you for the pointer.
I was only looking at this wiki page:
http://wiki.g2.bx.psu.edu/Admin/Data%20Integration

Maybe this should point to your page?

regards, Andreas

On 31.10.2012 17:50, Carlos Borroto wrote:

On Wed, Oct 31, 2012 at 11:30 AM, Andreas Kuntzagk
andreas.kuntz...@mdc-berlin.de wrote:

Hi,

I'm still setting up a local galaxy. Currently I'm testing the setup of NGS
tools. If I try SAM to BAM for a BAM file that has hg18 set as build I
get a message that
Sequences are not currently available for the specified build. I guess
that I have either to manipulate one of the .loc files (but which?) or have
to download additional data from rsync server.
(I already have the tool-data/shared/hg18 completely)



The .loc file you want to modify is 'tool-data/sam_fa_indices.loc'.
You can find information about this subject in the wiki[1]. Although
the table there is not complete, so you could always find the right
xml under 'tools' and poke inside to find a line like this one:
<validator type="dataset_metadata_in_file"
filename="sam_fa_indices.loc" metadata_name="dbkey"
metadata_column="1" message="Sequences are not currently available for
the specified build." line_startswith="index" />

[1]http://wiki.g2.bx.psu.edu/Admin/NGS%20Local%20Setup

And I agree, dealing with .loc files is quite cumbersome.

Hope it helps,
Carlos



--
Andreas Kuntzagk

SystemAdministrator

Berlin Institute for Medical Systems Biology at the
Max-Delbrueck-Center for Molecular Medicine
Robert-Roessle-Str. 10, 13125 Berlin, Germany

http://www.mdc-berlin.de/en/bimsb/BIMSB_groups/Dieterich


Re: [galaxy-dev] Parallelism tag and job splitter

2012-11-01 Thread Peter Cock
On Thu, Nov 1, 2012 at 1:48 AM, Edward Hills ehills...@gmail.com wrote:
 Hi Peter, thanks again.

 Turns out that it has been implemented by the looks of it in
 lib/galaxy/datatypes/tabular.py under class Vcf.

Yes, looking at the Vcf class, it lacks a merge method (the Sam class
earlier in the file defines its own - you could do something similar).

 However, despite this, it
 is always the Text class in data.py that is loaded and not the proper Vcf
 one.

Python inheritance means that if the Vcf class lacks a merge method,
it would call the parent class's method (the Tabular class, if it had one),
or the grandparent class's method (the Text class). So it is falling
back on the Text merge, which doesn't know about headers.
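The lookup Peter describes can be sketched with stand-in classes (simplified stand-ins, not Galaxy's real datatype implementations):

```python
# Simplified stand-ins for Galaxy's datatype classes, showing why a
# missing Vcf.merge falls back to Text.merge via normal attribute lookup.

class Text:
    @staticmethod
    def merge(split_files, output_file):
        # Naive concatenation -- knows nothing about headers.
        with open(output_file, "w") as out:
            for name in split_files:
                with open(name) as part:
                    out.write(part.read())

class Tabular(Text):
    pass  # defines no merge of its own

class Vcf(Tabular):
    pass  # no merge either, so Text.merge is what actually runs

class Sam(Tabular):
    @staticmethod
    def merge(split_files, output_file):
        # Header-aware merge: keep @-prefixed header lines only from
        # the first split file (the approach the Sam class takes).
        with open(output_file, "w") as out:
            for i, name in enumerate(split_files):
                with open(name) as part:
                    for line in part:
                        if i > 0 and line.startswith("@"):
                            continue
                        out.write(line)

# Attribute lookup walks Vcf -> Tabular -> Text:
assert Vcf.merge is Text.merge
```

Giving Vcf its own header-aware merge, as Peter suggests, would stop the fallback to Text.merge.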

(As an aside, I would like the Tabular merge to be a bit more
clever about #header lines, but this isn't trivial as some tabular
files contain #comment lines too.)

 Can you point me in the direction of where the type is chosen?

Your tool's XML should specify the output format - although there
could be complications if for example you are doing a dynamic
format selection based on one of the parameters.

Peter


[galaxy-dev] hg18.fa on rsync? [was: which .loc file for SAM to BAM?]

2012-11-01 Thread Andreas Kuntzagk

Hi,

It's still not working. I just noticed that the sam_index dir only contains links to some files in 
../seq, which is mostly empty except for some 2bit files.

I could not find any documentation on how to obtain these data files.

regards, Andreas

On 31.10.2012 17:50, Carlos Borroto wrote:

On Wed, Oct 31, 2012 at 11:30 AM, Andreas Kuntzagk
andreas.kuntz...@mdc-berlin.de wrote:

Hi,

I'm still setting up a local galaxy. Currently I'm testing the setup of NGS
tools. If I try SAM to BAM for a BAM file that has hg18 set as build I
get a message that
Sequences are not currently available for the specified build. I guess
that I have either to manipulate one of the .loc files (but which?) or have
to download additional data from rsync server.
(I already have the tool-data/shared/hg18 completely)



The .loc file you want to modify is 'tool-data/sam_fa_indices.loc'.
You can find information about this subject in the wiki[1]. Although
the table there is not complete, so you could always find the right
xml under 'tools' and poke inside to find a line like this one:
<validator type="dataset_metadata_in_file"
filename="sam_fa_indices.loc" metadata_name="dbkey"
metadata_column="1" message="Sequences are not currently available for
the specified build." line_startswith="index" />

[1]http://wiki.g2.bx.psu.edu/Admin/NGS%20Local%20Setup

And I agree, dealing with .loc files is quite cumbersome.

Hope it helps,
Carlos



--
Andreas Kuntzagk

SystemAdministrator

Berlin Institute for Medical Systems Biology at the
Max-Delbrueck-Center for Molecular Medicine
Robert-Roessle-Str. 10, 13125 Berlin, Germany

http://www.mdc-berlin.de/en/bimsb/BIMSB_groups/Dieterich


[galaxy-dev] Stop/Start/Reboot/Terminate

2012-11-01 Thread Scooter Willis

Last night I used the Amazon console to stop my working instance. Today I started 
up the instance using the Amazon console, waited an appropriate time, and got no 
response at the newly assigned public IP address. A reboot also got no response. 
I can ssh to the instance but do not know what to check for errors.

Should you be able to stop/start an instance in the Amazon console and have it 
work correctly? I am trying to avoid the power down option in the Galaxy web 
interface, since I had the problem of new instances being started in a 
different availability zone from where the EBS volume was located.

Looks like I will be leaving the master instance running and contributing to 
Amazon profit margins!



[galaxy-dev] Unable to fetch the sequence ERROR

2012-11-01 Thread Aarthi Talla
Hello,

I am trying to fetch sequences using an interval file with columns 'chr', 
'start', 'end' and 'name'.
The names in the chr column are like chr1, chr2 ... chrY, plus chrMT and 
the HSCHR*.
I could fetch sequences for chr1-Y, but it does not fetch for MT, HSCHR* and 
GL000*.1.

I get the error: 4536 warnings, 1st is: Unable to fetch the sequence from 
'33529672' to '329' for chrom 'HSCHR6_MHC_MCF'.
Skipped 4536 invalid lines, 1st is #775, HSCHR6_MHC_MCF 33529672 33530001 
KIFC1.

I locally cached the database as Human hg19.

Now, does your hg19 contain MT and the other GL000*.1 and haplotype sequences 
in it? Or should I change the headers of those lines in the interval file 
accordingly?

I'd be glad if you could get back to me asap.

Thanks

Re: [galaxy-dev] Stop/Start/Reboot/Terminate

2012-11-01 Thread Dannon Baker
Galaxy Cloudman does not support Stop/Start through the AWS interface; this is 
known to cause problems and should be avoided.  The persistence design allows 
for complete termination and restart -- the issue with your startup zone can be 
worked around currently by launching through the AWS console (following the 
instructions at usegalaxy.org/cloud) instead of cloudlaunch.  That said, I'm 
currently updating cloudlaunch to support zone detection and launch, to avoid 
any issues with that moving forward, but this won't be available on main until 
Monday, most likely.

That out of the way, the issue you're experiencing right now is likely caused 
by an error with cm_boot.py causing duplicated nginx max_client_body_size 
directives.

This problem has been fixed on the back end and won't happen with any new 
clusters going forward, but for existing clusters experiencing the problem you 
can probably fix it in a few short steps:

1. ssh in to the instance.
2. Edit /opt/galaxy/pkg/nginx/conf/nginx.conf, removing any redundant 
max_client_body_size directives.  There should be exactly one in the file.
3. Download a copy of the newest https://s3.amazonaws.com/cloudman/cm_boot.py 
(save it to your desktop or something).
4. In the AWS console, go to S3 and find your cluster's bucket (it'll have a file 
called 'your cluster name'.clusterName, for easy identification). Now upload 
the new cm_boot.py you just saved, replacing the one currently in the bucket.

Once the modified file is in your bucket, simply restart the instance via the 
AWS console and everything *should* come up fine.
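The nginx cleanup step can be sketched as below. The sample config and the 2048m value are made up for illustration; on a real instance you would point CONF at /opt/galaxy/pkg/nginx/conf/nginx.conf instead of a temp file:

```shell
# Demonstration of the duplicate-directive cleanup on a sample file.
CONF=$(mktemp)
cat > "$CONF" <<'EOF'
http {
    max_client_body_size 2048m;
    max_client_body_size 2048m;
    max_client_body_size 2048m;
}
EOF

# Keep the first max_client_body_size directive and drop the redundant
# repeats, leaving every other line untouched.
awk '!/max_client_body_size/ || !seen++' "$CONF" > "$CONF.fixed"
mv "$CONF.fixed" "$CONF"

grep -c 'max_client_body_size' "$CONF"   # prints 1
```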

Sorry for any inconvenience!

-Dannon



On Nov 1, 2012, at 10:49 AM, Scooter Willis hwil...@scripps.edu wrote:

 
 Last night I used the Amazon console to stop my working instance. Today 
 started up the instance using amazon console. Waited appropriate time, using 
 new assigned public ip address and no response. Also did a reboot and no 
 response. I can ssh to the instance but do not know what to check for errors.
 
 Should you be able to stop/start and instance in amazon console and have it 
 work correctly? Trying to avoid the power down option using the galaxy web 
 interface since I had the problem with new instances being started in a 
 different availability zone where the EBS volume was located.
 
 Looks like I will be leaving the master instance running and contributing to 
 amazon profit margins!
 
 



Re: [galaxy-dev] hg18.fa on rsync? [was: which .loc file for SAM to BAM?]

2012-11-01 Thread Andreas Kuntzagk

Dave,

In the meantime I found out by myself how to generate the FASTA and also how to run 
samtools faidx on it. The info about all_fasta.loc was missing. But it's still not working.


Let me summarize what I did so far:

- tool-data/shared/ucsc/hg18/seq/ contains these files:
hg18.2bit  hg18.fa  hg18.fa.fai

where hg18.2bit was downloaded from the rsync server and the other two 
generated from it.

- tool-data/shared/ucsc/builds.txt contains this line:

hg18	Human Mar. 2006 (NCBI36/hg18) (hg18)

- tool-data/all_fasta.loc contains this line:

hg18	hg18	Human (Homo sapiens): hg18	tool-data/shared/ucsc/hg18/seq/hg18.fa

- tool-data/sam_fa_indices.loc contains this line:

index	hg18	tool-data/shared/ucsc/hg18/sam_index/hg18.fa

- tool-data/srma_index.loc contains this line:

hg18	hg18	hg18	tool-data/shared/ucsc/hg18/srma_index/hg18.fa


So any ideas where to look further?


regards, Andreas

On 01.11.2012 15:36, Dave Bouvier wrote:

Andreas,

When setting up the rsync server, we decided that .fa files would be excluded 
from the listing,
since the 2bit format contains the same data but takes up to 75% less space. I 
would recommend
downloading the relevant .2bit file and converting it back to FASTA with 
twoBitToFa, then updating
your all_fasta.loc file to point to the resulting .fa file.

--Dave B.

On 11/1/12 06:27:49.000, Andreas Kuntzagk wrote:

Hi,

It's still not working. I just noticed that the sam_index dir only
contains links to some files in ../seq which is mostly empty except some
2bit files.
I could not find any documentation how to obtain these data files.

regards, Andreas

On 31.10.2012 17:50, Carlos Borroto wrote:

On Wed, Oct 31, 2012 at 11:30 AM, Andreas Kuntzagk
andreas.kuntz...@mdc-berlin.de wrote:

Hi,

I'm still setting up a local galaxy. Currently I'm testing the setup
of NGS
tools. If I try SAM to BAM for a BAM file that has hg18 set as
build I
get a message that
Sequences are not currently available for the specified build. I guess
that I have either to manipulate one of the .loc files (but which?)
or have
to download additional data from rsync server.
(I already have the tool-data/shared/hg18 completely)



The .loc file you want to modify is 'tool-data/sam_fa_indices.loc'.
You can find information about this subject in the wiki[1]. Although
the table there is not complete, so you could always find the right
xml under 'tools' and poke inside to find a line like this one:
<validator type="dataset_metadata_in_file"
filename="sam_fa_indices.loc" metadata_name="dbkey"
metadata_column="1" message="Sequences are not currently available for
the specified build." line_startswith="index" />

[1]http://wiki.g2.bx.psu.edu/Admin/NGS%20Local%20Setup

And I agree, dealing with .loc files is quite cumbersome.

Hope it helps,
Carlos






--
Andreas Kuntzagk

SystemAdministrator

Berlin Institute for Medical Systems Biology at the
Max-Delbrueck-Center for Molecular Medicine
Robert-Roessle-Str. 10, 13125 Berlin, Germany

http://www.mdc-berlin.de/en/bimsb/BIMSB_groups/Dieterich


Re: [galaxy-dev] which .loc file for SAM to BAM?

2012-11-01 Thread Andreas Kuntzagk

Sorry for my unclear sentence, I meant the page you pointed me to.

regards, Andreas

On 01.11.2012 14:50, Carlos Borroto wrote:

Well, it is not my page; I'm not directly affiliated with Galaxy. Still, I
think the Galaxy Project would be happy to receive updates to the wiki;
maybe a complete table can be added to the page you are mentioning.

On Thu, Nov 1, 2012 at 4:27 AM, Andreas Kuntzagk
andreas.kuntz...@mdc-berlin.de wrote:

Hi,

thank you for the pointer.
I was only looking at this wiki page:
http://wiki.g2.bx.psu.edu/Admin/Data%20Integration

Maybe this should point to your page?

regards, Andreas


On 31.10.2012 17:50, Carlos Borroto wrote:


On Wed, Oct 31, 2012 at 11:30 AM, Andreas Kuntzagk
andreas.kuntz...@mdc-berlin.de wrote:


Hi,

I'm still setting up a local galaxy. Currently I'm testing the setup of
NGS
tools. If I try SAM to BAM for a BAM file that has hg18 set as build
I
get a message that
Sequences are not currently available for the specified build. I guess
that I have either to manipulate one of the .loc files (but which?) or
have
to download additional data from rsync server.
(I already have the tool-data/shared/hg18 completely)



The .loc file you want to modify is 'tool-data/sam_fa_indices.loc'.
You can find information about this subject in the wiki[1]. Although
the table there is not complete, so you could always find the right
xml under 'tools' and poke inside to find a line like this one:
<validator type="dataset_metadata_in_file"
filename="sam_fa_indices.loc" metadata_name="dbkey"
metadata_column="1" message="Sequences are not currently available for
the specified build." line_startswith="index" />

[1]http://wiki.g2.bx.psu.edu/Admin/NGS%20Local%20Setup

And I agree, dealing with .loc files is quite cumbersome.

Hope it helps,
Carlos



--
Andreas Kuntzagk

SystemAdministrator

Berlin Institute for Medical Systems Biology at the
Max-Delbrueck-Center for Molecular Medicine
Robert-Roessle-Str. 10, 13125 Berlin, Germany

http://www.mdc-berlin.de/en/bimsb/BIMSB_groups/Dieterich


--
Andreas Kuntzagk

SystemAdministrator

Berlin Institute for Medical Systems Biology at the
Max-Delbrueck-Center for Molecular Medicine
Robert-Roessle-Str. 10, 13125 Berlin, Germany

http://www.mdc-berlin.de/en/bimsb/BIMSB_groups/Dieterich


Re: [galaxy-dev] Stop/Start/Reboot/Terminate

2012-11-01 Thread Scooter Willis
Dannon

Thanks for the update. I was able to power through getting the EC2
instance to start using the CloudMan interface in the same zone as the
volume. I will let it run until next week. Amazon was not cooperating and
kept launching instances in the opposite zone from the one my yaml file
pointed to for the storage volume. I finally got lucky after a couple of
attempts. Let me know if you want me to do any testing.

Scooter 

On 11/1/12 11:20 AM, Dannon Baker dannonba...@me.com wrote:

Galaxy Cloudman does not support Stop/Start through the AWS interface,
this is known to cause problems and should be avoided.  The persistence
design allows for complete termination and restart -- the issue with your
startup zone can be worked around currently by launching through the AWS
console (following the instructions at usegalaxy.org/cloud) instead of
cloudlaunch.  That said, I'm currently updating cloudlaunch to support
zone detection and launch, to avoid any issues with that moving forward,
but this won't be available on main until monday, most likely.

That out of the way, the issue you're experiencing right now is likely
caused by an error with cm_boot.py causing duplicated nginx
max_client_body_size directives.

This problem has been fixed on the back end and won't happen with any new
clusters going forward, but for existing clusters experiencing the
problem you can probably fix it in a few short steps:

ssh in to the instance
edit /opt/galaxy/pkg/nginx/conf/nginx.conf, removing any redundant
max_client_body_size directives.  There should be exactly one in the file.
download a copy of the newest
https://s3.amazonaws.com/cloudman/cm_boot.py (save it to your desktop or
something)
In the AWS console, go to S3 and find your cluster's bucket (it'll have a
file called 'your cluster name'.clusterName, for easy identification).
Now upload the new cm_boot.py you just saved, replacing the one currently
in the bucket.

Once the modified file is in your bucket, simply restart the instance via
the AWS console and everything *should* come up fine.

Sorry for any inconvenience!

-Dannon



On Nov 1, 2012, at 10:49 AM, Scooter Willis hwil...@scripps.edu wrote:

 
 Last night I used the Amazon console to stop my working instance. Today
started up the instance using amazon console. Waited appropriate time,
using new assigned public ip address and no response. Also did a reboot
and no response. I can ssh to the instance but do not know what to check
for errors.
 
 Should you be able to stop/start and instance in amazon console and
have it work correctly? Trying to avoid the power down option using
the galaxy web interface since I had the problem with new instances
being started in a different availability zone where the EBS volume was
located.
 
 Looks like I will be leaving the master instance running and
contributing to amazon profit margins!
 
 





Re: [galaxy-dev] hg18.fa on rsync? [was: which .loc file for SAM to BAM?]

2012-11-01 Thread Dave Bouvier

Andreas,

I recommend moving hg18.fa.fai into 
tool-data/shared/ucsc/hg18/sam_index/, and then samtools should work. 
Also, if you're using picard tools, you'll want hg18.fa.fai and 
hg18.dict in tool-data/shared/ucsc/hg18/srma_index/, as well as links to 
hg18.fa in both directories.



   --Dave B.

On 11/1/12 11:59:24.000, Andreas Kuntzagk wrote:

Dave,

In the meantime I found out by myself how to generate the FASTA and
also how to run samtools faidx on it. The info about all_fasta.loc was
missing. But it's still not working.

Let me summarize what I did so far:

- tool-data/shared/ucsc/hg18/seq/ contains these files:
hg18.2bit  hg18.fa  hg18.fa.fai

where hg18.2bit was downloaded from the rsync server and the other two
generated from it.

- tool-data/shared/ucsc/builds.txt contains this line:

hg18	Human Mar. 2006 (NCBI36/hg18) (hg18)

- tool-data/all_fasta.loc contains this line:

hg18	hg18	Human (Homo sapiens): hg18	tool-data/shared/ucsc/hg18/seq/hg18.fa

- tool-data/sam_fa_indices.loc contains this line:

index	hg18	tool-data/shared/ucsc/hg18/sam_index/hg18.fa

- tool-data/srma_index.loc contains this line:

hg18	hg18	hg18	tool-data/shared/ucsc/hg18/srma_index/hg18.fa


So any ideas where to look further?


regards, Andreas

On 01.11.2012 15:36, Dave Bouvier wrote:

Andreas,

When setting up the rsync server, we decided that .fa files would be
excluded from the listing,
since the 2bit format contains the same data but takes up to 75% less
space. I would recommend
downloading the relevant .2bit file and converting it back to FASTA
with twoBitToFa, then updating
your all_fasta.loc file to point to the resulting .fa file.

--Dave B.

On 11/1/12 06:27:49.000, Andreas Kuntzagk wrote:

Hi,

It's still not working. I just noticed that the sam_index dir only
contains links to some files in ../seq which is mostly empty except some
2bit files.
I could not find any documentation how to obtain these data files.

regards, Andreas

On 31.10.2012 17:50, Carlos Borroto wrote:

On Wed, Oct 31, 2012 at 11:30 AM, Andreas Kuntzagk
andreas.kuntz...@mdc-berlin.de wrote:

Hi,

I'm still setting up a local galaxy. Currently I'm testing the setup
of NGS
tools. If I try SAM to BAM for a BAM file that has hg18 set as
build I
get a message that
Sequences are not currently available for the specified build. I
guess
that I have either to manipulate one of the .loc files (but which?)
or have
to download additional data from rsync server.
(I already have the tool-data/shared/hg18 completely)



The .loc file you want to modify is 'tool-data/sam_fa_indices.loc'.
You can find information about this subject in the wiki[1]. Although
the table there is not complete, so you could always find the right
xml under 'tools' and poke inside to find a line like this one:
<validator type="dataset_metadata_in_file"
filename="sam_fa_indices.loc" metadata_name="dbkey"
metadata_column="1" message="Sequences are not currently available for
the specified build." line_startswith="index" />

[1]http://wiki.g2.bx.psu.edu/Admin/NGS%20Local%20Setup

And I agree, dealing with .loc files is quite cumbersome.

Hope it helps,
Carlos









Re: [galaxy-dev] JBrowse direct export to Galaxy

2012-11-01 Thread Erik Derohanian
I just noticed the part: files_0|url_paste:
http://jbrowse-server.org/path/to/file.vcf

JBrowse is fully client-side, so there is no jbrowse server we can use. Is
there a way to pass data to galaxy directly without the server? This
functionality could also be useful for desktop applications.



On Tue, Oct 23, 2012 at 9:59 PM, Brad Chapman chapm...@50mail.com wrote:


 Erik and Jeremy;

  I am a JBrowse Dev hoping to add the ability to export data directly
  from JBrowse (JavaScript) to Galaxy
 
  The API could be used for this.
 
  Specifically, you could do an upload for the user from a URL via the
  tools API. I know that Brad Chapman (cc'd) has done this successfully
  recently. Brad, can you share the parameters used to do this?

 Definitely, you want to do a POST to:

 /api/tools?key=YOURAPIKEY

 with a JSON payload of:

 {"tool_id": "upload1",
  "history_id": "identifier of history to use",
  "params": {"file_type": "vcf",
             "dbkey": "hg19",
             "files_0|url_paste": "http://jbrowse-server.org/path/to/file.vcf",
             "files_0|NAME": "file_name_for_history.vcf"}}

 This is all new since the last galaxy-dist release, so you'd need a
 recent galaxy-central server.

 Passing it back to Jeremy, there is also a Javascript wrapper around the
 tools API which might help:


 https://bitbucket.org/galaxy/galaxy-central/src/tip/static/scripts/mvc/tools.js

 but I haven't used it myself. Hope this helps,
 Brad
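Brad's call above can be sketched as a small stdlib-only client. The URL, API key, history id and VCF URL below are all placeholders:

```python
import json
import urllib.request

GALAXY_URL = "http://127.0.0.1:8080"   # placeholder Galaxy instance
API_KEY = "YOURAPIKEY"                 # placeholder API key

payload = {
    "tool_id": "upload1",
    "history_id": "HISTORY_ID",        # placeholder history identifier
    "params": {
        "file_type": "vcf",
        "dbkey": "hg19",
        "files_0|url_paste": "http://example.org/path/to/file.vcf",
        "files_0|NAME": "file_name_for_history.vcf",
    },
}

# POST the JSON payload to the tools API endpoint.
req = urllib.request.Request(
    "%s/api/tools?key=%s" % (GALAXY_URL, API_KEY),
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# Against a live server:
# response = json.load(urllib.request.urlopen(req))
```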



Thanks,
Erik Derohanian

Re: [galaxy-dev] Empty TopHat output

2012-11-01 Thread Mohammad Heydarian
Thanks, Jeremy.

I updated my instance of Galaxy from the repository you provided
(https://bitbucket.org/galaxy/galaxy-dist/) and re-ran my analysis. I see
that TopHat is generating data again (by monitoring the disk usage on the
Cloudman interface), but this time the output file is in the error
state and the message returned is: Job output not returned from cluster.
Any thoughts?

Thanks in advance!

Cheers,
Mo Heydarian

PhD candidate
The Johns Hopkins School of Medicine
Department of Biological Chemistry
725 Wolfe Street
414 Hunterian
Baltimore, MD 21205



On Wed, Oct 31, 2012 at 9:03 PM, Jeremy Goecks jeremy.goe...@emory.edu wrote:

 In this case, it's useful to differentiate between (i) the AMI that Galaxy
 Cloud uses and (ii) the Galaxy code running on the cloud. I suspect that
 (ii) is out of date for you; this is not (yet) automatically updated, even
 when starting a new instance.

 Try using the admin console to update to the most recent Galaxy dist using
 this URL:

 https://bitbucket.org/galaxy/galaxy-dist/

 (not galaxy-central, as is the default)

 Best,
 J.

 On Oct 31, 2012, at 8:36 PM, Mohammad Heydarian wrote:

 We are running galaxy-cloudman-2011-03-22 (ami-da58aab3).

 Our latest instance was loaded up just last week.

 Thanks!

 Cheers,
 Mo Heydarian

 PhD candidate
 The Johns Hopkins School of Medicine
 Department of Biological Chemistry
 725 Wolfe Street
 414 Hunterian
 Baltimore, MD 21205



 On Wed, Oct 31, 2012 at 8:30 PM, Jeremy Goecks jeremy.goe...@emory.eduwrote:

 Given that this doesn't seem to be happening on our public server or on
 local instances, my best guess is that the issue is old code. Are you
 running the most recent dist?

 J.


 On Oct 31, 2012, at 7:37 PM, Mohammad Heydarian wrote:

 We are still getting empty TopHat output files on our Galaxy instance on
 the cloud. We see that TopHat is generating data while the tool is running
 (by monitoring our disk usage on the Amazon cloud), but the output is
 empty files.

 Is anyone else having this issue? Does anyone have any suggestions?

 Many thanks in advance!


 Cheers,
 Mo Heydarian



 On Mon, Oct 15, 2012 at 4:53 AM, Joachim Jacob joachim.ja...@vib.bewrote:

 The same here.

 Cheers,
 Joachim

 --
 Joachim Jacob, PhD

 Rijvisschestraat 120, 9052 Zwijnaarde
 Tel: +32 9 244.66.34
 Bioinformatics Training and Services (BITS)
 http://www.bits.vib.be
 @bitsatvib











[galaxy-dev] Dynamic job runner status? Could I use it in combination with job splitting?

2012-11-01 Thread Carlos Borroto
Hi,

I've been researching the possibility of using dynamic job runner in
combination with job splitting for blast jobs. My main interest is to
create a rule where both the size of the query and the database are
taken into consideration in order to select DRMAA and splitting
options.

My first question: what is the status of the dynamic job runner? I
found these two threads, but it's not clear to me whether this feature
is part of galaxy-dist already:
http://lists.bx.psu.edu/pipermail/galaxy-dev/2011-October/007160.html
http://lists.bx.psu.edu/pipermail/galaxy-dev/2012-June/010080.html


My second question: is there any documentation, other than the threads
above, on configuring something like what I describe? In any case,
there is very good information from John in those emails, and I think
that should at least get me started.
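
For what it's worth, the size-based rule I have in mind could be
sketched as a plain function. In Galaxy's dynamic job runner you would
point the tool at a rule like dynamic:///python/<rule_name> and derive
the sizes from the job object; here they are plain arguments so the
decision logic stands on its own. All thresholds, parallel-environment
names, and runner URLs below are hypothetical and cluster-specific:

```python
# Hypothetical thresholds (bytes); tune for your cluster.
SMALL_QUERY = 10 * 1024 * 1024   # 10 MB
LARGE_DB = 1024 ** 3             # 1 GB

def choose_blast_runner(query_size, db_size):
    """Pick a DRMAA runner URL from query and database sizes.

    In a real dynamic-rule function you would receive the job object
    and compute these sizes from its input datasets and the configured
    BLAST database.
    """
    if query_size < SMALL_QUERY and db_size < LARGE_DB:
        # Small job: a single slot, no splitting needed.
        return "drmaa://-pe smp 1/"
    if db_size >= LARGE_DB:
        # Large database: request more slots and memory per slot.
        return "drmaa://-pe smp 8 -l h_vmem=4G/"
    # Large query against a modest database: parallelize wide.
    return "drmaa://-pe smp 8/"
```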

Cheers,
Carlos


[galaxy-dev] Where should I put a tool's required binaries?

2012-11-01 Thread Joel Rosenberg

I installed the bedtools tool from the public toolshed to my galaxy cluster. 
The jobs were failing because I hadn't installed the required binaries:

   An error occurred running this job: 
/opt/sge/default/spool/execd/ip-10-194-50-118/job_scripts/2: line 13: 
genomeCoverageBed: command not found

Taking a closer look at the toolshed's page, I noticed there were a handful of 
required binaries (genomeCoverageBed, intersectBed, etc).

So I downloaded BEDTools and compiled all the binaries. I've put them in well 
organized directories using symlinks to specific versions, etc. The final 
executable directory containing symlinks to these binaries is 
/mnt/galaxyTools/shed_tool_binaries/bin/.

I can make a snapshot of the tools volume so that these compiled binaries are 
always available to me when I bring up my cluster, but how do I integrate them 
into the PATH in a way that lets the bedtools galaxy tool see them, but also 
survive cluster shutdowns?

Thanks,

-Joel 


Re: [galaxy-dev] JBrowse direct export to Galaxy

2012-11-01 Thread Jeremy Goecks

 JBrowse is fully client-side, so there is no jbrowse server we can use. Is 
 there a way to pass data to galaxy directly without the server? This 
 functionality could also be useful for desktop applications.

This isn't possible right now, though you could imagine using FTP with the API 
to achieve this.

J.

[galaxy-dev] Filter by quality error

2012-11-01 Thread Scooter Willis
I have a new workflow set up with Filter by Quality on an input fastqsanger file.
I get the following error message, even though the earlier steps in the workflow
worked correctly. The IP address is a compute node; I have one compute node
running with autoscaling on. Any reason why the /mnt paths aren't working?


/opt/sge/default/spool/execd/ip-10-140-4-84/job_scripts/39: line 13: 
/mnt/galaxyTools/tools/fastx_toolkit/0.0.13/env.sh: No such file or directory
/opt/sge/default/spool/execd/ip-10-140-4-84/job_scripts/39: line 13: cd: 
/mnt/galaxyTools/galaxy-central: No such file or directory

/opt/sge/default/spool/execd/ip-10-140-4-84/job_scripts/39: line 13: 
/mnt/galaxyTools/galaxy-central/set_metadata.sh: No such file or directory