Re: [galaxy-dev] Use of gzipped files in Galaxy Unit Tests

2014-11-10 Thread Peter Cock
Thanks John,

The technical reasons behind this behaviour are clearer to me now:

For test inputs, Galaxy runs the upload tool which will automatically
decompress gzipped files. This means functional tests can use gzipped
files as inputs.

However, for the expected test output files, Galaxy compares the files
directly (without applying the upload tool), so gzipped files cannot
currently be used in the functional test definitions.
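John's proposed fallback (compare the files, and if they differ and the expected file is gzipped, decompress and retry) could be sketched roughly like this in Python. This is only an illustration of the idea, not Galaxy's actual test-framework code; the function name is made up:

```python
import gzip

def files_match(actual_path, expected_path):
    """Compare tool output with the expected file; if the direct comparison
    fails and the expected file is gzipped, retry against its decompressed
    contents. (Sketch only -- not Galaxy's actual test-framework code.)"""
    with open(actual_path, "rb") as fh:
        actual = fh.read()
    with open(expected_path, "rb") as fh:
        expected = fh.read()
    if actual == expected:
        return True
    # gzip streams start with the magic bytes 0x1f 0x8b
    if expected[:2] == b"\x1f\x8b":
        with gzip.open(expected_path, "rb") as fh:
            return actual == fh.read()
    return False
```

The magic-byte check avoids relying on the `.gz` file extension, which test data files do not always carry.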

Thanks,

Peter

On Mon, Nov 10, 2014 at 2:41 PM, John Chilton  wrote:
> I have no issue with supporting this in general - but the
> implementation I think is a bit more tricky than it would seem. The
> test framework doesn't know if Galaxy would uncompress zipped files or
> not - I think the only way to reason about that in the abstract is to
> actually upload the file to Galaxy, unfortunately. So I guess we could
> try something like comparing the contents - if they do not match and the
> test target is compressed, decompress and retry. I've created a Trello
> card - https://trello.com/c/ebdBkezi.
>
> Thanks for the suggestion,
> -John
>
> On Thu, Nov 6, 2014 at 12:32 PM, Peter Cock  wrote:
>> Hello all,
>>
>> Because Galaxy uses the upload tool internally for the inputs for
>> functional tests, it is possible to bundle gzipped versions of test
>> inputs which saves space. For example,
>>
>> https://github.com/peterjc/pico_galaxy/blob/master/tools/clc_assembly_cell/clc_mapper.xml
>>
>> However, what I have just found is that this does not work for
>> gzipped files as test outputs. Rather Galaxy seems to compare
>> the (uncompressed) output from the tool to the expected output
>> file as it is (i.e. still compressed). e.g.
>>
>> https://github.com/peterjc/pico_galaxy/commit/c8adfdf5d1f48c00ac72df967d2be3c828400d45
>>
>> (I have only tested adding/removing the gzip compression on the
>> output file locally)
>>
>> Is this deliberate?
>>
>> Peter
>>
>> P.S.
>>
>> I wanted to point at the TravisCI output for that commit and its parent
>> to show the output,
>>
>> - https://travis-ci.org/peterjc/pico_galaxy/builds/40203652
>> - https://travis-ci.org/peterjc/pico_galaxy/builds/40202182
>>
>> However there appears to be an egg issue with the current
>> galaxy-central branch:
>>
>> $ python scripts/fetch_eggs.py
>> Warning: MarkupSafe (a dependent egg of Mako) cannot be fetched
>> Traceback (most recent call last):
>>   File "scripts/fetch_eggs.py", line 46, in <module>
>> c.resolve() # Only fetch eggs required by the config
>>   File 
>> "/home/travis/build/peterjc/pico_galaxy/galaxy-central-master/lib/galaxy/eggs/__init__.py",
>> line 347, in resolve
>> egg.resolve()
>>   File 
>> "/home/travis/build/peterjc/pico_galaxy/galaxy-central-master/lib/galaxy/eggs/__init__.py",
>> line 192, in resolve
>> if e.args[1].key != e.args[0].key:
>> IndexError: tuple index out of range
___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/


Re: [galaxy-dev] Jobs stuck in "new" state - Data Library datasets to blame?

2014-11-10 Thread Ido Tamir
Did you check whether the metadata on the input was set correctly (or at all)?
This happens sometimes in our Galaxy instance: metadata is not set correctly,
and jobs run once the metadata is set by hand.
No re-upload necessary.

best,
ido

On 06 Nov 2014, at 17:13, Lance Parsons  wrote:

> I've run into this same issue again (just with some other Data Library 
> datasets).  This time, there are a few users involved with quite a few 
> "stuck" jobs.  Does anyone have any advice on pushing these jobs through?  
> Maybe even a pointer to the relevant code?  I'm running latest_2014.08.11.  
> Thanks in advance.
> 
> Lance
> 
> Lance Parsons wrote:
>> Thanks, that was the first thing I checked.  However, restarting the handler 
>> didn't help.  Downloading the offending data and re-uploading as a new data 
>> set and then rerunning using the new dataset as input did work.  Also, all 
>> other jobs continued to run fine.
>> 
>> Lance
>> 
>> Kandalaft, Iyad wrote:
>>> I’ve had jobs get stuck in the new state when one of the handler servers 
>>> crashes.  If you have dedicated handlers, check to make sure they are still 
>>> running.
>>> Restart the handler to see if the jobs get resumed automatically.
>>>  
>>>  
>>>  
>>> Iyad Kandalaft
>>> 
>>>  
>>> From: galaxy-dev-boun...@lists.bx.psu.edu 
>>> [mailto:galaxy-dev-boun...@lists.bx.psu.edu] On Behalf Of Aaron Petkau
>>> Sent: Wednesday, October 01, 2014 5:32 PM
>>> To: Lance Parsons
>>> Cc: galaxy-dev@lists.bx.psu.edu
>>> Subject: Re: [galaxy-dev] Jobs stuck in "new" state - Data Library datasets 
>>> to blame?
>>>  
>>> Are you attempting to upload datasets to a Data Library, and then copy to a 
>>> history and run jobs on them right away?  I've run into issues before where 
>>> if I attempt to run a job on a dataset in a library before it is finished 
>>> being uploaded and processed, then the job gets stuck in a queued state and 
>>> never executes.
>>> 
>>> Aaron
>>>  
>>> On Wed, Oct 1, 2014 at 2:51 PM, Lance Parsons  
>>> wrote:
>>> Recently, I updated our Galaxy instance to use two processes (one for web, 
>>> the other as a job handler).  This has been working well, except in a few 
>>> cases.  I've noticed that a number of jobs get stuck in the "new" status.
>>> 
>>> In a number of cases, I've resolved the issue by downloading and uploading 
>>> one of the input files and rerunning the job using the newly uploaded file. 
>>>  In at least one of these cases, the offending input file was one that was 
>>> copied from a Data Library.
>>> 
>>> Can anyone point me to something to look for in the database, etc. that 
>>> would cause a job to think a dataset was not ready for use as a job input?  
>>> I'd very much like to fix these datasets since having to re-upload data 
>>> libraries would be very tedious.
>>> 
>>> Thanks in advance.
>>> 
>>> -- 
>>> Lance Parsons - Scientific Programmer
>>> 134 Carl C. Icahn Laboratory
>>> Lewis-Sigler Institute for Integrative Genomics
>>> Princeton University
>>> 
>> 
>> -- 
>> Lance Parsons - Scientific Programmer
>> 134 Carl C. Icahn Laboratory
>> Lewis-Sigler Institute for Integrative Genomics
>> Princeton University
>> 
> 
> -- 
> Lance Parsons - Scientific Programmer
> 134 Carl C. Icahn Laboratory
> Lewis-Sigler Institute for Integrative Genomics
> Princeton University
> 




Re: [galaxy-dev] Error in creating admin user during tool shed bootstrap

2014-11-10 Thread Bruno Grande
I had the brilliant idea of sending the previous email out on a Friday
afternoon/evening. I'm just following up on this thread.

Best regards,
Bruno

--
Bruno Grande, B.Sc. (Hons)
M.Sc. Candidate
Dr. Ryan Morin's Laboratory
Molecular Biology and Biochemistry, Simon Fraser University
SSB7133,  University Drive, Burnaby, BC, Canada, V5A 1S6

On Fri, Nov 7, 2014 at 2:17 PM, Bruno Grande  wrote:

> I'm setting up a local development Tool Shed according to Greg Von
> Kuster's blog post. During
> the bootstrapping process, it seems that the creation of the admin user
> based on the information in user_info.xml fails because of a SQLAlchemy
> error (see attached stdout.txt). As a result, I believe the API calls for
> creating categories and users also fail, because they depend on the
> existence of the admin user.
>
> This error seems to be due to a missing repository table in the database.
> I ran the SQL query against the Tool Shed database within PostgreSQL after
> the failed bootstrapping command and it returned no error, because the
> repository table actually exists.
>
> So, I don't know whether the SQL query is being run against the wrong
> database (*e.g.* the Galaxy database) or if the database schema isn't
> properly set up by the time SQLAlchemy attempts the query.
>
> I'm using the latest version of Galaxy (revision 83f821c5ecc1).
>
> Best regards,
> Bruno
>

[galaxy-dev] Galaxy Data Sources and Dataset Collections

2014-11-10 Thread Aaron Petkau
Hello,

I've been spending a bit of time looking over Data Sources for Galaxy.
I've been thinking about designing a tool in Galaxy, similar to a Data
Sources tool, which would take as input a file defining a list of URLs to
import into Galaxy, along with some user credentials.  In this sense it
would be similar to the GenomeSpace importer tool.

However, instead of just exporting a set of files to a user's history, I'd
like to be able to also automatically group these files into a dataset
collection.  I would also like to be able to link to these files instead of
creating copies (which I think I can only do through a Data Library, am I
correct?).  Ideally, I'd like to be able to use this tool as the first step
in a workflow, which would allow me to import and structure the data needed
for the rest of the workflow.
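For the grouping step, Galaxy's API lets you create a "list" collection in a history by POSTing to /api/histories/{history_id}/contents. A minimal sketch of building that request body follows; the endpoint and field names reflect the collections API as I understand it, the dataset IDs are assumed to be HDAs already in the target history, and you should verify the payload shape against your Galaxy version:

```python
def build_list_collection_payload(name, dataset_ids):
    """Build the JSON body for creating a 'list' dataset collection via
    Galaxy's history-contents API. Sketch only: assumes each id is an HDA
    id already present in the target history."""
    return {
        "type": "dataset_collection",
        "collection_type": "list",
        "name": name,
        "element_identifiers": [
            # 'src': 'hda' marks each element as a history dataset
            {"id": ds_id, "name": "element_%d" % i, "src": "hda"}
            for i, ds_id in enumerate(dataset_ids)
        ],
    }
```

The linking-without-copying part would still go through a Data Library, as the message above suspects; this sketch only covers assembling already-imported datasets into a collection.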

Does anyone have any experience writing a similar tool?  Is this possible
in the current Galaxy version?

Thanks,

Aaron

Re: [galaxy-dev] Jobs stuck in "new" state - Data Library datasets to blame?

2014-11-10 Thread John Chilton
Hello Lance,

  I cannot think of a good way to rescue these jobs. If you are
curious about the code where jobs are selected for execution - I would
check out the job handler (lib/galaxy/jobs/handler.py) - see
__monitor_step for instance.

  It seems that, to prevent this from happening in the future, we
should only allow copying datasets from libraries into histories if
the library dataset is in an 'OK' state
(https://trello.com/c/0vxbP4El).
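For poking at the database side of this, the query is essentially "which jobs are still in the 'new' state". The sketch below runs against a toy in-memory table whose column names are assumed to loosely mirror Galaxy's `job` table; check lib/galaxy/model/ for the real schema before running anything against a production database:

```python
import sqlite3

# Toy schema loosely modelled on Galaxy's 'job' table (names assumed --
# check lib/galaxy/model/ for the real ones), just to illustrate the query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job (id INTEGER PRIMARY KEY, state TEXT, tool_id TEXT)")
conn.executemany(
    "INSERT INTO job (state, tool_id) VALUES (?, ?)",
    [("ok", "upload1"), ("new", "Filter1"), ("new", "Cut1"), ("running", "cat1")],
)

def stuck_new_jobs(conn):
    """Return (id, tool_id) pairs for jobs still in the 'new' state."""
    cur = conn.execute("SELECT id, tool_id FROM job WHERE state = 'new' ORDER BY id")
    return cur.fetchall()
```

On a real instance the interesting follow-up is joining such jobs to their input datasets and checking the dataset/metadata states, which is what __monitor_step inspects before dispatching.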

-John

On Thu, Nov 6, 2014 at 11:13 AM, Lance Parsons  wrote:
> I've run into this same issue again (just with some other Data Library
> datasets).  This time, there are a few users involved with quite a few
> "stuck" jobs.  Does anyone have any advice on pushing these jobs through?
> Maybe even a pointer to the relevant code?  I'm running latest_2014.08.11.
> Thanks in advance.
>
> Lance
>
>
> Lance Parsons wrote:
>
> Thanks, that was the first thing I checked.  However, restarting the handler
> didn't help.  Downloading the offending data and re-uploading as a new data
> set and then rerunning using the new dataset as input did work.  Also, all
> other jobs continued to run fine.
>
> Lance
>
> Kandalaft, Iyad wrote:
>
> I’ve had jobs get stuck in the new state when one of the handler servers
> crashes.  If you have dedicated handlers, check to make sure they are still
> running.
>
> Restart the handler to see if the jobs get resumed automatically.
>
>
>
>
>
>
>
> Iyad Kandalaft
>
>
>
> From: galaxy-dev-boun...@lists.bx.psu.edu
> [mailto:galaxy-dev-boun...@lists.bx.psu.edu] On Behalf Of Aaron Petkau
> Sent: Wednesday, October 01, 2014 5:32 PM
> To: Lance Parsons
> Cc: galaxy-dev@lists.bx.psu.edu
> Subject: Re: [galaxy-dev] Jobs stuck in "new" state - Data Library datasets
> to blame?
>
>
>
> Are you attempting to upload datasets to a Data Library, and then copy to a
> history and run jobs on them right away?  I've run into issues before where
> if I attempt to run a job on a dataset in a library before it is finished
> being uploaded and processed, then the job gets stuck in a queued state and
> never executes.
>
> Aaron
>
>
>
> On Wed, Oct 1, 2014 at 2:51 PM, Lance Parsons 
> wrote:
>
> Recently, I updated our Galaxy instance to use two processes (one for web,
> the other as a job handler).  This has been working well, except in a few
> cases.  I've noticed that a number of jobs get stuck in the "new" status.
>
> In a number of cases, I've resolved the issue by downloading and uploading
> one of the input files and rerunning the job using the newly uploaded file.
> In at least one of these cases, the offending input file was one that was
> copied from a Data Library.
>
> Can anyone point me to something to look for in the database, etc. that
> would cause a job to think a dataset was not ready for use as a job input?
> I'd very much like to fix these datasets since having to re-upload data
> libraries would be very tedious.
>
> Thanks in advance.
>
> --
> Lance Parsons - Scientific Programmer
> 134 Carl C. Icahn Laboratory
> Lewis-Sigler Institute for Integrative Genomics
> Princeton University
>
>
>
>
>
> --
> Lance Parsons - Scientific Programmer
> 134 Carl C. Icahn Laboratory
> Lewis-Sigler Institute for Integrative Genomics
> Princeton University
>
>
> --
> Lance Parsons - Scientific Programmer
> 134 Carl C. Icahn Laboratory
> Lewis-Sigler Institute for Integrative Genomics
> Princeton University
>
>


[galaxy-dev] DecoRNAi error

2014-11-10 Thread Rossella Rispoli
Hello,
I'm trying to analyse a 384-plate genome screen data set (67 plate size) with the
DecoRNAi tool to discover the over-represented seed families, and I got this
error:

An error occurred running this job: Error: cannot allocate vector of size 617.8 Mb.

Any idea why? Is there a limit on the input dataset size?

Thanks in advance,

Rossella

--
Rossella Rispoli,
High Throughput Screening
Cancer Research UK
44 Lincoln's Inn fields
London WC2A 3LY UK
Tel No. +44 (0)207 269 3151
Fax No. +44 (0)207 269 3581
--



Re: [galaxy-dev] Use of gzipped files in Galaxy Unit Tests

2014-11-10 Thread John Chilton
I have no issue with supporting this in general - but the
implementation I think is a bit more tricky than it would seem. The
test framework doesn't know if Galaxy would uncompress zipped files or
not - I think the only way to reason about that in the abstract is to
actually upload the file to Galaxy, unfortunately. So I guess we could
try something like comparing the contents - if they do not match and the
test target is compressed, decompress and retry. I've created a Trello
card - https://trello.com/c/ebdBkezi.

Thanks for the suggestion,
-John

On Thu, Nov 6, 2014 at 12:32 PM, Peter Cock  wrote:
> Hello all,
>
> Because Galaxy uses the upload tool internally for the inputs for
> functional tests, it is possible to bundle gzipped versions of test
> inputs which saves space. For example,
>
> https://github.com/peterjc/pico_galaxy/blob/master/tools/clc_assembly_cell/clc_mapper.xml
>
> However, what I have just found is that this does not work for
> gzipped files as test outputs. Rather Galaxy seems to compare
> the (uncompressed) output from the tool to the expected output
> file as it is (i.e. still compressed). e.g.
>
> https://github.com/peterjc/pico_galaxy/commit/c8adfdf5d1f48c00ac72df967d2be3c828400d45
>
> (I have only tested adding/removing the gzip compression on the
> output file locally)
>
> Is this deliberate?
>
> Peter
>
> P.S.
>
> I wanted to point at the TravisCI output for that commit and its parent
> to show the output,
>
> - https://travis-ci.org/peterjc/pico_galaxy/builds/40203652
> - https://travis-ci.org/peterjc/pico_galaxy/builds/40202182
>
> However there appears to be an egg issue with the current
> galaxy-central branch:
>
> $ python scripts/fetch_eggs.py
> Warning: MarkupSafe (a dependent egg of Mako) cannot be fetched
> Traceback (most recent call last):
>   File "scripts/fetch_eggs.py", line 46, in <module>
> c.resolve() # Only fetch eggs required by the config
>   File 
> "/home/travis/build/peterjc/pico_galaxy/galaxy-central-master/lib/galaxy/eggs/__init__.py",
> line 347, in resolve
> egg.resolve()
>   File 
> "/home/travis/build/peterjc/pico_galaxy/galaxy-central-master/lib/galaxy/eggs/__init__.py",
> line 192, in resolve
> if e.args[1].key != e.args[0].key:
> IndexError: tuple index out of range


Re: [galaxy-dev] ulimit problems after update

2014-11-10 Thread John Chilton
Hmm... this is probably a script that works fine on your login node
but not on your worker nodes, or vice versa? The fact that it is
writing to standard error and happens for each new shell is probably
what is causing Galaxy jobs to fail - Galaxy is thinking the
underlying applications are writing content to standard error and
failing the job as a result. You could probably just rework the line
to suppress standard error as follows (I think):

ulimit -v 60 > /dev/null 2>&1 || true

Otherwise - you may want to consider figuring out which nodes this
should not be running on - and place an if before the ulimit. The
details of that are going to vary wildly based on your architecture
however.
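A guarded version of the profile line might look like the following shell sketch. The function name and the limit value are illustrative; the point is suppressing the builtin's stderr so a failing `ulimit` on some nodes does not make Galaxy flag the job as failed:

```shell
# Sketch: try to set the virtual-memory limit, but keep any complaint off
# standard error so batch jobs are not misread as failing. The limit value
# (in kB) is illustrative only.
set_vmem_limit() {
    ulimit -v "$1" 2>/dev/null || true
}
set_vmem_limit 60000000
```

Note the redirection must target stderr (`2>/dev/null`); `2>&1 > /dev/null` would still let the error message through, because stderr is duplicated onto the original stdout before stdout is redirected.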

Hope this helps.

-John




On Mon, Nov 10, 2014 at 5:44 AM, Ido Tamir  wrote:
> I simply removed the ulimit and the jobs complete successfully,
> but I still wonder why it does not work and would like to put the ulimit 
> again in place.
>
> It happens also with very simple jobs, like the filter tool on a 7k region file
> for “chr1”.
>
> thank you very much,
> ido
>
>
> On 10 Nov 2014, at 11:34, Ido Tamir  wrote:
>
>> Hi,
>> I updated to the latest galaxy distribution (after one year). And now
>> every job fails with:
>> /home/imba/solexa/.profile.sh: line 118: ulimit: virtual memory: cannot 
>> modify limit: Operation not permitted
>>
>> The limit is ridiculously high:
>> ulimit -v 6000
>>
>> It's just to prevent some badly programmed in-house Galaxy tools from crashing
>> the server.
>> I think the problem happens after the set_metadata stage.
>>
>> Any advice?
>>
>> thank you very much,
>> ido
>
>


Re: [galaxy-dev] ulimit problems after update

2014-11-10 Thread Ido Tamir
I simply removed the ulimit and the jobs complete successfully,
but I still wonder why it does not work and would like to put the ulimit again 
in place.

It happens also with very simple jobs, like the filter tool on a 7k region file for
“chr1”.

thank you very much,
ido 


On 10 Nov 2014, at 11:34, Ido Tamir  wrote:

> Hi,
> I updated to the latest galaxy distribution (after one year). And now
> every job fails with:
> /home/imba/solexa/.profile.sh: line 118: ulimit: virtual memory: cannot 
> modify limit: Operation not permitted
> 
> The limit is ridiculously high:
> ulimit -v 6000
> 
> It's just to prevent some badly programmed in-house Galaxy tools from crashing
> the server.
> I think the problem happens after the set_metadata stage.
> 
> Any advice?
> 
> thank you very much,
> ido




[galaxy-dev] ulimit problems after update

2014-11-10 Thread Ido Tamir
Hi,
I updated to the latest galaxy distribution (after one year). And now
every job fails with:
/home/imba/solexa/.profile.sh: line 118: ulimit: virtual memory: cannot modify 
limit: Operation not permitted

The limit is ridiculously high:
 ulimit -v 6000

It's just to prevent some badly programmed in-house Galaxy tools from crashing the
server.
I think the problem happens after the set_metadata stage.

Any advice?

thank you very much,
ido