Re: [galaxy-dev] "Job output not returned from cluster"

2011-07-29 Thread Edward Kirton
thanks for your comments, fellas.

permissions would certainly cause this problem, but that's not the cause for
me.

most wrappers just serve to redirect stderr, so i don't think it's the
wrapper script itself, but the stdout/stderr files are part of the problem.

the error message is thrown in the finish_job method when it can't open the
source/dest stdout/stderr for reading/writing.  i split the try statement to
add finer-grained error messages but i already verified the files do exist,
so it seems to be a file system issue.

i suspect it's because the storage i'm using as a staging area has
flash drives between the RAM and spinning disks, so upon close, the file
buffers may get flushed out of RAM to the SSDs but not immediately be
available from the SCSI drives.  Or maybe the (inode) metadata table hasn't
finished updating yet.  if so, it's not the fact that the cluster is heavily
utilized, but the filesystem is.  this disk is expressly for staging cluster
jobs.  i'll see if adding a short sleep and retry once upon error solves
this problem... but i won't know immediately as the problem is intermittent.
 that's the problem with fancy toys; they often come with fancy problems!
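
fwiw, the retry i have in mind is nothing fancy; roughly something like
this (a sketch, not the actual finish_job code, and the 5s wait is a
guess):

    import time

    def open_with_retry(path, mode='r', wait=5):
        # open a file, retrying once if the staging filesystem
        # hasn't caught up yet
        try:
            return open(path, mode)
        except IOError:
            time.sleep(wait)
            return open(path, mode)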


On Fri, Jul 29, 2011 at 2:42 AM, Peter Cock wrote:

> I also had this error message (I'm currently working out how to
> connect our Galaxy to our cluster), and in at least one case it was
> caused by a file permission problem - the tool appeared to run but
> could not write the output files.
>

Re: [galaxy-dev] Galaxy public instances: data and job quotas

2011-08-24 Thread Edward Kirton
love the quotas; this was sorely needed.  thanks much.
ed

On Fri, Aug 19, 2011 at 4:44 PM, Jennifer Jackson  wrote:

> Galaxy public instances: data and job quotas
>
>
> User data and job quota limits are now implemented at the public Galaxy
> Test instance http://test.g2.bx.psu.edu:
>
> http://galaxyproject.org/Test#Quotas
>
>
> While no quotas are currently implemented at the public Galaxy Main
> instance http://usegalaxy.org, we do ask that users stay within certain
> usage limits:
>
> http://galaxyproject.org/Main#Quotas
>
>
> If you find that you require additional resources, please consider the
> alternative Galaxy options explained at:
>
> http://galaxyproject.org/Big%20Picture/Choices
>
>
>
> Thanks for using Galaxy!

Re: [galaxy-dev] Galaxy public instances: data and job quotas

2011-08-24 Thread Edward Kirton
i created some quotas which were assigned to groups (set as default for
"yes, registered users"), but apparently the groups didn't initially stick; i
had to edit the quota after creation to reassign the group.

On Fri, Aug 19, 2011 at 4:44 PM, Jennifer Jackson  wrote:

> Galaxy public instances: data and job quotas
>
>
> User data and job quota limits are now implemented at the public Galaxy
> Test instance http://test.g2.bx.psu.edu:
>
> http://galaxyproject.org/Test#Quotas
>
>
> While no quotas are currently implemented at the public Galaxy Main
> instance http://usegalaxy.org, we do ask that users stay within certain
> usage limits:
>
> http://galaxyproject.org/Main#Quotas
>
>
> If you find that you require additional resources, please consider the
> alternative Galaxy options explained at:
>
> http://galaxyproject.org/Big%20Picture/Choices
>
>
>
> Thanks for using Galaxy!

Re: [galaxy-dev] Galaxy public instances: data and job quotas

2011-08-24 Thread Edward Kirton
also, deleting and purging a quota seems to have no effect.

On Wed, Aug 24, 2011 at 3:58 PM, Edward Kirton  wrote:

> i created some quotas which were assigned to groups (set as default for
> "yes, registered users"), but apparently the groups didn't initially stick;
> i had to edit the quota after creation to reassign the group.
>
> On Fri, Aug 19, 2011 at 4:44 PM, Jennifer Jackson  wrote:
>
>> Galaxy public instances: data and job quotas
>>
>>
>> User data and job quota limits are now implemented at the public Galaxy
>> Test instance http://test.g2.bx.psu.edu:
>>
>> http://galaxyproject.org/Test#Quotas
>>
>>
>> While no quotas are currently implemented at the public Galaxy Main
>> instance http://usegalaxy.org, we do ask that users stay within certain
>> usage limits:
>>
>> http://galaxyproject.org/Main#Quotas
>>
>>
>> If you find that you require additional resources, please consider the
>> alternative Galaxy options explained at:
>>
>> http://galaxyproject.org/Big%20Picture/Choices
>>
>>
>>
>> Thanks for using Galaxy!

[galaxy-dev] downloading large files

2011-08-26 Thread Edward Kirton
i thought i recalled reading about downloading files from a history
via ftp, but i could have been mistaken -- couldn't find anything on the
wiki or mailing list archives.  does this feature exist?

what's the best way for users to download many or large files other
than via the browser?


Re: [galaxy-dev] Selective storage of galaxy files

2011-08-26 Thread Edward Kirton
An easy and immediate solution may be to:

(a) create a "Link data" tool.  The user specifies the data ID and
your tool queries a db to find the location and creates a symlink to
the data, which is stored on different groups'/projects' disks.  While
subsequent datafiles will still be stored in the common database/files
folder, at least the big raw data files will be stored outside of
galaxy's database/files folder.
(b) use user storage quotas to manage the galaxy database/files
storage.  For example, create a group for project X, which has contributed
10TB to galaxy.  If there are 10 users associated with that project, use
quotas to allocate +1TB of additional storage to each of their limits.

It may not be as elegant as some alternatives, but you could implement
this today.
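
for (a), the tool's script could be as small as this sketch (the
registry db, its schema, and the paths are all hypothetical; adapt to
your own setup):

    import os, sqlite3, sys

    data_id, outfile = sys.argv[1], sys.argv[2]
    # hypothetical registry mapping data IDs to paths on project disks
    conn = sqlite3.connect('/path/to/data_registry.db')
    row = conn.execute('SELECT path FROM datasets WHERE id = ?',
                       (data_id,)).fetchone()
    if row is None:
        sys.exit('unknown data id: %s' % data_id)
    if os.path.exists(outfile):
        os.remove(outfile)  # replace galaxy's placeholder output file
    os.symlink(row[0], outfile)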

On Mon, Aug 22, 2011 at 12:51 PM, Nate Coraor  wrote:
> Ravi Madduri wrote:
>> Nate
>> I brought this issue up at the users conference and I wanted to bring it up 
>> again. How does somebody like us keep track of new development like this and 
>> how can we contribute?
>
> Hi Ravi,
>
> The best way is probably to ask on the dev list whether we are, or have
> interest in working on something.  I do agree that it can be difficult
> to know what we're working on, but part of the reason (in my own case,
> anyway) is that not everything I work on makes it to the light of day in
> a timely manner, so I tend not to make a lot of noise about it until
> it's well along.
>
> --nate
>
>>
>> Regards
>> On Aug 22, 2011, at 11:57 AM, Nate Coraor wrote:
>>
>> > Dave Walton wrote:
>> >> Dear Galaxy developers,
>> >>
>> >> Our institution is trying to solve our storage problem (we need lots,
>> >> especially for NGS data, and someone needs to fund it).  What we would
>> >> like to be able to do is, based on some criteria, control the location
>> >> where a file gets written to disk.
>> >>
>> >> These criteria could be an individual user, a role or group they belong to,
>> >> or a project the file is associated with.
>> >>
>> >> What we'd like to know are the following 3 things:
>> >> 1) Is anyone already working on something like this?
>> >
>> > Hi Dave,
>> >
>> > We're working on an abstraction layer which will allow Galaxy data to
>> > live in multiple places instead of the single-point "files_path" that is
>> > currently used.  Enis Afgan wrote the initial implementation and I am
>> > hoping to complete it within the next few months.
>> >
>> > This won't have any per-user logic, but it should provide a piece of
>> > what you are hoping to do.
>> >
>> > --nate
>> >
>> >> 2) Are there other institutions that would be interested in this type of
>> >> functionality?
>> >>
>> >> 3) If we were to attempt to implement this ourselves, would anyone be
>> >> interested in giving us some input with respect to how to implement and
>> >> how to make it generic enough to meet the needs of most institutions?  If
>> >> we're going to do it, we'll need to be able to produce an estimate of what
>> >> the effort would be like so that we could get institutional funding to
>> >> develop the functionality.
>> >>
>> >> Thanks for any input you can provide.
>> >>
>> >> Dave
>> >>
>> >> --
>> >> Dave Walton
>> >> Computational Sciences
>> >> The Jackson Laboratory
>> >> Bar Harbor, Maine
>> >>
>> >>
>>
>> --
>> Ravi K Madduri
>> The Globus Alliance | Argonne National Laboratory | University of Chicago
>> http://www.mcs.anl.gov/~madduri
>>



Re: [galaxy-dev] using Galaxy for map/reduce

2011-08-26 Thread Edward Kirton
Not intending to hijack the thread, but in response to John's comment
-- I, too, made a general solution for embarrassingly parallel problems
but instead of splitting the large files on disk, I just use seek to
move the file pointer so each task can grab its part.
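
roughly, each task does something like this (a sketch with simplified
record-boundary handling; a real version must be more careful, since '@'
can also begin a fastq quality line):

    def read_chunk(path, start, end):
        # yield 4-line fastq records from the byte range [start, end)
        handle = open(path, 'rb')
        handle.seek(start)
        if start > 0:
            handle.readline()  # discard a likely-partial line
            while True:
                pos = handle.tell()
                line = handle.readline()
                if not line or line.startswith('@'):
                    handle.seek(pos)  # rewind to the record header
                    break
        while handle.tell() < end:
            record = [handle.readline() for _ in range(4)]
            if not record[0]:
                break
            yield record
        handle.close()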

On Tue, Aug 2, 2011 at 10:54 AM, Duddy, John  wrote:
> I did something similar, but implemented as an evolution of the original 
> "basic" parallelism (see BWA), that:
> - Moved the splitting of input files into the datatype classes
> - Allowed any number of inputs to be split, as long as they were the same 
> datatype (so they were mutually consistent - think paired end fastq files)
> - Allowed other inputs to be shared among jobs
> - Merged any number of outputs, with merge code implemented in the datatype
> classes
>
> This worked functionally, but the IO required to split large files has proved 
> too much for something like a whole genome (~500GB)
>
> I was thinking of something philosophically similar to your dataset container 
> idea, but more in the idea that a dataset is no longer a "file", so the jobs 
> running on subsets of the dataset would just ask for the parts they need. 
> Galaxy would take care of preserving the abstraction that the subset of the 
> dataset is a single input file, perhaps by extracting the subset to a 
> temporary file on local storage. Similarly, the merged outputs would just be 
> held in the target dataset, not copied, thus making the IO cost for the 
> "merge" 0 for the simple case where it is mere concatenation.
>
> John Duddy
> Sr. Staff Software Engineer
> Illumina, Inc.
> 9885 Towne Centre Drive
> San Diego, CA 92121
> Tel: 858-736-3584
> E-mail: jdu...@illumina.com
>
>
> -----Original Message-----
> From: galaxy-dev-boun...@lists.bx.psu.edu 
> [mailto:galaxy-dev-boun...@lists.bx.psu.edu] On Behalf Of Andrew Straw
> Sent: Tuesday, August 02, 2011 7:13 AM
> To: galaxy-...@bx.psu.edu
> Subject: [galaxy-dev] using Galaxy for map/reduce
>
> Hi all,
>
> I've been investigating use of Galaxy for our lab and it has many
> attractive aspects -- a big thank you to all involved.
>
> We still have a couple of related sticking points, however, that I would
> like to get the Galaxy developers' feedback on. Basically, I want to use
> Galaxy to run Map/Reduce type analysis on many initial data files. What
> I mean is that I want to take many initial datasets (e.g. 250 or more),
> perhaps already stored in a library, and then apply a workflow to each
> and every one of them (the Map step). Then, on the many result datasets
> (one from each of the initial datasets), I want to run a Reduce step
> which creates a single dataset. I have achieved this in an imperfect and
> not-quite-working way with a few tricks, but I hope that with a little
> work, Galaxy could be much better for this type of use case.
>
> I have a couple of specific problems and a proposal for a general solution:
>
> 1) My first specific problem is that loading many datasets (e.g. 250)
> into a history causes the javascript running locally within a browser to
> be extremely slow.
>
> 2) My second specific problem is that applying a workflow with N steps
> to many datasets creates even more datasets (Nx250 additional datasets).
> In addition to the slow Javascript problem, there seems to be other
> issues I haven't diagnosed further, but the console in which I'm running
> run.sh indicates many errors of the type "Exception AssertionError:
> AssertionError('State <sqlalchemy.orm.state.InstanceState object at
> 0x7f5c18c47990> is not present in this identity map',) in <bound method
> InstanceState._cleanup of <sqlalchemy.orm.state.InstanceState object at
> 0x7f5c18c47990>> ignored". Furthermore the webserver gets slow and my
> nginx frontend proxy gives 504 gateway time-outs.
>
> 3) There's no good way to do reduce within Galaxy. Currently I work
> around this by having a tool type which takes as an input a dataset and
> then uploads this to a self-written webserver, which then collects such
> uploads, performs the reduce, and offers a download link for the user to
> collect the reduced dataset. The user must manually then upload this
> dataset back into Galaxy for further processing.
>
> My proposal for a general solution, and what I'd be interested in
> feedback on, is an idea of a "dataset container" (this is just a working
> name). It would look and act much like a dataset in the history, but
> would in fact be a logical construct that merely bundles together a
> homogeneous bunch of datasets. When a tool (or a workflow) is applied to
> a dataset container, Galaxy would automatically create a new container
> in which each dataset in this new container is the result of running the
> tool. (Workflows with N steps would thus generate N new containers.) The
> thing I like about this idea is that it preserves the ability to use
> tools and workflows on both individual datasets and, with some
> additional logic, on these new containers. In particular, I don't think
> the tools and workflows themselves would have to be modified. This would
> seemingly mitigate the slow Javascript

Re: [galaxy-dev] downloading large files

2011-08-26 Thread Edward Kirton
okay, thanks.  i'll create a tool to export files as a tarball in the
user's ftp folder, and couple it with a cron job to make sure the
files are deleted after a week.  i'll contribute it to the toolshed
when done.
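
the export half could be as simple as this sketch (paths come from the
tool xml; the cleanup would be a cron one-liner along the lines of
find <ftp_base> -name '*.tar.gz' -mtime +7 -delete):

    import os, sys, tarfile

    ftp_dir = sys.argv[1]         # the user's ftp folder
    tarball = os.path.join(ftp_dir, 'galaxy_export.tar.gz')
    tar = tarfile.open(tarball, 'w:gz')
    for path in sys.argv[2:]:     # dataset file paths passed by the tool
        tar.add(path, arcname=os.path.basename(path))
    tar.close()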

On Fri, Aug 26, 2011 at 11:59 AM, Nate Coraor  wrote:
> Edward Kirton wrote:
>> i thought i recalled reading about downloading files from a history
>> via ftp, but i could have been mistaken -- couldn't find anything on the
>> wiki or mailing list archives.  does this feature exist?
>>
>> what's the best way for users to download many or large files other
>> than via the browser?
>
> You can use wget/curl to avoid the browser, but it's still an http
> transfer.  Some people have written an "export" tool that writes a
> dataset to some specified location.  We've talked before about adding
> this sort of functionality directly into the interface but it hasn't
> been done yet.
>
> --nate
>



Re: [galaxy-dev] using Galaxy for map/reduce

2011-08-26 Thread Edward Kirton
yes, many tools don't read from stdin, you're right.  in practice, i
actually have each task write its part to the node's local scratch
disk and also do implicit conversions in this step (e.g.
scatter fastq as fasta).  but not all clusters have a local
scratch disk.
also, as you mentioned, the seek solution wouldn't work for compressed infiles.

as i try to avoid working on the galaxy internals, i implemented this
as a command-line utility.
e.g.
  psub --fastqToFasta $infile --cat $outfile qctool.py $infile $outfile
instead of the nonparallel:
  qctool.py $infile $outfile

but it would be nice to see this functionality in galaxy.  i thought
about reimplementing this as a drmaa_epc.py job runner but noticed
there was already tasks.py.


On Fri, Aug 26, 2011 at 12:41 PM, Duddy, John  wrote:
> Many of the tools out there work on files, and assume they are supposed to 
> work on the whole file (or take arguments for subsets that vary from tool to 
> tool).
>
> I'm working on a way for Galaxy to handle all these tools transparently, even 
> if, as in my case, the files are compressed but the tools cannot read 
> compressed files.
>
> John Duddy
> Sr. Staff Software Engineer
> Illumina, Inc.
> 9885 Towne Centre Drive
> San Diego, CA 92121
> Tel: 858-736-3584
> E-mail: jdu...@illumina.com
>
>
> -----Original Message-----
> From: Edward Kirton [mailto:eskir...@lbl.gov]
> Sent: Friday, August 26, 2011 12:34 PM
> To: Duddy, John
> Cc: galaxy-...@bx.psu.edu
> Subject: Re: [galaxy-dev] using Galaxy for map/reduce
>
> Not intending to hijack the thread, but in response to John's comment
> -- I, too, made a general solution for embarrassingly parallel problems
> but instead of splitting the large files on disk, I just use seek to
> move the file pointer so each task can grab its part.
>


[galaxy-dev] database migration error 79->80 (NameError: name 'BigInteger' is not defined)

2011-08-31 Thread Edward Kirton
hi, we are getting the following error migrating from 79 to 80.
curiously, my own galaxy didn't have this problem, but another
developer here was getting this error.  i moved him from sqlite to
postgres but that didn't help.
thanks for any assistance,
ed

Migrating 79 -> 80...
galaxy.model.migrate.check INFO 2011-08-31 13:15:31,446
Traceback (most recent call last):
  File "/house/groupdirs/genetic_analysis/jfroula/Projects/Galaxy/galaxy-jeff/lib/galaxy/web/buildapp.py", line 82, in app_factory
    app = UniverseApplication( global_conf = global_conf, **kwargs )
  File "/house/groupdirs/genetic_analysis/jfroula/Projects/Galaxy/galaxy-jeff/lib/galaxy/app.py", line 32, in __init__
    create_or_verify_database( db_url, kwargs.get( 'global_conf', {} ).get( '__file__', None ), self.config.database_engine_options )
  File "/house/groupdirs/genetic_analysis/jfroula/Projects/Galaxy/galaxy-jeff/lib/galaxy/model/migrate/check.py", line 67, in create_or_verify_database
    migrate_to_current_version( engine, db_schema )
  File "/house/groupdirs/genetic_analysis/jfroula/Projects/Galaxy/galaxy-jeff/lib/galaxy/model/migrate/check.py", line 125, in migrate_to_current_version
    schema.runchange( ver, change, changeset.step )
  File "/house/groupdirs/genetic_analysis/jfroula/Projects/Galaxy/galaxy-depot/eggs/sqlalchemy_migrate-0.5.4-py2.6.egg/migrate/versioning/schema.py", line 184, in runchange
  File "/house/groupdirs/genetic_analysis/jfroula/Projects/Galaxy/galaxy-depot/eggs/sqlalchemy_migrate-0.5.4-py2.6.egg/migrate/versioning/script/py.py", line 100, in run
  File "/house/groupdirs/genetic_analysis/jfroula/Projects/Galaxy/galaxy-depot/eggs/sqlalchemy_migrate-0.5.4-py2.6.egg/migrate/versioning/script/py.py", line 112, in _func
  File "/house/groupdirs/genetic_analysis/jfroula/Projects/Galaxy/galaxy-depot/eggs/sqlalchemy_migrate-0.5.4-py2.6.egg/migrate/versioning/script/py.py", line 108, in module
  File "/house/groupdirs/genetic_analysis/jfroula/Projects/Galaxy/galaxy-depot/eggs/sqlalchemy_migrate-0.5.4-py2.6.egg/migrate/versioning/script/py.py", line 65, in verify_module
  File "/house/groupdirs/genetic_analysis/jfroula/Projects/Galaxy/galaxy-depot/eggs/sqlalchemy_migrate-0.5.4-py2.6.egg/migrate/versioning/util/importpath.py", line 12, in import_path
  File "lib/galaxy/model/migrate/versions/0080_quota_tables.py", line 29, in <module>
    Column( "bytes", BigInteger ),
NameError: name 'BigInteger' is not defined



Re: [galaxy-dev] database migration error 79->80 (NameError: name 'BigInteger' is not defined)

2011-08-31 Thread Edward Kirton
please disregard my previous message, it was the developer's error; he
had accidentally deleted or reverted something in the lib folder.

On Wed, Aug 31, 2011 at 3:36 PM, Edward Kirton  wrote:
> hi, we are getting the following error migrating from 79 to 80.
> curiously, my own galaxy didn't have this problem, but another
> developer here was getting this error.  i moved him from sqlite to
> postgres but that didn't help.
> thanks for any assistance,
> ed
>
> Migrating 79 -> 80...
> galaxy.model.migrate.check INFO 2011-08-31 13:15:31,446
> Traceback (most recent call last):
>   File "/house/groupdirs/genetic_analysis/jfroula/Projects/Galaxy/galaxy-jeff/lib/galaxy/web/buildapp.py", line 82, in app_factory
>     app = UniverseApplication( global_conf = global_conf, **kwargs )
>   File "/house/groupdirs/genetic_analysis/jfroula/Projects/Galaxy/galaxy-jeff/lib/galaxy/app.py", line 32, in __init__
>     create_or_verify_database( db_url, kwargs.get( 'global_conf', {} ).get( '__file__', None ), self.config.database_engine_options )
>   File "/house/groupdirs/genetic_analysis/jfroula/Projects/Galaxy/galaxy-jeff/lib/galaxy/model/migrate/check.py", line 67, in create_or_verify_database
>     migrate_to_current_version( engine, db_schema )
>   File "/house/groupdirs/genetic_analysis/jfroula/Projects/Galaxy/galaxy-jeff/lib/galaxy/model/migrate/check.py", line 125, in migrate_to_current_version
>     schema.runchange( ver, change, changeset.step )
>   File "/house/groupdirs/genetic_analysis/jfroula/Projects/Galaxy/galaxy-depot/eggs/sqlalchemy_migrate-0.5.4-py2.6.egg/migrate/versioning/schema.py", line 184, in runchange
>   File "/house/groupdirs/genetic_analysis/jfroula/Projects/Galaxy/galaxy-depot/eggs/sqlalchemy_migrate-0.5.4-py2.6.egg/migrate/versioning/script/py.py", line 100, in run
>   File "/house/groupdirs/genetic_analysis/jfroula/Projects/Galaxy/galaxy-depot/eggs/sqlalchemy_migrate-0.5.4-py2.6.egg/migrate/versioning/script/py.py", line 112, in _func
>   File "/house/groupdirs/genetic_analysis/jfroula/Projects/Galaxy/galaxy-depot/eggs/sqlalchemy_migrate-0.5.4-py2.6.egg/migrate/versioning/script/py.py", line 108, in module
>   File "/house/groupdirs/genetic_analysis/jfroula/Projects/Galaxy/galaxy-depot/eggs/sqlalchemy_migrate-0.5.4-py2.6.egg/migrate/versioning/script/py.py", line 65, in verify_module
>   File "/house/groupdirs/genetic_analysis/jfroula/Projects/Galaxy/galaxy-depot/eggs/sqlalchemy_migrate-0.5.4-py2.6.egg/migrate/versioning/util/importpath.py", line 12, in import_path
>   File "lib/galaxy/model/migrate/versions/0080_quota_tables.py", line 29, in <module>
>     Column( "bytes", BigInteger ),
> NameError: name 'BigInteger' is not defined
>



Re: [galaxy-dev] HOW TO RETRIEVE DATA FROM HISTORY??!!

2011-09-01 Thread Edward Kirton
why not create a simple "export" tool?  perhaps with the option to cp
or symlink.

On Thu, Aug 4, 2011 at 9:57 PM, colin molter  wrote:
> Hi all,
> i am still stuck with the same problem.
> Is there a way to directly move/copy data from your galaxy history to a
> given location in the filesystem of the same galaxy server?
> Said differently, there is a nice way to import data from the server to
> galaxy, is it possible to do the reverse?
> So far, I am obliged to download the file from galaxy to my client machine
> and then back to the server; with huge bam files of 3Gb it is not so
> convenient!!
> thank you all
> colin
>
>



Re: [galaxy-dev] disk space and file formats

2011-09-01 Thread Edward Kirton
Read QC intermediate files account for most of the storage used on our
galaxy site. And it's a real problem that I must solve soon.

My first attempt at taming the beast was to create a single read QC
tool that did such things as converting qual encoding, qual-end trimming, etc.
(very basic functions).  Such a tool could simply be a wrapper around your
favorite existing tools, but it doesn't keep the intermediate files.  The added
benefit is that it runs faster because it only has to queue onto the cluster
once.

Sure, one might argue that it's nice to have all the intermediate files just
in case you wish to review them, but in practice, I have found this happens
relatively infrequently and is too expensive.  If you're a small lab maybe
that's fine, but if you generate a lot of sequence, a more production-line
approach is reasonable.

I've been toying with the idea of replacing all the fastq datatypes with a
single fastq datatype that is sanger-encoded and gzipped.  I think gzipped
reads files are about 1/4 of the unpacked version.  Of course, many tools
will require a wrapper if they don't accept gzipped input, but that's
trivial (and many already support compressed reads).
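
such a wrapper could be as thin as this (the tool name and flags are
placeholders):

    import gzip, os, shutil, subprocess, sys, tempfile

    gz_in, outfile = sys.argv[1], sys.argv[2]
    tmp = tempfile.NamedTemporaryFile(suffix='.fastq', delete=False)
    src = gzip.open(gz_in, 'rb')
    shutil.copyfileobj(src, tmp)  # decompress to a temporary fastq
    src.close()
    tmp.close()
    try:
        subprocess.check_call(['some_qc_tool', '-i', tmp.name, '-o', outfile])
    finally:
        os.unlink(tmp.name)       # don't leave the uncompressed copy around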

However the import tool automatically uncompresses uploaded files, so I'd
need to do some hacking there to prevent this.

Heck, what we really need is a nice compact binary format for reads, perhaps
one which doesn't even store ids (although pairing would need to be recorded).

Thoughts?

On Fri, Aug 19, 2011 at 11:43 AM, Jelle Scholtalbers <
j.scholtalb...@gmail.com> wrote:

> Hi Patrick,
>
> the issue you are having is partly related to the idea of Galaxy to
> ensure reproducible science and saving each intermediate step and
> output files. For example in your current workflow in Galaxy you can
> easily do something else with each intermediate file - feed it to a
> different tool just to check what the average read length is after
> filtering - you can do that even 2 months after your run.
> If you however insist on keeping disk usage low and don't want to
> start programming - as your provided solutions will require - and
> aren't too afraid of the commandline, you might want to start there.
>
> The thing is, a lot of tools accept either an input file or an input
> stream. These same tools also have the ability to either write to an
> output file or to an output stream. This way you can "pipe" these
> tools together.
> e.g. "trimMyFq -i rawinput.fq | removebarcode -i - -n optionN |
> filterJunk -i - -o finalOutput.fq"
>
> I don't know which programs you actually use, but the principle is
> probably the same ( as long as the tools actually accept streams ).
> This example saves you diskspace because, of the 3 tools run, only
> one actually writes to the disk. On the downside, this also means you
> don't have an output file from removeBarcode which you can look at to
> see if everything went ok.
>
> If you do want to program or someone else wants to do it, I could
> think of a tool that combines your iterative steps and can be run as
> one tool - you could even wrap up your 'pipeline' in a script and put
> that as a tool in your Galaxy instance and/or in the toolshed.
>
> Cheers,
> Jelle
>
>
>
> On Fri, Aug 19, 2011 at 6:29 PM, Patrick Page-McCaw
>  wrote:
> > I'm not a bioinformaticist or programmer so apologies if this is a silly
> question. I've been occasionally running galaxy on my laptop and on the
> public server and I love it. The issue that I have is that my workflow
> requires many steps (what I do is probably very unusual). Each step creates
> a new large fastq file as the sequences are iteratively trimmed of junk.
> This fills my laptop and fills the public server with lots of unnecessary
> very large files.
> >
> > I've been thinking about the structure of the files and my workflow and
> it seems to me that a more space efficient system would be to have a single
> file (or a sql database) on which each tool can work. Most of what I do is
> remove adapter sequences, extract barcodes, trim by quality, map to the
> genome and then process my hits by type (exon, intron etc). Since the clean
> up tools in FASTX aren't written with my problem in mind, it takes several
> passes to get the sequences trimmed up before mapping.
> >
> > If I had a file that had a format something like (here as tab delimited):
> > Header  Seq Phred   Start   Len Barcode etc
> > Each tool could read the Seq and Phred starting at Start and running Len
> nucleotides and work on that. The tool could then write a new Start and Len
> to reflect the trimming it has done[1]. For convenience let me call this an
> HSPh format.
> >
> > So it would be a real pain, no doubt, to rewrite all the tools. The
> little that I can read of the tools, it seems that the way the input is handled
> internally varies quite a bit. But it seems to me (naively?) that it would
> be relatively easy to write a conversion tool that would take the HSPh
> format and turn it into fas

Re: [galaxy-dev] disk space and file formats

2011-09-02 Thread Edward Kirton
> What, like a BAM file of unaligned reads? Uses gzip compression, and
> tracks the pairing information explicitly :) Some tools will already take
> this as an input format, but not all.

ah, yes, precisely.  i actually think illumina's pipeline produces
files in this format now.
wrappers which create a temporary fastq file would need to be created
but that's easy enough.


Re: [galaxy-dev] disk space and file formats

2011-09-02 Thread Edward Kirton
>>> i actually think illumina's pipeline produces files in this format 
>>>(unaligned-bam) now.

> Oh do they? - that's interesting. Do you have a reference/link?

i caught wind of this at the recent illumina user's conference but i
asked someone in our sequencing team to confirm and he hadn't heard of
this.  it must be limited to the forthcoming miseq sequencer for the
time being, but may make its way to the big sequencers later.
apparently illumina is thinking about storage as well.  i seem to
recall the speaker saying they won't produce srf files anymore, but
again, this was a talk about the miseq so may not apply to the other
sequencers.

>>> wrappers which create a temporary fastq file would need to be created
>>> but that's easy enough.

>> My argument against that is the cost of going from BAM -> temp
>> fastq may be prohibitive, e.g. the need to generate very large
>> temp fastq files on the fly as input for various applications may
>> lead one back to just keeping a permanent FASTQ around anyway.

> True - if you can't update the tools you need to take BAM.
> In some cases at least you can pipe the gzipped FASTQ
> into alignment tools which accept FASTQ on stdin, so
> there is no temp file per se.

the tools really do need to support the format; the tmpfile was simply
a workaround.  some tools already support bam, more currently support
fastq.gz.  (someone here made the wrong bet years ago and had adopted
a site-wide fastq.bz2 standard which only recently changed to
fastq.gz.)  but if illumina does start producing bam files in the
future, then we can expect more tools to support that format.  until
they do, probably fastq.gz is a safe bet.

of course there is a computational cost to compressing/uncompressing
files but that's probably better than storing unnecessarily huge
files.  it's a trade-off.

similarly, there's a trade-off involved in limiting read qc tools to a
single/few big tools which wrap several tools, with many options.
users can't play around with read qc as much, but such experimentation may be
too expensive anyway (computationally and storage-wise).  for the most part, a standard qc
will do.  one can spend a lot of time and effort to squeeze a bit more
useful data out of a bad library, for example, when they probably
should have just sequenced another library.  i favor leaving the
playing around to the r&d/development/qc team and just offering a
canned/vetted qc solution to the average user.

>> I recall hdf5 was planned as an alternate format (PacBio uses
>> it, IIRC), and of course there is NCBI's .sra format.  Anyone
>> using the latter two?
> Moving from the custom BGZF modified gzip format used in
> BAM to HD5 has been proposed on the samtools mailing list
> (as Chris knows), and there is a proof of principle implementation
> too in BioHDF, http://www.hdfgroup.org/projects/biohdf/
> The SAM/BAM group didn't seem overly enthusiastic though.
> For the NCBI's .sra format, there is no open specification, just
> their public domain source code:
> http://seqanswers.com/forums/showthread.php?t=12054

i believe hdf5 is an indexed data structure which, as you mentioned,
isn't required for unprocessed reads.

since i'm rapidly running out of storage, i think the best immediate
solution for me is to deprecate all the fastq datatypes in favor of a
new fastqsangergz and to bundle the read qc tools to eliminate
intermediate files.  sure, users won't be able to play around with
their data as much, but my disk is 88% full and my cluster has been
100% occupied for 2 months straight, so less choice is probably
better.
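
the datatype half might start out like this (class name and sniff logic
are illustrative, not an existing galaxy datatype; it would also need an
entry in datatypes_conf.xml):

    import gzip
    from galaxy.datatypes.sequence import Sequence

    class FastqSangerGz(Sequence):
        # sanger-encoded fastq, stored gzipped on disk
        file_ext = 'fastqsangergz'

        def sniff(self, filename):
            # peek at the first decompressed line; fastq headers start with '@'
            try:
                handle = gzip.open(filename)
                first = handle.readline()
                handle.close()
                return first.startswith('@')
            except IOError:
                return False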



Re: [galaxy-dev] disk space and file formats

2011-09-02 Thread Edward Kirton
> In your position I agree that is a pragmatic choice.

Thanks for helping me muddle through my options.

> You might be able to
> modify the file upload code to gzip any FASTQ files... that would prevent
> uncompressed FASTQ getting into new histories.

Right!

> I wonder if Galaxy would benefit from a new fastqsanger-gzip (etc) datatype?
> However this seems generally useful (not just for FASTQ) so perhaps a more
> general mechanism would be better where tool XML files can say which file
> types they accept and which of those can/must be compressed (possily not
> just gzip format?).

Perhaps we can flesh out what more general solutions would look like...

Imagine the fastq datatypes were left alone and instead there's a
mechanism by which files which haven't been used as input for x days
get compressed by a cron job.  the file server knows how to uncompress
such files on the fly when needed.  For the most part, files are
uncompressed during analysis and are compressed when the files exist
as an archive within galaxy.
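
the cron half of that idea might look like this (the root dir and the
threshold are assumptions, and it relies on the filesystem recording
access times):

    import gzip, os, shutil, time

    ROOT = '/galaxy/database/files'
    CUTOFF = time.time() - 30 * 86400  # not read in 30 days

    for dirpath, dirnames, filenames in os.walk(ROOT):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if name.endswith('.gz') or os.stat(path).st_atime > CUTOFF:
                continue
            src = open(path, 'rb')
            dst = gzip.open(path + '.gz', 'wb')
            shutil.copyfileobj(src, dst)
            src.close()
            dst.close()
            os.remove(path)  # keep only the compressed copy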

An even simpler solution would be an archive/compress button which
users could use when they're done with a history.  Users could still
copy (uncompressed) datasets into a new history for further analysis.

Of course there's also the solution mentioned in the 2010 galaxy
developer's conference about automatic compression at the system
level.  Not a possibility for me, but is attractive.


Re: [galaxy-dev] Tophat non Sanger input

2011-09-07 Thread Edward Kirton
seems unnecessary since illumina switched over to fastqsanger now.

http://www.illumina.com/truseq/quality_101/quality_scores.ilmn

On Wed, Aug 31, 2011 at 12:45 AM, Stephen Taylor <
stephen.tay...@imm.ox.ac.uk> wrote:

> Hi,
>
> Are there any plans to enhance the tophat wrapper to accept non-Sanger
> fastqs, as for bowtie?
>
> https://bitbucket.org/galaxy/galaxy-central/changeset/7a9476924daf
>
> ?
>
> Kind regards and thanks,
>
> Steve

Re: [galaxy-dev] Automatically removing items from history

2011-09-07 Thread Edward Kirton
i'm resurrecting this thread to see if there's any more support for the idea
of deleting intermediate files in a workflow.  i think this is an important
feature to have.  oftentimes a workflow creates many intermediate files no
one will ever look at.  and leaving it up to the user to clean up their data
files is asking too much.  there's another ticket regarding allowing users to
still be able to preview the metadata of deleted workflow history items, and
the two would go together nicely.

On Wed, Jan 26, 2011 at 1:36 PM, Dannon Baker  wrote:

> Marcel,
>
> It isn't currently possible to delete datasets from within workflows.  If
> I'm understanding your situation correctly, to simplify the view of a
> history after running a workflow, you could use the workflow output toggle
> (the asterisk next to each output in the workflow editor view) and select
> only the particular outputs you actually want to see.  The rest of the
> outputs are still in the destination history, but hidden from view by
> default.  This won't help you with disk space, though, so you might also
> want to use the new functionality available for placing the results of a
> workflow in a new history to accomplish what you want, deleting the previous
> history that was cluttered with inputs after the workflow has run.
>
> Regarding the Galaxy API, you can look at the README file in  the
> scripts/api/ directory of your galaxy installation (also at
> https://bitbucket.org/galaxy/galaxy-central/src/76c18b38a3b8/scripts/api/README)
> for a some examples and hints to get you started.  More detailed
> documentation should be available soon though.
>
> -Dannon
>
>
>
>
> On Jan 20, 2011, at 4:15 AM, Kempenaar, M (med) wrote:
>
> > Hello,
> >
> > Is it possible to let a tool in a workflow manage the user's history? For
> instance, I have a tool that takes a variable number of user-uploaded CSV
> files as input and merges them into one file. Before running this tool, the
> user thus has a number of CSV files in the history which I would like to
> 'replace' with the single merged file.
> > I was hoping that when I remove a file from disk that it would also be
> removed from the history, but that only results in a 'The resource could not
> be found.' error.
> > Alternatively, is it possible to create a composite datafile/history item
> from uploaded files? In any case, I would like to remove the separately
> uploaded files from the users history which makes selecting files for the
> next step(s) easier.
> >
> > (this is a bit of a followup for:
> http://gmod.827538.n3.nabble.com/Creating-new-datatype-with-variable-number-of-input-files-td2248444.html
> )
> >
> > One other question, is there (any) documentation available on the Galaxy
> API or a set of example files?
> >
> > Thanks for your input!
> >
> > Regards,
> >
> > Marcel.
> >
> >
> >
> > The contents of this message are confidential and only intended for the
> eyes of the addressee(s). Others than the addressee(s) are not allowed to
> use this message, to make it public or to distribute or multiply this
> message in any way. The UMCG cannot be held responsible for incomplete
> reception or delay of this transferred message.
> >

Re: [galaxy-dev] disk space and file formats

2011-09-08 Thread Edward Kirton
copied from another thread:

On Thu, Sep 8, 2011 at 7:30 AM, Anton Nekrutenko  wrote:

> What we are thinking of lately is switching to unaligned BAM for
> everyting. One of the benefits here is the ability to add readgroups from
> day 1 simplifying multisample analyses down the road.
>

this seems to be the simplest solution; i like it a lot.  really, only the
reads need to be compressed, most other outfiles are tiny by comparison, so
a more general solution may be overkill.  and if compression of everything
is desired, zfs works well -- another of our sites (LANL) uses this and
recommended it to me too.  i just haven't been able to convince my own IT
people to go this route for technical reasons beyond my attention span.

On Tue, Sep 6, 2011 at 9:05 AM, Peter Cock wrote:

> On Tue, Sep 6, 2011 at 5:00 PM, Nate Coraor  wrote:
> > Peter Cock wrote:
> >> On Tue, Sep 6, 2011 at 3:24 PM, Nate Coraor  wrote:
> >> > Ideally, there'd just be a column on the dataset table indicating
> >> > whether the dataset is compressed or not, and then tools get a new
> >> > way to indicate whether they can directly read compressed inputs, or
> >> > whether the input needs to be decompressed first.
> >> >
> >> > --nate
> >>
> >> Yes, that's what I was envisioning Nate.
> >>
> >> Are there any schemes other than gzip which would make sense?
> >> Perhaps rather than a boolean column (compressed or not), it
> >> should specify the kind of compression if any (e.g. gzip).
> >
> > Makes sense.
> >
> >> We need something which balances compression efficiency (size)
> >> with decompression speed, while also being widely supported in
> >> libraries for maximum tool uptake.
> >
> > Yes, and there's a side effect of allowing this: you may decrease
> > efficiency if the tools used downstream all require decompression,
> > and you waste a bunch of time decompressing the dataset multiple
> > times.
>
> While decompression wastes CPU time and makes things slower,
> there is less data IO from disk (which may be network mounted)
> which makes things faster. So overall, depending on the setup
> and the task at hand, it could be faster.
>
> Is it time to file an issue on bitbucket to track this potential
> enhancement?
>
> Peter
>

[galaxy-dev] default quotas by mail domain

2011-09-09 Thread Edward Kirton
with the new quotas, there's nothing to prevent a user from generating
multiple accounts using assorted free email accounts.  may i suggest the
ability to have default quotas by mail domain?  i would like to give
gmail/yahoo or *.com users tiny quotas, larger quotas for *.edu, *.gov
users, and largest quotas for users with our own mail domains e.g. @lbl.gov.


alternatively i'd like to limit user registration by mail domain (disallow
gmail, etc.).
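
for illustration, the kind of mapping i'm imagining (purely hypothetical
-- not an existing galaxy option):

    DOMAIN_QUOTAS = {
        'lbl.gov': '1 TB',   # our own users
        '.gov':    '100 GB',
        '.edu':    '100 GB',
    }
    DEFAULT_QUOTA = '10 GB'  # gmail/yahoo/*.com, etc.

    def default_quota(email):
        domain = email.rsplit('@', 1)[-1].lower()
        # most specific suffix wins
        for suffix in sorted(DOMAIN_QUOTAS, key=len, reverse=True):
            if domain == suffix or domain.endswith(suffix):
                return DOMAIN_QUOTAS[suffix]
        return DEFAULT_QUOTA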

comments?

Re: [galaxy-dev] The need for wrappers

2011-10-13 Thread Edward Kirton
wrappers are also used when a tool produces several files in an output
folder (i.e. the executable takes an outdir parameter, not explicitly named
outfile paths) and you would like to move these to the desired paths under
files/ (i.e. a composite datatype is not desired) and/or clean up unnecessary
files.
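
a minimal sketch of that pattern (the tool name and output filenames are
placeholders):

    import os, shutil, subprocess, sys, tempfile

    infile, outfile = sys.argv[1], sys.argv[2]
    outdir = tempfile.mkdtemp()
    try:
        subprocess.check_call(['some_tool', '-in', infile, '-outdir', outdir])
        # keep only the file galaxy cares about, at the path galaxy chose
        shutil.move(os.path.join(outdir, 'result.txt'), outfile)
    finally:
        shutil.rmtree(outdir)  # discard the unnecessary extras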

On Mon, Sep 19, 2011 at 11:41 PM, Peter Cock wrote:

> Hi Timothy,
>
>
> On Tuesday, September 20, 2011, Timothy Wu <2hug...@gmail.com> wrote:
> > Hi,
> >
> > Looking under the tools directory I realized that some tools
> > comes with *_wrapper.py. I wonder under what circumstances
> > is this needed?
>
> Mostly messing about with stderr (there is a long standing
> bug open about handling this in Galaxy itself) or with
> filenames (moving/renaming output, sometimes copying/
> linking input) for annoying tools which insist on fixed
> names or extensions.
>
>
> > I'm trying to do a quickie for RepeatMasker. It looks to
> > me like the tool does not allow one to specify the name
> > of the output files. It also appears to me that tools should
> > have an $output variable specified within the <command>
> > tag. So I was wondering if this is one case where a
> > wrapper.py is needed.
>
> Yes. Unless the tool can write its output to stdout,
> in which case the <command> tag can use that to
> capture stdout to the filename Galaxy selects for
> the output file.
>
> Peter
>

Re: [galaxy-dev] FTP UPLOAD

2011-10-24 Thread Edward Kirton
it shouldn't copy the file, it should move it, so there's no duplication.
what's the ownership and permission of the file after upload?
and don't use the database/files/ folder!

On Thu, Oct 13, 2011 at 9:09 AM, alessandro albiero <
alessandroalbi...@gmail.com> wrote:

> galaxy copies the file into the file_path dir (defined in universe_wsgi.ini).
>
> In this manner we have two copies of the same file,
>

Re: [galaxy-dev] only showing first lines of large output file

2011-11-28 Thread Edward Kirton
hi matthias,

i've also been getting metadata errors (below) for several weeks, but it
works with the set_metadata_externally option (in the universe config file)
set to False.  for large files, it could take many minutes for your job
server to check the metadata, and no new jobs will be run or completed jobs
picked up during this delay, but at least it works reliably.

fyi, the time these errors appeared roughly coincided with when i moved the
data files (database/) to a different disk than the galaxy code.  i don't
know for sure if that's the issue or not, but may be a clue; i will
experiment when i get a chance.

galaxy.datatypes.metadata DEBUG 2011-11-28 12:39:01,129 setting metadata externally failed for HistoryDatasetAssociation 449: External set_meta() not called
galaxy.jobs DEBUG 2011-11-28 12:39:01,221 job 355 ended
galaxy.datatypes.metadata DEBUG 2011-11-28 12:39:01,357 Cleaning up external metadata files
galaxy.datatypes.metadata DEBUG 2011-11-28 12:39:01,496 Failed to cleanup MetadataTempFile temp files from ../../../groupdirs/Galaxy/dev/database/tmp/metadata_out_HistoryDatasetAssociation_449_sB_cTB: No JSON object could be decoded: line 1 column 0 (char 0)


On Wed, Nov 16, 2011 at 3:00 AM, Mattias de Hollander <
m.dehollan...@nioo.knaw.nl> wrote:

> Hello,
>
> I just noticed I get metadata errors in the paster.log file. Can someone
> tell me what they are and if they are related to above problem?
>
> galaxy.jobs.runners.drmaa DEBUG 2011-11-16 11:55:28,037 (840/588) state change: job finished normally
> galaxy.datatypes.metadata DEBUG 2011-11-16 11:55:28,231 setting metadata externally failed for HistoryDatasetAssociation 1680: [Errno 2] No such file or directory: '../../galaxyData/tmp/metadata_in_HistoryDatasetAssociation_1680_N7JVtq'
> galaxy.datatypes.metadata DEBUG 2011-11-16 11:55:28,325 setting metadata externally failed for HistoryDatasetAssociation 1679: [Errno 2] No such file or directory: '../../galaxyData/tmp/metadata_in_HistoryDatasetAssociation_1679_oTzTG_'
> galaxy.datatypes.metadata DEBUG 2011-11-16 11:55:28,634 setting metadata externally failed for HistoryDatasetAssociation 1678: [Errno 2] No such file or directory: '../../galaxyData/tmp/metadata_in_HistoryDatasetAssociation_1678_MgW15e'
> galaxy.jobs DEBUG 2011-11-16 11:55:29,103 job 840 ended
> galaxy.datatypes.metadata DEBUG 2011-11-16 11:55:29,103 Cleaning up external metadata files
> galaxy.datatypes.metadata DEBUG 2011-11-16 11:55:29,152 Failed to cleanup MetadataTempFile temp files from ../../galaxyData/tmp/metadata_out_HistoryDatasetAssociation_1678_UTGZz8: [Errno 2] No such file or directory: '../../galaxyData/tmp/metadata_out_HistoryDatasetAssociation_1678_UTGZz8'
> galaxy.datatypes.metadata DEBUG 2011-11-16 11:55:29,152 Failed to cleanup external metadata file (filename_in) for HistoryDatasetAssociation_1678: [Errno 2] No such file or directory: '../../galaxyData/tmp/metadata_in_HistoryDatasetAssociation_1678_MgW15e'
> galaxy.datatypes.metadata DEBUG 2011-11-16 11:55:29,152 Failed to cleanup external metadata file (filename_out) for HistoryDatasetAssociation_1678: [Errno 2] No such file or directory: '../../galaxyData/tmp/metadata_out_HistoryDatasetAssociation_1678_UTGZz8'
> galaxy.datatypes.metadata DEBUG 2011-11-16 11:55:29,153 Failed to cleanup external metadata file (filename_kwds) for HistoryDatasetAssociation_1678: [Errno 2] No such file or directory: '../../galaxyData/tmp/metadata_kwds_HistoryDatasetAssociation_1678_3SDMLo'
>
>
> And of course, how to solve it :)
>
> Any help is appreciated.
>
>
> Mattias
>
> --
> Bioinformatician
> Netherlands Institute of Ecology (NIOO-KNAW)
> Wageningen, the Netherlands
>
>

Re: [galaxy-dev] Job output not returned from cluster

2011-11-28 Thread Edward Kirton
hi, we've had this issue too -- in short, the cluster node(s) finish
writing outfiles to disk, but the file system (inode metadata) isn't
updated at the galaxy server yet when galaxy checks for the files.

turning the metadata caching off (as recommended on the galaxy wiki) isn't
an option for me (and the performance hit would be significant), so i added
some loops around the file checking (5sec sleep and retry up to 6 times).
 there were a couple of places this probably should be done (not just
.[eo]* log files but also the outfiles).

i am testing these hacks now but due to the intermittent nature of these
errors, it'll be a few days before i know if this is working as expected.
 once vetted, i will put these minor edits in a clone of galaxy-central so
the changes can be picked up.

ed

On Mon, Oct 24, 2011 at 10:24 AM, Nate Coraor  wrote:

> Joseph Hargitai wrote:
> > Nate,
> >
> > this error is intermittent. You resubmit the same job twice or three
> time and then it works.  Once we are over the midterm exams - which use
> galaxy - we will try to switch the filesystem from autofs to hard mount. We
> suspect this to be the issue.
>
> Ah, I suspect this is attribute caching in NFS.  Try mounting with the
> option 'noac' and see if it solves the problem.
>
> > Could we suppress e and o SGE style to resolve this issue, or Galaxy
> wants the o?
>
> The filename is unimportant, but I doubt it's the cause.
>
> > Do you have an idea about the url build for galaxy - ucsc page return
> when the url is :8080/galaxy and not just /galaxy?
>
> Not off the top of my head.  I have this message marked, I'll take a
> look as soon as I have time.
>
> --nate
>
> >
> > thanks,
> > joe
> >
> > 
> > From: Nate Coraor [n...@bx.psu.edu]
> > Sent: Friday, October 21, 2011 10:26 AM
> > To: Joseph Hargitai
> > Cc: galaxy-dev@lists.bx.psu.edu
> > Subject: Re: [galaxy-dev] Job output not returned from cluster
> >
> > Joseph Hargitai wrote:
> > >
> > > Hi,
> > >
> > > i was browsing through the list and found many entries for this issue
> but not a definite answer.
> > >
> > > We are actually running into this error for simple file uploads from
> the internal filesystem.
> >
> > Hi Joe,
> >
> > This error occurs when the job's standard output and error files are not
> > found where Galaxy expects them, namely:
> >
> > <working_directory>/<job_id>.o
> > <working_directory>/<job_id>.e
> >
> > Please check your queueing system to make sure it can correctly deliver
> > these back from the execution hosts to the specified filesystem.
> >
> > --nate
> >
> > >
> > > thanks,
> > > joe
> > >
> >
> >
> >
> >
> >

Re: [galaxy-dev] lims integration

2011-12-01 Thread Edward Kirton
Hi Chris, unfortunately none of us here have played around with the API
yet.  I would recommend inquiring on the galaxy-dev mailing list
( galaxy-dev@lists.bx.psu.edu ).

- workflows, histories, libraries, and datasets have IDs in the database
but they may be obscured in the URLs used in galaxy; in the db they're just
integer primary keys
- histories must exist to do any work, but you can create a new history
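
fwiw, scripts/api/README in the galaxy distribution has examples; the
invocation looks roughly like this (all ids below are made up -- check
the README for the exact forms):

    python workflow_execute.py <api_key> \
        http://localhost:8080/api/workflows \
        f2db41e1fa331b3e \
        'my new history' \
        '38=hda=a799d38679e985db'

the last argument maps an input step id to a dataset: the middle field is
'hda' for a history dataset or 'ldda' for a library dataset, followed by
the encoded dataset id.  i believe the history argument can be either a
new history name or 'hist_id=<encoded id>' for an existing one.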

On Thu, Dec 1, 2011 at 4:32 PM, Craig Blackhart  wrote:

> I am newish to Galaxy and trying to learn how I might integrate it with
> our workflows and LIMS for automated data handling.  I am aware of the API
> and have looked up all the documentation that I could find.  However, there
> are many things I cannot make sense of, and have not been able to find
> information to help me out.  I think a good place to start asking questions
> is with how to run workflow_execute.py and ask what each of the parameters
> are and where to get the information from them
>
> ** **
>
> Arguments
>
>*API key – got this and understand
>
>*url – got this and understand
>
>*workflow_id – I have created workflows and have been able
> to find what looks to be a workflow_id by clicking on the workflow name and
> selecting “Download or Export”.  It seems this may be correct, is it?
>
>*history – a named history to use?  Should this already
> exist?  I have no idea here.
>
>*step=src=dataset_id - ??? I have no idea ???  I have seen
> how to create data libraries manually at the command line; does this factor
> in?
>
> ** **
>
> If anyone has information they can help me out with, it would be much
> appreciated.
>
> ** **
>
> Thanks
>
> ** **
>
> Craig Blackhart
>
> Computer Scientist
>
> Applied Engineering Technologies
>
> Los Alamos National Laboratory
>
> 505-665-6588
>
> *This message contains no information that requires ADC review*
>
> ** **
>
> ___
> Please keep all replies on the list by using "reply all"
> in your mail client.  To manage your subscriptions to this
> and other Galaxy lists, please use the interface at:
>
>  http://lists.bx.psu.edu/
>
___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] Job output not returned from cluster

2011-12-01 Thread Edward Kirton
yes, i think galaxy already grabs these files.  i seem to recall this
process would get stuck if the output was too large (i was running
something with a --debug/verbose option and galaxy would not finish the job
even though it was off the cluster -- i had to redirect to a log file).

so i guess others aren't having the same problems as i had, which is good
news.

On Thu, Dec 1, 2011 at 10:01 AM, Nate Coraor  wrote:

>
> On Nov 29, 2011, at 9:22 PM, Fields, Christopher J wrote:
>
> > On Nov 29, 2011, at 3:13 AM, Peter Cock wrote:
> >
> >> On Monday, November 28, 2011, Joseph Hargitai <
> joseph.hargi...@einstein.yu.edu> wrote:
> >>> Ed,
> >>>
> >>> we had the classic goof on our cluster with this. 4 nodes could not
> see the /home/galaxy folder due to a missing entry in /etc/fstab. When the
> jobs hit those nodes (which explains the randomness) we got the error
> message.
> >>>
> >>> Bothersome was the lack of good logs to go on. The error message was
> too generic - however I discovered that Galaxy was depositing the error and
> our messages in the /pbs folder and you could briefly read them before they
> got deleted. There the message was the classic SGE input/output message -
> /home/galaxy file not found.
> >>>
> >>> Hence my follow up question - how can I have galaxy NOT to delete
> these SGE error and out files?
> >>>
> >>> best,
> >>> joe
> >>
> >> Better yet, Galaxy should read the SGE o and e files and record their
> contents as it would for a directly executed tools stdout and stderr.
> >>
> >> Peter
> >
> > ...or at least have the option to do so, maybe a level of verbosity.  I
> have been bitten by lack of stderr output myself, where having it might
> have saved some manual debugging.
>
> Unless I'm misunderstanding, this is what Galaxy already does.
>  stdout/stderr up to 32K are read from .o and .e and stored in
> job.stdout/job.stderr.  We do need to just store them as files and make
> them accessible for each tool run, this will hopefully happen sometime
> soonish.
>
> --nate
>
> >
> > chris
> > ___
> > Please keep all replies on the list by using "reply all"
> > in your mail client.  To manage your subscriptions to this
> > and other Galaxy lists, please use the interface at:
> >
> >  http://lists.bx.psu.edu/
>
>
> ___
> Please keep all replies on the list by using "reply all"
> in your mail client.  To manage your subscriptions to this
> and other Galaxy lists, please use the interface at:
>
>  http://lists.bx.psu.edu/
>
___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] Galaxy Hang after DrmCommunicationException

2012-01-12 Thread Edward Kirton
sometimes the scheduler can't keep up with all the work in its 15sec
cycle, so it doesn't respond to some messages.  here's a fix i've been
trying that seems to work.

in lib/galaxy/jobs/runners/drmaa.py:

def check_watched_items( self ):
    """
    Called by the monitor thread to look at each watched job and deal
    with state changes.
    """
    new_watched = []
    for drm_job_state in self.watched:
        job_id = drm_job_state.job_id
        galaxy_job_id = drm_job_state.job_wrapper.job_id
        old_state = drm_job_state.old_state
        try:
            state = self.ds.jobStatus( job_id )
        # InternalException was reported to be necessary on some DRMs, but
        # this could cause failures to be detected as completion!  Please
        # report if you experience problems with this.
        except ( drmaa.InvalidJobException, drmaa.InternalException ), e:
            # we should only get here if an orphaned job was put into
            # the queue at app startup
            log.debug("(%s/%s) job left DRM queue with following message: %s" % ( galaxy_job_id, job_id, e ) )
            self.work_queue.put( ( 'finish', drm_job_state ) )
            continue
        # BEGIN DRM TIMEOUT: Don't fail on scheduler communication error (probably just too busy)
        except ( drmaa.DrmCommunicationException ), e:
            log.warning("(%s/%s) DRM Communication Exception" % ( galaxy_job_id, job_id ))
            continue
        # END DRM TIMEOUT

On Wed, Jan 11, 2012 at 9:18 AM, Ann Black  wrote:

> Good Morning galaxy group!
>
> I was hoping that someone might have some ideas on a problem we have
> experienced a handful of times running galaxy on our local cluster.
>
> Occasionally we experience some communication timeouts between out cluster
> head node and a compute node which will self heal. However, this in turn
> will hang galaxy.  Below you will see output from our galaxy log file.
>  When the ERROR happens (which is not often) it consistently seems to hang
> galaxy.  We have to kill it off and restart it. We are running galaxy as a
> single PID at this time (we are still just testing it out, etc) and it is
> running on our head node (which we plan to move off of in the future).
>
> galaxy.jobs.runners.drmaa DEBUG 2012-01-10 19:19:58,800 (1654/698075)
> state change: job is running
> galaxy.jobs.runners.drmaa ERROR 2012-01-10 20:57:47,021 (1654/698075)
> Unable to check job status
> Traceback (most recent call last):
>   File "/data/galaxy-dist/lib/galaxy/jobs/runners/drmaa.py", line 236, in
> check_watched_items
> state = self.ds.jobStatus( job_id )
>   File "/data/galaxy-dist/eggs/drmaa-0.4b3-py2.7.egg/drmaa/__init__.py",
> line 522, in jobStatus
> _h.c(_w.drmaa_job_ps, jobName, _ct.byref(status))
>   File "/data/galaxy-dist/eggs/drmaa-0.4b3-py2.7.egg/drmaa/helpers.py",
> line 213, in c
> return f(*(args + (error_buffer, sizeof(error_buffer
>   File "/data/galaxy-dist/eggs/drmaa-0.4b3-py2.7.egg/drmaa/errors.py",
> line 90, in error_check
> raise _ERRORS[code-1]("code %s: %s" % (code, error_buffer.value))
> DrmCommunicationException: code 2: failed receiving gdi request response
> for mid=24442 (got syncron message receive timeout error).
> galaxy.jobs.runners.drmaa WARNING 2012-01-10 20:58:05,090 (1654/698075)
> job will now be errored
> galaxy.jobs.runners.drmaa DEBUG 2012-01-10 20:59:06,396 (1654/698075)
> User killed running job, but error encountered removing from DRM queue:
> code 2: failed receiving gdi request response for mid=2 (got syncron
> message receive timeout error).
> galaxy.datatypes.metadata DEBUG 2012-01-10 20:59:06,896 Cleaning up
> external metadata files
> galaxy.datatypes.metadata DEBUG 2012-01-10 20:59:06,947 Failed to cleanup
> MetadataTempFile temp files from
> database/tmp/metadata_out_HistoryDatasetAssociation_2913_ZUTgBy: No JSON
> object could be decoded: line 1 column 0 (char 0)
> galaxy.datatypes.metadata DEBUG 2012-01-10 20:59:09,640 Cleaning up
> external metadata files
> galaxy.jobs INFO 2012-01-10 20:59:09,697 job 1656 unable to run: one or
> more inputs in error state
> galaxy.datatypes.metadata DEBUG 2012-01-10 20:59:10,121 Cleaning up
> external metadata files
> galaxy.jobs INFO 2012-01-10 20:59:10,159 job 1655 unable to run: one or
> more inputs in error state
> galaxy.datatypes.metadata DEBUG 2012-01-10 20:59:12,076 Cleaning up
> external metadata files
> galaxy.jobs INFO 2012-01-10 20:59:12,126 job 1657 unable to run: one or
> more inputs in error state
> galaxy.datatypes.metadata DEBUG 2012-01-10 20:59:13,601 Cleaning up
> external metadata files
> galaxy.jobs INFO 2012-01-10 20:59:13,650 job 1658 unable to run: one or
> more inputs in error state
>
>
> Has anyone else experienced this or have some ideas on how we can further
> debug to figure out why galaxy hangs?
>
> Thanks much!
>
> Ann Black-Ziegelbein
>
> 

[galaxy-dev] how to use projects for fair-share on compute-cluster

2012-01-12 Thread Edward Kirton
Galaxy sites usually do all work on a compute cluster, with all jobs submitted
as a "galaxy" unix user, so there isn't any "fair-share" accounting between
users.

Other sysops have created a solution to run jobs as the actual unix user,
which may be feasible for an intranet site but is undesirable for a site
accessible via the internet due to security reasons.

A simpler and more secure method to enable fair-share is by using projects.

Here's a simple scenario and straightforward solution:  Multiple groups in
an organization use the same galaxy site and it is desirable to enable
fair-share accounting between the groups.  All users in a group consume the
same fair-share, which is generally acceptable.

1) configure scheduler with a project for each group, configure each user
to use their group's project by default, and grant galaxy user access to
submit jobs to any project; all users should be associated with a project.
 There's a good chance your grid is already configured this way.

2) create a database which maps galaxy user id to a project; i use a cron
job to create a standalone sqlite3 db.  since this is site-specific, code
is not provided but hints are given below.  Rather than having a separate
database, the proj could have been added to the galaxy db, but i sought to
minimize my changes.

3) add a snippet of code to drmaa.py's queue_job method to lookup proj from
job_wrapper.user_id and append to jt.nativeSpecification; see below

Here are the changes required.  It's small enough that I didn't do this as
a clone/patch.

(1) lib/galaxy/jobs/runners/drmaa.py:

 11 import sqlite3
 12
...
155         native_spec = self.get_native_spec( runner_url )
156
157         # BEGIN ADD USER'S PROJ
158         if self.app.config.user_proj_map_db is not None:
159             try:
160                 conn = sqlite3.connect(self.app.config.user_proj_map_db)
161                 c = conn.cursor()
162                 c.execute('SELECT PROJ FROM USER_PROJ WHERE GID=?', [job_wrapper.user_id])
163                 row = c.fetchone()
164                 c.close()
165                 native_spec += ' -P ' + row[0]
166             except:
167                 log.debug("Cannot look up proj of user %s" % job_wrapper.user_id)
168             # END ADD USER'S PROJ

(2) lib/galaxy/config.py: add support for user_proj_map_db variable

        self.user_proj_map_db = resolve_path( kwargs.get( "user_proj_map_db", None ), self.root )

(3) universe_wsgi.ini:

user_proj_map_db = /some/path/to/user_proj_map_db.sqlite

(4) here are some suggestions to help get you started on a script to make
the sqlite3 db; a sketch putting them together follows the hints.

a) parse ldap tree example: (to get uid:email)
ldapsearch -LLL -x -b 'ou=aliases,dc=jgi,dc=gov'

b) parse scheduler config: (to get uid:proj)
qconf -suserl | /usr/bin/xargs -I '{}' qconf -suser '{}' | egrep
'name|default_project'

c) query galaxy db: (to get gid:email)
select id, email from galaxy_user;
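
putting those pieces together, the cron job just needs to write (gid, proj)
pairs into a table matching the SELECT above.  a minimal sketch, assuming
you've already joined the three sources into pairs (the data below is fake):

    import sqlite3

    user_proj = [ (1, 'projA'), (2, 'projB') ]  # (galaxy user id, scheduler project)

    conn = sqlite3.connect('/some/path/to/user_proj_map_db.sqlite')
    c = conn.cursor()
    c.execute('CREATE TABLE IF NOT EXISTS USER_PROJ (GID INTEGER PRIMARY KEY, PROJ TEXT)')
    c.executemany('INSERT OR REPLACE INTO USER_PROJ (GID, PROJ) VALUES (?, ?)', user_proj)
    conn.commit()
    conn.close()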

The limitation of this method is that all jobs submitted by a user will
always be charged to the same project (which may be okay, depending on how
your organization uses projects).  However a user may have access to
several projects and may wish to associate some jobs with a particular
project.  This could be accomplished by adding an option to the user
preferences; a user would choose a project from their available projects and
any jobs submitted would have to record their currently chosen project.
 Alternatively, histories could be associated with a particular project.
 This solution would require significant changes to galaxy, so i haven't
implemented it (and the simple solution works well enough for me).

Edward Kirton
US DOE JGI
___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] Galaxy Hang after DrmCommunicationException

2012-01-13 Thread Edward Kirton
i had seen the job process die with this error:

if state != old_state:
UnboundLocalError: local variable 'state' referenced before assignment

since the DRM timeout is an intermittent error, i'm not absolutely positive
i have nailed it (a bit more time will tell -- i'm keeping an eye on it
with debug log messages), but it seems so.  i intended to share this as a
patched clone when i became sure but when i read your email i just sent
what i had immediately.  let us know if that seems to solve the problem for
you, so we'll have some confirmation.

glad to help,
ed

p.s. i have another patch for "output not returned from cluster" errors
that i'm also still validating, due to NFS race since we don't have inode
metadata caching turned off as the galaxy developers suggest.

On Fri, Jan 13, 2012 at 8:06 AM, Ann Black  wrote:

> Thanks so much! I have applied the fix to our env.
>
> In looking over the logic, there was an existing exception block that
> would have caught the communication exception generically – but the job
> moved (in this scenario erroneously) the job into a failure workflow.  I
> would like to understand what ended up hanging galaxy – so it must be
> related to a valid job being moved into failure state ? Did you follow it
> down the rabbit hole by any chance to see what caused the hang in your env
> ?
>
> Thanks again,
>
> Ann
>
> From: Edward Kirton 
> Date: Thu, 12 Jan 2012 13:00:27 -0800
> To: Ann Black 
> Cc: "galaxy-dev@lists.bx.psu.edu" 
> Subject: Re: [galaxy-dev] Galaxy Hang after DrmCommunicationException
>
> sometimes the scheduler can't keep up with all the work in it's 15sec
> cycle, so it doesn't respond to some messages.  here's a fix i've been
> trying that seems to work.
>
> in lib/galaxy/jobs/runners/drmaa.py:
>
> def check_watched_items( self ):
>     """
>     Called by the monitor thread to look at each watched job and deal
>     with state changes.
>     """
>     new_watched = []
>     for drm_job_state in self.watched:
>         job_id = drm_job_state.job_id
>         galaxy_job_id = drm_job_state.job_wrapper.job_id
>         old_state = drm_job_state.old_state
>         try:
>             state = self.ds.jobStatus( job_id )
>         # InternalException was reported to be necessary on some DRMs, but
>         # this could cause failures to be detected as completion!  Please
>         # report if you experience problems with this.
>         except ( drmaa.InvalidJobException, drmaa.InternalException ), e:
>             # we should only get here if an orphaned job was put into
>             # the queue at app startup
>             log.debug("(%s/%s) job left DRM queue with following message: %s" % ( galaxy_job_id, job_id, e ) )
>             self.work_queue.put( ( 'finish', drm_job_state ) )
>             continue
>         # BEGIN DRM TIMEOUT: Don't fail on scheduler communication error (probably just too busy)
>         except ( drmaa.DrmCommunicationException ), e:
>             log.warning("(%s/%s) DRM Communication Exception" % ( galaxy_job_id, job_id ))
>             continue
>         # END DRM TIMEOUT
>
> On Wed, Jan 11, 2012 at 9:18 AM, Ann Black  wrote:
>
>> Good Morning galaxy group!
>>
>> I was hoping that someone might have some ideas on a problem we have
>> experienced a handful of times running galaxy on our local cluster.
>>
>> Occasionally we experience some communication timeouts between out
>> cluster head node and a compute node which will self heal. However, this in
>> turn will hang galaxy.  Below you will see output from our galaxy log file.
>>  When the ERROR happens (which is not often) it consistently seems to hang
>> galaxy.  We have to kill it off and restart it. We are running galaxy as a
>> single PID at this time (we are still just testing it out, etc) and it is
>> running on our head node (which we plan to move off of in the future).
>>
>> galaxy.jobs.runners.drmaa DEBUG 2012-01-10 19:19:58,800 (1654/698075)
>> state change: job is running
>> galaxy.jobs.runners.drmaa ERROR 2012-01-10 20:57:47,021 (1654/698075)
>> Unable to check job status
>> Traceback (most recent call last):
>>   File "/data/galaxy-dist/lib/galaxy/jobs/runners/drmaa.py", line 236, in
>> check_watched_items
>> state = self.ds.jobStatus( job_id )
>>   File "/data/galaxy-dist/eggs/drmaa-0.4b3-py2.7.egg/drmaa/__init__.py",
>> line 522, in jobStatus
>> _h.c(_w.d

Re: [galaxy-dev] how to use projects for fair-share on compute-cluster

2012-01-13 Thread Edward Kirton
correction: i didn't adequately test what happens if the user_proj_map_db
was not defined in the universe file; here are the changes:

157         # BEGIN ADD USER'S PROJ
158         try:
159             conn = sqlite3.connect(self.app.config.user_proj_map_db)
160             c = conn.cursor()
161             c.execute('SELECT PROJ FROM USER_PROJ WHERE GID=?', [job_wrapper.user_id])
162             row = c.fetchone()
163             c.close()
164             native_spec += ' -P ' + row[0]
165         except:
166             log.debug("Cannot look up proj of user %s" % job_wrapper.user_id)
167         # /END ADD USER PROJ

also, in the config, define a default instead of using None:

        self.user_proj_map_db = resolve_path( kwargs.get( "user_proj_map_db", "database/user_proj_map.sqlite" ), self.root )

one last note: there doesn't seem to be any error displayed to the user if
the galaxy user doesn't have permission to use the user's project -- the
job just never gets scheduled (although there is a log entry).  so be sure
the galaxy user has permission to submit to all possible projects.
___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] Galaxy Hang after DrmCommunicationException

2012-01-18 Thread Edward Kirton
excellent, thanks for the correction!

On Wed, Jan 18, 2012 at 8:59 AM, Shantanu Pavgi  wrote:

>
>  Ed,
>
>  I think you may want to add job with DrmCommunicationException error
> back into the watched_jobs list.
> {{{
>  except ( drmaa.DrmCommunicationException ), e:
>      # Catch drmaa communication exception, log a warning message and
>      # continue to monitor jobs.
>      log.warning("(%s/%s) Couldn't communicate with the cluster scheduler to check job status." % ( galaxy_job_id, job_id ))
>      # keep old job state
>      new_watched.append( drm_job_state )
>      continue
> }}}
>
>  Here is a patch and related discussion on it:
> https://bitbucket.org/galaxy/galaxy-central/pull-request/23/catch-drmcommunication-exception
>
>
>  --
> Shantanu
>
>
>  On Jan 13, 2012, at 2:11 PM, Edward Kirton wrote:
>
>  i had seen the job process die with this error:
>
>  if state != old_state:
> UnboundLocalError: local variable 'state' referenced before assignment
>
>  since the DRM timeout is an intermittent error, i'm not absolutely
> positive i have nailed it (a bit more time will tell -- i'm keeping an eye
> on it with debug log messages), but it seems so.  i intended to share this
> as a patched clone when i became sure but when i read your email i just
> sent what i had immediately.  let us know if that seems to solve the
> problem for you, so we'll have some confirmation.
>
>  glad to help,
> ed
>
>  p.s. i have another patch for "output not returned from cluster" errors
> that i'm also still validating, due to NFS race since we don't have inode
> metadata caching turned off as the galaxy developers suggest.
>
> On Fri, Jan 13, 2012 at 8:06 AM, Ann Black  wrote:
>
>>  Thanks so much! I have applied the fix to our env.
>>
>>  In looking over the logic, there was an existing exception block that
>> would have caught the communication exception generically – but the job
>> moved (in this scenario erroneously) the job into a failure workflow.  I
>> would like to understand what ended up hanging galaxy – so it must be
>> related to a valid job being moved into failure state ? Did you follow it
>> down the rabbit hole by any chance to see what caused the hang in your env
>> ?
>>
>>  Thanks again,
>>
>>  Ann
>>
>>   From: Edward Kirton 
>> Date: Thu, 12 Jan 2012 13:00:27 -0800
>> To: Ann Black 
>> Cc: "galaxy-dev@lists.bx.psu.edu" 
>> Subject: Re: [galaxy-dev] Galaxy Hang after DrmCommunicationException
>>
>>  sometimes the scheduler can't keep up with all the work in it's 15sec
>> cycle, so it doesn't respond to some messages.  here's a fix i've been
>> trying that seems to work.
>>
>>  in lib/galaxy/jobs/runners/drmaa.py:
>>
>>  def check_watched_items( self ):
>>      """
>>      Called by the monitor thread to look at each watched job and deal
>>      with state changes.
>>      """
>>      new_watched = []
>>      for drm_job_state in self.watched:
>>          job_id = drm_job_state.job_id
>>          galaxy_job_id = drm_job_state.job_wrapper.job_id
>>          old_state = drm_job_state.old_state
>>          try:
>>              state = self.ds.jobStatus( job_id )
>>          # InternalException was reported to be necessary on some DRMs, but
>>          # this could cause failures to be detected as completion!  Please
>>          # report if you experience problems with this.
>>          except ( drmaa.InvalidJobException, drmaa.InternalException ), e:
>>              # we should only get here if an orphaned job was put into
>>              # the queue at app startup
>>              log.debug("(%s/%s) job left DRM queue with following message: %s" % ( galaxy_job_id, job_id, e ) )
>>              self.work_queue.put( ( 'finish', drm_job_state ) )
>>              continue
>>          # BEGIN DRM TIMEOUT: Don't fail on scheduler communication error (probably just too busy)
>>          except ( drmaa.DrmCommunicationException ), e:
>>              log.warning("(%s/%s) DRM Communication Exception" % ( galaxy_job_id, job_id ))
>>              continue
>>          # END DRM TIMEOUT
>>
>> On Wed, Jan 11, 2012 at 9:18 AM, Ann Black wrote:
>>
>>>  Good Morning galaxy group!
>>>
>>>  I was hoping that someone might have some ideas on a 

Re: [galaxy-dev] Galaxy Hang after DrmCommunicationException

2012-01-18 Thread Edward Kirton
perhaps a 15sec sleep should also be added, since the scheduler is
overwhelmed

{{{
except ( drmaa.DrmCommunicationException ), e:
    # Catch drmaa communication exception, log a warning message and
    # continue to monitor jobs.
    log.warning("(%s/%s) Couldn't communicate with the cluster scheduler to check job status." % ( galaxy_job_id, job_id ))
    # give scheduler time to catch up
    time.sleep( 15 )
    # keep old job state
    new_watched.append( drm_job_state )
    continue
}}}


On Wed, Jan 18, 2012 at 1:07 PM, Edward Kirton  wrote:

> excellent, thanks for the correction!
>
>
> On Wed, Jan 18, 2012 at 8:59 AM, Shantanu Pavgi  wrote:
>
>>
>>  Ed,
>>
>>  I think you may want to add job with DrmCommunicationException error
>> back into the watched_jobs list.
>> {{{
>>  except ( drmaa.DrmCommunicationException ), e:
>>      # Catch drmaa communication exception, log a warning message and
>>      # continue to monitor jobs.
>>      log.warning("(%s/%s) Couldn't communicate with the cluster scheduler to check job status." % ( galaxy_job_id, job_id ))
>>      # keep old job state
>>      new_watched.append( drm_job_state )
>>      continue
>> }}}
>>
>>  Here is a patch and related discussion on it:
>> https://bitbucket.org/galaxy/galaxy-central/pull-request/23/catch-drmcommunication-exception
>>
>>
>>  --
>> Shantanu
>>
>>
>>  On Jan 13, 2012, at 2:11 PM, Edward Kirton wrote:
>>
>>  i had seen the job process die with this error:
>>
>>  if state != old_state:
>> UnboundLocalError: local variable 'state' referenced before assignment
>>
>>  since the DRM timeout is an intermittent error, i'm not absolutely
>> positive i have nailed it (a bit more time will tell -- i'm keeping an eye
>> on it with debug log messages), but it seems so.  i intended to share this
>> as a patched clone when i became sure but when i read your email i just
>> sent what i had immediately.  let us know if that seems to solve the
>> problem for you, so we'll have some confirmation.
>>
>>  glad to help,
>> ed
>>
>>  p.s. i have another patch for "output not returned from cluster" errors
>> that i'm also still validating, due to NFS race since we don't have inode
>> metadata caching turned off as the galaxy developers suggest.
>>
>> On Fri, Jan 13, 2012 at 8:06 AM, Ann Black wrote:
>>
>>>  Thanks so much! I have applied the fix to our env.
>>>
>>>  In looking over the logic, there was an existing exception block that
>>> would have caught the communication exception generically – but the job
>>> moved (in this scenario erroneously) the job into a failure workflow.  I
>>> would like to understand what ended up hanging galaxy – so it must be
>>> related to a valid job being moved into failure state ? Did you follow it
>>> down the rabbit hole by any chance to see what caused the hang in your env
>>> ?
>>>
>>>  Thanks again,
>>>
>>>  Ann
>>>
>>>   From: Edward Kirton 
>>> Date: Thu, 12 Jan 2012 13:00:27 -0800
>>> To: Ann Black 
>>> Cc: "galaxy-dev@lists.bx.psu.edu" 
>>> Subject: Re: [galaxy-dev] Galaxy Hang after DrmCommunicationException
>>>
>>>  sometimes the scheduler can't keep up with all the work in it's 15sec
>>> cycle, so it doesn't respond to some messages.  here's a fix i've been
>>> trying that seems to work.
>>>
>>>  in lib/galaxy/jobs/runners/drmaa.py:
>>>
>>>  def check_watched_items( self ):
>>> """
>>> Called by the monitor thread to look at each watched job and deal
>>> with state changes.
>>> """
>>> new_watched = []
>>> for drm_job_state in self.watched:
>>> job_id = drm_job_state.job_id
>>> galaxy_job_id = drm_job_state.job_wrapper.job_id
>>> old_state = drm_job_state.old_state
>>> try:
>>> state = self.ds.jobStatus( job_id )
>>> # InternalException was reported to be necessary on some
>>> DRMs, but
>>> # this could cause failures to be detected as completion!
>>>  Please
>>> # report if you experience problems with this.
>>> except ( drmaa.InvalidJobException, drmaa.InternalException
>>> ), e:
>>> # we should only get he

Re: [galaxy-dev] Galaxy Hang after DrmCommunicationException

2012-01-20 Thread Edward Kirton
yes, nate, but that fails the job; it is, in fact, still running, so the
error should be ignored:

            except Exception, e:
                # so we don't kill the monitor thread
                log.exception("(%s/%s) Unable to check job status" % ( galaxy_job_id, job_id ) )
                log.warning("(%s/%s) job will now be errored" % ( galaxy_job_id, job_id ) )
                drm_job_state.fail_message = "Cluster could not complete job"
                self.work_queue.put( ( 'fail', drm_job_state ) )
                continue

On Fri, Jan 20, 2012 at 9:40 AM, Nate Coraor  wrote:
>
> Hi Ann,
>
> The cause of the exception aside, this should be caught by the except block 
> below it in drmaa.py (in check_watched_items()):
>
>            except Exception, e:
>                # so we don't kill the monitor thread
>            except Exception, e:
>                # so we don't kill the monitor thread
>                log.exception("(%s/%s) Unable to check job status" % ( galaxy_job_id, job_id ) )
>
> What changeset are you running?
>
> --nate

___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/


Re: [galaxy-dev] how to use projects for fair-share on compute-cluster

2012-01-20 Thread Edward Kirton
Great idea, Nate (hint! hint!).

On Thu, Jan 19, 2012 at 10:27 AM, Nate Coraor  wrote:
> Hey Ed,
>
> This is a neat approach.  You could possibly also do this in the Galaxy 
> database by associating users and groups with roles that match project names. 
>  A select list or history default that allowed users to select their 
> "current" project/role would remove the single-project-per-user limitation.
>
> --nate
>
> On Jan 13, 2012, at 3:17 PM, Edward Kirton wrote:
>
>> correction: i didn't adequately test what happens if the user_proj_map_db 
>> was not defined in the universe file; here's the changes:
>>
>> 157         # BEGIN ADD USER'S PROJ
>> 158         try:
>> 159             conn = sqlite3.connect(self.app.config.user_proj_map_db)
>> 160             c = conn.cursor()
>> 161             c.execute('SELECT PROJ FROM USER_PROJ WHERE GID=?', [job_wrapper.user_id])
>> 162             row = c.fetchone()
>> 163             c.close()
>> 164             native_spec += ' -P ' + row[0]
>> 165         except:
>> 166             log.debug("Cannot look up proj of user %s" % job_wrapper.user_id)
>> 167         # /END ADD USER PROJ
>>
>> also, in the config, define a default instead of using None:
>>         self.user_proj_map_db = resolve_path( kwargs.get( 
>> "user_proj_map_db", "database/user_proj_map.sqlite" ), self.root )
>>
>> one last note: there doesn't seem to be any error displayed to the user if 
>> the job cannot be scheduled because the galaxy user doesn't have permissions 
>> to use the user's project (although there is a log entry), but the job will 
>> never be scheduled.  so be sure the galaxy user has permissions to submit to 
>> all possible projects.
>> ___
>> Please keep all replies on the list by using "reply all"
>> in your mail client.  To manage your subscriptions to this
>> and other Galaxy lists, please use the interface at:
>>
>>  http://lists.bx.psu.edu/
>

___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/


Re: [galaxy-dev] Timeout and Galaxy - Cluster could not complete job

2012-02-28 Thread Edward Kirton
i believe the latest stable update of galaxy included changes to drmaa.py
which allow a job to be rechecked indefinitely after scheduler
communication errors.  so perhaps your "cluster could not complete job"
errors are due to a filesystem race condition: the cluster node completes
the job, but the inode metadata updates haven't finished propagating, so
the files appear to be missing to the job runner on a different server.  in
that case, the config variable you want to increase is the new
"retry_job_output_collection", also part of the last update to stable.
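
e.g. in universe_wsgi.ini (the value is just an example; it's the number of
times the runner will recheck for the output files before failing the job):

    retry_job_output_collection = 5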

On Wed, Feb 22, 2012 at 5:52 AM, Aurélien Bernard <
aurelien.bern...@univ-montp2.fr> wrote:

> Hello everybody :)
>
>
> Today, I have a question related to timeout management  in Galaxy.
>
> More particularly, I'm searching for a way to set (in a configuration file
> if possible) all timeouts related to DRMAA and timeouts related to
> communication between Galaxy and SGE.
>
>
> My goal is to increase current timeouts to avoid the "Cluster could not
> complete job" error on successful jobs when there is a temporary problem of
> "job status checking" (due to heavy write load on the hard drive or
> whatever).
>
>
> Is this possible ?
>
>
> Thank you in advance,
>
> Have a nice day
>
> A. Bernard
>
> --
> Aurélien Bernard
> IE Bioprogrammeur - CNRS
> Université des sciences Montpellier II
> Institut des Sciences de l'Evolution
> France
>
> __**_
> Please keep all replies on the list by using "reply all"
> in your mail client.  To manage your subscriptions to this
> and other Galaxy lists, please use the interface at:
>
>  http://lists.bx.psu.edu/
>
___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] HMMER wrappers

2012-02-29 Thread Edward Kirton
hi, peter - i will fix the description and upload hmmsearch and infernal
today.

On Wed, Feb 29, 2012 at 2:05 AM, Peter Cock wrote:

> On Tue, Feb 28, 2012 at 10:31 PM, Dave Clements
>  wrote:
> > Hi Peter,
> >
> > I think hmmerscan has been wrapped, but there is a missing "e" in the
> > repository name.  Look for "hmmscan" in the toolshed.  The description
> for
> > it is:
> >
> > hmmscan, for searching pfam with AA seqs. also included is hmm datatypes
> > although hmmbuild, etc. isn't included yet. i'll try to finish the rest of
> > the hmmer suite soon
> >
> > Does that help?
> >
> > Dave C.
>
> Very helpful Dave, thanks.
>
> Actually hmmscan is the name of one of the HMMER binaries, rather than
> hmmerscan (with an 'er' in the middle, which was a typo on my part).
>
> I've included Edward directly in this thread hoping he can clarify which
> version of HMMER he was wrapping (2 vs 3).
>
> Also I'd like to ask Edward if he could add the word HMMER to the tool's
> description to facilitate future searchers (and HMMER2 or HMMER3 as
> appropriate).
>
> I might be interested in helping to extend the wrapper for more of the
> HMMER3 toolkit. In particular hmmsearch might be relevant, which
> is designed for one (or a few) profiles vs. a big seq database while
> hmmscan is designed for one (or a few) sequences vs. a big profile
> database. See http://selab.janelia.org/people/eddys/blog/?p=424
>
> Regards,
>
> Peter
>
___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] HMMER wrappers

2012-02-29 Thread Edward Kirton
great suggestion; i'll make those changes

On Wed, Feb 29, 2012 at 10:05 AM, Peter Cock wrote:

> On Wed, Feb 29, 2012 at 5:56 PM, Edward Kirton  wrote:
> > hi, peter - i will fix the description and upload hmmsearch and infernal
> > today.
>
> Great.
>
> Assuming hmmscan and hmmsearch have (almost) the same command
> line API, there is something to be said for presenting them as one tool
> in Galaxy, with a drop down selection between them (with help text
> about which is recommend adapted from the HMMER blog post). One
> could even have an automatic selection by a wrapper script based on
> the number of query sequences and the number of HMMs. My thinking
> here is the detail of hmmscan vs hmmsearch is purely an implementation
> detail that the end user shouldn't have to worry about.
>
> Or just duplicate most of the XML code and have two wrappers. But
> as far as I know there isn't (yet) a nice way of reusing XML snippets
> between tool wrappers... which would be handy.
>
> Thanks,
>
> Peter
>
___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] HMMER wrappers

2012-03-05 Thread Edward Kirton
i created a new toolshed repo, hmmer, since i couldn't rename the old one.
 as suggested, it has hmmscan/hmmsearch as one tool, plus hmmpress.  will
add hmmbuild and hmmalign asap; others upon request.

dave, is there a way to delete an old tool?  (hmmscan)

On Wed, Feb 29, 2012 at 10:12 AM, Edward Kirton  wrote:

> great suggestion; i'll make those changes
>
> On Wed, Feb 29, 2012 at 10:05 AM, Peter Cock wrote:
>
>> On Wed, Feb 29, 2012 at 5:56 PM, Edward Kirton  wrote:
>> > hi, peter - i will fix the description and upload hmmsearch and infernal
>> > today.
>>
>> Great.
>>
>> Assuming hmmscan and hmmsearch have (almost) the same command
>> line API, there is something to be said for presenting them as one tool
>> in Galaxy, with a drop down selection between them (with help text
>> about which is recommend adapted from the HMMER blog post). One
>> could even have an automatic selection by a wrapper script based on
>> the number of query sequences and the number of HMMs. My thinking
>> here is the detail of hmmscan vs hmmsearch is purely an implementation
>> detail that the end user shouldn't have to worry about.
>>
>> Or just duplicate most of the XML code and have two wrappers. But
>> as far as I know there isn't (yet) a nice way of reusing XML snippets
>> between tool wrappers... which would be handy.
>>
>> Thanks,
>>
>> Peter
>>
>
>
___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] HMMER wrappers

2012-03-06 Thread Edward Kirton
i gave it a bad name previously and peter didn't find it in a search at
first, but i couldn't change the name, so i created a new repository.  i
put a note in the old repo (hmmscan) referring to the new repo (hmmer).
 that may suffice; however you folks want to handle it is fine.

On Tue, Mar 6, 2012 at 3:38 AM, Greg Von Kuster  wrote:

> Hi Ed,
>
> Do you want the hmmscan repository itself deleted?  It's been downloaded /
> cloned 86 times, although it is never been automatically installed into a
> local Galaxy since the contained tool does not properly load into Galaxy.
>  We generally do not like to delete things like this because doing do
> prevents reproducibility.  I'm looking for feedback from the community on
> this one - does eliminating this repository affect anyone?
>
> Thanks for the new contributions tot he tool shed!
>
> Greg Von Kuster
>
> On Mar 6, 2012, at 1:47 AM, Edward Kirton wrote:
>
> i created a new toolshed repo, hmmer since i couldn't rename it.  as
> suggested, it has the hmmscan/hmmsearch as one tool, plus hmmpress.  will
> add hmmbuild, hmmalign asap; others upon request.
>
> dave, is there a way to delete an old tool?  (hmmscan)
>
> On Wed, Feb 29, 2012 at 10:12 AM, Edward Kirton  wrote:
>
>> great suggestion; i'll make those changes
>>
>> On Wed, Feb 29, 2012 at 10:05 AM, Peter Cock 
>> wrote:
>>
>>> On Wed, Feb 29, 2012 at 5:56 PM, Edward Kirton  wrote:
>>> > hi, peter - i will fix the description and upload hmmsearch and
>>> infernal
>>> > today.
>>>
>>> Great.
>>>
>>> Assuming hmmscan and hmmsearch have (almost) the same command
>>> line API, there is something to be said for presenting them as one tool
>>> in Galaxy, with a drop down selection between them (with help text
>>> about which is recommend adapted from the HMMER blog post). One
>>> could even have an automatic selection by a wrapper script based on
>>> the number of query sequences and the number of HMMs. My thinking
>>> here is the detail of hmmscan vs hmmsearch is purely an implementation
>>> detail that the end user shouldn't have to worry about.
>>>
>>> Or just duplicate most of the XML code and have two wrappers. But
>>> as far as I know there isn't (yet) a nice way of reusing XML snippets
>>> between tool wrappers... which would be handy.
>>>
>>> Thanks,
>>>
>>> Peter
>>>
>>
>>
> ___
> Please keep all replies on the list by using "reply all"
> in your mail client.  To manage your subscriptions to this
> and other Galaxy lists, please use the interface at:
>
>  http://lists.bx.psu.edu/
>
>
>
___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] HMMER wrappers

2012-03-07 Thread Edward Kirton
good tip

On Tue, Mar 6, 2012 at 3:28 PM, Peter Cock wrote:

> marked as hidden in the XML
___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] Merging BLAST database support into Galaxy?

2012-04-18 Thread Edward Kirton
sounds great, thanks peter.  i granted you access to my toolshed repo, but
perhaps we want only one tool in the toolshed when all done.

On Wed, Apr 18, 2012 at 3:20 AM, Peter Cock wrote:

> On Wed, Apr 18, 2012 at 10:53 AM, Peter Cock 
> wrote:
> > Hi Edward,
> >
> > We're now running BLAST+ searches on our local Galaxy via our cluster,
> > and some of the cluster nodes have relatively small amounts of RAM.
> > This means I've become more aware of limitations in the NCBI BLAST+
> > tools' support for using a subject FASTA file (instead of making a local
> > BLAST database), which turns out to be surprisingly RAM hungry.
> >
> > The logical step is to allow users to build a BLAST database as a new
> > datatype in Galaxy - which is what you (Edward) did some time ago as
> > a fork, later posted to the Galaxy Tool Shed.
> >
> > Edward - are you happy for me to merge your work into the main
> > wrappers? I mentioned idea this a couple of months ago:
> > http://lists.bx.psu.edu/pipermail/galaxy-dev/2012-February/008544.html
>
> Note this will take some extra work - we need to support protein
> BLAST databases as well, not just nucleotide database.
>
> Peter
>
___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] Merging BLAST database support into Galaxy?

2012-04-26 Thread Edward Kirton
your suggestion for blastdbn and blastdbp sounds fine.
it's okay if a few of our users need to edit the metadata of the dbs in
their history.
thanks for asking and doing this.

On Thu, Apr 26, 2012 at 5:37 AM, Peter Cock wrote:

> Hi Edward,
>
> I've started work on this in earnest now. I see you only defined one
> new datatype, blastdb, which worked for nucleotide databases.
> I want to handle protein databases too, so I think two datatypes
> makes sense - which I am currently calling blastdbn and blastdbp.
>
> That won't be compatible with your existing tools & history, but
> other than that seems sensible to me. I suppose we could use
> blastdb and blastdb_p which would match the *.loc files?
>
> What do you think?
>
> Peter
>
___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] Task Manager: This Galaxy instance is not the job manager.

2012-06-06 Thread Edward Kirton
yes, i've had the same error ever since the last galaxy-dist release.  i
previously had multiple servers and switched to one manager and two
handlers.  rewrite rules didn't need to be changed.

On Thu, May 24, 2012 at 8:14 AM, Sarah Diehl  wrote:

> **
> Hi all,
>
> I have a similar, maybe related problem. I'm running a configuration as
> described at
> http://wiki.g2.bx.psu.edu/Admin/Config/Performance/Web%20Application%20Scaling.
> I have three webservers, one manager and two handlers. Everything is behind
> an Apache and the rewrite rules are set accordingly.
>
> When I try to access "Manage Jobs", I also get the error "This Galaxy
> instance is not the job manager. If using multiple servers, please directly
> access the job manager instance to manage jobs.". I have set the rewrite
> rule for admin/jobs to point to the manager server. When I access the
> manager directly from localhost I get the same error, while all other
> servers (web and handler) throw a server error:
>
> 127.0.0.1 - - [24/May/2012:15:37:50 +0200] "GET /admin/jobs HTTP/1.1" 500
> - "-" "Mozilla/5.0 (X11; Linux x86_64; rv:10.0.4) Gecko/20120424
> Firefox/10.0.4"
> Error - : 'NoopQueue' object has no
> attribute 'job_lock'
> URL: http://localhost:8080/admin/jobs
> File
> '/galaxy/galaxy_server/eggs/Paste-1.6-py2.7.egg/paste/exceptions/errormiddleware.py',
> line 143 in __call__
>   app_iter = self.application(environ, start_response)
> File '/galaxy/galaxy_server/eggs/Paste-1.6-py2.7.egg/paste/recursive.py',
> line 80 in __call__
>   return self.application(environ, start_response)
> File
> '/galaxy/galaxy_server/eggs/Paste-1.6-py2.7.egg/paste/httpexceptions.py',
> line 632 in __call__
>   return self.application(environ, start_response)
> File '/galaxy/galaxy_server/lib/galaxy/web/framework/base.py', line 160 in
> __call__
>   body = method( trans, **kwargs )
> File '/galaxy/galaxy_server/lib/galaxy/web/framework/__init__.py', line
> 184 in decorator
>   return func( self, trans, *args, **kwargs )
> File '/galaxy/galaxy_server/lib/galaxy/web/base/controller.py', line 2428
> in jobs
>   job_lock = trans.app.job_manager.job_queue.job_lock )
> AttributeError: 'NoopQueue' object has no attribute 'job_lock'
>
> Before the update everything worked fine (I also ran multiple servers
> then).
>
> Best regards,
> Sarah
>
>
>
> On 05/16/2012 10:27 PM, Dave Lin wrote:
>
> Dear Galaxy Team,
>
> I've been getting the following error for some time when I try to access
> the Manage Jobs Task.
>
> "This Galaxy instance is not the job manager. If using multiple servers,
> please directly access the job manager instance to manage jobs."
>
> For debugging purposes, I'm only running the single master instance. I'm
> using CloudMan/Amazon EC2.
>
> I traced the code and suspect it might have something to do with an Amazon
> IP Address/hostname discrepancy, but am not sure how to go about fixing
> this.
>
> On a related note, if I can't access this page, what is the best way to
> clear/cancel jobs via the command line?
>
> Thanks in advance,
> Dave
> Thanks,
> Dave
>
>
>
> ___
> Please keep all replies on the list by using "reply all"
> in your mail client.  To manage your subscriptions to this
> and other Galaxy lists, please use the interface at:
>
>   http://lists.bx.psu.edu/
>
>
>
> ___
> Please keep all replies on the list by using "reply all"
> in your mail client.  To manage your subscriptions to this
> and other Galaxy lists, please use the interface at:
>
>  http://lists.bx.psu.edu/
>
___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] scripts in welcome.html?

2012-06-27 Thread Edward Kirton
yes, javascript and ssi work and i use them for other purposes (grid load
meter) but i've found it convenient to have two welcome.html files -- e.g.
welcome.html.dev, welcome.html.main -- and use a symlink to point to the
correct one.
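
e.g. (paths illustrative):

    ln -sf welcome.html.dev welcome.html    # on the dev server
    ln -sf welcome.html.main welcome.html   # on production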

On Wed, Jun 13, 2012 at 3:37 PM, Smithies, Russell <
russell.smith...@agresearch.co.nz> wrote:

> To stop users forgetting whether they’re on our development or production
> servers I’d like to put a simple script in welcome.html to echo the
> hostname (or some other stuff)
>
> I’ve tried both javascript and php but no luck. Is this supported or
> should I dig further into our Apache config?
>
> Or is there a better way to do it?
>
> ** **
>
> Thanx,
>
> ** **
>
> Russell Smithies
>
>
>
> --
>
> *Attention: *The information contained in this message and/or attachments
> from AgResearch Limited is intended only for the persons or entities to
> which it is addressed and may contain confidential and/or privileged
> material. Any review, retransmission, dissemination or other use of, or
> taking of any action in reliance upon, this information by persons or
> entities other than the intended recipients is prohibited by AgResearch
> Limited. If you have received this message in error, please notify the
> sender immediately.
>
> --
>
>
>
> ___
> Please keep all replies on the list by using "reply all"
> in your mail client.  To manage your subscriptions to this
> and other Galaxy lists, please use the interface at:
>
>  http://lists.bx.psu.edu/
>
___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] HMMER wrappers

2012-06-27 Thread Edward Kirton
sorry for the late reply, i've been away for almost a month.  thanks for
your work on this, i'll update the tools and upload to toolshed asap.

On Tue, May 29, 2012 at 3:35 AM, Peter Cock wrote:

> On Fri, May 25, 2012 at 4:54 PM, Peter Cock 
> wrote:
> > Hi Edward,
> >
> > It has taken me a while but I'm how trying to use HMMER3, and do so
> > from within Galaxy.
> >
> > I've realised that the per sequence and per domain tables from hmmscan
> > and hmmsearch (via the --tblout and --domtblout switches) are NOT tab
> > separated, but space separated to give an eye pleasing column based
> > layout.
> >
> > However, your wrapper tells Galaxy they are "tabular". As a result,
> > Galaxy treats them like tables with one column, which means all the
> > table operations like filtering on a particular column are not possible.
> > Has this not affected your users?
> >
> > I ran into some similar problems wrapping other tools giving table based
> > output, and used a wrapper script to make them into tab separated tables
> > for use in Galaxy. e.g. SignalP 3 (spaces), EffectiveT3 (semi-colons).
> >
> > Would you agree that a wrapper script to reformat the HMMER3 tables
> > into tab-separated tables would be the best solution? Would you accept
> > a code contribution to do this?
> >
> > Regards,
> >
> > Peter
>
> Hi Edward,
>
> I've written a simple HMMER3 table to tabular script in Python,
> https://github.com/peterjc/picobio/blob/master/hmmer/hmmer_table2tabular.py
>
> Would you prefer to:
>
> (a) amend the XML to call HMMER, and then call the conversion script
>twice.
>
> (b) turn this into a single wrapper script which calls HMMER3 and
>then converts the two tables (using a multiple command line call
>with shell semi-colon separators).
>
> (c) do something else?
>
> I favour the wrapper script option as more flexible in the long run (e.g.
> for
> error handling and splitting jobs over multiple machines), and the multiple
> command approach may lead to overly long command line strings.
>
> Peter
>
___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] Port Config SMTP Server

2012-06-28 Thread Edward Kirton
in the universe_wsgi.ini file, define the server:port like this:

smtp_server = smtp.gmail.com:587

On Thu, Jun 21, 2012 at 3:23 PM, CHEBBI Mohamed Amine
wrote:

> Hi Galxy-team !
> I have to use the smtp-server in Galaxy. However I don't see how to set
> the port to 587 in the universe_wsgi.ini to enable sending mails from my
> own server. I saw the documetation but I didn't find any response for this.
> Thank-you in advance for your help.
> Regards
> Amine
>
> ___
> Please keep all replies on the list by using "reply all"
> in your mail client.  To manage your subscriptions to this
> and other Galaxy lists, please use the interface at:
>
>  http://lists.bx.psu.edu/
>
___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] output file

2012-07-19 Thread Edward Kirton
change your tool to use an explicitly named output folder and files:

bash mytool.sh $input $output1.extra_files_path $output1 $output2

where $output1.extra_files_path will be a folder (e.g. working dir)

$output1, $output2, etc. are files that are to be brought into your history
in galaxy, to be used by other tools or viewable by the user.  mv files
from working dir to these locations before exiting.

files in the folder that are not moved to the $output1 location will be
saved but not visible to the user

also see compound datatypes (although they're usually not necessary).
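
a minimal sketch of the matching tool XML, using the command line above
(the names and formats are illustrative):

    <command>bash mytool.sh $input $output1.extra_files_path $output1 $output2</command>
    <outputs>
        <data name="output1" format="tabular"/>
        <data name="output2" format="txt"/>
    </outputs>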

On Sun, May 6, 2012 at 2:56 AM, 張詩婷  wrote:

> Hi,
>
> After execution, my tool created many files in a directory , how can I to
> find this directory?
>
> command:  bash mytool.sh $input
>
>
>
>
>
>
>
>
> ___
> Please keep all replies on the list by using "reply all"
> in your mail client.  To manage your subscriptions to this
> and other Galaxy lists, please use the interface at:
>
>   http://lists.bx.psu.edu/
>
___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] Modules

2012-08-29 Thread Edward Kirton
galaxy already has tool-dependencies which can be used.

for example, velvet's tool XML would include:

<requirements>
  <requirement type="package">velvet</requirement>
</requirements>

the above will use the default version; or to specify a specific version:

<requirements>
  <requirement type="package" version="1.2.07">velvet</requirement>
</requirements>

in your tool-dependencies dir (as defined in universe_wsgi.ini):

mkdir velvet
cd velvet
mkdir 1.2.07
mkdir DEFAULT
ln -s DEFAULT/ default
echo "module load velvet/1.2.07" > 1.2.07/env.sh
echo "module load velvet" > DEFAULT/env.sh

notes:
- type must be "package"
- "default" must be a symlink
- we had to add "-S /bin/bash -shell y" to DRMAA native spec to get the
bash module function we use to work; may not be an issue with you.

Ed

On Tue, Aug 28, 2012 at 7:28 AM, Fabien Mareuil
wrote:

> Dear Colleagues,
>
> Many teams use module, a user interface for the Modules package. The
> Modules package enable to manage easily several versions of a program with
> a dynamic modification of the user's environment via modulefiles.
> I would like to know if it is planned to integrate the ability to load
> modules via galaxy?
> For example, we could imagine a configuration file that would link the
> name and version of a tool with the module name and allowing to load the
> module before running the tool locally or on a cluster.
>
> Thank you for your answer,
> Best regards,
>
> Fabien Mareuil
>
> ___
> Please keep all replies on the list by using "reply all"
> in your mail client.  To manage your subscriptions to this
> and other Galaxy lists, please use the interface at:
>
>   http://lists.bx.psu.edu/
>
___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] disk space warning

2012-09-10 Thread Edward Kirton
i use a cron job to monitor my disks and email me if they approach capacity.
i'm not sure of the benefit of informing your users; better for a sysop to
take care of it before it becomes a problem.
i also use cron jobs to purge older files from the ftp and tmp dirs.
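
e.g., a hypothetical hourly watchdog (the path, threshold, and address are
illustrative):

    #!/bin/sh
    # warn the sysop when the galaxy volume passes 90% full
    USE=`df -P /galaxy | awk 'NR==2 { sub(/%/,"",$5); print $5 }'`
    if [ "$USE" -gt 90 ]; then
        echo "/galaxy is ${USE}% full" | mail -s "galaxy disk nearly full" sysop@example.org
    fi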


On Thu, Sep 6, 2012 at 6:48 AM, Mutlu Dogruel wrote:

> Hi, is there a way to warn users when the disk becomes full during a
> URL-based file upload (and local file upload)? I had to manually delete the
> temp files created during a large upload that hanged, after which Galaxy
> could not be re-started due to insufficient disk space.
>
> Thanks.
>
> --
> Mutlu
>
>
> ___
> Please keep all replies on the list by using "reply all"
> in your mail client.  To manage your subscriptions to this
> and other Galaxy lists, please use the interface at:
>
>   http://lists.bx.psu.edu/
>
___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] [galaxy-user] operating on, and getting at, large files in galaxy...

2011-02-17 Thread Edward Kirton
Hi Nick,

Yes, these nextgen reads files are huge and getting bigger every quarter!
 But there will be storage issues no matter whether you use Galaxy or not.
 In fact, i think users are more likely to clean up files and histories in
galaxy than they are to clean up NFS folders -- out of sight, out of mind!

Firstly, I think unnecessary intermediate files are more of a problem than
whether or not the file is compressed.  Indeed, just transferring these
files back and forth from the cluster takes a while, not to mention the
delay in waiting to be rescheduled for each step.  And so I created a tool
which would do the job of fastq groomer, end-trimmer, process pairs, and a
few other simple tasks -- all in one shot.  I haven't uploaded it to the
toolshed yet but I will.  I hate to duplicate existing tools, but i have a
lot of seq data.  I will also create a fastqilluminabz2 datatype and
include it with the tool.

For getting files into galaxy, I created a simple tool which allows
staff to enter NFS paths, with the option to either copy or symlink if the
location is considered stable.  I allowed only certain folders (e.g. /home,
/storage) and added a password, for security.  Similarly, for getting a file
out, all you need is a dinky tool for users to provide a destination path.
since i've got galaxy running as a special galaxy user in a special galaxy
group, file access is restricted (as it should be), so i tell users to
create a dropbox folder in their homedir (and chmod 777).  by creating a
tool like this, you don't need to care how galaxy names the files.  i
deliberately try not to mess around under the hood.  i can upload these to
the galaxy toolshed, but like i said, there isn't much to them.
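to make that concrete, a stripped-down sketch of the import tool (the
password mechanism and all names here are illustrative, not the actual jgi
tool):

#!/usr/bin/env python
# path-import tool: copy or symlink an NFS file into a galaxy output dataset
# usage: nfs_import.py <source_path> <galaxy_output_path> <copy|symlink> <password>
import os
import shutil
import sys

ALLOWED_PREFIXES = ("/home/", "/storage/")  # folders staff may import from
SECRET = "changeme"                          # shared staff password (assumption)

def main(src, dest, mode, password):
    if password != SECRET:
        sys.exit("bad password")
    src = os.path.realpath(src)  # resolve symlinks before checking the prefix
    if not src.startswith(ALLOWED_PREFIXES):
        sys.exit("path not in an allowed folder: %s" % src)
    if not os.path.isfile(src):
        sys.exit("no such file: %s" % src)
    if os.path.exists(dest):
        os.unlink(dest)  # galaxy pre-creates the output dataset file
    if mode == "symlink":  # only for locations considered stable
        os.symlink(src, dest)
    else:
        shutil.copy(src, dest)

if __name__ == "__main__":
    main(*sys.argv[1:5])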

Ed

On Wed, Feb 9, 2011 at 4:17 AM, Nick Schurch  wrote:

>
> Hi all,
>
> I've recently encountered a few problems when trying to use Galaxy which
> are really driving me away from using it as a bioinformatics platform for
> NGS. I was wondering if there are any simple solutions that I've missed...
>
> Firstly, it seems that while there are a few solutions for getting large
> files (a few GB) into a local install of Galaxy without going through HTTP,
> many tools that operate on these files produce multiple, uncompressed large
> files which quickly eat up the disk allocation. This is particularly
> significant in a workflow with multiple processing steps, each of which
> leaves behind a large file. With no way to compress or archive files produced
> by intermediate steps in a workflow, and no desire to delete them since I
> may need to go back to them and they can take hours to re-run, the only
> remaining option seems to be to save them externally and then delete them.
>
> And this brings me to the second problem: getting large files out of
> Galaxy. The only way to save large files from Galaxy (that I can see) is the
> save icon, which downloads the file via http. This takes *ages* for a large
> file and also causes big headaches for my firefox browser. I've taken a
> quick peek at the Galaxy file system to see if I could just copy a file, but
> it's almost completely indecipherable if you want to find out what file in
> the file system corresponds to a file saved from a tool. Is there some way
> to get the location of a particular file on the galaxy file system, so that
> I can just copy it?
>
> --
> Cheers,
>
> Nick Schurch
>
> Data Analysis Group (The Barton Group),
> School of Life Sciences,
> University of Dundee,
> Dow St,
> Dundee,
> DD1 5EH,
> Scotland,
> UK
>
> Tel: +44 1382 388707
> Fax: +44 1382 345 893
>
>
> ___
> galaxy-user mailing list
> galaxy-u...@lists.bx.psu.edu
> http://lists.bx.psu.edu/listinfo/galaxy-user
>
>
___
To manage your subscriptions to this and other Galaxy lists, please use the 
interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] Tools to organize long histories

2011-02-17 Thread Edward Kirton
you can have your workflow hide intermediate files.  also, renaming files
helps.  i usually recommend users split their work into several histories
rather than trying to do everything in one huge history (e.g. read qc,
assembly, gene annotation, read mapping, expression analysis).

On Wed, Feb 2, 2011 at 4:54 AM, Leandro Hermida
wrote:

> Hi everyone,
>
> Are there any capabilities or tools in Galaxy to organize long histories
> with many datasets?
>
> best,
> Leandro
>
> ___
> galaxy-dev mailing list
> galaxy-dev@lists.bx.psu.edu
> http://lists.bx.psu.edu/listinfo/galaxy-dev
>
>
___
To manage your subscriptions to this and other Galaxy lists, please use the 
interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] Creating new datatype with variable number of input files

2011-02-17 Thread Edward Kirton
use the <repeat> tag to accept a list of infiles (and perhaps
labels/timepoints/etc. for each).

On Thu, Jan 13, 2011 at 6:22 AM, Kempenaar, M (med)  wrote:

>  Hello,
>
> This is the first question I ask on this list, please let me know if I'm
> 'doing it wrong'.
>
> Currently I'm trying to implement a workflow that uses a sort of microarray
> data which consists of one CSV file for each experiment done. The first step
> in the workflow is to preprocess the data and merge all files into a single
> CSV file (using an R script). Now my question is, is it possible to supply
> the user a Galaxy 'upload file' interface where he/she can:
> - enter the number of experiments done (thus number of files to upload)
> - depending on the above, present one 'select file' button and a new
> metavalue used for naming the experiment
> - directly execute an R script after pressing the 'Execute' button.
> - add the single resulting file from this R script to the users history
> pane (an intermediate composite datatype holding all uploaded files and
> manually executing the script is fine as well)
>
> If this is possible, could you please give a hint in the right direction?
> That would be very helpful! Also, an example of a Galaxy
> workflow that does any of the above would be great too!
> However, if this seems impossible to do, what are the alternatives?
> Maybe create my own webtool that does the above and link it to my Galaxy?
>
> Thanks in advance for any replies.
>
> - Marcel
>
> --
>
> The contents of this message are confidential and only intended for the
> eyes of the addressee(s). Others than the addressee(s) are not allowed to
> use this message, to make it public or to distribute or multiply this
> message in any way. The UMCG cannot be held responsible for incomplete
> reception or delay of this transferred message.
>
>
> ___
> galaxy-dev mailing list
> galaxy-dev@lists.bx.psu.edu
> http://lists.bx.psu.edu/listinfo/galaxy-dev
>
>
___
To manage your subscriptions to this and other Galaxy lists, please use the 
interface at:

  http://lists.bx.psu.edu/

[galaxy-dev] MarkupSafe egg cannot be fetched

2011-04-23 Thread Edward Kirton
hello, after pulling the latest changes from galaxy-central, i get the
following error:


WARNING:galaxy.eggs:Warning: MarkupSafe (a dependent egg of Mako)
cannot be fetched


$ python scripts/fetch_eggs.py
Warning: MarkupSafe (a dependent egg of Mako) cannot be fetched
___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

[galaxy-dev] quick question: how can i supply the user's email address to a tool?

2011-05-05 Thread Edward Kirton
is there a variable i can use in the tool config xml file?
thanks!
ed
___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] quick question: how can i supply the user's email address to a tool?

2011-05-06 Thread Edward Kirton
Hi Nate, can I access the values in the universe_wsgi.ini file?  How would I
retrieve the value of "smtp_server" for example?  (Or do i just parse the
file myself?)
Thanks!

On Fri, May 6, 2011 at 9:06 AM, Nate Coraor  wrote:

> __tool_data_path__ = GALAXY_DATA_INDEX_DIR = universe_wsgi.ini's
> tool_data_path value
>
___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] quick question: how can i supply the user's email address to a tool?

2011-05-09 Thread Edward Kirton
awesome; thanks everyone for your help.

On Fri, May 6, 2011 at 9:03 PM, Nate Coraor  wrote:

> Edward Kirton wrote:
> > Hi Nate, can I access the values in the universe_wsgi.ini file?  How
> would I
> > retrieve the value of "smtp_server" for example?  (Or do i just parse the
> > file myself?)
>
> $__app__.config.smtp_server should do it.  Most variables should match
> the name in the config file - a few may not, when in doubt, look in
> lib/galaxy/config.py.
>
> --nate
>
> > Thanks!
> >
> > On Fri, May 6, 2011 at 9:06 AM, Nate Coraor  wrote:
> >
> > > __tool_data_path__ = GALAXY_DATA_INDEX_DIR = universe_wsgi.ini's
> > > tool_data_path value
> > >
>
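to make the thread's answer concrete: a hypothetical notification script for
a tool whose xml passes the user's email and $__app__.config.smtp_server on
the command line (the 'notify.py' name and its parameters are made up; only
the config variable comes from this thread):

#!/usr/bin/env python
# e.g. <command>notify.py $__app__.config.smtp_server $email "job done"</command>
import smtplib
import sys
from email.mime.text import MIMEText

smtp_server, to_addr, body = sys.argv[1], sys.argv[2], sys.argv[3]
msg = MIMEText(body)
msg["Subject"] = "galaxy job notification"
msg["From"] = "galaxy@localhost"
msg["To"] = to_addr
server = smtplib.SMTP(smtp_server)
server.sendmail(msg["From"], [to_addr], msg.as_string())
server.quit()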
___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] galaxy with SGE cluster

2011-06-17 Thread Edward Kirton
yes, your web server needs to be configured as an sge submit host to work
seamlessly with galaxy.  alternatives include submitting the jobs to the
cluster outside of galaxy using another script that will either ssh or use
expect.  these alternatives are messy and to be avoided unless necessary.
conditions that would require them include wanting to submit to multiple
clusters or queues (e.g. user-specific queues, application-specific
clusters) or needing cluster jobs to be submitted as individual users
rather than as the galaxy user (e.g. for accounting).
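for completeness, here's roughly what the messy ssh alternative looks like
(the hostname and paths are made up, and it assumes passwordless ssh keys
from the galaxy user to a real submit host):

#!/usr/bin/env python
# submit a job via qsub on a remote submit host over ssh (sketch only)
import subprocess

SUBMIT_HOST = "sge-login.example.org"

def qsub_over_ssh(job_script, user=None):
    # job_script must be a path visible on the cluster filesystem
    target = SUBMIT_HOST if user is None else "%s@%s" % (user, SUBMIT_HOST)
    proc = subprocess.Popen(["ssh", target, "qsub", job_script],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError("qsub failed: %s" % err)
    return out.strip()  # e.g. 'Your job 12345 ("job.sh") has been submitted'

if __name__ == "__main__":
    print(qsub_over_ssh("/clusterfs/staging/job.sh"))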

On Mon, May 16, 2011 at 10:44 AM, Shantanu Pavgi  wrote:

> Just want to confirm SGE configuration again. As mentioned earlier we
> started with a separate galaxy VM without any SGE installation. The SGE
> master node is installed on a separate system altogether. As I understand
> from your reply, we will need to install SGE on the galaxy VM first and
> configure it as a submit host with the main SGE master node. Is that
> correct? Are their any alternative approaches?
___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

[galaxy-dev] fyi: can't upload to toolshed

2011-07-14 Thread Edward Kirton
Server Error: An error occurred.  See the error logs for more information.
(Turn debug on to display exception reports here)
___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] Existing efforts to convert the QIIME pipeline to Galaxy?

2011-07-19 Thread Edward Kirton
I took a look and it's a great start.  I can work on it 1d/wk.  I'll email
you again after I push my revisions to the toolshed.  Or email me anytime
you want me to push whatever I've got at that time.
Thanks,
Ed

On Sun, Jul 17, 2011 at 8:39 AM, Jim Johnson  wrote:

>
> Ed,
>
> I pushed what code I had to the toolshed repository "qiime"  and gave you
> push permissions.
> I'm not sure when I'll have time to work on it again.
>
> Thanks,
>
> JJ
>
>
>
> On 7/15/11 3:18 PM, Edward Kirton wrote:
>
> fantastic, thanks!
> i'm also available to help out, so perhaps you can find me something to
> work on.
> ed
>
> On Fri, Jul 15, 2011 at 1:09 PM, Jim Johnson  wrote:
>
>>  I haven't had time to work on it for a couple months.
>> I remember getting stuck on the output of some scripts being an HTML file
>> with a file hierarchy underneath.
>> I proposed setting the galaxy routes to accept those, and that has since
>> been incorporated.
>>
>> I'll try to do a run through of what I had, and update what was working to
>> the current galaxy-central tip.
>> Then, maybe I'll bundle that and put it in the toolshed for common
>> development.
>>
>> Does that work for you?
>>
>> Thanks,
>>
>> JJ
>>
>>
>> On 7/15/11 1:58 PM, Edward Kirton wrote:
>>
>> Hello James, I was wondering if you're still tackling the task of adding
>> QIIME to galaxy?  I have some users in my group asking about this and I'd
>> rather not duplicate any efforts.  If you're still working on this, do you
>> have a ballpark estimate of when this will appear in the toolshed?
>> Many thanks,
>> Ed Kirton
>> US DOE JGI (http://galaxy.jgi-psf.org)
>>
>>  p.s. great job on the mothur suite; thanks!  i just found it in the
>> toolshed yesterday.
>>
>> On Wed, Nov 24, 2010 at 7:31 AM, Jim Johnson  wrote:
>>
>>>
>>> Tim,
>>>
>>> I hope to also look at incorporating QIIME into our local Galaxy instance
>>> at the University of Minnesota, but probably won't be able to start for a
>>> couple weeks.  It would be good to develop that in coordination with others.
>>>
>>> I just finished incorporating the "Mothur" metagenomics suite
>>> http://www.mothur.org/ (Dr. Patrick Schloss, Department of Microbiology
>>> & Immunology at The University of Michigan) into our Galaxy server at the
>>> University of Minnesota.   I hope to contribute that to
>>> http://community.g2.bx.psu.edu/ after some testing by our researchers.
>>>  If the Galaxy wrappings for Mothur are of any interest to you, I can send
>>> you a copy any time.
>>>
>>> Thanks,
>>>
>>> JJ
>>>
>>> James E Johnson
>>> Minnesota Supercomputing Institute
>>> University of Minnesota
>>>
>>>
>>> On Nov 23, 2010 at 07:22 AM, Tim te Beek wrote:
>>>
>>>  Hello all,
>>>>
>>>> Is anyone aware of any existing efforts to port the QIIME sequencing
>>>> pipeline (http://qiime.sourceforge.net/) to Galaxy? I would like to run
>>>> QIIME analyses through Galaxy to get better control of
>>>> intermediate processing steps, but before I start to convert (a subset
>>>> of)
>>>> some 90 scripts, I'd first like to make sure this has not been done
>>>> before
>>>> by anyone willing to share their work.
>>>>
>>>> So: has anyone converted the QIIME pipeline to Galaxy before, and would
>>>> they
>>>> be willing to share their scripts?
>>>>
>>>> Best regards,
>>>> Tim
>>>>
>>>
>>> ___
>>> galaxy-dev mailing list
>>> galaxy-dev@lists.bx.psu.edu
>>> http://lists.bx.psu.edu/listinfo/galaxy-dev
>>>
>>
>>
>>
>
>
___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] use remote data without duplication

2011-07-19 Thread Edward Kirton
besides using libraries, you can create a tool to add files from an nfs path
which merely symlinks the file.  while libraries are nice, i haven't figured
out how to automatically import libraries, so i created a tool for users to
import sequence data using their "run id" (i.e. the id in our lims).
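the run-id flavor is tiny -- roughly this (the flat-file "lims" lookup is a
stand-in; our real lookup is obviously site-specific):

#!/usr/bin/env python
# look up a run id in a site lims table and symlink the file into galaxy
import os
import sys

LIMS_TABLE = "/storage/lims/runs.tsv"  # "<run_id>\t<nfs_path>" (assumption)

def lookup(run_id):
    with open(LIMS_TABLE) as fh:
        for line in fh:
            rid, path = line.rstrip("\n").split("\t", 1)
            if rid == run_id:
                return path
    sys.exit("unknown run id: %s" % run_id)

run_id, dest = sys.argv[1], sys.argv[2]
src = lookup(run_id)
if os.path.exists(dest):
    os.unlink(dest)  # galaxy pre-creates the output dataset file
os.symlink(src, dest)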

On Tue, Jun 28, 2011 at 8:58 AM, Caroline Prenoveau <
caroline.prenovea...@ulaval.ca> wrote:

> Hi,
>
> We are a small lab and we would like to use galaxy to do our analysis.
> However we have a very large amount of data that is stored on several
> machines and that we cannot afford to duplicate. Our galaxy server is
> set up on a different machine. We are looking for a way to use our
> remote data inside galaxy without copying it locally. Any ideas/hints?
>
> Thanks,
>
> Caroline
> ___
> Please keep all replies on the list by using "reply all"
> in your mail client.  To manage your subscriptions to this
> and other Galaxy lists, please use the interface at:
>
>  http://lists.bx.psu.edu/
>
___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

[galaxy-dev] workflows are broken in current galaxy-central version

2011-07-21 Thread Edward Kirton
URL: https://galaxy.jgi-psf.org/workflow/editor?id=3918559ba2a747d8
Module paste.exceptions.errormiddleware:143 in __call__
Module paste.debug.prints:98 in __call__
Module paste.wsgilib:539 in intercept_output
Module paste.recursive:80 in __call__
Module paste.httpexceptions:632 in __call__
Module galaxy.web.framework.base:145 in __call__
>>  body = method( trans, **kwargs )
Module galaxy.web.framework:84 in decorator
>>  return func( self, trans, *args, **kwargs )
Module galaxy.web.controllers.workflow:686 in editor
>>  return trans.fill_template( "workflow/editor.mako", stored=stored, annotation=self.get_item_annotation_str( trans.sa_session, trans.user, stored ) )
Module galaxy.web.framework:661 in fill_template
>>  return self.fill_template_mako( filename, **kwargs )
Module galaxy.web.framework:672 in fill_template_mako
>>  return template.render( **data )
Module mako.template:296 in render
>>  return runtime._render(self, self.callable_, args, data)
Module mako.runtime:660 in _render
>>  _kwargs_for_callable(callable_, data))
Module mako.runtime:692 in _render_context
>>  _exec_template(inherit, lclcontext, args=args, kwargs=kwargs)
Module mako.runtime:718 in _exec_template
>>  callable_(context, *args, **kwargs)
Module _base_panels_mako:87 in render_body
>>  __M_writer(unicode(self.overlay(visible=self.overlay_visible)))
TypeError: render_overlay() got an unexpected keyword argument 'visible'
___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] workflows are broken in current galaxy-central version

2011-07-21 Thread Edward Kirton
yup, works.  thanks for the prompt reply

On Thu, Jul 21, 2011 at 10:15 AM, Kanwei Li  wrote:

> Fixed on trunk, thanks for reporting!
>
> K
>
> On Thu, Jul 21, 2011 at 12:41 PM, Edward Kirton  wrote:
>
>>
>> URL: https://galaxy.jgi-psf.org/workflow/editor?id=3918559ba2a747d8
>> [...]
>> TypeError: render_overlay() got an unexpected keyword argument 'visible'
>>
>> ___
>> Please keep all replies on the list by using "reply all"
>> in your mail client.  To manage your subscriptions to this
>> and other Galaxy lists, please use the interface at:
>>
>>  http://lists.bx.psu.edu/
>>
>
>
___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

[galaxy-dev] possible to undelete a history?

2011-07-22 Thread Edward Kirton
An unfortunate user emailed me and said he accidentally deleted his
histories. :((
How can I help him recover these?
___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] Existing efforts to convert the QIIME pipeline to Galaxy?

2011-07-27 Thread Edward Kirton
here's what i have; i started categorizing the tools under labels but hadn't
finished.  i'm also working on getting these tools working in galaxy (albeit
slowly, as we won't need them until we get a miseq machine in oct), but maybe
we can keep in touch with each other and with jjohnson.

[a tool_conf.xml section listing the qiime tools followed here, but the mail
archive stripped its xml tags]
On Wed, Jul 27, 2011 at 7:01 AM, Amanda Zuzolo  wrote:

> Hello all,
>
> I'm working on a local instance of Galaxy at George Mason University. We'd
> been looking into integrating Qiime and I've found the toolkit very helpful,
> thanks! I did have one question, though: would you mind uploading the
> tool_conf file or the chunk of text with the qiime functions in it? This
> would be helpful to me, as well as to others interested in the toolkit!
>
> Thank you,
>
> Amanda Zuzolo
> Bioengineering Major, George Mason University
>
>
> ___
> Please keep all replies on the list by using "reply all"
> in your mail client.  To manage your subscriptions to this
> and other Galaxy lists, please use the interface at:
>
>  http://lists.bx.psu.edu/
>
___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] organizing histories in galaxy

2011-07-28 Thread Edward Kirton
i usually prefix the history names with an identifier so i can search for
them (e.g. "AmiMT: read QC").  but i agree, folders similar to the data
libraries would be useful, so i created a ticket:

https://bitbucket.org/galaxy/galaxy-central/issue/621/folders-to-organize-saved-histories

On Fri, Jul 22, 2011 at 8:32 PM, Chaolin Zhang wrote:

> Hi,
>
> Is there a way to organize related histories together into a "project" or
> folder?
>
> I realize in a lot of cases, a pipeline is designed for processing of a
> single sample, while a study typically consists of multiple samples that go
> through the same processing. They are then put together for more downstream
> analysis.  It seems to me that it would be logical to put the processing of
> each individual file into a history, and the combined analysis into another,
> which would be bundled together.  I, and other users here, have a growing
> list of histories, and it is becoming more and more difficult to organize
> them ...
>
> Thanks!
>
> Chaolin
>
>
>
>
>
> ___
> Please keep all replies on the list by using "reply all"
> in your mail client.  To manage your subscriptions to this
> and other Galaxy lists, please use the interface at:
>
>  http://lists.bx.psu.edu/
>
___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

[galaxy-dev] "Job output not returned from cluster"

2011-07-28 Thread Edward Kirton
I've been getting these errors sometimes lately, particularly when the
cluster is heavily loaded.  The jobs have completed successfully, as I can
see the output if I click the pen icon, but the jobs end up in a failed state.

Have any other sites been experiencing this problem?
Or can the galaxy developers help shed some light on the issue?

FYI, I use the outputs_to_working_directory option in universe_wsgi.ini so
that I can use a faster/more reliable filesystem to collect output from the
cluster.  I'm not using the recently discussed patch to run jobs as the unix
user.

I'll continue to experiment with different filesystems and software
settings.
___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/