[galaxy-dev] Galaxy+Toolshed Docker image?

2014-07-21 Thread Kyle Ellrott
I know there is a base Docker image for Galaxy stable (
https://registry.hub.docker.com/u/bgruening/galaxy-stable/), but is there a
Docker image that will start both a Galaxy server and a Tool Shed server and
link them?
I was hoping I could use something like that for testing shed-based tool
deployment.

Kyle
___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/

Re: [galaxy-dev] Using Mesos to Enable distributed computing under Galaxy?

2014-06-17 Thread Kyle Ellrott
Glad to see someone else is playing around with Mesos.
I have a Mesos branch that is getting a little long in the tooth. I'd like
to get a straight job runner (non-LWR, with a shared file system) running
under Mesos for Galaxy before I submit that work as a pull request.

The hackathon is only 12 days away! Hopefully we'll be able to make some
progress on these sorts of projects.

Kyle



On Sun, Jun 15, 2014 at 4:06 PM, John Chilton jmchil...@gmail.com wrote:

 Hey Kyle, all,

   If anyone wants to play with running Galaxy jobs within an Apache
 Mesos environment I have added a prototype of this feature to the LWR.


 https://bitbucket.org/jmchilton/lwr/commits/555438d2fe266899338474b25c540fef42bcece7

 https://bitbucket.org/jmchilton/lwr/commits/9748b3035dbe3802d4136a6a1028df8395a9aeb3

 This work distributes jobs across a Mesos cluster and injects a
 MESOS_URL environment variable into the job runtime environment in
 case the jobs themselves want to take advantage of Mesos.

 The advantage of the LWR versus a traditional Galaxy runner is that
 the job can be staged to remote resources without shared disk. Prior
 to this I was imagining the LWR to be useful in cases where Galaxy and
 the remote cluster don't share a common disk but where there is in fact a
 shared scratch directory or something across the remote cluster, as
 well as a resource manager. The LWR Mesos framework however has the
 actual compute servers themselves stage the job up and down - so you
 could imagine distributing Galaxy across large clusters without any
 shared disk whatsoever - that could be very cool and help scale, say,
 cloud applications.

 Downsides of an LWR-based approach versus a Galaxy approach are that it
 is less mature and there is more to configure: you need to configure a
 Galaxy job_conf plugin and destination, the LWR itself, and a message
 queue (for this variant of LWR operation anyway - it should be possible
 to drive this via the LWR in web server mode, but I haven't added that
 yet). I would be more than happy to continue to see progress toward
 Mesos support in Galaxy proper.

 It is strictly a prototype so far - a sort of playground if anyone
 wants to play with these ideas and build something cool. It really is
 a framework right now - not so much a job scheduler - so I am not sure
 it is immediately useful, but I imagine one could build cool stuff
 on top of it.

 Next, I think I would like to add Apache Aurora
 (http://aurora.incubator.apache.org/) support - because it seems like
 a much more traditional resource manager but built on top of Mesos so
 it would be more practical for traditional Galaxy-style jobs. It doesn't
 buy you anything in terms of parallelization, but it would fit better
 with Galaxy.

 -John


 On Sat, Oct 26, 2013 at 2:43 PM, Kyle Ellrott kellr...@soe.ucsc.edu
 wrote:
  I think one of the aspects where Galaxy is a bit soft is the ability to do
  distributed tasks. The current system of split/replicate/merge tasks based
  on file type is a bit limited and hard for tool developers to expand upon.
  Distributed computing is a non-trivial thing to implement and I think it
  would be a better use of our time to use an already existing framework. And
  it would also mean one less API for tool writers to have to develop for.
  I was wondering if anybody has looked at Mesos ( http://mesos.apache.org/ ).
  You can see an overview of the Mesos architecture at
  https://github.com/apache/mesos/blob/master/docs/Mesos-Architecture.md
  The important thing about Mesos is that it provides an API for C/C++,
  Java/Scala and Python to write distributed frameworks. There are already
  implementations of frameworks for common parallel programming systems such as:
   - Hadoop (https://github.com/mesos/hadoop)
   - MPI (https://github.com/apache/mesos/blob/master/docs/Running-torque-or-mpi-on-mesos.md)
   - Spark (http://spark-project.org)
  And you can find an example Python framework at
  https://github.com/apache/mesos/tree/master/src/examples/python
 
  Integration with Galaxy would have three parts:
  1) Add a system config variable to Galaxy called 'MESOS_URL' that is then
  passed to tool wrappers and allows them to contact the local Mesos
  infrastructure (assuming the system has been configured), or pass a null if
  the system isn't available.
  2) Write a tool runner that works as a Mesos framework to execute single-CPU
  jobs on the distributed system.
  3) For instances where Mesos is not available at a system-wide level (say
  they only have access to an SGE-based cluster), but the user wants to run
  distributed jobs, write a wrapper that can create a Mesos cluster using the
  existing queueing system. For example, right now I run a Mesos system under
  the SGE queue system.
 
  I'm curious to see what other people think.
 
  Kyle
 

[galaxy-dev] Docker based tools

2014-01-03 Thread Kyle Ellrott
I know a few people have talked about using Docker to package and
distribute Galaxy tools. The most recent version now runs on all major
Linux distributions, so it's a bit more inclusive now (but there probably
won't ever be a Windows or Mac version). One strategy would be for tool
configs to define an optional Docker archive name, and possibly an alternate
command line to execute if the Docker archive is being used.
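As a sketch of that strategy - with invented attribute names, not Galaxy's
actual tool schema:

    # Hypothetical command builder: if the tool config names a Docker
    # archive, wrap the command in 'docker run'; otherwise run the tool's
    # normal command line.
    def build_command_line(tool_config, tool_args):
        image = tool_config.get('docker_archive')    # invented attribute
        alt_cmd = tool_config.get('docker_command')  # invented attribute
        if image:
            cmd = alt_cmd if alt_cmd else tool_args
            return ['docker', 'run', '--rm',
                    '-v', '/galaxy/database:/galaxy/database',  # assumed mount
                    image] + cmd
        return tool_args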

How far has that work gotten, and what are people thinking about it?

Kyle

Re: [galaxy-dev] Using Mesos to Enable distributed computing under Galaxy?

2013-10-28 Thread Kyle Ellrott
I don't think implementation will be very difficult. The bigger question is
whether this is a technology people are open to.
The nearest competitor is YARN (
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html).
Mesos seems a bit more geared toward general purpose usage (with several
existing frameworks), while YARN seems more specific to Hadoop. But I'd be
glad to hear some other thoughts.

Kyle


On Mon, Oct 28, 2013 at 12:55 PM, Ravi K Madduri madd...@mcs.anl.gov wrote:

 Kyle
 This is something I am very interested in. The three parts below make
 sense to me. I would be very happy to discuss further and provide any help
 to move this forward.

 Regards
 On Oct 26, 2013, at 2:43 PM, Kyle Ellrott kellr...@soe.ucsc.edu wrote:

 I think one of the aspects where Galaxy is a bit soft is the ability to do
 distributed tasks. The current system of split/replicate/merge tasks based
 on file type is a bit limited and hard for tool developers to expand upon.
 Distributed computing is a non-trivial thing to implement and I think it
 would be a better use of our time to use an already existing framework. And
 it would also mean one less API for tool writers to have to develop for.
 I was wondering if anybody has looked at Mesos ( http://mesos.apache.org/). 
 You can see an overview of the Mesos architecture at
 https://github.com/apache/mesos/blob/master/docs/Mesos-Architecture.md
 The important thing about Mesos is that it provides an API for C/C++,
 Java/Scala and Python to write distributed frameworks. There are already
 implementations of frameworks for common parallel programming systems such
 as:
  - Hadoop (https://github.com/mesos/hadoop)
  - MPI (
 https://github.com/apache/mesos/blob/master/docs/Running-torque-or-mpi-on-mesos.md
 )
  - Spark (http://spark-project.org)
 And you can find an example Python framework at
 https://github.com/apache/mesos/tree/master/src/examples/python

 Integration with Galaxy would have three parts:
 1) Add a system config variable to Galaxy called 'MESOS_URL' that is then
 passed to tool wrappers and allows them to contact the local mesos
 infrastructure (assuming the system has been configured) or pass a null if
 the system isn't available.
 2) Write a tool runner that works as a Mesos framework to execute single-CPU
 jobs on the distributed system.
 3) For instances where mesos is not available at a system wide level (say
 they only have access to an SGE based cluster), but the user wants to run
 distributed jobs, write a wrapper that can create a mesos cluster using the
 existing queueing system. For example, right now I run a Mesos system under
 the SGE queue system.

 I'm curious to see what other people think.

 Kyle


 --
 Ravi K Madduri
 MCS, Argonne National Laboratory
 Computation Institute, University of Chicago



Re: [galaxy-dev] Using Mesos to Enable distributed computing under Galaxy?

2013-10-28 Thread Kyle Ellrott
You probably are a good person to get an opinion from. My plan isn't to
write new frameworks, but rather to use existing libraries that can
communicate with Mesos to set up their parallel environments.
But for Swift, you would probably want to write a new framework. Just
looking at Swift, I imagine one of the harder parts is getting the
system set up on a cluster (i.e. distributing files to remote nodes,
making sure that you have a way to start processes on those nodes, and
having them know where to find the master), so it seems like Swift could
benefit from having a Mesos-based framework. Do you think it would enable
you to have a 'zero-config' startup of a distributed Swift application?

Kyle



On Mon, Oct 28, 2013 at 1:51 PM, Ketan Maheshwari 
ketancmaheshw...@gmail.com wrote:

 Hi Kyle,

 We have a similar ongoing development wherein we are working on
 integrating our Swift framework ( swift-lang.org ) with Galaxy. The goal
 is to enable Galaxy-based applications to run on a variety of distributed
 resources via various integration schemes, as suitable to the application
 and underlying execution environment.

 Here is an abstract of a paper (co-authored with Ravi, who responded on
 this thread) we will be presenting in a workshop at the upcoming SC 13
 conference:

 The Galaxy platform is a web-based science portal for scientific
 computing supporting the Life Sciences user community. While user-friendly and
 intuitive for doing small to medium scale computations, it currently has
 limited support for large-scale, parallel and distributed computing. The
 Swift parallel scripting framework is capable of composing ordinary
 applications into parallel scripts that can be run on multi-scale
 distributed and high-performance computing platforms. In complex distributed
 environments, the user-facing end of the application lifecycle often slows down
 because of the technical complexities brought in by the scale, access
 methods and resource management nuances. Galaxy offers a simple way of
 designing, composing, executing, reusing, and reproducing application runs.
 An integration between the Swift and Galaxy systems can accelerate science as
 well as bring the respective user communities together in an interactive,
 user-friendly, parallel and distributed data analysis environment enabled
 on a broad range of computational infrastructures.

 Kindly let us know if you would like a hands-on demo of the various tools
 we have already developed.


 Best,
 Ketan



 On Mon, Oct 28, 2013 at 3:07 PM, Kyle Ellrott kellr...@soe.ucsc.edu wrote:

 I don't think implementation will be very difficult. The bigger question
 is whether this is a technology people are open to.
 The nearest competitor is YARN (
 http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html).
 Mesos seems a bit more geared toward general purpose usage (with several
 existing frameworks), while YARN seems more specific to Hadoop. But I'd be
 glad to hear some other thoughts.

 Kyle


 On Mon, Oct 28, 2013 at 12:55 PM, Ravi K Madduri madd...@mcs.anl.gov wrote:

 Kyle
 This is something I am very interested in. The three parts below make
 sense to me. I would be very happy to discuss further and provide any help
 to move this forward.

 Regards
 On Oct 26, 2013, at 2:43 PM, Kyle Ellrott kellr...@soe.ucsc.edu wrote:

 I think one of the aspects where Galaxy is a bit soft is the ability to
 do distributed tasks. The current system of split/replicate/merge tasks
 based on file type is a bit limited and hard for tool developers to expand
 upon. Distributed computing is a non-trivial thing to implement and I think
 it would be a better use of our time to use an already existing framework.
 And it would also mean one less API for tool writers to have to develop for.
 I was wondering if anybody has looked at Mesos (
 http://mesos.apache.org/ ). You can see an overview of the Mesos
 architecture at
 https://github.com/apache/mesos/blob/master/docs/Mesos-Architecture.md
 The important thing about Mesos is that it provides an API for C/C++,
 Java/Scala and Python to write distributed frameworks. There are already
 implementations of frameworks for common parallel programming systems such
 as:
  - Hadoop (https://github.com/mesos/hadoop)
  - MPI (
 https://github.com/apache/mesos/blob/master/docs/Running-torque-or-mpi-on-mesos.md
 )
  - Spark (http://spark-project.org)
 And you can find an example Python framework at
 https://github.com/apache/mesos/tree/master/src/examples/python

 Integration with Galaxy would have three parts:
 1) Add a system config variable to Galaxy called 'MESOS_URL' that is
 then passed to tool wrappers and allows them to contact the local mesos
 infrastructure (assuming the system has been configured) or pass a null if
 the system isn't available.
 2) Write a tool runner that works as a Mesos framework to execute
 single-CPU jobs on the distributed system.
 3) For instances where mesos is not available at a system wide level
 (say they only have access to an SGE based

[galaxy-dev] Using Mesos to Enable distributed computing under Galaxy?

2013-10-26 Thread Kyle Ellrott
I think one of the aspects where Galaxy is a bit soft is the ability to do
distributed tasks. The current system of split/replicate/merge tasks based
on file type is a bit limited and hard for tool developers to expand upon.
Distributed computing is a non-trivial thing to implement and I think it
would be a better use of our time to use an already existing framework. And
it would also mean one less API for tool writers to have to develop for.
I was wondering if anybody has looked at Mesos (
http://mesos.apache.org/). You can see an overview of the Mesos
architecture at
https://github.com/apache/mesos/blob/master/docs/Mesos-Architecture.md
The important thing about Mesos is that it provides an API for C/C++,
Java/Scala and Python to write distributed frameworks. There are already
implementations of frameworks for common parallel programming systems such
as:
 - Hadoop (https://github.com/mesos/hadoop)
 - MPI (
https://github.com/apache/mesos/blob/master/docs/Running-torque-or-mpi-on-mesos.md
)
 - Spark (http://spark-project.org)
And you can find an example Python framework at
https://github.com/apache/mesos/tree/master/src/examples/python

Integration with Galaxy would have three parts:
1) Add a system config variable to Galaxy called 'MESOS_URL' that is then
passed to tool wrappers and allows them to contact the local Mesos
infrastructure (assuming the system has been configured), or pass a null if
the system isn't available.
2) Write a tool runner that works as a Mesos framework to execute single-CPU
jobs on the distributed system.
3) For instances where Mesos is not available at a system-wide level (say
they only have access to an SGE-based cluster), but the user wants to run
distributed jobs, write a wrapper that can create a Mesos cluster using the
existing queueing system. For example, right now I run a Mesos system under
the SGE queue system.
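To make part 1 concrete, a tool wrapper consuming such a variable might look
something like this sketch - only MESOS_URL comes from the proposal above;
the helper and fallback behavior are illustrative assumptions, not existing
code:

    # Sketch of part 1 from a tool wrapper's point of view; only
    # MESOS_URL comes from the proposal, the rest is hypothetical.
    import os
    import subprocess

    def launch_on_mesos(master_url, cmd):
        # Stand-in for a real Mesos framework submission.
        raise NotImplementedError("would schedule %r via %s" % (cmd, master_url))

    def run_tool(cmd):
        mesos_url = os.environ.get('MESOS_URL')  # unset when Mesos is unavailable
        if mesos_url:
            launch_on_mesos(mesos_url, cmd)
        else:
            subprocess.check_call(cmd)  # plain serial fallback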

I'm curious to see what other people think.

Kyle

[galaxy-dev] Job listing exception

2013-07-06 Thread Kyle Ellrott
I'm getting an exception when trying to look at the admin job management
screen. It looks like SQLAlchemy doesn't like non-ascii characters. Any
ideas about what to do?


Error - <type 'exceptions.UnicodeDecodeError'>: 'ascii' codec can't decode
byte 0x8b in position 404: ordinal not in range(128)
URL: http://pk.kilokluster.ucsc.edu:8079/admin/jobs
File '/inside/depot4/galaxy/lib/galaxy/web/framework/middleware/error.py',
line 149 in __call__
  app_iter = self.application(environ, sr_checker)
File
'/inside/depot4/galaxy/eggs/Paste-1.7.5.1-py2.7.egg/paste/recursive.py',
line 84 in __call__
  return self.application(environ, start_response)
File
'/inside/depot4/galaxy/eggs/Paste-1.7.5.1-py2.7.egg/paste/httpexceptions.py',
line 633 in __call__
  return self.application(environ, start_response)
File '/inside/depot4/galaxy/lib/galaxy/web/framework/base.py', line 132 in
__call__
  return self.handle_request( environ, start_response )
File '/inside/depot4/galaxy/lib/galaxy/web/framework/base.py', line 190 in
handle_request
  body = method( trans, **kwargs )
File '/inside/depot4/galaxy/lib/galaxy/web/framework/__init__.py', line 221
in decorator
  return func( self, trans, *args, **kwargs )
File '/inside/depot4/galaxy/lib/galaxy/web/base/controllers/admin.py', line
1053 in jobs
  for job in jobs:
File
'/inside/depot4/galaxy/eggs/SQLAlchemy-0.7.9-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/orm/query.py',
line 2341 in instances
File
'/inside/depot4/galaxy/eggs/SQLAlchemy-0.7.9-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/engine/base.py',
line 3204 in fetchall
File
'/inside/depot4/galaxy/eggs/SQLAlchemy-0.7.9-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/engine/base.py',
line 3171 in _fetchall_impl
UnicodeDecodeError: 'ascii' codec can't decode byte 0x8b in position 404:
ordinal not in range(128)

Re: [galaxy-dev] Job listing exception

2013-07-06 Thread Kyle Ellrott
This also took out my job handlers (exception below). So the introduction
of non-ASCII characters into the table (via the job stdout/stderr capture)
can make a Galaxy instance pretty useless.

I was able to find the offending records using: SELECT count(*) FROM job
WHERE stderr SIMILAR TO '%\x8b%';
It turns out they were the byproduct of doing a path paste of some gzipped
fastq files (so they were never decompressed, just passed along);
fastq_groomer.py then complained about the files having an invalid header
(line 22?) and printed the offending line out, hence the non-ASCII
characters in the stderr.

I 'cleaned' the database with: UPDATE job SET stderr =
regexp_replace(stderr, '\x8b', '\x5f');
But there should probably be some safeguards put in place to stop this
from happening.
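A minimal sketch of the kind of safeguard I mean - the function name, hook
point, and truncation limit are assumptions, not existing Galaxy code:

    # Hypothetical safeguard: force captured job output to valid unicode,
    # replacing undecodable bytes such as 0x8b, before it is stored.
    def sanitize_job_output(raw, max_len=32768):
        if isinstance(raw, str):  # Python 2 byte string
            raw = raw.decode('utf-8', 'replace')
        return raw[:max_len]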

Kyle



galaxy.jobs.handler INFO 2013-07-06 12:35:32,033 job handler stop queue
started
Traceback (most recent call last):
  File /inside/depot4/galaxy/lib/galaxy/webapps/galaxy/buildapp.py, line
35, in app_factory
app = UniverseApplication( global_conf = global_conf, **kwargs )
  File /inside/depot4/galaxy/lib/galaxy/app.py, line 164, in __init__
self.job_manager = manager.JobManager( self )
  File /inside/depot4/galaxy/lib/galaxy/jobs/manager.py, line 36, in
__init__
self.job_handler.start()
  File /inside/depot4/galaxy/lib/galaxy/jobs/handler.py, line 34, in start
self.job_queue.start()
  File /inside/depot4/galaxy/lib/galaxy/jobs/handler.py, line 77, in start
self.__check_jobs_at_startup()
  File /inside/depot4/galaxy/lib/galaxy/jobs/handler.py, line 92, in
__check_jobs_at_startup
 ( model.Job.handler == self.app.config.server_name ) ):
  File
/inside/depot4/galaxy/eggs/SQLAlchemy-0.7.9-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/orm/query.py,
line 2341, in instances
fetch = cursor.fetchall()
  File
/inside/depot4/galaxy/eggs/SQLAlchemy-0.7.9-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/engine/base.py,
line 3204, in fetchall
l = self.process_rows(self._fetchall_impl())
  File
/inside/depot4/galaxy/eggs/SQLAlchemy-0.7.9-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/engine/base.py,
line 3171, in _fetchall_impl
return self.cursor.fetchall()
UnicodeDecodeError: 'ascii' codec can't decode byte 0x8b in position 404:
ordinal not in range(128)



On Fri, Jul 5, 2013 at 11:43 PM, Kyle Ellrott kellr...@soe.ucsc.edu wrote:

 I'm getting an exception when trying to look at the admin job management
 screen. It looks like SQLAlchemy doesn't like non-ascii characters. Any
 ideas about what to do?


 Error - <type 'exceptions.UnicodeDecodeError'>: 'ascii' codec can't decode
 byte 0x8b in position 404: ordinal not in range(128)
 URL: http://pk.kilokluster.ucsc.edu:8079/admin/jobs
 File '/inside/depot4/galaxy/lib/galaxy/web/framework/middleware/error.py',
 line 149 in __call__
   app_iter = self.application(environ, sr_checker)
 File
 '/inside/depot4/galaxy/eggs/Paste-1.7.5.1-py2.7.egg/paste/recursive.py',
 line 84 in __call__
   return self.application(environ, start_response)
 File
 '/inside/depot4/galaxy/eggs/Paste-1.7.5.1-py2.7.egg/paste/httpexceptions.py',
 line 633 in __call__
   return self.application(environ, start_response)
 File '/inside/depot4/galaxy/lib/galaxy/web/framework/base.py', line 132 in
 __call__
   return self.handle_request( environ, start_response )
 File '/inside/depot4/galaxy/lib/galaxy/web/framework/base.py', line 190 in
 handle_request
   body = method( trans, **kwargs )
 File '/inside/depot4/galaxy/lib/galaxy/web/framework/__init__.py', line
 221 in decorator
   return func( self, trans, *args, **kwargs )
 File '/inside/depot4/galaxy/lib/galaxy/web/base/controllers/admin.py',
 line 1053 in jobs
   for job in jobs:
 File
 '/inside/depot4/galaxy/eggs/SQLAlchemy-0.7.9-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/orm/query.py',
 line 2341 in instances
 File
 '/inside/depot4/galaxy/eggs/SQLAlchemy-0.7.9-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/engine/base.py',
 line 3204 in fetchall
 File
 '/inside/depot4/galaxy/eggs/SQLAlchemy-0.7.9-py2.7-linux-x86_64-ucs2.egg/sqlalchemy/engine/base.py',
 line 3171 in _fetchall_impl
 UnicodeDecodeError: 'ascii' codec can't decode byte 0x8b in position 404:
 ordinal not in range(128)



Re: [galaxy-dev] Loading a library of bam files

2013-01-28 Thread Kyle Ellrott
It looks like if I set 'retry_metadata_internally = False' it stops trying
to index them on the queue node. The datasets get added into the library,
without a BAM index file, but without error.
I guess the index files can be generated on demand later on.

Kyle


On Mon, Jan 28, 2013 at 12:42 PM, Greg Von Kuster g...@bx.psu.edu wrote:

 Hi Kyle,

 I'm hoping I can help you a bit on this, although I am not very familiar
 with the code that is producing this behavior.  Your previous reply
 mentions the following:

 During job cleanup,
 galaxy.jobs.__init__.py:412, because
 external_metadata_set_successfully returns false.
 An external set_metadata.sh job was run, but it doesn't seem to call
 samtools. Maybe if I figure out why set_metadata.sh isn't working,
 this problem will go away.

 Based on your comments, there are a few things you can do:

 1. If setting external metadata results in an error, the error should be
 printed out in your paster log.  Do you see anything relevant there?

 2. You may also be able to discover the error if you perform the following
 SQL manually - make sure you have the correct job_id:

 select filename_results_code from job_external_output_metadata where
 job_id = <your job_id>;

 3. Make sure you have the following config setting uncommented and set to
 False in your universe_wsgi.ini (the default is set to True):

 # Although it is fairly reliable, setting metadata can occasionally fail.  In
 # these instances, you can choose to retry setting it internally or leave it in
 # a failed state (since retrying internally may cause the Galaxy process to be
 # unresponsive).  If this option is set to False, the user will be given the
 # option to retry externally, or set metadata manually (when possible).
 retry_metadata_internally = False

 Let me know if any of this helps you resolve the problem, and if not,
 we'll figure out next steps if possible.

 Thanks,

 Greg Von Kuster


 On Jan 24, 2013, at 4:36 PM, Kyle Ellrott wrote:

 I'm willing to put in the coding time, but I'd need some pointers on the
 best way to go about making the changes.

 Kyle


 On Wed, Jan 23, 2013 at 6:35 PM, Anthonius deBoer thondeb...@me.com wrote:

 I also second this request to get it addressed (Where can we vote on bug
 fixes ?! :) ...It is very weird that samtools is run on the local machine
 and it even does the indexing sequentially...
 Thon


 On Jan 23, 2013, at 03:28 PM, Kyle Ellrott kellr...@soe.ucsc.edu wrote:

 I'm currently in the process of loading (path paste) a large library of
 BAM files (1) into the shared Data Libraries of our local galaxy
 installation, but I'm finding this process to be very slow.
 I'm doing a path paste, and not actually copying the files. I have
 disabled local running of 'upload1', so that it will run on the cluster,
 and set 'set_metadata_externally' to true.
 It looks like the job handlers are calling 'samtools index' directly.
 Looking through the code, that seems to happen in galaxy/datatypes/binary
 in Bam.dataset_content_needs_grooming, where it calls 'samtools index' and
 then waits.
 What would be the most efficient way to start changing the code so that
 this process can be done by an external script, at a deferred time out on
 the cluster?
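 One shape the external step could take - purely a sketch, with the SGE
 flags as assumptions - is to submit the indexing as its own cluster job
 instead of blocking the handler:

     # Hypothetical deferred indexing: submit 'samtools index' to the
     # cluster (SGE here) instead of running it inline in the handler.
     import subprocess

     def index_bam_deferred(bam_path):
         # '-b y' runs samtools as a binary; '-sync n' returns immediately
         # instead of waiting for the job to finish.
         subprocess.check_call(['qsub', '-b', 'y', '-sync', 'n',
                                'samtools', 'index', bam_path])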

 Kyle






Re: [galaxy-dev] Transfer Manager

2013-01-09 Thread Kyle Ellrott
It sounds like the deferred job plugin will work as a mechanism to manage
the jobs, once there is a tool that communicates with two different Galaxy
instances via the API and orders them to start transferring/loading files. I
think the tools '__EXPORT_HISTORY__' and '__IMPORT_HISTORY__' (under
lib/galaxy/tools/imp_exp/), but with some options to select subsets of
elements from a history, would be useful in this case. Then I just need to
command them via the API.

But there still may be a problem with mapping IDs between the systems...

Kyle



On Wed, Jan 9, 2013 at 9:48 AM, Nate Coraor n...@bx.psu.edu wrote:

 On Jan 7, 2013, at 1:52 PM, Kyle Ellrott wrote:

  I'm trying to figure out if I can do this all through the API (so I can
 skip setting up FTP servers and sharing database servers).
  I can scan one system, and initiate downloads on the destination system
 (using upload1). So as far as moving files from one machine to another, it
 should be fine. I could push all the data, with correct names, annotations
 and tags.
  But then it becomes a matter of pushing the metadata onto the object
 on the destination system. There are two problems:
  1) No way to push tool info and input parameter data, everything would
 be a product of 'upload1'.
  2) No global IDs. If I try to sync a second time, it may be difficult to
 figure out which elements have been previously pushed. Comparing names
 could lead to problems...

 Hi Kyle,

 The transfer manager was designed as a very general method of downloading
 data to a temporary location in a Galaxy-independent-but-manageable
 fashion.  To make it more useful, it's designed to be used with deferred
 jobs.  The idea is that you write a deferred job plugin that can be told
 when to initiate a task (like adding things to the transfer manager),
 create deferred jobs that depend on those transfers, and then take certain
 actions that you specify once those transfers have completed.  There's no
 documentation, but if you have a look at the existing plugins in
 lib/galaxy/jobs/deferred, that should give you an idea of how it works.

 --nate

 
  Kyle
 
 
 
  On Fri, Jan 4, 2013 at 4:54 AM, Rémy Dernat remy...@gmail.com wrote:
  Hi,
 
  A good practice is to create an FTP server
 (http://wiki.galaxyproject.org/Admin/Config/Upload%20via%20FTP) and use a
 tool to send/retrieve information to this FTP server:
  http://toolshed.g2.bx.psu.edu/
  - Data Source / data_nfs
 
  Then export your FTP directory via NFS to your Galaxy installations.
 
  For databases, it is a little more complex. If you have a database
 server, you could share access to a single database, but your installations
 on each server should be the same, and all working directories must be
 shared via NFS on all Galaxy servers...
 
  Regards
 
 
  2012/12/6 Kyle Ellrott kellr...@soe.ucsc.edu
  Is there any documentation on the transfer manager?
  Is this a mechanism that I could use to synchronize data libraries
 between two different Galaxy installations?
 
  Kyle
 



[galaxy-dev] DRMAA runner weirdness

2013-01-08 Thread Kyle Ellrott
I'm running a test Galaxy system on a cluster (merged galaxy-dist on
January 4th), and I've noticed some odd behavior from the DRMAA job
runner.
I'm running a multi-server setup: one web server, one job_manager, and
three job_handlers. DRMAA is the default job runner (the runner URL for
tophat2 is drmaa://-V -l mem_total=7G -pe smp 2/), with SGE 6.2u5 being the
engine underneath.

My test involves trying to run three different Tophat2 jobs. The first two
seem to start up (and get put on the SGE queue), but the third stays grey,
with the job manager listing it in state 'new' with command line 'None'. It
doesn't seem to leave this state. Both of the jobs that actually got onto
the queue died (reasons unknown, but much too early - probably some
tophat/bowtie problem): one job is listed in error state with stderr as
'Job output not returned from cluster', while the other job (which is no
longer in the SGE queue) is still listed as running.

Any ideas?

Kyle

Re: [galaxy-dev] Uploading large files to history through API (some random thoughts)

2013-01-07 Thread Kyle Ellrott
The pull request for this patch is still on the queue. Is anything
happening with this?

Kyle


On Tue, Nov 27, 2012 at 8:56 AM, John Chilton chil...@msi.umn.edu wrote:

 I went down something of a rabbit hole last night: I thought that
 uploading files through the API using multipart/form-data worked. It
 was only at the very end of my adventure that I realized it only works
 for library uploads. I think it would be great to get that working
 for histories as well. I cannot spend much more time on this right now
 (I thought I only needed to spend an hour to implement the client-side
 stuff), but I thought I would post the progress I made here in case
 someone wants to take up the fight someday:

 The first thing is that when issuing multipart/form-data requests,
 inputs are not currently being deserialized from JSON (so, for instance,
 in the tools API code the inputs variable would be the string
 containing the JSON, not the Python dictionary).

 To bring multipart/form-data requests in line with other content
 requests I modified this in lib/galaxy/web/framework/__init__.py:

     payload = kwargs.copy()
     named_args, _, _, _ = inspect.getargspec(func)
     for arg in named_args:
         payload.pop(arg, None)

 with this:

     payload = kwargs.copy()
     named_args, _, _, _ = inspect.getargspec(func)
     for arg in named_args:
         payload.pop(arg, None)
     for k, v in payload.iteritems():
         if isinstance(v, (str, unicode)):
             try:
                 payload[k] = simplejson.loads(v)
             except:
                 # may not actually be json, just continue
                 pass
     payload = util.recursively_stringify_dictionary_keys( payload )

 One could also imagine doing this replacement in the tools api
 controller directly on payload['input'] instead.

 After that change, files still weren't being matched up with inputs
 properly:

 This debug statement I added to upload_common.py demonstrates that
 file_data is None.

 galaxy.tools.actions.upload_common INFO 2012-11-27 00:22:17,585
 Uploaded datasets is {'NAME': u'galxtest694734465387762969.txt',
 'file_data': None, 'space_to_tab': None, 'url_paste': None,
 '__index__': 0, 'ftp_files': None}

 To address this, the files_*|file_data parameters need to be moved
 inside of the inputs dict.

 That means in lib/galaxy/webapps/galaxy/api/tools.py, changing this:

 inputs = payload[ 'inputs' ]

 To this:

     inputs = payload[ 'inputs' ]
     for k, v in payload.iteritems():
         if k.startswith( "files_" ):
             inputs[k] = v

 Then the debug line becomes this:

 galaxy.tools.actions.upload_common INFO 2012-11-27 00:33:15,168
 Uploaded datasets is {'NAME': u'galxtest2484272839208214846.txt',
 'file_data': FieldStorage('files_0|file_data',
 u'galxtest2484272839208214846.txt'), 'space_to_tab': None,
 'url_paste': None, '__index__': 0, 'ftp_files': None}

 Which matches pretty well with the same line coming from a web browser
 request:

 galaxy.tools.actions.upload_common INFO 2012-11-27 00:23:57,590
 Uploaded datasets is {'NAME': u'', 'file_data':
 FieldStorage('files_0|file_data', u'second_step.png'), 'space_to_tab':
 None, 'url_paste': u'', '__index__': 0, 'ftp_files': None}

 These changes aside, it still didn't work - that is why this is a
 rambling e-mail and not a pull request. I got this exception:
 http://pastebin.com/hhs1pjtP. I guess it is something to do with the session
 handling stuff being different between API calls and normal web calls.

 Anyway, I've inspected the actual requests I was generating and I
 think they are reasonable, Galaxy just needs to be augmented to accept
 them :). If someone does end up taking a look at this, I have
 committed my test case to blend4j so it can be used to really quickly
 test such a client request (requires git, Java, and maven):

 % git clone g...@github.com:jmchilton/blend4j.git
 % cd blend4j
 % mvn test -Dtest=ToolsTest
 -Dtest.galaxy.instance=http://localhost:8080/
 -Dtest.galaxy.key=testapikey
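 For comparison, a rough Python version of the same client request - a
 sketch that assumes the server-side changes above; the upload1 parameter
 names follow the debug output earlier in this message:

     # Hypothetical multipart upload to the tools API using requests.
     import json
     import requests

     def upload_to_history(base_url, api_key, history_id, path):
         inputs = {
             'files_0|NAME': 'my_upload.txt',   # assumed upload1 parameters
             'files_0|type': 'upload_dataset',
             'dbkey': '?',
         }
         return requests.post(
             '%s/api/tools' % base_url,
             data={'key': api_key,
                   'tool_id': 'upload1',
                   'history_id': history_id,
                   'inputs': json.dumps(inputs)},  # inputs as a JSON string
             files={'files_0|file_data': open(path, 'rb')})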

 Thanks,
 -John

Re: [galaxy-dev] Transfer Manager

2013-01-07 Thread Kyle Ellrott
I'm trying to figure out if I can do this all through the API (so I can
skip setting up FTP servers and sharing database servers).
I can scan one system, and initiate downloads on the destination system
(using upload1). So as far as moving files from one machine to another, it
should be fine. I could push all the data, with correct names, annotations
and tags.
But then it becomes a matter of pushing the metadata onto the object on
the destination system. There are two problems:
1) No way to push tool info and input parameter data, everything would be a
product of 'upload1'.
2) No global IDs. If I try to sync a second time, it may be difficult to
figure out which elements have been previously pushed. Comparing names
could lead to problems...
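One low-tech workaround for problem 2 - just a sketch, with the file name
and layout as assumptions - would be to persist a source-to-destination ID
map between runs, so a second sync can skip elements already pushed:

    # Hypothetical bookkeeping for idempotent syncs: remember which source
    # dataset ids were pushed and what they became on the destination.
    import json
    import os

    MAP_FILE = 'sync_map.json'  # assumed location

    def load_id_map():
        if os.path.exists(MAP_FILE):
            with open(MAP_FILE) as fh:
                return json.load(fh)
        return {}

    def record_push(id_map, source_id, dest_id):
        id_map[source_id] = dest_id
        with open(MAP_FILE, 'w') as fh:
            json.dump(id_map, fh)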

Kyle



On Fri, Jan 4, 2013 at 4:54 AM, Rémy Dernat remy...@gmail.com wrote:

 Hi,

 A good practice is to create an FTP server
 (http://wiki.galaxyproject.org/Admin/Config/Upload%20via%20FTP) and use a
 tool to send/retrieve information to this FTP server:
 http://toolshed.g2.bx.psu.edu/
 - Data Source / data_nfs

 Then export your FTP directory via NFS to your Galaxy installations.

 For databases, it is a little more complex. If you have a database server,
 you could share access to a single database, but your installations on each
 server should be the same, and all working directories must be shared
 via NFS on all Galaxy servers...

 Regards


 2012/12/6 Kyle Ellrott kellr...@soe.ucsc.edu

 Is there any documentation on the transfer manager?
 Is this a mechanism that I could use to synchronize data libraries
 between two different Galaxy installations?

 Kyle


Re: [galaxy-dev] File import recovery

2013-01-07 Thread Kyle Ellrott
Editing the DB and manually fixing all the errors in the UI doesn't seem
like a long-term solution.
Is there a way to do this through the API? If not, where should I start
working to add it in?

Kyle


On Wed, Jan 2, 2013 at 7:41 AM, Nate Coraor n...@bx.psu.edu wrote:

 On Dec 31, 2012, at 8:18 PM, Kyle Ellrott wrote:

  I'm currently adding a large number of files into my Galaxy instance's
 dataset library. During the import some of the files (a small percentage)
 failed with:
 
  /inside/depot4/galaxy/set_metadata.sh: line 4: 14790 Segmentation fault
  (core dumped) python ./scripts/set_metadata.py $@
 
  I think it's probably standard cluster shenanigans, and may work just
 fine if run again. But there doesn't seem to be a way to retry. Is there a way
 to deal with this that is easier than manually deleting and re-uploading
 the offending files?

 Hi Kyle,

 Unfortunately, there's not going to be a way to do this entirely in the
 UI.  Your best shot is to change the state of the datasets in the database
 from 'error' to 'ok' and then try using the metadata auto-detect button in
 the UI.

 --nate

 
  Kyle

[galaxy-dev] Toolshed install 404

2012-12-16 Thread Kyle Ellrott
Is anybody else getting 404 errors during toolshed installs? My instance is
trying to access
http://toolshed.g2.bx.psu.edu/repository/get_readme_files?name=synapse_interface&owner=kellrott&changeset_revision=2925d82b84fc
and gets a 404.

Kyle

[galaxy-dev] Transfer Manager

2012-12-05 Thread Kyle Ellrott
Is there any documentation on the transfer manager?
Is this a mechanism that I could use to synchronize data libraries between
two different Galaxy installations?

Kyle

Re: [galaxy-dev] Tags API

2012-11-12 Thread Kyle Ellrott
While we're talking about the tags API, I'd also like to see if we can
think about the annotations API.
Can it be formulated in the same manner as the proposal for the tags API?

Kyle

On Mon, Nov 12, 2012 at 7:49 AM, James Taylor ja...@jamestaylor.org wrote:
 On Mon, Nov 12, 2012 at 9:43 AM, John Chilton chil...@msi.umn.edu wrote:
 From IRC (weeks ago):
 (03:15:01 PM) jmchilton: Ideally, what would the API URI for assigning
 a tag to a history's dataset be? POST to
 api/tags/tag_name/item_class(e.g.
 HistoryDatasetAssociation)/encoded_id or POST to
 api/histories/history_id/contents/dataset_id/tags/tag_name or

 I personally prefer this one. Any object that supports tags gets a /tags/...

 (03:16:56 PM) jmchilton: How about grabbing the unique tags in a
 history, what should that API call look like?

 Just a GET to .../entity_id/tags ?
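 In client terms that scheme would look something like the sketch below -
 hypothetical, since these routes were only a proposal at the time:

     # Hypothetical client calls for the proposed per-object tag routes.
     import requests

     BASE = 'http://localhost:8080/api'  # assumed local instance
     API_KEY = 'YOUR_API_KEY'

     def tag_dataset(history_id, dataset_id, tag):
         # POST variant of the per-object route preferred above
         return requests.post(
             '%s/histories/%s/contents/%s/tags/%s'
             % (BASE, history_id, dataset_id, tag),
             params={'key': API_KEY})

     def history_tags(history_id):
         # GET .../entity_id/tags, per the last message
         return requests.get('%s/histories/%s/tags' % (BASE, history_id),
                             params={'key': API_KEY}).json()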


[galaxy-dev] Tags API

2012-11-09 Thread Kyle Ellrott
Are there any docs or examples about how to add/remove tags from items
via the Galaxy web API?

Thanks,
Kyle Ellrott


Re: [galaxy-dev] Set metadata from within the tool

2012-08-13 Thread Kyle Ellrott
I was wondering if this question had been answered.
I'd like to be able to set metadata from within a tool as well.

Kyle

On Tue, Jul 17, 2012 at 10:01 AM,  roberto.dila...@uniparthenope.it wrote:
 Hi everyone,
 How can I set a dataset's metadata from within a tool?
 For example: from the command tag (in the XML tool specification) I call
 myscript.sh:
 <command>
 myscript.sh -input=$input -output=$output
 </command>
 My script.sh produces a value that I want to store as metadata of $input.

 I've checked the tool config syntax
 http://wiki.g2.bx.psu.edu/Admin/Tools/Tool%20Config%20Syntax and it seems
 that I can read metadata (${input.metadata.somemetadata}) but not write it.
 Maybe the code tag could help me, but it is deprecated.
 Thanks for the help,
  Roberto
