Re: [galaxy-dev] Downloading UCSC complete database

2012-10-07 Thread Sean Davis
On Sun, Oct 7, 2012 at 10:38 AM, Perez, Ricardo  wrote:
> Dear all,
>
> I am currently working on downloading the genome data from the UCSC database.
> I have figured out how to obtain the genome of one species at a time; however,
> this would take quite a bit of time if I had to type every command by hand.
> Is there any command that would download all the data from the UCSC
> databases?
> If not, how would I go about writing a script to do so?

We mirror directly out of the mysql data directory.  In this script,
the /var/local/mysql directory is where the actual server files are
kept.

https://gist.github.com/3848717

Note that this does not download the .txt and .sql files.  Instead, it
reads and writes the MySQL files directly, and it may break if the server
versions are too dissimilar.  Also, be sure to test it a bit before
pointing it at your production database to make sure it works as expected.
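
As a rough illustration of the same idea (this is not the script in the gist),
here is a minimal Python sketch that mirrors a few per-genome MySQL directories
from UCSC's public rsync server into the local MySQL data directory.  The rsync
URL reflects my recollection of UCSC's mirror layout, and the genome list and
destination path are assumptions to adapt:

#!/usr/bin/env python
# Hedged sketch: mirror selected UCSC genome databases via rsync.
# Assumes rsync is installed and that the destination is the data directory
# of a MySQL server whose version is close to UCSC's; check UCSC's current
# download documentation for the correct rsync URL.
import subprocess

GENOMES = ["hg19", "mm9"]      # assumption: the assemblies you care about
DEST = "/var/local/mysql"      # MySQL data directory, as in the script above

for genome in GENOMES:
    src = "rsync://hgdownload.soe.ucsc.edu/mysql/%s/" % genome
    subprocess.check_call(["rsync", "-avP", src, "%s/%s/" % (DEST, genome)])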

Sean


Re: [galaxy-dev] Automatic citation list from a tool, workflow, or history

2011-12-15 Thread Sean Davis
On Thu, Dec 15, 2011 at 6:16 AM, Peter Cock wrote:

> Dear all,
>
> It has become a convention that each tool/wrapper in
> Galaxy includes citation instructions in its help text
> (although not all the tools do this - I think they should).
>
> It occurred to me this could be formalised, with explicit
> markup in the tool XML file, embedding the citation
> (at the very least an identifier like the DOI or ISBN;
> there is probably a good existing XML standard
> that could be followed).
>
> Then, Galaxy would be able to automatically pull out,
> from a history or a workflow, a list of the citations the
> tool authors have asked to be cited, removing duplicates
> (e.g. by matching DOIs).
>
> The aim of this is (a) to make it easier to write up your
> methods by supplying all the references, and (b) to help
> ensure tool authors get the acknowledgement they
> deserve.
>
> Does this sound like a good idea?
>
>
Hi, Peter.

I think this could be useful for tool authors, developers, and users alike.  As
for markup, BibTeX has a low barrier to entry, is a stable format, and could
easily be included as text in a dedicated tag and used semantically when
available.
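
As a hedged sketch of what the aggregation side might look like, here is a
small Python script that collects BibTeX entries from tool XML files and
removes duplicates by DOI.  The <citations>/<citation type="bibtex"> layout is
an assumption made for illustration, not an existing Galaxy schema:

#!/usr/bin/env python
# Hypothetical sketch: gather BibTeX citations from tool XML files and
# de-duplicate them by DOI.  The element names are assumptions.
import re
import sys
import xml.etree.ElementTree as ET

def collect_citations(tool_xml_paths):
    seen_dois = set()
    citations = []
    for path in tool_xml_paths:
        root = ET.parse(path).getroot()
        for node in root.findall("citations/citation"):
            if node.get("type") != "bibtex":
                continue
            bibtex = (node.text or "").strip()
            match = re.search(r'doi\s*=\s*[{"]([^}"]+)', bibtex, re.IGNORECASE)
            doi = match.group(1).lower() if match else None
            if doi in seen_dois:
                continue  # same reference already collected from another tool
            if doi:
                seen_dois.add(doi)
            citations.append(bibtex)
    return citations

if __name__ == "__main__":
    print("\n\n".join(collect_citations(sys.argv[1:])))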

Sean

Re: [galaxy-dev] Staged Method for cluster running SGE?

2011-04-26 Thread Sean Davis
On Tue, Apr 26, 2011 at 5:11 AM, Peter Cock  wrote:
> Hi all,
>
> So far we've been running our local Galaxy instance on
> a single machine, but I would like to be able to offload
> (some) jobs onto our local SGE cluster. I've been reading
> https://bitbucket.org/galaxy/galaxy-central/wiki/Config/Cluster
>
> Unfortunately in our setup the SGE cluster head node is
> a different machine to the Galaxy server, and they do not
> (currently) have a shared file system. Once on the cluster,
> the head node and the compute nodes do have a shared
> file system.
>
> Therefore we will need some way of copying input data
> from the Galaxy server to the cluster, running the job,
> and once the job is done, copying the results back to the
> Galaxy server.
>
> The "Staged Method" on the wiki sounds relevant, but
> appears to be for TORQUE only (via pbs_python), not
> any of the other back ends (via DRMAA).
>
> Have I overlooked anything on the "Cluster" wiki page?
>
> Has anyone attempted anything similar, and could you
> offer any guidance or tips?

Hi, Peter.

You might consider setting up a separate SGE queue for Galaxy jobs.  Then,
you could specify prolog and epilog scripts that copy files from the
Galaxy machine to the cluster (in the prolog) and back to Galaxy (in the
epilog).  This assumes that there is a way to map from one file system to
the other, but for Galaxy that is probably the case: files on the Galaxy
server live under the Galaxy instance, and jobs on the cluster will
probably all run as a single user, with files under that user's home
directory.  I have not done this myself, but the advantage of using prolog
and epilog scripts is that Galaxy jobs then do not need any special
configuration--all the staging work is done transparently by SGE.
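
A minimal sketch of what such a staging script might look like, written here as
a single Python script that could be registered as both the prolog and the
epilog of a dedicated queue.  The host name, path mapping, and use of rsync over
ssh are all assumptions, and the details of how SGE invokes prolog/epilog
scripts should be checked against the SGE documentation:

#!/usr/bin/env python
# Hedged sketch of an SGE prolog/epilog for staging Galaxy files.
# Assumptions: the Galaxy server is reachable via rsync/ssh, dataset paths
# differ only by a prefix on the two sides, and the mode ("prolog" or
# "epilog") is passed as the first argument.  Untested illustration only.
import subprocess
import sys

GALAXY_HOST = "galaxy.example.org"             # assumption
GALAXY_FILES = "/galaxy/database/files"        # files root on the Galaxy server
CLUSTER_FILES = "/home/galaxy/staging/files"   # files root on the cluster FS

def sync(src, dest):
    subprocess.check_call(["rsync", "-a", src, dest])

def main(mode):
    if mode == "prolog":
        # pull inputs from the Galaxy server onto the shared cluster filesystem
        sync("%s:%s/" % (GALAXY_HOST, GALAXY_FILES), CLUSTER_FILES + "/")
    elif mode == "epilog":
        # push results back so the Galaxy server can see them
        sync(CLUSTER_FILES + "/", "%s:%s/" % (GALAXY_HOST, GALAXY_FILES))

if __name__ == "__main__":
    main(sys.argv[1])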

Sean


Re: [galaxy-dev] postgresql to galaxy

2011-04-12 Thread Sean Davis
Hi, Hari.

You should probably make sure that you can connect to postgres from
the command line before trying to connect using galaxy.  In
particular, it looks like you need to set up this file correctly:

http://www.postgresql.org/docs/8.2/static/auth-pg-hba-conf.html
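
The FATAL line at the bottom of the traceback ("no pg_hba.conf entry for host
"[local]" ...") points the same way: the server is rejecting the connection
before Galaxy gets anywhere.  A quick way to check the connection settings
outside of Galaxy is a few lines of Python with psycopg2; the host, user, and
database below come from the post, and the password is a placeholder:

#!/usr/bin/env python
# Hedged sketch: verify the PostgreSQL connection Galaxy will use,
# independently of Galaxy itself.  The password is a placeholder.
import psycopg2

conn = psycopg2.connect(
    host="192.168.65.8",   # drop this line to test a local-socket connection
    dbname="galaxy",
    user="galaxy",
    password="CHANGE_ME",
)
cur = conn.cursor()
cur.execute("SELECT version()")
print(cur.fetchone()[0])
conn.close()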

Sean


On Tue, Apr 12, 2011 at 7:50 AM, hari krishna  wrote:
>
> Hi,
>   I am planning to change the database from SQLite to PostgreSQL.
>   For this I installed PostgreSQL 8.1.2 and created a user and a database
> at my home location.
>   From my home machine I am able to log in to that database:
>
>
>   psql -d galaxy -U galaxy -h 192.168.65.8
>
> where both the database and the user name are galaxy, and that is the hostname.
> I modified the universe_wsgi.ini file as follows:
>
> database_connection = postgres:///galaxy
> database_engine_option_strategy = threadlocal
> database_engine_option_server_side_cursors = True
> database_engine_option_pool_size = 5
> database_engine_option_max_overflow = 10
>
> After these modifications, when I ran the server I got an error like this:
>
>
> *
> Traceback (most recent call last):
>
>   File
> "/home/gridmon/hari/galaxy_new/galaxy-central/lib/galaxy/web/buildapp.py",
> line 82, in app_factory
> app = UniverseApplication( global_conf = global_conf, **kwargs )
>   File "/home/gridmon/hari/galaxy_new/galaxy-central/lib/galaxy/app.py",
> line 30, in __init__
>
> create_or_verify_database( db_url, self.config.database_engine_options )
>   File
> "/home/gridmon/hari/galaxy_new/galaxy-central/lib/galaxy/model/migrate/check.py",
> line 54, in create_or_verify_database
> dataset_table = Table( "dataset", meta, autoload=True )
>
>   File
> "/home/gridmon/hari/galaxy_new/galaxy-central/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.5.egg/sqlalchemy/schema.py",
> line 108, in __call__
> return type.__call__(self, name, metadata, *args, **kwargs)
>   File
> "/home/gridmon/hari/galaxy_new/galaxy-central/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.5.egg/sqlalchemy/schema.py",
> line 236, in __init__
>
> _bind_or_error(metadata).reflecttable(self,
> include_columns=include_columns)
>   File
> "/home/gridmon/hari/galaxy_new/galaxy-central/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.5.egg/sqlalchemy/engine/base.py",
> line 1261, in reflecttable
>
> conn = self.contextual_connect()
>   File
> "/home/gridmon/hari/galaxy_new/galaxy-central/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.5.egg/sqlalchemy/engine/threadlocal.py",
> line 194, in contextual_connect
> return self.session.get_connection(**kwargs)
>
>   File
> "/home/gridmon/hari/galaxy_new/galaxy-central/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.5.egg/sqlalchemy/engine/threadlocal.py",
> line 20, in get_connection
> return self.engine.TLConnection(self, self.engine.pool.connect(),
> close_with_result=close_with_result)
>
>   File
> "/home/gridmon/hari/galaxy_new/galaxy-central/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.5.egg/sqlalchemy/pool.py",
> line 151, in connect
> agent = _ConnectionFairy(self)
>   File
> "/home/gridmon/hari/galaxy_new/galaxy-central/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.5.egg/sqlalchemy/pool.py",
> line 304, in __init__
>
> rec = self._connection_record = pool.get()
>   File
> "/home/gridmon/hari/galaxy_new/galaxy-central/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.5.egg/sqlalchemy/pool.py",
> line 161, in get
> return self.do_get()
>
>   File
> "/home/gridmon/hari/galaxy_new/galaxy-central/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.5.egg/sqlalchemy/pool.py",
> line 639, in do_get
> con = self.create_connection()
>   File
> "/home/gridmon/hari/galaxy_new/galaxy-central/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.5.egg/sqlalchemy/pool.py",
> line 122, in create_connection
>
> return _ConnectionRecord(self)
>   File
> "/home/gridmon/hari/galaxy_new/galaxy-central/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.5.egg/sqlalchemy/pool.py",
> line 198, in __init__
> self.connection = self.__connect()
>
>   File
> "/home/gridmon/hari/galaxy_new/galaxy-central/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.5.egg/sqlalchemy/pool.py",
> line 261, in __connect
> connection = self.__pool._creator()
>   File
> "/home/gridmon/hari/galaxy_new/galaxy-central/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.5.egg/sqlalchemy/engine/strategies.py",
> line 80, in connect
>
> raise exc.DBAPIError.instance(None, None, e)
> OperationalError: (OperationalError) FATAL:  no pg_hba.conf entry for host
> "[local]", user "galaxy", database "galaxy<", SSL off
> *
>
>
>
> Can anyone help me with integrating PostgreSQL with Galaxy?
> Waiting for your kind reply.
>
>
>
>
> --
> Thanks & Regards,
> Hari Krishna .M
>
>


Re: [galaxy-dev] PostgreSQL issue: reserved keyword used as identifier

2011-04-08 Thread Sean Davis
2011/4/8 Louise-Amélie Schmitt :
> Hello everyone
>
> I just ran into a huge problem concerning the database. I'm currently trying
> to transfer my data from MySQL to PostgreSQL by writing a Perl script
> to do the job.
>
> Here is the issue: in the "form_definition" table, one of the field
> identifiers is "desc", which is a reserved SQL keyword used for ordering
> values. Therefore, there's currently no way of making a query of the
> form "INSERT INTO table_name (<identifier list>) VALUES (<value list>);",
> which is a big handicap in this context, since the identifier list we
> retrieve dynamically is seldom in the right order.
>
> Is there a way to fix this issue without blowing everything up?

You need to quote the identifiers.  A simple example using "desc" as a
column name:

sdavis=# create table test_table(
id int,
desc varchar);
ERROR:  syntax error at or near "desc"
LINE 3: desc varchar);
^
sdavis=# create table test_table(
id int,
"desc" varchar);
CREATE TABLE

Hope that helps.
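
For the migration script itself, the same rule carries over to the INSERT
statements: double-quote any identifier that collides with a keyword.  The
original poster is working in Perl, so the following Python sketch (using the
test_table example above and placeholder connection settings) is only meant to
illustrate the quoting idea:

#!/usr/bin/env python
# Hedged sketch: build an INSERT whose column names are double-quoted so
# that reserved words such as "desc" are accepted by PostgreSQL.
import psycopg2

def quoted_insert(cur, table, row):
    # row is a dict of column name -> value, in any order
    cols = list(row.keys())
    col_list = ", ".join('"%s"' % c.replace('"', '""') for c in cols)
    placeholders = ", ".join(["%s"] * len(cols))
    statement = 'INSERT INTO "%s" (%s) VALUES (%s)' % (table, col_list, placeholders)
    cur.execute(statement, [row[c] for c in cols])

conn = psycopg2.connect(dbname="galaxy", user="galaxy")  # placeholder connection
cur = conn.cursor()
quoted_insert(cur, "test_table", {"id": 1, "desc": "an example value"})
conn.commit()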

Sean


Re: [galaxy-dev] Recommended Specs for Production System

2011-04-08 Thread Sean Davis
On Fri, Apr 8, 2011 at 10:26 AM, Nate Coraor  wrote:
> Assaf Gordon wrote:
>
>> Forgot to mention SGE/PBS: you definitely want to use them (even if you're 
>> using a single machine),
>> because the local job runner doesn't take into account multi-threaded 
>> programs when scheduling jobs.
>> So another core is needed for the SGE scheduler daemons (sge_qmaster and 
>> sge_execd).
>
> I haven't tested, but it's entirely possible that the SGE daemons could
> happily share cores with other processes.  I'd be surprised if they
> spent a whole lot of time on-CPU.

We run SGE for NGS and do not find a need to set aside cores for the
daemons.  That said, if you do have an active cluster (more than a
couple of machines), the SGE master node does benefit from having a
core set aside.

Sean

> A cluster runner is recommended for other reasons, too - restartability
> of the Galaxy process is one of the big ones.
>
> --nate


Re: [galaxy-dev] SGE and Galaxy (a different approach)

2011-04-05 Thread Sean Davis
On Tue, Apr 5, 2011 at 12:27 PM, andrew stewart wrote:
> I'm aware of how to configure Galaxy to use SGE in universe_wsgi.ini,
> however what I want to do is a little different.

Hi, Andrew.  Take a look at this page:

https://bitbucket.org/galaxy/galaxy-central/wiki/Config/Cluster

In particular, does the last section, "Tool Configuration", describe
something like what you want to do?

Sean


> Because I only want
> certain processes to be submitted to the queue, I'd rather control this at
> the tool configuration level (the xml wrapper).  For example:
> <command>
>     qsub myscript.sh
> </command>
> This will work, except that the status of the job (in Galaxy) shows as
> completed even though the job has simply been submitted to SGE.  Basically
> Galaxy 'loses track' of the process because the submission process
> (myscript.sh) has completed even if the actual job hasn't.
> Has anyone else tried anything like this before, or do you have anything
> helpful to suggest?  One thought is to somehow cause the myscript.sh process
> to pause until the SGE job has completed.
> Any advice appreciated.
> Thanks,
> Andrew
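
On the "pause until the SGE job has completed" idea: Grid Engine's qsub has a
-sync y option that makes the submission call block until the job finishes and
then exit with the job's status, which keeps Galaxy's view of the job honest.
A hedged sketch of a wrapper in that spirit (the script name and qsub options
are placeholders to adapt):

#!/usr/bin/env python
# Hedged sketch: submit a script to SGE from a Galaxy tool's command line
# and block until the cluster job finishes, so Galaxy only marks the tool
# as done when the job really is.
import subprocess
import sys

def run_on_sge(script, args):
    # -sync y makes qsub wait for the job and propagate its exit status;
    # -cwd runs the job in the current working directory
    return subprocess.call(["qsub", "-sync", "y", "-cwd", script] + list(args))

if __name__ == "__main__":
    sys.exit(run_on_sge("myscript.sh", sys.argv[1:]))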


Re: [galaxy-dev] [galaxy-user] Filename extension in new tool

2011-02-17 Thread Sean Davis
On Thu, Feb 17, 2011 at 5:48 AM, Peter Cock wrote:

> On Thu, Feb 17, 2011 at 3:00 AM, Sean Davis  wrote:
> > I have a tool that takes a pdb file as input.  The authors of the
> *compiled*
> > code require that the suffix be either ".pdb" or ".ent".  When I upload a
> > .pdb file, the filename that gets fed to the tool now ends in .dat.  What
> is
> > the best way to get the original file extension stored in the file
> database?
> >
> > Thanks,
> > Sean
>
> Once in Galaxy all the data files have the extension .dat on disk, so
> I would try using a wrapper script that creates a symbolic link from the
> input.dat file to something like input.pdb or input.ent (and if that
> doesn't
> work, copy the file) before running the compiled code and then remove
> it afterwards.
>
>
Hi, Peter.  I ended up doing just that.  The hack in all its messiness is
here:

https://gist.github.com/831017
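
For readers who cannot follow the link, a hedged sketch of the general shape
of such a wrapper (this is not the contents of the gist; the SymD invocation
and the handling of outputs are placeholder assumptions):

#!/usr/bin/env python
# Hedged sketch of a wrapper that gives a Galaxy .dat input the .pdb
# extension a compiled tool insists on, runs the tool, and cleans up.
# Collecting the tool's output files is tool-specific and omitted here.
import os
import subprocess
import sys

def main(input_dat):
    link = os.path.abspath(input_dat) + ".pdb"   # tool requires .pdb or .ent
    os.symlink(os.path.abspath(input_dat), link)
    try:
        subprocess.check_call(["SymD", link])    # placeholder invocation
    finally:
        os.remove(link)

if __name__ == "__main__":
    main(sys.argv[1])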



> Separately from this, you may need to extend Galaxy to define pdb
> as a new file format (ideally with a data type sniffer).
>
> This kind of question is better asked on the dev list (CC'd)
>
>
Thanks.  That is the next step.

Sean

Re: [galaxy-dev] [galaxy-user] Filename extension in new tool

2011-02-17 Thread Sean Davis
On Thu, Feb 17, 2011 at 8:07 AM, Peter Cock wrote:

> On Thu, Feb 17, 2011 at 12:37 PM, Sean Davis wrote:
> >
> > On Thu, Feb 17, 2011 at 5:48 AM, Peter wrote:
> >>
> >> Once in Galaxy all the data files have the extension .dat on disk, so
> >> I would try using a wrapper script that creates a symbolic link from the
> >> input.dat file to something like input.pdb or input.ent (and if that
> >> doesn't
> >> work, copy the file) before running the compiled code and then remove
> >> it afterwards.
> >>
> >
> > Hi, Peter.  I ended up doing just that.  The hack in all its messiness is
> > here:
> > https://gist.github.com/831017
>
> I would be wary of using ${input.name} like that - test with things
> like renaming the dataset in Galaxy, and pasting in a PDB file
> rather than uploading one. Also, I suspect you can get filenames
> with spaces in them, which will probably cause trouble. You'll
> notice that Galaxy generates its own *.dat filenames, which avoid
> spaces.
>
> Personally I would generate the *.pdb or *.ent filename within
> the wrapper script based on the input file name (*.dat). Try:
>
>
Unfortunately, the command-line executable assumes that the filename
contains the ID of the PDB record, so I actually need this right now.  I'm
going to have a chat with the command-line tool developer about designing a
more robust interface.



> os.symlink(fname,fname+".pdb")
> ...
> symdcmd = "SymD %s.pdb" % fname
>
>
> >>
> >> Separately from this, you may need to extend Galaxy to define pdb
> >> as a new file format (ideally with a data type sniffer).
> >>
> >> This kind of question is better asked on the dev list (CC'dd)
> >>
> >
> > Thanks.  That is the next step.
>
> I haven't done this myself yet (but I may well need to before long).
>
>
I extended Galaxy based on the filename extension and added the datatype to
data.py.  This works like a charm, but it obviously isn't foolproof (no
sniffer yet).  The PDB format isn't too complicated, but it is flexible, so I
need to find out exactly what is required as opposed to what is merely
possible.  I see that Biopython has a class and parser for it, so I might be
able to use that rather directly.
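
A hedged sketch of what a pdb datatype with a naive sniffer might look like;
the base class and the way new datatypes are registered follow my recollection
of Galaxy's datatype API and should be checked against the version in use:

# Hedged sketch, in the style of the classes in lib/galaxy/datatypes/data.py.
# The header keywords below only cover common cases; a real sniffer would
# need to be more careful.
from galaxy.datatypes.data import Text

class PDB(Text):
    file_ext = "pdb"

    def sniff(self, filename):
        # a PDB file normally starts with a HEADER/TITLE/ATOM-style record
        keywords = ("HEADER", "TITLE ", "COMPND", "ATOM  ", "HETATM")
        with open(filename) as handle:
            for line in handle:
                if not line.strip():
                    continue
                return line.startswith(keywords)
        return False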

Sean