Re: [galaxy-dev] Some issues with job submission to pulsar

2015-10-07 Thread Raj Ayyampalayam

Hi John,

Thanks for your comments. Upload was just the tool I picked to test
my initial install; I hadn't installed any tools into the Galaxy
instance at that time. I have since set it up to use a shared directory,
and it is working fine now.


One of the goals for this second Galaxy instance at UGA is to isolate
the Galaxy installation directory from the shared (cluster-wide)
storage. Whenever we have a shared storage issue, the Galaxy web
interface seems to slow down considerably.


The machine running Galaxy uses the latest Ubuntu LTS release (Ubuntu
seems to be a lot easier to deal with than CentOS 5.x), while our
cluster is running CentOS 5.x. I am in the process of figuring out
how to install tools and their tool dependencies automatically so that
the dependencies are compiled on the CentOS machine rather than the
Ubuntu machine. Any suggestions?


Thanks,
-Raj


___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
   https://lists.galaxyproject.org/

To search Galaxy mailing lists use the unified search at:
   http://galaxyproject.org/search/mailinglists/



Re: [galaxy-dev] Some issues with job submission to pulsar

2015-09-25 Thread John Chilton
Depending on how you set things up, either Galaxy, Nginx, or Apache
is creating a file for the incoming upload - in the above case I
imagine it is this file:
/home/galaxy/wkdir/galaxy/database/tmp/tmpp6j83l. This file is outside
of Galaxy's data model for tools and jobs - it is just a free-floating
file in some ways - so the Pulsar client doesn't know that it needs to
transfer it or how to modify the job description to handle it. It is
a very special case for data source tools.
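
To make the failure mode concrete, here is a minimal sketch of a check
that could be run on the remote host before the tool starts. It is not
part of Galaxy or Pulsar; the helper name missing_path_args is
hypothetical. It only reports absolute paths in a command line that do
not exist on the machine where it runs, which is the situation the
traceback shows for the path passed as sys.argv[2].

```python
import os
import sys


def missing_path_args(argv):
    """Return the absolute-path arguments that do not exist on this host.

    Hypothetical helper for illustration only: relative arguments and
    non-path arguments are ignored, nothing is staged or rewritten.
    """
    return [arg for arg in argv
            if os.path.isabs(arg) and not os.path.exists(arg)]


if __name__ == "__main__":
    # Example: python check_paths.py <the tool's arguments...>
    for path in missing_path_args(sys.argv[1:]):
        print("not visible on this host: %s" % path)
```

A path reported here would have to be staged by the Pulsar client or
placed on storage that both machines can see.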

Stepping back a minute, the upload tool is an odd thing to run with
Pulsar. Galaxy would basically need to send a file to the Pulsar
server, Pulsar would run a very lightweight script that wouldn't modify
the file at all, and then it would send the same file back to Galaxy
with some metadata. I think doing uploads this way will really slow
them down - if at all possible you should try to run this tool closer
to Galaxy, or at least not transfer the file (if it is on a shared
directory or something).
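
One way to run the upload tool closer to Galaxy is to map it to a local
destination in job_conf.xml while everything else defaults to Pulsar.
The sketch below embeds such a fragment and resolves a tool's
destination from it. The destination ids and the URL are assumptions
made for illustration, and destination_for is a hypothetical helper
that only mimics the simple static mapping, not Galaxy's full job
mapping logic.

```python
import xml.etree.ElementTree as ET

# Illustrative job_conf.xml fragment; ids and URL are assumptions.
JOB_CONF = """
<job_conf>
    <plugins>
        <plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner"/>
        <plugin id="pulsar" type="runner" load="galaxy.jobs.runners.pulsar:PulsarRESTJobRunner"/>
    </plugins>
    <destinations default="remote_cluster">
        <destination id="local" runner="local"/>
        <destination id="remote_cluster" runner="pulsar">
            <param id="url">https://pulsar.example.org:8913/</param>
        </destination>
    </destinations>
    <tools>
        <!-- Keep uploads on the Galaxy host instead of sending them to Pulsar. -->
        <tool id="upload1" destination="local"/>
    </tools>
</job_conf>
"""


def destination_for(tool_id, job_conf_xml):
    """Return the destination id a tool maps to in this simplified model."""
    root = ET.fromstring(job_conf_xml.strip())
    for tool in root.findall("./tools/tool"):
        if tool.get("id") == tool_id:
            return tool.get("destination")
    # Tools without an explicit entry fall back to the default destination.
    return root.find("destinations").get("default")
```

With this fragment, destination_for("upload1", JOB_CONF) gives "local",
and any tool without its own entry falls back to "remote_cluster".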

Hacking up Galaxy to find these uploaded files would be possible, and I
think I would merge a PR to implement that, but it is such a
particular use case (upload probably shouldn't be running on a disk that
isn't shared with Galaxy) that I don't think it is worth the effort of
implementing this right now.

Is this fair? Can you find some shared disk to use for paths like
/home/galaxy/wkdir/galaxy/database/tmp/tmpp6j83l?

-John


[galaxy-dev] Some issues with job submission to pulsar

2015-09-24 Thread Raj Ayyampalayam

Hello,

I am setting up a new instance of Galaxy (based on the latest dev branch
of the GitHub repo) configured to submit jobs to a remote cluster via
Pulsar.
I used the usegalaxy playbook to get the configuration parameters for
the Galaxy job_conf.xml and Pulsar app.yml files.
I was partially successful in running jobs via Pulsar: the regular
tools (analysis tools, etc.) are running OK on the remote cluster.
However, I am having trouble getting the upload tool to run on the remote
cluster.


It looks like the upload tool running on the remote machine is trying to
access a file in the database/tmp area of the Galaxy server machine.
Here is the error I see in the Galaxy error report for the job:


Traceback (most recent call last):
  File "/escratch4/apps/galaxy_scratch/pulsar/files/staging/46/tool_files/upload.py", line 430, in <module>
    __main__()
  File "/escratch4/apps/galaxy_scratch/pulsar/files/staging/46/tool_files/upload.py", line 405, in __main__
    registry.load_datatypes( root_dir=sys.argv[1], config=sys.argv[2] )
  File "/panfs/pstor.storage/home/qbcglab/galaxy_run/galaxy/lib/galaxy/datatypes/registry.py", line 94, in load_datatypes
    tree = galaxy.util.parse_xml( config )
  File "/panfs/pstor.storage/home/qbcglab/galaxy_run/galaxy/lib/galaxy/util/__init__.py", line 178, in parse_xml
    root = tree.parse( fname, parser=ElementTree.XMLParser( target=DoctypeSafeCallbackTarget() ) )
  File "/usr/local/python/2.7.8/lib/python2.7/xml/etree/ElementTree.py", line 647, in parse
    source = open(source, "rb")
IOError: [Errno 2] No such file or directory: '/home/galaxy/wkdir/galaxy/database/tmp/tmpp6j83l'


I've created a Gist with the contents of job_conf.xml, app.yml, the file
action file, and the error output at
https://gist.github.com/raj76/7be86f6a56714deef050 .


Any suggestions on how to debug this issue?

Thanks,
-Raj






