Hello Nate,

I had that parameter set to 1, but I raised it to 5. I also added -noac to the NFS mounts for /nextgen3.

That appears to have fixed it.

Thank you!!!


On 3/6/14, 1:57 PM, Nate Coraor wrote:
Hi Pete,

I'd suggest setting retry_job_output_collection > 0 in universe_wsgi.ini. This is usually a symptom of attribute caching on network filesystems.
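
To illustrate why retrying helps, here is a minimal Python sketch of the idea. It is not Galaxy's actual code: read_job_output, its delay argument, and the injectable sleep are hypothetical names for illustration, and retries only stands in for the role of retry_job_output_collection. The assumption is that the compute node has already written the file, but the NFS client on the Galaxy server still holds stale cached attributes, so the first open fails with "No such file or directory".

```python
import time


def read_job_output(path, retries=5, delay=1.0, sleep=time.sleep):
    """Read `path`, retrying when the file is not visible yet.

    Hypothetical helper, not Galaxy code: `retries` plays the role of
    retry_job_output_collection, and `delay` is an assumed pause in seconds.
    """
    attempts = retries + 1  # the first try plus `retries` further attempts
    for attempt in range(attempts):
        try:
            with open(path) as handle:
                return handle.read()
        except FileNotFoundError:
            # On NFS a stale attribute cache can hide a file that the
            # compute node has already written; wait and look again.
            if attempt == attempts - 1:
                raise  # out of attempts, surface the original error
            sleep(delay)
```

With retries=0 this makes a single attempt and fails as soon as the file is not yet visible. A larger value gives the client's cache time to expire before the job is marked as failed.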

--nate


On Wed, Mar 5, 2014 at 8:06 PM, Pete Schmitt <peter.r.schm...@dartmouth.edu> wrote:


While trying something simple in Galaxy, I downloaded data from UCSC Main. The data gets downloaded, but the job errors out. I verified that the job actually ran and completed successfully according to the scheduler, but I get errors like this:

galaxy.jobs.runners.drmaa DEBUG 2014-03-05 18:17:35,941 (624/46.dirigo.mdibl.org) state change: job finished normally
galaxy.jobs.runners ERROR 2014-03-05 18:17:36,060 (624/46.dirigo.mdibl.org) Job output not returned from cluster: [Errno 2] No such file or directory: '/nextgen3/galaxy/galaxy-dist/database/job_working_directory/000/624/galaxy_624.o'

There are no directories being created below the 000 directory.   I verified that the directory tree is owned by galaxy and that the galaxy user can run jobs from the command line as a normal user.

I set the parameter "cleanup_job = never". It was set to "always", which is probably why the files were never there. Now the files are there, including the galaxy_###.o file, but Galaxy still reports the error above.

I had set the parameter "cluster_files_directory = database/pbs", but that doesn't seem to work any longer.  The .o and .e files used to end up there.

Here is an example:

(galaxyvenv)[galaxy@dirigo 630]$ ll
total 16
-rw------- 1 galaxy galaxy    0 Mar  5 19:29 galaxy_630.e
-rw-rw-r-- 1 galaxy galaxy    2 Mar  5 19:29 galaxy_630.ec
-rw------- 1 galaxy galaxy  940 Mar  5 19:29 galaxy_630.o
-rwxr-xr-x 1 galaxy galaxy 2429 Mar  5 19:29 galaxy_630.sh
-rw-rw-r-- 1 galaxy galaxy  138 Mar  5 19:29 galaxy.json
-rw-rw-r-- 1 galaxy galaxy 2139 Mar  5 19:29 metadata_in_HistoryDatasetAssociation_1182_o830e3
-rw-rw-r-- 1 galaxy galaxy   20 Mar  5 19:29 metadata_kwds_HistoryDatasetAssociation_1182_hOhPp7
-rw-rw-r-- 1 galaxy galaxy   55 Mar  5 19:29 metadata_out_HistoryDatasetAssociation_1182_Ynb70M
-rw-rw-r-- 1 galaxy galaxy    2 Mar  5 19:29 metadata_override_HistoryDatasetAssociation_1182_HsMljG
-rw-rw-r-- 1 galaxy galaxy   44 Mar  5 19:29 metadata_results_HistoryDatasetAssociation_1182_LxdsAZ
(galaxyvenv)[galaxy@dirigo 630]$ pwd
/nextgen3/galaxy/galaxy-dist/database/job_working_directory/000/630

Here is the error from this:

galaxy.jobs.runners.drmaa DEBUG 2014-03-05 19:31:37,731 (630/51.dirigo.mdibl.org) state change: job is running
galaxy.jobs.runners.drmaa DEBUG 2014-03-05 19:31:49,119 (630/51.dirigo.mdibl.org) state change: job finished normally
galaxy.jobs.runners ERROR 2014-03-05 19:31:50,225 (630/51.dirigo.mdibl.org) Job output not returned from cluster: [Errno 2] No such file or directory: '/nextgen3/galaxy/galaxy-dist/database/job_working_directory/000/630/galaxy_630.o'
galaxy.jobs DEBUG 2014-03-05 19:31:50,252 finish(): Moved /nextgen3/galaxy/galaxy-dist/database/job_working_directory/000/630/galaxy_dataset_856.dat to /nextgen3/galaxy/galaxy-dist/database/files/000/dataset_856.dat
galaxy.jobs DEBUG 2014-03-05 19:31:50,351 job 630 ended

On the Galaxy page, the history shows this in pink:
1 UCSC Main on Human: knownGene (chr22:1-51304566)
error
An error occurred with this dataset:
Job output not returned from cluster

But the dataset is there.

___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/
