This bug irritated me, so I fixed it. Essentially, add_file() in upload.py
does not know that paths to local (server_dir) directories can be relative,
so it never prepends the root directory to make them absolute.
Is there a written process for submitting the fix? I could not find one.
Thanks,
Ted
diff -r 21b645303c02 tools/data_source/upload.py
--- a/tools/data_source/upload.py Thu Dec 22 13:54:33 2011 -0500
+++ b/tools/data_source/upload.py Sat Dec 31 15:29:45 2011 -0800
@@ -74,7 +74,10 @@
id, files_path, path = arg.split( ':', 2 )
rval[int( id )] = ( path, files_path )
return rval
-def add_file( dataset, registry, json_file, output_path ):
+
+import pdb
+
+def add_file( dataset, registry, json_file, output_path, root_dir):
data_type = None
line_count = None
converted_path = None
@@ -94,7 +97,10 @@
file_err( 'Unable to fetch %s\n%s' % ( dataset.path, str( e ) ),
dataset, json_file )
return
dataset.path = temp_name
-# See if we have an empty file
+
+if dataset.type == 'server_dir' and not os.path.isabs( dataset.path):
+ dataset.path = os.path.join( root_dir, dataset.path )
+
if not os.path.exists( dataset.path ):
file_err( 'Uploaded temporary file (%s) does not exist.' %
dataset.path, dataset, json_file )
return
@@ -384,7 +390,7 @@
files_path = output_paths[int( dataset.dataset_id )][1]
add_composite_file( dataset, registry, json_file, output_path,
files_path )
else:
-add_file( dataset, registry, json_file, output_path )
+add_file( dataset, registry, json_file, output_path , sys.argv[1])
# clean up paramfile
try:
os.remove( sys.argv[3] )
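
For readers who want to see the path handling from the patch in isolation, here is a minimal sketch. The helper name resolve_server_dir_path is hypothetical and does not exist in upload.py; in the patch the same check is done inline inside add_file(), only for datasets whose type is 'server_dir', with root_dir taken from sys.argv[1].

```python
import os


def resolve_server_dir_path(path, root_dir):
    """Return an absolute path for a server_dir dataset.

    Hypothetical helper, not part of upload.py: it mirrors the inline
    check added by the patch. Absolute paths are returned unchanged;
    relative paths are joined onto root_dir.
    """
    if os.path.isabs(path):
        return path
    return os.path.join(root_dir, path)


if __name__ == "__main__":
    # Example values are made up for illustration only.
    print(resolve_server_dir_path("library/sample.bed", "/srv/galaxy"))
    print(resolve_server_dir_path("/data/sample.bed", "/srv/galaxy"))
```

Doing the join before the os.path.exists() check matters: otherwise a relative path is tested against whatever the current working directory happens to be, and the upload fails with the "does not exist" error.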
On Dec 29, 2011, at 1:22 AM, Ted Goldstein wrote:
Hi there,
Here are three interrelated issues.
I am trying to use Galaxy with some large cancer genomic datasets here at
UCSC to do some systems biology. I have petabyte-size data libraries which
will constantly be in flux at the edges. I would prefer to have Galaxy read
the metadata for large datasets from the file system without using the
database. Is there a convenient API boundary where I could write an adapter
to the dataset object interface?
In the meantime, I am going to try to just import data using the link
option. It's great that this feature is already in. When I import a couple
of modest megabyte-size datasets using the "Link to files without copying to
Galaxy" option, the status never changes from queued. Is this a bug? Is
there a known workaround? I have many large datasets.
Also, it takes a long time to expand the dataset name link. (My import
experiment is a data tree of about a thousand files.) Is this a known bug?
Thanks!
Ted
___
Please keep all replies on the list by using reply all
in your mail client. To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
http://lists.bx.psu.edu/