Re: [galaxy-dev] inconsistent use of tempfile.mkstemp during upload causes problems

2014-06-10 Thread John Chilton
Hello All,

  Thanks for the well laid out e-mails and great discussion. I think
John-Paul's comment about the code growing up organically is probably
exactly right. (A link below has some details from Nate about this).

  So late last night I opened a sprawling pull request that cleaned up
a lot of stuff in upload.py and then realized it was a little bit
incorrect and wouldn't actually help any of you until you were able to
upgrade to the August 2014 release :) so I declined it. I have reworked
a relatively small patch to fix the immediate problem of tmpfile
consistency as it relates to shutil.move. John-Paul Robinson,
can you apply it directly and see if it fixes your problems?

https://bitbucket.org/jmchilton/galaxy-central-fork-1/commits/dc706d78d9b21a1175199fd9201fe9781d48ffb5/raw

  If it does, the devteam will get this merged and I will continue
with the upload.py improvements that were inspired by this discussion
(see https://bitbucket.org/galaxy/galaxy-central/pull-request/408 for
more details).

-John

On Mon, Jun 9, 2014 at 2:50 PM, John-Paul Robinson j...@uab.edu wrote:
 We've considered the sudo solution, but it opens the window to other
 bugs giving galaxy the power to change ownership of other files in our
 shared user cluster environment.  We could isolate the power to a script
 but then we still need to monitor this code closely.  We'd prefer not to
 introduce that requirement.

 I didn't have the time to trace this down either. ;)  I just got tired
 of this issue and the inconsistent failures causing confusion in our
 community.

 I hope your insight into the logic drift over time is accurate and can
 be corrected.  The upload code looks like it's gone through a whole lot
 of organic growth. :/

 Looking forward to additional comments from the dev team.

 ~jpr


 On 06/09/2014 03:30 PM, Kandalaft, Iyad wrote:
 Hi JPR,

 I had the same questions while trying to figure out a fool-proof way to 
 allow users to import files into galaxy on our Cluster.  I couldn't exactly 
 figure out, nor did I have the time to really review, why the galaxy code 
 did these steps and why that shutil.move failed.  I opted to simply insert 
 code in upload.py to sudo chown/chmod the files as an easier hack to 
 this problem.  There are pros and cons to using the tmp var from the env, 
 and it will depend on your intentions/infrastructure.  I think the ideology 
 was that the Galaxy folder was supposed to be shared across all nodes in a 
 cluster, and they opted to use the TMP path within the galaxy folder.  
 Over time, the code probably partially diverged from that notion, which 
 caused this dilemma.

 I believe that the best fix is to make the underlying code simply copy the 
 files into the environment-provided temp, which is configurable in galaxy's 
 universe_wsgi.ini, and assume ownership from the get-go.  This code of 
 copying and/or moving in discrete steps creates unnecessary complexity.

 Regards,
 Iyad

Re: [galaxy-dev] inconsistent use of tempfile.mkstemp during upload causes problems

2014-06-10 Thread John-Paul Robinson
Thanks for proposing a patch.

We're looking at trying it out on our code base but are at least a few
revisions behind and may need to do a little back porting (or catching up).

Let me make sure I understand the intention of the patch, though.


From the code changes, I take it the intention is that all temporary
files should be created in the Galaxy dataset directory.  That is, the
temp directory path will be explicitly derived from the output_path
that is passed as an argument to upload.py.

This is fine.

I'm somewhat partial to the environment influencing the TMP dir via an
unadorned mkstemp()  but I can appreciate having a definitive
output_path specified on the call to upload.py.
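If that reading is right, the patched helpers would presumably take the
target directory explicitly, along these lines (my own sketch of the
idea, not the actual diff; the function signature and prefix are
illustrative):

```python
import os
import tempfile

def convert_newlines(fname, tmp_dir=None):
    """Rewrite fname with Unix newlines via a temp file in tmp_dir.

    tmp_dir=None keeps the old behavior (honor $TMPDIR); upload.py
    would pass tmp_dir=os.path.dirname(output_path) so the temp file
    lands on the same file system as the dataset.
    """
    fd, temp_name = tempfile.mkstemp(prefix='gxupload_', dir=tmp_dir)
    with os.fdopen(fd, 'w', newline='\n') as out, open(fname, 'r') as inp:
        for line in inp:
            out.write(line)
    # same-filesystem move, so this stays a simple rename
    os.replace(temp_name, fname)
```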

One issue that will affect us slightly (but probably not others) is that
we currently have our dataset path on a glusterfs volume which doesn't
support ACLs.  This means we won't have a way of overriding umask
settings that may be in place for the user.  If a user with umask 022
writes a file to their import dir, the permissions will at best be
read-only for the galaxy process's group owner.  This may interfere with
the os.rename as well.  With a default ACL we could cause the umask to
be ignored in favor of a permissive ACL.  This is really a local deploy
issue but it's worth knowing about.

A second question: it looks like the patch only updates convert_newlines
and convert_newlines_sep2tabs.   The handle_compressed_file and sep2tabs
methods in sniff also have bare mkstemp() calls. Do they need
modification as well?

~jpr

On 06/10/2014 10:49 AM, John Chilton wrote:
 Hello All,

   Thanks for the well laid out e-mails and great discussion. I think
 John-Paul's comment about the code growing up organically is probably
 exactly right. (A link below has some details from Nate about this).

   So late last night I opened a sprawling pull request that cleaned up
 a lot of stuff in upload.py and then realized it was a little bit
 incorrect and wouldn't actually help any of you until you were able to
 upgrade to the August 2014 release :) so I declined it. I have reworked
 a relatively small patch to fix the immediate problem of tmpfile
 consistency as it relates to shutil.move. John-Paul Robinson,
 can you apply it directly and see if it fixes your problems?

 https://bitbucket.org/jmchilton/galaxy-central-fork-1/commits/dc706d78d9b21a1175199fd9201fe9781d48ffb5/raw

   If it does, the devteam will get this merged and I will continue
 with the upload.py improvements that were inspired by this discussion
 (see https://bitbucket.org/galaxy/galaxy-central/pull-request/408 for
 more details).

 -John

[galaxy-dev] inconsistent use of tempfile.mkstemp during upload causes problems

2014-06-09 Thread John-Paul Robinson
There appears to be some inconsistent use of tempfile.mkstemp() within
upload.py that causes problems when users import data files to galaxy
from a cluster directory via the upload process and import/temp/dataset
directories are on different file systems.

The issue manifests when Galaxy's job directory, dataset directory and
import directory are on different file systems (common for cluster
environments) in conjunction with a configuration where users can copy
their data files directly to the import directory from which Galaxy
selects data sets to upload (as opposed to using an FTP gateway).

While allowing users to copy files to an import directory rather than
using the FTP gateway may not be that common, we use this configuration
locally to help build a more seamless interface with our local
collection of HPC resources.  Users can be logged into their cluster
account and move data into galaxy with a file copy command rather than
having to use FTP.

This configuration has worked well in our environment as long as the
correct ownership configuration existed on the import directory and as
long as the import directory, job temporary directory, and galaxy data
set directory were all on the same file system.

We now have our galaxy dataset directory on a different file system and
are seeing inconsistent behavior during upload.py runs depending on
whether the data is ordinary text, BAM files, or gzipped data.

A subset of uploads will fail because of the way temporary files are
created by Galaxy to facilitate the import and any associated conversion
processes of different file types.

During the import,

1) Galaxy will copy the original file to a temporary target file
(converting as needed during the copy).
2) Once this first conversion step is complete, Galaxy then attempts to
move the temporary file back to the original location, i.e. the import
directory.
3) If this move succeeds, Galaxy completes the upload processing and
the data becomes a registered data set in the user's dataset collection.
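Roughly, the three steps amount to the following (my own sketch with an
invented helper name; the real logic is spread across upload.py and
sniff.py):

```python
import os
import shutil
import tempfile

def import_with_conversion(import_path, convert):
    """Sketch of the three-step upload flow (hypothetical helper)."""
    # 1) copy the original into a fresh temp file, converting on the way
    fd, tmp = tempfile.mkstemp()
    with os.fdopen(fd, 'wb') as out, open(import_path, 'rb') as src:
        out.write(convert(src.read()))
    # 2) move the converted temp file back over the original;
    #    this is where the cross-filesystem trouble starts
    shutil.move(tmp, import_path)
    # 3) the caller then registers import_path as the new dataset
    return import_path
```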

Galaxy prefers the Python shutil.move method to move the temporary
file.  This results in a simple os.rename if the files remain on the
same file system.  However, if os.rename raises OSError because a move
was attempted across a file system boundary, shutil.move resorts to
copy2, which copies the data to the original import file and then tries
to copy the file attributes (permissions and utimes) to the original
import file from the source file (which will be the temporary file
Galaxy created in step 1 to begin the conversion process).

The os.rename and shutil.copy2 behave (and fail) differently depending
on the file ownership of the original import file.  The os.rename will
succeed even if the Galaxy upload.py job process only maps to the
group-owner of the original import file (which can be ensured with group
sticky bit on the import dir or ACLs). The shutil.copy2 command,
however, will fail if the Galaxy upload.py job process UID is not the
user-owner of the original import file.

We could ensure the os.rename succeeds by keeping the job temporary
directory and the import directory on the same file system.  However, it
seems the temporary directories used by upload.py are inconsistent
across data types which prevents this simple fix from working for all
data types.

When text files are imported, upload calls the sniff.* methods to
perform conversion.  These methods use a bare call to tempfile.mkstemp(),
which creates the file in the directory given by the $TMPDIR environment
variable (falling back to the platform default). For example, in
sniff.convert_newlines:

fd, temp_name = tempfile.mkstemp()

https://bitbucket.org/galaxy/galaxy-central/src/5884328e91724e9bdf4b43f012eb63fa0c803ef6/lib/galaxy/datatypes/sniff.py?at=default#cl-105
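That environment sensitivity is easy to demonstrate (a standalone
snippet; the mkdtemp directory stands in for a job temp dir):

```python
import os
import tempfile

# tempfile.gettempdir() consults $TMPDIR (then $TEMP, $TMP, then a
# platform default such as /tmp) and caches the answer.
scratch = tempfile.mkdtemp()      # stand-in for a job temp directory
os.environ['TMPDIR'] = scratch
tempfile.tempdir = None           # drop the cached default
fd, name = tempfile.mkstemp()     # a bare call now lands in $TMPDIR
os.close(fd)
assert os.path.dirname(name) == scratch
```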

However, for compressed files, the upload.py script directly creates
temp files but here it specifies the target directory as the same as the
data set directory:

fd, uncompressed = tempfile.mkstemp(
prefix='data_id_%s_upload_gunzip_' % dataset.dataset_id,
dir=os.path.dirname( output_path ), text=False )

https://bitbucket.org/galaxy/galaxy-central/src/5884328e91724e9bdf4b43f012eb63fa0c803ef6/tools/data_source/upload.py?at=default#cl-130

It's not clear if there is any significance to using the data set
directory as the tempdir for compressed files versus the job temporary
directory for other data files.

It seems like all temporary files created by upload.py should be
consistently created in the same temporary location, and preferably in
the job temp directory.

Is there a reason that these file types use different temporary file
locations?

If they used the same tempfile location, we could use one consistent
system configuration and ensure all our data files can be imported even
when the import+tempdir are not on the same file system as the Galaxy
dataset dir.  It seems reasonable that all tempfile.mkstemp() calls
should be unadorned and inherit the temp directory location from their
environment.

A more 

Re: [galaxy-dev] inconsistent use of tempfile.mkstemp during upload causes problems

2014-06-09 Thread Curtis Hendrickson (Campus)
Iyad, 

Thanks for chiming in. We looked at the patch you had posted, but we run on a 
cluster galaxy shares with other apps, so the galaxy user does not have sudo 
outside of the galaxy head-node. 
The upload.py code you posted therefore does not work for us.
We're also looking at a chown hook in
lib/galaxy/tools/actions/upload_common.py, but find that code a little
more inscrutable (see the earlier thread with you on this subject):

http://dev.list.galaxyproject.org/Uploading-files-to-galaxy-from-a-folder-tt4664614.html#a4664669
However, both chown hooks seem too heavy-handed, like a workaround or
band-aid on an underlying problem. I hate to use elevated privileges (with
the security risks that brings) if we don't really have to. That being said,
we need to change something in the short term so things work, be it the
filesystem layout, the upload.py workflow, or a chown in upload_common.py.

Regards, 
Curtis



-Original Message-
From: galaxy-dev-boun...@lists.bx.psu.edu 
[mailto:galaxy-dev-boun...@lists.bx.psu.edu] On Behalf Of Kandalaft, Iyad
Sent: Monday, June 09, 2014 2:30 PM
To: John-Paul Robinson (Campus); galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] inconsistent use of tempfile.mkstemp during upload 
causes problems

Hi JPR,

I had the same questions while trying to figure out a fool-proof way to allow 
users to import files into galaxy on our Cluster.  I couldn't exactly figure 
out, nor did I have the time to really review, why the galaxy code did these 
steps and why that shutil.move failed.  I opted to simply insert code in 
upload.py to sudo chown/chmod the files as an easier hack to this problem.  
There are pros and cons to using the tmp var from the env, and it will depend 
on your intentions/infrastructure.  I think the ideology was that the Galaxy 
folder was supposed to be shared across all nodes in a cluster, and they opted 
to use the TMP path within the galaxy folder.  Over time, the code probably 
partially diverged from that notion, which caused this dilemma.

I believe that the best fix is to make the underlying code simply copy the 
files into the environment-provided temp, which is configurable in galaxy's 
universe_wsgi.ini, and assume ownership from the get-go.  This code of copying 
and/or moving in discrete steps creates unnecessary complexity. 

Regards,
Iyad


Re: [galaxy-dev] inconsistent use of tempfile.mkstemp during upload causes problems

2014-06-09 Thread John-Paul Robinson
We've considered the sudo solution, but it opens the window to other
bugs giving galaxy the power to change ownership of other files in our
shared user cluster environment.  We could isolate the power to a script
but then we still need to monitor this code closely.  We'd prefer not to
introduce that requirement.

I didn't have the time to trace this down either. ;)  I just got tired
of this issue and the inconsistent failures causing confusion in our
community.

I hope your insight into the logic drift over time is accurate and can
be corrected.  The upload code looks like it's gone through a whole lot
of organic growth. :/

Looking forward to additional comments from the dev team.

~jpr


On 06/09/2014 03:30 PM, Kandalaft, Iyad wrote:
 Hi JPR,

 I had the same questions while trying to figure out a fool-proof way to allow 
 users to import files into galaxy on our Cluster.  I couldn't exactly figure 
 out, nor did I have the time to really review, why the galaxy code did these 
 steps and why that shutil.move failed.  I opted to simply insert code in 
 upload.py to sudo chown/chmod the files as an easier hack to this 
 problem.  There are pros and cons to using the tmp var from the env, and it 
 will depend on your intentions/infrastructure.  I think the ideology was that 
 the Galaxy folder was supposed to be shared across all nodes in a cluster, 
 and they opted to use the TMP path within the galaxy folder.  Over time, the 
 code probably partially diverged from that notion, which caused this dilemma.

 I believe that the best fix is to make the underlying code simply copy the 
 files into the environment-provided temp, which is configurable in galaxy's 
 universe_wsgi.ini, and assume ownership from the get-go.  This code of 
 copying and/or moving in discrete steps creates unnecessary complexity. 

 Regards,
 Iyad
