Thomas Poepping created HIVE-22928:
--------------------------------------

             Summary: Allow hive.exec.stagingdir to be a fully qualified directory name
                 Key: HIVE-22928
                 URL: https://issues.apache.org/jira/browse/HIVE-22928
             Project: Hive
          Issue Type: Improvement
          Components: Configuration, Hive
    Affects Versions: 3.1.2
            Reporter: Thomas Poepping
            Assignee: Thomas Poepping


Currently, {{hive.exec.stagingdir}} can only be set as a relative directory 
name that, for operations like {{insert}} or {{insert overwrite}}, will be 
placed either under the table directory or the partition directory. 

For cases where an HDFS cluster is small but the data being inserted is very 
large (greater than the capacity of the HDFS cluster, as mentioned in a comment 
by [~ashutoshc] on [HIVE-14270]), the client may want to set the staging 
directory to an explicit blobstore path (or any filesystem path), rather than 
relying on Hive to build the blobstore path from its own interpretation of the 
job. We may lose locality guarantees, but because renames on blobstores are 
equally expensive regardless of the prefix, this isn't considered a large loss 
(assuming only blobstore customers use this functionality).

Note that {{hive.blobstore.use.blobstore.as.scratchdir}} doesn't suffice in 
this case, because the staging directory is not the same as the scratch 
directory.

This commit enables Hive customers to set an absolute location for all staging 
directories. For instances where the configured stagingdir scheme is not the 
same as the scheme for the table location, the default stagingdir configuration 
is used. This avoids a cross-filesystem rename, which is impossible anyway.
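
The fallback rule described above can be sketched roughly as follows. This is an illustrative Python sketch, not the actual patch: the function name {{resolve_staging_dir}} is hypothetical, {{.hive-staging}} is assumed to be the default value of {{hive.exec.stagingdir}}, and any value without a scheme is treated as a relative directory name.

```python
from urllib.parse import urlparse

# Assumed default value of hive.exec.stagingdir (a relative directory name).
DEFAULT_STAGING_DIR = ".hive-staging"


def resolve_staging_dir(configured: str, table_location: str) -> str:
    """Hypothetical helper: pick the staging directory for a table location.

    - A value without a scheme is treated as a relative name and placed
      under the table (or partition) directory.
    - A fully qualified value is used as-is only when its scheme matches
      the scheme of the table location.
    - Otherwise the default relative name is used, so the final rename
      never has to cross filesystems.
    """
    base = table_location.rstrip("/")
    configured_scheme = urlparse(configured).scheme
    if not configured_scheme:
        # Existing behaviour: relative name under the table directory.
        return base + "/" + configured
    if configured_scheme == urlparse(table_location).scheme:
        # Same scheme: honour the explicit, fully qualified location.
        return configured
    # Scheme mismatch: fall back to the default staging directory.
    return base + "/" + DEFAULT_STAGING_DIR
```

For example, with a table at {{s3a://bucket/warehouse/t}}, a configured value of {{s3a://bucket/staging}} would be used directly, while the same value for a table on {{hdfs://}} would fall back to a {{.hive-staging}} directory under the table location.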



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
