[jira] [Updated] (HIVE-22928) Allow hive.exec.stagingdir to be a fully qualified directory name

2020-06-17 Thread Thomas Poepping (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Poepping updated HIVE-22928:
---
Status: Patch Available  (was: In Progress)

Resubmitting .5.patch as .6.patch; hopefully the PreCommit job picks it up this 
time.

> Allow hive.exec.stagingdir to be a fully qualified directory name
> -----------------------------------------------------------------
>
>                 Key: HIVE-22928
>                 URL: https://issues.apache.org/jira/browse/HIVE-22928
>             Project: Hive
>          Issue Type: Improvement
>          Components: Configuration, Hive
>    Affects Versions: 3.1.2
>            Reporter: Thomas Poepping
>            Assignee: Thomas Poepping
>            Priority: Minor
>         Attachments: HIVE-22928.2.patch, HIVE-22928.3.patch, 
> HIVE-22928.4.patch, HIVE-22928.5.patch, HIVE-22928.6.patch, HIVE-22928.patch
>
>
> Currently, {{hive.exec.stagingdir}} can only be set as a relative directory 
> name that, for operations like {{insert}} or {{insert overwrite}}, will be 
> placed either under the table directory or the partition directory. 
> For cases where an HDFS cluster is small but the data being inserted is very 
> large (greater than the capacity of the HDFS cluster, as mentioned in a 
> comment by [~ashutoshc] on [HIVE-14270]), the client may want to set their 
> staging directory to be an explicit blobstore path (or any filesystem path), 
> rather than relying on Hive to intelligently build the blobstore path based 
> on an interpretation of the job. We may lose locality guarantees, but because 
> renames are just as expensive on blobstores no matter what the prefix is, 
> this isn't considered a terribly large loss (assuming only blobstore 
> customers use this functionality).
> Note that {{hive.blobstore.use.blobstore.as.scratchdir}} doesn't actually 
> suffice in this case, as the staging directory is not the same as the 
> scratch directory.
> This commit enables Hive customers to set an absolute location for all 
> staging directories. Where the scheme of the configured stagingdir differs 
> from the scheme of the table location, the default stagingdir 
> configuration is used instead. This avoids a cross-filesystem rename, 
> which is impossible anyway.
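A minimal sketch of the resolution rule described above, written in Java with plain java.net.URI standing in for Hadoop's Path and FileSystem classes so it runs without Hadoop. It assumes that a "fully qualified" value is one that carries a URI scheme, and that the fallback "default stagingdir configuration" is Hive's default relative name .hive-staging. The class and method names (StagingDirSketch, resolveStagingDir, childOf) and the example paths are hypothetical; the sketch ignores any per-query suffix Hive may append to the staging directory name, and it is not the code from the attached patches.

```java
import java.net.URI;

// Hypothetical sketch of the HIVE-22928 staging-directory rule; not Hive's actual code.
// java.net.URI stands in for Hadoop's Path/FileSystem so the example runs without Hadoop.
class StagingDirSketch {

    // ASSUMPTION: the fallback "default stagingdir configuration" is Hive's default
    // relative value for hive.exec.stagingdir, ".hive-staging".
    static final String DEFAULT_STAGING_DIR = ".hive-staging";

    /**
     * Picks the staging directory for a write into a table or partition.
     *
     * @param configured     value of hive.exec.stagingdir
     * @param targetLocation fully qualified table or partition location
     */
    static URI resolveStagingDir(String configured, URI targetLocation) {
        URI configuredUri = URI.create(configured);
        // ASSUMPTION: only a value with a scheme (e.g. "s3a://...") counts as fully
        // qualified; anything else is treated as a relative directory name.
        if (configuredUri.isAbsolute()) {
            // Same scheme: moving the staged files stays a same-filesystem rename.
            if (configuredUri.getScheme().equalsIgnoreCase(targetLocation.getScheme())) {
                return configuredUri;
            }
            // Different scheme: a cross-filesystem rename is impossible, so fall back
            // to the default relative staging dir under the target location.
            return childOf(targetLocation, DEFAULT_STAGING_DIR);
        }
        // Relative name (the pre-existing behavior): place it under the target location.
        return childOf(targetLocation, configured);
    }

    // Appends a relative path to a directory URI, adding a "/" separator if needed.
    static URI childOf(URI parent, String child) {
        String base = parent.toString();
        return URI.create(base.endsWith("/") ? base + child : base + "/" + child);
    }

    public static void main(String[] args) {
        URI partition = URI.create("s3a://example-bucket/warehouse/sales/ds=2020-06-17");
        System.out.println(resolveStagingDir(".hive-staging", partition));
        System.out.println(resolveStagingDir("s3a://example-bucket/tmp/staging", partition));
        System.out.println(resolveStagingDir("hdfs://namenode:8020/tmp/staging", partition));
    }
}
```

Run as-is, main prints the partition-local .hive-staging path for the relative value, the s3a:// staging path unchanged for the same-scheme value, and the partition-local fallback again for the hdfs:// value, since moving files from HDFS into S3 could not be done as a rename.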



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22928) Allow hive.exec.stagingdir to be a fully qualified directory name

2020-06-17 Thread Thomas Poepping (Jira)


Thomas Poepping updated HIVE-22928:
---
Status: In Progress  (was: Patch Available)


[jira] [Updated] (HIVE-22928) Allow hive.exec.stagingdir to be a fully qualified directory name

2020-06-17 Thread Thomas Poepping (Jira)


Thomas Poepping updated HIVE-22928:
---
Attachment: HIVE-22928.6.patch


[jira] [Updated] (HIVE-22928) Allow hive.exec.stagingdir to be a fully qualified directory name

2020-06-07 Thread Thomas Poepping (Jira)


Thomas Poepping updated HIVE-22928:
---
Status: Patch Available  (was: In Progress)

HIVE-22928.5.patch should fix the unit test issues.


[jira] [Updated] (HIVE-22928) Allow hive.exec.stagingdir to be a fully qualified directory name

2020-06-07 Thread Thomas Poepping (Jira)


Thomas Poepping updated HIVE-22928:
---
Status: In Progress  (was: Patch Available)


[jira] [Updated] (HIVE-22928) Allow hive.exec.stagingdir to be a fully qualified directory name

2020-06-06 Thread Thomas Poepping (Jira)


Thomas Poepping updated HIVE-22928:
---
Attachment: HIVE-22928.5.patch


[jira] [Updated] (HIVE-22928) Allow hive.exec.stagingdir to be a fully qualified directory name

2020-04-17 Thread Thomas Poepping (Jira)


Thomas Poepping updated HIVE-22928:
---
Attachment: HIVE-22928.4.patch


[jira] [Updated] (HIVE-22928) Allow hive.exec.stagingdir to be a fully qualified directory name

2020-03-25 Thread Thomas Poepping (Jira)


Thomas Poepping updated HIVE-22928:
---
Attachment: HIVE-22928.3.patch


[jira] [Updated] (HIVE-22928) Allow hive.exec.stagingdir to be a fully qualified directory name

2020-03-23 Thread Thomas Poepping (Jira)


Thomas Poepping updated HIVE-22928:
---
Attachment: HIVE-22928.2.patch


[jira] [Updated] (HIVE-22928) Allow hive.exec.stagingdir to be a fully qualified directory name

2020-03-16 Thread Thomas Poepping (Jira)


Thomas Poepping updated HIVE-22928:
---
Status: Patch Available  (was: Open)


[jira] [Updated] (HIVE-22928) Allow hive.exec.stagingdir to be a fully qualified directory name

2020-03-16 Thread Thomas Poepping (Jira)


Thomas Poepping updated HIVE-22928:
---
Status: In Progress  (was: Patch Available)


[jira] [Updated] (HIVE-22928) Allow hive.exec.stagingdir to be a fully qualified directory name

2020-03-09 Thread Thomas Poepping (Jira)


Thomas Poepping updated HIVE-22928:
---
Status: Patch Available  (was: Open)


[jira] [Updated] (HIVE-22928) Allow hive.exec.stagingdir to be a fully qualified directory name

2020-03-09 Thread Thomas Poepping (Jira)


Thomas Poepping updated HIVE-22928:
---
Status: Open  (was: Patch Available)

Resubmitting the patch to regenerate the ASF and checkstyle reports so I can fix the issues they flag.


[jira] [Updated] (HIVE-22928) Allow hive.exec.stagingdir to be a fully qualified directory name

2020-02-26 Thread Thomas Poepping (Jira)


Thomas Poepping updated HIVE-22928:
---
Status: Patch Available  (was: Open)

Added v1 patch. I couldn't test s3a:// in the getStagingDir unit test because the 
tests tried to actually access S3. Otherwise, all unit tests passed in my 
local dev environment.


[jira] [Updated] (HIVE-22928) Allow hive.exec.stagingdir to be a fully qualified directory name

2020-02-26 Thread Thomas Poepping (Jira)


Thomas Poepping updated HIVE-22928:
---
Attachment: HIVE-22928.patch
