[jira] [Updated] (HIVE-22928) Allow hive.exec.stagingdir to be a fully qualified directory name

2020-06-17 Thread Thomas Poepping (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Poepping updated HIVE-22928:
---
Status: Patch Available  (was: In Progress)

Resubmitting .5.patch as .6.patch; hopefully the PreCommit job picks it up this
time.

> Allow hive.exec.stagingdir to be a fully qualified directory name
> -
>
> Key: HIVE-22928
> URL: https://issues.apache.org/jira/browse/HIVE-22928
> Project: Hive
>  Issue Type: Improvement
>  Components: Configuration, Hive
>Affects Versions: 3.1.2
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
>Priority: Minor
> Attachments: HIVE-22928.2.patch, HIVE-22928.3.patch, 
> HIVE-22928.4.patch, HIVE-22928.5.patch, HIVE-22928.6.patch, HIVE-22928.patch
>
>
> Currently, {{hive.exec.stagingdir}} can only be set as a relative directory 
> name that, for operations like {{insert}} or {{insert overwrite}}, will be 
> placed either under the table directory or the partition directory. 
> For cases where an HDFS cluster is small but the data being inserted is very 
> large (greater than the capacity of the HDFS cluster, as mentioned in a 
> comment by [~ashutoshc] on [HIVE-14270]), the client may want to set their 
> staging directory to be an explicit blobstore path (or any filesystem path), 
> rather than relying on Hive to intelligently build the blobstore path based 
> on an interpretation of the job. We may lose locality guarantees, but because 
> renames are just as expensive on blobstores no matter what the prefix is, 
> this isn't considered a terribly large loss (assuming only blobstore 
> customers use this functionality).
> Note that {{hive.blobstore.use.blobstore.as.scratchdir}} doesn't actually 
> suffice in this case, as the stagingdir is not the same.
> This commit enables Hive customers to set an absolute location for all 
> staging directories. For instances where the configured stagingdir scheme is 
> not the same as the scheme for the table location, the default stagingdir 
> configuration is used. This avoids a cross-filesystem rename, which is 
> impossible anyway.
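
The fallback rule described above (honor the configured absolute staging directory only when its filesystem scheme matches the table location's scheme) could be sketched roughly as follows. This is a hypothetical illustration, not the actual patch code; the class, method, and paths are invented:

```java
import java.net.URI;

public class StagingDirResolver {

    // Hypothetical sketch of the behavior described in the issue:
    // an absolute staging dir is honored only when its scheme matches
    // the table location's scheme; otherwise fall back to the default
    // (relative) staging dir, since a cross-filesystem rename is
    // impossible anyway.
    static String resolveStagingDir(String configuredStagingDir,
                                    String tableLocation,
                                    String defaultStagingDir) {
        URI staging = URI.create(configuredStagingDir);
        URI table = URI.create(tableLocation);
        if (staging.getScheme() != null
                && staging.getScheme().equals(table.getScheme())) {
            return configuredStagingDir;  // same filesystem: use it
        }
        return defaultStagingDir;         // different filesystem: fall back
    }

    public static void main(String[] args) {
        // Same scheme (both s3a): the explicit blobstore staging dir wins.
        System.out.println(resolveStagingDir(
                "s3a://bucket/staging", "s3a://bucket/warehouse/t", ".hive-staging"));
        // Different schemes (s3a vs hdfs): fall back to the default.
        System.out.println(resolveStagingDir(
                "s3a://bucket/staging", "hdfs://nn:8020/warehouse/t", ".hive-staging"));
    }
}
```

In a session this would correspond to something like `SET hive.exec.stagingdir=s3a://bucket/staging;` (bucket name hypothetical) before running the insert.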



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22928) Allow hive.exec.stagingdir to be a fully qualified directory name

2020-06-17 Thread Thomas Poepping (Jira)



Thomas Poepping updated HIVE-22928:
---
Status: In Progress  (was: Patch Available)



[jira] [Updated] (HIVE-22928) Allow hive.exec.stagingdir to be a fully qualified directory name

2020-06-17 Thread Thomas Poepping (Jira)



Thomas Poepping updated HIVE-22928:
---
Attachment: HIVE-22928.6.patch



[jira] [Commented] (HIVE-22928) Allow hive.exec.stagingdir to be a fully qualified directory name

2020-06-08 Thread Thomas Poepping (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128816#comment-17128816
 ] 

Thomas Poepping commented on HIVE-22928:


I've been watching the PreCommit Jenkins job, and it doesn't look like it has
picked up my patch yet. Any tips for getting Jenkins to pick up my diff?



[jira] [Updated] (HIVE-22928) Allow hive.exec.stagingdir to be a fully qualified directory name

2020-06-07 Thread Thomas Poepping (Jira)



Thomas Poepping updated HIVE-22928:
---
Status: Patch Available  (was: In Progress)

Patch .5 should fix the unit test issues.



[jira] [Updated] (HIVE-22928) Allow hive.exec.stagingdir to be a fully qualified directory name

2020-06-07 Thread Thomas Poepping (Jira)



Thomas Poepping updated HIVE-22928:
---
Status: In Progress  (was: Patch Available)



[jira] [Updated] (HIVE-22928) Allow hive.exec.stagingdir to be a fully qualified directory name

2020-06-06 Thread Thomas Poepping (Jira)



Thomas Poepping updated HIVE-22928:
---
Attachment: HIVE-22928.5.patch



[jira] [Updated] (HIVE-22928) Allow hive.exec.stagingdir to be a fully qualified directory name

2020-04-17 Thread Thomas Poepping (Jira)



Thomas Poepping updated HIVE-22928:
---
Attachment: HIVE-22928.4.patch



[jira] [Commented] (HIVE-22928) Allow hive.exec.stagingdir to be a fully qualified directory name

2020-03-25 Thread Thomas Poepping (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17067237#comment-17067237
 ] 

Thomas Poepping commented on HIVE-22928:


Attached a new patch: .3.

This patch should address the failing test and will hopefully address the
checkstyle failures as well. The rest of TestContext.java is indented with four
spaces but isn't throwing style errors; my IDE (IntelliJ) automatically conforms
to the pre-existing file style despite the project-wide style. I've changed
IntelliJ to use the tab character instead of spaces in the hope that this will
placate the style checker.



[jira] [Updated] (HIVE-22928) Allow hive.exec.stagingdir to be a fully qualified directory name

2020-03-25 Thread Thomas Poepping (Jira)



Thomas Poepping updated HIVE-22928:
---
Attachment: HIVE-22928.3.patch



[jira] [Commented] (HIVE-22928) Allow hive.exec.stagingdir to be a fully qualified directory name

2020-03-23 Thread Thomas Poepping (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17065115#comment-17065115
 ] 

Thomas Poepping commented on HIVE-22928:


Thanks [~dkuzmenko], I'll give that a shot!



[jira] [Updated] (HIVE-22928) Allow hive.exec.stagingdir to be a fully qualified directory name

2020-03-23 Thread Thomas Poepping (Jira)



Thomas Poepping updated HIVE-22928:
---
Attachment: HIVE-22928.2.patch



[jira] [Work stopped] (HIVE-22928) Allow hive.exec.stagingdir to be a fully qualified directory name

2020-03-16 Thread Thomas Poepping (Jira)



Work on HIVE-22928 stopped by Thomas Poepping.
--


[jira] [Updated] (HIVE-22928) Allow hive.exec.stagingdir to be a fully qualified directory name

2020-03-16 Thread Thomas Poepping (Jira)



Thomas Poepping updated HIVE-22928:
---
Status: Patch Available  (was: Open)



[jira] [Updated] (HIVE-22928) Allow hive.exec.stagingdir to be a fully qualified directory name

2020-03-16 Thread Thomas Poepping (Jira)



Thomas Poepping updated HIVE-22928:
---
Status: In Progress  (was: Patch Available)



[jira] [Updated] (HIVE-22928) Allow hive.exec.stagingdir to be a fully qualified directory name

2020-03-09 Thread Thomas Poepping (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Poepping updated HIVE-22928:
---
Status: Patch Available  (was: Open)

> Allow hive.exec.stagingdir to be a fully qualified directory name
> -
>
> Key: HIVE-22928
> URL: https://issues.apache.org/jira/browse/HIVE-22928
> Project: Hive
>  Issue Type: Improvement
>  Components: Configuration, Hive
>Affects Versions: 3.1.2
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
>Priority: Minor
> Attachments: HIVE-22928.patch
>
>
> Currently, {{hive.exec.stagingdir}} can only be set as a relative directory 
> name that, for operations like {{insert}} or {{insert overwrite}}, will be 
> placed either under the table directory or the partition directory. 
> For cases where an HDFS cluster is small but the data being inserted is very 
> large (greater than the capacity of the HDFS cluster, as mentioned in a 
> comment by [~ashutoshc] on [HIVE-14270]), the client may want to set their 
> staging directory to be an explicit blobstore path (or any filesystem path), 
> rather than relying on Hive to intelligently build the blobstore path based 
> on an interpretation of the job. We may lose locality guarantees, but because 
> renames are just as expensive on blobstores no matter what the prefix is, 
> this isn't considered a terribly large loss (assuming only blobstore 
> customers use this functionality).
> Note that {{hive.blobstore.use.blobstore.as.scratchdir}} doesn't actually 
> suffice in this case, as the stagingdir is not the same.
> This commit enables Hive customers to set an absolute location for all 
> staging directories. For instances where the configured stagingdir scheme is 
> not the same as the scheme for the table location, the default stagingdir 
> configuration is used. This avoids a cross-filesystem rename, which is 
> impossible anyway.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22928) Allow hive.exec.stagingdir to be a fully qualified directory name

2020-03-09 Thread Thomas Poepping (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Poepping updated HIVE-22928:
---
Status: Open  (was: Patch Available)

resubmitting patch to regen the ASF and checkstyle reports so I can fix them.

> Allow hive.exec.stagingdir to be a fully qualified directory name
> -
>
> Key: HIVE-22928
> URL: https://issues.apache.org/jira/browse/HIVE-22928
> Project: Hive
>  Issue Type: Improvement
>  Components: Configuration, Hive
>Affects Versions: 3.1.2
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
>Priority: Minor
> Attachments: HIVE-22928.patch
>
>
> Currently, {{hive.exec.stagingdir}} can only be set as a relative directory 
> name that, for operations like {{insert}} or {{insert overwrite}}, will be 
> placed either under the table directory or the partition directory. 
> For cases where an HDFS cluster is small but the data being inserted is very 
> large (greater than the capacity of the HDFS cluster, as mentioned in a 
> comment by [~ashutoshc] on [HIVE-14270]), the client may want to set their 
> staging directory to be an explicit blobstore path (or any filesystem path), 
> rather than relying on Hive to intelligently build the blobstore path based 
> on an interpretation of the job. We may lose locality guarantees, but because 
> renames are just as expensive on blobstores no matter what the prefix is, 
> this isn't considered a terribly large loss (assuming only blobstore 
> customers use this functionality).
> Note that {{hive.blobstore.use.blobstore.as.scratchdir}} doesn't actually 
> suffice in this case, as the stagingdir is not the same.
> This commit enables Hive customers to set an absolute location for all 
> staging directories. For instances where the configured stagingdir scheme is 
> not the same as the scheme for the table location, the default stagingdir 
> configuration is used. This avoids a cross-filesystem rename, which is 
> impossible anyway.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22928) Allow hive.exec.stagingdir to be a fully qualified directory name

2020-02-26 Thread Thomas Poepping (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Poepping updated HIVE-22928:
---
Status: Patch Available  (was: Open)

Added v1 patch. Couldn't test s3a:// in the getStagingDir unit test because the 
tests were trying to actually access S3. All unit tests passed otherwise in my 
local dev environment.
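For context, a hypothetical sketch of the configuration this patch enables (bucket, path, and table names below are placeholders, not from the patch itself):

{code}
-- Hypothetical example: stage intermediate data in S3 rather than under the
-- table or partition directory (bucket and path are placeholders).
SET hive.exec.stagingdir=s3a://my-staging-bucket/tmp/hive-staging;

-- Inserts into tables whose location shares the s3a:// scheme stage under the
-- configured path. For tables on a different scheme (e.g. hdfs://), Hive falls
-- back to the default relative stagingdir, avoiding an impossible
-- cross-filesystem rename.
INSERT OVERWRITE TABLE my_table PARTITION (ds='2020-02-26')
SELECT key, value FROM src;
{code}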

> Allow hive.exec.stagingdir to be a fully qualified directory name
> -
>
> Key: HIVE-22928
> URL: https://issues.apache.org/jira/browse/HIVE-22928
> Project: Hive
>  Issue Type: Improvement
>  Components: Configuration, Hive
>Affects Versions: 3.1.2
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
>Priority: Minor
> Attachments: HIVE-22928.patch
>
>
> Currently, {{hive.exec.stagingdir}} can only be set as a relative directory 
> name that, for operations like {{insert}} or {{insert overwrite}}, will be 
> placed either under the table directory or the partition directory. 
> For cases where an HDFS cluster is small but the data being inserted is very 
> large (greater than the capacity of the HDFS cluster, as mentioned in a 
> comment by [~ashutoshc] on [HIVE-14270]), the client may want to set their 
> staging directory to be an explicit blobstore path (or any filesystem path), 
> rather than relying on Hive to intelligently build the blobstore path based 
> on an interpretation of the job. We may lose locality guarantees, but because 
> renames are just as expensive on blobstores no matter what the prefix is, 
> this isn't considered a terribly large loss (assuming only blobstore 
> customers use this functionality).
> Note that {{hive.blobstore.use.blobstore.as.scratchdir}} doesn't actually 
> suffice in this case, as the stagingdir is not the same.
> This commit enables Hive customers to set an absolute location for all 
> staging directories. For instances where the configured stagingdir scheme is 
> not the same as the scheme for the table location, the default stagingdir 
> configuration is used. This avoids a cross-filesystem rename, which is 
> impossible anyway.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22928) Allow hive.exec.stagingdir to be a fully qualified directory name

2020-02-26 Thread Thomas Poepping (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Poepping updated HIVE-22928:
---
Attachment: HIVE-22928.patch

> Allow hive.exec.stagingdir to be a fully qualified directory name
> -
>
> Key: HIVE-22928
> URL: https://issues.apache.org/jira/browse/HIVE-22928
> Project: Hive
>  Issue Type: Improvement
>  Components: Configuration, Hive
>Affects Versions: 3.1.2
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
>Priority: Minor
> Attachments: HIVE-22928.patch
>
>
> Currently, {{hive.exec.stagingdir}} can only be set as a relative directory 
> name that, for operations like {{insert}} or {{insert overwrite}}, will be 
> placed either under the table directory or the partition directory. 
> For cases where an HDFS cluster is small but the data being inserted is very 
> large (greater than the capacity of the HDFS cluster, as mentioned in a 
> comment by [~ashutoshc] on [HIVE-14270]), the client may want to set their 
> staging directory to be an explicit blobstore path (or any filesystem path), 
> rather than relying on Hive to intelligently build the blobstore path based 
> on an interpretation of the job. We may lose locality guarantees, but because 
> renames are just as expensive on blobstores no matter what the prefix is, 
> this isn't considered a terribly large loss (assuming only blobstore 
> customers use this functionality).
> Note that {{hive.blobstore.use.blobstore.as.scratchdir}} doesn't actually 
> suffice in this case, as the stagingdir is not the same.
> This commit enables Hive customers to set an absolute location for all 
> staging directories. For instances where the configured stagingdir scheme is 
> not the same as the scheme for the table location, the default stagingdir 
> configuration is used. This avoids a cross-filesystem rename, which is 
> impossible anyway.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22928) Allow hive.exec.stagingdir to be a fully qualified directory name

2020-02-25 Thread Thomas Poepping (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17045061#comment-17045061
 ] 

Thomas Poepping commented on HIVE-22928:


Also, it's been a _really_ long time since I've contributed. Are we still doing 
patch files? GitHub pull requests?

> Allow hive.exec.stagingdir to be a fully qualified directory name
> -
>
> Key: HIVE-22928
> URL: https://issues.apache.org/jira/browse/HIVE-22928
> Project: Hive
>  Issue Type: Improvement
>  Components: Configuration, Hive
>Affects Versions: 3.1.2
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
>Priority: Minor
>
> Currently, {{hive.exec.stagingdir}} can only be set as a relative directory 
> name that, for operations like {{insert}} or {{insert overwrite}}, will be 
> placed either under the table directory or the partition directory. 
> For cases where an HDFS cluster is small but the data being inserted is very 
> large (greater than the capacity of the HDFS cluster, as mentioned in a 
> comment by [~ashutoshc] on [HIVE-14270]), the client may want to set their 
> staging directory to be an explicit blobstore path (or any filesystem path), 
> rather than relying on Hive to intelligently build the blobstore path based 
> on an interpretation of the job. We may lose locality guarantees, but because 
> renames are just as expensive on blobstores no matter what the prefix is, 
> this isn't considered a terribly large loss (assuming only blobstore 
> customers use this functionality).
> Note that {{hive.blobstore.use.blobstore.as.scratchdir}} doesn't actually 
> suffice in this case, as the stagingdir is not the same.
> This commit enables Hive customers to set an absolute location for all 
> staging directories. For instances where the configured stagingdir scheme is 
> not the same as the scheme for the table location, the default stagingdir 
> configuration is used. This avoids a cross-filesystem rename, which is 
> impossible anyway.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-22928) Allow hive.exec.stagingdir to be a fully qualified directory name

2020-02-25 Thread Thomas Poepping (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Poepping reassigned HIVE-22928:
--


> Allow hive.exec.stagingdir to be a fully qualified directory name
> -
>
> Key: HIVE-22928
> URL: https://issues.apache.org/jira/browse/HIVE-22928
> Project: Hive
>  Issue Type: Improvement
>  Components: Configuration, Hive
>Affects Versions: 3.1.2
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
>Priority: Minor
>
> Currently, {{hive.exec.stagingdir}} can only be set as a relative directory 
> name that, for operations like {{insert}} or {{insert overwrite}}, will be 
> placed either under the table directory or the partition directory. 
> For cases where an HDFS cluster is small but the data being inserted is very 
> large (greater than the capacity of the HDFS cluster, as mentioned in a 
> comment by [~ashutoshc] on [HIVE-14270]), the client may want to set their 
> staging directory to be an explicit blobstore path (or any filesystem path), 
> rather than relying on Hive to intelligently build the blobstore path based 
> on an interpretation of the job. We may lose locality guarantees, but because 
> renames are just as expensive on blobstores no matter what the prefix is, 
> this isn't considered a terribly large loss (assuming only blobstore 
> customers use this functionality).
> Note that {{hive.blobstore.use.blobstore.as.scratchdir}} doesn't actually 
> suffice in this case, as the stagingdir is not the same.
> This commit enables Hive customers to set an absolute location for all 
> staging directories. For instances where the configured stagingdir scheme is 
> not the same as the scheme for the table location, the default stagingdir 
> configuration is used. This avoids a cross-filesystem rename, which is 
> impossible anyway.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-15618) Change hive-blobstore tests to run with Tez by default

2017-09-20 Thread Thomas Poepping (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Poepping updated HIVE-15618:
---
Status: Open  (was: Patch Available)

> Change hive-blobstore tests to run with Tez by default
> --
>
> Key: HIVE-15618
> URL: https://issues.apache.org/jira/browse/HIVE-15618
> Project: Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Affects Versions: 2.2.0
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
> Attachments: HIVE-15618.patch
>
>
> Ever since the upgrade to Hive 2, Tez has been the default execution engine 
> for Hive. To match that fact, it makes sense to run our tests against Tez, 
> rather than MR. This should more fully validate functionality against what we 
> consider to be Hive defaults.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-15618) Change hive-blobstore tests to run with Tez by default

2017-09-20 Thread Thomas Poepping (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Poepping updated HIVE-15618:
---
Status: Patch Available  (was: Open)

> Change hive-blobstore tests to run with Tez by default
> --
>
> Key: HIVE-15618
> URL: https://issues.apache.org/jira/browse/HIVE-15618
> Project: Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Affects Versions: 2.2.0
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
> Attachments: HIVE-15618.patch
>
>
> Ever since the upgrade to Hive 2, Tez has been the default execution engine 
> for Hive. To match that fact, it makes sense to run our tests against Tez, 
> rather than MR. This should more fully validate functionality against what we 
> consider to be Hive defaults.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16288) Add blobstore tests for ORC and RCFILE file formats

2017-06-14 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16049527#comment-16049527
 ] 

Thomas Poepping commented on HIVE-16288:


I hate to bump this again, but it looks like branch-2 is still wrong. Can you 
help again, [~ashutoshc]? Thank you!

> Add blobstore tests for ORC and RCFILE file formats
> ---
>
> Key: HIVE-16288
> URL: https://issues.apache.org/jira/browse/HIVE-16288
> Project: Hive
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.1.1
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
> Fix For: 2.3.0, 3.0.0
>
> Attachments: HIVE-16288.patch
>
>
> This patch adds four tests each for ORC and RCFILE when running against 
> blobstore filesystems:
>   * Test for bucketed tables
>   * Test for nonpartitioned tables
>   * Test for partitioned tables
>   * Test for partitioned tables with nonstandard partition locations



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16288) Add blobstore tests for ORC and RCFILE file formats

2017-04-25 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15983274#comment-15983274
 ] 

Thomas Poepping commented on HIVE-16288:


Hey [~ashutoshc], it looks like the patch may have applied incorrectly; can you 
take a look for us?

> Add blobstore tests for ORC and RCFILE file formats
> ---
>
> Key: HIVE-16288
> URL: https://issues.apache.org/jira/browse/HIVE-16288
> Project: Hive
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.1.1
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
> Fix For: 2.3.0, 3.0.0
>
> Attachments: HIVE-16288.patch
>
>
> This patch adds four tests each for ORC and RCFILE when running against 
> blobstore filesystems:
>   * Test for bucketed tables
>   * Test for nonpartitioned tables
>   * Test for partitioned tables
>   * Test for partitioned tables with nonstandard partition locations



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16415) Add tests covering single inserts of zero rows

2017-04-14 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15969361#comment-15969361
 ] 

Thomas Poepping commented on HIVE-16415:


I haven't seen any conversation on the dev list about moving 2.2 anytime soon, 
so I don't see why we shouldn't target that.

> Add tests covering single inserts of zero rows
> --
>
> Key: HIVE-16415
> URL: https://issues.apache.org/jira/browse/HIVE-16415
> Project: Hive
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.1.1
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
> Fix For: 2.3.0, 3.0.0
>
> Attachments: HIVE-16415.01.patch, HIVE-16415.patch
>
>
> This patch introduces two regression tests into the hive-blobstore qtest 
> module: zero_rows_hdfs.q and zero_rows_blobstore.q. These test doing INSERT 
> commands with a WHERE clause where the condition of the WHERE clause causes 
> zero rows to be considered.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16415) Add tests covering single inserts of zero rows

2017-04-14 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15969354#comment-15969354
 ] 

Thomas Poepping commented on HIVE-16415:


Thank you Sergio!

> Add tests covering single inserts of zero rows
> --
>
> Key: HIVE-16415
> URL: https://issues.apache.org/jira/browse/HIVE-16415
> Project: Hive
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.1.1
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
> Attachments: HIVE-16415.01.patch, HIVE-16415.patch
>
>
> This patch introduces two regression tests into the hive-blobstore qtest 
> module: zero_rows_hdfs.q and zero_rows_blobstore.q. These test doing INSERT 
> commands with a WHERE clause where the condition of the WHERE clause causes 
> zero rows to be considered.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16415) Add blobstore tests for insertion of zero rows

2017-04-13 Thread Thomas Poepping (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Poepping updated HIVE-16415:
---
Attachment: HIVE-16415.01.patch

> Add blobstore tests for insertion of zero rows
> --
>
> Key: HIVE-16415
> URL: https://issues.apache.org/jira/browse/HIVE-16415
> Project: Hive
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.1.1
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
> Attachments: HIVE-16415.01.patch, HIVE-16415.patch
>
>
> This patch introduces two regression tests into the hive-blobstore qtest 
> module: zero_rows_hdfs.q and zero_rows_blobstore.q. These test doing INSERT 
> commands with a WHERE clause where the condition of the WHERE clause causes 
> zero rows to be considered.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16415) Add blobstore tests for insertion of zero rows

2017-04-13 Thread Thomas Poepping (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Poepping updated HIVE-16415:
---
Status: Open  (was: Patch Available)

> Add blobstore tests for insertion of zero rows
> --
>
> Key: HIVE-16415
> URL: https://issues.apache.org/jira/browse/HIVE-16415
> Project: Hive
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.1.1
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
> Attachments: HIVE-16415.01.patch, HIVE-16415.patch
>
>
> This patch introduces two regression tests into the hive-blobstore qtest 
> module: zero_rows_hdfs.q and zero_rows_blobstore.q. These test doing INSERT 
> commands with a WHERE clause where the condition of the WHERE clause causes 
> zero rows to be considered.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16415) Add blobstore tests for insertion of zero rows

2017-04-13 Thread Thomas Poepping (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Poepping updated HIVE-16415:
---
Status: Patch Available  (was: Open)

> Add blobstore tests for insertion of zero rows
> --
>
> Key: HIVE-16415
> URL: https://issues.apache.org/jira/browse/HIVE-16415
> Project: Hive
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.1.1
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
> Attachments: HIVE-16415.01.patch, HIVE-16415.patch
>
>
> This patch introduces two regression tests into the hive-blobstore qtest 
> module: zero_rows_hdfs.q and zero_rows_blobstore.q. These test doing INSERT 
> commands with a WHERE clause where the condition of the WHERE clause causes 
> zero rows to be considered.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16415) Add blobstore tests for insertion of zero rows

2017-04-13 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968391#comment-15968391
 ] 

Thomas Poepping commented on HIVE-16415:


[~spena] I like that idea. Do you think the commit message deserves to be 
edited with that addition? "Add tests covering insertion of zero rows", maybe 
even "Add tests covering single inserts of zero rows" to differentiate from the 
separate issue [~ashutoshc] found.

I have a new diff; I'll upload it now.

> Add blobstore tests for insertion of zero rows
> --
>
> Key: HIVE-16415
> URL: https://issues.apache.org/jira/browse/HIVE-16415
> Project: Hive
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.1.1
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
> Attachments: HIVE-16415.patch
>
>
> This patch introduces two regression tests into the hive-blobstore qtest 
> module: zero_rows_hdfs.q and zero_rows_blobstore.q. These test doing INSERT 
> commands with a WHERE clause where the condition of the WHERE clause causes 
> zero rows to be considered.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16427) Fix multi-insert query and write qtests

2017-04-12 Thread Thomas Poepping (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Poepping updated HIVE-16427:
---
Description: 
On HIVE-16415, it was found that the bug reported to be fixed in HIVE-14519 was 
not actually fixed.

This task is to find the problem, fix it, and add qtests to verify no future 
regression.

Specifically, the following query does not produce correct answers: 
{code}
From (select * from src) a
insert overwrite directory '/tmp/emp/dir1/'
select key, value
insert overwrite directory '/tmp/emp/dir2/'
select 'header'
limit 0
insert overwrite directory '/tmp/emp/dir3/'
select key, value 
where key = 100;
{code}

  was:
On HIVE-16415, it was found that the bug reported to be fixed in HIVE-14519 was 
not actually fixed.

This task is to find the problem, fix it, and add qtests to verify no future 
regression.


> Fix multi-insert query and write qtests
> ---
>
> Key: HIVE-16427
> URL: https://issues.apache.org/jira/browse/HIVE-16427
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Thomas Poepping
>
> On HIVE-16415, it was found that the bug reported to be fixed in HIVE-14519 
> was not actually fixed.
> This task is to find the problem, fix it, and add qtests to verify no future 
> regression.
> Specifically, the following query does not produce correct answers: 
> {code}
> From (select * from src) a
> insert overwrite directory '/tmp/emp/dir1/'
> select key, value
> insert overwrite directory '/tmp/emp/dir2/'
> select 'header'
> limit 0
> insert overwrite directory '/tmp/emp/dir3/'
> select key, value 
> where key = 100;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16415) Add blobstore tests for insertion of zero rows

2017-04-12 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15966580#comment-15966580
 ] 

Thomas Poepping commented on HIVE-16415:


Thanks [~spena], created HIVE-16427 as a follow-up to HIVE-14519.

> Add blobstore tests for insertion of zero rows
> --
>
> Key: HIVE-16415
> URL: https://issues.apache.org/jira/browse/HIVE-16415
> Project: Hive
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.1.1
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
> Attachments: HIVE-16415.patch
>
>
> This patch introduces two regression tests into the hive-blobstore qtest 
> module: zero_rows_hdfs.q and zero_rows_blobstore.q. These test doing INSERT 
> commands with a WHERE clause where the condition of the WHERE clause causes 
> zero rows to be considered.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16415) Add blobstore tests for insertion of zero rows

2017-04-11 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15964989#comment-15964989
 ] 

Thomas Poepping commented on HIVE-16415:


Seems to me that this case, while similar, is different enough to warrant its 
own jira. My suggestion: a follow-up to HIVE-14519 that fixes what is still 
broken, and creates tests for that. Agree? I can create that jira if you like, 
but I don't foresee myself being able to work on it anytime soon.

> Add blobstore tests for insertion of zero rows
> --
>
> Key: HIVE-16415
> URL: https://issues.apache.org/jira/browse/HIVE-16415
> Project: Hive
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.1.1
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
> Attachments: HIVE-16415.patch
>
>
> This patch introduces two regression tests into the hive-blobstore qtest 
> module: zero_rows_hdfs.q and zero_rows_blobstore.q. These test doing INSERT 
> commands with a WHERE clause where the condition of the WHERE clause causes 
> zero rows to be considered.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16415) Add blobstore tests for insertion of zero rows

2017-04-11 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15964971#comment-15964971
 ] 

Thomas Poepping commented on HIVE-16415:


[~ashutoshc] what is your suggestion? Should we edit this test to also cover 
this case?

> Add blobstore tests for insertion of zero rows
> --
>
> Key: HIVE-16415
> URL: https://issues.apache.org/jira/browse/HIVE-16415
> Project: Hive
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.1.1
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
> Attachments: HIVE-16415.patch
>
>
> This patch introduces two regression tests into the hive-blobstore qtest 
> module: zero_rows_hdfs.q and zero_rows_blobstore.q. These test doing INSERT 
> commands with a WHERE clause where the condition of the WHERE clause causes 
> zero rows to be considered.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16415) Add blobstore tests for insertion of zero rows

2017-04-11 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15964616#comment-15964616
 ] 

Thomas Poepping commented on HIVE-16415:


This test specifically addresses a NullPointerException, found by our team 
before I joined, that occurred when inserting zero rows. It seemed prudent to 
include it as a regression test in the current Hive distribution.
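The scenario above can be sketched as a query of this shape (table and column names are placeholders, not taken from the actual qtests):

{code}
-- Hypothetical sketch of the regression scenario: a WHERE clause that matches
-- zero rows, which previously triggered a NullPointerException on insert.
CREATE TABLE zero_rows_target (key INT, value STRING);

INSERT OVERWRITE TABLE zero_rows_target
SELECT key, value FROM src
WHERE key = -1;  -- condition matches no rows
{code}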

> Add blobstore tests for insertion of zero rows
> --
>
> Key: HIVE-16415
> URL: https://issues.apache.org/jira/browse/HIVE-16415
> Project: Hive
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.1.1
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
> Attachments: HIVE-16415.patch
>
>
> This patch introduces two regression tests into the hive-blobstore qtest 
> module: zero_rows_hdfs.q and zero_rows_blobstore.q. These test doing INSERT 
> commands with a WHERE clause where the condition of the WHERE clause causes 
> zero rows to be considered.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16415) Add blobstore tests for insertion of zero rows

2017-04-10 Thread Thomas Poepping (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Poepping updated HIVE-16415:
---
Status: Patch Available  (was: Open)

> Add blobstore tests for insertion of zero rows
> --
>
> Key: HIVE-16415
> URL: https://issues.apache.org/jira/browse/HIVE-16415
> Project: Hive
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.1.1
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
> Attachments: HIVE-16415.patch
>
>
> This patch introduces two regression tests into the hive-blobstore qtest 
> module: zero_rows_hdfs.q and zero_rows_blobstore.q. These test doing INSERT 
> commands with a WHERE clause where the condition of the WHERE clause causes 
> zero rows to be considered.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16415) Add blobstore tests for insertion of zero rows

2017-04-10 Thread Thomas Poepping (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Poepping updated HIVE-16415:
---
Attachment: HIVE-16415.patch

Attached patch

> Add blobstore tests for insertion of zero rows
> --
>
> Key: HIVE-16415
> URL: https://issues.apache.org/jira/browse/HIVE-16415
> Project: Hive
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.1.1
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
> Attachments: HIVE-16415.patch
>
>
> This patch introduces two regression tests into the hive-blobstore qtest 
> module: zero_rows_hdfs.q and zero_rows_blobstore.q. These tests run INSERT 
> commands with a WHERE clause whose condition matches zero rows.





[jira] [Assigned] (HIVE-16415) Add blobstore tests for insertion of zero rows

2017-04-10 Thread Thomas Poepping (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Poepping reassigned HIVE-16415:
--


> Add blobstore tests for insertion of zero rows
> --
>
> Key: HIVE-16415
> URL: https://issues.apache.org/jira/browse/HIVE-16415
> Project: Hive
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.1.1
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
>
> This patch introduces two regression tests into the hive-blobstore qtest 
> module: zero_rows_hdfs.q and zero_rows_blobstore.q. These tests run INSERT 
> commands with a WHERE clause whose condition matches zero rows.





[jira] [Commented] (HIVE-16288) Add blobstore tests for ORC and RCFILE file formats

2017-03-29 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15948132#comment-15948132
 ] 

Thomas Poepping commented on HIVE-16288:


Thanks!

> Add blobstore tests for ORC and RCFILE file formats
> ---
>
> Key: HIVE-16288
> URL: https://issues.apache.org/jira/browse/HIVE-16288
> Project: Hive
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.1.1
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
> Fix For: 2.3.0, 3.0.0
>
> Attachments: HIVE-16288.patch
>
>
> This patch adds four tests each for ORC and RCFILE when running against 
> blobstore filesystems:
>   * Test for bucketed tables
>   * Test for nonpartitioned tables
>   * Test for partitioned tables
>   * Test for partitioned tables with nonstandard partition locations
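As an illustration of the fourth case above, a partitioned ORC table with a nonstandard partition location might be exercised along these lines (a hypothetical sketch; names, values, and paths are illustrative, not the attached qtest contents):

```sql
-- Sketch of the "nonstandard partition location" case for ORC;
-- the actual qtest files may differ.
DROP TABLE IF EXISTS orc_part;
CREATE TABLE orc_part (id INT, name STRING)
PARTITIONED BY (dt STRING)
STORED AS ORC;

-- Partition placed at an explicit, nonstandard blobstore location
ALTER TABLE orc_part ADD PARTITION (dt='2017-03-23')
LOCATION '${hiveconf:test.blobstore.path.unique}/orc_nonstd/custom_loc/';

INSERT INTO TABLE orc_part PARTITION (dt='2017-03-23')
VALUES (1, 'alpha'), (2, 'beta');

SELECT * FROM orc_part WHERE dt='2017-03-23';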





[jira] [Commented] (HIVE-16288) Add blobstore tests for ORC and RCFILE file formats

2017-03-29 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15947637#comment-15947637
 ] 

Thomas Poepping commented on HIVE-16288:


Hey [~ashutoshc], also looking to have this backported to branch-2. You want a 
second Jira for that? It should apply cleanly -- I don't think master and 
branch-2 have diverged enough yet.

> Add blobstore tests for ORC and RCFILE file formats
> ---
>
> Key: HIVE-16288
> URL: https://issues.apache.org/jira/browse/HIVE-16288
> Project: Hive
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.1.1
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
> Fix For: 3.0.0
>
> Attachments: HIVE-16288.patch
>
>
> This patch adds four tests each for ORC and RCFILE when running against 
> blobstore filesystems:
>   * Test for bucketed tables
>   * Test for nonpartitioned tables
>   * Test for partitioned tables
>   * Test for partitioned tables with nonstandard partition locations





[jira] [Commented] (HIVE-16288) Add blobstore tests for ORC and RCFILE file formats

2017-03-24 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15940692#comment-15940692
 ] 

Thomas Poepping commented on HIVE-16288:


comments.q failure can't possibly be related to this patch.

[~spena] [~stakiar] [~juanrh] can you take a look? Do we still have a "flaky 
tests" umbrella Jira?

> Add blobstore tests for ORC and RCFILE file formats
> ---
>
> Key: HIVE-16288
> URL: https://issues.apache.org/jira/browse/HIVE-16288
> Project: Hive
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.1.1
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
> Attachments: HIVE-16288.patch
>
>
> This patch adds four tests each for ORC and RCFILE when running against 
> blobstore filesystems:
>   * Test for bucketed tables
>   * Test for nonpartitioned tables
>   * Test for partitioned tables
>   * Test for partitioned tables with nonstandard partition locations





[jira] [Updated] (HIVE-16288) Add blobstore tests for ORC and RCFILE file formats

2017-03-23 Thread Thomas Poepping (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Poepping updated HIVE-16288:
---
Attachment: HIVE-16288.patch

Attached patch

> Add blobstore tests for ORC and RCFILE file formats
> ---
>
> Key: HIVE-16288
> URL: https://issues.apache.org/jira/browse/HIVE-16288
> Project: Hive
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.1.1
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
> Attachments: HIVE-16288.patch
>
>
> This patch adds four tests each for ORC and RCFILE when running against 
> blobstore filesystems:
>   * Test for bucketed tables
>   * Test for nonpartitioned tables
>   * Test for partitioned tables
>   * Test for partitioned tables with nonstandard partition locations





[jira] [Updated] (HIVE-16288) Add blobstore tests for ORC and RCFILE file formats

2017-03-23 Thread Thomas Poepping (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Poepping updated HIVE-16288:
---
Status: Patch Available  (was: Open)

> Add blobstore tests for ORC and RCFILE file formats
> ---
>
> Key: HIVE-16288
> URL: https://issues.apache.org/jira/browse/HIVE-16288
> Project: Hive
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.1.1
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
> Attachments: HIVE-16288.patch
>
>
> This patch adds four tests each for ORC and RCFILE when running against 
> blobstore filesystems:
>   * Test for bucketed tables
>   * Test for nonpartitioned tables
>   * Test for partitioned tables
>   * Test for partitioned tables with nonstandard partition locations





[jira] [Assigned] (HIVE-16288) Add blobstore tests for ORC and RCFILE file formats

2017-03-23 Thread Thomas Poepping (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Poepping reassigned HIVE-16288:
--


> Add blobstore tests for ORC and RCFILE file formats
> ---
>
> Key: HIVE-16288
> URL: https://issues.apache.org/jira/browse/HIVE-16288
> Project: Hive
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.1.1
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
>
> This patch adds four tests each for ORC and RCFILE when running against 
> blobstore filesystems:
>   * Test for bucketed tables
>   * Test for nonpartitioned tables
>   * Test for partitioned tables
>   * Test for partitioned tables with nonstandard partition locations





[jira] [Commented] (HIVE-14848) S3 creds added to a hidden list by HIVE-14588 are not working on MR jobs

2017-03-09 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15903699#comment-15903699
 ] 

Thomas Poepping commented on HIVE-14848:


Good point [~spena], doesn't make sense to have this before Hadoop 2.9. Here's 
the issue I was facing -- when running against "real S3" using the Tez 
execution engine, the credentials passed via hive-site.xml are not passed to 
individual Tez workers, which fails the tests. There are ways around this, and 
my team and I have successfully worked around it.

Honestly, if nobody else has any issue, I'm happy to wait on this for Hadoop 
2.9.0 as well. I'll try to track on my end so we don't drop this issue by 
accident.

> S3 creds added to a hidden list by HIVE-14588 are not working on MR jobs
> 
>
> Key: HIVE-14848
> URL: https://issues.apache.org/jira/browse/HIVE-14848
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-14848.1.patch, HIVE-14848.1.patch
>
>
> When S3 credentials are included in hive-site.xml, then MR jobs that need to 
> read
> data from S3 cannot use them because S3 values are stripped from the Job 
> configuration
> object before submitting the MR job.
> {noformat}
> @Override
> public void initialize(HiveConf conf, QueryPlan queryPlan, DriverContext 
> driverContext) {
>   ...
>   conf.stripHiddenConfigurations(job);
>   this.jobExecHelper = new HadoopJobExecHelper(job, console, this, this);
> }
> {noformat}
> A nice-to-have (available in Hadoop 2.9.0) is an MR property, 
> {{mapreduce.job.redacted-properties}}, that can be used to hide this list on 
> the MR side (such as the history server UI) to allow MR to run the job 
> without issues.
> UPDATE:
> Change the call to stripHiddenConfigurations() in 
> ql/exec/tez/DagUtils.createConfiguration(), because this is currently broken 
> for running hive-blobstore suite against Tez





[jira] [Commented] (HIVE-15952) Add blobstore integration test for CREATE LIKE

2017-03-09 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15903667#comment-15903667
 ] 

Thomas Poepping commented on HIVE-15952:


Thanks Sergio, appreciate the help

> Add blobstore integration test for CREATE LIKE
> --
>
> Key: HIVE-15952
> URL: https://issues.apache.org/jira/browse/HIVE-15952
> Project: Hive
>  Issue Type: Bug
>  Components: Test
>Reporter: Juan Rodríguez Hortalá
>Assignee: Juan Rodríguez Hortalá
> Fix For: 2.2.0
>
> Attachments: HIVE-15952.patch
>
>
> This patch adds a new positive test for the integration with blobstores. The 
> test checks that we can create an external table with LIKE of an existing 
> table, and then drop the new table without affecting the original, with both 
> tables located in a blobstore.
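In outline, the scenario being tested can be sketched as follows (a hypothetical sketch with illustrative names and paths, not the attached .q file):

```sql
-- Sketch: CREATE ... LIKE an existing blobstore table, then drop the copy
CREATE TABLE like_src (id INT, name STRING)
LOCATION '${hiveconf:test.blobstore.path.unique}/create_like/src/';
INSERT INTO TABLE like_src VALUES (1, 'a');

CREATE EXTERNAL TABLE like_copy LIKE like_src
LOCATION '${hiveconf:test.blobstore.path.unique}/create_like/copy/';

-- Dropping the external copy should leave the original untouched
DROP TABLE like_copy;
SELECT COUNT(*) FROM like_src;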





[jira] [Commented] (HIVE-15952) Add blobstore integration test for CREATE LIKE

2017-03-08 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15902103#comment-15902103
 ] 

Thomas Poepping commented on HIVE-15952:


hey [~spena] it looks like something went wrong on the commit -- the patch was 
applied outside of the itests directory:

https://github.com/apache/hive/commit/47d06fe0064536e194cb35d5d23f0572a398059d

master on GitHub has a hive-blobstore/ directory in the project root that 
should be under the itests/ directory. Can we fix this?

> Add blobstore integration test for CREATE LIKE
> --
>
> Key: HIVE-15952
> URL: https://issues.apache.org/jira/browse/HIVE-15952
> Project: Hive
>  Issue Type: Bug
>  Components: Test
>Reporter: Juan Rodríguez Hortalá
>Assignee: Juan Rodríguez Hortalá
> Fix For: 2.2.0
>
> Attachments: HIVE-15952.patch
>
>
> This patch adds a new positive test for the integration with blobstores. The 
> test checks that we can create an external table with LIKE of an existing 
> table, and then drop the new table without affecting the original, with both 
> tables located in a blobstore.





[jira] [Commented] (HIVE-14848) S3 creds added to a hidden list by HIVE-14588 are not working on MR jobs

2017-03-08 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15902075#comment-15902075
 ] 

Thomas Poepping commented on HIVE-14848:


Have we completely dropped this patch?

> S3 creds added to a hidden list by HIVE-14588 are not working on MR jobs
> 
>
> Key: HIVE-14848
> URL: https://issues.apache.org/jira/browse/HIVE-14848
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-14848.1.patch, HIVE-14848.1.patch
>
>
> When S3 credentials are included in hive-site.xml, then MR jobs that need to 
> read
> data from S3 cannot use them because S3 values are stripped from the Job 
> configuration
> object before submitting the MR job.
> {noformat}
> @Override
> public void initialize(HiveConf conf, QueryPlan queryPlan, DriverContext 
> driverContext) {
>   ...
>   conf.stripHiddenConfigurations(job);
>   this.jobExecHelper = new HadoopJobExecHelper(job, console, this, this);
> }
> {noformat}
> A nice-to-have (available in Hadoop 2.9.0) is an MR property, 
> {{mapreduce.job.redacted-properties}}, that can be used to hide this list on 
> the MR side (such as the history server UI) to allow MR to run the job 
> without issues.
> UPDATE:
> Change the call to stripHiddenConfigurations() in 
> ql/exec/tez/DagUtils.createConfiguration(), because this is currently broken 
> for running hive-blobstore suite against Tez





[jira] [Commented] (HIVE-15916) Add blobstore tests for CTAS

2017-02-28 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888785#comment-15888785
 ] 

Thomas Poepping commented on HIVE-15916:


Took a look, non-binding +1

> Add blobstore tests for CTAS
> 
>
> Key: HIVE-15916
> URL: https://issues.apache.org/jira/browse/HIVE-15916
> Project: Hive
>  Issue Type: Improvement
>  Components: Test
>Affects Versions: 2.1.1
>Reporter: Juan Rodríguez Hortalá
>Assignee: Juan Rodríguez Hortalá
> Attachments: HIVE-15916.patch
>
>
> This patch covers 3 tests checking CTAS operations against blobstore 
> filesystems. The tests check we can create a table with a CTAS statement from 
> another table, for the source-target combinations blobstore-blobstore, 
> blobstore-hdfs, hdfs-blobstore, and for two target tables, one in the same 
> default database as the source, and another in a new database.
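One cell of that matrix, a blobstore source with an HDFS target, could look roughly like this (an illustrative sketch assuming the usual hive-blobstore qtest path variable, not the attached patch):

```sql
-- Sketch of one CTAS combination: blobstore source, HDFS target
CREATE TABLE ctas_blob_src (id INT, val STRING)
LOCATION '${hiveconf:test.blobstore.path.unique}/ctas/src/';
INSERT INTO TABLE ctas_blob_src VALUES (1, 'x'), (2, 'y');

-- CTAS into a table whose data lands on HDFS (the default warehouse)
CREATE TABLE ctas_hdfs_target AS
SELECT * FROM ctas_blob_src;

SELECT COUNT(*) FROM ctas_hdfs_target;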





[jira] [Commented] (HIVE-15881) Use new thread count variable name instead of mapred.dfsclient.parallelism.max

2017-02-21 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15876900#comment-15876900
 ] 

Thomas Poepping commented on HIVE-15881:


Hey [~spena], updated the RB. Just one question, otherwise non-binding +1

> Use new thread count variable name instead of mapred.dfsclient.parallelism.max
> --
>
> Key: HIVE-15881
> URL: https://issues.apache.org/jira/browse/HIVE-15881
> Project: Hive
>  Issue Type: Task
>  Components: Query Planning
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>Priority: Minor
> Attachments: HIVE-15881.1.patch, HIVE-15881.2.patch
>
>
> The Utilities class has two methods, {{getInputSummary}} and 
> {{getInputPaths}}, that use the variable {{mapred.dfsclient.parallelism.max}} 
> to get the summary of a list of input locations in parallel. These methods 
> are Hive related, but the variable name does not look like it is specific to 
> Hive.
> Also, the above variable is not in HiveConf nor used anywhere else. I just 
> found a reference in the Hadoop MR1 code.
> I'd like to propose the deprecation of {{mapred.dfsclient.parallelism.max}}, 
> and use a different variable name, such as 
> {{hive.get.input.listing.num.threads}}, that reflects the intention of the 
> variable. The removal of the old variable might happen on Hive 3.x





[jira] [Comment Edited] (HIVE-15881) Use new thread count variable name instead of mapred.dfsclient.parallelism.max

2017-02-21 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15876900#comment-15876900
 ] 

Thomas Poepping edited comment on HIVE-15881 at 2/21/17 10:42 PM:
--

Hey [~spena], updated the RB. Just one question, otherwise non-binding +1 
pending QA 


was (Author: poeppt):
Hey [~spena], updated the RB. Just one question, otherwise non-binding +1

> Use new thread count variable name instead of mapred.dfsclient.parallelism.max
> --
>
> Key: HIVE-15881
> URL: https://issues.apache.org/jira/browse/HIVE-15881
> Project: Hive
>  Issue Type: Task
>  Components: Query Planning
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>Priority: Minor
> Attachments: HIVE-15881.1.patch, HIVE-15881.2.patch
>
>
> The Utilities class has two methods, {{getInputSummary}} and 
> {{getInputPaths}}, that use the variable {{mapred.dfsclient.parallelism.max}} 
> to get the summary of a list of input locations in parallel. These methods 
> are Hive related, but the variable name does not look like it is specific to 
> Hive.
> Also, the above variable is not in HiveConf nor used anywhere else. I just 
> found a reference in the Hadoop MR1 code.
> I'd like to propose the deprecation of {{mapred.dfsclient.parallelism.max}}, 
> and use a different variable name, such as 
> {{hive.get.input.listing.num.threads}}, that reflects the intention of the 
> variable. The removal of the old variable might happen on Hive 3.x





[jira] [Commented] (HIVE-15867) Add blobstore tests for import/export

2017-02-21 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15876786#comment-15876786
 ] 

Thomas Poepping commented on HIVE-15867:


Hi [~stakiar], yes, still on the radar. My colleague is working on a patch now; 
once it passes our internal review he'll attach it here. Sorry for the wait

> Add blobstore tests for import/export
> -
>
> Key: HIVE-15867
> URL: https://issues.apache.org/jira/browse/HIVE-15867
> Project: Hive
>  Issue Type: Bug
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
>
> This patch covers ten separate tests testing import and export operations 
> running against blobstore filesystems:
> * Import addpartition
> ** blobstore -> file
> ** file -> blobstore
> ** blobstore -> blobstore
> ** blobstore -> hdfs
> * import/export
> ** blobstore -> file
> ** file -> blobstore
> ** blobstore -> blobstore (partitioned and non-partitioned)
> ** blobstore -> HDFS (partitioned and non-partitioned)
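A single cell of that matrix, exporting from a blobstore table and importing back into a blobstore location, might be sketched as below. This is a hypothetical outline with illustrative names and paths; it also assumes the blobstore scheme is in {{hive.exim.uri.scheme.whitelist}}.

```sql
-- Sketch: export/import, blobstore -> blobstore
CREATE TABLE exim_src (id INT)
LOCATION '${hiveconf:test.blobstore.path.unique}/exim/src/';
INSERT INTO TABLE exim_src VALUES (1), (2);

EXPORT TABLE exim_src
TO '${hiveconf:test.blobstore.path.unique}/exim/export_dir/';

IMPORT TABLE exim_dst
FROM '${hiveconf:test.blobstore.path.unique}/exim/export_dir/'
LOCATION '${hiveconf:test.blobstore.path.unique}/exim/dst/';

SELECT COUNT(*) FROM exim_dst;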





[jira] [Commented] (HIVE-15852) Tablesampling on Tez in low-record case throws ArrayIndexOutOfBoundsException

2017-02-13 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864742#comment-15864742
 ] 

Thomas Poepping commented on HIVE-15852:


[~ashutoshc] Yeah, I think that solution makes the most sense. Even if there 
are some hidden dependencies in other projects, as you say, there is no 
interface contract and so those assumptions should not even be made. 

I will take a look into how tablesampling can be improved. Hopefully the fix is 
not too wide.
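The scenario in this issue can be reproduced, in outline, with a sketch like the following. Table, column, and bucket numbers are illustrative; the Tez-specific behavior comes from HIVE-13040 not materializing empty bucket files.

```sql
-- Sketch reproducing the low-record tablesample failure on Tez:
-- 10 buckets declared, but only up to 3 bucket files materialized.
CREATE TABLE t (key STRING)
CLUSTERED BY (key) INTO 10 BUCKETS;

INSERT INTO TABLE t VALUES ('a'), ('b'), ('c');  -- only 3 rows

-- On Tez (no empty bucket files), sampling a bucket number beyond the
-- number of files can throw ArrayIndexOutOfBoundsException.
SELECT * FROM t TABLESAMPLE (BUCKET 7 OUT OF 10 ON key);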

> Tablesampling on Tez in low-record case throws ArrayIndexOutOfBoundsException
> -
>
> Key: HIVE-15852
> URL: https://issues.apache.org/jira/browse/HIVE-15852
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 2.1.1
>Reporter: Thomas Poepping
>
> Due to HIVE-13040 ( https://issues.apache.org/jira/browse/HIVE-13040 ), which 
> doesn't create empty files to represent empty buckets when Hive is on Tez, a 
> couple things are broken.
> First of all, if there are empty buckets (which is possible with large 
> datasets in the partitioned-bucketed case), tablesampling will not work if 
> you're referencing a bucket number higher than the number of files.
> e.g. In some partition 'p', there are three rows. The table 't' is clustered 
> into ten buckets. With maximal hashing, only three bucket files will be 
> created. If we do select * from t tablesample (bucket x out of 10) where 
>  (where x > 3), an ArrayIndexOutOfBoundsException will be 
> thrown because Hive assumes there are only three buckets.
> Second, other applications (such as Pig) may be making assumptions about the 
> number of files equaling the number of buckets.
> Possible fixes:
> * Revert HIVE-13040
> * Change how tablesampling is implemented to accept possibility that number 
> of files != number of buckets
> ** Would require coordination across projects to change assumptions
> Things to consider:
> * what performance gains are there from not creating empty files?
> * if the gains are large, are we willing to lose them? (by reverting 
> HIVE-13040)
> * _how else can we avoid creating unnecessary files, while still maintaining 
> invariants other applications expect?_





[jira] [Commented] (HIVE-15881) Use new thread count variable name instead of mapred.dfsclient.parallelism.max

2017-02-13 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864718#comment-15864718
 ] 

Thomas Poepping commented on HIVE-15881:


[~spena] Good point. To take into account everyone's ideas, maybe something 
like {{hive.exec.input.listing.max.threads}}? It's tough; I'm not sure there 
will be one obvious best answer. I think each option will have pros and cons 
compared to any other. 

Agree with [~yalovyyi]'s suggestion as well. What would we want default to be? 
Looking through {{HiveConf}}, the different {{*.max.threads}} values are:
* {{METASTORESERVERMAXTHREADS("hive.metastore.server.max.threads", 1000,}}
* {{HIVE_SERVER2_WEBUI_MAX_THREADS("hive.server2.webui.max.threads", 50, "The 
max HiveServer2 WebUI threads"),}}
* {{LLAP_DAEMON_AM_REPORTER_MAX_THREADS("hive.llap.daemon.am-reporter.max.threads", 4,}}

So... all over the place. How would we decide on a suitable default? Or maybe 
the default is "as many as possible" and users can lower it themselves?

> Use new thread count variable name instead of mapred.dfsclient.parallelism.max
> --
>
> Key: HIVE-15881
> URL: https://issues.apache.org/jira/browse/HIVE-15881
> Project: Hive
>  Issue Type: Task
>  Components: Query Planning
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>Priority: Minor
>
> The Utilities class has two methods, {{getInputSummary}} and 
> {{getInputPaths}}, that use the variable {{mapred.dfsclient.parallelism.max}} 
> to get the summary of a list of input locations in parallel. These methods 
> are Hive related, but the variable name does not look like it is specific to 
> Hive.
> Also, the above variable is not in HiveConf nor used anywhere else. I just 
> found a reference in the Hadoop MR1 code.
> I'd like to propose the deprecation of {{mapred.dfsclient.parallelism.max}}, 
> and use a different variable name, such as 
> {{hive.get.input.listing.num.threads}}, that reflects the intention of the 
> variable. The removal of the old variable might happen on Hive 3.x





[jira] [Commented] (HIVE-15895) Use HDFS for stats collection temp dir on blob storage

2017-02-13 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864703#comment-15864703
 ] 

Thomas Poepping commented on HIVE-15895:


Thanks [~spena], but I specifically was talking about another use of 
{{getExtTmpPathRelTo()}} in {{SemanticAnalyzer}}, 
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L2207

Happy to have that be addressed elsewhere though, as it could be considered out 
of scope for this JIRA. 

+1 from me as well

> Use HDFS for stats collection temp dir on blob storage
> --
>
> Key: HIVE-15895
> URL: https://issues.apache.org/jira/browse/HIVE-15895
> Project: Hive
>  Issue Type: Improvement
>  Components: Configuration, Statistics
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-15895.patch
>
>






[jira] [Commented] (HIVE-15895) Use HDFS for stats collection temp dir on blob storage

2017-02-13 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864561#comment-15864561
 ] 

Thomas Poepping commented on HIVE-15895:


It looks like a similar external tmp path is used in {{getMetadata}} in the 
{{SemanticAnalyzer}}. From a brief look, I can't find any reason why we 
wouldn't want the tmp path to be on HDFS here as well. Do we want to change 
this?

> Use HDFS for stats collection temp dir on blob storage
> --
>
> Key: HIVE-15895
> URL: https://issues.apache.org/jira/browse/HIVE-15895
> Project: Hive
>  Issue Type: Improvement
>  Components: Configuration, Statistics
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-15895.patch
>
>






[jira] [Commented] (HIVE-15881) Use new thread count variable name instead of mapred.dfsclient.parallelism.max

2017-02-13 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864325#comment-15864325
 ] 

Thomas Poepping commented on HIVE-15881:


Spent a little more time looking into this, found the same thing as Sahil. I 
would like to suggest some edits:

I would change the naming of the variable. As I said above, 
{{hive.get.input.listing.num.threads}} isn't exactly clear, and because this is 
used for two things ({{getInputSummary}} and {{getInputPaths}}) I would like to 
propose a different way of doing this.

* Two configuration values, one for each usage:
** {{hive.exec.input.paths.num.threads}} for {{getInputPaths}}
** {{hive.exec.input.summary.num.threads}} for {{getInputSummary}}
* Bonus is that these follow existing {{HiveConf}} patterns

Other questions that should be asked:
* what is the default value? 
** I'm looking through {{HiveConf}}, but not seeing any other config value that 
might suffice. Maybe we just use 1?

Also, I'm a fan of '0' meaning 'maximum number allowable' with these types of 
config values, but that can be up for discussion. We would need a way to figure 
out what "max allowable" actually _means_, either through another configuration 
value or based on the OS structure? I don't know.
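Under this proposal, users could then tune the two code paths independently, e.g. (property names are hypothetical, taken from the proposal above, and may not match what is finally committed):

```sql
-- Hypothetical: tune listing parallelism per use; 0 = maximum allowable
SET hive.exec.input.paths.num.threads=8;
SET hive.exec.input.summary.num.threads=0;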

> Use new thread count variable name instead of mapred.dfsclient.parallelism.max
> --
>
> Key: HIVE-15881
> URL: https://issues.apache.org/jira/browse/HIVE-15881
> Project: Hive
>  Issue Type: Task
>  Components: Query Planning
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>Priority: Minor
>
> The Utilities class has two methods, {{getInputSummary}} and 
> {{getInputPaths}}, that use the variable {{mapred.dfsclient.parallelism.max}} 
> to get the summary of a list of input locations in parallel. These methods 
> are Hive related, but the variable name does not look like it is specific to 
> Hive.
> Also, the above variable is not in HiveConf nor used anywhere else. I just 
> found a reference in the Hadoop MR1 code.
> I'd like to propose the deprecation of {{mapred.dfsclient.parallelism.max}}, 
> and use a different variable name, such as 
> {{hive.get.input.listing.num.threads}}, that reflects the intention of the 
> variable. The removal of the old variable might happen on Hive 3.x





[jira] [Commented] (HIVE-15881) Use new thread count variable name instead of mapred.dfsclient.parallelism.max

2017-02-10 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861995#comment-15861995
 ] 

Thomas Poepping commented on HIVE-15881:


From what I can tell, saying
bq. These methods are Hive related
isn't exactly true. 

I agree that the configuration value name is confusing, mainly because of the 
inclusion of {{mapred}} (we support Tez too, right? ;) ). But it seems like the 
dfsclient (HDFS, S3a, etc) should actually have some level of control over the 
number of threads being used to access it. What if we have some {{FileSystem}} 
implementation that, for some reason, is non-threadsafe? It should be able to 
limit threads. Renaming the config to something like 
{{hive.get.input.listing.num.threads}} can hide what's _really_ happening here, 
which would be the dfsclient controlling its maximum allowable number of 
parallel accessors on its own.

That being said, I can't find a link in open source Hadoop to this 
configuration value. Can you link it here? If it is actually only used by Hive 
across the entire Hadoop ecosystem, then I have no problem doing what you 
suggest. I just have a hard time believing that's the case.

> Use new thread count variable name instead of mapred.dfsclient.parallelism.max
> --
>
> Key: HIVE-15881
> URL: https://issues.apache.org/jira/browse/HIVE-15881
> Project: Hive
>  Issue Type: Task
>  Components: Query Planning
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>Priority: Minor
>
> The Utilities class has two methods, {{getInputSummary}} and 
> {{getInputPaths}}, that use the variable {{mapred.dfsclient.parallelism.max}} 
> to get the summary of a list of input locations in parallel. These methods 
> are Hive related, but the variable name does not look like it is specific to 
> Hive.
> Also, the above variable is not in HiveConf nor used anywhere else. I just 
> found a reference in the Hadoop MR1 code.
> I'd like to propose the deprecation of {{mapred.dfsclient.parallelism.max}}, 
> and use a different variable name, such as 
> {{hive.get.input.listing.num.threads}}, that reflects the intention of the 
> variable. The removal of the old variable might happen on Hive 3.x





[jira] [Commented] (HIVE-15856) Hive export/import (hive.exim.uri.scheme.whitelist) to support s3a

2017-02-09 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15860419#comment-15860419
 ] 

Thomas Poepping commented on HIVE-15856:


The way that we will likely be doing this is with multiple different tests -- 
one for each of these cases. But in essence, that's what we have as well :)

> Hive export/import (hive.exim.uri.scheme.whitelist) to support s3a
> --
>
> Key: HIVE-15856
> URL: https://issues.apache.org/jira/browse/HIVE-15856
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>
> Add support for export / import operations on S3.





[jira] [Commented] (HIVE-15856) Hive export/import (hive.exim.uri.scheme.whitelist) to support s3a

2017-02-09 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15860325#comment-15860325
 ] 

Thomas Poepping commented on HIVE-15856:


Hey [~stakiar], I've actually been working on import/export tests for 
blobstore, just created HIVE-15867 to track

Suggest we make this JIRA dependent on that one

> Hive export/import (hive.exim.uri.scheme.whitelist) to support s3a
> --
>
> Key: HIVE-15856
> URL: https://issues.apache.org/jira/browse/HIVE-15856
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>
> Add support for export / import operations on S3.





[jira] [Assigned] (HIVE-15867) Add blobstore tests for import/export

2017-02-09 Thread Thomas Poepping (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Poepping reassigned HIVE-15867:
--


> Add blobstore tests for import/export
> -
>
> Key: HIVE-15867
> URL: https://issues.apache.org/jira/browse/HIVE-15867
> Project: Hive
>  Issue Type: Bug
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
>
> This patch covers ten separate tests testing import and export operations 
> running against blobstore filesystems:
> * Import addpartition
> ** blobstore -> file
> ** file -> blobstore
> ** blobstore -> blobstore
> ** blobstore -> hdfs
> * import/export
> ** blobstore -> file
> ** file -> blobstore
> ** blobstore -> blobstore (partitioned and non-partitioned)
> ** blobstore -> HDFS (partitioned and non-partitioned)





[jira] [Commented] (HIVE-15852) Tablesampling on Tez in low-record case throws ArrayIndexOutOfBoundsException

2017-02-08 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15858485#comment-15858485
 ] 

Thomas Poepping commented on HIVE-15852:


[~ashutoshc] Ashutosh, sorry it took so long to open this Jira issue. Here's a 
summary of what I've found so far. While it's the easiest solution, I really 
don't want to revert HIVE-13040; I think the performance gains can be large, 
especially in the blobstore (s3a or Azure) case, since empty file creation is 
far from free.

Happy to hear suggestions, and start a conversation.

> Tablesampling on Tez in low-record case throws ArrayIndexOutOfBoundsException
> -
>
> Key: HIVE-15852
> URL: https://issues.apache.org/jira/browse/HIVE-15852
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 2.1.1
>Reporter: Thomas Poepping
>
> Due to HIVE-13040 ( https://issues.apache.org/jira/browse/HIVE-13040 ), which 
> doesn't create empty files to represent empty buckets when Hive is on Tez, a 
> couple things are broken.
> First of all, if there are empty buckets (which is possible with large 
> datasets in the partitioned-bucketed case), tablesampling will not work if 
> you're referencing a bucket number higher than the number of files.
> e.g. In some partition 'p', there are three rows. The table 't' is clustered 
> into ten buckets. With maximal hashing, only three bucket files will be 
> created. If we do select * from t tablesample (bucket x out of 10) where 
>  (where x > 3), an ArrayIndexOutOfBoundsException will be 
> thrown because Hive assumes there are only three buckets.
> Second, other applications (such as Pig) may be making assumptions about the 
> number of files equaling the number of buckets.
> Possible fixes:
> * Revert HIVE-13040
> * Change how tablesampling is implemented to accept possibility that number 
> of files != number of buckets
> ** Would require coordination across projects to change assumptions
> Things to consider:
> * what performance gains are there from not creating empty files?
> * if the gains are large, are we willing to lose them? (by reverting 
> HIVE-13040)
> * _how else can we avoid creating unnecessary files, while still maintaining 
> invariants other applications expect?_
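To make the failure mode concrete, here is a minimal sketch (hypothetical names, not Hive's actual sampling code) of the assumption that breaks: the sampled file is looked up by bucket index, so when empty buckets produce no files, a file list shorter than the declared bucket count throws.

```java
// Hedged illustration of the bug: tablesampling assumes one file per
// declared bucket and indexes the file array by (bucket number - 1).
// With empty buckets unwritten on Tez, the array is shorter than the
// declared bucket count. Names here are illustrative.
public class BucketSampleSketch {
    static String fileForBucket(String[] bucketFiles, int bucketNum) {
        // assumes bucketFiles.length == declared bucket count -- false on Tez
        return bucketFiles[bucketNum - 1];
    }

    public static void main(String[] args) {
        // table declared with 10 buckets, but only 3 files written
        String[] files = {"000000_0", "000001_0", "000002_0"};
        System.out.println(fileForBucket(files, 2)); // bucket 2 exists
        try {
            fileForBucket(files, 5); // bucket 5 of 10 was never materialized
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println(e.getClass().getSimpleName());
        }
    }
}
```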





[jira] [Comment Edited] (HIVE-14083) ALTER INDEX in Tez causes NullPointerException

2017-01-19 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830811#comment-15830811
 ] 

Thomas Poepping edited comment on HIVE-14083 at 1/20/17 1:01 AM:
-

Sorry to jump on this so late, but why are we okay with accepting that indexing 
just doesn't work? Are there no follow-up Jira issues to attack the problem and 
work on supporting indexes with Tez?


was (Author: poeppt):
Sorry to jump on this so late, buy why are we okay with accepting that indexing 
just doesn't work? Are there no follow-up Jira issues to attack the problem and 
work on supporting indexes with Tez?

> ALTER INDEX in Tez causes NullPointerException
> --
>
> Key: HIVE-14083
> URL: https://issues.apache.org/jira/browse/HIVE-14083
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Minor
> Fix For: 2.2.0, 2.1.1
>
> Attachments: HIVE-14083.patch
>
>
> ALTER INDEX causes a NullPointerException when run under TEZ execution 
> engine. Query runs without issue when submitted using MR execution mode.
> To reproduce:
> 1. CREATE INDEX sample_08_index ON TABLE sample_08 (code) AS 'COMPACT' WITH 
> DEFERRED REBUILD; 
> 2. ALTER INDEX sample_08_index ON sample_08 REBUILD; 
> *Stacktrace from Hive 1.2.1*
> {code:java}
> ERROR : Vertex failed, vertexName=Map 1, 
> vertexId=vertex_1460577396252_0005_1_00, diagnostics=[Task failed, 
> taskId=task_1460577396252_0005_1_00_00, diagnostics=[TaskAttempt 0 
> failed, info=[Error: Failure while running task:java.lang.RuntimeException: 
> java.lang.RuntimeException: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: java.lang.NullPointerException
>   at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:196)
>   at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.(TezGroupedSplitsInputFormat.java:135)
>   at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:101)
>   at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:149)
>   at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:80)
>   at 
> org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:650)
>   at 
> org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:621)
>   at 
> org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:145)
>   at 
> org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:109)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:390)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:128)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:147)
>   ... 14 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:269)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:233)
>   at 
> 

[jira] [Commented] (HIVE-14083) ALTER INDEX in Tez causes NullPointerException

2017-01-19 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830811#comment-15830811
 ] 

Thomas Poepping commented on HIVE-14083:


Sorry to jump on this so late, but why are we okay with accepting that indexing 
just doesn't work? Are there no follow-up Jira issues to attack the problem and 
work on supporting indexes with Tez?

> ALTER INDEX in Tez causes NullPointerException
> --
>
> Key: HIVE-14083
> URL: https://issues.apache.org/jira/browse/HIVE-14083
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Minor
> Fix For: 2.2.0, 2.1.1
>
> Attachments: HIVE-14083.patch
>
>
> ALTER INDEX causes a NullPointerException when run under TEZ execution 
> engine. Query runs without issue when submitted using MR execution mode.
> To reproduce:
> 1. CREATE INDEX sample_08_index ON TABLE sample_08 (code) AS 'COMPACT' WITH 
> DEFERRED REBUILD; 
> 2. ALTER INDEX sample_08_index ON sample_08 REBUILD; 
> *Stacktrace from Hive 1.2.1*
> {code:java}
> ERROR : Vertex failed, vertexName=Map 1, 
> vertexId=vertex_1460577396252_0005_1_00, diagnostics=[Task failed, 
> taskId=task_1460577396252_0005_1_00_00, diagnostics=[TaskAttempt 0 
> failed, info=[Error: Failure while running task:java.lang.RuntimeException: 
> java.lang.RuntimeException: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: java.lang.NullPointerException
>   at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:196)
>   at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.(TezGroupedSplitsInputFormat.java:135)
>   at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:101)
>   at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:149)
>   at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:80)
>   at 
> org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:650)
>   at 
> org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:621)
>   at 
> org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:145)
>   at 
> org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:109)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:390)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:128)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:147)
>   ... 14 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:269)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:233)
>   at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:193)
>   ... 25 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15667) TestBlobstoreCliDriver tests are failing due to output differences

2017-01-19 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830703#comment-15830703
 ] 

Thomas Poepping commented on HIVE-15667:


LGTM, pending Jenkins run. Nonbinding +1

For those curious, like I was, it looks like HIVE-15297 made the change 
requiring this Jira.

> TestBlobstoreCliDriver tests are failing due to output differences
> --
>
> Key: HIVE-15667
> URL: https://issues.apache.org/jira/browse/HIVE-15667
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.2.0
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-15667.1.patch
>
>
> All itests/hive-blobstore are failing and their .q.out files need to be 
> updated.
> CC: [~poeppt]





[jira] [Commented] (HIVE-15667) TestBlobstoreCliDriver tests are failing due to output differences

2017-01-19 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830312#comment-15830312
 ] 

Thomas Poepping commented on HIVE-15667:


Thanks for this, Sergio. Any chance you can open a quick RB? The patch is a 
little hard to read as-is.

> TestBlobstoreCliDriver tests are failing due to output differences
> --
>
> Key: HIVE-15667
> URL: https://issues.apache.org/jira/browse/HIVE-15667
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.2.0
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-15667.1.patch
>
>
> All itests/hive-blobstore are failing and their .q.out files need to be 
> updated.
> CC: [~poeppt]





[jira] [Commented] (HIVE-15546) Optimize Utilities.getInputPaths() so each listStatus of a partition is done in parallel

2017-01-18 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15829016#comment-15829016
 ] 

Thomas Poepping commented on HIVE-15546:


I see how that could make sense -- just have the executor treat the empty 
partition as it would any other, by getting all files and parsing. It's just 
that in the case of an empty partition, an empty file is used.

Seems fine to me. I also took a look at the RB, no problems there. Non-binding 
+1 from me.

> Optimize Utilities.getInputPaths() so each listStatus of a partition is done 
> in parallel
> 
>
> Key: HIVE-15546
> URL: https://issues.apache.org/jira/browse/HIVE-15546
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-15546.1.patch, HIVE-15546.2.patch, 
> HIVE-15546.3.patch, HIVE-15546.4.patch, HIVE-15546.5.patch
>
>
> When running on blobstores (like S3) where metadata operations (like 
> listStatus) are costly, Utilities.getInputPaths() can add significant 
> overhead when setting up the input paths for an MR / Spark / Tez job.
> The method performs a listStatus on all input paths in order to check if the 
> path is empty. If the path is empty, a dummy file is created for the given 
> partition. This is all done sequentially. This can be really slow when there 
> are a lot of empty partitions. Even when all partitions have input data, this 
> can take a long time.
> We should either:
> (1) Just remove the logic to check if each input path is empty, and handle 
> any edge cases accordingly.
> (2) Multi-thread the listStatus calls
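A hedged sketch of option (2): run the per-partition emptiness checks concurrently instead of sequentially. java.nio stands in for Hadoop's {{FileSystem.listStatus}} here, and the helper name is illustrative, not the actual {{Utilities.getInputPaths}} code.

```java
import java.nio.file.*;
import java.util.*;
import java.util.concurrent.*;

// Illustrative only: check each partition directory for emptiness on a
// thread pool, keeping non-empty ones. The real fix would substitute the
// dummy-file path for empty partitions instead of dropping them.
public class ParallelEmptyCheck {
    static List<Path> nonEmpty(List<Path> partitions, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Path>> checks = new ArrayList<>();
            for (Path p : partitions) {
                checks.add(pool.submit(() -> {
                    try (DirectoryStream<Path> ds = Files.newDirectoryStream(p)) {
                        // keep the partition only if it has at least one file
                        return ds.iterator().hasNext() ? p : null;
                    }
                }));
            }
            List<Path> result = new ArrayList<>();
            for (Future<Path> f : checks) {
                Path p = f.get();
                if (p != null) result.add(p);
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Path base = Files.createTempDirectory("parts");
        Path full = Files.createDirectory(base.resolve("p=1"));
        Path empty = Files.createDirectory(base.resolve("p=2"));
        Files.writeString(full.resolve("data"), "x");
        System.out.println(nonEmpty(List.of(full, empty), 2).size()); // prints 1
    }
}
```

On a blobstore, each check is a remote listing, so the wall-clock win scales with the thread count.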





[jira] [Commented] (HIVE-15546) Optimize Utilities.getInputPaths() so each listStatus of a partition is done in parallel

2017-01-18 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15828994#comment-15828994
 ] 

Thomas Poepping commented on HIVE-15546:


What is the benefit of using dummy files in empty partitions?

> Optimize Utilities.getInputPaths() so each listStatus of a partition is done 
> in parallel
> 
>
> Key: HIVE-15546
> URL: https://issues.apache.org/jira/browse/HIVE-15546
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-15546.1.patch, HIVE-15546.2.patch, 
> HIVE-15546.3.patch, HIVE-15546.4.patch, HIVE-15546.5.patch
>
>
> When running on blobstores (like S3) where metadata operations (like 
> listStatus) are costly, Utilities.getInputPaths() can add significant 
> overhead when setting up the input paths for an MR / Spark / Tez job.
> The method performs a listStatus on all input paths in order to check if the 
> path is empty. If the path is empty, a dummy file is created for the given 
> partition. This is all done sequentially. This can be really slow when there 
> are a lot of empty partitions. Even when all partitions have input data, this 
> can take a long time.
> We should either:
> (1) Just remove the logic to check if each input path is empty, and handle 
> any edge cases accordingly.
> (2) Multi-thread the listStatus calls





[jira] [Commented] (HIVE-15618) Change hive-blobstore tests to run with Tez by default

2017-01-18 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15828672#comment-15828672
 ] 

Thomas Poepping commented on HIVE-15618:


The patch still hasn't been picked up by QA, but I believe I have a better 
solution for this. I'm working on a new patch and will attach it as 
HIVE-15618.1.patch when it's complete.

> Change hive-blobstore tests to run with Tez by default
> --
>
> Key: HIVE-15618
> URL: https://issues.apache.org/jira/browse/HIVE-15618
> Project: Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Affects Versions: 2.2.0
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
> Attachments: HIVE-15618.patch
>
>
> Ever since the upgrade to Hive 2, Tez has been the default execution engine 
> for Hive. To match that fact, it makes sense to run our tests against Tez, 
> rather than MR. This should more fully validate functionality against what we 
> consider to be Hive defaults.





[jira] [Commented] (HIVE-15576) Fix bug in QTestUtil where lines after a partial mask will not be masked

2017-01-17 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15826498#comment-15826498
 ] 

Thomas Poepping commented on HIVE-15576:


Ping, help? [~sershe] I see you watched this issue, do you mind taking a look 
at the patch?

> Fix bug in QTestUtil where lines after a partial mask will not be masked
> 
>
> Key: HIVE-15576
> URL: https://issues.apache.org/jira/browse/HIVE-15576
> Project: Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Affects Versions: 2.2.0
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
> Attachments: HIVE-15576.patch
>
>
> If the qfile output of a qtest contains two maskable lines right after one 
> another, where the first contains a partial match candidate, the second line 
> will not be evaluated for masking. This patch fixes that bug by disregarding 
> whether a partial mask was found in the previous line.
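A hedged reconstruction of the bug (the flag and mask pattern below are illustrative, not the actual QTestUtil code): a carried-over partial-match flag caused the next line to skip mask evaluation entirely, and the fix evaluates every line regardless of the previous one.

```java
import java.util.*;

// Illustrative only: "buggy" mode skips masking a line whenever the
// previous line contained a partial-match marker; "fixed" mode masks
// every line unconditionally.
public class PartialMaskSketch {
    static List<String> maskAll(List<String> lines, boolean buggy) {
        List<String> out = new ArrayList<>();
        boolean prevPartial = false;
        for (String line : lines) {
            boolean skip = buggy && prevPartial; // the bug: skip after a partial match
            prevPartial = line.contains("PARTIAL");
            out.add(skip ? line : line.replaceAll("s3a://\\S+", "#MASKED#"));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("PARTIAL s3a://b/x", "s3a://b/y");
        System.out.println(maskAll(lines, true).get(1));  // prints s3a://b/y (leaked)
        System.out.println(maskAll(lines, false).get(1)); // prints #MASKED#
    }
}
```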





[jira] [Updated] (HIVE-15618) Change hive-blobstore tests to run with Tez by default

2017-01-17 Thread Thomas Poepping (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Poepping updated HIVE-15618:
---
Status: Open  (was: Patch Available)

> Change hive-blobstore tests to run with Tez by default
> --
>
> Key: HIVE-15618
> URL: https://issues.apache.org/jira/browse/HIVE-15618
> Project: Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Affects Versions: 2.2.0
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
> Attachments: HIVE-15618.patch
>
>
> Ever since the upgrade to Hive 2, Tez has been the default execution engine 
> for Hive. To match that fact, it makes sense to run our tests against Tez, 
> rather than MR. This should more fully validate functionality against what we 
> consider to be Hive defaults.





[jira] [Updated] (HIVE-15618) Change hive-blobstore tests to run with Tez by default

2017-01-17 Thread Thomas Poepping (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Poepping updated HIVE-15618:
---
Status: Patch Available  (was: Open)

resubmitting patch, because it was never picked up by QA

> Change hive-blobstore tests to run with Tez by default
> --
>
> Key: HIVE-15618
> URL: https://issues.apache.org/jira/browse/HIVE-15618
> Project: Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Affects Versions: 2.2.0
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
> Attachments: HIVE-15618.patch
>
>
> Ever since the upgrade to Hive 2, Tez has been the default execution engine 
> for Hive. To match that fact, it makes sense to run our tests against Tez, 
> rather than MR. This should more fully validate functionality against what we 
> consider to be Hive defaults.





[jira] [Commented] (HIVE-15616) Improve contents of qfile test output

2017-01-13 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822210#comment-15822210
 ] 

Thomas Poepping commented on HIVE-15616:


Can you open a reviews.apache.org submission for this change?

> Improve contents of qfile test output
> -
>
> Key: HIVE-15616
> URL: https://issues.apache.org/jira/browse/HIVE-15616
> Project: Hive
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 2.1.1
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
>Priority: Minor
> Attachments: HIVE-15616.patch
>
>
> The current output of the failed qtests has a less than ideal signal to noise 
> ratio.
> We have duplicated stack traces and messages between the error message/stack 
> trace/error out.
> For diff errors the actual difference is missing from the error message and 
> can be found only in the standard out.
> I would like to simplify this output by removing duplications, moving 
> relevant information to the top.





[jira] [Updated] (HIVE-15618) Change hive-blobstore tests to run with Tez by default

2017-01-13 Thread Thomas Poepping (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Poepping updated HIVE-15618:
---
Status: Patch Available  (was: Open)

> Change hive-blobstore tests to run with Tez by default
> --
>
> Key: HIVE-15618
> URL: https://issues.apache.org/jira/browse/HIVE-15618
> Project: Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Affects Versions: 2.2.0
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
> Attachments: HIVE-15618.patch
>
>
> Ever since the upgrade to Hive 2, Tez has been the default execution engine 
> for Hive. To match that fact, it makes sense to run our tests against Tez, 
> rather than MR. This should more fully validate functionality against what we 
> consider to be Hive defaults.





[jira] [Updated] (HIVE-15618) Change hive-blobstore tests to run with Tez by default

2017-01-13 Thread Thomas Poepping (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Poepping updated HIVE-15618:
---
Attachment: HIVE-15618.patch

> Change hive-blobstore tests to run with Tez by default
> --
>
> Key: HIVE-15618
> URL: https://issues.apache.org/jira/browse/HIVE-15618
> Project: Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Affects Versions: 2.2.0
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
> Attachments: HIVE-15618.patch
>
>
> Ever since the upgrade to Hive 2, Tez has been the default execution engine 
> for Hive. To match that fact, it makes sense to run our tests against Tez, 
> rather than MR. This should more fully validate functionality against what we 
> consider to be Hive defaults.





[jira] [Commented] (HIVE-14848) S3 creds added to a hidden list by HIVE-14588 are not working on MR jobs

2017-01-11 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819122#comment-15819122
 ] 

Thomas Poepping commented on HIVE-14848:


Ping?

> S3 creds added to a hidden list by HIVE-14588 are not working on MR jobs
> 
>
> Key: HIVE-14848
> URL: https://issues.apache.org/jira/browse/HIVE-14848
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-14848.1.patch, HIVE-14848.1.patch
>
>
> When S3 credentials are included in hive-site.xml, then MR jobs that need to 
> read
> data from S3 cannot use them because S3 values are stripped from the Job 
> configuration
> object before submitting the MR job.
> {noformat}
> @Override
> public void initialize(HiveConf conf, QueryPlan queryPlan, DriverContext 
> driverContext) {
>   ...
>   conf.stripHiddenConfigurations(job);
>   this.jobExecHelper = new HadoopJobExecHelper(job, console, this, this);
> }
> {noformat}
> A nice-to-have (available in Hadoop 2.9.0) is the MR property 
> {{mapreduce.job.redacted-properties}}, which can be used to hide this list 
> on the MR side (such as in the history server UI) and allow MR to run the 
> job without issues.
> UPDATE:
> Change the call to stripHiddenConfigurations() in 
> ql/exec/tez/DagUtils.createConfiguration(), because this is currently broken 
> for running hive-blobstore suite against Tez
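For context, a minimal sketch of what stripping the hidden list amounts to (plain maps standing in for HiveConf/JobConf, and the key names are only examples): any key on the hidden list is removed from the job configuration before submission, which is why the S3 credentials never reach the MR job.

```java
import java.util.*;

// Illustrative only: hidden-list keys are dropped from the outgoing job
// configuration, so credentials set in hive-site.xml never reach the job.
public class StripHiddenSketch {
    static void stripHidden(Map<String, String> jobConf, Set<String> hiddenKeys) {
        jobConf.keySet().removeAll(hiddenKeys);
    }

    public static void main(String[] args) {
        Map<String, String> job = new HashMap<>();
        job.put("fs.s3a.access.key", "example-key");
        job.put("mapreduce.job.name", "q1");
        stripHidden(job, Set.of("fs.s3a.access.key", "fs.s3a.secret.key"));
        System.out.println(job.containsKey("fs.s3a.access.key")); // prints false
    }
}
```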





[jira] [Commented] (HIVE-15576) Fix bug in QTestUtil where lines after a partial mask will not be masked

2017-01-11 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819087#comment-15819087
 ] 

Thomas Poepping commented on HIVE-15576:


I've determined that all qtest failures are irrelevant:

TestCliDriver:
  case_sensitivity, input_testxpath, and udf_coalesce all fail without my change
TestMiniLlapCliDriver:
  orc_ppd_schema_evol_3a fails without my change
  orc_ppd_basic passes both with and without my change in my environment
TestMiniLlapLocalCliDriver:
  schema_evol_text_vec_part fails without my change
  columnstats_part_coltype passes both with and without my change in my 
environment

Can I get someone to look at this?

> Fix bug in QTestUtil where lines after a partial mask will not be masked
> 
>
> Key: HIVE-15576
> URL: https://issues.apache.org/jira/browse/HIVE-15576
> Project: Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Affects Versions: 2.2.0
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
> Attachments: HIVE-15576.patch
>
>
> If the qfile output of a qtest contains two maskable lines right after one 
> another, where the first contains a partial match candidate, the second line 
> will not be evaluated for masking. This patch fixes that bug by disregarding 
> whether a partial mask was found in the previous line.





[jira] [Commented] (HIVE-15576) Fix bug in QTestUtil where lines after a partial mask will not be masked

2017-01-11 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15818885#comment-15818885
 ] 

Thomas Poepping commented on HIVE-15576:


I'll re-run qtests in my local environment to double check these failures.

> Fix bug in QTestUtil where lines after a partial mask will not be masked
> 
>
> Key: HIVE-15576
> URL: https://issues.apache.org/jira/browse/HIVE-15576
> Project: Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Affects Versions: 2.2.0
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
> Attachments: HIVE-15576.patch
>
>
> If the qfile output of a qtest contains two maskable lines right after one 
> another, where the first contains a partial match candidate, the second line 
> will not be evaluated for masking. This patch fixes that bug by disregarding 
> whether a partial mask was found in the previous line.





[jira] [Updated] (HIVE-15576) Fix bug in QTestUtil where lines after a partial mask will not be masked

2017-01-10 Thread Thomas Poepping (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Poepping updated HIVE-15576:
---
Attachment: HIVE-15576.patch

> Fix bug in QTestUtil where lines after a partial mask will not be masked
> 
>
> Key: HIVE-15576
> URL: https://issues.apache.org/jira/browse/HIVE-15576
> Project: Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Affects Versions: 2.2.0
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
> Attachments: HIVE-15576.patch
>
>
> If the qfile output of a qtest contains two maskable lines right after one 
> another, where the first contains a partial match candidate, the second line 
> will not be evaluated for masking. This patch fixes that bug by disregarding 
> whether a partial mask was found in the previous line.





[jira] [Updated] (HIVE-15576) Fix bug in QTestUtil where lines after a partial mask will not be masked

2017-01-10 Thread Thomas Poepping (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Poepping updated HIVE-15576:
---
Status: Patch Available  (was: Open)

> Fix bug in QTestUtil where lines after a partial mask will not be masked
> 
>
> Key: HIVE-15576
> URL: https://issues.apache.org/jira/browse/HIVE-15576
> Project: Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Affects Versions: 2.2.0
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
> Attachments: HIVE-15576.patch
>
>
> If the qfile output of a qtest contains two maskable lines right after one 
> another, where the first contains a partial match candidate, the second line 
> will not be evaluated for masking. This patch fixes that bug by disregarding 
> whether a partial mask was found in the previous line.





[jira] [Commented] (HIVE-14165) Remove Hive file listing during split computation

2016-12-21 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767035#comment-15767035
 ] 

Thomas Poepping commented on HIVE-14165:


Hi Sahil,

When you update the patch, can you create a new ReviewBoard submission?

WRT the InputFormat issue, my feeling is that we should steer clear of 
backwards-incompatible changes. Is there no way we can avoid the 
backwards-incompatible change while still avoiding the unnecessary listing? 

I will be able to provide more targeted feedback once the RB submission has 
been updated.

> Remove Hive file listing during split computation
> -
>
> Key: HIVE-14165
> URL: https://issues.apache.org/jira/browse/HIVE-14165
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.0
>Reporter: Abdullah Yousufi
>Assignee: Sahil Takiar
> Attachments: HIVE-14165.02.patch, HIVE-14165.03.patch, 
> HIVE-14165.04.patch, HIVE-14165.05.patch, HIVE-14165.06.patch, 
> HIVE-14165.patch
>
>
> The Hive side listing in FetchOperator.java is unnecessary, since Hadoop's 
> FileInputFormat.java will list the files during split computation anyway to 
> determine their size. One way to remove this is to catch the 
> InvalidInputException thrown by FileInputFormat#getSplits() on the 
> Hive side instead of doing the file listing beforehand.
> For S3 select queries on partitioned tables, this results in a 2x speedup.
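The proposed control flow can be sketched in Python (`InvalidInputError` and `get_splits` are stand-ins for Hadoop's invalid-input exception and `FileInputFormat#getSplits()`):

```python
class InvalidInputError(Exception):
    """Stand-in for the invalid-input exception raised by getSplits()."""

def fetch_splits(get_splits, path):
    # Instead of listing `path` up front just to check that it has files
    # (a second, redundant listing on S3), call getSplits() directly and
    # translate its invalid-input failure into "no splits".
    try:
        return get_splits(path)
    except InvalidInputError:
        return []

# Hypothetical split providers used only to exercise the two branches.
def ok_splits(path):
    return ["split-0", "split-1"]

def no_input(path):
    raise InvalidInputError(path)
```

The listing that split computation performs anyway is thus the only one issued against the object store.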





[jira] [Commented] (HIVE-1620) Patch to write directly to S3 from Hive

2016-12-21 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767027#comment-15767027
 ] 

Thomas Poepping commented on HIVE-1620:
---

Hi Sahil,
 
Yes, direct write works well in production. There are definitely some difficult 
design decisions to be made, and as you say, there is no great solution to 
clean up after failure. Some other issues are: data loss on self-referencing 
insert overwrite, metadata loss with dynamic partitioning, and poor visibility 
into partial results. There are workarounds / best practices for these, though. 
We are happy to engage in conversation about pros and cons.
 
The biggest thing we would like to stress with these implementations is that 
they should be pluggable. The solution should be as generic as possible to 
avoid spaghetti code.
 
We think the best solution is to make this a conversation about the best 
design. We are happy to participate in a community design and implementation, 
drawing on our experience with these types of issues.

> Patch to write directly to S3 from Hive
> ---
>
> Key: HIVE-1620
> URL: https://issues.apache.org/jira/browse/HIVE-1620
> Project: Hive
>  Issue Type: New Feature
>Reporter: Vaibhav Aggarwal
>Assignee: Vaibhav Aggarwal
> Attachments: HIVE-1620.patch
>
>
> We want to submit a patch to Hive which allows users to write files directly 
> to S3.
> This patch allows the user to specify an S3 location as the table output 
> location and hence eliminates the need to copy data from HDFS to S3.
> Users can run Hive queries directly over the data stored in S3.
> This patch helps integrate Hive with S3 better and more quickly.





[jira] [Commented] (HIVE-14848) S3 creds added to a hidden list by HIVE-14588 are not working on MR jobs

2016-11-30 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15709721#comment-15709721
 ] 

Thomas Poepping commented on HIVE-14848:


[~spena] I suggest you also change the call to stripHiddenConfigurations() in 
ql/exec/tez/DagUtils.createConfiguration(), because running the hive-blobstore 
suite against Tez is currently broken.

> S3 creds added to a hidden list by HIVE-14588 are not working on MR jobs
> 
>
> Key: HIVE-14848
> URL: https://issues.apache.org/jira/browse/HIVE-14848
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-14848.1.patch, HIVE-14848.1.patch
>
>
> When S3 credentials are included in hive-site.xml, MR jobs that need to read 
> data from S3 cannot use them, because the S3 values are stripped from the Job 
> configuration object before the MR job is submitted.
> {noformat}
> @Override
> public void initialize(HiveConf conf, QueryPlan queryPlan, DriverContext 
> driverContext) {
>   ...
>   conf.stripHiddenConfigurations(job);
>   this.jobExecHelper = new HadoopJobExecHelper(job, console, this, this);
> }
> {noformat}
> A nice-to-have (available in Hadoop 2.9.0) is the MR property 
> {{mapreduce.job.redacted-properties}}, which can be used to hide this list on 
> the MR side (such as the history server UI) and allow MR to run the job 
> without issues.
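A minimal sketch of the stripping behavior described above, in Python (the hidden keys shown are the s3a credential properties; `strip_hidden` is an illustrative stand-in for HiveConf.stripHiddenConfigurations()):

```python
# Hidden keys as added by HIVE-14588 (abridged; the real list has more).
HIDDEN_KEYS = {"fs.s3a.access.key", "fs.s3a.secret.key"}

def strip_hidden(job_conf):
    # Values on the hidden list are dropped from the Job configuration
    # before submission, so MR tasks that read S3 never see credentials
    # that were only present in hive-site.xml.
    return {k: v for k, v in job_conf.items() if k not in HIDDEN_KEYS}
```

This is exactly why the submitted job can no longer authenticate: the stripping happens before the configuration reaches the cluster, not just before it is displayed.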





[jira] [Commented] (HIVE-15266) Edit test output of negative blobstore tests to match HIVE-15226

2016-11-22 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15687679#comment-15687679
 ] 

Thomas Poepping commented on HIVE-15266:


[~spena] [~mohitsabharwal] [~stakiar] Can we take a look? Trivial change

> Edit test output of negative blobstore tests to match HIVE-15226
> 
>
> Key: HIVE-15266
> URL: https://issues.apache.org/jira/browse/HIVE-15266
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.2.0
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
> Attachments: HIVE-15266.1.patch
>
>
> In HIVE-15226 ( https://issues.apache.org/jira/browse/HIVE-15226 ), blobstore 
> tests were changed to print a different masking pattern for the blobstore 
> path. In that patch, test output was replaced for the clientpositive test ( 
> insert_into.q ), but not for the clientnegative test ( select_dropped_table.q 
> ), causing the negative tests to fail.
> This patch is the result of -Dtest.output.overwrite=true with the 
> clientnegative tests.





[jira] [Updated] (HIVE-15266) Edit test output of negative blobstore tests to match HIVE-15226

2016-11-22 Thread Thomas Poepping (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Poepping updated HIVE-15266:
---
Status: Patch Available  (was: Open)

> Edit test output of negative blobstore tests to match HIVE-15226
> 
>
> Key: HIVE-15266
> URL: https://issues.apache.org/jira/browse/HIVE-15266
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.2.0
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
> Attachments: HIVE-15266.1.patch
>
>
> In HIVE-15226 ( https://issues.apache.org/jira/browse/HIVE-15226 ), blobstore 
> tests were changed to print a different masking pattern for the blobstore 
> path. In that patch, test output was replaced for the clientpositive test ( 
> insert_into.q ), but not for the clientnegative test ( select_dropped_table.q 
> ), causing the negative tests to fail.
> This patch is the result of -Dtest.output.overwrite=true with the 
> clientnegative tests.





[jira] [Updated] (HIVE-15266) Edit test output of negative blobstore tests to match HIVE-15226

2016-11-22 Thread Thomas Poepping (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Poepping updated HIVE-15266:
---
Attachment: HIVE-15266.1.patch

Attached initial patch

> Edit test output of negative blobstore tests to match HIVE-15226
> 
>
> Key: HIVE-15266
> URL: https://issues.apache.org/jira/browse/HIVE-15266
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.2.0
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
> Attachments: HIVE-15266.1.patch
>
>
> In HIVE-15226 ( https://issues.apache.org/jira/browse/HIVE-15226 ), blobstore 
> tests were changed to print a different masking pattern for the blobstore 
> path. In that patch, test output was replaced for the clientpositive test ( 
> insert_into.q ), but not for the clientnegative test ( select_dropped_table.q 
> ), causing the negative tests to fail.
> This patch is the result of -Dtest.output.overwrite=true with the 
> clientnegative tests.





[jira] [Commented] (HIVE-15093) For S3-to-S3 renames, files should be moved individually rather than at a directory level

2016-10-28 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15616031#comment-15616031
 ] 

Thomas Poepping commented on HIVE-15093:


Shouldn't this be implemented in s3a? There could be many other places where 
Hive attempts to rename a directory, and there is real potential for this to be 
implemented wrongly in the future.
In addition, other Hadoop-stack applications may also have a use case for 
blobstore-to-blobstore renames. Should we place the burden on them to implement 
this themselves?
Also, what if some new type of filesystem comes along that requires yet another 
implementation of move? Do we want to set the precedent of a large if/else tree 
keyed on filesystem?

It seems the more correct solution would be to let s3a detect that a rename is 
from one blobstore directory to another and use a threadpool in that case.

Thoughts?
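The per-file, threadpooled approach under discussion can be sketched in Python (`rename` stands in for the store's single-object rename; the names are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def rename_dir(rename, files, src_prefix, dst_prefix, workers=8):
    # On a blobstore a directory "rename" is really N object copies, so
    # moving one file per task through a thread pool avoids the worst
    # case of the connector copying every object sequentially.
    def move(name):
        rename(src_prefix + name, dst_prefix + name)
        return name

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(move, files))
```

Whether this lives in Hive's MoveTask or inside the s3a connector is precisely the design question raised above.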

> For S3-to-S3 renames, files should be moved individually rather than at a 
> directory level
> -
>
> Key: HIVE-15093
> URL: https://issues.apache.org/jira/browse/HIVE-15093
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 2.1.0
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-15093.1.patch
>
>
> Hive's MoveTask uses the Hive.moveFile method to move data within a 
> distributed filesystem as well as blobstore filesystems.
> If the move is done within the same filesystem:
> 1: If the source path is a subdirectory of the destination path, files will 
> be moved one by one using a threadpool of workers
> 2: If the source path is not a subdirectory of the destination path, a single 
> rename operation is used to move the entire directory
> The second option may not work well on blobstores such as S3. Renames are not 
> metadata operations and require copying all the data. Client connectors to 
> blobstores may not efficiently rename directories. Worst case, the connector 
> will copy each file one by one, sequentially rather than using a threadpool 
> of workers to copy the data (e.g. HADOOP-13600).
> Hive already has code to rename files using a threadpool of workers, but this 
> only occurs in case number 1.
> This JIRA aims to modify the code so that case 1 is triggered when copying 
> within a blobstore. The focus is on copies within a blobstore because 
> needToCopy will return true if the src and target filesystems are different, 
> in which case a different code path is triggered.





[jira] [Commented] (HIVE-14373) Add integration tests for hive on S3

2016-10-12 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15569450#comment-15569450
 ] 

Thomas Poepping commented on HIVE-14373:


Have two +1s on RB. Awaiting precommit tests, then patch should be good to go

> Add integration tests for hive on S3
> 
>
> Key: HIVE-14373
> URL: https://issues.apache.org/jira/browse/HIVE-14373
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergio Peña
>Assignee: Thomas Poepping
> Attachments: HIVE-14373.02.patch, HIVE-14373.03.patch, 
> HIVE-14373.04.patch, HIVE-14373.05.patch, HIVE-14373.06.patch, 
> HIVE-14373.patch
>
>
> With Hive making improvements to run on S3, it would be ideal to have better 
> integration testing on S3.
> These S3 tests can't be executed by HiveQA because they need Amazon 
> credentials. We need to write a suite based on ideas from the Hadoop 
> project, where:
> - an xml file is provided with the S3 credentials
> - a committer must run these tests manually to verify they work
> - the xml file should not be part of the commit, and HiveQA should not run 
> these tests.
> https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure
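For illustration, the credentials file described above would look roughly like this Hadoop-style XML (the bucket name is a placeholder and the property names assume the s3a connector; the file stays out of version control):

```xml
<configuration>
  <property>
    <name>test.blobstore.path</name>
    <value>s3a://my-test-bucket/hive-blobstore-tests</value>
  </property>
  <property>
    <name>fs.s3a.access.key</name>
    <value>YOUR_ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>YOUR_SECRET_KEY</value>
  </property>
</configuration>
```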





[jira] [Updated] (HIVE-14373) Add integration tests for hive on S3

2016-10-12 Thread Thomas Poepping (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Poepping updated HIVE-14373:
---
Status: Patch Available  (was: Open)

> Add integration tests for hive on S3
> 
>
> Key: HIVE-14373
> URL: https://issues.apache.org/jira/browse/HIVE-14373
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergio Peña
>Assignee: Thomas Poepping
> Attachments: HIVE-14373.02.patch, HIVE-14373.03.patch, 
> HIVE-14373.04.patch, HIVE-14373.05.patch, HIVE-14373.06.patch, 
> HIVE-14373.patch
>
>
> With Hive making improvements to run on S3, it would be ideal to have better 
> integration testing on S3.
> These S3 tests can't be executed by HiveQA because they need Amazon 
> credentials. We need to write a suite based on ideas from the Hadoop 
> project, where:
> - an xml file is provided with the S3 credentials
> - a committer must run these tests manually to verify they work
> - the xml file should not be part of the commit, and HiveQA should not run 
> these tests.
> https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure





[jira] [Updated] (HIVE-14373) Add integration tests for hive on S3

2016-10-12 Thread Thomas Poepping (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Poepping updated HIVE-14373:
---
Attachment: HIVE-14373.06.patch

Attach new patch, addressed comments from RB

> Add integration tests for hive on S3
> 
>
> Key: HIVE-14373
> URL: https://issues.apache.org/jira/browse/HIVE-14373
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergio Peña
>Assignee: Thomas Poepping
> Attachments: HIVE-14373.02.patch, HIVE-14373.03.patch, 
> HIVE-14373.04.patch, HIVE-14373.05.patch, HIVE-14373.06.patch, 
> HIVE-14373.patch
>
>
> With Hive making improvements to run on S3, it would be ideal to have better 
> integration testing on S3.
> These S3 tests can't be executed by HiveQA because they need Amazon 
> credentials. We need to write a suite based on ideas from the Hadoop 
> project, where:
> - an xml file is provided with the S3 credentials
> - a committer must run these tests manually to verify they work
> - the xml file should not be part of the commit, and HiveQA should not run 
> these tests.
> https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure





[jira] [Updated] (HIVE-14373) Add integration tests for hive on S3

2016-10-12 Thread Thomas Poepping (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Poepping updated HIVE-14373:
---
Status: Open  (was: Patch Available)

> Add integration tests for hive on S3
> 
>
> Key: HIVE-14373
> URL: https://issues.apache.org/jira/browse/HIVE-14373
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergio Peña
>Assignee: Thomas Poepping
> Attachments: HIVE-14373.02.patch, HIVE-14373.03.patch, 
> HIVE-14373.04.patch, HIVE-14373.05.patch, HIVE-14373.06.patch, 
> HIVE-14373.patch
>
>
> With Hive making improvements to run on S3, it would be ideal to have better 
> integration testing on S3.
> These S3 tests can't be executed by HiveQA because they need Amazon 
> credentials. We need to write a suite based on ideas from the Hadoop 
> project, where:
> - an xml file is provided with the S3 credentials
> - a committer must run these tests manually to verify they work
> - the xml file should not be part of the commit, and HiveQA should not run 
> these tests.
> https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure





[jira] [Commented] (HIVE-14373) Add integration tests for hive on S3

2016-10-12 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15569399#comment-15569399
 ] 

Thomas Poepping commented on HIVE-14373:


[~spena] I responded to your comments on RB. I would like to open a separate 
JIRA after the submission of this one that will change the qtests to run on Tez 
by default, rather than running on MR. What do you think?

> Add integration tests for hive on S3
> 
>
> Key: HIVE-14373
> URL: https://issues.apache.org/jira/browse/HIVE-14373
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergio Peña
>Assignee: Thomas Poepping
> Attachments: HIVE-14373.02.patch, HIVE-14373.03.patch, 
> HIVE-14373.04.patch, HIVE-14373.05.patch, HIVE-14373.patch
>
>
> With Hive making improvements to run on S3, it would be ideal to have better 
> integration testing on S3.
> These S3 tests can't be executed by HiveQA because they need Amazon 
> credentials. We need to write a suite based on ideas from the Hadoop 
> project, where:
> - an xml file is provided with the S3 credentials
> - a committer must run these tests manually to verify they work
> - the xml file should not be part of the commit, and HiveQA should not run 
> these tests.
> https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure





[jira] [Commented] (HIVE-14373) Add integration tests for hive on S3

2016-09-22 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15514387#comment-15514387
 ] 

Thomas Poepping commented on HIVE-14373:


It wouldn't make sense for these test failures to be related to this patch, as 
I am touching almost no existing code. Can I get eyes on this?

> Add integration tests for hive on S3
> 
>
> Key: HIVE-14373
> URL: https://issues.apache.org/jira/browse/HIVE-14373
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergio Peña
>Assignee: Thomas Poepping
> Attachments: HIVE-14373.02.patch, HIVE-14373.03.patch, 
> HIVE-14373.04.patch, HIVE-14373.05.patch, HIVE-14373.patch
>
>
> With Hive making improvements to run on S3, it would be ideal to have better 
> integration testing on S3.
> These S3 tests can't be executed by HiveQA because they need Amazon 
> credentials. We need to write a suite based on ideas from the Hadoop 
> project, where:
> - an xml file is provided with the S3 credentials
> - a committer must run these tests manually to verify they work
> - the xml file should not be part of the commit, and HiveQA should not run 
> these tests.
> https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure





[jira] [Updated] (HIVE-14373) Add integration tests for hive on S3

2016-09-22 Thread Thomas Poepping (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Poepping updated HIVE-14373:
---
Attachment: (was: HIVE-14373.05.patch)

> Add integration tests for hive on S3
> 
>
> Key: HIVE-14373
> URL: https://issues.apache.org/jira/browse/HIVE-14373
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergio Peña
>Assignee: Thomas Poepping
> Attachments: HIVE-14373.02.patch, HIVE-14373.03.patch, 
> HIVE-14373.04.patch, HIVE-14373.05.patch, HIVE-14373.patch
>
>
> With Hive making improvements to run on S3, it would be ideal to have better 
> integration testing on S3.
> These S3 tests can't be executed by HiveQA because they need Amazon 
> credentials. We need to write a suite based on ideas from the Hadoop 
> project, where:
> - an xml file is provided with the S3 credentials
> - a committer must run these tests manually to verify they work
> - the xml file should not be part of the commit, and HiveQA should not run 
> these tests.
> https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure





[jira] [Updated] (HIVE-14373) Add integration tests for hive on S3

2016-09-22 Thread Thomas Poepping (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Poepping updated HIVE-14373:
---
Status: Patch Available  (was: In Progress)

> Add integration tests for hive on S3
> 
>
> Key: HIVE-14373
> URL: https://issues.apache.org/jira/browse/HIVE-14373
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergio Peña
>Assignee: Thomas Poepping
> Attachments: HIVE-14373.02.patch, HIVE-14373.03.patch, 
> HIVE-14373.04.patch, HIVE-14373.05.patch, HIVE-14373.05.patch, 
> HIVE-14373.patch
>
>
> With Hive making improvements to run on S3, it would be ideal to have better 
> integration testing on S3.
> These S3 tests can't be executed by HiveQA because they need Amazon 
> credentials. We need to write a suite based on ideas from the Hadoop 
> project, where:
> - an xml file is provided with the S3 credentials
> - a committer must run these tests manually to verify they work
> - the xml file should not be part of the commit, and HiveQA should not run 
> these tests.
> https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure





[jira] [Updated] (HIVE-14373) Add integration tests for hive on S3

2016-09-22 Thread Thomas Poepping (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Poepping updated HIVE-14373:
---
Attachment: HIVE-14373.05.patch

I'll create a new review-board submission, as I can't edit the old one.

This patch takes what Abdullah had and adds improvements, including:
 * added an abstraction in CliDrivers to increase code reuse
 * allowed QTEST_LEAVE_FILES implemented in HIVE-8100 to be used to leave files 
in blobstore for debugging
 * implemented unique blobstore paths for individual test runs, so if multiple 
people start test runs at the same time with the same blobstore path, there 
will be no collisions
 * moved test.blobstore.path to the conf.xml file, so it need not be specified 
each time
 * fixed README, added more examples
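The unique-path idea can be sketched in Python (the suffix scheme here is illustrative, not necessarily what the patch uses):

```python
import uuid

def unique_test_path(base_path):
    # Append a per-run unique id so that two people pointing at the same
    # test.blobstore.path at the same time cannot collide.
    return "%s/%s" % (base_path.rstrip("/"), uuid.uuid4().hex)
```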

> Add integration tests for hive on S3
> 
>
> Key: HIVE-14373
> URL: https://issues.apache.org/jira/browse/HIVE-14373
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergio Peña
>Assignee: Thomas Poepping
> Attachments: HIVE-14373.02.patch, HIVE-14373.03.patch, 
> HIVE-14373.04.patch, HIVE-14373.05.patch, HIVE-14373.05.patch, 
> HIVE-14373.patch
>
>
> With Hive making improvements to run on S3, it would be ideal to have better 
> integration testing on S3.
> These S3 tests can't be executed by HiveQA because they need Amazon 
> credentials. We need to write a suite based on ideas from the Hadoop 
> project, where:
> - an xml file is provided with the S3 credentials
> - a committer must run these tests manually to verify they work
> - the xml file should not be part of the commit, and HiveQA should not run 
> these tests.
> https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure





[jira] [Updated] (HIVE-14373) Add integration tests for hive on S3

2016-09-22 Thread Thomas Poepping (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Poepping updated HIVE-14373:
---
Attachment: HIVE-14373.05.patch

I'll open a new review-board request for this, as I can't update the old one.

This patch takes what Abdullah had and:
 * creates an abstraction in the CliDrivers for code reuse
 * allows the QTEST_LEAVE_FILES environment variable implemented in HIVE-8100 
to be used to optionally leave files in S3 for inspection and debugging
 * abstracts the test.blobstore.path to the conf.xml file, so it doesn't need 
to be set each time
 * implements a unique folder identifier for each test run, so if multiple 
people run tests against the same blobstore path at the same time, there will 
be no collisions

> Add integration tests for hive on S3
> 
>
> Key: HIVE-14373
> URL: https://issues.apache.org/jira/browse/HIVE-14373
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergio Peña
>Assignee: Thomas Poepping
> Attachments: HIVE-14373.02.patch, HIVE-14373.03.patch, 
> HIVE-14373.04.patch, HIVE-14373.05.patch, HIVE-14373.patch
>
>
> With Hive making improvements to run on S3, it would be ideal to have better 
> integration testing on S3.
> These S3 tests can't be executed by HiveQA because they need Amazon 
> credentials. We need to write a suite based on ideas from the Hadoop 
> project, where:
> - an xml file is provided with the S3 credentials
> - a committer must run these tests manually to verify they work
> - the xml file should not be part of the commit, and HiveQA should not run 
> these tests.
> https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure





[jira] [Updated] (HIVE-14373) Add integration tests for hive on S3

2016-09-22 Thread Thomas Poepping (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Poepping updated HIVE-14373:
---
Status: In Progress  (was: Patch Available)

> Add integration tests for hive on S3
> 
>
> Key: HIVE-14373
> URL: https://issues.apache.org/jira/browse/HIVE-14373
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergio Peña
>Assignee: Thomas Poepping
> Attachments: HIVE-14373.02.patch, HIVE-14373.03.patch, 
> HIVE-14373.04.patch, HIVE-14373.patch
>
>
> With Hive making improvements to run on S3, it would be ideal to have better 
> integration testing on S3.
> These S3 tests can't be executed by HiveQA because they need Amazon 
> credentials. We need to write a suite based on ideas from the Hadoop 
> project, where:
> - an xml file is provided with the S3 credentials
> - a committer must run these tests manually to verify they work
> - the xml file should not be part of the commit, and HiveQA should not run 
> these tests.
> https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure





[jira] [Commented] (HIVE-14373) Add integration tests for hive on S3

2016-08-31 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15452704#comment-15452704
 ] 

Thomas Poepping commented on HIVE-14373:


I can pick it up, if you're okay with that [~spena]

> Add integration tests for hive on S3
> 
>
> Key: HIVE-14373
> URL: https://issues.apache.org/jira/browse/HIVE-14373
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergio Peña
>Assignee: Abdullah Yousufi
> Attachments: HIVE-14373.02.patch, HIVE-14373.03.patch, 
> HIVE-14373.04.patch, HIVE-14373.patch
>
>
> With Hive making improvements to run on S3, it would be ideal to have better 
> integration testing on S3.
> These S3 tests can't be executed by HiveQA because they need Amazon 
> credentials. We need to write a suite based on ideas from the Hadoop 
> project, where:
> - an xml file is provided with the S3 credentials
> - a committer must run these tests manually to verify they work
> - the xml file should not be part of the commit, and HiveQA should not run 
> these tests.
> https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

