[jira] [Created] (HIVE-22928) Allow hive.exec.stagingdir to be a fully qualified directory name
Thomas Poepping created HIVE-22928:

Summary: Allow hive.exec.stagingdir to be a fully qualified directory name
Key: HIVE-22928
URL: https://issues.apache.org/jira/browse/HIVE-22928
Project: Hive
Issue Type: Improvement
Components: Configuration, Hive
Affects Versions: 3.1.2
Reporter: Thomas Poepping
Assignee: Thomas Poepping

Currently, {{hive.exec.stagingdir}} can only be set as a relative directory name that, for operations like {{insert}} or {{insert overwrite}}, will be placed either under the table directory or the partition directory.

For cases where an HDFS cluster is small but the data being inserted is very large (greater than the capacity of the HDFS cluster, as mentioned in a comment by [~ashutoshc] on [HIVE-14270]), the client may want to set their staging directory to be an explicit blobstore path (or any filesystem path), rather than relying on Hive to intelligently build the blobstore path based on an interpretation of the job. We may lose locality guarantees, but because renames are just as expensive on blobstores no matter what the prefix is, this isn't considered a terribly large loss (assuming only blobstore customers use this functionality).

Note that {{hive.blobstore.use.blobstore.as.scratchdir}} doesn't actually suffice in this case, as the stagingdir is not the same.

This commit enables Hive customers to set an absolute location for all staging directories. For instances where the configured stagingdir scheme is not the same as the scheme for the table location, the default stagingdir configuration is used. This avoids a cross-filesystem rename, which is impossible anyway.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
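The fallback rule described above can be sketched roughly as follows. This is an illustration in Python, not the code from the patch: the function name choose_staging_dir is hypothetical, names without a scheme are assumed to be relative, and the suffixes Hive appends to real staging directory names are not modeled.

```python
from urllib.parse import urlparse

# Default value of hive.exec.stagingdir.
DEFAULT_STAGING_DIR = ".hive-staging"


def choose_staging_dir(configured: str, table_location: str) -> str:
    """Pick a staging directory for a write to table_location (hypothetical helper)."""
    staging_scheme = urlparse(configured).scheme
    if not staging_scheme:
        # Assumption: a name without a scheme is relative and goes under the table.
        return f"{table_location.rstrip('/')}/{configured}"
    if staging_scheme == urlparse(table_location).scheme:
        # Same filesystem scheme: the absolute staging location can be used.
        return configured
    # Schemes differ: fall back to the default to avoid a cross-filesystem rename.
    return f"{table_location.rstrip('/')}/{DEFAULT_STAGING_DIR}"
```

With a table at hdfs://nn:8020/warehouse/t, a configured value of s3a://bucket/staging would be ignored in favor of the default under the table directory, while the same value would be honored for a table stored on s3a.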
[jira] [Created] (HIVE-15576) Fix bug in QTestUtil where lines after a partial mask will not be masked
Thomas Poepping created HIVE-15576:

Summary: Fix bug in QTestUtil where lines after a partial mask will not be masked
Key: HIVE-15576
URL: https://issues.apache.org/jira/browse/HIVE-15576
Project: Hive
Issue Type: Bug
Components: Testing Infrastructure
Affects Versions: 2.2.0
Reporter: Thomas Poepping
Assignee: Thomas Poepping

If the qfile output of a qtest contains two maskable lines right after one another, where the first contains a partial match candidate, the second line will not be evaluated for masking. This patch fixes that bug by disregarding whether a partial mask was found in the previous line.
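A minimal sketch of the intended behavior, in Python rather than the Java of QTestUtil: every line is evaluated on its own, with no state carried over from the previous line. The patterns and replacement strings here are illustrative assumptions, and the sketch does not reproduce QTestUtil's actual pattern list or how it treats runs of consecutive masked lines.

```python
import re

# Illustrative assumptions only; not QTestUtil's real configuration.
FULL_MASK_MARKERS = ("totalSize", "lastUpdateTime")
PARTIAL_MASK = re.compile(r"s3a://[^\s]+")
FULL_REPLACEMENT = "#### A masked pattern was here ####"
PARTIAL_REPLACEMENT = "### BLOBSTORE_PATH ###"


def mask_lines(lines):
    """Mask each line independently of what happened on the previous line."""
    masked = []
    for line in lines:
        if PARTIAL_MASK.search(line):
            # Partial mask: replace only the matching fragment.
            masked.append(PARTIAL_MASK.sub(PARTIAL_REPLACEMENT, line))
        elif any(marker in line for marker in FULL_MASK_MARKERS):
            # Full mask: replace the whole line.
            masked.append(FULL_REPLACEMENT)
        else:
            masked.append(line)
    return masked
```

In this sketch, a fully maskable line that directly follows a partially masked one is still masked, which is the case the report says was being skipped.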
[jira] [Created] (HIVE-15618) Change hive-blobstore tests to run with Tez by default
Thomas Poepping created HIVE-15618:

Summary: Change hive-blobstore tests to run with Tez by default
Key: HIVE-15618
URL: https://issues.apache.org/jira/browse/HIVE-15618
Project: Hive
Issue Type: Bug
Components: Testing Infrastructure
Affects Versions: 2.2.0
Reporter: Thomas Poepping
Assignee: Thomas Poepping

Ever since the upgrade to Hive 2, Tez has been the default execution engine for Hive. To match that fact, it makes sense to run our tests against Tez, rather than MR. This should more fully validate functionality against what we consider to be Hive defaults.
[jira] [Created] (HIVE-15852) Tablesampling on Tez in low-record case throws ArrayIndexOutOfBoundsException
Thomas Poepping created HIVE-15852:

Summary: Tablesampling on Tez in low-record case throws ArrayIndexOutOfBoundsException
Key: HIVE-15852
URL: https://issues.apache.org/jira/browse/HIVE-15852
Project: Hive
Issue Type: Bug
Components: Tez
Affects Versions: 2.1.1
Reporter: Thomas Poepping

Due to HIVE-13040 ( https://issues.apache.org/jira/browse/HIVE-13040 ), which doesn't create empty files to represent empty buckets when Hive is on Tez, a couple of things are broken.

First of all, if there are empty buckets (which is possible with large datasets in the partitioned-bucketed case), tablesampling will not work if you're referencing a bucket number higher than the number of files.

e.g. In some partition 'p', there are three rows. The table 't' is clustered into ten buckets. With maximal hashing, only three bucket files will be created. If we do select * from t tablesample (bucket x out of 10) (where x > 3), an ArrayIndexOutOfBoundsException will be thrown because Hive assumes there are only three buckets.

Second, other applications (such as Pig) may be making assumptions about the number of files equaling the number of buckets.

Possible fixes:
* Revert HIVE-13040
* Change how tablesampling is implemented to accept the possibility that number of files != number of buckets
** Would require coordination across projects to change assumptions

Things to consider:
* what performance gains are there from not creating empty files?
* if the gains are large, are we willing to lose them? (by reverting HIVE-13040)
* _how else can we avoid creating unnecessary files, while still maintaining invariants other applications expect?_
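The failure mode and the second possible fix can be illustrated with a small Python sketch (Python raises IndexError where Java throws ArrayIndexOutOfBoundsException). Both function names are hypothetical, and the sketch assumes bucket files are named with a zero-padded, zero-based bucket id before the underscore (for example 000004_0); it is not how Hive's sampling code is written.

```python
def bucket_file_by_position(files, bucket_number):
    """Positional lookup: assumes one file per bucket (bucket_number is 1-based)."""
    return files[bucket_number - 1]


def bucket_file_by_id(files, bucket_number):
    """Tolerant lookup: match the bucket id encoded in the file name.

    Assumption: names look like '000004_0', where the part before the
    underscore is the zero-based bucket id.
    """
    wanted = bucket_number - 1
    for name in files:
        if int(name.split("_")[0]) == wanted:
            return name
    return None  # Empty bucket: no file was written for it.


# Three rows hashed into three of ten buckets, so only three files exist.
files = ["000001_0", "000004_0", "000008_0"]
```

Here bucket_file_by_position(files, 7) fails because only three files exist, while bucket_file_by_id(files, 7) reports an empty bucket and bucket_file_by_id(files, 5) finds 000004_0.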
[jira] [Created] (HIVE-15867) Add blobstore tests for import/export
Thomas Poepping created HIVE-15867:

Summary: Add blobstore tests for import/export
Key: HIVE-15867
URL: https://issues.apache.org/jira/browse/HIVE-15867
Project: Hive
Issue Type: Bug
Reporter: Thomas Poepping
Assignee: Thomas Poepping

This patch covers ten separate tests testing import and export operations running against blobstore filesystems:
* Import addpartition
** blobstore -> file
** file -> blobstore
** blobstore -> blobstore
** blobstore -> hdfs
* import/export
** blobstore -> file
** file -> blobstore
** blobstore -> blobstore (partitioned and non-partitioned)
** blobstore -> HDFS (partitioned and non-partitioned)
[jira] [Created] (HIVE-16288) Add blobstore tests for ORC and RCFILE file formats
Thomas Poepping created HIVE-16288:

Summary: Add blobstore tests for ORC and RCFILE file formats
Key: HIVE-16288
URL: https://issues.apache.org/jira/browse/HIVE-16288
Project: Hive
Issue Type: Test
Components: Tests
Affects Versions: 2.1.1
Reporter: Thomas Poepping
Assignee: Thomas Poepping

This patch adds four tests each for ORC and RCFILE when running against blobstore filesystems:
* Test for bucketed tables
* Test for nonpartitioned tables
* Test for partitioned tables
* Test for partitioned tables with nonstandard partition locations
[jira] [Created] (HIVE-16415) Add blobstore tests for insertion of zero rows
Thomas Poepping created HIVE-16415:

Summary: Add blobstore tests for insertion of zero rows
Key: HIVE-16415
URL: https://issues.apache.org/jira/browse/HIVE-16415
Project: Hive
Issue Type: Test
Components: Tests
Affects Versions: 2.1.1
Reporter: Thomas Poepping
Assignee: Thomas Poepping

This patch introduces two regression tests into the hive-blobstore qtest module: zero_rows_hdfs.q and zero_rows_blobstore.q. These test INSERT commands with a WHERE clause whose condition causes zero rows to be selected.
[jira] [Created] (HIVE-16427) Fix multi-insert query and write qtests
Thomas Poepping created HIVE-16427:

Summary: Fix multi-insert query and write qtests
Key: HIVE-16427
URL: https://issues.apache.org/jira/browse/HIVE-16427
Project: Hive
Issue Type: Bug
Components: Logical Optimizer
Reporter: Thomas Poepping

On HIVE-16415, it was found that the bug reported to be fixed in HIVE-14519 was not actually fixed. This task is to find the problem, fix it, and add qtests to verify no future regression.
[jira] [Created] (HIVE-13405) Fix Connection Leak in OrcRawRecordMerger
Thomas Poepping created HIVE-13405:

Summary: Fix Connection Leak in OrcRawRecordMerger
Key: HIVE-13405
URL: https://issues.apache.org/jira/browse/HIVE-13405
Project: Hive
Issue Type: Bug
Components: ORC
Affects Versions: 2.0.0
Reporter: Thomas Poepping

In OrcRawRecordMerger.getLastFlushLength, if the opened stream throws an IOException on .available() or on .readLong(), the function will exit without closing the stream. This patch adds a try-with-resources to fix this.
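As a rough Python analog of the try-with-resources pattern (the patch itself is Java and is not reproduced here), a with block closes the stream whether or not a read fails. The function below is a hypothetical stand-in: the 8-byte big-endian framing and the -1 placeholder for an empty stream are assumptions made for the sketch.

```python
import io
import struct


def last_long_in_stream(open_stream):
    """Return the last 8-byte big-endian long in the stream, or -1 if there is none."""
    result = -1  # Placeholder for an empty stream (an assumption of this sketch).
    # The with block closes the stream even if read() or unpack() raises.
    with open_stream() as stream:
        while True:
            chunk = stream.read(8)
            if len(chunk) < 8:
                break
            (result,) = struct.unpack(">q", chunk)
    return result
```

The caller passes a function that opens the stream, for example lambda: open(path, "rb"), so that opening and closing both happen inside the helper.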
[jira] [Created] (HIVE-13523) Fix connection leak in ORC RecordReader and refactor for unit testing
Thomas Poepping created HIVE-13523:

Summary: Fix connection leak in ORC RecordReader and refactor for unit testing
Key: HIVE-13523
URL: https://issues.apache.org/jira/browse/HIVE-13523
Project: Hive
Issue Type: Bug
Components: ORC
Affects Versions: 2.0.0
Reporter: Thomas Poepping

In RecordReaderImpl, a MetadataReaderImpl object was being created (opening a file), but never closed, causing a leak. This change closes the Metadata object in RecordReaderImpl, and does substantial refactoring to make RecordReaderImpl testable:
* Created DataReaderFactory and MetadataReaderFactory (plus default implementations) so that the create() methods can be mocked to verify that the objects are actually closed in RecordReaderImpl.close()
* Created MetadataReaderProperties and DataReaderProperties to clean up argument lists, making code more readable
* Created a builder() for RecordReaderImpl to make the code more readable
* DataReader and MetadataReader now extend Closeable (there was no reason for them not to in the first place) so I can use the Guava Closer class: http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/io/Closer.html
* Use Closer to guarantee that even if one close() call fails, both will be attempted (preventing further potential leaks)
* Create builders for MetadataReaderProperties, DataReaderProperties, and RecordReaderImpl to help with code readability
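The "attempt every close() even if one fails" guarantee can be sketched in Python with contextlib.ExitStack, which plays a role similar to Guava's Closer. This is an analog for illustration, not the refactored RecordReaderImpl; the Resource class and close_all function are hypothetical.

```python
from contextlib import ExitStack


class Resource:
    """Hypothetical stand-in for a reader that may fail while closing."""

    def __init__(self, name, fail_on_close=False):
        self.name = name
        self.fail_on_close = fail_on_close
        self.closed = False

    def close(self):
        self.closed = True
        if self.fail_on_close:
            raise OSError(f"close failed for {self.name}")


def close_all(*resources):
    """Attempt close() on every resource, even if an earlier close() raises."""
    with ExitStack() as stack:
        for resource in resources:
            # Callbacks run in reverse registration order when the stack exits;
            # an exception in one does not prevent the others from running.
            stack.callback(resource.close)
```

If the data reader's close() raises, the metadata reader's close() still runs and the exception is re-raised afterwards, so a test can mock both readers and assert that each was closed.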
[jira] [Created] (HIVE-14175) Fix creating buckets without scheme information
Thomas Poepping created HIVE-14175:

Summary: Fix creating buckets without scheme information
Key: HIVE-14175
URL: https://issues.apache.org/jira/browse/HIVE-14175
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 2.1.0, 1.2.1
Reporter: Thomas Poepping
Assignee: Thomas Poepping

If a table is created on a non-default filesystem (i.e. non-hdfs), the empty files will be created with incorrect scheme information. This patch extracts the scheme and authority information for the new paths.
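The idea of carrying scheme and authority over to a new path can be sketched as follows. This is a Python illustration with a hypothetical helper name, not the patch's Java code; the bucket file name is only an example.

```python
from urllib.parse import urlparse, urlunparse


def qualify_like(table_location: str, path: str) -> str:
    """Give path the scheme and authority of table_location if it lacks them."""
    parsed = urlparse(path)
    if parsed.scheme:
        # Already fully qualified: leave it untouched.
        return path
    table = urlparse(table_location)
    return urlunparse((table.scheme, table.netloc, parsed.path, "", "", ""))
```

For a table at s3a://bucket/warehouse/t, the unqualified path /warehouse/t/000000_0 becomes s3a://bucket/warehouse/t/000000_0 instead of being resolved against the default filesystem.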
[jira] [Created] (HIVE-14174) Fix creating buckets without scheme information
Thomas Poepping created HIVE-14174:

Summary: Fix creating buckets without scheme information
Key: HIVE-14174
URL: https://issues.apache.org/jira/browse/HIVE-14174
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 2.1.0, 1.2.1
Reporter: Thomas Poepping
Assignee: Thomas Poepping

If a table is created on a non-default filesystem (i.e. non-hdfs), the empty files will be created with incorrect scheme information. This patch extracts the scheme and authority information for the new paths.
[jira] [Created] (HIVE-15266) Edit test output of negative blobstore tests to match HIVE-15226
Thomas Poepping created HIVE-15266:

Summary: Edit test output of negative blobstore tests to match HIVE-15226
Key: HIVE-15266
URL: https://issues.apache.org/jira/browse/HIVE-15266
Project: Hive
Issue Type: Bug
Components: Tests
Affects Versions: 2.2.0
Reporter: Thomas Poepping
Assignee: Thomas Poepping

In HIVE-15226 ( https://issues.apache.org/jira/browse/HIVE-15226 ), blobstore tests were changed to print a different masking pattern for the blobstore path. In that patch, test output was replaced for the clientpositive test ( insert_into.q ), but not for the clientnegative test ( select_dropped_table.q ), causing the negative tests to fail.

This patch is the result of -Dtest.output.overwrite=true with the clientnegative tests.