[jira] [Created] (HIVE-22928) Allow hive.exec.stagingdir to be a fully qualified directory name

2020-02-25 Thread Thomas Poepping (Jira)
Thomas Poepping created HIVE-22928:
--

 Summary: Allow hive.exec.stagingdir to be a fully qualified directory name
 Key: HIVE-22928
 URL: https://issues.apache.org/jira/browse/HIVE-22928
 Project: Hive
  Issue Type: Improvement
  Components: Configuration, Hive
Affects Versions: 3.1.2
Reporter: Thomas Poepping
Assignee: Thomas Poepping


Currently, {{hive.exec.stagingdir}} can only be set as a relative directory 
name that, for operations like {{insert}} or {{insert overwrite}}, will be 
placed either under the table directory or the partition directory. 

For cases where an HDFS cluster is small but the data being inserted is very 
large (greater than the capacity of the HDFS cluster, as mentioned in a comment 
by [~ashutoshc] on [HIVE-14270]), the client may want to set their staging 
directory to be an explicit blobstore path (or any filesystem path), rather 
than relying on Hive to intelligently build the blobstore path based on an 
interpretation of the job. We may lose locality guarantees, but because renames 
are just as expensive on blobstores no matter what the prefix is, this isn't 
considered a terribly large loss (assuming only blobstore customers use this 
functionality).

Note that {{hive.blobstore.use.blobstore.as.scratchdir}} does not suffice in 
this case, because it controls the scratch directory, which is not the same as 
the staging directory.

This commit enables Hive customers to set an absolute location for all staging 
directories. For instances where the configured stagingdir scheme is not the 
same as the scheme for the table location, the default stagingdir configuration 
is used. This avoids a cross-filesystem rename, which is impossible anyway.
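
The fallback rule described above can be sketched as follows. This is a minimal illustration in Python, not the code from the patch: the helper name {{resolve_staging_dir}} is hypothetical, the default value {{.hive-staging}} is an assumption, and any value without a scheme is treated as a relative name.

```python
from urllib.parse import urlparse

# Assumed default value of hive.exec.stagingdir (labeled assumption).
DEFAULT_STAGING_DIR = ".hive-staging"


def resolve_staging_dir(configured: str, table_location: str) -> str:
    """Pick a staging directory for a write to table_location (hypothetical helper)."""
    configured_scheme = urlparse(configured).scheme
    table_scheme = urlparse(table_location).scheme

    if not configured_scheme:
        # Relative name: existing behaviour, stage under the table or partition directory.
        return table_location.rstrip("/") + "/" + configured

    if configured_scheme == table_scheme:
        # Fully qualified and on the same filesystem scheme: use it as given.
        return configured

    # Schemes differ: fall back to the default to avoid a cross-filesystem rename.
    return table_location.rstrip("/") + "/" + DEFAULT_STAGING_DIR
```

For example, under these assumptions a staging directory of {{s3a://bucket/staging}} would be honoured for a table on {{s3a}}, but ignored for a table on {{hdfs}}.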



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-16427) Fix multi-insert query and write qtests

2017-04-12 Thread Thomas Poepping (JIRA)
Thomas Poepping created HIVE-16427:
--

 Summary: Fix multi-insert query and write qtests
 Key: HIVE-16427
 URL: https://issues.apache.org/jira/browse/HIVE-16427
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer
Reporter: Thomas Poepping


On HIVE-16415, it was found that the bug reported to be fixed in HIVE-14519 was 
not actually fixed.

This task is to find the problem, fix it, and add qtests to verify no future 
regression.





[jira] [Created] (HIVE-16415) Add blobstore tests for insertion of zero rows

2017-04-10 Thread Thomas Poepping (JIRA)
Thomas Poepping created HIVE-16415:
--

 Summary: Add blobstore tests for insertion of zero rows
 Key: HIVE-16415
 URL: https://issues.apache.org/jira/browse/HIVE-16415
 Project: Hive
  Issue Type: Test
  Components: Tests
Affects Versions: 2.1.1
Reporter: Thomas Poepping
Assignee: Thomas Poepping


This patch introduces two regression tests into the hive-blobstore qtest 
module: zero_rows_hdfs.q and zero_rows_blobstore.q. These tests run INSERT 
commands whose WHERE clause matches zero rows, so that no rows are inserted.





[jira] [Created] (HIVE-16288) Add blobstore tests for ORC and RCFILE file formats

2017-03-23 Thread Thomas Poepping (JIRA)
Thomas Poepping created HIVE-16288:
--

 Summary: Add blobstore tests for ORC and RCFILE file formats
 Key: HIVE-16288
 URL: https://issues.apache.org/jira/browse/HIVE-16288
 Project: Hive
  Issue Type: Test
  Components: Tests
Affects Versions: 2.1.1
Reporter: Thomas Poepping
Assignee: Thomas Poepping


This patch adds four tests each for ORC and RCFILE when running against 
blobstore filesystems:
  * Test for bucketed tables
  * Test for nonpartitioned tables
  * Test for partitioned tables
  * Test for partitioned tables with nonstandard partition locations





[jira] [Created] (HIVE-15867) Add blobstore tests for import/export

2017-02-09 Thread Thomas Poepping (JIRA)
Thomas Poepping created HIVE-15867:
--

 Summary: Add blobstore tests for import/export
 Key: HIVE-15867
 URL: https://issues.apache.org/jira/browse/HIVE-15867
 Project: Hive
  Issue Type: Bug
Reporter: Thomas Poepping
Assignee: Thomas Poepping


This patch adds ten tests covering import and export operations 
running against blobstore filesystems:
* import addpartition
** blobstore -> file
** file -> blobstore
** blobstore -> blobstore
** blobstore -> HDFS
* import/export
** blobstore -> file
** file -> blobstore
** blobstore -> blobstore (partitioned and non-partitioned)
** blobstore -> HDFS (partitioned and non-partitioned)





[jira] [Created] (HIVE-15852) Tablesampling on Tez in low-record case throws ArrayIndexOutOfBoundsException

2017-02-08 Thread Thomas Poepping (JIRA)
Thomas Poepping created HIVE-15852:
--

 Summary: Tablesampling on Tez in low-record case throws ArrayIndexOutOfBoundsException
 Key: HIVE-15852
 URL: https://issues.apache.org/jira/browse/HIVE-15852
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: 2.1.1
Reporter: Thomas Poepping


Due to HIVE-13040 ( https://issues.apache.org/jira/browse/HIVE-13040 ), which 
stops Hive on Tez from creating empty files to represent empty buckets, a 
couple of things are broken.

First, if there are empty buckets (which is possible with large datasets 
in the partitioned-bucketed case), tablesampling will not work if you 
reference a bucket number higher than the number of files.
For example, in some partition 'p', there are three rows. The table 't' is 
clustered into ten buckets. Even if every row hashes to a different bucket, 
only three bucket files will be 
created. If we do select * from t tablesample (bucket x out of 10) where 
 (where x > 3), an ArrayIndexOutOfBoundsException will be 
thrown because Hive assumes there are only three buckets.

Second, other applications (such as Pig) may be making assumptions about the 
number of files equaling the number of buckets.

Possible fixes:
* Revert HIVE-13040
* Change how tablesampling is implemented to accept the possibility that the 
number of files != number of buckets
** Would require coordination across projects to change assumptions
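
The second fix could look roughly like the sketch below: look up a bucket's file by the bucket id encoded in the file name instead of by position in the file listing. This is a Python illustration only, not Hive's implementation. The helper names are hypothetical, and it assumes bucket files are named with a zero-padded bucket id followed by an underscore (for example {{000004_0}}) and that tablesample bucket numbers are 1-based.

```python
from typing import Dict, List, Optional


def index_bucket_files(file_names: List[str]) -> Dict[int, str]:
    """Map bucket id -> file name, assuming names such as '000004_0'."""
    return {int(name.split("_")[0]): name for name in file_names}


def file_for_bucket(file_names: List[str], bucket: int, num_buckets: int) -> Optional[str]:
    """Return the file for a 1-based bucket number, or None if that bucket is empty."""
    if not 1 <= bucket <= num_buckets:
        raise ValueError(f"bucket {bucket} is outside 1..{num_buckets}")
    # A missing key means the bucket has no file, which is treated as empty
    # rather than as an error.
    return index_bucket_files(file_names).get(bucket - 1)
```

With three files present for a ten-bucket table, positional indexing fails for any bucket past the third entry, while the lookup above returns either the matching file or {{None}}.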

Things to consider:
* what performance gains are there from not creating empty files?
* if the gains are large, are we willing to lose them? (by reverting HIVE-13040)
* _how else can we avoid creating unnecessary files, while still maintaining 
invariants other applications expect?_





[jira] [Created] (HIVE-15618) Change hive-blobstore tests to run with Tez by default

2017-01-13 Thread Thomas Poepping (JIRA)
Thomas Poepping created HIVE-15618:
--

 Summary: Change hive-blobstore tests to run with Tez by default
 Key: HIVE-15618
 URL: https://issues.apache.org/jira/browse/HIVE-15618
 Project: Hive
  Issue Type: Bug
  Components: Testing Infrastructure
Affects Versions: 2.2.0
Reporter: Thomas Poepping
Assignee: Thomas Poepping


Ever since the upgrade to Hive 2, Tez has been the default execution engine for 
Hive. To match that fact, it makes sense to run our tests against Tez, rather 
than MR. This should more fully validate functionality against what we consider 
to be Hive defaults.





[jira] [Created] (HIVE-15576) Fix bug in QTestUtil where lines after a partial mask will not be masked

2017-01-10 Thread Thomas Poepping (JIRA)
Thomas Poepping created HIVE-15576:
--

 Summary: Fix bug in QTestUtil where lines after a partial mask will not be masked
 Key: HIVE-15576
 URL: https://issues.apache.org/jira/browse/HIVE-15576
 Project: Hive
  Issue Type: Bug
  Components: Testing Infrastructure
Affects Versions: 2.2.0
Reporter: Thomas Poepping
Assignee: Thomas Poepping


If the qfile output of a qtest contains two maskable lines right after one 
another, where the first contains a partial match candidate, the second line 
will not be evaluated for masking. This patch fixes that bug by disregarding 
whether a partial mask was found in the previous line.
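
A simplified model of the fixed behaviour, written in Python for illustration. It is not the QTestUtil code: the function name, the pattern lists, and the replacement strings are all hypothetical. The point it shows is that the "partial mask matched" flag is scoped to a single line, so a partial match on one line cannot suppress masking of the next.

```python
import re
from typing import List, Pattern, Tuple

# Illustrative placeholder written in place of a fully masked line.
MASK = "#### A masked pattern was here ####"


def mask_lines(
    lines: List[str],
    partial_masks: List[Tuple[Pattern[str], str]],
    full_masks: List[Pattern[str]],
) -> List[str]:
    """Apply partial masks first, then full-line masks, independently per line."""
    out = []
    for line in lines:
        partial_matched = False  # reset for every line; this is the essence of the fix
        for pattern, replacement in partial_masks:
            if pattern.search(line):
                line = pattern.sub(replacement, line)
                partial_matched = True
                break
        if not partial_matched:
            for pattern in full_masks:
                if pattern.search(line):
                    line = MASK
                    break
        out.append(line)
    return out
```

If the flag were instead declared outside the loop and never reset, the second of two adjacent maskable lines would skip the full-mask check, which matches the symptom described above.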





[jira] [Created] (HIVE-15266) Edit test output of negative blobstore tests to match HIVE-15226

2016-11-22 Thread Thomas Poepping (JIRA)
Thomas Poepping created HIVE-15266:
--

 Summary: Edit test output of negative blobstore tests to match HIVE-15226
 Key: HIVE-15266
 URL: https://issues.apache.org/jira/browse/HIVE-15266
 Project: Hive
  Issue Type: Bug
  Components: Tests
Affects Versions: 2.2.0
Reporter: Thomas Poepping
Assignee: Thomas Poepping


In HIVE-15226 ( https://issues.apache.org/jira/browse/HIVE-15226 ), blobstore 
tests were changed to print a different masking pattern for the blobstore path. 
In that patch, test output was replaced for the clientpositive test ( 
insert_into.q ), but not for the clientnegative test ( select_dropped_table.q 
), causing the negative tests to fail.

This patch is the result of running the clientnegative tests with 
-Dtest.output.overwrite=true.





[jira] [Created] (HIVE-14174) Fix creating buckets without scheme information

2016-07-06 Thread Thomas Poepping (JIRA)
Thomas Poepping created HIVE-14174:
--

 Summary: Fix creating buckets without scheme information
 Key: HIVE-14174
 URL: https://issues.apache.org/jira/browse/HIVE-14174
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 2.1.0, 1.2.1
Reporter: Thomas Poepping
Assignee: Thomas Poepping


If a table is created on a non-default filesystem (i.e. non-hdfs), the empty 
bucket files will be created with incorrect scheme information. This patch 
extracts the scheme and authority information for the new paths.
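
As a rough illustration of what carrying over scheme and authority means, the Python sketch below rebuilds a bucket file path using the scheme and authority of the table location. The helper name {{qualify_like}} and the example paths are hypothetical, and this is not the code from the patch.

```python
from urllib.parse import urlparse, urlunparse


def qualify_like(table_location: str, bucket_path: str) -> str:
    """Give bucket_path the scheme and authority of table_location (hypothetical helper)."""
    table = urlparse(table_location)
    bucket = urlparse(bucket_path)
    # Keep only the path of the bucket file; take scheme and authority from the table.
    return urlunparse((table.scheme, table.netloc, bucket.path, "", "", ""))
```

For a table at {{s3a://my-bucket/warehouse/t}}, an unqualified path {{/warehouse/t/000001_0}} would become {{s3a://my-bucket/warehouse/t/000001_0}} rather than resolving against the default filesystem.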





[jira] [Created] (HIVE-14175) Fix creating buckets without scheme information

2016-07-06 Thread Thomas Poepping (JIRA)
Thomas Poepping created HIVE-14175:
--

 Summary: Fix creating buckets without scheme information
 Key: HIVE-14175
 URL: https://issues.apache.org/jira/browse/HIVE-14175
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 2.1.0, 1.2.1
Reporter: Thomas Poepping
Assignee: Thomas Poepping


If a table is created on a non-default filesystem (i.e. non-hdfs), the empty 
bucket files will be created with incorrect scheme information. This patch 
extracts the scheme and authority information for the new paths.





[jira] [Created] (HIVE-13523) Fix connection leak in ORC RecordReader and refactor for unit testing

2016-04-14 Thread Thomas Poepping (JIRA)
Thomas Poepping created HIVE-13523:
--

 Summary: Fix connection leak in ORC RecordReader and refactor for unit testing
 Key: HIVE-13523
 URL: https://issues.apache.org/jira/browse/HIVE-13523
 Project: Hive
  Issue Type: Bug
  Components: ORC
Affects Versions: 2.0.0
Reporter: Thomas Poepping


In RecordReaderImpl, a MetadataReaderImpl object was being created (opening a 
file) but never closed, causing a leak. This change closes the MetadataReader 
in RecordReaderImpl and substantially refactors RecordReaderImpl to make it 
testable:
 * Created DataReaderFactory and MetadataReaderFactory (plus default 
implementations) so that the create() methods can be mocked to verify that the 
objects are actually closed in RecordReaderImpl.close()
 * Created MetadataReaderProperties and DataReaderProperties to clean up 
argument lists, making code more readable
 * Created builders for MetadataReaderProperties, DataReaderProperties, and 
RecordReaderImpl to make the code more readable
 * DataReader and MetadataReader now extend Closeable (there was no reason for 
them not to in the first place) so I can use the Guava Closer class: 
http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/io/Closer.html
 * Use Closer to guarantee that both close() calls are attempted even if one 
of them fails (preventing further potential leaks)
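
The "attempt every close() even if one fails" guarantee can be illustrated with a small sketch. It is written in Python using {{contextlib.ExitStack}} as a rough analogue of Guava's Closer, not as the patch's Java code; {{FakeReader}} and {{close_all}} are hypothetical stand-ins for the data and metadata readers.

```python
from contextlib import ExitStack


class FakeReader:
    """Stand-in for a data or metadata reader; not a Hive class."""

    def __init__(self, name, fail_on_close=False):
        self.name = name
        self.fail_on_close = fail_on_close
        self.closed = False

    def close(self):
        self.closed = True
        if self.fail_on_close:
            raise IOError(f"close failed for {self.name}")


def close_all(*readers):
    """Attempt close() on every reader, even if an earlier close() raises."""
    with ExitStack() as stack:
        for reader in readers:
            # Callbacks run in reverse registration order when the block exits;
            # an exception from one does not stop the others from running.
            stack.callback(reader.close)
```

A test built on this shape can substitute readers that fail on close and then assert that every reader was still closed, which is the kind of verification the factory refactoring above is meant to enable.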





[jira] [Created] (HIVE-13405) Fix Connection Leak in OrcRawRecordMerger

2016-04-01 Thread Thomas Poepping (JIRA)
Thomas Poepping created HIVE-13405:
--

 Summary: Fix Connection Leak in OrcRawRecordMerger
 Key: HIVE-13405
 URL: https://issues.apache.org/jira/browse/HIVE-13405
 Project: Hive
  Issue Type: Bug
  Components: ORC
Affects Versions: 2.0.0
Reporter: Thomas Poepping


In OrcRawRecordMerger.getLastFlushLength, if the opened stream throws an 
IOException on .available() or on .readLong(), the function will exit without 
closing the stream.

This patch adds a try-with-resources to fix this.
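
The pattern can be sketched in Python, where a {{with}} block plays the role of Java's try-with-resources. This is a simplified model and not the actual OrcRawRecordMerger code: the function name, the {{stream_factory}} parameter, and the assumption that the stream holds consecutive 8-byte big-endian values are all illustrative.

```python
import io
import struct


def last_flush_length(stream_factory) -> int:
    """Read consecutive 8-byte big-endian longs and return the last one (-1 if none)."""
    result = -1
    with stream_factory() as stream:  # stream.close() runs on every exit path
        while True:
            chunk = stream.read(8)
            if len(chunk) < 8:
                break
            (result,) = struct.unpack(">q", chunk)
    return result
```

If {{read}} raises partway through, the exception still propagates to the caller, but the stream is closed first, so no connection is left open.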


