Review Request 67497: HIVE-19794: Disable removing order by from subquery in GenericUDTFGetSplits
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/67497/ --- Review request for hive and Jason Dere. Bugs: HIVE-19794 https://issues.apache.org/jira/browse/HIVE-19794 Repository: hive-git Description --- HIVE-19794: Disable removing order by from subquery in GenericUDTFGetSplits Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java dd42fd127e633304a2da499afa60f7b051d329a9 itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcGenericUDTFGetSplits.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java 57f6c66a56a88bb7383ebe5832bba75240dea554 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTFGetSplits.java 20d09611ccdf863d5a5e7dc811efe091f7b4aba2 Diff: https://reviews.apache.org/r/67497/diff/1/ Testing --- Thanks, Prasanth_J
Re: Cleaning up old version in dist
+1 On Thu, Jun 7, 2018 at 11:13 AM, Alan Gates wrote: > Apache asks that we keep at most 2 current versions in dist, to minimize > the space we take up on distribution mirrors. Since we are running > multiple lines and have a couple of separately releasable modules we'll > have more than 2 versions there. But we have old versions of Hive 2 (2.1, > 2.2) and of the storage-api (2.4, 2.5). I think we should remove these. > That will leave us with the most up to date versions of Hive 1, 2, 3, the > storage api, and the standalone metastore. Note that this does not affect > their availability in maven central or the apache archive. > > Alan.
Re: Review Request 67263: HIVE-19602
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/67263/ --- (Updated June 7, 2018, 10:43 p.m.) Review request for hive, Sahil Takiar and Vihang Karajgaonkar. Changes --- Making changes to be in sync with HIVE-19508 so that it does not cause merge conflicts with master Bugs: HIVE-19602 https://issues.apache.org/jira/browse/HIVE-19602 Repository: hive-git Description --- Refactor inplace progress code in Hive-on-spark progress monitor to use ProgressMonitor instance Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkJobMonitor.java e78b1cd6637c46070378c25a372916817fe99a59 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkProgressMonitor.java PRE-CREATION Diff: https://reviews.apache.org/r/67263/diff/5/ Changes: https://reviews.apache.org/r/67263/diff/4-5/ Testing --- Thanks, Bharathkrishna Guruvayoor Murali
[jira] [Created] (HIVE-19826) OrcRawRecordMerger doesn't work for more than one file
Sergey Shelukhin created HIVE-19826: --- Summary: OrcRawRecordMerger doesn't work for more than one file Key: HIVE-19826 URL: https://issues.apache.org/jira/browse/HIVE-19826 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Key object in the map is reused and reset, leading to bizarre merges and wrong results. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
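The failure mode named in this report (a map key that is reused and reset) can be reproduced outside Hive. The following is a minimal sketch, not the OrcRawRecordMerger code: `MutableKey`, the file names, and the use of a `TreeMap` are hypothetical stand-ins chosen only to show why resetting a key object that is already stored in a sorted map loses entries.

```java
import java.util.TreeMap;

public class ReusedKeySketch {
    // Hypothetical mutable key; stands in for any key object that is reset and reused.
    static final class MutableKey implements Comparable<MutableKey> {
        long id;
        MutableKey(long id) { this.id = id; }
        @Override public int compareTo(MutableKey other) { return Long.compare(id, other.id); }
    }

    // Inserts two entries through ONE shared key instance that is reset in between.
    static TreeMap<MutableKey, String> buildWithSharedKey() {
        TreeMap<MutableKey, String> map = new TreeMap<>();
        MutableKey key = new MutableKey(1);
        map.put(key, "file-1");
        key.id = 2;              // also mutates the key already stored in the map
        map.put(key, "file-2");  // compares equal to the stored key, so the value is overwritten
        return map;
    }

    // Inserts two entries with a fresh key instance per entry.
    static TreeMap<MutableKey, String> buildWithFreshKeys() {
        TreeMap<MutableKey, String> map = new TreeMap<>();
        map.put(new MutableKey(1), "file-1");
        map.put(new MutableKey(2), "file-2");
        return map;
    }

    public static void main(String[] args) {
        System.out.println("shared key entries: " + buildWithSharedKey().size());
        System.out.println("fresh key entries:  " + buildWithFreshKeys().size());
    }
}
```

With the shared key only one entry survives, which is consistent with a merger that appears to work for a single file and misbehaves for more than one. Whether the actual fix copies the key or restructures the map is not stated in the report.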
Cleaning up old version in dist
Apache asks that we keep at most 2 current versions in dist, to minimize the space we take up on distribution mirrors. Since we are running multiple lines and have a couple of separately releasable modules we'll have more than 2 versions there. But we have old versions of Hive 2 (2.1, 2.2) and of the storage-api (2.4, 2.5). I think we should remove these. That will leave us with the most up to date versions of Hive 1, 2, 3, the storage api, and the standalone metastore. Note that this does not affect their availability in maven central or the apache archive. Alan.
Re: [DISCUSS] Release of standalone-metastore
I have pushed the standalone metastore src and bin tarballs and their signatures and hashes into Hive's dist area, so they should soon be available for download. Congrats to all who worked on this! As part of creating a release tag for the standalone metastore I noticed we didn't have one for release 3.0.0, so I created a tag for that as well. Alan. On Tue, Jun 5, 2018 at 10:45 AM Alan Gates wrote: > I have put the binary and source objects up at > https://home.apache.org/~gates/hive-standalone-metastore-3.0.0/ so > everyone can take a look before I officially push them to dist. > > I don't think we need to vote on this as we have already officially > released these objects, I'm just adding sha and gpg signatures for download > purposes. But, please take a look and make sure I did everything > properly. I'll push them to dist after a couple of days to give everyone a > chance to look them over. > > Alan. > > On Wed, May 30, 2018 at 11:00 AM Vihang Karajgaonkar > wrote: > >> The proposal to post the source and bin to the distribution sounds good to >> me. We can do the testing and release standalone-metastore 3.1 like you >> suggested above. >> >> On Tue, May 29, 2018 at 10:49 PM, Peter Vary wrote: >> >> > What do you think about adding a new profile, which adds a possibility to >> > compile the code with one command, until we separate standalone >> metastore >> > to a new project? Like -Pitests, but -Pmetastore. So "mvn clean install >> > -Pmetastore,itests" will compile everything. >> > >> > Alan Gates wrote (on Wed, May 30, 2018, 0:42): >> > >> > > On Tue, May 29, 2018 at 3:29 PM Vihang Karajgaonkar < >> vih...@cloudera.com >> > > >> > > wrote: >> > > >> > > > How about cutting out a branch-3.0.1 and releasing 3.0.1 with the >> > pom.xml >> > > > fixed? My concern with above approach is we haven't tested >> > > > standalone-metastore when deployed independent of Hive. >> > > >> > > Actually, there is. 
The tarballs for source and bin are already out >> > > there. If I post them on the distribution site then they'll be >> easier to >> > > find. So we can test that now. And we can then do a 3.1 release of >> the >> > > metastore whenever we want, as long as it's before a 3.1 release of >> Hive. >> > > >> > > Alan. >> > > >> > > >> > > > So we don't know if >> > > > there is something is fundamentally broken in that mode and given >> that >> > we >> > > > don't know when 3.1 is going to be released it may remain in that >> state >> > > for >> > > > long time which is not good. I think may be a good approach now >> would >> > be >> > > to >> > > > test 3.0 standalone-metastore and fix any issues along with the >> pom.xml >> > > > changes and do a 3.0.1 release. What do you think? >> > > > >> > > > Thanks, >> > > > Vihang >> > > > >> > > > On Tue, May 29, 2018 at 1:57 PM, Alan Gates >> > > wrote: >> > > > >> > > > > In the thread on releasing Hive 3.0 I wrote >> > > > > >> > > > > We should work on producing a standalone-metastore >> > > > > release in the same time frame so that the schema's, etc. match. I >> > can >> > > RM >> > > > > that unless someone else wants to. >> > > > > >> > > > > https://lists.apache.org/thread.html/307b281c3742fdf6aeb7fac >> > > > > 3ee74a98830400b67711755572de15b80@%3Cdev.hive.apache.org%3E >> > > > > >> > > > > My thinking was to produce a separate metastore release, like we >> do >> > for >> > > > > storage-api. However, I missed that I needed to do some work in >> > > > branch-3.0 >> > > > > to disconnect standalone-metastore from the pom before the release >> > (in >> > > > the >> > > > > same way that storage-api does). Thus when we released Hive 3.0 >> we >> > > also >> > > > > released the standalone-metastore. See >> > > > > https://search.maven.org/#search%7Cga%7C2%7Cg%3A%22org. >> > apache.hive%22 >> > > > So >> > > > > I can't release another version of standalone-metastore 3.0. 
>> Here is >> > > > what >> > > > > I propose we do: >> > > > > >> > > > > >> > > > >1. Put the src and bin tarballs for standalone-metastore in >> Hive's >> > > > >distribution site. We have already voted on these as part of >> 3.0 >> > > > > release >> > > > >process. >> > > > >2. Like storage-api, we keep the standalone-metastore linked in >> > the >> > > > pom >> > > > >in the master branch. This makes life easier for developers as >> > they >> > > > >produce new patches. >> > > > >3. Also like storage-api, at some future point before we >> release >> > > Hive >> > > > >3.1 I will: >> > > > > 1. Make a separate branch for standalone-metastore from >> > branch-3 >> > > > > 2. Release a standalone-metastore 3.1 from this new branch >> > > > > 3. Remove standalone-metastore from the list of sub-modules >> in >> > > > Hive's >> > > > > pom.xml >> > > > > 4. Make Hive depend on the released 3.1 version of the >> > > > > standalone-metastore. >> > > > >4. For branch-3.0, I do not
[jira] [Created] (HIVE-19825) HiveServer2 leader selection shall use different zookeeper znode
Daniel Dai created HIVE-19825: - Summary: HiveServer2 leader selection shall use different zookeeper znode Key: HIVE-19825 URL: https://issues.apache.org/jira/browse/HIVE-19825 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Daniel Dai Assignee: Daniel Dai Currently, HiveServer2 leader selection (used only by privilegesynchronizer now) reuses the /hiveserver2 parent znode, which is already used for HiveServer2 service discovery. This interferes with service discovery. I'd like to switch to a different znode, /hiveserver2-leader. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] hive pull request #369: HIVE-19723: Arrow serde: "Unsupported data type: Tim...
GitHub user pudidic opened a pull request: https://github.com/apache/hive/pull/369 HIVE-19723: Arrow serde: "Unsupported data type: Timestamp(NANOSECOND, null)" This pull request added a randomized unit test, supports microsecond for Spark integration, and changed TestJdbcWithMiniLlapArrow to test microsecond. The previous pull request was hard to merge, due to some reverted conflicts on Apache master branch. You can merge this pull request into a Git repository by running: $ git pull https://github.com/pudidic/hive HIVE-19723 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/369.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #369 commit 1600141292aef67105a2df0435bd0bab52b1e4e3 Author: Teddy Choi Date: 2018-06-07T16:14:08Z HIVE-19723: Arrow serde: "Unsupported data type: Timestamp(NANOSECOND, null)" ---
[GitHub] hive pull request #360: HIVE-19723: Arrow serde: "Unsupported data type: Tim...
Github user pudidic closed the pull request at: https://github.com/apache/hive/pull/360 ---
[jira] [Created] (HIVE-19824) Improve online datasize estimations for MapJoins
Zoltan Haindrich created HIVE-19824: --- Summary: Improve online datasize estimations for MapJoins Key: HIVE-19824 URL: https://issues.apache.org/jira/browse/HIVE-19824 Project: Hive Issue Type: Improvement Reporter: Zoltan Haindrich Assignee: Zoltan Haindrich Statistics.datasize() only accounts for the "real" data size, but handling, for example, 1M rows might introduce some data structure overhead. If the "real" data is small, this overhead might dominate the actual memory usage. For 6.5M rows of (int,int) the estimate is 52MB; in reality this takes up ~260MB, of which 210MB is used to provide the hashmap functionality for that many rows. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
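The figures in this report can be checked with a little arithmetic. The following is a sketch under stated assumptions, not Hive code: it assumes an (int,int) row holds 4 + 4 = 8 bytes of "real" data and that MB means 10^6 bytes; the class and method names are hypothetical.

```java
public class MapJoinSizeSketch {
    // "Real" data size: row count times payload bytes per row.
    static long rawDataBytes(long rows, int bytesPerRow) {
        return rows * bytesPerRow;
    }

    // Average overhead per row implied by an observed memory footprint.
    static long overheadBytesPerRow(long observedBytes, long rawBytes, long rows) {
        return (observedBytes - rawBytes) / rows;
    }

    public static void main(String[] args) {
        long rows = 6_500_000L;
        long raw = rawDataBytes(rows, 8);   // 52,000,000 bytes, the 52MB estimate
        long observed = 260_000_000L;       // the ~260MB reported in the issue
        System.out.println("raw bytes: " + raw);
        System.out.println("overhead bytes: " + (observed - raw));
        System.out.println("overhead per row: " + overheadBytesPerRow(observed, raw, rows));
    }
}
```

Under these assumptions the overhead comes to 208MB, in line with the roughly 210MB quoted, or about 32 bytes per row on top of 8 bytes of payload. That ratio is why a size estimate based only on payload can be off by a factor of about five for narrow rows.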
[jira] [Created] (HIVE-19823) BytesBytesMultiHashMap estimation should account for load size
Zoltan Haindrich created HIVE-19823: --- Summary: BytesBytesMultiHashMap estimation should account for load size Key: HIVE-19823 URL: https://issues.apache.org/jira/browse/HIVE-19823 Project: Hive Issue Type: Improvement Reporter: Zoltan Haindrich Assignee: Zoltan Haindrich It could happen that the capacity is known beforehand and the estimated size of the hashtable is accurate, but a rehash still occurs because after some time the element count violates the load factor ratio. By default this could happen with a {{1-loadfactor = 25%}} probability. https://github.com/apache/hive/blob/cfd57348c1ac188e0ba131d5636a62ff7b7c27be/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/BytesBytesMultiHashMap.java#L176-L187 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
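The effect described in this report can be sketched in a few lines. This is an illustration under assumptions, not the BytesBytesMultiHashMap implementation: it assumes a rehash is triggered once the element count exceeds capacity times load factor, uses a load factor of 0.75 (implied by the 25% figure), and rounds capacities up to a power of two, which is a common choice but is not confirmed by the report. All names are hypothetical.

```java
public class LoadFactorSketch {
    // True if inserting elementCount entries into a table of the given capacity
    // would cross the resize threshold (capacity * loadFactor).
    static boolean wouldRehash(long elementCount, long capacity, double loadFactor) {
        return elementCount > (long) (capacity * loadFactor);
    }

    // Smallest power-of-two capacity that holds expectedCount entries without
    // crossing the threshold. Power-of-two rounding is an assumption of this sketch.
    static long capacityFor(long expectedCount, double loadFactor) {
        long needed = (long) Math.ceil(expectedCount / loadFactor);
        long capacity = 1;
        while (capacity < needed) {
            capacity <<= 1;
        }
        return capacity;
    }

    public static void main(String[] args) {
        double loadFactor = 0.75;
        // 1000 entries fit in 1024 slots, yet they exceed the 768-entry threshold.
        System.out.println(wouldRehash(1000, 1024, loadFactor));
        long sized = capacityFor(1000, loadFactor);
        System.out.println(sized + " " + wouldRehash(1000, sized, loadFactor));
    }
}
```

If the final element count is assumed to fall uniformly anywhere up to the chosen capacity, counts in the top quarter of that range cross the threshold, which matches the 25% figure in the report. Sizing by expected count divided by load factor avoids the late rehash at the cost of a larger initial allocation, so a memory estimate needs to use the post-rehash size.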
Re: Review Request 67485: HIVE-19783 Retrieve only locations in HiveMetaStore.dropPartitionsAndGetLocations
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/67485/ --- (Updated June 7, 2018, 10:31 a.m.) Review request for hive, Alexander Kolbasov and Vihang Karajgaonkar. Changes --- Null locations are possible. Handle those as well. Bugs: HIVE-19783 https://issues.apache.org/jira/browse/HIVE-19783 Repository: hive-git Description --- Added a new getPartitionLocations method to the RawStore interface. Implemented getPartitionLocations in ObjectStore using JDOQL. Question: In CachedObjectStore: Shall I call rawStore.getPartitionLocations or reimplement it using getPartitions? Modified dropPartitionsAndGetLocations: - Instead of querying all the data of every partition, query only the locations using the new interface method - Removed the partKeys parameter, which became unnecessary Diffs (updated) - itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/DummyRawStoreFailEvent.java ff97522 standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java b9f5fb8 standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java b3a8dd0 standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/RawStore.java f350aa9 standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java d9356b8 standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java 8c3ada3 standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java f98e8de Diff: https://reviews.apache.org/r/67485/diff/2/ Changes: https://reviews.apache.org/r/67485/diff/1-2/ Testing --- Run the TestTablesCreateDropAlterTruncate test (partitioned table creation and drop) Thanks, Peter Vary
Review Request 67485: HIVE-19783 Retrieve only locations in HiveMetaStore.dropPartitionsAndGetLocations
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/67485/ --- Review request for hive, Alexander Kolbasov and Vihang Karajgaonkar. Bugs: HIVE-19783 https://issues.apache.org/jira/browse/HIVE-19783 Repository: hive-git Description --- Added a new getPartitionLocations method to the RawStore interface. Implemented getPartitionLocations in ObjectStore using JDOQL. Question: In CachedObjectStore: Shall I call rawStore.getPartitionLocations or reimplement it using getPartitions? Modified dropPartitionsAndGetLocations: - Instead of querying all the data of every partition, query only the locations using the new interface method - Removed the partKeys parameter, which became unnecessary Diffs - itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/DummyRawStoreFailEvent.java 0cc0ae5 standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java d8b8414 standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java b15d89d standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/RawStore.java 283798c standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java 9da8d72 standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java 0461c4e standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java b71eda4 Diff: https://reviews.apache.org/r/67485/diff/1/ Testing --- Run the TestTablesCreateDropAlterTruncate test (partitioned table creation and drop) Thanks, Peter Vary
Review Request 67484: HIVE-19782 Flash out TestObjectStore.testDirectSQLDropParitionsCleanup
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/67484/ --- Review request for hive, Alexander Kolbasov and Vihang Karajgaonkar. Bugs: HIVE-19782 https://issues.apache.org/jira/browse/HIVE-19782 Repository: hive-git Description --- Updated test table/partition generation so we can insert into every related table, not just the basic ones. Use this only when testing the table cleanup, so we save a little time on tests. Fixed an existing bug in HiveObjectRefBuilder.java found by the tests. Added the ability to add a PartitionColumnReference to HiveObjectRefBuilder.java. Diffs - standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/client/builder/HiveObjectRefBuilder.java 62a227a standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/TestObjectStore.java 7984af6 Diff: https://reviews.apache.org/r/67484/diff/1/ Testing --- Run the TestObjectStore.java tests Thanks, Peter Vary