[jira] [Work logged] (HIVE-24444) compactor.Cleaner should not set state "mark cleaned" if there are obsolete files in the FS
[ https://issues.apache.org/jira/browse/HIVE-24444?focusedWorklogId=519448&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519448 ]

ASF GitHub Bot logged work on HIVE-24444:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 03/Dec/20 07:41
Start Date: 03/Dec/20 07:41
Worklog Time Spent: 10m

Work Description: pvargacl commented on a change in pull request #1716:
URL: https://github.com/apache/hive/pull/1716#discussion_r534827989

## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java

## @@ -316,6 +314,30 @@ private boolean removeFiles(String location, ValidWriteIdList writeIdList, Compa
       }
       fs.delete(dead, true);
     }
-    return true;
+    // Check if there will be more obsolete directories to clean when possible. We will only mark cleaned when this
+    // number reaches 0.
+    return getNumEventuallyObsoleteDirs(location, dirSnapshots) == 0;
+  }
+
+  /**
+   * Get the number of base/delta directories the Cleaner should remove eventually. If we check this after cleaning
+   * we can see if the Cleaner has further work to do in this table/partition directory that it hasn't been able to
+   * finish, e.g. because of an open transaction at the time of compaction.
+   * We do this by assuming that there are no open transactions anywhere and then calling getAcidState. If there are
+   * obsolete directories, then the Cleaner has more work to do.
+   * @param location location of table
+   * @return number of dirs left for the cleaner to clean – eventually
+   * @throws IOException
+   */
+  private int getNumEventuallyObsoleteDirs(String location, Map dirSnapshots)
+      throws IOException {
+    ValidTxnList validTxnList = new ValidReadTxnList();
+    // save it so that getAcidState() sees it
+    conf.set(ValidTxnList.VALID_TXNS_KEY, validTxnList.writeToString());
+    ValidReaderWriteIdList validWriteIdList = new ValidReaderWriteIdList();
+    Path locPath = new Path(location);
+    AcidUtils.Directory dir = AcidUtils.getAcidState(locPath.getFileSystem(conf), locPath, conf, validWriteIdList,
+        Ref.from(false), false, dirSnapshots);
+    return dir.getObsolete().size();

Review comment: New deltas by themselves don't stop the compaction from being marked cleaned (they are not in the obsolete list), but overlapping compactions do. I think a problematic scenario could be some long-running ETL jobs running next to your streaming writes. That would mean that cleaning jobs would pile up: they would run continuously, but the low values in min_history_level would prevent them from ever cleaning up every obsolete directory and from ever being marked as cleaned :(

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 519448)
Time Spent: 5h 10m (was: 5h)

> compactor.Cleaner should not set state "mark cleaned" if there are obsolete
> files in the FS
> ---------------------------------------------------------------------------
>
> Key: HIVE-24444
> URL: https://issues.apache.org/jira/browse/HIVE-24444
> Project: Hive
> Issue Type: Bug
> Reporter: Karen Coppage
> Assignee: Karen Coppage
> Priority: Major
> Labels: pull-request-available
> Time Spent: 5h 10m
> Remaining Estimate: 0h
>
> This is an improvement on HIVE-24314, in which markCleaned() is called only
> if +any+ files are deleted by the cleaner.
> This could cause a problem in the following case:
> Say for table_1, compaction1's cleaning was blocked by an open txn, and
> compaction is run again on the same table (compaction2). Both compaction1 and
> compaction2 could be in "ready for cleaning" at the same time. By this time
> the blocking open txn could be committed. When the cleaner runs, one of
> compaction1 and compaction2 will remain in the "ready for cleaning" state:
> Say compaction2 is picked up by the cleaner first. The Cleaner deletes all
> obsolete files. Then compaction1 is picked up by the cleaner; the cleaner
> doesn't remove any files and compaction1 will stay in the queue in a "ready
> for cleaning" state.
> HIVE-24291 already solves this issue, but if it isn't usable (for example if
> HMS schema changes are out of the question) then HIVE-24314 + this change will
> fix the issue of the Cleaner not removing all obsolete files.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
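The cleaning condition discussed in this thread (only mark a compaction cleaned once no eventually-obsolete directories remain) can be illustrated with a minimal, self-contained sketch. This is not the real Hive code: `CleanerSketch`, its write-id model, and both method names are hypothetical stand-ins for `removeFiles` / `getNumEventuallyObsoleteDirs`.

```java
import java.util.List;

// Toy model of the Cleaner check: a delta directory is "eventually
// obsolete" if a base directory with an equal-or-higher write id
// already covers it (i.e. it would be obsolete once all txns commit).
public class CleanerSketch {

    // How many delta dirs the cleaner would still have to remove,
    // assuming every open transaction is eventually committed.
    static long numEventuallyObsoleteDirs(long baseWriteId, List<Long> deltaWriteIds) {
        return deltaWriteIds.stream().filter(w -> w <= baseWriteId).count();
    }

    // Mirror of the patched removeFiles(): only report "cleaned"
    // when nothing eventually-obsolete remains.
    static boolean markCleaned(long baseWriteId, List<Long> deltaWriteIds) {
        return numEventuallyObsoleteDirs(baseWriteId, deltaWriteIds) == 0;
    }

    public static void main(String[] args) {
        // delta_5 is still covered by base_10, so the compaction must
        // not be marked cleaned yet; delta_12 alone is fine.
        System.out.println(markCleaned(10, List.of(5L, 12L))); // false
        System.out.println(markCleaned(10, List.of(12L)));     // true
    }
}
```

This also shows why two "ready for cleaning" entries behave as described above: whichever entry runs second sees zero remaining obsolete dirs and can safely be marked cleaned.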
[jira] [Updated] (HIVE-24473) Update HBase version to 2.1.10
[ https://issues.apache.org/jira/browse/HIVE-24473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Istvan Toth updated HIVE-24473:
-------------------------------
Description:
Hive currently builds with a 2.0.0 pre-release. Update HBase to a more recent version.
We cannot use anything later than 2.2.4 because of HBASE-22394.
So the options are 2.1.10 and 2.2.4.
I suggest 2.1.10 because it's a chronologically later release, and it maximises compatibility with HBase server deployments.

was:
Hive currently builds with a 2.0.0 pre-release. Update HBase to more recent version.
We cannot use anything later than 2.2.4 because of HBASE-22394
So the options are 2.1.10 and 2.2.4
I suggest 2.1.10 because it's a chronologically later release, and it maximises compatibility HBase server deployments.

> Update HBase version to 2.1.10
> ------------------------------
>
> Key: HIVE-24473
> URL: https://issues.apache.org/jira/browse/HIVE-24473
> Project: Hive
> Issue Type: Improvement
> Components: HBase Handler
> Affects Versions: 4.0.0
> Reporter: Istvan Toth
> Assignee: Istvan Toth
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-24473.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Hive currently builds with a 2.0.0 pre-release.
> Update HBase to a more recent version.
> We cannot use anything later than 2.2.4 because of HBASE-22394.
> So the options are 2.1.10 and 2.2.4.
> I suggest 2.1.10 because it's a chronologically later release, and it
> maximises compatibility with HBase server deployments.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (HIVE-24473) Update HBase version to 2.1.10
[ https://issues.apache.org/jira/browse/HIVE-24473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Istvan Toth updated HIVE-24473:
-------------------------------
Attachment: HIVE-24473.patch
Status: Patch Available (was: Open)

> Update HBase version to 2.1.10
> ------------------------------
>
> Key: HIVE-24473
> URL: https://issues.apache.org/jira/browse/HIVE-24473
> Project: Hive
> Issue Type: Improvement
> Components: HBase Handler
> Affects Versions: 4.0.0
> Reporter: Istvan Toth
> Assignee: Istvan Toth
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-24473.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Hive currently builds with a 2.0.0 pre-release.
> Update HBase to more recent version.
> We cannot use anything later than 2.2.4 because of HBASE-22394
> So the options are 2.1.10 and 2.2.4
> I suggest 2.1.10 because it's a chronologically later release, and it
> maximises compatibility HBase server deployments.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (HIVE-24473) Update HBase version to 2.1.10
[ https://issues.apache.org/jira/browse/HIVE-24473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-24473:
----------------------------------
Labels: pull-request-available (was: )

> Update HBase version to 2.1.10
> ------------------------------
>
> Key: HIVE-24473
> URL: https://issues.apache.org/jira/browse/HIVE-24473
> Project: Hive
> Issue Type: Improvement
> Components: HBase Handler
> Affects Versions: 4.0.0
> Reporter: Istvan Toth
> Assignee: Istvan Toth
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Hive currently builds with a 2.0.0 pre-release.
> Update HBase to more recent version.
> We cannot use anything later than 2.2.4 because of HBASE-22394
> So the options are 2.1.10 and 2.2.4
> I suggest 2.1.10 because it's a chronologically later release, and it
> maximises compatibility HBase server deployments.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-24473) Update HBase version to 2.1.10
[ https://issues.apache.org/jira/browse/HIVE-24473?focusedWorklogId=519441&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519441 ]

ASF GitHub Bot logged work on HIVE-24473:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 03/Dec/20 07:16
Start Date: 03/Dec/20 07:16
Worklog Time Spent: 10m

Work Description: stoty opened a new pull request #1729:
URL: https://github.com/apache/hive/pull/1729

### What changes were proposed in this pull request?
Update included HBase version to 2.1.10

### Why are the changes needed?
Currently Hive includes an old pre-release version of HBase. The proposed version is a GA (if older) release with a lot of fixes.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Hive test suite run successfully

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 519441)
Remaining Estimate: 0h
Time Spent: 10m

> Update HBase version to 2.1.10
> ------------------------------
>
> Key: HIVE-24473
> URL: https://issues.apache.org/jira/browse/HIVE-24473
> Project: Hive
> Issue Type: Improvement
> Components: HBase Handler
> Affects Versions: 4.0.0
> Reporter: Istvan Toth
> Assignee: Istvan Toth
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Hive currently builds with a 2.0.0 pre-release.
> Update HBase to more recent version.
> We cannot use anything later than 2.2.4 because of HBASE-22394
> So the options are 2.1.10 and 2.2.4
> I suggest 2.1.10 because it's a chronologically later release, and it
> maximises compatibility HBase server deployments.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Assigned] (HIVE-24473) Update HBase version to 2.1.10
[ https://issues.apache.org/jira/browse/HIVE-24473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Istvan Toth reassigned HIVE-24473:
----------------------------------

> Update HBase version to 2.1.10
> ------------------------------
>
> Key: HIVE-24473
> URL: https://issues.apache.org/jira/browse/HIVE-24473
> Project: Hive
> Issue Type: Improvement
> Components: HBase Handler
> Affects Versions: 4.0.0
> Reporter: Istvan Toth
> Assignee: Istvan Toth
> Priority: Major
>
> Hive currently builds with a 2.0.0 pre-release.
> Update HBase to more recent version.
> We cannot use anything later than 2.2.4 because of HBASE-22394
> So the options are 2.1.10 and 2.2.4
> I suggest 2.1.10 because it's a chronologically later release, and it
> maximises compatibility HBase server deployments.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (HIVE-24472) Optimize LlapTaskSchedulerService::preemptTasksFromMap
[ https://issues.apache.org/jira/browse/HIVE-24472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242961#comment-17242961 ]

Rajesh Balamohan commented on HIVE-24472:
-----------------------------------------

Ref: Q14 in tpcds

> Optimize LlapTaskSchedulerService::preemptTasksFromMap
> ------------------------------------------------------
>
> Key: HIVE-24472
> URL: https://issues.apache.org/jira/browse/HIVE-24472
> Project: Hive
> Issue Type: Improvement
> Reporter: Rajesh Balamohan
> Priority: Major
> Attachments: Screenshot 2020-12-03 at 12.13.03 PM.png
>
>
> !Screenshot 2020-12-03 at 12.13.03 PM.png|width=1063,height=571!
> speculativeTasks could possibly include node information to reduce CPU burn
> in LlapTaskSchedulerService::preemptTasksFromMap

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
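A rough illustration of the optimization hinted at in the issue: key the speculative-task bookkeeping by node so that preempting tasks for one host scans only that host's tasks rather than every speculative task. All names here are hypothetical stand-ins, not the actual `LlapTaskSchedulerService` fields.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: speculative tasks tracked per host, so a
// preemption pass for one node never touches the other nodes' tasks.
public class PreemptSketch {

    // host -> task ids speculated on that host
    private final Map<String, List<String>> speculativeTasksByNode = new HashMap<>();

    void addSpeculativeTask(String host, String taskId) {
        speculativeTasksByNode.computeIfAbsent(host, h -> new ArrayList<>()).add(taskId);
    }

    // O(tasks on this host) instead of O(all speculative tasks).
    List<String> preemptTasksOnNode(String host) {
        List<String> victims = speculativeTasksByNode.remove(host);
        return victims == null ? List.of() : victims;
    }

    public static void main(String[] args) {
        PreemptSketch s = new PreemptSketch();
        s.addSpeculativeTask("node1", "attempt_1");
        s.addSpeculativeTask("node2", "attempt_2");
        System.out.println(s.preemptTasksOnNode("node1")); // [attempt_1]
    }
}
```

The design choice is the usual flat-scan-vs-index trade-off: a per-node map costs a little extra bookkeeping on insert but makes the hot preemption path proportional to one node's task count.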
[jira] [Assigned] (HIVE-24471) Add support for combiner in hash mode group aggregation
[ https://issues.apache.org/jira/browse/HIVE-24471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

mahesh kumar behera reassigned HIVE-24471:
------------------------------------------

> Add support for combiner in hash mode group aggregation
> -------------------------------------------------------
>
> Key: HIVE-24471
> URL: https://issues.apache.org/jira/browse/HIVE-24471
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: mahesh kumar behera
> Assignee: mahesh kumar behera
> Priority: Major
>
> In map-side group aggregation, a partial grouped aggregation is calculated to
> reduce the data written to disk by the map task. In the case of hash
> aggregation, where the input data is not sorted, a hash table is used. If the
> hash table size increases beyond a configurable limit, data is flushed to
> disk and a new hash table is generated. If the reduction achieved by the hash
> table is less than the minimum hash aggregation reduction calculated during
> compile time, the map-side aggregation is converted to streaming mode. So if
> the first few batches of records do not result in a significant reduction,
> the mode is switched to streaming mode. This may hurt performance if the
> subsequent batches of records have fewer distinct values. To mitigate this
> situation, a combiner can be added to the map task after the keys are sorted.
> This makes sure that the aggregation is done where possible and reduces the
> data written to disk.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
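The combiner idea in the description can be sketched as follows. Assuming map output already sorted by key, adjacent rows with equal keys are merged before anything is written to disk, which recovers the reduction even after the hash-table stage has fallen back to streaming mode. The class and method names are illustrative, not Hive's actual operator API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative combiner over key-sorted map output: because equal keys
// are adjacent after the sort, a single forward pass can fold
// (key, partialCount) rows together with O(1) extra state per group.
public class CombinerSketch {

    static List<Map.Entry<String, Long>> combine(List<Map.Entry<String, Long>> sortedRun) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        for (Map.Entry<String, Long> e : sortedRun) {
            int last = out.size() - 1;
            if (last >= 0 && out.get(last).getKey().equals(e.getKey())) {
                // same key as the previous row: fold into one partial aggregate
                out.set(last, Map.entry(e.getKey(), out.get(last).getValue() + e.getValue()));
            } else {
                out.add(Map.entry(e.getKey(), e.getValue()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> run =
            List.of(Map.entry("a", 1L), Map.entry("a", 2L), Map.entry("b", 1L));
        System.out.println(combine(run)); // [a=3, b=1]
    }
}
```

Unlike the hash table, this pass needs no size limit and never "gives up", which is why running it after the sort helps exactly the case the issue describes.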
[jira] [Work logged] (HIVE-24468) Use Event Time instead of Current Time in Notification Log DB Entry
[ https://issues.apache.org/jira/browse/HIVE-24468?focusedWorklogId=519396&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519396 ]

ASF GitHub Bot logged work on HIVE-24468:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 03/Dec/20 05:39
Start Date: 03/Dec/20 05:39
Worklog Time Spent: 10m

Work Description: pvary commented on pull request #1728:
URL: https://github.com/apache/hive/pull/1728#issuecomment-737678785

Could this cause out-of-order notification timestamps?
If we are sure that nobody relies on timestamps to check notification order (bad practice), then we can change it, but I would be cautious about changing this, as notifications are a widely used API.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 519396)
Time Spent: 40m (was: 0.5h)

> Use Event Time instead of Current Time in Notification Log DB Entry
> -------------------------------------------------------------------
>
> Key: HIVE-24468
> URL: https://issues.apache.org/jira/browse/HIVE-24468
> Project: Hive
> Issue Type: Improvement
> Reporter: David Mollitor
> Assignee: David Mollitor
> Priority: Major
> Labels: pull-request-available
> Time Spent: 40m
> Remaining Estimate: 0h

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Assigned] (HIVE-24470) Separate HiveMetastore Thrift and Driver logic
[ https://issues.apache.org/jira/browse/HIVE-24470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cameron Moberg reassigned HIVE-24470:
-------------------------------------

> Separate HiveMetastore Thrift and Driver logic
> ----------------------------------------------
>
> Key: HIVE-24470
> URL: https://issues.apache.org/jira/browse/HIVE-24470
> Project: Hive
> Issue Type: Improvement
> Components: Standalone Metastore
> Reporter: Cameron Moberg
> Assignee: Cameron Moberg
> Priority: Minor
>
> In the file HiveMetastore.java the majority of the code is a Thrift interface
> rather than the actual logic behind starting the Hive metastore; this
> interface should be moved out into a separate file to clean up the file.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-24397) Add the projection specification to the table request object and add placeholders in ObjectStore.java
[ https://issues.apache.org/jira/browse/HIVE-24397?focusedWorklogId=519254&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519254 ]

ASF GitHub Bot logged work on HIVE-24397:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 02/Dec/20 21:24
Start Date: 02/Dec/20 21:24
Worklog Time Spent: 10m

Work Description: vnhive commented on a change in pull request #1681:
URL: https://github.com/apache/hive/pull/1681#discussion_r530816743

## File path: standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/client/TestTablesGetExists.java

## @@ -402,6 +393,134 @@ public void testGetTableObjectsByName() throws Exception {
   }

+  @Test
+  public void testGetTableObjectsWithProjectionOfSingleField() throws Exception {
+    List<String> tableNames = new ArrayList<>();
+    tableNames.add(testTables[0].getTableName());
+    tableNames.add(testTables[1].getTableName());
+
+    GetTablesRequest request = new GetTablesRequest();
+    request.setProjectionSpec(new GetProjectionsSpec());
+    request.setTblNames(tableNames);
+    request.setDbName(DEFAULT_DATABASE);
+
+    GetProjectionsSpec projectSpec = request.getProjectionSpec();
+    List<String> projectedFields = Collections.singletonList("sd.location");
+    projectSpec.setFieldList(projectedFields);
+
+    List<Table> tables = client.getTableObjectsByRequest(request);
+
+    Assert.assertEquals("Found tables", 2, tables.size());
+
+    for (Table table : tables) {
+      Assert.assertFalse(table.isSetDbName());
+      Assert.assertFalse(table.isSetCatName());
+      Assert.assertFalse(table.isSetTableName());
+      Assert.assertTrue(table.isSetSd());
+    }
+  }
+
+  @Test
+  public void testGetTableObjectsWithNullProjectionSpec() throws Exception {
+    List<String> tableNames = new ArrayList<>();
+    tableNames.add(testTables[0].getTableName());
+    tableNames.add(testTables[1].getTableName());
+
+    GetTablesRequest request = new GetTablesRequest();
+    request.setProjectionSpec(null);
+    request.setTblNames(tableNames);
+    request.setDbName(DEFAULT_DATABASE);
+
+    List<Table> tables = client.getTableObjectsByRequest(request);
+
+    Assert.assertEquals("Found tables", 2, tables.size());
+  }
+
+  @Test
+  public void testGetTableObjectsWithNonExistentColumn() throws Exception {
+    List<String> tableNames = new ArrayList<>();
+    tableNames.add(testTables[0].getTableName());
+    tableNames.add(testTables[1].getTableName());
+
+    GetTablesRequest request = new GetTablesRequest();
+    request.setProjectionSpec(new GetProjectionsSpec());
+    request.setTblNames(tableNames);
+    request.setDbName(DEFAULT_DATABASE);
+
+    GetProjectionsSpec projectSpec = request.getProjectionSpec();
+    List<String> projectedFields = Arrays.asList("Invalid1");
+    projectSpec.setFieldList(projectedFields);
+
+    Assert.assertThrows(Exception.class, () -> client.getTableObjectsByRequest(request));
+  }
+
+  @Test
+  public void testGetTableObjectsWithNonExistentColumns() throws Exception {
+    List<String> tableNames = new ArrayList<>();
+    tableNames.add(testTables[0].getTableName());
+    tableNames.add(testTables[1].getTableName());
+
+    GetTablesRequest request = new GetTablesRequest();
+    request.setProjectionSpec(new GetProjectionsSpec());
+    request.setTblNames(tableNames);
+    request.setDbName(DEFAULT_DATABASE);
+
+    GetProjectionsSpec projectSpec = request.getProjectionSpec();
+    List<String> projectedFields = Arrays.asList("Invalid1", "Invalid2");
+    projectSpec.setFieldList(projectedFields);
+
+    Assert.assertThrows(Exception.class, () -> client.getTableObjectsByRequest(request));
+  }
+
+  @Test
+  public void testGetTableObjectsWithEmptyProjection() throws Exception {
+    List<String> tableNames = new ArrayList<>();
+    tableNames.add(testTables[0].getTableName());
+    tableNames.add(testTables[1].getTableName());
+
+    GetTablesRequest request = new GetTablesRequest();
+    request.setProjectionSpec(new GetProjectionsSpec());
+    request.setTblNames(tableNames);
+    request.setDbName(DEFAULT_DATABASE);
+
+    GetProjectionsSpec projectSpec = request.getProjectionSpec();
+    List<String> projectedFields = Arrays.asList();
+    projectSpec.setFieldList(projectedFields);
+
+    List<Table> tables = client.getTableObjectsByRequest(request);
+
+    Assert.assertEquals("Found tables", 0, tables.size());
+  }
+
+  @Test
+  public void testGetTableObjectsWithProjectionOfMultipleField() throws Exception {
+    List<String> tableNames = new ArrayList<>();
+    tableNames.add(testTables[0].getTableName());
+    tableNames.add(testTables[1].getTableName());
+
+    GetTablesRequest request = new GetTablesRequest();
+    request.setProjectionSpec(new GetProjectionsSpec());
+    request.setTblNames(tableNames);
+    request.setDbName(DEFAULT_DATABASE);
+
+    GetProjectionsSpec projectSpec = request.getProjectionSpec();
+    List<String> projectedFields = Arrays.asList("database",
[jira] [Work logged] (HIVE-24468) Use Event Time instead of Current Time in Notification Log DB Entry
[ https://issues.apache.org/jira/browse/HIVE-24468?focusedWorklogId=519219&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519219 ]

ASF GitHub Bot logged work on HIVE-24468:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 02/Dec/20 20:12
Start Date: 02/Dec/20 20:12
Worklog Time Spent: 10m

Work Description: belugabehr opened a new pull request #1728:
URL: https://github.com/apache/hive/pull/1728

…g DB Entry

### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 519219)
Time Spent: 0.5h (was: 20m)

> Use Event Time instead of Current Time in Notification Log DB Entry
> -------------------------------------------------------------------
>
> Key: HIVE-24468
> URL: https://issues.apache.org/jira/browse/HIVE-24468
> Project: Hive
> Issue Type: Improvement
> Reporter: David Mollitor
> Assignee: David Mollitor
> Priority: Major
> Labels: pull-request-available
> Time Spent: 0.5h
> Remaining Estimate: 0h

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-24468) Use Event Time instead of Current Time in Notification Log DB Entry
[ https://issues.apache.org/jira/browse/HIVE-24468?focusedWorklogId=519218&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519218 ]

ASF GitHub Bot logged work on HIVE-24468:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 02/Dec/20 20:06
Start Date: 02/Dec/20 20:06
Worklog Time Spent: 10m

Work Description: belugabehr closed pull request #1728:
URL: https://github.com/apache/hive/pull/1728

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 519218)
Time Spent: 20m (was: 10m)

> Use Event Time instead of Current Time in Notification Log DB Entry
> -------------------------------------------------------------------
>
> Key: HIVE-24468
> URL: https://issues.apache.org/jira/browse/HIVE-24468
> Project: Hive
> Issue Type: Improvement
> Reporter: David Mollitor
> Assignee: David Mollitor
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-24444) compactor.Cleaner should not set state "mark cleaned" if there are obsolete files in the FS
[ https://issues.apache.org/jira/browse/HIVE-24444?focusedWorklogId=519211&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519211 ]

ASF GitHub Bot logged work on HIVE-24444:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 02/Dec/20 19:45
Start Date: 02/Dec/20 19:45
Worklog Time Spent: 10m

Work Description: deniskuzZ commented on a change in pull request #1716:
URL: https://github.com/apache/hive/pull/1716#discussion_r534377951

## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java

## @@ -316,6 +314,30 @@ private boolean removeFiles(String location, ValidWriteIdList writeIdList, Compa
       }
       fs.delete(dead, true);
     }
-    return true;
+    // Check if there will be more obsolete directories to clean when possible. We will only mark cleaned when this
+    // number reaches 0.
+    return getNumEventuallyObsoleteDirs(location, dirSnapshots) == 0;
+  }
+
+  /**
+   * Get the number of base/delta directories the Cleaner should remove eventually. If we check this after cleaning
+   * we can see if the Cleaner has further work to do in this table/partition directory that it hasn't been able to
+   * finish, e.g. because of an open transaction at the time of compaction.
+   * We do this by assuming that there are no open transactions anywhere and then calling getAcidState. If there are
+   * obsolete directories, then the Cleaner has more work to do.
+   * @param location location of table
+   * @return number of dirs left for the cleaner to clean – eventually
+   * @throws IOException
+   */
+  private int getNumEventuallyObsoleteDirs(String location, Map dirSnapshots)
+      throws IOException {
+    ValidTxnList validTxnList = new ValidReadTxnList();
+    // save it so that getAcidState() sees it
+    conf.set(ValidTxnList.VALID_TXNS_KEY, validTxnList.writeToString());
+    ValidReaderWriteIdList validWriteIdList = new ValidReaderWriteIdList();
+    Path locPath = new Path(location);
+    AcidUtils.Directory dir = AcidUtils.getAcidState(locPath.getFileSystem(conf), locPath, conf, validWriteIdList,
+        Ref.from(false), false, dirSnapshots);
+    return dir.getObsolete().size();

Review comment: consider Case 2: how is the situation going to change if, instead of aborted txns, we have successful ones? Imagine writes arriving continuously via streaming. Won't new deltas prevent us from cleaning up the obsolete ones and marking the cleanup operation as completed for the corresponding compaction, and rather pile up cleanup requests in a queue?

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 519211)
Time Spent: 5h (was: 4h 50m)

> compactor.Cleaner should not set state "mark cleaned" if there are obsolete
> files in the FS
> ---------------------------------------------------------------------------
>
> Key: HIVE-24444
> URL: https://issues.apache.org/jira/browse/HIVE-24444
> Project: Hive
> Issue Type: Bug
> Reporter: Karen Coppage
> Assignee: Karen Coppage
> Priority: Major
> Labels: pull-request-available
> Time Spent: 5h
> Remaining Estimate: 0h
>
> This is an improvement on HIVE-24314, in which markCleaned() is called only
> if +any+ files are deleted by the cleaner.
> This could cause a problem in the following case:
> Say for table_1, compaction1's cleaning was blocked by an open txn, and
> compaction is run again on the same table (compaction2). Both compaction1 and
> compaction2 could be in "ready for cleaning" at the same time. By this time
> the blocking open txn could be committed. When the cleaner runs, one of
> compaction1 and compaction2 will remain in the "ready for cleaning" state:
> Say compaction2 is picked up by the cleaner first. The Cleaner deletes all
> obsolete files. Then compaction1 is picked up by the cleaner; the cleaner
> doesn't remove any files and compaction1 will stay in the queue in a "ready
> for cleaning" state.
> HIVE-24291 already solves this issue, but if it isn't usable (for example if
> HMS schema changes are out of the question) then HIVE-24314 + this change will
> fix the issue of the Cleaner not removing all obsolete files.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-24403) change min_history_level schema change to be compatible with previous version
[ https://issues.apache.org/jira/browse/HIVE-24403?focusedWorklogId=519182&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519182 ]

ASF GitHub Bot logged work on HIVE-24403:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 02/Dec/20 18:22
Start Date: 02/Dec/20 18:22
Worklog Time Spent: 10m

Work Description: pvargacl commented on a change in pull request #1688:
URL: https://github.com/apache/hive/pull/1688#discussion_r534385104

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/DatabaseProduct.java

## @@ -186,6 +186,19 @@ public boolean isDeadlock(SQLException e) {
         || e.getMessage().contains("can't serialize access for this transaction"));
   }

+  /**
+   * Is the given exception a table not found exception
+   * @param e Exception
+   * @return
+   */
+  public boolean isTableNotExists(SQLException e) {

Review comment: Done

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 519182)
Time Spent: 4h (was: 3h 50m)

> change min_history_level schema change to be compatible with previous version
> -----------------------------------------------------------------------------
>
> Key: HIVE-24403
> URL: https://issues.apache.org/jira/browse/HIVE-24403
> Project: Hive
> Issue Type: Improvement
> Components: Metastore
> Reporter: Peter Varga
> Assignee: Peter Varga
> Priority: Major
> Labels: pull-request-available
> Time Spent: 4h
> Remaining Estimate: 0h
>
> In some configurations the HMS backend DB is used by HMS services with
> different versions.
> HIVE-23107 dropped the min_history_level table from the backend DB, making
> the new schema version incompatible with the older HMS services.
> It is possible to modify that change to keep the compatibility:
> * Keep the min_history_level table
> * Add the new fields for the compaction_queue the same way
> * Create a feature flag for min_history_level, and if it is on:
> * Keep the logic inserting to the table during openTxn
> * Keep the logic removing the records at commitTxn and abortTxn
> * Change the logic in the cleaner to get the highwatermark the old way
> * But still change it to not start the cleaning before that
> * The txn_to_write_id table cleaning can work the new way in the new version
> and the old way in the old version
> * This feature flag can be automatically set up based on the existence of the
> min_history_level table; this way, if the table is dropped, all HMSs can
> switch to the new functionality without a restart

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-24432) Delete Notification Events in Batches
[ https://issues.apache.org/jira/browse/HIVE-24432?focusedWorklogId=519180&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519180 ]

ASF GitHub Bot logged work on HIVE-24432:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 02/Dec/20 18:20
Start Date: 02/Dec/20 18:20
Worklog Time Spent: 10m

Work Description: belugabehr opened a new pull request #1710:
URL: https://github.com/apache/hive/pull/1710

### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 519180)
Time Spent: 50m (was: 40m)

> Delete Notification Events in Batches
> -------------------------------------
>
> Key: HIVE-24432
> URL: https://issues.apache.org/jira/browse/HIVE-24432
> Project: Hive
> Issue Type: Improvement
> Affects Versions: 3.2.0
> Reporter: David Mollitor
> Assignee: David Mollitor
> Priority: Major
> Labels: pull-request-available
> Time Spent: 50m
> Remaining Estimate: 0h
>
> Notification events are loaded in batches (reduces memory pressure on the
> HMS), but all of the deletes happen under a single transaction and, when
> deleting many records, can put a lot of pressure on the backend database.
> Instead, delete events in batches (in different transactions) as well.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
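The batching described in the issue can be sketched as a simple id-chunking helper. `BatchDeleteSketch` and `toBatches` are hypothetical names, not the actual ObjectStore code; in the real change each chunk's DELETE would run and commit in its own transaction.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split expired notification-event ids into
// fixed-size chunks so each chunk can be deleted (and committed)
// separately instead of in one huge delete.
public class BatchDeleteSketch {

    static List<List<Long>> toBatches(List<Long> eventIds, int batchSize) {
        List<List<Long>> batches = new ArrayList<>();
        for (int i = 0; i < eventIds.size(); i += batchSize) {
            // copy the view so each batch outlives the source list
            batches.add(new ArrayList<>(eventIds.subList(i, Math.min(i + batchSize, eventIds.size()))));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Each inner list would back one DELETE ... WHERE EVENT_ID IN (...)
        // executed in its own transaction.
        System.out.println(toBatches(List.of(1L, 2L, 3L, 4L, 5L), 2)); // [[1, 2], [3, 4], [5]]
    }
}
```

The trade-off is the usual one for batched deletes: shorter transactions and smaller undo/lock footprints on the backend database, at the cost of the overall cleanup no longer being atomic.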
[jira] [Work logged] (HIVE-24403) change min_history_level schema change to be compatible with previous version
[ https://issues.apache.org/jira/browse/HIVE-24403?focusedWorklogId=519178=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519178 ] ASF GitHub Bot logged work on HIVE-24403: - Author: ASF GitHub Bot Created on: 02/Dec/20 18:19 Start Date: 02/Dec/20 18:19 Worklog Time Spent: 10m Work Description: pvargacl commented on a change in pull request #1688: URL: https://github.com/apache/hive/pull/1688#discussion_r534383384 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java ## @@ -5094,6 +5153,99 @@ public void countOpenTxns() throws MetaException { } } + /** + * Add min history level entry for each generated txn record + * @param dbConn Connection + * @param txnIds new transaction ids + * @deprecated Remove this method when min_history_level table is dropped + * @throws SQLException ex + */ + @Deprecated + private void addTxnToMinHistoryLevel(Connection dbConn, List txnIds, long minOpenTxnId) throws SQLException { +if (!useMinHistoryLevel) { + return; +} +// Need to register minimum open txnid for current transactions into MIN_HISTORY table. +try (Statement stmt = dbConn.createStatement()) { + + List rows = txnIds.stream().map(txnId -> txnId + ", " + minOpenTxnId).collect(Collectors.toList()); + + // Insert transaction entries into MIN_HISTORY_LEVEL. 
+ List<String> inserts = + sqlGenerator.createInsertValuesStmt("\"MIN_HISTORY_LEVEL\" (\"MHL_TXNID\", \"MHL_MIN_OPEN_TXNID\")", rows); + for (String insert : inserts) { +LOG.debug("Going to execute insert <" + insert + ">"); +stmt.execute(insert); + } + LOG.info("Added entries to MIN_HISTORY_LEVEL for current txns: (" + txnIds + ") with min_open_txn: " + minOpenTxnId); +} catch (SQLException e) { + if (dbProduct.isTableNotExists(e)) { +// If the table does not exist anymore, we disable the flag and start to work the new way +// This enables switching to the new functionality without a restart +useMinHistoryLevel = false; Review comment: The idea is that multiple HMS instances use the same backend db: you upgrade them one by one, the last one changes the schema, and all the others switch to the new functionality after their first call to the min_history table. Do you have any practical example of wrapping them in an aspect? I do not want to add much more code complexity just to avoid checking an exception in four places. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519178) Time Spent: 3h 50m (was: 3h 40m) > change min_history_level schema change to be compatible with previous version > - > > Key: HIVE-24403 > URL: https://issues.apache.org/jira/browse/HIVE-24403 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Time Spent: 3h 50m > Remaining Estimate: 0h > > In some configurations the HMS backend DB is used by HMS services with > different versions. > HIVE-23107 dropped the min_history_level table from the backend DB making > the new schema version incompatible with the older HMS services. 
> It is possible to modify that change to keep the compatibility > * Keep the min_history_level table > * Add the new fields for the compaction_queue the same way > * Create a feature flag for min_history_level and if it is on > * Keep the logic inserting to the table during openTxn > * Keep the logic removing the records at commitTxn and abortTxn > * Change the logic in the cleaner, to get the highwatermark the old way > * But still change it to not start the cleaning before that > * The txn_to_write_id table cleaning can work the new way in the new version > and the old way in the old version > * This feature flag can be automatically setup based on the existence of the > min_history level table, this way if the table will be dropped all HMS-s can > switch to the new functionality without restart -- This message was sent by Atlassian Jira (v8.3.4#803005)
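The fallback behavior in the plan above (automatically switching off the min_history_level logic once the table is dropped, with no restart) can be sketched roughly as follows. `MinHistoryFallback`, `SqlAction`, and the message-based table-missing check are hypothetical stand-ins for the real TxnHandler and JDBC plumbing, not Hive code.

```java
// Illustrative sketch of the fallback pattern from the review thread above:
// keep writing to MIN_HISTORY_LEVEL while the table exists, and permanently
// flip the useMinHistoryLevel flag the first time the backend reports that
// the table is gone (i.e. another HMS instance already upgraded the schema).
// MinHistoryFallback, SqlAction and the message-based check are hypothetical
// stand-ins for the real TxnHandler / DatabaseProduct code.
public class MinHistoryFallback {

    interface SqlAction {
        void run() throws Exception;
    }

    private volatile boolean useMinHistoryLevel = true;

    // Stand-in for a vendor-specific "table not exists" check; the real code
    // would inspect SQL error codes rather than the message text.
    private boolean isTableNotExists(Exception e) {
        return e.getMessage() != null && e.getMessage().contains("no such table");
    }

    // Returns true if the insert ran, false if the feature is (now) disabled.
    public boolean insertMinHistory(SqlAction insert) throws Exception {
        if (!useMinHistoryLevel) {
            return false;
        }
        try {
            insert.run();
            return true;
        } catch (Exception e) {
            if (isTableNotExists(e)) {
                // Table was dropped by an upgraded HMS: switch to the new
                // behavior without a restart.
                useMinHistoryLevel = false;
                return false;
            }
            throw e;
        }
    }
}
```

The point of the pattern is that the "table not exists" exception is handled at each call site instead of failing the transaction, which is what the review thread's aspect discussion is about.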
[jira] [Work logged] (HIVE-24444) compactor.Cleaner should not set state "mark cleaned" if there are obsolete files in the FS
[ https://issues.apache.org/jira/browse/HIVE-24444?focusedWorklogId=519172&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519172 ] ASF GitHub Bot logged work on HIVE-24444: - Author: ASF GitHub Bot Created on: 02/Dec/20 18:11 Start Date: 02/Dec/20 18:11 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #1716: URL: https://github.com/apache/hive/pull/1716#discussion_r534377951 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java ## @@ -316,6 +314,30 @@ private boolean removeFiles(String location, ValidWriteIdList writeIdList, Compa } fs.delete(dead, true); } -return true; +// Check if there will be more obsolete directories to clean when possible. We will only mark cleaned when this +// number reaches 0. +return getNumEventuallyObsoleteDirs(location, dirSnapshots) == 0; + } + + /** + * Get the number of base/delta directories the Cleaner should remove eventually. If we check this after cleaning + * we can see if the Cleaner has further work to do in this table/partition directory that it hasn't been able to + * finish, e.g. because of an open transaction at the time of compaction. + * We do this by assuming that there are no open transactions anywhere and then calling getAcidState. If there are + * obsolete directories, then the Cleaner has more work to do. 
+ * @param location location of table + * @return number of dirs left for the cleaner to clean – eventually + * @throws IOException + */ + private int getNumEventuallyObsoleteDirs(String location, Map<Path, AcidUtils.HdfsDirSnapshot> dirSnapshots) + throws IOException { +ValidTxnList validTxnList = new ValidReadTxnList(); +//save it so that getAcidState() sees it +conf.set(ValidTxnList.VALID_TXNS_KEY, validTxnList.writeToString()); +ValidReaderWriteIdList validWriteIdList = new ValidReaderWriteIdList(); +Path locPath = new Path(location); +AcidUtils.Directory dir = AcidUtils.getAcidState(locPath.getFileSystem(conf), locPath, conf, validWriteIdList, +Ref.from(false), false, dirSnapshots); +return dir.getObsolete().size(); Review comment: considering Case 2: how is the situation going to change if instead of an aborted txn you have a successful write? Writes coming in continuously (streaming). Won't a new delta prevent us from marking the clean as complete for the corresponding compaction request, and we'll just have piled up cleanup requests in a queue? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519172) Time Spent: 4h 50m (was: 4h 40m) > compactor.Cleaner should not set state "mark cleaned" if there are obsolete > files in the FS > --- > > Key: HIVE-24444 > URL: https://issues.apache.org/jira/browse/HIVE-24444 > Project: Hive > Issue Type: Bug >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Major > Labels: pull-request-available > Time Spent: 4h 50m > Remaining Estimate: 0h > > This is an improvement on HIVE-24314, in which markCleaned() is called only > if +any+ files are deleted by the cleaner. 
This could cause a problem in the > following case: > Say for table_1 compaction1 cleaning was blocked by an open txn, and > compaction is run again on the same table (compaction2). Both compaction1 and > compaction2 could be in "ready for cleaning" at the same time. By this time > the blocking open txn could be committed. When the cleaner runs, one of > compaction1 and compaction2 will remain in the "ready for cleaning" state: > Say compaction2 is picked up by the cleaner first. The Cleaner deletes all > obsolete files. Then compaction1 is picked up by the cleaner; the cleaner > doesn't remove any files and compaction1 will stay in the queue in a "ready > for cleaning" state. > HIVE-24291 already solves this issue but if it isn't usable (for example if > HMS schema changes are out of the question) then HIVE-24314 + this change will > fix the issue of the Cleaner not removing all obsolete files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
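The difference between the HIVE-24314 rule and this change, as applied to the two-compaction scenario above, can be reduced to two predicates. This is a hypothetical model for illustration, not the actual Cleaner code; `CleanerRules` is an invented name.

```java
// Hypothetical model of the scenario in the description: compaction1 and
// compaction2 are both "ready for cleaning"; the cleaner handles compaction2
// first and deletes all obsolete dirs, then handles compaction1 and deletes
// nothing. The two rules below contrast HIVE-24314 ("mark cleaned only if any
// file was deleted") with this change ("mark cleaned once no eventually
// obsolete dirs remain").
public class CleanerRules {

    // HIVE-24314 rule: an entry that deleted nothing stays "ready for cleaning".
    public static boolean markCleanedIfAnyDeleted(int filesDeleted) {
        return filesDeleted > 0;
    }

    // This change: an entry is done once getAcidState (assuming no open txns)
    // reports zero obsolete base/delta directories left on the filesystem.
    public static boolean markCleanedIfNoneLeft(int eventuallyObsoleteDirs) {
        return eventuallyObsoleteDirs == 0;
    }
}
```

Under the first rule compaction1 (which deleted nothing) is stuck in the queue; under the second, both entries can be marked cleaned once the filesystem holds no more eventually-obsolete directories.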
[jira] [Work logged] (HIVE-24432) Delete Notification Events in Batches
[ https://issues.apache.org/jira/browse/HIVE-24432?focusedWorklogId=519168=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519168 ] ASF GitHub Bot logged work on HIVE-24432: - Author: ASF GitHub Bot Created on: 02/Dec/20 18:02 Start Date: 02/Dec/20 18:02 Worklog Time Spent: 10m Work Description: belugabehr closed pull request #1710: URL: https://github.com/apache/hive/pull/1710 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519168) Time Spent: 40m (was: 0.5h) > Delete Notification Events in Batches > - > > Key: HIVE-24432 > URL: https://issues.apache.org/jira/browse/HIVE-24432 > Project: Hive > Issue Type: Improvement >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > Notification events are loaded in batches (reduces memory pressure on the > HMS), but all of the deletes happen under a single transactions and, when > deleting many records, can put a lot of pressure on the backend database. > Instead, delete events in batches (in different transactions) as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24403) change min_history_level schema change to be compatible with previous version
[ https://issues.apache.org/jira/browse/HIVE-24403?focusedWorklogId=519162=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519162 ] ASF GitHub Bot logged work on HIVE-24403: - Author: ASF GitHub Bot Created on: 02/Dec/20 17:53 Start Date: 02/Dec/20 17:53 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #1688: URL: https://github.com/apache/hive/pull/1688#discussion_r534366541 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnDbUtil.java ## @@ -385,6 +391,26 @@ public static String queryToString(Configuration conf, String query, boolean inc return sb.toString(); } + /** + * This is only for testing, it does not use the connectionPool from TxnHandler! + * @param conf + * @param query + * @throws Exception + */ + @VisibleForTesting + public static void executeUpdate(Configuration conf, String query) Review comment: in this case we should consider refactoring this class This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519162) Time Spent: 3h 40m (was: 3.5h) > change min_history_level schema change to be compatible with previous version > - > > Key: HIVE-24403 > URL: https://issues.apache.org/jira/browse/HIVE-24403 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Time Spent: 3h 40m > Remaining Estimate: 0h > > In some configurations the HMS backend DB is used by HMS services with > different versions. > HIVE-23107 dropped the min_history_level table from the backend DB making > the new schema version incompatible with the older HMS services. 
> It is possible to modify that change to keep the compatibility > * Keep the min_history_level table > * Add the new fields for the compaction_queue the same way > * Create a feature flag for min_history_level and if it is on > * Keep the logic inserting to the table during openTxn > * Keep the logic removing the records at commitTxn and abortTxn > * Change the logic in the cleaner, to get the highwatermark the old way > * But still change it to not start the cleaning before that > * The txn_to_write_id table cleaning can work the new way in the new version > and the old way in the old version > * This feature flag can be automatically setup based on the existence of the > min_history level table, this way if the table will be dropped all HMS-s can > switch to the new functionality without restart -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24403) change min_history_level schema change to be compatible with previous version
[ https://issues.apache.org/jira/browse/HIVE-24403?focusedWorklogId=519160&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519160 ] ASF GitHub Bot logged work on HIVE-24403: - Author: ASF GitHub Bot Created on: 02/Dec/20 17:52 Start Date: 02/Dec/20 17:52 Worklog Time Spent: 10m Work Description: pvargacl commented on a change in pull request #1688: URL: https://github.com/apache/hive/pull/1688#discussion_r534365714 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java ## @@ -670,6 +725,8 @@ public OpenTxnsResponse openTxns(OpenTxnRequest rqst) throws MetaException { assert txnIds.size() == numTxns; + addTxnToMinHistoryLevel(dbConn, txnIds, minOpenTxnId); Review comment: That was my first intent, but it resulted in a lock timeout: after inserting the new records into the txns table, the min open select was not running. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519160) Time Spent: 3.5h (was: 3h 20m) > change min_history_level schema change to be compatible with previous version > - > > Key: HIVE-24403 > URL: https://issues.apache.org/jira/browse/HIVE-24403 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Time Spent: 3.5h > Remaining Estimate: 0h > > In some configurations the HMS backend DB is used by HMS services with > different versions. > HIVE-23107 dropped the min_history_level table from the backend DB making > the new schema version incompatible with the older HMS services. 
> It is possible to modify that change to keep the compatibility > * Keep the min_history_level table > * Add the new fields for the compaction_queue the same way > * Create a feature flag for min_history_level and if it is on > * Keep the logic inserting to the table during openTxn > * Keep the logic removing the records at commitTxn and abortTxn > * Change the logic in the cleaner, to get the highwatermark the old way > * But still change it to not start the cleaning before that > * The txn_to_write_id table cleaning can work the new way in the new version > and the old way in the old version > * This feature flag can be automatically setup based on the existence of the > min_history level table, this way if the table will be dropped all HMS-s can > switch to the new functionality without restart -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24403) change min_history_level schema change to be compatible with previous version
[ https://issues.apache.org/jira/browse/HIVE-24403?focusedWorklogId=519148=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519148 ] ASF GitHub Bot logged work on HIVE-24403: - Author: ASF GitHub Bot Created on: 02/Dec/20 17:40 Start Date: 02/Dec/20 17:40 Worklog Time Spent: 10m Work Description: pvargacl commented on a change in pull request #1688: URL: https://github.com/apache/hive/pull/1688#discussion_r534357567 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java ## @@ -390,6 +404,42 @@ public void setConf(Configuration conf){ } } + /** + * Check if min_history_level table is usable + * @return + * @throws MetaException + */ + private boolean checkMinHistoryLevelTable(boolean configValue) throws MetaException { +if (!configValue) { + // don't check it if disabled + return false; +} +Connection dbConn = null; +boolean tableExists = true; +try { + dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED); + try (Statement stmt = dbConn.createStatement()) { +// Dummy query to see if table exists +try (ResultSet rs = stmt.executeQuery("SELECT MIN(\"MHL_MIN_OPEN_TXNID\") FROM \"MIN_HISTORY_LEVEL\"")) { Review comment: fixed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519148) Time Spent: 3h 20m (was: 3h 10m) > change min_history_level schema change to be compatible with previous version > - > > Key: HIVE-24403 > URL: https://issues.apache.org/jira/browse/HIVE-24403 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Time Spent: 3h 20m > Remaining Estimate: 0h > > In some configurations the HMS backend DB is used by HMS services with > different versions. > HIVE-23107 dropped the min_history_level table from the backend DB making > the new schema version incompatible with the older HMS services. > It is possible to modify that change to keep the compatibility > * Keep the min_history_level table > * Add the new fields for the compaction_queue the same way > * Create a feature flag for min_history_level and if it is on > * Keep the logic inserting to the table during openTxn > * Keep the logic removing the records at commitTxn and abortTxn > * Change the logic in the cleaner, to get the highwatermark the old way > * But still change it to not start the cleaning before that > * The txn_to_write_id table cleaning can work the new way in the new version > and the old way in the old version > * This feature flag can be automatically setup based on the existence of the > min_history level table, this way if the table will be dropped all HMS-s can > switch to the new functionality without restart -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24403) change min_history_level schema change to be compatible with previous version
[ https://issues.apache.org/jira/browse/HIVE-24403?focusedWorklogId=519146&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519146 ] ASF GitHub Bot logged work on HIVE-24403: - Author: ASF GitHub Bot Created on: 02/Dec/20 17:39 Start Date: 02/Dec/20 17:39 Worklog Time Spent: 10m Work Description: pvargacl commented on a change in pull request #1688: URL: https://github.com/apache/hive/pull/1688#discussion_r534357150 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnDbUtil.java ## @@ -385,6 +391,26 @@ public static String queryToString(Configuration conf, String query, boolean inc return sb.toString(); } + /** + * This is only for testing, it does not use the connectionPool from TxnHandler! + * @param conf + * @param query + * @throws Exception + */ + @VisibleForTesting + public static void executeUpdate(Configuration conf, String query) Review comment: Well, this class is a test utility. /** * Utility methods for creating and destroying txn database/schema, plus methods for * querying against metastore tables. * Placed here in a separate class so it can be shared across unit tests. */ public final class TxnDbUtil The problem is more that getEpochFn and executeQueriesInBatchNoCount were added to this class; those are production code. I know it would be nicer if it were in a test package, but then it would be harder to use in 5 different projects. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519146) Time Spent: 3h 10m (was: 3h) > change min_history_level schema change to be compatible with previous version > - > > Key: HIVE-24403 > URL: https://issues.apache.org/jira/browse/HIVE-24403 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Time Spent: 3h 10m > Remaining Estimate: 0h > > In some configurations the HMS backend DB is used by HMS services with > different versions. > HIVE-23107 dropped the min_history_level table from the backend DB making > the new schema version incompatible with the older HMS services. > It is possible to modify that change to keep the compatibility > * Keep the min_history_level table > * Add the new fields for the compaction_queue the same way > * Create a feature flag for min_history_level and if it is on > * Keep the logic inserting to the table during openTxn > * Keep the logic removing the records at commitTxn and abortTxn > * Change the logic in the cleaner, to get the highwatermark the old way > * But still change it to not start the cleaning before that > * The txn_to_write_id table cleaning can work the new way in the new version > and the old way in the old version > * This feature flag can be automatically setup based on the existence of the > min_history level table, this way if the table will be dropped all HMS-s can > switch to the new functionality without restart -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24444) compactor.Cleaner should not set state "mark cleaned" if there are obsolete files in the FS
[ https://issues.apache.org/jira/browse/HIVE-24444?focusedWorklogId=519128&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519128 ] ASF GitHub Bot logged work on HIVE-24444: - Author: ASF GitHub Bot Created on: 02/Dec/20 17:19 Start Date: 02/Dec/20 17:19 Worklog Time Spent: 10m Work Description: pvargacl commented on a change in pull request #1716: URL: https://github.com/apache/hive/pull/1716#discussion_r534342957 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java ## @@ -316,6 +314,30 @@ private boolean removeFiles(String location, ValidWriteIdList writeIdList, Compa } fs.delete(dead, true); } -return true; +// Check if there will be more obsolete directories to clean when possible. We will only mark cleaned when this +// number reaches 0. +return getNumEventuallyObsoleteDirs(location, dirSnapshots) == 0; + } + + /** + * Get the number of base/delta directories the Cleaner should remove eventually. If we check this after cleaning + * we can see if the Cleaner has further work to do in this table/partition directory that it hasn't been able to + * finish, e.g. because of an open transaction at the time of compaction. + * We do this by assuming that there are no open transactions anywhere and then calling getAcidState. If there are + * obsolete directories, then the Cleaner has more work to do. 
+ * @param location location of table + * @return number of dirs left for the cleaner to clean – eventually + * @throws IOException + */ + private int getNumEventuallyObsoleteDirs(String location, Map<Path, AcidUtils.HdfsDirSnapshot> dirSnapshots) + throws IOException { +ValidTxnList validTxnList = new ValidReadTxnList(); +//save it so that getAcidState() sees it +conf.set(ValidTxnList.VALID_TXNS_KEY, validTxnList.writeToString()); +ValidReaderWriteIdList validWriteIdList = new ValidReaderWriteIdList(); +Path locPath = new Path(location); +AcidUtils.Directory dir = AcidUtils.getAcidState(locPath.getFileSystem(conf), locPath, conf, validWriteIdList, +Ref.from(false), false, dirSnapshots); +return dir.getObsolete().size(); Review comment: Case 1: If HIVE-23107 and the following are there, I think none of these checks are necessary, because we can be sure that the Cleaner was running when it could delete everything it could. Also, if delayed cleaning is enabled, it is guaranteed that it will never delete any more obsolete directories no matter how many times it is running (see: validWriteIdList.updateHighWatermark(ci.highestWriteId)). If we must choose, checking if anything was removed does less damage. Case 2: If those fixes are not there, I think checking for obsolete files is better than checking if anything was removed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519128) Time Spent: 4h 40m (was: 4.5h) > compactor.Cleaner should not set state "mark cleaned" if there are obsolete > files in the FS > --- > > Key: HIVE-24444 > URL: https://issues.apache.org/jira/browse/HIVE-24444 > Project: Hive > Issue Type: Bug >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Major > Labels: pull-request-available > Time Spent: 4h 40m > Remaining Estimate: 0h > > This is an improvement on HIVE-24314, in which markCleaned() is called only > if +any+ files are deleted by the cleaner. This could cause a problem in the > following case: > Say for table_1 compaction1 cleaning was blocked by an open txn, and > compaction is run again on the same table (compaction2). Both compaction1 and > compaction2 could be in "ready for cleaning" at the same time. By this time > the blocking open txn could be committed. When the cleaner runs, one of > compaction1 and compaction2 will remain in the "ready for cleaning" state: > Say compaction2 is picked up by the cleaner first. The Cleaner deletes all > obsolete files. Then compaction1 is picked up by the cleaner; the cleaner > doesn't remove any files and compaction1 will stay in the queue in a "ready > for cleaning" state. > HIVE-24291 already solves this issue but if it isn't usable (for example if > HMS schema changes are out of the question) then HIVE-24314 + this change will > fix the issue of the Cleaner not removing all obsolete files. -- This message was sent by Atlassian Jira
[jira] [Work logged] (HIVE-24432) Delete Notification Events in Batches
[ https://issues.apache.org/jira/browse/HIVE-24432?focusedWorklogId=519120=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519120 ] ASF GitHub Bot logged work on HIVE-24432: - Author: ASF GitHub Bot Created on: 02/Dec/20 16:59 Start Date: 02/Dec/20 16:59 Worklog Time Spent: 10m Work Description: belugabehr opened a new pull request #1710: URL: https://github.com/apache/hive/pull/1710 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519120) Time Spent: 0.5h (was: 20m) > Delete Notification Events in Batches > - > > Key: HIVE-24432 > URL: https://issues.apache.org/jira/browse/HIVE-24432 > Project: Hive > Issue Type: Improvement >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Notification events are loaded in batches (reduces memory pressure on the > HMS), but all of the deletes happen under a single transactions and, when > deleting many records, can put a lot of pressure on the backend database. > Instead, delete events in batches (in different transactions) as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24432) Delete Notification Events in Batches
[ https://issues.apache.org/jira/browse/HIVE-24432?focusedWorklogId=519117=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519117 ] ASF GitHub Bot logged work on HIVE-24432: - Author: ASF GitHub Bot Created on: 02/Dec/20 16:58 Start Date: 02/Dec/20 16:58 Worklog Time Spent: 10m Work Description: belugabehr closed pull request #1710: URL: https://github.com/apache/hive/pull/1710 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519117) Time Spent: 20m (was: 10m) > Delete Notification Events in Batches > - > > Key: HIVE-24432 > URL: https://issues.apache.org/jira/browse/HIVE-24432 > Project: Hive > Issue Type: Improvement >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Notification events are loaded in batches (reduces memory pressure on the > HMS), but all of the deletes happen under a single transactions and, when > deleting many records, can put a lot of pressure on the backend database. > Instead, delete events in batches (in different transactions) as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24432) Delete Notification Events in Batches
[ https://issues.apache.org/jira/browse/HIVE-24432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242511#comment-17242511 ] David Mollitor commented on HIVE-24432: --- [~anishek] Yes. I thought about implementing it this way; however, it's not always that simple. Since Hive is using an ORM, it can sometimes have a negative effect when doing modifications to the DB directly, since the ORM caches fall out of sync with the state of the DB after that modification. I think in this case it might be OK, but there is less risk doing it in this manner, and the cleanup isn't too important in terms of performance. > Delete Notification Events in Batches > - > > Key: HIVE-24432 > URL: https://issues.apache.org/jira/browse/HIVE-24432 > Project: Hive > Issue Type: Improvement >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Notification events are loaded in batches (reduces memory pressure on the > HMS), but all of the deletes happen under a single transaction and, when > deleting many records, can put a lot of pressure on the backend database. > Instead, delete events in batches (in different transactions) as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24436) Fix Avro NULL_DEFAULT_VALUE compatibility issue
[ https://issues.apache.org/jira/browse/HIVE-24436?focusedWorklogId=519091&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519091 ] ASF GitHub Bot logged work on HIVE-24436: - Author: ASF GitHub Bot Created on: 02/Dec/20 16:18 Start Date: 02/Dec/20 16:18 Worklog Time Spent: 10m Work Description: iemejia commented on pull request #1722: URL: https://github.com/apache/hive/pull/1722#issuecomment-737336194 Excellent! For info, the vote for Avro 1.10.1 (that fixes the null default issue) is almost over and artifacts should be published tomorrow or the day after. I will update my PR once it is out. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519091) Time Spent: 2.5h (was: 2h 20m) > Fix Avro NULL_DEFAULT_VALUE compatibility issue > --- > > Key: HIVE-24436 > URL: https://issues.apache.org/jira/browse/HIVE-24436 > Project: Hive > Issue Type: Improvement > Components: Avro >Affects Versions: 2.3.8 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Labels: pull-request-available > Fix For: 2.3.8, 3.1.3, 4.0.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > Exception1: > {noformat} > - create hive serde table with Catalog > *** RUN ABORTED *** > java.lang.NoSuchMethodError: 'void > org.apache.avro.Schema$Field.<init>(java.lang.String, org.apache.avro.Schema, > java.lang.String, org.codehaus.jackson.JsonNode)' > at > org.apache.hadoop.hive.serde2.avro.TypeInfoToSchema.createAvroField(TypeInfoToSchema.java:76) > at > org.apache.hadoop.hive.serde2.avro.TypeInfoToSchema.convert(TypeInfoToSchema.java:61) > at > org.apache.hadoop.hive.serde2.avro.AvroSerDe.getSchemaFromCols(AvroSerDe.java:170) > at > org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:114) > at > 
org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:83) > at > org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:533) > at > org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:450) > at > org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:437) > at > org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:281) > at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:263) > {noformat} > Exception2: > {noformat} > - alter hive serde table add columns -- partitioned - AVRO *** FAILED *** > org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: > org.apache.avro.AvroRuntimeException: Unknown datum class: class > org.codehaus.jackson.node.NullNode; > at > org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:112) > at > org.apache.spark.sql.hive.HiveExternalCatalog.createTable(HiveExternalCatalog.scala:245) > at > org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.createTable(ExternalCatalogWithListener.scala:94) > at > org.apache.spark.sql.catalyst.catalog.SessionCatalog.createTable(SessionCatalog.scala:346) > at > org.apache.spark.sql.execution.command.CreateTableCommand.run(tables.scala:166) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79) > at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:228) > at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3680) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24468) Use Event Time instead of Current Time in Notification Log DB Entry
[ https://issues.apache.org/jira/browse/HIVE-24468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17242465#comment-17242465 ] Anishek Agarwal commented on HIVE-24468: +1 > Use Event Time instead of Current Time in Notification Log DB Entry > --- > > Key: HIVE-24468 > URL: https://issues.apache.org/jira/browse/HIVE-24468 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24453) Direct SQL error when parsing create_time value for database
[ https://issues.apache.org/jira/browse/HIVE-24453?focusedWorklogId=519085=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519085 ] ASF GitHub Bot logged work on HIVE-24453: - Author: ASF GitHub Bot Created on: 02/Dec/20 15:56 Start Date: 02/Dec/20 15:56 Worklog Time Spent: 10m Work Description: jcamachor merged pull request #1719: URL: https://github.com/apache/hive/pull/1719 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519085) Time Spent: 20m (was: 10m) > Direct SQL error when parsing create_time value for database > > > Key: HIVE-24453 > URL: https://issues.apache.org/jira/browse/HIVE-24453 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > HIVE-21077 introduced a {{create_time}} field for {{DBS}} table in HMS. > Although the value for that field is always set after that patch, the value > could be null if the database was created before the feature went in. 
> DirectSQL should check for a null value before parsing the integer; otherwise > we hit an exception and fall back to the ORM path: > {code} > 2020-11-28 09:06:05,414 WARN org.apache.hadoop.hive.metastore.ObjectStore: > [pool-8-thread-194]: Falling back to ORM path due to direct SQL failure (this > is not an error): null at > org.apache.hadoop.hive.metastore.MetastoreDirectSqlUtils.extractSqlInt(MetastoreDirectSqlUtils.java:251) > at > org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getDatabase(MetaStoreDirectSql.java:420) > at > org.apache.hadoop.hive.metastore.ObjectStore$1.getSqlResult(ObjectStore.java:839) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
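The defensive parse described above can be sketched as follows. This is a minimal illustration, not the actual Hive patch: the class name, method name, and the choice to report a missing create_time as 0 are assumptions.

```java
// Hypothetical null-tolerant integer extraction for direct SQL results.
// CREATE_TIME may be NULL for databases created before HIVE-21077, so a
// defensive parse avoids the exception that forces the ORM fallback.
public class DirectSqlIntSketch {
    static int extractSqlInt(Object value) {
        if (value == null) {
            return 0; // assumption: report an unset create_time as 0
        }
        if (value instanceof Number) {
            return ((Number) value).intValue(); // typical JDBC result type
        }
        return Integer.parseInt(value.toString()); // e.g. a string-typed column
    }
}
```

With this shape the direct SQL path tolerates pre-HIVE-21077 rows instead of throwing and falling back to ORM.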
[jira] [Updated] (HIVE-24453) Direct SQL error when parsing create_time value for database
[ https://issues.apache.org/jira/browse/HIVE-24453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-24453: --- Fix Version/s: 4.0.0 Resolution: Fixed Status: Resolved (was: Patch Available) Pushed to master, thanks for the review [~kkasa]! > Direct SQL error when parsing create_time value for database > > > Key: HIVE-24453 > URL: https://issues.apache.org/jira/browse/HIVE-24453 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > HIVE-21077 introduced a {{create_time}} field for {{DBS}} table in HMS. > Although the value for that field is always set after that patch, the value > could be null if the database was created before the feature went in. > DirectSQL should check for null value before parsing the integer, otherwise > we hit an exception and fallback to ORM path: > {code} > 2020-11-28 09:06:05,414 WARN org.apache.hadoop.hive.metastore.ObjectStore: > [pool-8-thread-194]: Falling back to ORM path due to direct SQL failure (this > is not an error): null at > org.apache.hadoop.hive.metastore.MetastoreDirectSqlUtils.extractSqlInt(MetastoreDirectSqlUtils.java:251) > at > org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getDatabase(MetaStoreDirectSql.java:420) > at > org.apache.hadoop.hive.metastore.ObjectStore$1.getSqlResult(ObjectStore.java:839) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24444) compactor.Cleaner should not set state "mark cleaned" if there are obsolete files in the FS
[ https://issues.apache.org/jira/browse/HIVE-24444?focusedWorklogId=519051=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519051 ] ASF GitHub Bot logged work on HIVE-24444: - Author: ASF GitHub Bot Created on: 02/Dec/20 14:50 Start Date: 02/Dec/20 14:50 Worklog Time Spent: 10m Work Description: klcopp commented on a change in pull request #1716: URL: https://github.com/apache/hive/pull/1716#discussion_r534226306
## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
@@ -316,6 +314,30 @@ private boolean removeFiles(String location, ValidWriteIdList writeIdList, Compa
       }
       fs.delete(dead, true);
     }
-    return true;
+    // Check if there will be more obsolete directories to clean when possible. We will only mark cleaned when this
+    // number reaches 0.
+    return getNumEventuallyObsoleteDirs(location, dirSnapshots) == 0;
+  }
+
+  /**
+   * Get the number of base/delta directories the Cleaner should remove eventually. If we check this after cleaning
+   * we can see if the Cleaner has further work to do in this table/partition directory that it hasn't been able to
+   * finish, e.g. because of an open transaction at the time of compaction.
+   * We do this by assuming that there are no open transactions anywhere and then calling getAcidState. If there are
+   * obsolete directories, then the Cleaner has more work to do.
+   * @param location location of table
+   * @return number of dirs left for the cleaner to clean – eventually
+   * @throws IOException
+   */
+  private int getNumEventuallyObsoleteDirs(String location, Map dirSnapshots)
+      throws IOException {
+    ValidTxnList validTxnList = new ValidReadTxnList();
+    //save it so that getAcidState() sees it
+    conf.set(ValidTxnList.VALID_TXNS_KEY, validTxnList.writeToString());
+    ValidReaderWriteIdList validWriteIdList = new ValidReaderWriteIdList();
+    Path locPath = new Path(location);
+    AcidUtils.Directory dir = AcidUtils.getAcidState(locPath.getFileSystem(conf), locPath, conf, validWriteIdList,
+        Ref.from(false), false, dirSnapshots);
+    return dir.getObsolete().size();
Review comment: Okay, I see what you mean. I removed the aborted files from the total. In general, do you think checking for obsolete files is better than checking whether we removed any files? Case 1: Assuming HIVE-23107 etc. are present in the version? Case 2: Assuming HIVE-23107 etc. are _not_ present in the version? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519051) Time Spent: 4.5h (was: 4h 20m) > compactor.Cleaner should not set state "mark cleaned" if there are obsolete > files in the FS > --- > > Key: HIVE-24444 > URL: https://issues.apache.org/jira/browse/HIVE-24444 > Project: Hive > Issue Type: Bug >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Major > Labels: pull-request-available > Time Spent: 4.5h > Remaining Estimate: 0h > > This is an improvement on HIVE-24314, in which markCleaned() is called only > if +any+ files are deleted by the cleaner.
This could cause a problem in the > following case: > Say for table_1, compaction1's cleaning was blocked by an open txn, and > compaction is run again on the same table (compaction2). Both compaction1 and > compaction2 could be in "ready for cleaning" at the same time. By this time > the blocking open txn could be committed. When the cleaner runs, one of > compaction1 and compaction2 will remain in the "ready for cleaning" state: > Say compaction2 is picked up by the cleaner first. The Cleaner deletes all > obsolete files. Then compaction1 is picked up by the cleaner; the cleaner > doesn't remove any files and compaction1 will stay in the queue in a "ready > for cleaning" state. > HIVE-24291 already solves this issue but if it isn't usable (for example if > HMS schema changes are out of the question) then HIVE-24314 + this change will > fix the issue of the Cleaner not removing all obsolete files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
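The decision rule discussed in this thread can be modeled with a small sketch. Everything here is illustrative (the class, method, and directory names are invented, not Hive's API): delete what is obsolete now, but report "mark cleaned" only when nothing that will eventually become obsolete remains in the directory.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Toy model of the Cleaner decision. removeFiles deletes the directories
// this cleaning run may delete, then reports success ("mark cleaned") only
// when no directory that will *eventually* become obsolete remains —
// mirroring the getNumEventuallyObsoleteDirs(...) == 0 check above.
class CleanerSketch {
    static boolean removeFiles(Set<String> dirsOnFs,
                               Set<String> obsoleteNow,
                               Set<String> eventuallyObsolete) {
        dirsOnFs.removeAll(obsoleteNow);      // delete what we can clean now
        // Re-check as if no open transactions existed anywhere:
        Set<String> remaining = new HashSet<>(dirsOnFs);
        remaining.retainAll(eventuallyObsolete);
        return remaining.isEmpty();           // mark cleaned only if 0 are left
    }
}
```

In the two-compaction scenario above, the first cleaning run that cannot yet delete every eventually-obsolete delta returns false and stays queued, instead of being marked cleaned prematurely.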
[jira] [Work logged] (HIVE-24460) Refactor Get Next Event ID for DbNotificationListener
[ https://issues.apache.org/jira/browse/HIVE-24460?focusedWorklogId=519034=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519034 ] ASF GitHub Bot logged work on HIVE-24460: - Author: ASF GitHub Bot Created on: 02/Dec/20 14:22 Start Date: 02/Dec/20 14:22 Worklog Time Spent: 10m Work Description: belugabehr commented on a change in pull request #1725: URL: https://github.com/apache/hive/pull/1725#discussion_r534203356 ## File path: hcatalog/server-extensions/src/main/java/org/apache/hive/hcatalog/listener/DbNotificationListener.java ## @@ -1217,7 +1251,7 @@ private void addNotificationLog(NotificationEvent event, ListenerEvent listenerE params.add(catName); } - s = "insert into \"NOTIFICATION_LOG\" (" + columns + ") VALUES (" + insertVal + ")"; + String s = "insert into \"NOTIFICATION_LOG\" (" + columns + ") VALUES (" + insertVal + ")"; Review comment: @miklosgergely Yes, for sure. And I can take a look at that in a future refactoring, but this request is out of scope for this change which only affects the generation of the "Next Event ID." Thanks for the review. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519034) Time Spent: 0.5h (was: 20m) > Refactor Get Next Event ID for DbNotificationListener > - > > Key: HIVE-24460 > URL: https://issues.apache.org/jira/browse/HIVE-24460 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Refactor event ID generation to match notification log ID generation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24450) DbNotificationListener Request Notification IDs in Batches
[ https://issues.apache.org/jira/browse/HIVE-24450?focusedWorklogId=519033=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519033 ] ASF GitHub Bot logged work on HIVE-24450: - Author: ASF GitHub Bot Created on: 02/Dec/20 14:20 Start Date: 02/Dec/20 14:20 Worklog Time Spent: 10m Work Description: belugabehr commented on pull request #1718: URL: https://github.com/apache/hive/pull/1718#issuecomment-737258804 Thank you all for the feedback. Please review my notes: https://issues.apache.org/jira/browse/HIVE-24450 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519033) Time Spent: 50m (was: 40m) > DbNotificationListener Request Notification IDs in Batches > -- > > Key: HIVE-24450 > URL: https://issues.apache.org/jira/browse/HIVE-24450 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > Every time a new notification event is logged into the database, the sequence > number for the ID of the event is incremented by one. It is very standard in > database design to instead request a block of IDs for each fetch from the > database. The sequence numbers are then handed out locally until the block > of IDs is exhausted. This allows for fewer database round-trips and > transactions, at the expense of perhaps burning a few IDs. > Burning of IDs happens when the server is restarted in the middle of a block > of sequence IDs. That is, if the HMS requests a block of 10 IDs, and only > three have been assigned, after the restart, the HMS will request another > block of 10, burning (wasting) 7 IDs. 
As long as the blocks are not too > small and restarts are infrequent, few IDs are lost. -- This message was sent by Atlassian Jira (v8.3.4#803005)
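The block-allocation scheme described above can be sketched as follows. This is a toy model under stated assumptions: `BlockIdAllocator` is not Hive code, and the `AtomicLong` stands in for the database sequence row that the HMS would update in a single round-trip.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hands out IDs from a locally cached block, touching the backing
// sequence (one "database round-trip") only when the block is exhausted.
class BlockIdAllocator {
    static final int BLOCK_SIZE = 10;
    private final AtomicLong dbSequence; // stand-in for the sequence table
    private long next;                   // next ID to hand out
    private long blockEnd;               // exclusive end of the cached block

    BlockIdAllocator(AtomicLong dbSequence) {
        this.dbSequence = dbSequence;
    }

    synchronized long nextId() {
        if (next >= blockEnd) {
            // reserve BLOCK_SIZE IDs at once instead of one per event
            next = dbSequence.getAndAdd(BLOCK_SIZE);
            blockEnd = next + BLOCK_SIZE;
        }
        return next++;
    }
}
```

Creating a fresh allocator over the same sequence models a server restart: IDs remaining in the old block are burned, and the next ID comes from a newly reserved block.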
[jira] [Work logged] (HIVE-24450) DbNotificationListener Request Notification IDs in Batches
[ https://issues.apache.org/jira/browse/HIVE-24450?focusedWorklogId=519032=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519032 ] ASF GitHub Bot logged work on HIVE-24450: - Author: ASF GitHub Bot Created on: 02/Dec/20 14:20 Start Date: 02/Dec/20 14:20 Worklog Time Spent: 10m Work Description: belugabehr closed pull request #1718: URL: https://github.com/apache/hive/pull/1718 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519032) Time Spent: 40m (was: 0.5h) > DbNotificationListener Request Notification IDs in Batches > -- > > Key: HIVE-24450 > URL: https://issues.apache.org/jira/browse/HIVE-24450 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > Every time a new notification event is logged into the database, the sequence > number for the ID of the event is incremented by one. It is very standard in > database design to instead request a block of IDs for each fetch from the > database. The sequence numbers are then handed out locally until the block > of IDs is exhausted. This allows for fewer database round-trips and > transactions, at the expense of perhaps burning a few IDs. > Burning of IDs happens when the server is restarted in the middle of a block > of sequence IDs. That is, if the HMS requests a block of 10 IDs, and only > three have been assigned, after the restart, the HMS will request another > block of 10, burning (wasting) 7 IDs. As long as the blocks are not too > small and restarts are infrequent, few IDs are lost. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-24450) DbNotificationListener Request Notification IDs in Batches
[ https://issues.apache.org/jira/browse/HIVE-24450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor resolved HIVE-24450. --- Resolution: Won't Fix > DbNotificationListener Request Notification IDs in Batches > -- > > Key: HIVE-24450 > URL: https://issues.apache.org/jira/browse/HIVE-24450 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Every time a new notification event is logged into the database, the sequence > number for the ID of the event is incremented by one. It is very standard in > database design to instead request a block of IDs for each fetch from the > database. The sequence numbers are then handed out locally until the block > of IDs is exhausted. This allows for fewer database round-trips and > transactions, at the expense of perhaps burning a few IDs. > Burning of IDs happens when the server is restarted in the middle of a block > of sequence IDs. That is, if the HMS requests a block of 10 IDs, and only > three have been assigned, after the restart, the HMS will request another > block of 10, burning (wasting) 7 IDs. As long as the blocks are not too > small and restarts are infrequent, few IDs are lost. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24450) DbNotificationListener Request Notification IDs in Batches
[ https://issues.apache.org/jira/browse/HIVE-24450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17242389#comment-17242389 ] David Mollitor commented on HIVE-24450: --- It would be great if you could also look at HIVE-24468. Thanks. > DbNotificationListener Request Notification IDs in Batches > -- > > Key: HIVE-24450 > URL: https://issues.apache.org/jira/browse/HIVE-24450 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Every time a new notification event is logged into the database, the sequence > number for the ID of the event is incremented by one. It is very standard in > database design to instead request a block of IDs for each fetch from the > database. The sequence numbers are then handed out locally until the block > of IDs is exhausted. This allows for fewer database round-trips and > transactions, at the expense of perhaps burning a few IDs. > Burning of IDs happens when the server is restarted in the middle of a block > of sequence IDs. That is, if the HMS requests a block of 10 IDs, and only > three have been assigned, after the restart, the HMS will request another > block of 10, burning (wasting) 7 IDs. As long as the blocks are not too > small and restarts are infrequent, few IDs are lost. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24468) Use Event Time instead of Current Time in Notification Log DB Entry
[ https://issues.apache.org/jira/browse/HIVE-24468?focusedWorklogId=519029=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519029 ] ASF GitHub Bot logged work on HIVE-24468: - Author: ASF GitHub Bot Created on: 02/Dec/20 14:18 Start Date: 02/Dec/20 14:18 Worklog Time Spent: 10m Work Description: belugabehr opened a new pull request #1728: URL: https://github.com/apache/hive/pull/1728 …g DB Entry ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519029) Remaining Estimate: 0h Time Spent: 10m > Use Event Time instead of Current Time in Notification Log DB Entry > --- > > Key: HIVE-24468 > URL: https://issues.apache.org/jira/browse/HIVE-24468 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24468) Use Event Time instead of Current Time in Notification Log DB Entry
[ https://issues.apache.org/jira/browse/HIVE-24468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24468: -- Labels: pull-request-available (was: ) > Use Event Time instead of Current Time in Notification Log DB Entry > --- > > Key: HIVE-24468 > URL: https://issues.apache.org/jira/browse/HIVE-24468 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24468) Use Event Time instead of Current Time in Notification Log DB Entry
[ https://issues.apache.org/jira/browse/HIVE-24468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor reassigned HIVE-24468: - > Use Event Time instead of Current Time in Notification Log DB Entry > --- > > Key: HIVE-24468 > URL: https://issues.apache.org/jira/browse/HIVE-24468 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24450) DbNotificationListener Request Notification IDs in Batches
[ https://issues.apache.org/jira/browse/HIVE-24450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17242383#comment-17242383 ] David Mollitor commented on HIVE-24450: --- [~aasha] [~pvargacl] [~anishek], Thanks for the review! That is unfortunate (re: performance) but thank you for clarifying. Can you please take a look at HIVE-24463? I have added a special case for MySQL to improve performance and I have changed the code so that incrementing by 1 is hardcoded. As it is currently written, the code makes the reader believe that the counter can be incremented by an arbitrary amount. > DbNotificationListener Request Notification IDs in Batches > -- > > Key: HIVE-24450 > URL: https://issues.apache.org/jira/browse/HIVE-24450 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Every time a new notification event is logged into the database, the sequence > number for the ID of the event is incremented by one. It is very standard in > database design to instead request a block of IDs for each fetch from the > database. The sequence numbers are then handed out locally until the block > of IDs is exhausted. This allows for fewer database round-trips and > transactions, at the expense of perhaps burning a few IDs. > Burning of IDs happens when the server is restarted in the middle of a block > of sequence IDs. That is, if the HMS requests a block of 10 IDs, and only > three have been assigned, after the restart, the HMS will request another > block of 10, burning (wasting) 7 IDs. As long as the blocks are not too > small and restarts are infrequent, few IDs are lost. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs
[ https://issues.apache.org/jira/browse/HIVE-23965?focusedWorklogId=519022=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519022 ] ASF GitHub Bot logged work on HIVE-23965: - Author: ASF GitHub Bot Created on: 02/Dec/20 14:00 Start Date: 02/Dec/20 14:00 Worklog Time Spent: 10m Work Description: zabetak commented on pull request #1714: URL: https://github.com/apache/hive/pull/1714#issuecomment-737247269 Hey @kgyrtkirk can you have another look mostly on https://github.com/apache/hive/pull/1714/commits/df6e610c7f7b11b0bf06b500b25613c1a811c055 please? Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519022) Time Spent: 5h 50m (was: 5h 40m) > Improve plan regression tests using TPCDS30TB metastore dump and custom > configs > --- > > Key: HIVE-23965 > URL: https://issues.apache.org/jira/browse/HIVE-23965 > Project: Hive > Issue Type: Improvement >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: master355.tgz > > Time Spent: 5h 50m > Remaining Estimate: 0h > > The existing regression tests (HIVE-12586) based on TPC-DS have certain > shortcomings: > The table statistics do not reflect cardinalities from a specific TPC-DS > scale factor (SF). Some tables are from a 30TB dataset, others from a 200GB > dataset, and others from a 3GB dataset. This mix leads to plans that may > never appear when using an actual TPC-DS dataset. > The existing statistics do not contain information about partitions, something > that can have a big impact on the resulting plans. > The existing regression tests rely more or less on the default > configuration (hive-site.xml). 
In real-life scenarios, though, some of the > configurations differ and may impact the choices of the optimizer. > This issue aims to address the above shortcomings by using a curated > TPCDS30TB metastore dump along with some custom Hive configurations. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-21919) Refactor Driver
[ https://issues.apache.org/jira/browse/HIVE-21919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Gergely resolved HIVE-21919. --- Resolution: Fixed > Refactor Driver > --- > > Key: HIVE-21919 > URL: https://issues.apache.org/jira/browse/HIVE-21919 > Project: Hive > Issue Type: Improvement > Components: Hive >Affects Versions: 3.1.1 >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Major > Labels: refactor-driver > Fix For: 4.0.0 > > > The Driver class is 3000+ lines long. It does a lot of things, and its structure > is hard to follow. It needs to be put into a cleaner form to make it more > readable. It should be cut into many pieces, with separate classes for > different subtasks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-24333) Cut long methods in Driver to smaller, more manageable pieces
[ https://issues.apache.org/jira/browse/HIVE-24333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Gergely resolved HIVE-24333. --- Resolution: Fixed Merged to master, thank you [~belugabehr] > Cut long methods in Driver to smaller, more manageable pieces > - > > Key: HIVE-24333 > URL: https://issues.apache.org/jira/browse/HIVE-24333 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Major > Labels: pull-request-available > Time Spent: 2h 50m > Remaining Estimate: 0h > > Some methods in Driver are too long to be easily understandable. They should > be cut into pieces to make them easier to understand. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24333) Cut long methods in Driver to smaller, more manageable pieces
[ https://issues.apache.org/jira/browse/HIVE-24333?focusedWorklogId=518984=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-518984 ] ASF GitHub Bot logged work on HIVE-24333: - Author: ASF GitHub Bot Created on: 02/Dec/20 12:51 Start Date: 02/Dec/20 12:51 Worklog Time Spent: 10m Work Description: miklosgergely merged pull request #1629: URL: https://github.com/apache/hive/pull/1629 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 518984) Time Spent: 2h 50m (was: 2h 40m) > Cut long methods in Driver to smaller, more manageable pieces > - > > Key: HIVE-24333 > URL: https://issues.apache.org/jira/browse/HIVE-24333 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Major > Labels: pull-request-available > Time Spent: 2h 50m > Remaining Estimate: 0h > > Some methods in Driver are too long to be easily understandable. They should > be cut into pieces to make them easier to understand. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24450) DbNotificationListener Request Notification IDs in Batches
[ https://issues.apache.org/jira/browse/HIVE-24450?focusedWorklogId=518891=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-518891 ] ASF GitHub Bot logged work on HIVE-24450: - Author: ASF GitHub Bot Created on: 02/Dec/20 09:19 Start Date: 02/Dec/20 09:19 Worklog Time Spent: 10m Work Description: aasha commented on pull request #1718: URL: https://github.com/apache/hive/pull/1718#issuecomment-737101280 In the HA case, how will the ordering of events be maintained? Acid Replication relies on the event sequence, so the ordering needs to be maintained. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 518891) Time Spent: 0.5h (was: 20m) > DbNotificationListener Request Notification IDs in Batches > -- > > Key: HIVE-24450 > URL: https://issues.apache.org/jira/browse/HIVE-24450 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Every time a new notification event is logged into the database, the sequence > number for the ID of the event is incremented by one. It is very standard in > database design to instead request a block of IDs for each fetch from the > database. The sequence numbers are then handed out locally until the block > of IDs is exhausted. This allows for fewer database round-trips and > transactions, at the expense of perhaps burning a few IDs. > Burning of IDs happens when the server is restarted in the middle of a block > of sequence IDs. That is, if the HMS requests a block of 10 IDs, and only > three have been assigned, after the restart, the HMS will request another > block of 10, burning (wasting) 7 IDs. 
As long as the blocks are not too > small and restarts are infrequent, few IDs are lost. -- This message was sent by Atlassian Jira (v8.3.4#803005)