[jira] [Updated] (HIVE-25649) Backport HIVE-20638 and HIVE-22090 to branch-3 to upgrade Jetty to 9.3.27
[ https://issues.apache.org/jira/browse/HIVE-25649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takanobu Asanuma updated HIVE-25649: Status: Patch Available (was: Open) > Backport HIVE-20638 and HIVE-22090 to branch-3 to upgrade Jetty to 9.3.27 > - > > Key: HIVE-25649 > URL: https://issues.apache.org/jira/browse/HIVE-25649 > Project: Hive > Issue Type: Task >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Jetty in branch-3 is very old. So let's update to 9.3.27, which is the same > as the master branch. Although 9.3.27 is not the latest version, it fixes > some vulnerabilities. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25649) Backport HIVE-20638 and HIVE-22090 to branch-3 to upgrade Jetty to 9.3.27
[ https://issues.apache.org/jira/browse/HIVE-25649?focusedWorklogId=669882=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-669882 ] ASF GitHub Bot logged work on HIVE-25649: - Author: ASF GitHub Bot Created on: 26/Oct/21 03:03 Start Date: 26/Oct/21 03:03 Worklog Time Spent: 10m Work Description: tasanuma opened a new pull request #2746: URL: https://github.com/apache/hive/pull/2746 ### What changes were proposed in this pull request? Backport HIVE-20638 and HIVE-22090 to branch-3 to upgrade Jetty to 9.3.27. ### Why are the changes needed? Jetty in branch-3 is very old. So let's update to 9.3.27, which is the same as the master branch. Although 9.3.27 is not the latest version, it fixes some vulnerabilities. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 669882) Remaining Estimate: 0h Time Spent: 10m > Backport HIVE-20638 and HIVE-22090 to branch-3 to upgrade Jetty to 9.3.27 > - > > Key: HIVE-25649 > URL: https://issues.apache.org/jira/browse/HIVE-25649 > Project: Hive > Issue Type: Task >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Jetty in branch-3 is very old. So let's update to 9.3.27, which is the same > as the master branch. Although 9.3.27 is not the latest version, it fixes > some vulnerabilities. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25649) Backport HIVE-20638 and HIVE-22090 to branch-3 to upgrade Jetty to 9.3.27
[ https://issues.apache.org/jira/browse/HIVE-25649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25649: -- Labels: pull-request-available (was: ) > Backport HIVE-20638 and HIVE-22090 to branch-3 to upgrade Jetty to 9.3.27 > - > > Key: HIVE-25649 > URL: https://issues.apache.org/jira/browse/HIVE-25649 > Project: Hive > Issue Type: Task >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Jetty in branch-3 is very old. So let's update to 9.3.27, which is the same > as the master branch. Although 9.3.27 is not the latest version, it fixes > some vulnerabilities. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25649) Backport HIVE-20638 and HIVE-22090 to branch-3 to upgrade Jetty to 9.3.27
[ https://issues.apache.org/jira/browse/HIVE-25649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takanobu Asanuma reassigned HIVE-25649: --- > Backport HIVE-20638 and HIVE-22090 to branch-3 to upgrade Jetty to 9.3.27 > - > > Key: HIVE-25649 > URL: https://issues.apache.org/jira/browse/HIVE-25649 > Project: Hive > Issue Type: Task >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma >Priority: Major > > Jetty in branch-3 is very old. So let's update to 9.3.27, which is the same > as the master branch. Although 9.3.27 is not the latest version, it fixes > some vulnerabilities. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24590) Operation Logging still leaks the log4j Appenders
[ https://issues.apache.org/jira/browse/HIVE-24590?focusedWorklogId=669876=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-669876 ] ASF GitHub Bot logged work on HIVE-24590: - Author: ASF GitHub Bot Created on: 26/Oct/21 02:19 Start Date: 26/Oct/21 02:19 Worklog Time Spent: 10m Work Description: belugabehr commented on pull request #2432: URL: https://github.com/apache/hive/pull/2432#issuecomment-951493240 @zabetak Thanks for the clarification. Makes sense. So, what would the effect be if the time were to pass and another write request came in? Would it create a second file? Append to the end of an existing file? Are the original log files always deleted when the logger goes idle? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 669876) Time Spent: 3h 50m (was: 3h 40m) > Operation Logging still leaks the log4j Appenders > - > > Key: HIVE-24590 > URL: https://issues.apache.org/jira/browse/HIVE-24590 > Project: Hive > Issue Type: Bug > Components: Logging >Reporter: Eugene Chung >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Attachments: Screen Shot 2021-01-06 at 18.42.05.png, Screen Shot > 2021-01-06 at 18.42.24.png, Screen Shot 2021-01-06 at 18.42.55.png, Screen > Shot 2021-01-06 at 21.38.32.png, Screen Shot 2021-01-06 at 21.47.28.png, > Screen Shot 2021-01-08 at 21.01.40.png, add_debug_log_and_trace.patch > > Time Spent: 3h 50m > Remaining Estimate: 0h > > I'm using Hive 3.1.2 with options below. 
> * hive.server2.logging.operation.enabled=true > * hive.server2.logging.operation.level=VERBOSE > * hive.async.log.enabled=false > I already know the ticket, https://issues.apache.org/jira/browse/HIVE-17128 > but HS2 still leaks log4j RandomAccessFileManager. > !Screen Shot 2021-01-06 at 18.42.05.png|width=756,height=197! > I checked the operation log file which is not closed/deleted properly. > !Screen Shot 2021-01-06 at 18.42.24.png|width=603,height=272! > Then there's the log, > {code:java} > client.TezClient: Shutting down Tez Session, sessionName= {code} > !Screen Shot 2021-01-06 at 18.42.55.png|width=1372,height=26! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25648) HiveMetaHook not work well in HiveMetaStoreClient when commitCreateTable table failed!
[ https://issues.apache.org/jira/browse/HIVE-25648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] LiuJun updated HIVE-25648: -- Description:
{code:java}
// source code in HiveMetaStoreClient.java -- createTable()
public void createTable(Table tbl, EnvironmentContext envContext) throws AlreadyExistsException,
    InvalidObjectException, MetaException, NoSuchObjectException, TException {
  if (!tbl.isSetCatName()) {
    tbl.setCatName(getDefaultCatalog(conf));
  }
  HiveMetaHook hook = getHook(tbl);
  if (hook != null) {
    hook.preCreateTable(tbl);
  }
  boolean success = false;
  try {
    // Subclasses can override this step (for example, for temporary tables)
    create_table_with_environment_context(tbl, envContext); // create metadata record
    if (hook != null) {
      hook.commitCreateTable(tbl); // create table in external catalog
    }
    success = true;
  } finally {
    if (!success && (hook != null)) {
      try {
        // rolls back the external catalog, but does not roll back the Hive metadata
        hook.rollbackCreateTable(tbl);
      } catch (Exception e) {
        LOG.error("Create rollback failed with", e);
      }
    }
  }
}
{code}
According to the source code above, an implementation of the metastore's HiveMetaHook that creates external catalog tables (for example, in HBase) first creates the metadata record in the backing database (such as PostgreSQL) and then calls commitCreateTable to create the table in HBase. Here comes the question: if an exception is thrown while creating the real table in HBase, the metadata record has already been created, so Hive's metastore and HBase are out of sync. I think it is necessary to roll back the metadata in the metastore when commitCreateTable fails to create the table in the external catalog, so that the external catalog stays in sync with Hive's metastore. Please let me know whether my idea is correct or whether I have misunderstood how to use the HiveMetaHook mechanism!
was:
{code:java}
// source code in HiveMetaStoreClient.java -- createTable func
public void createTable(Table tbl, EnvironmentContext envContext) throws AlreadyExistsException,
    InvalidObjectException, MetaException, NoSuchObjectException, TException {
  if (!tbl.isSetCatName()) {
    tbl.setCatName(getDefaultCatalog(conf));
  }
  HiveMetaHook hook = getHook(tbl);
  if (hook != null) {
    hook.preCreateTable(tbl);
  }
  boolean success = false;
  try {
    // Subclasses can override this step (for example, for temporary tables)
    create_table_with_environment_context(tbl, envContext); // create metadata record
    if (hook != null) {
      hook.commitCreateTable(tbl); // create table in external catalog
    }
    success = true;
  } finally {
    if (!success && (hook != null)) {
      try {
        // roll back from external catalog but without roll back from hive meta
        hook.rollbackCreateTable(tbl);
      } catch (Exception e) {
        LOG.error("Create rollback failed with", e);
      }
    }
  }
}
{code}
Accoriding to the source code above, when implementing hivemetastore's HiveMetaHook to create external catalog tables(may be hbase),firstly create meta records to the database such as pg, then call the commitCreateTable function to create table in hbase. Here comes the question: What if exception thrown when creating the real table in hbase, because meta data has been created so it is not in sync between Hive's metastore and hbase. I think it is necessary to rollback metadata from hivemetastore when failed to create table in external catalog by calling commitCreateTable, so that we can keep external catalog in sync with Hive's metastore. Please let me know if my idea is correct or I had an misunderstanding on how to use the HiveMetaHook mechanism correctly!

Summary: HiveMetaHook not work well in HiveMetaStoreClient when commitCreateTable table failed! (was: Hook not work when commitdrop table failed) > HiveMetaHook not work well in HiveMetaStoreClient when commitCreateTable > table failed! 
> -- > > Key: HIVE-25648 > URL: https://issues.apache.org/jira/browse/HIVE-25648 > Project: Hive > Issue Type: Bug > Components: API, Hooks, Standalone Metastore >Affects Versions: 3.1.2 >Reporter: LiuJun >Priority: Major > > {code:java} > // source code in HiveMetaStoreClient.java -- createTable func > public void createTable(Table tbl, EnvironmentContext envContext) throws > AlreadyExistsException, > InvalidObjectException, MetaException, NoSuchObjectException, > TException { > if (!tbl.isSetCatName()) { > tbl.setCatName(getDefaultCatalog(conf)); > } >
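The inconsistency described in HIVE-25648 can be sketched as a runnable toy model. This is not Hive's actual code; the two sets stand in for the metastore's backing database and an external catalog such as HBase, and the extra `metastore.remove()` in the failure path is the fix the reporter is proposing, not something HiveMetaStoreClient does today.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the createTable() flow quoted above.
public class CreateTableRollbackSketch {
    public static final Set<String> metastore = new HashSet<>();       // HMS metadata records
    public static final Set<String> externalCatalog = new HashSet<>(); // e.g. HBase tables

    public static void createTable(String name, boolean commitFails) {
        boolean success = false;
        metastore.add(name); // create_table_with_environment_context: metadata record first
        try {
            if (commitFails) {
                throw new RuntimeException("external catalog unavailable");
            }
            externalCatalog.add(name); // hook.commitCreateTable: table in external catalog
            success = true;
        } catch (RuntimeException e) {
            // fall through to the cleanup below, mirroring the quoted finally block
        } finally {
            if (!success) {
                externalCatalog.remove(name); // hook.rollbackCreateTable
                metastore.remove(name);       // proposed fix: also undo the metadata record
            }
        }
    }

    public static void main(String[] args) {
        createTable("t_ok", false);
        createTable("t_fail", true);
        // After the failed commit, neither side retains "t_fail", so the
        // metastore and the external catalog stay in sync.
        System.out.println(metastore.contains("t_fail") || externalCatalog.contains("t_fail"));
    }
}
```

Without the proposed `metastore.remove()`, the failure path would leave `t_fail` in `metastore` but not in `externalCatalog`, which is exactly the divergence the report describes.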
[jira] [Work logged] (HIVE-25522) NullPointerException in TxnHandler
[ https://issues.apache.org/jira/browse/HIVE-25522?focusedWorklogId=669858=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-669858 ] ASF GitHub Bot logged work on HIVE-25522: - Author: ASF GitHub Bot Created on: 26/Oct/21 00:34 Start Date: 26/Oct/21 00:34 Worklog Time Spent: 10m Work Description: sunchao commented on a change in pull request #2647: URL: https://github.com/apache/hive/pull/2647#discussion_r736058006 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java ## @@ -5567,10 +5570,16 @@ private void removeTxnsFromMinHistoryLevel(Connection dbConn, List txnids) } } - private static synchronized DataSource setupJdbcConnectionPool(Configuration conf, int maxPoolSize, long getConnectionTimeoutMs) throws SQLException { + private static DataSource setupJdbcConnectionPool(Configuration conf, int maxPoolSize, long getConnectionTimeoutMs) { Review comment: hmm why remove `synchronized`? ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java ## @@ -5567,10 +5570,16 @@ private void removeTxnsFromMinHistoryLevel(Connection dbConn, List txnids) } } - private static synchronized DataSource setupJdbcConnectionPool(Configuration conf, int maxPoolSize, long getConnectionTimeoutMs) throws SQLException { + private static DataSource setupJdbcConnectionPool(Configuration conf, int maxPoolSize, long getConnectionTimeoutMs) { DataSourceProvider dsp = DataSourceProviderFactory.tryGetDataSourceProviderOrNull(conf); if (dsp != null) { - return dsp.create(conf); + try { +return dsp.create(conf); + } catch (SQLException e) { +String msg = "Unable to instantiate JDBC connection pooling, " + e.getMessage(); Review comment: maybe `LOG.error("Unable to instantiate JDBC connection pooling", e);`? 
## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java ## @@ -368,33 +368,37 @@ public TxnHandler() { public void setConf(Configuration conf){ this.conf = conf; +int maxPoolSize = MetastoreConf.getIntVar(conf, ConfVars.CONNECTION_POOLING_MAX_CONNECTIONS); +long getConnectionTimeoutMs = 3; synchronized (TxnHandler.class) { if (connPool == null) { -Connection dbConn = null; -// Set up the JDBC connection pool -try { - int maxPoolSize = MetastoreConf.getIntVar(conf, ConfVars.CONNECTION_POOLING_MAX_CONNECTIONS); - long getConnectionTimeoutMs = 3; - connPool = setupJdbcConnectionPool(conf, maxPoolSize, getConnectionTimeoutMs); - /*the mutex pools should ideally be somewhat larger since some operations require 1 +connPool = setupJdbcConnectionPool(conf, maxPoolSize, getConnectionTimeoutMs); + } + + if (connPoolMutex == null) { +/*the mutex pools should ideally be somewhat larger since some operations require 1 connection from each pool and we want to avoid taking a connection from primary pool and then blocking because mutex pool is empty. There is only 1 thread in any HMS trying to mutex on each MUTEX_KEY except MUTEX_KEY.CheckLock. The CheckLock operation gets a connection from connPool first, then connPoolMutex. All others, go in the opposite order (not very elegant...). 
So number of connection requests for connPoolMutex cannot exceed (size of connPool + MUTEX_KEY.values().length - 1).*/ - connPoolMutex = setupJdbcConnectionPool(conf, maxPoolSize + MUTEX_KEY.values().length, getConnectionTimeoutMs); - dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED); +connPoolMutex = setupJdbcConnectionPool(conf, maxPoolSize + MUTEX_KEY.values().length, getConnectionTimeoutMs); + } + + if (dbProduct == null) { +try (Connection dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED)) { determineDatabaseProduct(dbConn); - sqlGenerator = new SQLGenerator(dbProduct, conf); } catch (SQLException e) { - String msg = "Unable to instantiate JDBC connection pooling, " + e.getMessage(); + String msg = "Unable to determine database product, " + e.getMessage(); Review comment: ditto: `LOG.error("Unable to determine database product", e);` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking ---
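The review question above ("why remove `synchronized`?") turns on a common pattern: once every caller initializes the pools inside a `synchronized (TxnHandler.class)` block, the factory method itself no longer needs to be synchronized. A minimal sketch of that pattern, with illustrative names rather than Hive's actual TxnHandler code:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Lazy, lock-guarded initialization: the factory is unsynchronized because
// all callers already serialize on the class lock, so each pool is created
// at most once even under concurrent setConf() calls.
public class LazyPoolInitSketch {
    public static final AtomicInteger creations = new AtomicInteger();
    private static Object connPool;
    private static Object connPoolMutex;

    private static Object setupPool() { // no 'synchronized' needed here...
        creations.incrementAndGet();    // stands in for building a DataSource
        return new Object();
    }

    public static void setConf() {
        synchronized (LazyPoolInitSketch.class) { // ...because callers hold this lock
            if (connPool == null) {
                connPool = setupPool();
            }
            if (connPoolMutex == null) {
                connPoolMutex = setupPool();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] ts = new Thread[8];
        for (int i = 0; i < ts.length; i++) {
            ts[i] = new Thread(LazyPoolInitSketch::setConf);
            ts[i].start();
        }
        for (Thread t : ts) {
            t.join();
        }
        System.out.println(creations.get()); // 2: one per pool, despite 8 threads
    }
}
```

The trade-off is that the invariant now lives at the call sites: any new caller of `setupPool()` outside the class lock would reintroduce the race.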
[jira] [Work logged] (HIVE-18920) CBO: Initialize the Janino providers ahead of 1st query
[ https://issues.apache.org/jira/browse/HIVE-18920?focusedWorklogId=669850=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-669850 ] ASF GitHub Bot logged work on HIVE-18920: - Author: ASF GitHub Bot Created on: 26/Oct/21 00:10 Start Date: 26/Oct/21 00:10 Worklog Time Spent: 10m Work Description: github-actions[bot] commented on pull request #2596: URL: https://github.com/apache/hive/pull/2596#issuecomment-951435532 This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 669850) Time Spent: 0.5h (was: 20m) > CBO: Initialize the Janino providers ahead of 1st query > --- > > Key: HIVE-18920 > URL: https://issues.apache.org/jira/browse/HIVE-18920 > Project: Hive > Issue Type: Bug > Components: CBO >Reporter: Gopal Vijayaraghavan >Assignee: Jesus Camacho Rodriguez >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: HIVE-18920.01.patch, HIVE-18920.02.patch, > HIVE-18920.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > Hive Calcite metadata providers are compiled when the 1st query comes in. > If a second query arrives before the 1st one has built a metadata provider, > it will also try to do the same thing, because the cache is not populated yet. > With 1024 concurrent users, it takes 6 minutes for the 1st query to finish > fighting all the other queries which are trying to load that cache. -- This message was sent by Atlassian Jira (v8.3.4#803005)
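The stampede HIVE-18920 describes (1024 users all compiling the same metadata provider because the cache is empty) is the classic cache-miss thundering herd. One standard remedy, sketched here with hypothetical names rather than Calcite's actual provider code, is to cache a future of the expensive step so that concurrent callers wait on a single computation:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// computeIfAbsent installs the future atomically, so the slow "compilation"
// runs once per key; every other caller just joins on the same future.
public class ProviderCacheSketch {
    public static final AtomicInteger compiles = new AtomicInteger();
    private static final ConcurrentHashMap<String, CompletableFuture<Object>> cache =
            new ConcurrentHashMap<>();

    public static Object metadataProvider(String key) {
        return cache.computeIfAbsent(key, k -> CompletableFuture.supplyAsync(() -> {
            compiles.incrementAndGet(); // stands in for the slow Janino compile
            return new Object();
        })).join();
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(16);
        for (int i = 0; i < 1024; i++) { // the ticket's 1024 concurrent users
            pool.submit(() -> metadataProvider("default"));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println(compiles.get()); // 1: compiled once, shared by all
    }
}
```

Initializing the provider ahead of the first query, as the ticket title suggests, is the other half of the fix: it moves that single compilation out of the request path entirely.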
[jira] [Updated] (HIVE-25647) hadoop command memo
[ https://issues.apache.org/jira/browse/HIVE-25647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] St Li updated HIVE-25647: - Attachment: worldip.csv
> hadoop command memo
> ---
>
> Key: HIVE-25647
> URL: https://issues.apache.org/jira/browse/HIVE-25647
> Project: Hive
> Issue Type: Wish
> Components: Configuration
> Affects Versions: 3.1.2
> Environment: hadoop 2.7.3
> Reporter: St Li
> Assignee: St Li
> Priority: Major
> Fix For: All Versions
>
> Attachments: worldip.csv
>
>
> do not care this just test
> Disable the firewall: systemctl stop firewalld
> Check its status: systemctl status firewalld
> Select the time zone: tzselect
> echo "TZ='Asia/Shanghai'; export TZ" >> /etc/profile && source /etc/profile
> yum install -y ntp
> vim /etc/ntp.conf and comment out server 0-3
> add: fudge 127.127.1.0 stratum 10
> /bin/systemctl restart ntpd.service
> ntpdate master // on the slaves
> service crond status
> /sbin/service crond start
>
>
>
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25647) hadoop command memo
[ https://issues.apache.org/jira/browse/HIVE-25647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] St Li updated HIVE-25647: - Description:
do not care this just test
Disable the firewall: systemctl stop firewalld
Check its status: systemctl status firewalld
Select the time zone: tzselect
echo "TZ='Asia/Shanghai'; export TZ" >> /etc/profile && source /etc/profile
yum install -y ntp
vim /etc/ntp.conf and comment out server 0-3
add: fudge 127.127.1.0 stratum 10
/bin/systemctl restart ntpd.service
ntpdate master // on the slaves
service crond status
/sbin/service crond start

was: do not care this just test

> hadoop command memo
> ---
>
> Key: HIVE-25647
> URL: https://issues.apache.org/jira/browse/HIVE-25647
> Project: Hive
> Issue Type: Wish
> Components: Configuration
> Affects Versions: 3.1.2
> Environment: hadoop 2.7.3
> Reporter: St Li
> Assignee: St Li
> Priority: Major
> Fix For: All Versions
>
>
> do not care this just test
> Disable the firewall: systemctl stop firewalld
> Check its status: systemctl status firewalld
> Select the time zone: tzselect
> echo "TZ='Asia/Shanghai'; export TZ" >> /etc/profile && source /etc/profile
> yum install -y ntp
> vim /etc/ntp.conf and comment out server 0-3
> add: fudge 127.127.1.0 stratum 10
> /bin/systemctl restart ntpd.service
> ntpdate master // on the slaves
> service crond status
> /sbin/service crond start
>
>
>
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25647) hadoop command memo
[ https://issues.apache.org/jira/browse/HIVE-25647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] St Li updated HIVE-25647: - Description: do not care this just test was: do not care this just test > hadoop command memo > --- > > Key: HIVE-25647 > URL: https://issues.apache.org/jira/browse/HIVE-25647 > Project: Hive > Issue Type: Wish > Components: Configuration >Affects Versions: 3.1.2 > Environment: hadoop 2.7.3 >Reporter: St Li >Assignee: St Li >Priority: Major > Fix For: All Versions > > > do not care this just test > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25647) hadoop command memo
[ https://issues.apache.org/jira/browse/HIVE-25647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] St Li reassigned HIVE-25647: > hadoop command memo > --- > > Key: HIVE-25647 > URL: https://issues.apache.org/jira/browse/HIVE-25647 > Project: Hive > Issue Type: Wish > Components: Configuration >Affects Versions: 3.1.2 > Environment: hadoop 2.7.3 >Reporter: St Li >Assignee: St Li >Priority: Major > Fix For: All Versions > > > do not care this > just test -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24590) Operation Logging still leaks the log4j Appenders
[ https://issues.apache.org/jira/browse/HIVE-24590?focusedWorklogId=669778=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-669778 ] ASF GitHub Bot logged work on HIVE-24590: - Author: ASF GitHub Bot Created on: 25/Oct/21 20:23 Start Date: 25/Oct/21 20:23 Worklog Time Spent: 10m Work Description: zabetak commented on pull request #2432: URL: https://github.com/apache/hive/pull/2432#issuecomment-951288032 > Ya, nothing about this solution feels ideal. How was the default of 60s chosen? That seems pretty short to me if there's a stall in the query engine (or anywhere else). Even if there is a big delay and the 60sec window passes there is no problem. The appender will be closed and it will reopen again when the next log event arrives. Having a purge policy guarantees that there will be no leak no matter what happens. Something that may become problematic is the arrival of many (in the order of thousands) queries in the 60 sec window. This will lead to the creation of many appenders and usage of many file descriptors. However, this might never be a problem for Hive, at least not before other parts of the system need to be fixed first. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 669778) Time Spent: 3h 40m (was: 3.5h) > Operation Logging still leaks the log4j Appenders > - > > Key: HIVE-24590 > URL: https://issues.apache.org/jira/browse/HIVE-24590 > Project: Hive > Issue Type: Bug > Components: Logging >Reporter: Eugene Chung >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Attachments: Screen Shot 2021-01-06 at 18.42.05.png, Screen Shot > 2021-01-06 at 18.42.24.png, Screen Shot 2021-01-06 at 18.42.55.png, Screen > Shot 2021-01-06 at 21.38.32.png, Screen Shot 2021-01-06 at 21.47.28.png, > Screen Shot 2021-01-08 at 21.01.40.png, add_debug_log_and_trace.patch > > Time Spent: 3h 40m > Remaining Estimate: 0h > > I'm using Hive 3.1.2 with options below. > * hive.server2.logging.operation.enabled=true > * hive.server2.logging.operation.level=VERBOSE > * hive.async.log.enabled=false > I already know the ticket, https://issues.apache.org/jira/browse/HIVE-17128 > but HS2 still leaks log4j RandomAccessFileManager. > !Screen Shot 2021-01-06 at 18.42.05.png|width=756,height=197! > I checked the operation log file which is not closed/deleted properly. > !Screen Shot 2021-01-06 at 18.42.24.png|width=603,height=272! > Then there's the log, > {code:java} > client.TezClient: Shutting down Tez Session, sessionName= {code} > !Screen Shot 2021-01-06 at 18.42.55.png|width=1372,height=26! -- This message was sent by Atlassian Jira (v8.3.4#803005)
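The close-when-idle, reopen-on-next-event behavior zabetak describes can be modeled in a few lines. This is a toy model of the concept, not log4j2's actual purge-policy classes; names and the 60-second window are taken from the discussion above.

```java
import java.util.ArrayList;
import java.util.List;

// An appender whose handle is closed after an idle window and transparently
// reopened by the next log event, so a stall longer than the window loses
// nothing and no file descriptor leaks.
public class IdleAppenderSketch {
    private boolean open = false;
    private long lastWriteMillis;
    public int reopenCount = 0;
    public final List<String> lines = new ArrayList<>();
    private final long idleWindowMillis;

    public IdleAppenderSketch(long idleWindowMillis) {
        this.idleWindowMillis = idleWindowMillis;
    }

    // Called by a background purge policy: close the handle if idle too long.
    public void purgeIfIdle(long nowMillis) {
        if (open && nowMillis - lastWriteMillis >= idleWindowMillis) {
            open = false; // releases the file descriptor
        }
    }

    // A write after a purge simply reopens the handle.
    public void append(String line, long nowMillis) {
        if (!open) {
            open = true;
            reopenCount++;
        }
        lines.add(line);
        lastWriteMillis = nowMillis;
    }

    public static void main(String[] args) {
        IdleAppenderSketch a = new IdleAppenderSketch(60_000);
        a.append("query started", 0);
        a.purgeIfIdle(61_000);              // 60s window elapsed: handle closed
        a.append("query finished", 62_000); // next event reopens; nothing lost
        System.out.println(a.reopenCount + " " + a.lines.size()); // "2 2"
    }
}
```

The cost the thread also mentions is visible in this model: thousands of distinct appenders active inside one window would each hold an open handle until purged.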
[jira] [Work logged] (HIVE-24590) Operation Logging still leaks the log4j Appenders
[ https://issues.apache.org/jira/browse/HIVE-24590?focusedWorklogId=669740=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-669740 ] ASF GitHub Bot logged work on HIVE-24590: - Author: ASF GitHub Bot Created on: 25/Oct/21 19:37 Start Date: 25/Oct/21 19:37 Worklog Time Spent: 10m Work Description: belugabehr commented on pull request #2432: URL: https://github.com/apache/hive/pull/2432#issuecomment-951245726 Hey, > Ideally the whole programmatic registration of the appender should be replaced by log4j.properties or other more standard mechanism but this is a bit more complicated to do so I would leave it outside the scope of this PR. Ya, nothing about this solution feels ideal. How was the default of 60s chosen? That seems pretty short to me if there's a stall in the query engine (or anywhere else). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 669740) Time Spent: 3.5h (was: 3h 20m) > Operation Logging still leaks the log4j Appenders > - > > Key: HIVE-24590 > URL: https://issues.apache.org/jira/browse/HIVE-24590 > Project: Hive > Issue Type: Bug > Components: Logging >Reporter: Eugene Chung >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Attachments: Screen Shot 2021-01-06 at 18.42.05.png, Screen Shot > 2021-01-06 at 18.42.24.png, Screen Shot 2021-01-06 at 18.42.55.png, Screen > Shot 2021-01-06 at 21.38.32.png, Screen Shot 2021-01-06 at 21.47.28.png, > Screen Shot 2021-01-08 at 21.01.40.png, add_debug_log_and_trace.patch > > Time Spent: 3.5h > Remaining Estimate: 0h > > I'm using Hive 3.1.2 with options below. 
> * hive.server2.logging.operation.enabled=true > * hive.server2.logging.operation.level=VERBOSE > * hive.async.log.enabled=false > I already know the ticket, https://issues.apache.org/jira/browse/HIVE-17128 > but HS2 still leaks log4j RandomAccessFileManager. > !Screen Shot 2021-01-06 at 18.42.05.png|width=756,height=197! > I checked the operation log file which is not closed/deleted properly. > !Screen Shot 2021-01-06 at 18.42.24.png|width=603,height=272! > Then there's the log, > {code:java} > client.TezClient: Shutting down Tez Session, sessionName= {code} > !Screen Shot 2021-01-06 at 18.42.55.png|width=1372,height=26! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25522) NullPointerException in TxnHandler
[ https://issues.apache.org/jira/browse/HIVE-25522?focusedWorklogId=669684=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-669684 ] ASF GitHub Bot logged work on HIVE-25522: - Author: ASF GitHub Bot Created on: 25/Oct/21 17:57 Start Date: 25/Oct/21 17:57 Worklog Time Spent: 10m Work Description: szehon-ho commented on pull request #2647: URL: https://github.com/apache/hive/pull/2647#issuecomment-951166539 Looks like random error? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 669684) Time Spent: 9h (was: 8h 50m) > NullPointerException in TxnHandler > -- > > Key: HIVE-25522 > URL: https://issues.apache.org/jira/browse/HIVE-25522 > Project: Hive > Issue Type: Improvement > Components: Standalone Metastore >Affects Versions: 3.1.2 >Reporter: Szehon Ho >Assignee: Szehon Ho >Priority: Major > Labels: pull-request-available > Time Spent: 9h > Remaining Estimate: 0h > > Environment: Using Iceberg on Hive 3.1.2 standalone metastore. Iceberg > issues a lot of lock() calls for commits. > We hit randomly a strange NPE that fails Iceberg commits. 
> {noformat} > 2021-08-21T11:08:05,665 ERROR [pool-6-thread-195] > metastore.RetryingHMSHandler: java.lang.NullPointerException > at > org.apache.hadoop.hive.metastore.txn.TxnHandler.enqueueLockWithRetry(TxnHandler.java:1903) > at > org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:1827) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.lock(HiveMetaStore.java:7217) > at jdk.internal.reflect.GeneratedMethodAccessor52.invoke(Unknown Source) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108) > at com.sun.proxy.$Proxy27.lock(Unknown Source) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$lock.getResult(ThriftHiveMetastore.java:18111) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$lock.getResult(ThriftHiveMetastore.java:18095) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) > at > org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:111) > at > org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:107) > at java.base/java.security.AccessController.doPrivileged(Native Method) > at java.base/javax.security.auth.Subject.doAs(Subject.java:423) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at > org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:119) > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:834) > 2021-08-21T11:08:05,665 ERROR [pool-6-thread-195] server.TThreadPoolServer: > Error occurred during processing of message. > java.lang.NullPointerException: null > at > org.apache.hadoop.hive.metastore.txn.TxnHandler.enqueueLockWithRetry(TxnHandler.java:1903) > ~[hive-exec-3.1.2.jar:3.1.2] > at > org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:1827) > ~[hive-exec-3.1.2.jar:3.1.2] > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.lock(HiveMetaStore.java:7217) > ~[hive-exec-3.1.2.jar:3.1.2] > at jdk.internal.reflect.GeneratedMethodAccessor52.invoke(Unknown > Source) ~[?:?] > at > jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[?:?] > at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?] > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147) > ~[hive-exec-3.1.2.jar:3.1.2] > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108) >
[jira] [Assigned] (HIVE-25645) Query-based compaction doesn't work when partition column type is boolean
[ https://issues.apache.org/jira/browse/HIVE-25645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Denys Kuzmenko reassigned HIVE-25645: - Assignee: Denys Kuzmenko > Query-based compaction doesn't work when partition column type is boolean > - > > Key: HIVE-25645 > URL: https://issues.apache.org/jira/browse/HIVE-25645 > Project: Hive > Issue Type: Task >Reporter: Denys Kuzmenko >Assignee: Denys Kuzmenko >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25553) Support Map data-type natively in Arrow format
[ https://issues.apache.org/jira/browse/HIVE-25553?focusedWorklogId=669599&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-669599 ] ASF GitHub Bot logged work on HIVE-25553: - Author: ASF GitHub Bot Created on: 25/Oct/21 15:09 Start Date: 25/Oct/21 15:09 Worklog Time Spent: 10m Work Description: warriersruthi commented on pull request #2689: URL: https://github.com/apache/hive/pull/2689#issuecomment-951025684 Thanks, Sankar, for your prompt reviews and assistance. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 669599) Time Spent: 2.5h (was: 2h 20m) > Support Map data-type natively in Arrow format > -- > > Key: HIVE-25553 > URL: https://issues.apache.org/jira/browse/HIVE-25553 > Project: Hive > Issue Type: Improvement > Components: llap, Serializers/Deserializers >Reporter: Adesh Kumar Rao >Assignee: Sruthi Mooriyathvariam >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > Currently ArrowColumnarBatchSerDe converts map datatype as a list of structs > data-type (where the struct contains the key-value pair of the map). This > causes issues when reading Map datatype using llap-ext-client as it reads a > list of structs instead. > HiveWarehouseConnector which uses the llap-ext-client throws exception when > the schema (containing Map data type) is different from actual data (list of > structs). > > Fixing this issue requires upgrading arrow version (where map data-type is > supported), modifying ArrowColumnarBatchSerDe and corresponding > Serializer/Deserializer to not use list as a workaround for map and use the > arrow map data-type instead. 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
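The workaround that HIVE-25553 removes, a map column encoded as a list of key/value structs, can be sketched as plain data transformations. The following is an illustrative Python sketch of the data-shape mismatch, not Hive's actual serializer code; the function names are hypothetical.

```python
# Illustrative sketch of the old workaround: a MAP column serialized as a
# list of {key, value} structs, and the round-trip back to a native map.
# A reader that expects a true map type (e.g. HiveWarehouseConnector via
# llap-ext-client) sees the struct-list shape instead and fails.

def map_to_struct_list(m):
    """Encode a map as the list-of-structs workaround."""
    return [{"key": k, "value": v} for k, v in m.items()]

def struct_list_to_map(entries):
    """Decode the workaround back into a native map."""
    return {e["key"]: e["value"] for e in entries}

encoded = map_to_struct_list({"a": 1, "b": 2})
assert encoded == [{"key": "a", "value": 1}, {"key": "b", "value": 2}]
assert struct_list_to_map(encoded) == {"a": 1, "b": 2}
```

With the fix, no such re-encoding is needed: the serializer emits Arrow's native map type, so the schema the client sees matches the data.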
[jira] [Commented] (HIVE-25642) Log a warning if multiple Compaction Worker versions are running compactions
[ https://issues.apache.org/jira/browse/HIVE-25642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17433791#comment-17433791 ] Csomor Viktor commented on HIVE-25642: -- Pull request: https://github.com/apache/hive/pull/2743 > Log a warning if multiple Compaction Worker versions are running compactions > > > Key: HIVE-25642 > URL: https://issues.apache.org/jira/browse/HIVE-25642 > Project: Hive > Issue Type: Improvement > Components: Hive >Affects Versions: 4.0.0 >Reporter: Csomor Viktor >Assignee: Csomor Viktor >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Log a warning if multiple versions of Compaction Workers are running > compactions. > The start times of the individual HMS services are not stored at the moment; > however, this information could provide a good baseline for detecting multiple > Worker versions. > Due to the lack of this information we can check periodically in the past N > hours to detect the versions. > The N hours can be configured by the > {{metastore.compactor.worker.detect_multiple_versions.threshold}} property. > This periodic check only makes sense if compaction is enabled. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25555) ArrowColumnarBatchSerDe should store map natively instead of converting to list
[ https://issues.apache.org/jira/browse/HIVE-25555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-25555: Component/s: Serializers/Deserializers llap > ArrowColumnarBatchSerDe should store map natively instead of converting to > list > --- > > Key: HIVE-25555 > URL: https://issues.apache.org/jira/browse/HIVE-25555 > Project: Hive > Issue Type: Sub-task > Components: llap, Reader, Serializers/Deserializers >Affects Versions: 3.1.2 >Reporter: Adesh Kumar Rao >Assignee: Sruthi Mooriyathvariam >Priority: Major > Fix For: 4.0.0 > > > This should also take care of creating a non-nullable struct and non-nullable key > type for the map data-type. Currently, list does not care about the child type > being nullable/non-nullable. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25554) Upgrade arrow version to 0.15
[ https://issues.apache.org/jira/browse/HIVE-25554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-25554: Affects Version/s: 3.1.2 > Upgrade arrow version to 0.15 > - > > Key: HIVE-25554 > URL: https://issues.apache.org/jira/browse/HIVE-25554 > Project: Hive > Issue Type: Sub-task >Affects Versions: 3.1.2 >Reporter: Adesh Kumar Rao >Assignee: Sruthi Mooriyathvariam >Priority: Major > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25554) Upgrade arrow version to 0.15
[ https://issues.apache.org/jira/browse/HIVE-25554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-25554: Component/s: Serializers/Deserializers Reader llap > Upgrade arrow version to 0.15 > - > > Key: HIVE-25554 > URL: https://issues.apache.org/jira/browse/HIVE-25554 > Project: Hive > Issue Type: Sub-task > Components: llap, Reader, Serializers/Deserializers >Affects Versions: 3.1.2 >Reporter: Adesh Kumar Rao >Assignee: Sruthi Mooriyathvariam >Priority: Major > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-25555) ArrowColumnarBatchSerDe should store map natively instead of converting to list
[ https://issues.apache.org/jira/browse/HIVE-25555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan resolved HIVE-25555. - Fix Version/s: 4.0.0 Target Version/s: 4.0.0 Resolution: Fixed Refer https://issues.apache.org/jira/browse/HIVE-25553 > ArrowColumnarBatchSerDe should store map natively instead of converting to > list > --- > > Key: HIVE-25555 > URL: https://issues.apache.org/jira/browse/HIVE-25555 > Project: Hive > Issue Type: Sub-task >Reporter: Adesh Kumar Rao >Assignee: Sruthi Mooriyathvariam >Priority: Major > Fix For: 4.0.0 > > > This should also take care of creating a non-nullable struct and non-nullable key > type for the map data-type. Currently, list does not care about the child type > being nullable/non-nullable. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25555) ArrowColumnarBatchSerDe should store map natively instead of converting to list
[ https://issues.apache.org/jira/browse/HIVE-25555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-25555: Affects Version/s: 3.1.2 > ArrowColumnarBatchSerDe should store map natively instead of converting to > list > --- > > Key: HIVE-25555 > URL: https://issues.apache.org/jira/browse/HIVE-25555 > Project: Hive > Issue Type: Sub-task >Affects Versions: 3.1.2 >Reporter: Adesh Kumar Rao >Assignee: Sruthi Mooriyathvariam >Priority: Major > Fix For: 4.0.0 > > > This should also take care of creating a non-nullable struct and non-nullable key > type for the map data-type. Currently, list does not care about the child type > being nullable/non-nullable. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25555) ArrowColumnarBatchSerDe should store map natively instead of converting to list
[ https://issues.apache.org/jira/browse/HIVE-25555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-25555: Component/s: Reader > ArrowColumnarBatchSerDe should store map natively instead of converting to > list > --- > > Key: HIVE-25555 > URL: https://issues.apache.org/jira/browse/HIVE-25555 > Project: Hive > Issue Type: Sub-task > Components: Reader >Affects Versions: 3.1.2 >Reporter: Adesh Kumar Rao >Assignee: Sruthi Mooriyathvariam >Priority: Major > Fix For: 4.0.0 > > > This should also take care of creating a non-nullable struct and non-nullable key > type for the map data-type. Currently, list does not care about the child type > being nullable/non-nullable. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-25554) Upgrade arrow version to 0.15
[ https://issues.apache.org/jira/browse/HIVE-25554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan resolved HIVE-25554. - Fix Version/s: 4.0.0 Resolution: Fixed Refer https://issues.apache.org/jira/browse/HIVE-25553 > Upgrade arrow version to 0.15 > - > > Key: HIVE-25554 > URL: https://issues.apache.org/jira/browse/HIVE-25554 > Project: Hive > Issue Type: Sub-task >Reporter: Adesh Kumar Rao >Assignee: Sruthi Mooriyathvariam >Priority: Major > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-25553) Support Map data-type natively in Arrow format
[ https://issues.apache.org/jira/browse/HIVE-25553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan resolved HIVE-25553. - Target Version/s: 4.0.0 Resolution: Fixed Thanks [~warriersruthi] for the contribution! Patch merged to master! > Support Map data-type natively in Arrow format > -- > > Key: HIVE-25553 > URL: https://issues.apache.org/jira/browse/HIVE-25553 > Project: Hive > Issue Type: Improvement > Components: llap, Serializers/Deserializers >Reporter: Adesh Kumar Rao >Assignee: Sruthi Mooriyathvariam >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > Currently ArrowColumnarBatchSerDe converts map datatype as a list of structs > data-type (where the struct contains the key-value pair of the map). This > causes issues when reading Map datatype using llap-ext-client as it reads a > list of structs instead. > HiveWarehouseConnector which uses the llap-ext-client throws exception when > the schema (containing Map data type) is different from actual data (list of > structs). > > Fixing this issue requires upgrading arrow version (where map data-type is > supported), modifying ArrowColumnarBatchSerDe and corresponding > Serializer/Deserializer to not use list as a workaround for map and use the > arrow map data-type instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25553) Support Map data-type natively in Arrow format
[ https://issues.apache.org/jira/browse/HIVE-25553?focusedWorklogId=669562&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-669562 ] ASF GitHub Bot logged work on HIVE-25553: - Author: ASF GitHub Bot Created on: 25/Oct/21 14:15 Start Date: 25/Oct/21 14:15 Worklog Time Spent: 10m Work Description: sankarh merged pull request #2689: URL: https://github.com/apache/hive/pull/2689 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 669562) Time Spent: 2h 20m (was: 2h 10m) > Support Map data-type natively in Arrow format > -- > > Key: HIVE-25553 > URL: https://issues.apache.org/jira/browse/HIVE-25553 > Project: Hive > Issue Type: Improvement > Components: llap, Serializers/Deserializers >Reporter: Adesh Kumar Rao >Assignee: Sruthi Mooriyathvariam >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > Currently ArrowColumnarBatchSerDe converts map datatype as a list of structs > data-type (where the struct contains the key-value pair of the map). This > causes issues when reading Map datatype using llap-ext-client as it reads a > list of structs instead. > HiveWarehouseConnector which uses the llap-ext-client throws exception when > the schema (containing Map data type) is different from actual data (list of > structs). > > Fixing this issue requires upgrading arrow version (where map data-type is > supported), modifying ArrowColumnarBatchSerDe and corresponding > Serializer/Deserializer to not use list as a workaround for map and use the > arrow map data-type instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (HIVE-25639) Exclude tomcat-embed-core from libthrift
[ https://issues.apache.org/jira/browse/HIVE-25639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-25639 started by Ranith Sardar. > Exclude tomcat-embed-core from libthrift > > > Key: HIVE-25639 > URL: https://issues.apache.org/jira/browse/HIVE-25639 > Project: Hive > Issue Type: Bug > Components: Thrift API >Reporter: Ranith Sardar >Assignee: Ranith Sardar >Priority: Major > > The Thrift dependency was upgraded to 0.14.1 to fix a known CVE, but a > dependency issue in libthrift brings in tomcat-embed-core, which has many > vulnerabilities. See: THRIFT-5375 > Since this dependency is used in Thrift only for a test, we can safely exclude > it inside Hive. -- This message was sent by Atlassian Jira (v8.3.4#803005)
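For reference, an exclusion of this kind is typically expressed in a Maven pom.xml as below. This is a hedged sketch, not the actual HIVE-25639 patch: the version shown and the module the dependency lives in are assumptions; only the group/artifact coordinates of libthrift and tomcat-embed-core are standard.

```xml
<!-- Sketch of excluding the transitive tomcat-embed-core dependency
     that libthrift 0.14.x pulls in (see THRIFT-5375). The <version>
     and surrounding module are illustrative assumptions. -->
<dependency>
  <groupId>org.apache.thrift</groupId>
  <artifactId>libthrift</artifactId>
  <version>0.14.1</version>
  <exclusions>
    <exclusion>
      <groupId>org.apache.tomcat.embed</groupId>
      <artifactId>tomcat-embed-core</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

After such a change, `mvn dependency:tree` should no longer show tomcat-embed-core under libthrift.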
[jira] [Updated] (HIVE-25643) Disable replace cols and change col commands for migrated Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marton Bod updated HIVE-25643: -- Description: Since the Iceberg table migration will intentionally not rewrite the data files, the migrated table will end up with data files that do not contain the Iceberg field IDs necessary for safe, reliable schema evolution. For this reason, we should disallow the REPLACE COLUMNS and CHANGE COLUMN commands for these migrated Iceberg tables. ADD COLUMNS is still permitted. (was: Since the Iceberg table migration will intentionally not rewrite the data files, the migrated table will end up with data files that do not contain the Iceberg field IDs necessary for safe, reliable schema migration. For this purpose, we should disallow the REPLACE COLUMNS and CHANGE COLUMN commands for these migrated Iceberg tables. ADD COLUMNS are still permitted.) > Disable replace cols and change col commands for migrated Iceberg tables > > > Key: HIVE-25643 > URL: https://issues.apache.org/jira/browse/HIVE-25643 > Project: Hive > Issue Type: Improvement >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > > Since the Iceberg table migration will intentionally not rewrite the data > files, the migrated table will end up with data files that do not contain the > Iceberg field IDs necessary for safe, reliable schema evolution. For this > reason, we should disallow the REPLACE COLUMNS and CHANGE COLUMN commands > for these migrated Iceberg tables. ADD COLUMNS is still permitted. -- This message was sent by Atlassian Jira (v8.3.4#803005)
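The rule HIVE-25643 describes can be sketched as a simple DDL guard. The following Python sketch is purely illustrative; the function and constant names are invented and this is not Hive's actual DDL-validation code.

```python
# Hypothetical guard for ALTER TABLE operations on migrated Iceberg tables:
# data files written before migration lack Iceberg field IDs, so commands
# that rewrite column identity are rejected, while ADD COLUMNS (which only
# appends new columns) remains safe.

DISALLOWED_ON_MIGRATED = {"REPLACE_COLUMNS", "CHANGE_COLUMN"}

def check_alter_allowed(operation, table_is_migrated):
    """Raise if the ALTER operation is unsafe on a migrated Iceberg table."""
    if table_is_migrated and operation in DISALLOWED_ON_MIGRATED:
        raise ValueError(
            f"{operation} is not supported on migrated Iceberg tables")
    return True

assert check_alter_allowed("ADD_COLUMNS", table_is_migrated=True)
```

Non-migrated tables (or tables whose files were written with field IDs) pass the check for every operation.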
[jira] [Assigned] (HIVE-25643) Disable replace cols and change col commands for migrated Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marton Bod reassigned HIVE-25643: - > Disable replace cols and change col commands for migrated Iceberg tables > > > Key: HIVE-25643 > URL: https://issues.apache.org/jira/browse/HIVE-25643 > Project: Hive > Issue Type: Improvement >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > > Since the Iceberg table migration will intentionally not rewrite the data > files, the migrated table will end up with data files that do not contain the > Iceberg field IDs necessary for safe, reliable schema migration. For this > purpose, we should disallow the REPLACE COLUMNS and CHANGE COLUMN commands > for these migrated Iceberg tables. ADD COLUMNS are still permitted. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25555) ArrowColumnarBatchSerDe should store map natively instead of converting to list
[ https://issues.apache.org/jira/browse/HIVE-2?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sruthi Mooriyathvariam reassigned HIVE-2: - Assignee: Sruthi Mooriyathvariam (was: Adesh Kumar Rao) > ArrowColumnarBatchSerDe should store map natively instead of converting to > list > --- > > Key: HIVE-2 > URL: https://issues.apache.org/jira/browse/HIVE-2 > Project: Hive > Issue Type: Sub-task >Reporter: Adesh Kumar Rao >Assignee: Sruthi Mooriyathvariam >Priority: Major > > This should also take of creating non-nullable struct and non-nullable key > type for the map data-type. Currently, list does not care about child type to > be nullable/non-nullable. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25554) Upgrade arrow version to 0.15
[ https://issues.apache.org/jira/browse/HIVE-25554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sruthi Mooriyathvariam reassigned HIVE-25554: - Assignee: Sruthi Mooriyathvariam (was: Adesh Kumar Rao) > Upgrade arrow version to 0.15 > - > > Key: HIVE-25554 > URL: https://issues.apache.org/jira/browse/HIVE-25554 > Project: Hive > Issue Type: Sub-task >Reporter: Adesh Kumar Rao >Assignee: Sruthi Mooriyathvariam >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25553) Support Map data-type natively in Arrow format
[ https://issues.apache.org/jira/browse/HIVE-25553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sruthi Mooriyathvariam reassigned HIVE-25553: - Assignee: Sruthi Mooriyathvariam (was: Adesh Kumar Rao) > Support Map data-type natively in Arrow format > -- > > Key: HIVE-25553 > URL: https://issues.apache.org/jira/browse/HIVE-25553 > Project: Hive > Issue Type: Improvement > Components: llap, Serializers/Deserializers >Reporter: Adesh Kumar Rao >Assignee: Sruthi Mooriyathvariam >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > Currently ArrowColumnarBatchSerDe converts map datatype as a list of structs > data-type (where the struct contains the key-value pair of the map). This > causes issues when reading Map datatype using llap-ext-client as it reads a > list of structs instead. > HiveWarehouseConnector which uses the llap-ext-client throws exception when > the schema (containing Map data type) is different from actual data (list of > structs). > > Fixing this issue requires upgrading arrow version (where map data-type is > supported), modifying ArrowColumnarBatchSerDe and corresponding > Serializer/Deserializer to not use list as a workaround for map and use the > arrow map data-type instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25642) Log a warning if multiple Compaction Worker versions are running compactions
[ https://issues.apache.org/jira/browse/HIVE-25642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25642: -- Labels: pull-request-available (was: ) > Log a warning if multiple Compaction Worker versions are running compactions > > > Key: HIVE-25642 > URL: https://issues.apache.org/jira/browse/HIVE-25642 > Project: Hive > Issue Type: Improvement > Components: Hive >Affects Versions: 4.0.0 >Reporter: Csomor Viktor >Assignee: Csomor Viktor >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Log a warning if multiple versions of Compaction Workers are running > compactions. > The start times of the individual HMS services are not stored at the moment; > however, this information could provide a good baseline for detecting multiple > Worker versions. > Due to the lack of this information we can check periodically in the past N > hours to detect the versions. > The N hours can be configured by the > {{metastore.compactor.worker.detect_multiple_versions.threshold}} property. > This periodic check only makes sense if compaction is enabled. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25642) Log a warning if multiple Compaction Worker versions are running compactions
[ https://issues.apache.org/jira/browse/HIVE-25642?focusedWorklogId=669473&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-669473 ] ASF GitHub Bot logged work on HIVE-25642: - Author: ASF GitHub Bot Created on: 25/Oct/21 11:52 Start Date: 25/Oct/21 11:52 Worklog Time Spent: 10m Work Description: vcsomor opened a new pull request #2743: URL: https://github.com/apache/hive/pull/2743 Log a warning if multiple Compaction Worker versions are running compactions - A new property has been added: `metastore.compactor.worker.detect_multiple_versions.threshold` - The detection implementation has been added to the AcidMetricService and works only if the metrics are enabled -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 669473) Remaining Estimate: 0h Time Spent: 10m > Log a warning if multiple Compaction Worker versions are running compactions > > > Key: HIVE-25642 > URL: https://issues.apache.org/jira/browse/HIVE-25642 > Project: Hive > Issue Type: Improvement > Components: Hive >Affects Versions: 4.0.0 >Reporter: Csomor Viktor >Assignee: Csomor Viktor >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > Log a warning if multiple versions of Compaction Workers are running > compactions. > The start times of the individual HMS services are not stored at the moment; > however, this information could provide a good baseline for detecting multiple > Worker versions. > Due to the lack of this information we can check periodically in the past N > hours to detect the versions. > The N hours can be configured by the > {{metastore.compactor.worker.detect_multiple_versions.threshold}} property. 
> This periodic check only makes sense if compaction is enabled. -- This message was sent by Atlassian Jira (v8.3.4#803005)
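The detection described in HIVE-25642 amounts to grouping recent compactions by the Worker version that ran them and warning when more than one version is active. The sketch below is illustrative Python, not the actual AcidMetricService implementation; the record shape and names are assumptions.

```python
# Illustrative sketch of the multiple-worker-version check: consider only
# compactions started within the past N hours (the configurable threshold),
# collect the distinct Worker versions, and warn if more than one is seen.
import logging
from datetime import datetime, timedelta

logger = logging.getLogger("compaction")

def detect_multiple_worker_versions(compactions, threshold_hours):
    """Return the set of Worker versions active in the past threshold_hours."""
    cutoff = datetime.now() - timedelta(hours=threshold_hours)
    versions = {c["worker_version"] for c in compactions
                if c["start_time"] >= cutoff}
    if len(versions) > 1:
        logger.warning("Multiple Compaction Worker versions detected: %s",
                       sorted(versions))
    return versions
```

As the issue notes, this is an approximation: without stored HMS start times, the past-N-hours window is the best available signal for which Worker versions are still running.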
[jira] [Work logged] (HIVE-25553) Support Map data-type natively in Arrow format
[ https://issues.apache.org/jira/browse/HIVE-25553?focusedWorklogId=669471&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-669471 ] ASF GitHub Bot logged work on HIVE-25553: - Author: ASF GitHub Bot Created on: 25/Oct/21 11:43 Start Date: 25/Oct/21 11:43 Worklog Time Spent: 10m Work Description: warriersruthi commented on a change in pull request #2689: URL: https://github.com/apache/hive/pull/2689#discussion_r735516231 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/arrow/Serializer.java ## @@ -226,7 +226,7 @@ public ArrowWrapperWritable serializeBatch(VectorizedRowBatch vectorizedRowBatch } private static FieldType toFieldType(TypeInfo typeInfo) { -return new FieldType(true, toArrowType(typeInfo), null); +return new FieldType(false, toArrowType(typeInfo), null); Review comment: Done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 669471) Time Spent: 2h 10m (was: 2h) > Support Map data-type natively in Arrow format > -- > > Key: HIVE-25553 > URL: https://issues.apache.org/jira/browse/HIVE-25553 > Project: Hive > Issue Type: Improvement > Components: llap, Serializers/Deserializers >Reporter: Adesh Kumar Rao >Assignee: Adesh Kumar Rao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > Currently ArrowColumnarBatchSerDe converts map datatype as a list of structs > data-type (where the struct contains the key-value pair of the map). This > causes issues when reading Map datatype using llap-ext-client as it reads a > list of structs instead. 
> HiveWarehouseConnector which uses the llap-ext-client throws exception when > the schema (containing Map data type) is different from actual data (list of > structs). > > Fixing this issue requires upgrading arrow version (where map data-type is > supported), modifying ArrowColumnarBatchSerDe and corresponding > Serializer/Deserializer to not use list as a workaround for map and use the > arrow map data-type instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25553) Support Map data-type natively in Arrow format
[ https://issues.apache.org/jira/browse/HIVE-25553?focusedWorklogId=669470&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-669470 ] ASF GitHub Bot logged work on HIVE-25553: - Author: ASF GitHub Bot Created on: 25/Oct/21 11:41 Start Date: 25/Oct/21 11:41 Worklog Time Spent: 10m Work Description: warriersruthi commented on a change in pull request #2689: URL: https://github.com/apache/hive/pull/2689#discussion_r735515232 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java ## @@ -160,7 +161,7 @@ private static Field toField(String name, TypeInfo typeInfo) { final ListTypeInfo listTypeInfo = (ListTypeInfo) typeInfo; final TypeInfo elementTypeInfo = listTypeInfo.getListElementTypeInfo(); return new Field(name, FieldType.nullable(MinorType.LIST.getType()), -Lists.newArrayList(toField(DEFAULT_ARROW_FIELD_NAME, elementTypeInfo))); +Lists.newArrayList(toField(name, elementTypeInfo))); Review comment: This change is not required. Removed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 669470) Time Spent: 2h (was: 1h 50m) > Support Map data-type natively in Arrow format > -- > > Key: HIVE-25553 > URL: https://issues.apache.org/jira/browse/HIVE-25553 > Project: Hive > Issue Type: Improvement > Components: llap, Serializers/Deserializers >Reporter: Adesh Kumar Rao >Assignee: Adesh Kumar Rao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 2h > Remaining Estimate: 0h > > Currently ArrowColumnarBatchSerDe converts map datatype as a list of structs > data-type (where the struct contains the key-value pair of the map). 
This > causes issues when reading Map datatype using llap-ext-client as it reads a > list of structs instead. > HiveWarehouseConnector which uses the llap-ext-client throws exception when > the schema (containing Map data type) is different from actual data (list of > structs). > > Fixing this issue requires upgrading arrow version (where map data-type is > supported), modifying ArrowColumnarBatchSerDe and corresponding > Serializer/Deserializer to not use list as a workaround for map and use the > arrow map data-type instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25553) Support Map data-type natively in Arrow format
[ https://issues.apache.org/jira/browse/HIVE-25553?focusedWorklogId=669469&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-669469 ] ASF GitHub Bot logged work on HIVE-25553: - Author: ASF GitHub Bot Created on: 25/Oct/21 11:37 Start Date: 25/Oct/21 11:37 Worklog Time Spent: 10m Work Description: warriersruthi commented on a change in pull request #2689: URL: https://github.com/apache/hive/pull/2689#discussion_r735512505 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java ## @@ -170,7 +171,7 @@ private static Field toField(String name, TypeInfo typeInfo) { for (int i = 0; i < structSize; i++) { structFields.add(toField(fieldNames.get(i), fieldTypeInfos.get(i))); } -return new Field(name, FieldType.nullable(MinorType.STRUCT.getType()), structFields); +return new Field(name, new FieldType(false, new ArrowType.Struct(), null), structFields); Review comment: Done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 669469) Time Spent: 1h 50m (was: 1h 40m) > Support Map data-type natively in Arrow format > -- > > Key: HIVE-25553 > URL: https://issues.apache.org/jira/browse/HIVE-25553 > Project: Hive > Issue Type: Improvement > Components: llap, Serializers/Deserializers >Reporter: Adesh Kumar Rao >Assignee: Adesh Kumar Rao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Currently ArrowColumnarBatchSerDe converts map datatype as a list of structs > data-type (where the struct contains the key-value pair of the map). This > causes issues when reading Map datatype using llap-ext-client as it reads a > list of structs instead. 
> HiveWarehouseConnector which uses the llap-ext-client throws exception when > the schema (containing Map data type) is different from actual data (list of > structs). > > Fixing this issue requires upgrading arrow version (where map data-type is > supported), modifying ArrowColumnarBatchSerDe and corresponding > Serializer/Deserializer to not use list as a workaround for map and use the > arrow map data-type instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25553) Support Map data-type natively in Arrow format
[ https://issues.apache.org/jira/browse/HIVE-25553?focusedWorklogId=669468&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-669468 ] ASF GitHub Bot logged work on HIVE-25553: - Author: ASF GitHub Bot Created on: 25/Oct/21 11:36 Start Date: 25/Oct/21 11:36 Worklog Time Spent: 10m Work Description: warriersruthi commented on a change in pull request #2689: URL: https://github.com/apache/hive/pull/2689#discussion_r735512043 ## File path: ql/src/java/org/apache/hadoop/hive/llap/WritableByteChannelAdapter.java ## @@ -93,7 +93,7 @@ public int write(ByteBuffer src) throws IOException { int size = src.remaining(); //Down the semaphore or block until available takeWriteResources(1); -ByteBuf buf = allocator.buffer(size); +ByteBuf buf = allocator.getAsByteBufAllocator().buffer(size); Review comment: Due to a change in the corresponding API in the latest arrow version, this fix is required to avoid build failures. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 669468) Time Spent: 1h 40m (was: 1.5h) > Support Map data-type natively in Arrow format > -- > > Key: HIVE-25553 > URL: https://issues.apache.org/jira/browse/HIVE-25553 > Project: Hive > Issue Type: Improvement > Components: llap, Serializers/Deserializers >Reporter: Adesh Kumar Rao >Assignee: Adesh Kumar Rao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > Currently ArrowColumnarBatchSerDe converts map datatype as a list of structs > data-type (where the struct contains the key-value pair of the map). This > causes issues when reading Map datatype using llap-ext-client as it reads a > list of structs instead. 
> HiveWarehouseConnector which uses the llap-ext-client throws exception when > the schema (containing Map data type) is different from actual data (list of > structs). > > Fixing this issue requires upgrading arrow version (where map data-type is > supported), modifying ArrowColumnarBatchSerDe and corresponding > Serializer/Deserializer to not use list as a workaround for map and use the > arrow map data-type instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25642) Log a warning if multiple Compaction Worker versions are running compactions
[ https://issues.apache.org/jira/browse/HIVE-25642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csomor Viktor updated HIVE-25642: - Description: Log a warning if multiple versions of Compaction Workers are running compactions. The start times of the individual HMS services are not stored at the moment; however, this information could provide a good baseline for detecting multiple Worker versions. Lacking this information, we can periodically check the past N hours to detect the versions. The N hours can be configured by the {{metastore.compactor.worker.detect_multiple_versions.threshold}} property. This periodic check only makes sense if compaction is enabled. was: Log a warning if multiple versions of Compaction Workers are running. The start times of the individual HMS services are not stored at the moment; however, this information could provide a good baseline for detecting multiple Worker versions. Lacking this information, we can periodically check the past N hours to detect the versions. The N hours can be configured by the {{metastore.compactor.worker.detect_multiple_versions.threshold}} property. This periodic check only makes sense if compaction is enabled.

> Log a warning if multiple Compaction Worker versions are running compactions
>
>
> Key: HIVE-25642
> URL: https://issues.apache.org/jira/browse/HIVE-25642
> Project: Hive
> Issue Type: Improvement
> Components: Hive
> Affects Versions: 4.0.0
> Reporter: Csomor Viktor
> Assignee: Csomor Viktor
> Priority: Minor
>
> Log a warning if multiple versions of Compaction Workers are running compactions.
> The start times of the individual HMS services are not stored at the moment; however, this information could provide a good baseline for detecting multiple Worker versions.
> Lacking this information, we can periodically check the past N hours to detect the versions.
> The N hours can be configured by the {{metastore.compactor.worker.detect_multiple_versions.threshold}} property. This periodic check only makes sense if compaction is enabled.
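The detection described above — look at the Worker entries recorded within the last N hours and warn when more than one distinct version appears — can be sketched in plain Java. The data model and names here are hypothetical stand-ins, not the actual HMS compaction-queue schema:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the periodic check: collect the distinct Worker
// versions seen within the look-back window and warn when there is more
// than one. The record type is a stand-in for whatever the HMS stores.
public class WorkerVersionCheck {

    // Stand-in for one compaction entry recorded by a Worker.
    static final class WorkerRecord {
        final String version;
        final Instant seenAt;
        WorkerRecord(String version, Instant seenAt) {
            this.version = version;
            this.seenAt = seenAt;
        }
    }

    // Distinct versions active within the last `lookBack` (the N hours
    // governed by the threshold property mentioned in the issue).
    static Set<String> versionsWithin(List<WorkerRecord> records, Duration lookBack, Instant now) {
        Set<String> versions = new TreeSet<>();
        for (WorkerRecord r : records) {
            if (!r.seenAt.isBefore(now.minus(lookBack))) {
                versions.add(r.version);
            }
        }
        return versions;
    }

    // More than one distinct version inside the window warrants the warning.
    static boolean shouldWarn(Set<String> versions) {
        return versions.size() > 1;
    }
}
```

Records older than the window are ignored, so a Worker that was upgraded more than N hours ago no longer triggers the warning.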
[jira] [Work started] (HIVE-25642) Log a warning if multiple Compaction Worker versions are running compactions
[ https://issues.apache.org/jira/browse/HIVE-25642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-25642 started by Csomor Viktor.

> Log a warning if multiple Compaction Worker versions are running compactions
>
>
> Key: HIVE-25642
> URL: https://issues.apache.org/jira/browse/HIVE-25642
> Project: Hive
> Issue Type: Improvement
> Components: Hive
> Affects Versions: 4.0.0
> Reporter: Csomor Viktor
> Assignee: Csomor Viktor
> Priority: Minor
>
> Log a warning if multiple versions of Compaction Workers are running.
> The start times of the individual HMS services are not stored at the moment; however, this information could provide a good baseline for detecting multiple Worker versions.
> Lacking this information, we can periodically check the past N hours to detect the versions.
> The N hours can be configured by the {{metastore.compactor.worker.detect_multiple_versions.threshold}} property. This periodic check only makes sense if compaction is enabled.
[jira] [Assigned] (HIVE-25642) Log a warning if multiple Compaction Worker versions are running compactions
[ https://issues.apache.org/jira/browse/HIVE-25642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csomor Viktor reassigned HIVE-25642:

> Log a warning if multiple Compaction Worker versions are running compactions
>
>
> Key: HIVE-25642
> URL: https://issues.apache.org/jira/browse/HIVE-25642
> Project: Hive
> Issue Type: Improvement
> Components: Hive
> Affects Versions: 4.0.0
> Reporter: Csomor Viktor
> Assignee: Csomor Viktor
> Priority: Minor
>
> Log a warning if multiple versions of Compaction Workers are running.
> The start times of the individual HMS services are not stored at the moment; however, this information could provide a good baseline for detecting multiple Worker versions.
> Lacking this information, we can periodically check the past N hours to detect the versions.
> The N hours can be configured by the {{metastore.compactor.worker.detect_multiple_versions.threshold}} property. This periodic check only makes sense if compaction is enabled.
[jira] [Work logged] (HIVE-25553) Support Map data-type natively in Arrow format
[ https://issues.apache.org/jira/browse/HIVE-25553?focusedWorklogId=669467&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-669467 ] ASF GitHub Bot logged work on HIVE-25553: - Author: ASF GitHub Bot Created on: 25/Oct/21 11:35 Start Date: 25/Oct/21 11:35 Worklog Time Spent: 10m Work Description: warriersruthi commented on a change in pull request #2689: URL: https://github.com/apache/hive/pull/2689#discussion_r735511241

## File path: itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcWithMiniLlapVectorArrow.java

## @@ -64,8 +65,8 @@ public static void beforeTest() throws Exception {
   return new LlapArrowRowInputFormat(Long.MAX_VALUE);
 }
-// Currently MAP type is not supported. Add it back when Arrow 1.0 is released.
-// See: SPARK-21187
+// Currently, loading from a text file gives errors with Map dataType.
+// This needs to be fixed when adding support for non-ORC writes (text and parquet) for the llap-ext-client.

Review comment: It's working now.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 669467) Time Spent: 1.5h (was: 1h 20m)

> Support Map data-type natively in Arrow format
> --
>
> Key: HIVE-25553
> URL: https://issues.apache.org/jira/browse/HIVE-25553
> Project: Hive
> Issue Type: Improvement
> Components: llap, Serializers/Deserializers
> Reporter: Adesh Kumar Rao
> Assignee: Adesh Kumar Rao
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> Currently, ArrowColumnarBatchSerDe converts the map data-type to a list-of-structs data-type (where each struct contains a key-value pair of the map). This causes issues when reading the Map data-type through the llap-ext-client, which reads a list of structs instead.
> HiveWarehouseConnector, which uses the llap-ext-client, throws an exception when the schema (containing the Map data type) differs from the actual data (a list of structs).
>
> Fixing this issue requires upgrading the Arrow version (where the map data-type is supported) and modifying ArrowColumnarBatchSerDe and the corresponding Serializer/Deserializer to use the Arrow map data-type instead of the list workaround.
[jira] [Work logged] (HIVE-25553) Support Map data-type natively in Arrow format
[ https://issues.apache.org/jira/browse/HIVE-25553?focusedWorklogId=669466&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-669466 ] ASF GitHub Bot logged work on HIVE-25553: - Author: ASF GitHub Bot Created on: 25/Oct/21 11:35 Start Date: 25/Oct/21 11:35 Worklog Time Spent: 10m Work Description: warriersruthi commented on a change in pull request #2689: URL: https://github.com/apache/hive/pull/2689#discussion_r735511083

## File path: itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcWithMiniLlapArrow.java

## @@ -123,8 +123,8 @@ public static void afterTest() {
   return new LlapArrowRowInputFormat(Long.MAX_VALUE);
 }
-// Currently MAP type is not supported. Add it back when Arrow 1.0 is released.
-// See: SPARK-21187
+// Currently, loading from a text file gives errors with Map dataType.
+// This needs to be fixed when adding support for non-ORC writes (text and parquet) for the llap-ext-client.

Review comment: It's working now.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 669466) Time Spent: 1h 20m (was: 1h 10m)

> Support Map data-type natively in Arrow format
> --
>
> Key: HIVE-25553
> URL: https://issues.apache.org/jira/browse/HIVE-25553
> Project: Hive
> Issue Type: Improvement
> Components: llap, Serializers/Deserializers
> Reporter: Adesh Kumar Rao
> Assignee: Adesh Kumar Rao
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> Currently, ArrowColumnarBatchSerDe converts the map data-type to a list-of-structs data-type (where each struct contains a key-value pair of the map). This causes issues when reading the Map data-type through the llap-ext-client, which reads a list of structs instead.
> HiveWarehouseConnector, which uses the llap-ext-client, throws an exception when the schema (containing the Map data type) differs from the actual data (a list of structs).
>
> Fixing this issue requires upgrading the Arrow version (where the map data-type is supported) and modifying ArrowColumnarBatchSerDe and the corresponding Serializer/Deserializer to use the Arrow map data-type instead of the list workaround.
[jira] [Work logged] (HIVE-25630) Transformer fixes
[ https://issues.apache.org/jira/browse/HIVE-25630?focusedWorklogId=669454&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-669454 ] ASF GitHub Bot logged work on HIVE-25630: - Author: ASF GitHub Bot Created on: 25/Oct/21 10:41 Start Date: 25/Oct/21 10:41 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2738: URL: https://github.com/apache/hive/pull/2738#discussion_r735474163

## File path: standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestMetastoreTransformer.java

## @@ -0,0 +1,130 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.metastore;
+
+import java.util.ArrayList;
+import org.apache.hadoop.hive.metastore.client.builder.DatabaseBuilder;
+import org.apache.hadoop.hive.metastore.client.builder.TableBuilder;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf.ConfVars;
+import org.junit.After;
+import org.junit.Before;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.metastore.api.FieldSchema;
+import org.apache.hadoop.hive.metastore.api.InvalidOperationException;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.api.NoSuchObjectException;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.util.StringUtils;
+import org.apache.thrift.TException;
+import org.junit.Test;
+
+public class TestMetastoreTransformer {
+  private static final Logger LOG = LoggerFactory.getLogger(TestMetastoreTransformer.class);
+  protected static HiveMetaStoreClient client;
+  protected static Configuration conf = null;
+  protected static Warehouse warehouse;
+  protected static boolean isThriftClient = false;
+
+  @Before
+  public void setUp() throws Exception {
+    initConf();
+    warehouse = new Warehouse(conf);
+
+    // set some values to use for getting conf. vars
+    MetastoreConf.setBoolVar(conf, ConfVars.METRICS_ENABLED, true);
+    conf.set("datanucleus.autoCreateTables", "false");
+    conf.set("hive.in.test", "true");
+
+    MetaStoreTestUtils.setConfForStandloneMode(conf);
+
+    warehouse = new Warehouse(conf);
+    client = createClient();
+  }
+
+  @After
+  public void tearDown() throws Exception {
+    client.close();
+  }
+
+  protected HiveMetaStoreClient createClient() throws Exception {
+    try {
+      return new HiveMetaStoreClient(conf);
+    } catch (Throwable e) {
+      System.err.println("Unable to open the metastore");
+      System.err.println(StringUtils.stringifyException(e));
+      throw new Exception(e);
+    }
+  }
+
+  protected void initConf() {
+    if (null == conf) {
+      conf = MetastoreConf.newMetastoreConf();
+    }
+  }
+
+  private static void silentDropDatabase(String dbName) throws TException {
+    try {
+      for (String tableName : client.getTables(dbName, "*")) {
+        client.dropTable(dbName, tableName);
+      }
+      client.dropDatabase(dbName);
+    } catch (NoSuchObjectException | InvalidOperationException | MetaException e) {
+      // NOP
+    }
+  }
+
+  @Test
+  public void testAlterTableIsCaseInSensitive() throws Exception {
+    String dbName = "alterdb";
+    String tblName = "altertbl";
+
+    client.dropTable(dbName, tblName);
+    silentDropDatabase(dbName);
+
+    String dbLocation = MetastoreConf.getVar(conf, ConfVars.WAREHOUSE_EXTERNAL) + "/_testDB_table_create_";
+    String mgdLocation = MetastoreConf.getVar(conf, ConfVars.WAREHOUSE) + "/_testDB_table_create_";
+    new DatabaseBuilder().setName(dbName).setLocation(dbLocation).setManagedLocation(mgdLocation).create(client, conf);
+
+    ArrayList<FieldSchema> invCols = new ArrayList<>(2);
+    invCols.add(new FieldSchema("n-ame", ColumnType.STRING_TYPE_NAME, ""));
+    invCols.add(new FieldSchema("in.come", ColumnType.INT_TYPE_NAME, ""));
+
+    Table tbl = new TableBuilder().setDbName(dbName).setTableName(tblName).setCols(invCols).build(conf);
+
[jira] [Assigned] (HIVE-25641) Adding columns to a Hive multi-partition table reports "Partition not found"
[ https://issues.apache.org/jira/browse/HIVE-25641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhusijie reassigned HIVE-25641: --- Assignee: (was: zhusijie)

> Adding columns to a Hive multi-partition table reports "Partition not found"
> -
>
> Key: HIVE-25641
> URL: https://issues.apache.org/jira/browse/HIVE-25641
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 2.0.1
> Reporter: zhusijie
> Priority: Major
> Attachments: image-2021-10-25-17-32-32-207.png
>
>
> Executing ALTER TABLE cf_rds.cf_rds_jxd_clue_basic_di ADD COLUMNS (channel bigint COMMENT '渠道号') CASCADE;
> reports: Partition not found
> !image-2021-10-25-17-32-32-207.png!
[jira] [Assigned] (HIVE-25641) Adding columns to a Hive multi-partition table reports "Partition not found"
[ https://issues.apache.org/jira/browse/HIVE-25641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhusijie reassigned HIVE-25641: ---

> Adding columns to a Hive multi-partition table reports "Partition not found"
> -
>
> Key: HIVE-25641
> URL: https://issues.apache.org/jira/browse/HIVE-25641
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 2.0.1
> Reporter: zhusijie
> Assignee: zhusijie
> Priority: Major
> Attachments: image-2021-10-25-17-32-32-207.png
>
>
> Executing ALTER TABLE cf_rds.cf_rds_jxd_clue_basic_di ADD COLUMNS (channel bigint COMMENT '渠道号') CASCADE;
> reports: Partition not found
> !image-2021-10-25-17-32-32-207.png!
[jira] [Work logged] (HIVE-25397) Snapshot support for controlled failover
[ https://issues.apache.org/jira/browse/HIVE-25397?focusedWorklogId=669408&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-669408 ] ASF GitHub Bot logged work on HIVE-25397: - Author: ASF GitHub Bot Created on: 25/Oct/21 08:54 Start Date: 25/Oct/21 08:54 Worklog Time Spent: 10m Work Description: ayushtkn commented on a change in pull request #2539: URL: https://github.com/apache/hive/pull/2539#discussion_r734993099

## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosUsingSnapshots.java

## @@ -593,6 +642,128 @@ public void testFailureScenarios() throws Throwable {
   .verifyResults(new String[] {"delhi", "noida"});
 }

+  /*
+   * test to check reuse of diff snapshots when incremental fails with irrecoverable error during data-copy (target modified)
+   * and re-bootstrap is required but overwrite is off.
+   */
+  @Test
+  public void testRebootstrapDiffCopy() throws Throwable {
+
+    DistributedFileSystem fs = primary.miniDFSCluster.getFileSystem();
+    DistributedFileSystem fsTarget = replica.miniDFSCluster.getFileSystem();
+    Path externalTableLocation1 = new Path("/" + testName.getMethodName() + "/table1/");
+    fs.mkdirs(externalTableLocation1, new FsPermission("777"));
+
+    List<String> withClause = ReplicationTestUtils.includeExternalTableClause(true);
+    withClause.add("'" + HiveConf.ConfVars.REPLDIR.varname + "'='" + primary.repldDir + "'");
+    withClause.add("'hive.repl.external.warehouse.single.copy.task.paths'='" + externalTableLocation1
+        .makeQualified(fs.getUri(), fs.getWorkingDirectory()).toString() + "'");
+
+    WarehouseInstance.Tuple tuple = primary.run("use " + primaryDbName)
+        .run("create external table table1 (place string) partitioned by (country string) row format "
+            + "delimited fields terminated by ',' location '" + externalTableLocation1.toString() + "'")
+        .run("create external table table2 (id int)")
+        .run("create external table table3 (id int)")
+        .run("insert into table1 partition(country='nepal') values ('kathmandu')")
+        .run("insert into table1 partition(country='china') values ('beejing')")
+        .run("insert into table2 values(1)")
+        .run("insert into table3 values(5)")
+        .dump(primaryDbName, withClause);
+
+    replica.load(replicatedDbName, primaryDbName, withClause)
+        .run("use " + replicatedDbName)
+        .run("show tables like 'table1'")
+        .verifyResults(new String[] {"table1"})
+        .run("select place from table1 where country='nepal'")
+        .verifyResults(new String[] {"kathmandu"})
+        .run("select place from table1 where country='china'")
+        .verifyResults(new String[] {"beejing"})
+        .run("select id from table3")
+        .verifyResults(new String[] {"5"})
+        .run("select id from table2")
+        .verifyResults(new String[] {"1"})
+        .verifyReplTargetProperty(replicatedDbName);
+
+    // Check if the table1 directory is snapshottable and the snapshot is there.
+    validateInitialSnapshotsCreated(externalTableLocation1.toString());
+
+    // Add some more data and do a dump & load
+    primary.run("use " + primaryDbName)
+        .run("insert into table1 partition(country='china') values ('wuhan')")
+        .run("insert into table2 values(2)")
+        .run("insert into table3 values(6)")
+        .dump(primaryDbName, withClause);
+
+    replica.load(replicatedDbName, primaryDbName, withClause)
+        .run("use " + replicatedDbName)
+        .run("select place from table1 where country='china'")
+        .verifyResults(new String[] {"beejing", "wuhan"})
+        .run("select id from table3")
+        .verifyResults(new String[] {"5", "6"})
+        .run("select id from table2")
+        .verifyResults(new String[] {"1", "2"})
+        .verifyReplTargetProperty(replicatedDbName);
+
+    // Verify if the diff snapshots are there.
+    validateDiffSnapshotsCreated(externalTableLocation1.toString());
+
+    Path targetWhPath = externalTableDataPath(replicaConf, REPLICA_EXTERNAL_BASE,
+        new Path(primary.getDatabase(primaryDbName).getLocationUri()));
+    DistributedFileSystem replicaDfs = (DistributedFileSystem) targetWhPath.getFileSystem(replicaConf);
+
+    // Emulate the situation of a rebootstrap with incomplete data copied in the previous incremental cycle for some paths
+    // a. add some data to some paths
+    // b. do a dump load with snapshot disabled
+    // c. Now, some paths should have outdated snapshots in both source and target.
+    //    Re-enable snapshot and check whether diff-copy takes place for a fresh bootstrap
+
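The diff-snapshot reuse exercised by the test above rests on what a snapshot diff provides: given two snapshots of the same directory, only the paths that changed between them need to be re-copied. A toy stdlib model of that idea — the real mechanism is HDFS's snapshot diff report, not this code — can be sketched as:

```java
import java.util.Set;
import java.util.TreeSet;

// Toy model of a snapshot diff: given the file listings captured by two
// snapshots of the same directory, report what was added and what was
// removed. This is only an illustration of the information a diff-based
// replication copy reuses; it is not the HDFS snapshot API.
public class SnapshotDiffModel {

    // Paths present in the newer snapshot but not the older one.
    static Set<String> added(Set<String> from, Set<String> to) {
        Set<String> out = new TreeSet<>(to);
        out.removeAll(from);
        return out;
    }

    // Paths present in the older snapshot but not the newer one.
    static Set<String> removed(Set<String> from, Set<String> to) {
        Set<String> out = new TreeSet<>(from);
        out.removeAll(to);
        return out;
    }
}
```

When a snapshot on either side is outdated (as the emulated rebootstrap scenario arranges), the diff no longer describes the real delta, which is why the test checks that the copy falls back correctly.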
[jira] [Assigned] (HIVE-25639) Exclude tomcat-embed-core from libthrift
[ https://issues.apache.org/jira/browse/HIVE-25639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ranith Sardar reassigned HIVE-25639:

> Exclude tomcat-embed-core from libthrift
>
>
> Key: HIVE-25639
> URL: https://issues.apache.org/jira/browse/HIVE-25639
> Project: Hive
> Issue Type: Bug
> Components: Thrift API
> Reporter: Ranith Sardar
> Assignee: Ranith Sardar
> Priority: Major
>
> The Thrift dependency was upgraded to 0.14.1 to fix a known CVE, but a dependency issue in libthrift brings in tomcat-embed-core, which has many vulnerabilities. See: THRIFT-5375.
> Since this dependency is used in Thrift only for a test, we can safely exclude it inside Hive.
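The exclusion described in HIVE-25639 is typically done with a Maven dependency exclusion on libthrift. A sketch of what that looks like — the exact module and pom location inside Hive may differ, and the version shown is simply the one mentioned in the issue:

```xml
<dependency>
  <groupId>org.apache.thrift</groupId>
  <artifactId>libthrift</artifactId>
  <version>0.14.1</version>
  <exclusions>
    <!-- Pulled in transitively by libthrift only for its tests (THRIFT-5375);
         excluded here because tomcat-embed-core carries known CVEs. -->
    <exclusion>
      <groupId>org.apache.tomcat.embed</groupId>
      <artifactId>tomcat-embed-core</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

After the change, `mvn dependency:tree` should no longer list tomcat-embed-core under libthrift.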