[jira] [Commented] (HIVE-8744) hbase_stats3.q test fails when paths stored at JDBCStatsUtils.getIdColumnName() are too large
[ https://issues.apache.org/jira/browse/HIVE-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199451#comment-14199451 ] Sergio Peña commented on HIVE-8744: --- Thanks [~szehon]. I submitted the correct patch. hbase_stats3.q test fails when paths stored at JDBCStatsUtils.getIdColumnName() are too large - Key: HIVE-8744 URL: https://issues.apache.org/jira/browse/HIVE-8744 Project: Hive Issue Type: Bug Affects Versions: 0.15.0 Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-8744.1.patch This test is related to the bug HIVE-8065, where I am trying to support HDFS encryption. One of the enhancements to support it is to create a .hive-staging directory in the same table directory location where the query is executed. Now, when running the hbase_stats3.q test from a temporary directory that has a long path, the new path (a combination of table location + .hive-staging + random temporary subdirectories) is too large to fit into the statistics table, so the path is truncated. This causes the following error: {noformat} 2014-11-04 08:57:36,680 ERROR [LocalJobRunner Map Task Executor #0]: jdbc.JDBCStatsPublisher (JDBCStatsPublisher.java:publishStat(199)) - Error during publishing statistics. java.sql.SQLDataException: A truncation error was encountered trying to shrink VARCHAR 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.' to length 255.
at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeLargeUpdate(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeUpdate(Unknown Source)
at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:148)
at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:145)
at org.apache.hadoop.hive.ql.exec.Utilities.executeWithRetry(Utilities.java:2667)
at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.publishStat(JDBCStatsPublisher.java:161)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:1031)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:870)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:579)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.sql.SQLException: A truncation error was encountered trying to shrink VARCHAR 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.' to length 255.
at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source)
... 30 more
Caused by: ERROR 22001: A truncation error was encountered trying to shrink VARCHAR 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.' to length 255.
at org.apache.derby.iapi.error.StandardException.newException(Unknown Source
{noformat}
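The root cause is that the combined staging path no longer fits the VARCHAR(255) ID column in the Derby stats table. As an illustrative sketch only (this is NOT the actual HIVE-8744 patch, and the class/method names here are hypothetical), one general way to keep an over-long key inside a fixed-width column is to replace it with a fixed-length digest:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class StatsKeyTruncation {
    // Matches the VARCHAR(255) ID column size seen in the Derby error above.
    static final int MAX_ID_LENGTH = 255;

    // Hypothetical mitigation: replace an over-long path with a fixed-length
    // SHA-256 hex digest so the key always fits the column.
    static String toStatsKey(String path) {
        if (path.length() <= MAX_ID_LENGTH) {
            return path;
        }
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(path.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString(); // 64 hex chars, well under 255
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    public static void main(String[] args) {
        // Simulate a deeply nested .hive-staging path that exceeds 255 chars.
        StringBuilder longPath = new StringBuilder("pfile:/home/hiveptest/table/.hive-staging");
        for (int i = 0; i < 40; i++) {
            longPath.append("/tmpdir").append(i);
        }
        String key = toStatsKey(longPath.toString());
        System.out.println(key.length()); // prints 64
    }
}
```

The trade-off is that a hashed key is no longer human-readable, which is one reason widening the column (as later Hive stats tables do) can be preferable.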
[jira] [Updated] (HIVE-8744) hbase_stats3.q test fails when paths stored at JDBCStatsUtils.getIdColumnName() are too large
[ https://issues.apache.org/jira/browse/HIVE-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-8744: -- Status: Open (was: Patch Available)
[jira] [Updated] (HIVE-8744) hbase_stats3.q test fails when paths stored at JDBCStatsUtils.getIdColumnName() are too large
[ https://issues.apache.org/jira/browse/HIVE-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-8744: -- Attachment: HIVE-8744.2.patch
[jira] [Updated] (HIVE-8744) hbase_stats3.q test fails when paths stored at JDBCStatsUtils.getIdColumnName() are too large
[ https://issues.apache.org/jira/browse/HIVE-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-8744: -- Status: Patch Available (was: Open) Submitted new patch that changes the stats table name to v3
[jira] [Commented] (HIVE-8065) Support HDFS encryption functionality on Hive
[ https://issues.apache.org/jira/browse/HIVE-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200462#comment-14200462 ] Sergio Peña commented on HIVE-8065: --- Hi [~Ferd] Thanks for trying to help. There is already basic work done for this issue in a local branch for Hive 0.13. I will apply the patch to trunk and commit the changes to the HIVE-8065 branch. What we don't have yet are the unit query tests. Would you like to take that task? Support HDFS encryption functionality on Hive - Key: HIVE-8065 URL: https://issues.apache.org/jira/browse/HIVE-8065 Project: Hive Issue Type: Improvement Affects Versions: 0.13.1 Reporter: Sergio Peña Assignee: Sergio Peña The new encryption support on HDFS makes Hive incompatible and unusable when this feature is used. HDFS encryption is designed so that a user can configure different encryption zones (or directories) for multi-tenant environments. An encryption zone has an exclusive encryption key, such as AES-128 or AES-256. Because of security compliance, HDFS does not allow moving/renaming files between encryption zones; renames are allowed only inside the same encryption zone. A copy is allowed between encryption zones. See HDFS-6134 for more details about the HDFS encryption design. Hive currently uses a scratch directory (like /tmp/$user/$random). This scratch directory is used for the output of intermediate data (between MR jobs) and for the final output of the Hive query, which is later moved to the table directory location. If Hive tables are in different encryption zones than the scratch directory, then Hive won't be able to rename those files/directories, and that will make Hive unusable. To handle this problem, we can change the scratch directory of the query/statement to be inside the same encryption zone as the table directory location. This way, the renaming process will succeed. Also, for statements that move files between encryption zones (i.e. LOAD DATA), a copy may be executed instead of a rename. This will cause an overhead when copying large data files, but it won't break the encryption on Hive. Another security point to consider is join selects. If Hive joins tables with different encryption key strengths, then the results of the select might break the security compliance of the tables. Say two tables with 128-bit and 256-bit encryption are joined; the temporary results might be stored in the 128-bit encryption zone, which conflicts with the table encrypted with 256 bits. To fix this, Hive should be able to select the most secured/encrypted scratch directory, so the intermediate data is stored temporarily with no compliance issues. For instance: {noformat} SELECT * FROM table-aes128 t1 JOIN table-aes256 t2 WHERE t1.id == t2.id; {noformat} - This should use a scratch directory (or staging directory) inside the table-aes256 table location. {noformat} INSERT OVERWRITE TABLE table-unencrypted SELECT * FROM table-aes1; {noformat} - This should use a scratch directory inside the table-aes1 location. {noformat} FROM table-unencrypted INSERT OVERWRITE TABLE table-aes128 SELECT id, name INSERT OVERWRITE TABLE table-aes256 SELECT id, name {noformat} - This should use a scratch directory in each of the table locations. - The first SELECT will have its scratch directory in the table-aes128 directory. - The second SELECT will have its scratch directory in the table-aes256 directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
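The "pick the most strongly encrypted zone" rule described above can be sketched as follows. Note the names here (TableLocation, keyBits, chooseStagingDir) are hypothetical illustrations, not Hive's actual API:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class StagingDirChooser {
    // Hypothetical model of a table location plus its encryption key strength
    // in bits (0 = unencrypted).
    static class TableLocation {
        final String path;
        final int keyBits;
        TableLocation(String path, int keyBits) {
            this.path = path;
            this.keyBits = keyBits;
        }
    }

    // Pick the location with the strongest encryption so intermediate data is
    // never written to a weaker (or unencrypted) zone, as the issue describes.
    static TableLocation chooseStagingDir(List<TableLocation> tables) {
        return tables.stream()
                .max(Comparator.comparingInt(t -> t.keyBits))
                .orElseThrow(IllegalArgumentException::new);
    }

    public static void main(String[] args) {
        List<TableLocation> joined = Arrays.asList(
                new TableLocation("/warehouse/table-aes128", 128),
                new TableLocation("/warehouse/table-aes256", 256));
        System.out.println(chooseStagingDir(joined).path); // prints /warehouse/table-aes256
    }
}
```

For the SELECT/JOIN example above, this rule would place the staging directory under table-aes256, matching the behavior the issue proposes.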
[jira] [Commented] (HIVE-8744) hbase_stats3.q test fails when paths stored at JDBCStatsUtils.getIdColumnName() are too large
[ https://issues.apache.org/jira/browse/HIVE-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200933#comment-14200933 ] Sergio Peña commented on HIVE-8744: --- That patch works well [~prasanth_j]. We can use the one from HIVE-8735 instead.
[jira] [Updated] (HIVE-8435) Add identity project remover optimization
[ https://issues.apache.org/jira/browse/HIVE-8435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesús Camacho Rodríguez updated HIVE-8435: -- Attachment: HIVE-8435.07.patch Add identity project remover optimization - Key: HIVE-8435 URL: https://issues.apache.org/jira/browse/HIVE-8435 Project: Hive Issue Type: New Feature Components: Logical Optimizer Affects Versions: 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0 Reporter: Ashutosh Chauhan Assignee: Sergey Shelukhin Attachments: HIVE-8435.02.patch, HIVE-8435.03.patch, HIVE-8435.03.patch, HIVE-8435.04.patch, HIVE-8435.05.patch, HIVE-8435.05.patch, HIVE-8435.06.patch, HIVE-8435.07.patch, HIVE-8435.1.patch, HIVE-8435.patch In some cases there is an identity Project in the plan which is useless. It is better to optimize it away to avoid evaluating it at runtime with no benefit.
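For illustration (an assumed example, not taken from the issue itself), an identity Project typically appears in plans for queries like:

```sql
SELECT a, b
FROM (SELECT a, b FROM src) s;
```

The inner subquery projects exactly the columns the outer query selects, so the extra Project operator forwards rows unchanged; the optimization removes it from the plan instead of evaluating it for every row.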
[jira] [Updated] (HIVE-8435) Add identity project remover optimization
[ https://issues.apache.org/jira/browse/HIVE-8435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesús Camacho Rodríguez updated HIVE-8435: -- Assignee: Jesús Camacho Rodríguez (was: Sergey Shelukhin) Status: In Progress (was: Patch Available)
[jira] [Updated] (HIVE-8827) Remove SSLv2Hello from list of disabled protocols
[ https://issues.apache.org/jira/browse/HIVE-8827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-8827: -- Attachment: HIVE-8827.1.patch Remove SSLv2Hello from list of disabled protocols - Key: HIVE-8827 URL: https://issues.apache.org/jira/browse/HIVE-8827 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-8827.1.patch Turns out SSLv2Hello is not the same as SSLv2.
[jira] [Updated] (HIVE-8827) Remove SSLv2Hello from list of disabled protocols
[ https://issues.apache.org/jira/browse/HIVE-8827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-8827: -- Status: Patch Available (was: Open)
[jira] [Created] (HIVE-8857) hive release has SNAPSHOT dependency, which is not on maven central
André Kelpe created HIVE-8857: - Summary: hive release has SNAPSHOT dependency, which is not on maven central Key: HIVE-8857 URL: https://issues.apache.org/jira/browse/HIVE-8857 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: André Kelpe I just tried building a project that uses hive-exec as a dependency, and it bails out, since Hive 0.14.0 introduced a SNAPSHOT dependency on Apache Calcite, which is not on Maven Central. Do we have to include another repository now? Beyond that, it also seems problematic to rely on a SNAPSHOT dependency, which can change at any time. {code}
:compileJava
Download http://repo1.maven.org/maven2/org/apache/hive/hive-exec/0.14.0/hive-exec-0.14.0.pom
Download http://repo1.maven.org/maven2/org/apache/hive/hive/0.14.0/hive-0.14.0.pom
Download http://repo1.maven.org/maven2/org/apache/hive/hive-ant/0.14.0/hive-ant-0.14.0.pom
Download http://repo1.maven.org/maven2/org/apache/hive/hive-metastore/0.14.0/hive-metastore-0.14.0.pom
Download http://repo1.maven.org/maven2/org/apache/hive/hive-shims/0.14.0/hive-shims-0.14.0.pom
Download http://repo1.maven.org/maven2/org/fusesource/jansi/jansi/1.11/jansi-1.11.pom
Download http://repo1.maven.org/maven2/org/fusesource/jansi/jansi-project/1.11/jansi-project-1.11.pom
Download http://repo1.maven.org/maven2/org/fusesource/fusesource-pom/1.8/fusesource-pom-1.8.pom
Download http://repo1.maven.org/maven2/org/apache/hive/hive-serde/0.14.0/hive-serde-0.14.0.pom
Download http://repo1.maven.org/maven2/org/apache/hive/shims/hive-shims-common/0.14.0/hive-shims-common-0.14.0.pom
Download http://repo1.maven.org/maven2/org/apache/hive/shims/hive-shims-common-secure/0.14.0/hive-shims-common-secure-0.14.0.pom
Download http://repo1.maven.org/maven2/org/apache/hive/shims/hive-shims-0.20/0.14.0/hive-shims-0.20-0.14.0.pom
Download http://repo1.maven.org/maven2/org/apache/hive/shims/hive-shims-0.20S/0.14.0/hive-shims-0.20S-0.14.0.pom
Download http://repo1.maven.org/maven2/org/apache/hive/shims/hive-shims-0.23/0.14.0/hive-shims-0.23-0.14.0.pom
Download http://repo1.maven.org/maven2/org/apache/hive/hive-common/0.14.0/hive-common-0.14.0.pom
Download http://repo1.maven.org/maven2/org/apache/curator/curator-framework/2.6.0/curator-framework-2.6.0.pom
Download http://repo1.maven.org/maven2/org/apache/curator/apache-curator/2.6.0/apache-curator-2.6.0.pom
Download http://repo1.maven.org/maven2/org/apache/curator/curator-client/2.6.0/curator-client-2.6.0.pom
Download http://repo1.maven.org/maven2/org/slf4j/slf4j-api/1.7.6/slf4j-api-1.7.6.pom
Download http://repo1.maven.org/maven2/org/slf4j/slf4j-parent/1.7.6/slf4j-parent-1.7.6.pom

FAILURE: Build failed with an exception.

* What went wrong:
Could not resolve all dependencies for configuration ':provided'.
Could not find org.apache.calcite:calcite-core:0.9.2-incubating-SNAPSHOT.
Required by:
    cascading:cascading-hive:1.1.0-wip-dev
    org.apache.hive:hive-exec:0.14.0
Could not find org.apache.calcite:calcite-avatica:0.9.2-incubating-SNAPSHOT.
Required by:
    cascading:cascading-hive:1.1.0-wip-dev
    org.apache.hive:hive-exec:0.14.0

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output.

BUILD FAILED

Total time: 16.956 secs
{code}
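A possible workaround, assuming the Calcite snapshot is published to Apache's snapshots repository (an assumption to verify, not something stated in the issue), is to add that repository to the Gradle build so the SNAPSHOT artifacts can be resolved:

```groovy
// Illustrative sketch only: adds Apache's snapshots repository alongside
// Maven Central so calcite-*:0.9.2-incubating-SNAPSHOT may resolve.
repositories {
    mavenCentral()
    maven {
        url 'https://repository.apache.org/content/repositories/snapshots'
    }
}
```

Even if this resolves the build, the underlying concern stands: a SNAPSHOT artifact can change or disappear at any time, so a release depending on one is fragile.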
[jira] [Updated] (HIVE-8862) Fix ordering differences on TestParse tests due to Java8
[ https://issues.apache.org/jira/browse/HIVE-8862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-8862: -- Status: Patch Available (was: Open) Fix ordering differences on TestParse tests due to Java8 --- Key: HIVE-8862 URL: https://issues.apache.org/jira/browse/HIVE-8862 Project: Hive Issue Type: Sub-task Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-8862.1.patch This bug is related to HIVE-8607. All TestParse tests are failing on Java8 due to XML serialization incompatibilities with JDK7. These serialization issues are just ordering differences in the XML files generated with JDK7, caused by the hash function of HashMap/HashSet. In order to fix this, we should use LinkedHashMap/LinkedHashSet instead, so we can get the correct ordering. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8862) Fix ordering differences on TestParse tests due to Java8
[ https://issues.apache.org/jira/browse/HIVE-8862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-8862: -- Attachment: HIVE-8862.1.patch This patch replaces HashMap/HashSet with LinkedHashMap/LinkedHashSet in all the places where their values will be serialized in TestParse tests. It also includes a fix in QTestUtil.java for some incompatibilities between JDK7 and JDK8. All .q.xml files had to be re-generated because of these changes. Fix ordering differences on TestParse tests due to Java8 --- Key: HIVE-8862 URL: https://issues.apache.org/jira/browse/HIVE-8862 Project: Hive Issue Type: Sub-task Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-8862.1.patch This bug is related to HIVE-8607. All TestParse tests are failing on Java8 due to XML serialization incompatibilities with JDK7. These serialization issues are just ordering differences in the XML files generated with JDK7, caused by the hash function of HashMap/HashSet. In order to fix this, we should use LinkedHashMap/LinkedHashSet instead, so we can get the correct ordering. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
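The fix relies on a standard JDK contract: HashMap's iteration order depends on the hash function, which changed between JDK7 and JDK8, while LinkedHashMap iterates in insertion order on any JDK. A minimal sketch of the difference (class and method names are ours, not from the patch):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MapOrderingDemo {
    // Inserts the keys into the supplied map and returns the map's iteration order.
    static List<String> iterationOrder(Map<String, Integer> map, List<String> keys) {
        int i = 0;
        for (String k : keys) {
            map.put(k, i++);
        }
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("c", "a", "b");
        // LinkedHashMap iterates (and thus serializes) in insertion order: [c, a, b].
        System.out.println(iterationOrder(new LinkedHashMap<>(), keys));
        // HashMap's order is an implementation detail and differs between JDK7 and JDK8.
        System.out.println(iterationOrder(new HashMap<>(), keys));
    }
}
```

Serializing from a LinkedHashMap therefore yields the same XML element order on both JDKs, which is why the golden .q.xml files stay stable.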
[jira] [Work started] (HIVE-8749) Change Hadoop version on HIVE-8065 to 2.6-SNAPSHOT
[ https://issues.apache.org/jira/browse/HIVE-8749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-8749 started by Sergio Peña. - Change Hadoop version on HIVE-8065 to 2.6-SNAPSHOT -- Key: HIVE-8749 URL: https://issues.apache.org/jira/browse/HIVE-8749 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Sergio Peña -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8359) Map containing null values are not correctly written in Parquet files
[ https://issues.apache.org/jira/browse/HIVE-8359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14211466#comment-14211466 ] Sergio Peña commented on HIVE-8359: --- I'll take a look at the code. Map containing null values are not correctly written in Parquet files - Key: HIVE-8359 URL: https://issues.apache.org/jira/browse/HIVE-8359 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.13.1 Reporter: Frédéric TERRAZZONI Assignee: Sergio Peña Attachments: HIVE-8359.1.patch, map_null_val.avro Tried to write a map<string,string> column in a Parquet file. The table should contain: {code} {key3:val3,key4:null} {key3:val3,key4:null} {key1:null,key2:val2} {key3:val3,key4:null} {key3:val3,key4:null} {code} ... and when you do a query like {code}SELECT * from mytable{code} we can see that the table is corrupted: {code} {key3:val3} {key4:val3} {key3:val2} {key4:val3} {key1:val3} {code} I've not been able to read the Parquet file in our software afterwards, and consequently I suspect it to be corrupted. For those who are interested, I generated this Parquet table from an Avro file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
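One hedged way to picture the symptom (an illustrative model, not the actual Parquet writer code): if null values are silently dropped while all keys are kept, the surviving values pair up with the wrong keys on read, which matches the shifted key/value pairs reported above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NullMapMisalignment {
    // Hypothetical model of the bug: all keys are written, but null values are
    // skipped, so on read the remaining values pair up with the wrong keys.
    static Map<String, String> writeThenRead(Map<String, String> row) {
        List<String> keys = new ArrayList<>(row.keySet());
        List<String> values = new ArrayList<>();
        for (String v : row.values()) {
            if (v != null) {
                values.add(v); // null entries silently dropped
            }
        }
        Map<String, String> readBack = new LinkedHashMap<>();
        for (int i = 0; i < values.size(); i++) {
            readBack.put(keys.get(i), values.get(i)); // misaligned pairing
        }
        return readBack;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("key1", null);
        row.put("key2", "val2");
        // Expected {key1=null, key2=val2}; this model yields {key1=val2},
        // i.e. the wrong key keeps the value and one entry disappears.
        System.out.println(writeThenRead(row));
    }
}
```

Whatever the real mechanism, the fix has to make the writer emit an explicit null slot for each null-valued entry so keys and values stay aligned.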
[jira] [Updated] (HIVE-8749) Change Hadoop version on HIVE-8065 to 2.6-SNAPSHOT
[ https://issues.apache.org/jira/browse/HIVE-8749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-8749: -- Attachment: HIVE-8749.1.patch Change Hadoop version on HIVE-8065 to 2.6-SNAPSHOT -- Key: HIVE-8749 URL: https://issues.apache.org/jira/browse/HIVE-8749 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Sergio Peña Attachments: HIVE-8749.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8749) Change Hadoop version on HIVE-8065 to 2.6-SNAPSHOT
[ https://issues.apache.org/jira/browse/HIVE-8749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-8749: -- Status: Patch Available (was: In Progress) Change Hadoop version on HIVE-8065 to 2.6-SNAPSHOT -- Key: HIVE-8749 URL: https://issues.apache.org/jira/browse/HIVE-8749 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Sergio Peña Attachments: HIVE-8749.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (HIVE-8750) Commit initial encryption work
[ https://issues.apache.org/jira/browse/HIVE-8750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-8750 started by Sergio Peña. - Commit initial encryption work -- Key: HIVE-8750 URL: https://issues.apache.org/jira/browse/HIVE-8750 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Sergio Peña I believe Sergio has some work done for encryption. In this item we'll commit it to branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8869) RowSchema not updated for some ops when columns are pruned
Jesús Camacho Rodríguez created HIVE-8869: - Summary: RowSchema not updated for some ops when columns are pruned Key: HIVE-8869 URL: https://issues.apache.org/jira/browse/HIVE-8869 Project: Hive Issue Type: Bug Affects Versions: 0.14.0, 0.15.0 Reporter: Jesús Camacho Rodríguez Assignee: Jesús Camacho Rodríguez When columns are pruned in ColumnPrunerProcFactory, the behavior of updating the row schema is not consistent across operators: some will update their RowSchema, while others will not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
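The invariant the issue asks for can be sketched generically: after pruning, an operator's RowSchema should describe exactly the columns that survive. A hedged sketch with plain strings instead of Hive's ColumnInfo objects (names are ours, not Hive's API):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class SchemaPruneSketch {
    // Keep only the schema entries whose column name survives pruning,
    // preserving the original column order.
    static List<String> pruneSchema(List<String> schema, Set<String> keptColumns) {
        return schema.stream()
                .filter(keptColumns::contains)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> schema = Arrays.asList("id", "name", "payload", "ts");
        Set<String> kept = new HashSet<>(Arrays.asList("id", "ts"));
        // After pruning, the schema should list only the surviving columns: [id, ts].
        System.out.println(pruneSchema(schema, kept));
    }
}
```

The bug is that some operators apply an update like this while others keep the stale, pre-pruning schema.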
[jira] [Work started] (HIVE-8869) RowSchema not updated for some ops when columns are pruned
[ https://issues.apache.org/jira/browse/HIVE-8869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-8869 started by Jesús Camacho Rodríguez. - RowSchema not updated for some ops when columns are pruned -- Key: HIVE-8869 URL: https://issues.apache.org/jira/browse/HIVE-8869 Project: Hive Issue Type: Bug Affects Versions: 0.14.0, 0.15.0 Reporter: Jesús Camacho Rodríguez Assignee: Jesús Camacho Rodríguez When columns are pruned in ColumnPrunerProcFactory, the behavior of updating the row schema is not consistent across operators: some will update their RowSchema, while others will not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8862) Fix ordering differences on TestParse tests due to Java8
[ https://issues.apache.org/jira/browse/HIVE-8862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-8862: -- Status: Open (was: Patch Available) Fix ordering differences on TestParse tests due to Java8 --- Key: HIVE-8862 URL: https://issues.apache.org/jira/browse/HIVE-8862 Project: Hive Issue Type: Sub-task Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-8862.1.patch, HIVE-8862.2.patch This bug is related to HIVE-8607. All TestParse tests are failing on Java8 due to XML serialization incompatibilities with JDK7. These serialization issues are just ordering differences in the XML files generated with JDK7, caused by the hash function of HashMap/HashSet. In order to fix this, we should use LinkedHashMap/LinkedHashSet instead, so we can get the correct ordering. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8862) Fix ordering differences on TestParse tests due to Java8
[ https://issues.apache.org/jira/browse/HIVE-8862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-8862: -- Status: Patch Available (was: Open) Fix ordering differences on TestParse tests due to Java8 --- Key: HIVE-8862 URL: https://issues.apache.org/jira/browse/HIVE-8862 Project: Hive Issue Type: Sub-task Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-8862.1.patch, HIVE-8862.2.patch This bug is related to HIVE-8607. All TestParse tests are failing on Java8 due to XML serialization incompatibilities with JDK7. These serialization issues are just ordering differences in the XML files generated with JDK7, caused by the hash function of HashMap/HashSet. In order to fix this, we should use LinkedHashMap/LinkedHashSet instead, so we can get the correct ordering. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8869) RowSchema not updated for some ops when columns are pruned
[ https://issues.apache.org/jira/browse/HIVE-8869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesús Camacho Rodríguez updated HIVE-8869: -- Status: Patch Available (was: In Progress) The row schemas of the lateral view and table scan operators are now updated after column pruning is applied; thus, the schema conforms to the tuples that go through the operator, which is important, e.g., for HIVE-8435. As a side effect, the statistics for the lateral views change, so I have uploaded the changes in those test files. [~ashutoshc], can you check it please? RowSchema not updated for some ops when columns are pruned -- Key: HIVE-8869 URL: https://issues.apache.org/jira/browse/HIVE-8869 Project: Hive Issue Type: Bug Affects Versions: 0.14.0, 0.15.0 Reporter: Jesús Camacho Rodríguez Assignee: Jesús Camacho Rodríguez When columns are pruned in ColumnPrunerProcFactory, the behavior of updating the row schema is not consistent across operators: some will update their RowSchema, while others will not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8869) RowSchema not updated for some ops when columns are pruned
[ https://issues.apache.org/jira/browse/HIVE-8869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesús Camacho Rodríguez updated HIVE-8869: -- Attachment: HIVE-8869.patch RowSchema not updated for some ops when columns are pruned -- Key: HIVE-8869 URL: https://issues.apache.org/jira/browse/HIVE-8869 Project: Hive Issue Type: Bug Affects Versions: 0.14.0, 0.15.0 Reporter: Jesús Camacho Rodríguez Assignee: Jesús Camacho Rodríguez Attachments: HIVE-8869.patch When columns are pruned in ColumnPrunerProcFactory, the behavior of updating the row schema is not consistent across operators: some will update their RowSchema, while others will not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8869) RowSchema not updated for some ops when columns are pruned
[ https://issues.apache.org/jira/browse/HIVE-8869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesús Camacho Rodríguez updated HIVE-8869: -- Attachment: HIVE-8869.01.patch When we update the schema of the TableScan, we need to check whether the prunecols list is null or empty. RowSchema not updated for some ops when columns are pruned -- Key: HIVE-8869 URL: https://issues.apache.org/jira/browse/HIVE-8869 Project: Hive Issue Type: Bug Affects Versions: 0.14.0, 0.15.0 Reporter: Jesús Camacho Rodríguez Assignee: Jesús Camacho Rodríguez Attachments: HIVE-8869.01.patch, HIVE-8869.patch When columns are pruned in ColumnPrunerProcFactory, the behavior of updating the row schema is not consistent across operators: some will update their RowSchema, while others will not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8869) RowSchema not updated for some ops when columns are pruned
[ https://issues.apache.org/jira/browse/HIVE-8869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesús Camacho Rodríguez updated HIVE-8869: -- Attachment: HIVE-8869.01.patch RowSchema not updated for some ops when columns are pruned -- Key: HIVE-8869 URL: https://issues.apache.org/jira/browse/HIVE-8869 Project: Hive Issue Type: Bug Affects Versions: 0.14.0, 0.15.0 Reporter: Jesús Camacho Rodríguez Assignee: Jesús Camacho Rodríguez Attachments: HIVE-8869.01.patch, HIVE-8869.patch When columns are pruned in ColumnPrunerProcFactory, the behavior of updating the row schema is not consistent across operators: some will update their RowSchema, while others will not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8869) RowSchema not updated for some ops when columns are pruned
[ https://issues.apache.org/jira/browse/HIVE-8869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesús Camacho Rodríguez updated HIVE-8869: -- Attachment: (was: HIVE-8869.01.patch) RowSchema not updated for some ops when columns are pruned -- Key: HIVE-8869 URL: https://issues.apache.org/jira/browse/HIVE-8869 Project: Hive Issue Type: Bug Affects Versions: 0.14.0, 0.15.0 Reporter: Jesús Camacho Rodríguez Assignee: Jesús Camacho Rodríguez Attachments: HIVE-8869.01.patch, HIVE-8869.patch When columns are pruned in ColumnPrunerProcFactory, the behavior of updating the row schema is not consistent across operators: some will update their RowSchema, while others will not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8896) expose (hadoop/tez) job ids in API
André Kelpe created HIVE-8896: - Summary: expose (hadoop/tez) job ids in API Key: HIVE-8896 URL: https://issues.apache.org/jira/browse/HIVE-8896 Project: Hive Issue Type: Improvement Components: Clients Reporter: André Kelpe In many cases it would be very useful to be able to map the hadoop/tez jobs back to the query that was executed or is currently being executed. Especially when Hive queries are run within a bigger process, the ability to get the job IDs and query for counters is very beneficial to projects embedding Hive. I saw that Cloudera's Hue parses the logs produced by Hive in order to get to the job IDs. That seems rather brittle and can easily break whenever the log format changes. Exposing the job IDs in the API would make it a lot easier to build integrations like Hue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
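The log-scraping approach the reporter describes can be sketched as follows. The log line below is invented for illustration, which is exactly the point: the regex survives only as long as job IDs keep appearing in the output, since nothing in the log format is a stable contract.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JobIdScraper {
    // Hadoop job IDs have the form job_<cluster timestamp>_<sequence number>.
    private static final Pattern JOB_ID = Pattern.compile("job_\\d+_\\d+");

    // Collect every job ID that appears anywhere in the given log text.
    static List<String> extractJobIds(String logText) {
        List<String> ids = new ArrayList<>();
        Matcher m = JOB_ID.matcher(logText);
        while (m.find()) {
            ids.add(m.group());
        }
        return ids;
    }

    public static void main(String[] args) {
        // Invented log line; real formats vary by Hive version, which is the problem.
        String log = "Starting Job = job_201411170921_0042, Tracking URL = ...";
        System.out.println(extractJobIds(log)); // [job_201411170921_0042]
    }
}
```

A proper API would return the job IDs as structured data, making this kind of regex scraping unnecessary.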
[jira] [Updated] (HIVE-8869) RowSchema not updated for some ops when columns are pruned
[ https://issues.apache.org/jira/browse/HIVE-8869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesús Camacho Rodríguez updated HIVE-8869: -- Attachment: HIVE-8869.02.patch It seems some readers (Accumulo, Vectorization) assume that the schema should contain all the columns of the tuples that are read, even if the columns are pruned... I changed the patch so the schema of the TableScan is not changed. RowSchema not updated for some ops when columns are pruned -- Key: HIVE-8869 URL: https://issues.apache.org/jira/browse/HIVE-8869 Project: Hive Issue Type: Bug Affects Versions: 0.14.0, 0.15.0 Reporter: Jesús Camacho Rodríguez Assignee: Jesús Camacho Rodríguez Attachments: HIVE-8869.01.patch, HIVE-8869.02.patch, HIVE-8869.patch When columns are pruned in ColumnPrunerProcFactory, the behavior of updating the row schema is not consistent across operators: some will update their RowSchema, while others will not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8359) Map containing null values are not correctly written in Parquet files
[ https://issues.apache.org/jira/browse/HIVE-8359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-8359: -- Status: Open (was: Patch Available) Map containing null values are not correctly written in Parquet files - Key: HIVE-8359 URL: https://issues.apache.org/jira/browse/HIVE-8359 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.13.1 Reporter: Frédéric TERRAZZONI Assignee: Sergio Peña Attachments: HIVE-8359.1.patch, HIVE-8359.2.patch, HIVE-8359.4.patch, map_null_val.avro Tried to write a map<string,string> column in a Parquet file. The table should contain: {code} {key3:val3,key4:null} {key3:val3,key4:null} {key1:null,key2:val2} {key3:val3,key4:null} {key3:val3,key4:null} {code} ... and when you do a query like {code}SELECT * from mytable{code} we can see that the table is corrupted: {code} {key3:val3} {key4:val3} {key3:val2} {key4:val3} {key1:val3} {code} I've not been able to read the Parquet file in our software afterwards, and consequently I suspect it to be corrupted. For those who are interested, I generated this Parquet table from an Avro file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8359) Map containing null values are not correctly written in Parquet files
[ https://issues.apache.org/jira/browse/HIVE-8359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-8359: -- Status: Patch Available (was: Open) Map containing null values are not correctly written in Parquet files - Key: HIVE-8359 URL: https://issues.apache.org/jira/browse/HIVE-8359 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.13.1 Reporter: Frédéric TERRAZZONI Assignee: Sergio Peña Attachments: HIVE-8359.1.patch, HIVE-8359.2.patch, HIVE-8359.4.patch, map_null_val.avro Tried to write a map<string,string> column in a Parquet file. The table should contain: {code} {key3:val3,key4:null} {key3:val3,key4:null} {key1:null,key2:val2} {key3:val3,key4:null} {key3:val3,key4:null} {code} ... and when you do a query like {code}SELECT * from mytable{code} we can see that the table is corrupted: {code} {key3:val3} {key4:val3} {key3:val2} {key4:val3} {key1:val3} {code} I've not been able to read the Parquet file in our software afterwards, and consequently I suspect it to be corrupted. For those who are interested, I generated this Parquet table from an Avro file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8359) Map containing null values are not correctly written in Parquet files
[ https://issues.apache.org/jira/browse/HIVE-8359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-8359: -- Attachment: HIVE-8359.4.patch Map containing null values are not correctly written in Parquet files - Key: HIVE-8359 URL: https://issues.apache.org/jira/browse/HIVE-8359 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.13.1 Reporter: Frédéric TERRAZZONI Assignee: Sergio Peña Attachments: HIVE-8359.1.patch, HIVE-8359.2.patch, HIVE-8359.4.patch, map_null_val.avro Tried to write a map<string,string> column in a Parquet file. The table should contain: {code} {key3:val3,key4:null} {key3:val3,key4:null} {key1:null,key2:val2} {key3:val3,key4:null} {key3:val3,key4:null} {code} ... and when you do a query like {code}SELECT * from mytable{code} we can see that the table is corrupted: {code} {key3:val3} {key4:val3} {key3:val2} {key4:val3} {key1:val3} {code} I've not been able to read the Parquet file in our software afterwards, and consequently I suspect it to be corrupted. For those who are interested, I generated this Parquet table from an Avro file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8750) Commit initial encryption work
[ https://issues.apache.org/jira/browse/HIVE-8750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-8750: -- Status: Patch Available (was: In Progress) Commit initial encryption work -- Key: HIVE-8750 URL: https://issues.apache.org/jira/browse/HIVE-8750 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Sergio Peña Attachments: HIVE-8750.1.patch I believe Sergio has some work done for encryption. In this item we'll commit it to branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8750) Commit initial encryption work
[ https://issues.apache.org/jira/browse/HIVE-8750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-8750: -- Attachment: HIVE-8750.1.patch Commit initial encryption work -- Key: HIVE-8750 URL: https://issues.apache.org/jira/browse/HIVE-8750 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Sergio Peña Attachments: HIVE-8750.1.patch I believe Sergio has some work done for encryption. In this item we'll commit it to branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8869) RowSchema not updated for some ops when columns are pruned
[ https://issues.apache.org/jira/browse/HIVE-8869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14215212#comment-14215212 ] Jesús Camacho Rodríguez commented on HIVE-8869: --- [~gopalv], the last version of the patch does not change anything for the schema of the TableScan operator, so nothing will break; it just takes into account column pruning with respect to lateral views. RowSchema not updated for some ops when columns are pruned -- Key: HIVE-8869 URL: https://issues.apache.org/jira/browse/HIVE-8869 Project: Hive Issue Type: Bug Affects Versions: 0.14.0, 0.15.0 Reporter: Jesús Camacho Rodríguez Assignee: Jesús Camacho Rodríguez Fix For: 0.15.0 Attachments: HIVE-8869.01.patch, HIVE-8869.02.patch, HIVE-8869.patch When columns are pruned in ColumnPrunerProcFactory, the behavior of updating the row schema is not consistent across operators: some will update their RowSchema, while others will not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8359) Map containing null values are not correctly written in Parquet files
[ https://issues.apache.org/jira/browse/HIVE-8359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14216336#comment-14216336 ] Sergio Peña commented on HIVE-8359: --- Thanks [~mickaellcr]. Sorry for the confusion. I did not see that you uploaded another patch here. I just added two extra lines to the patch you uploaded. I will integrate your fixes there and upload the patch again. Map containing null values are not correctly written in Parquet files - Key: HIVE-8359 URL: https://issues.apache.org/jira/browse/HIVE-8359 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.13.1 Reporter: Frédéric TERRAZZONI Assignee: Sergio Peña Attachments: HIVE-8359.1.patch, HIVE-8359.2.patch, HIVE-8359.4.patch, map_null_val.avro Tried to write a map<string,string> column in a Parquet file. The table should contain: {code} {key3:val3,key4:null} {key3:val3,key4:null} {key1:null,key2:val2} {key3:val3,key4:null} {key3:val3,key4:null} {code} ... and when you do a query like {code}SELECT * from mytable{code} we can see that the table is corrupted: {code} {key3:val3} {key4:val3} {key3:val2} {key4:val3} {key1:val3} {code} I've not been able to read the Parquet file in our software afterwards, and consequently I suspect it to be corrupted. For those who are interested, I generated this Parquet table from an Avro file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8359) Map containing null values are not correctly written in Parquet files
[ https://issues.apache.org/jira/browse/HIVE-8359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-8359: -- Status: Open (was: Patch Available) Map containing null values are not correctly written in Parquet files - Key: HIVE-8359 URL: https://issues.apache.org/jira/browse/HIVE-8359 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.13.1 Reporter: Frédéric TERRAZZONI Assignee: Sergio Peña Attachments: HIVE-8359.1.patch, HIVE-8359.2.patch, HIVE-8359.4.patch, HIVE-8359.5.patch, map_null_val.avro Tried to write a map<string,string> column in a Parquet file. The table should contain: {code} {key3:val3,key4:null} {key3:val3,key4:null} {key1:null,key2:val2} {key3:val3,key4:null} {key3:val3,key4:null} {code} ... and when you do a query like {code}SELECT * from mytable{code} we can see that the table is corrupted: {code} {key3:val3} {key4:val3} {key3:val2} {key4:val3} {key1:val3} {code} I've not been able to read the Parquet file in our software afterwards, and consequently I suspect it to be corrupted. For those who are interested, I generated this Parquet table from an Avro file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8359) Map containing null values are not correctly written in Parquet files
[ https://issues.apache.org/jira/browse/HIVE-8359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-8359: -- Status: Patch Available (was: Open) Map containing null values are not correctly written in Parquet files - Key: HIVE-8359 URL: https://issues.apache.org/jira/browse/HIVE-8359 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.13.1 Reporter: Frédéric TERRAZZONI Assignee: Sergio Peña Attachments: HIVE-8359.1.patch, HIVE-8359.2.patch, HIVE-8359.4.patch, HIVE-8359.5.patch, map_null_val.avro Tried to write a map<string,string> column in a Parquet file. The table should contain: {code} {key3:val3,key4:null} {key3:val3,key4:null} {key1:null,key2:val2} {key3:val3,key4:null} {key3:val3,key4:null} {code} ... and when you do a query like {code}SELECT * from mytable{code} we can see that the table is corrupted: {code} {key3:val3} {key4:val3} {key3:val2} {key4:val3} {key1:val3} {code} I've not been able to read the Parquet file in our software afterwards, and consequently I suspect it to be corrupted. For those who are interested, I generated this Parquet table from an Avro file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8359) Map containing null values are not correctly written in Parquet files
[ https://issues.apache.org/jira/browse/HIVE-8359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-8359: -- Attachment: HIVE-8359.5.patch Attaching a new patch that integrates Mickael Lacour's HIVE-6994 fix. Map containing null values are not correctly written in Parquet files - Key: HIVE-8359 URL: https://issues.apache.org/jira/browse/HIVE-8359 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.13.1 Reporter: Frédéric TERRAZZONI Assignee: Sergio Peña Attachments: HIVE-8359.1.patch, HIVE-8359.2.patch, HIVE-8359.4.patch, HIVE-8359.5.patch, map_null_val.avro Tried to write a map<string,string> column in a Parquet file. The table should contain: {code} {key3:val3,key4:null} {key3:val3,key4:null} {key1:null,key2:val2} {key3:val3,key4:null} {key3:val3,key4:null} {code} ... and when you do a query like {code}SELECT * from mytable{code} we can see that the table is corrupted: {code} {key3:val3} {key4:val3} {key3:val2} {key4:val3} {key1:val3} {code} I've not been able to read the Parquet file in our software afterwards, and consequently I suspect it to be corrupted. For those who are interested, I generated this Parquet table from an Avro file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8435) Add identity project remover optimization
[ https://issues.apache.org/jira/browse/HIVE-8435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesús Camacho Rodríguez updated HIVE-8435: -- Attachment: HIVE-8435.08.patch Starting over. Add identity project remover optimization - Key: HIVE-8435 URL: https://issues.apache.org/jira/browse/HIVE-8435 Project: Hive Issue Type: New Feature Components: Logical Optimizer Affects Versions: 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0 Reporter: Ashutosh Chauhan Assignee: Jesús Camacho Rodríguez Attachments: HIVE-8435.02.patch, HIVE-8435.03.patch, HIVE-8435.03.patch, HIVE-8435.04.patch, HIVE-8435.05.patch, HIVE-8435.05.patch, HIVE-8435.06.patch, HIVE-8435.07.patch, HIVE-8435.08.patch, HIVE-8435.1.patch, HIVE-8435.patch In some cases there is an identity project in the plan which is useless. Better to optimize it away to avoid evaluating it at runtime without any benefit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8435) Add identity project remover optimization
[ https://issues.apache.org/jira/browse/HIVE-8435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesús Camacho Rodríguez updated HIVE-8435: -- Attachment: HIVE-8435.09.patch This patch is the same as .08, but it contains the changes in the test result files. Add identity project remover optimization - Key: HIVE-8435 URL: https://issues.apache.org/jira/browse/HIVE-8435 Project: Hive Issue Type: New Feature Components: Logical Optimizer Affects Versions: 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0 Reporter: Ashutosh Chauhan Assignee: Jesús Camacho Rodríguez Attachments: HIVE-8435.02.patch, HIVE-8435.03.patch, HIVE-8435.03.patch, HIVE-8435.04.patch, HIVE-8435.05.patch, HIVE-8435.05.patch, HIVE-8435.06.patch, HIVE-8435.07.patch, HIVE-8435.08.patch, HIVE-8435.09.patch, HIVE-8435.1.patch, HIVE-8435.patch In some cases there is an identity project in the plan which is useless. Better to optimize it away to avoid evaluating it at runtime without any benefit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6914) parquet-hive cannot write nested map (map value is map)
[ https://issues.apache.org/jira/browse/HIVE-6914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218250#comment-14218250 ] Sergio Peña commented on HIVE-6914: --- Hi [~mickaellcr], It sounds good if you use the patch from HIVE-8359 for this bug. Regarding adding the qtests to HIVE-8909, I think that ticket is meant to fix the reading part of the different nested type formats generated by the Thrift and Avro tools (it does not touch the writing part); so I think it would be good to keep these writing tests separate from the reading tests. parquet-hive cannot write nested map (map value is map) --- Key: HIVE-6914 URL: https://issues.apache.org/jira/browse/HIVE-6914 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.12.0, 0.13.0 Reporter: Tongjie Chen Labels: parquet, serialization Attachments: HIVE-6914.1.patch, HIVE-6914.2.patch // table schema (identical for both plain text version and parquet version) hive> desc text_mmap; m map<string,map<string,string>> // sample nested map entry {level1:{level2_key1:value1,level2_key2:value2}} The following query will fail: insert overwrite table parquet_mmap select * from text_mmap; Caused by: parquet.io.ParquetEncodingException: This should be an ArrayWritable or MapWritable: org.apache.hadoop.hive.ql.io.parquet.writable.BinaryWritable@f2f8106 at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeData(DataWritableWriter.java:85) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeArray(DataWritableWriter.java:118) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeData(DataWritableWriter.java:80) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeData(DataWritableWriter.java:82) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:55) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:59) at
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:31) at parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:115) at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:81) at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:37) at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:77) at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:90) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:622) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540) ... 9 more -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8909) Hive doesn't correctly read Parquet nested types
[ https://issues.apache.org/jira/browse/HIVE-8909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218252#comment-14218252 ] Sergio Peña commented on HIVE-8909: --- [~rdblue], is this ticket related to the different nested types described in this document? https://github.com/rdblue/incubator-parquet-format/blob/PARQUET-113-add-list-and-map-spec/LogicalTypes.md Hive doesn't correctly read Parquet nested types Key: HIVE-8909 URL: https://issues.apache.org/jira/browse/HIVE-8909 Project: Hive Issue Type: Bug Reporter: Ryan Blue Assignee: Ryan Blue Attachments: HIVE-8909-1.patch Parquet's Avro and Thrift object models don't produce the same parquet type representation for lists and maps that Hive does. In the Parquet community, we've defined what should be written and backward-compatibility rules for existing data written by parquet-avro and parquet-thrift in PARQUET-113. We need to implement those rules in the Hive Converter classes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8745) Joins on decimal keys return different results whether they are run as reduce join or map join
[ https://issues.apache.org/jira/browse/HIVE-8745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218342#comment-14218342 ] Sergio Peña commented on HIVE-8745: --- [~leftylev] I believe you added a statement to the documentation for the HIVE-7373 fix; but this patch reverts the trailing-zeroes fix, so you might want to revert that documentation statement as well. Joins on decimal keys return different results whether they are run as reduce join or map join -- Key: HIVE-8745 URL: https://issues.apache.org/jira/browse/HIVE-8745 Project: Hive Issue Type: Bug Components: Types Affects Versions: 0.14.0 Reporter: Gunther Hagleitner Assignee: Jason Dere Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8745.1.patch, HIVE-8745.2.patch, HIVE-8745.3.patch, join_test.q See the attached .q file to reproduce. The difference seems to be whether trailing 0s are considered the same value or not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
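The reduce-join/map-join discrepancy comes down to whether 1.0 and 1.00 serialize to the same join key. A hedged illustration with Python's Decimal (a stand-in for HiveDecimal, not Hive's actual key serialization):

```python
from decimal import Decimal

a, b = Decimal("42.1"), Decimal("42.10")
assert a == b            # numerically equal
assert str(a) != str(b)  # textual forms keep the trailing zero

# A join that groups rows by the serialized form of the key treats
# the two values as different keys and silently drops the match.
left = {str(a): "left_row"}
print(str(b) in left)  # False
```

If one join strategy compares keys numerically and the other compares serialized bytes, rows with trailing zeros match in one and not the other, which is exactly the symptom in the title.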
[jira] [Created] (HIVE-8918) Beeline terminal cannot be initialized due to jline2 change
Sergio Peña created HIVE-8918: - Summary: Beeline terminal cannot be initialized due to jline2 change Key: HIVE-8918 URL: https://issues.apache.org/jira/browse/HIVE-8918 Project: Hive Issue Type: Bug Affects Versions: 0.15.0 Reporter: Sergio Peña I fetched the latest changes from trunk, and I got the following error when attempting to execute beeline: {noformat} ERROR] Terminal initialization failed; falling back to unsupported java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected at jline.TerminalFactory.create(TerminalFactory.java:101) at jline.TerminalFactory.get(TerminalFactory.java:158) at org.apache.hive.beeline.BeeLineOpts.init(BeeLineOpts.java:73) at org.apache.hive.beeline.BeeLine.init(BeeLine.java:117) at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:469) at org.apache.hive.beeline.BeeLine.main(BeeLine.java:453) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) Exception in thread main java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected at org.apache.hive.beeline.BeeLineOpts.init(BeeLineOpts.java:101) at org.apache.hive.beeline.BeeLine.init(BeeLine.java:117) at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:469) at org.apache.hive.beeline.BeeLine.main(BeeLine.java:453) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at 
org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) {noformat} I executed the following command: {noformat} hive --service beeline -u jdbc:hive2://localhost:1 -n sergio {noformat} The commit before the jline2 change works fine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8918) Beeline terminal cannot be initialized due to jline2 change
[ https://issues.apache.org/jira/browse/HIVE-8918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218574#comment-14218574 ] Sergio Peña commented on HIVE-8918: --- FYI [~Ferd]. You worked on moving jline2, so you might have some ideas about what is happening. Beeline terminal cannot be initialized due to jline2 change --- Key: HIVE-8918 URL: https://issues.apache.org/jira/browse/HIVE-8918 Project: Hive Issue Type: Bug Affects Versions: 0.15.0 Reporter: Sergio Peña I fetched the latest changes from trunk, and I got the following error when attempting to execute beeline: {noformat} ERROR] Terminal initialization failed; falling back to unsupported java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected at jline.TerminalFactory.create(TerminalFactory.java:101) at jline.TerminalFactory.get(TerminalFactory.java:158) at org.apache.hive.beeline.BeeLineOpts.init(BeeLineOpts.java:73) at org.apache.hive.beeline.BeeLine.init(BeeLine.java:117) at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:469) at org.apache.hive.beeline.BeeLine.main(BeeLine.java:453) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) Exception in thread main java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected at org.apache.hive.beeline.BeeLineOpts.init(BeeLineOpts.java:101) at org.apache.hive.beeline.BeeLine.init(BeeLine.java:117) at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:469) at org.apache.hive.beeline.BeeLine.main(BeeLine.java:453) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) {noformat} I executed the following command: {noformat} hive --service beeline -u jdbc:hive2://localhost:1 -n sergio {noformat} The commit before the jline2 change works fine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8919) Fix FileUtils.copy() method to call distcp only for HDFS files (not local files)
Sergio Peña created HIVE-8919: - Summary: Fix FileUtils.copy() method to call distcp only for HDFS files (not local files) Key: HIVE-8919 URL: https://issues.apache.org/jira/browse/HIVE-8919 Project: Hive Issue Type: Sub-task Reporter: Sergio Peña Assignee: Sergio Peña When loading a big file (> 32Mb) from the local filesystem to HDFS, Hive fails because the local filesystem cannot load the 'distcp' class. The 'distcp' class is used only by the HDFS filesystem. We should use distcp only when copying files between HDFS filesystems. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
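The guard the ticket describes reduces to a scheme-and-size check before choosing distcp. A minimal sketch under that assumption (hypothetical names and threshold; the real fix lives in FileUtils.copy()):

```python
DISTCP_THRESHOLD = 32 * 1024 * 1024  # 32 MB, illustrative value

def should_use_distcp(src_scheme, dst_scheme, size, threshold=DISTCP_THRESHOLD):
    # distcp is an HDFS-to-HDFS tool; a local source or destination
    # must always take the regular FileSystem copy path, whatever the size.
    return src_scheme == "hdfs" and dst_scheme == "hdfs" and size > threshold

assert should_use_distcp("hdfs", "hdfs", 64 * 1024 * 1024)
assert not should_use_distcp("file", "hdfs", 64 * 1024 * 1024)  # local source
assert not should_use_distcp("hdfs", "hdfs", 1024)              # small file
```

The design point is that the size threshold alone is not enough: the filesystem scheme has to gate the distcp path first.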
[jira] [Work started] (HIVE-8919) Fix FileUtils.copy() method to call distcp only for HDFS files (not local files)
[ https://issues.apache.org/jira/browse/HIVE-8919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-8919 started by Sergio Peña. - Fix FileUtils.copy() method to call distcp only for HDFS files (not local files) Key: HIVE-8919 URL: https://issues.apache.org/jira/browse/HIVE-8919 Project: Hive Issue Type: Sub-task Reporter: Sergio Peña Assignee: Sergio Peña When loading a big file (> 32Mb) from the local filesystem to HDFS, Hive fails because the local filesystem cannot load the 'distcp' class. The 'distcp' class is used only by the HDFS filesystem. We should use distcp only when copying files between HDFS filesystems. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8919) Fix FileUtils.copy() method to call distcp only for HDFS files (not local files)
[ https://issues.apache.org/jira/browse/HIVE-8919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-8919: -- Status: Patch Available (was: In Progress) Fix FileUtils.copy() method to call distcp only for HDFS files (not local files) Key: HIVE-8919 URL: https://issues.apache.org/jira/browse/HIVE-8919 Project: Hive Issue Type: Sub-task Reporter: Sergio Peña Assignee: Sergio Peña Attachments: CDH-23392.1.patch When loading a big file (> 32Mb) from the local filesystem to HDFS, Hive fails because the local filesystem cannot load the 'distcp' class. The 'distcp' class is used only by the HDFS filesystem. We should use distcp only when copying files between HDFS filesystems. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8919) Fix FileUtils.copy() method to call distcp only for HDFS files (not local files)
[ https://issues.apache.org/jira/browse/HIVE-8919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-8919: -- Attachment: CDH-23392.1.patch Fix FileUtils.copy() method to call distcp only for HDFS files (not local files) Key: HIVE-8919 URL: https://issues.apache.org/jira/browse/HIVE-8919 Project: Hive Issue Type: Sub-task Reporter: Sergio Peña Assignee: Sergio Peña Attachments: CDH-23392.1.patch When loading a big file (> 32Mb) from the local filesystem to HDFS, Hive fails because the local filesystem cannot load the 'distcp' class. The 'distcp' class is used only by the HDFS filesystem. We should use distcp only when copying files between HDFS filesystems. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-8918) Beeline terminal cannot be initialized due to jline2 change
[ https://issues.apache.org/jira/browse/HIVE-8918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña resolved HIVE-8918. --- Resolution: Invalid Beeline terminal cannot be initialized due to jline2 change --- Key: HIVE-8918 URL: https://issues.apache.org/jira/browse/HIVE-8918 Project: Hive Issue Type: Bug Affects Versions: 0.15.0 Reporter: Sergio Peña I fetched the latest changes from trunk, and I got the following error when attempting to execute beeline: {noformat} ERROR] Terminal initialization failed; falling back to unsupported java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected at jline.TerminalFactory.create(TerminalFactory.java:101) at jline.TerminalFactory.get(TerminalFactory.java:158) at org.apache.hive.beeline.BeeLineOpts.init(BeeLineOpts.java:73) at org.apache.hive.beeline.BeeLine.init(BeeLine.java:117) at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:469) at org.apache.hive.beeline.BeeLine.main(BeeLine.java:453) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) Exception in thread main java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected at org.apache.hive.beeline.BeeLineOpts.init(BeeLineOpts.java:101) at org.apache.hive.beeline.BeeLine.init(BeeLine.java:117) at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:469) at org.apache.hive.beeline.BeeLine.main(BeeLine.java:453) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) {noformat} I executed the following command: {noformat} hive --service beeline -u jdbc:hive2://localhost:1 -n sergio {noformat} The commit before the jline2 change works fine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8918) Beeline terminal cannot be initialized due to jline2 change
[ https://issues.apache.org/jira/browse/HIVE-8918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218979#comment-14218979 ] Sergio Peña commented on HIVE-8918: --- Thanks [~Ferd]. That was the problem. I deleted the jline-0.9.94.jar found on the hadoop lib directory and it worked. Beeline terminal cannot be initialized due to jline2 change --- Key: HIVE-8918 URL: https://issues.apache.org/jira/browse/HIVE-8918 Project: Hive Issue Type: Bug Affects Versions: 0.15.0 Reporter: Sergio Peña I fetched the latest changes from trunk, and I got the following error when attempting to execute beeline: {noformat} ERROR] Terminal initialization failed; falling back to unsupported java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected at jline.TerminalFactory.create(TerminalFactory.java:101) at jline.TerminalFactory.get(TerminalFactory.java:158) at org.apache.hive.beeline.BeeLineOpts.init(BeeLineOpts.java:73) at org.apache.hive.beeline.BeeLine.init(BeeLine.java:117) at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:469) at org.apache.hive.beeline.BeeLine.main(BeeLine.java:453) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) Exception in thread main java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected at org.apache.hive.beeline.BeeLineOpts.init(BeeLineOpts.java:101) at org.apache.hive.beeline.BeeLine.init(BeeLine.java:117) at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:469) at org.apache.hive.beeline.BeeLine.main(BeeLine.java:453) at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) {noformat} I executed the following command: {noformat} hive --service beeline -u jdbc:hive2://localhost:1 -n sergio {noformat} The commit before the jline2 change works fine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
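The root cause reported in this thread is classpath shadowing: a pre-2.x jline jar supplies jline.Terminal as a class, while BeeLine expects the jline2 interface. A small sketch of how to spot and remove the stale jar, simulated here with a temporary lib directory (real installs would search the Hadoop lib directory instead):

```python
import pathlib
import tempfile

# Simulate a Hadoop lib directory that carries both jline generations.
libdir = pathlib.Path(tempfile.mkdtemp())
for jar in ("jline-0.9.94.jar", "jline-2.12.jar"):
    (libdir / jar).touch()

# Pre-2.x jline jars supply jline.Terminal as a class and shadow the
# jline2 interface BeeLine expects; those are the ones to delete.
stale = sorted(p.name for p in libdir.glob("jline-0.*.jar"))
print(stale)  # ['jline-0.9.94.jar']
for p in libdir.glob("jline-0.*.jar"):
    p.unlink()

remaining = sorted(p.name for p in libdir.glob("*.jar"))
print(remaining)  # ['jline-2.12.jar']
```

This mirrors the resolution in the comment below the stack trace: deleting the old jline-0.9.94.jar from the Hadoop lib directory lets the jline2 classes load cleanly.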
[jira] [Reopened] (HIVE-8435) Add identity project remover optimization
[ https://issues.apache.org/jira/browse/HIVE-8435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesús Camacho Rodríguez reopened HIVE-8435: --- Add identity project remover optimization - Key: HIVE-8435 URL: https://issues.apache.org/jira/browse/HIVE-8435 Project: Hive Issue Type: New Feature Components: Logical Optimizer Affects Versions: 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0 Reporter: Ashutosh Chauhan Assignee: Jesús Camacho Rodríguez Labels: TODOC15 Fix For: 0.15.0 Attachments: HIVE-8435.02.patch, HIVE-8435.03.patch, HIVE-8435.03.patch, HIVE-8435.04.patch, HIVE-8435.05.patch, HIVE-8435.05.patch, HIVE-8435.06.patch, HIVE-8435.07.patch, HIVE-8435.08.patch, HIVE-8435.09.patch, HIVE-8435.1.patch, HIVE-8435.10.patch, HIVE-8435.patch In some cases there is an identity project in plan which is useless. Better to optimize it away to avoid evaluating it without any benefit at runtime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8435) Add identity project remover optimization
[ https://issues.apache.org/jira/browse/HIVE-8435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesús Camacho Rodríguez updated HIVE-8435: -- Attachment: (was: HIVE-8435.10.patch) Add identity project remover optimization - Key: HIVE-8435 URL: https://issues.apache.org/jira/browse/HIVE-8435 Project: Hive Issue Type: New Feature Components: Logical Optimizer Affects Versions: 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0 Reporter: Ashutosh Chauhan Assignee: Jesús Camacho Rodríguez Labels: TODOC15 Fix For: 0.15.0 Attachments: HIVE-8435.02.patch, HIVE-8435.03.patch, HIVE-8435.03.patch, HIVE-8435.04.patch, HIVE-8435.05.patch, HIVE-8435.05.patch, HIVE-8435.06.patch, HIVE-8435.07.patch, HIVE-8435.08.patch, HIVE-8435.09.patch, HIVE-8435.1.patch, HIVE-8435.10.patch, HIVE-8435.patch In some cases there is an identity project in plan which is useless. Better to optimize it away to avoid evaluating it without any benefit at runtime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8435) Add identity project remover optimization
[ https://issues.apache.org/jira/browse/HIVE-8435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesús Camacho Rodríguez updated HIVE-8435: -- Attachment: HIVE-8435.10.patch I added an additional check to detect the case where the SelectOp keeps all the input columns but swaps their order; in that case, it should not be removed from the plan. Let's see how the tests go. Add identity project remover optimization - Key: HIVE-8435 URL: https://issues.apache.org/jira/browse/HIVE-8435 Project: Hive Issue Type: New Feature Components: Logical Optimizer Affects Versions: 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0 Reporter: Ashutosh Chauhan Assignee: Jesús Camacho Rodríguez Labels: TODOC15 Fix For: 0.15.0 Attachments: HIVE-8435.02.patch, HIVE-8435.03.patch, HIVE-8435.03.patch, HIVE-8435.04.patch, HIVE-8435.05.patch, HIVE-8435.05.patch, HIVE-8435.06.patch, HIVE-8435.07.patch, HIVE-8435.08.patch, HIVE-8435.09.patch, HIVE-8435.1.patch, HIVE-8435.10.patch, HIVE-8435.patch In some cases there is an identity project in plan which is useless. Better to optimize it away to avoid evaluating it without any benefit at runtime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8435) Add identity project remover optimization
[ https://issues.apache.org/jira/browse/HIVE-8435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesús Camacho Rodríguez updated HIVE-8435: -- Attachment: HIVE-8435.10.patch Add identity project remover optimization - Key: HIVE-8435 URL: https://issues.apache.org/jira/browse/HIVE-8435 Project: Hive Issue Type: New Feature Components: Logical Optimizer Affects Versions: 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0 Reporter: Ashutosh Chauhan Assignee: Jesús Camacho Rodríguez Labels: TODOC15 Fix For: 0.15.0 Attachments: HIVE-8435.02.patch, HIVE-8435.03.patch, HIVE-8435.03.patch, HIVE-8435.04.patch, HIVE-8435.05.patch, HIVE-8435.05.patch, HIVE-8435.06.patch, HIVE-8435.07.patch, HIVE-8435.08.patch, HIVE-8435.09.patch, HIVE-8435.1.patch, HIVE-8435.10.patch, HIVE-8435.patch In some cases there is an identity project in plan which is useless. Better to optimize it away to avoid evaluating it without any benefit at runtime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-8435) Add identity project remover optimization
[ https://issues.apache.org/jira/browse/HIVE-8435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesús Camacho Rodríguez resolved HIVE-8435. --- Resolution: Fixed Add identity project remover optimization - Key: HIVE-8435 URL: https://issues.apache.org/jira/browse/HIVE-8435 Project: Hive Issue Type: New Feature Components: Logical Optimizer Affects Versions: 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0 Reporter: Ashutosh Chauhan Assignee: Jesús Camacho Rodríguez Labels: TODOC15 Fix For: 0.15.0 Attachments: HIVE-8435.02.patch, HIVE-8435.03.patch, HIVE-8435.03.patch, HIVE-8435.04.patch, HIVE-8435.05.patch, HIVE-8435.05.patch, HIVE-8435.06.patch, HIVE-8435.07.patch, HIVE-8435.08.patch, HIVE-8435.09.patch, HIVE-8435.1.patch, HIVE-8435.10.patch, HIVE-8435.patch In some cases there is an identity project in plan which is useless. Better to optimize it away to avoid evaluating it without any benefit at runtime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8435) Add identity project remover optimization
[ https://issues.apache.org/jira/browse/HIVE-8435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesús Camacho Rodríguez updated HIVE-8435: -- Attachment: (was: HIVE-8435.10.patch) Add identity project remover optimization - Key: HIVE-8435 URL: https://issues.apache.org/jira/browse/HIVE-8435 Project: Hive Issue Type: New Feature Components: Logical Optimizer Affects Versions: 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0 Reporter: Ashutosh Chauhan Assignee: Jesús Camacho Rodríguez Labels: TODOC15 Fix For: 0.15.0 Attachments: HIVE-8435.02.patch, HIVE-8435.03.patch, HIVE-8435.03.patch, HIVE-8435.04.patch, HIVE-8435.05.patch, HIVE-8435.05.patch, HIVE-8435.06.patch, HIVE-8435.07.patch, HIVE-8435.08.patch, HIVE-8435.09.patch, HIVE-8435.1.patch, HIVE-8435.patch In some cases there is an identity project in plan which is useless. Better to optimize it away to avoid evaluating it without any benefit at runtime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8926) Projections that only swap input columns are identified incorrectly as identity projections
Jesús Camacho Rodríguez created HIVE-8926: - Summary: Projections that only swap input columns are identified incorrectly as identity projections Key: HIVE-8926 URL: https://issues.apache.org/jira/browse/HIVE-8926 Project: Hive Issue Type: Bug Affects Versions: 0.15.0 Reporter: Jesús Camacho Rodríguez Assignee: Jesús Camacho Rodríguez Fix For: 0.15.0 Projection operators that only swap the input columns in the tuples are identified as identity projections, and thus incorrectly deleted from the plan by the _identity project removal_ optimization. This bug was introduced in HIVE-8435. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (HIVE-8926) Projections that only swap input columns are identified incorrectly as identity projections
[ https://issues.apache.org/jira/browse/HIVE-8926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-8926 started by Jesús Camacho Rodríguez. - Projections that only swap input columns are identified incorrectly as identity projections --- Key: HIVE-8926 URL: https://issues.apache.org/jira/browse/HIVE-8926 Project: Hive Issue Type: Bug Affects Versions: 0.15.0 Reporter: Jesús Camacho Rodríguez Assignee: Jesús Camacho Rodríguez Fix For: 0.15.0 Projection operators that only swap the input columns in the tuples are identified as identity projections, and thus incorrectly deleted from the plan by the _identity project removal_ optimization. This bug was introduced in HIVE-8435. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8926) Projections that only swap input columns are identified incorrectly as identity projections
[ https://issues.apache.org/jira/browse/HIVE-8926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesús Camacho Rodríguez updated HIVE-8926: -- Attachment: HIVE-8926.patch [~ashutoshc], I added an additional check to detect the case where the SelectOp keeps all the input columns but swaps their order; in that case, it should not be removed from the plan. Let's see how the tests go. Projections that only swap input columns are identified incorrectly as identity projections --- Key: HIVE-8926 URL: https://issues.apache.org/jira/browse/HIVE-8926 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 0.15.0 Reporter: Jesús Camacho Rodríguez Assignee: Jesús Camacho Rodríguez Fix For: 0.15.0 Attachments: HIVE-8926.patch Projection operators that only swap the input columns in the tuples are identified as identity projections, and thus incorrectly deleted from the plan by the _identity project removal_ optimization. This bug was introduced in HIVE-8435. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
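The check this patch describes reduces to comparing column order, not just column sets. A sketch of the idea (hypothetical helper; the real patch operates on Hive's SelectOperator and column expression maps):

```python
def is_removable_identity(input_cols, output_cols):
    # A projection is removable only if it forwards every input column
    # *in the original order*. Keeping the same set of columns but
    # swapping positions (e.g. SELECT b, a) changes the row shape, so
    # the operator must stay in the plan.
    return list(output_cols) == list(input_cols)

assert is_removable_identity(["a", "b"], ["a", "b"])
assert not is_removable_identity(["a", "b"], ["b", "a"])  # swap: keep operator
assert not is_removable_identity(["a", "b"], ["a"])       # drops a column
```

A set-equality test would pass the swapped case and wrongly delete the projection, which is exactly the HIVE-8926 bug.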
[jira] [Updated] (HIVE-8926) Projections that only swap input columns are identified incorrectly as identity projections
[ https://issues.apache.org/jira/browse/HIVE-8926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesús Camacho Rodríguez updated HIVE-8926: -- Status: Patch Available (was: In Progress) Projections that only swap input columns are identified incorrectly as identity projections --- Key: HIVE-8926 URL: https://issues.apache.org/jira/browse/HIVE-8926 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 0.15.0 Reporter: Jesús Camacho Rodríguez Assignee: Jesús Camacho Rodríguez Fix For: 0.15.0 Attachments: HIVE-8926.patch Projection operators that only swap the input columns in the tuples are identified as identity projections, and thus incorrectly deleted from the plan by the _identity project removal_ optimization. This bug was introduced in HIVE-8435. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-8870) errors when selecting a struct field within an array from ORC based tables
[ https://issues.apache.org/jira/browse/HIVE-8870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña reassigned HIVE-8870: - Assignee: Sergio Peña errors when selecting a struct field within an array from ORC based tables -- Key: HIVE-8870 URL: https://issues.apache.org/jira/browse/HIVE-8870 Project: Hive Issue Type: Bug Components: File Formats, Query Processor Affects Versions: 0.13.0, 0.14.0 Environment: HDP 2.1 / HDP 2.2 (YARN, but no Tez) Reporter: Michael Haeusler Assignee: Sergio Peña When using ORC as storage for a table, we get errors on selecting a struct field within an array. These errors do not appear with default format. {code:sql} CREATE TABLE `foobar_orc`( `uid` bigint, `elements` array<struct<elementid:bigint,foo:struct<bar:string>>>) STORED AS ORC; {code} When selecting from this _empty_ table, we get a direct NPE within the Hive CLI: {code:sql} SELECT elements.elementId FROM foobar_orc; -- FAILED: RuntimeException java.lang.NullPointerException {code} A more real-world query produces a RuntimeException / NullPointerException in the mapper: {code:sql} SELECT uid, element.elementId FROM foobar_orc LATERAL VIEW EXPLODE(elements) e AS element; Error: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) [...] Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.ExprNodeFieldEvaluator.initialize(ExprNodeFieldEvaluator.java:61) [...] 
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask {code} Both queries run fine on a non-orc table: {code:sql} CREATE TABLE `foobar`( `uid` bigint, `elements` array<struct<elementid:bigint,foo:struct<bar:string>>>); SELECT elements.elementId FROM foobar; -- OK -- Time taken: 0.225 seconds SELECT uid, element.elementId FROM foobar LATERAL VIEW EXPLODE(elements) e AS element; -- Total MapReduce CPU Time Spent: 1 seconds 920 msec -- OK -- Time taken: 25.905 seconds {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8870) errors when selecting a struct field within an array from ORC based tables
[ https://issues.apache.org/jira/browse/HIVE-8870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219815#comment-14219815 ] Sergio Peña commented on HIVE-8870: --- I found that the issue occurs because ORC is case sensitive. SELECT element.elementId FROM foobar_orc; (fails) SELECT element.elementid FROM foobar_orc; (succeeds) I'll fix this by lower-casing the query elements. errors when selecting a struct field within an array from ORC based tables -- Key: HIVE-8870 URL: https://issues.apache.org/jira/browse/HIVE-8870 Project: Hive Issue Type: Bug Components: File Formats, Query Processor Affects Versions: 0.13.0, 0.14.0 Environment: HDP 2.1 / HDP 2.2 (YARN, but no Tez) Reporter: Michael Haeusler Assignee: Sergio Peña When using ORC as storage for a table, we get errors on selecting a struct field within an array. These errors do not appear with default format. {code:sql} CREATE TABLE `foobar_orc`( `uid` bigint, `elements` array<struct<elementid:bigint,foo:struct<bar:string>>>) STORED AS ORC; {code} When selecting from this _empty_ table, we get a direct NPE within the Hive CLI: {code:sql} SELECT elements.elementId FROM foobar_orc; -- FAILED: RuntimeException java.lang.NullPointerException {code} A more real-world query produces a RuntimeException / NullPointerException in the mapper: {code:sql} SELECT uid, element.elementId FROM foobar_orc LATERAL VIEW EXPLODE(elements) e AS element; Error: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) [...] Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.ExprNodeFieldEvaluator.initialize(ExprNodeFieldEvaluator.java:61) [...] 
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask {code} Both queries run fine on a non-orc table: {code:sql} CREATE TABLE `foobar`( `uid` bigint, `elements` array<struct<elementid:bigint,foo:struct<bar:string>>>); SELECT elements.elementId FROM foobar; -- OK -- Time taken: 0.225 seconds SELECT uid, element.elementId FROM foobar LATERAL VIEW EXPLODE(elements) e AS element; -- Total MapReduce CPU Time Spent: 1 seconds 920 msec -- OK -- Time taken: 25.905 seconds {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
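The fix described in the comment amounts to a case-insensitive struct field lookup. A sketch under that assumption (names illustrative, not the actual ORC reader API):

```python
def find_struct_field(field_names, query_name):
    # ORC stores struct field names in their original (lower) case,
    # while the query may reference them in any case. Normalizing both
    # sides makes `elementId` match `elementid` and avoids the null
    # lookup that surfaced as the NPE in ExprNodeFieldEvaluator.
    by_lower = {name.lower(): name for name in field_names}
    return by_lower.get(query_name.lower())

assert find_struct_field(["elementid", "foo"], "elementId") == "elementid"
assert find_struct_field(["elementid", "foo"], "missing") is None
```

Returning None for an unknown field (rather than raising deep inside the evaluator) is what lets the caller report a proper semantic error instead of a NullPointerException.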
[jira] [Updated] (HIVE-8870) errors when selecting a struct field within an array from ORC based tables
[ https://issues.apache.org/jira/browse/HIVE-8870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-8870: -- Attachment: HIVE-8870.1.patch This patch converts the query columns to lower case in order to find the correct struct column. errors when selecting a struct field within an array from ORC based tables -- Key: HIVE-8870 URL: https://issues.apache.org/jira/browse/HIVE-8870 Project: Hive Issue Type: Bug Components: File Formats, Query Processor Affects Versions: 0.13.0, 0.14.0 Environment: HDP 2.1 / HDP 2.2 (YARN, but no Tez) Reporter: Michael Haeusler Assignee: Sergio Peña Attachments: HIVE-8870.1.patch When using ORC as storage for a table, we get errors on selecting a struct field within an array. These errors do not appear with default format. {code:sql} CREATE TABLE `foobar_orc`( `uid` bigint, `elements` array<struct<elementid:bigint,foo:struct<bar:string>>>) STORED AS ORC; {code} When selecting from this _empty_ table, we get a direct NPE within the Hive CLI: {code:sql} SELECT elements.elementId FROM foobar_orc; -- FAILED: RuntimeException java.lang.NullPointerException {code} A more real-world query produces a RuntimeException / NullPointerException in the mapper: {code:sql} SELECT uid, element.elementId FROM foobar_orc LATERAL VIEW EXPLODE(elements) e AS element; Error: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) [...] Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.ExprNodeFieldEvaluator.initialize(ExprNodeFieldEvaluator.java:61) [...] 
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask {code} Both queries run fine on a non-orc table: {code:sql} CREATE TABLE `foobar`( `uid` bigint, `elements` arraystructelementid:bigint,foo:structbar:string); SELECT elements.elementId FROM foobar; -- OK -- Time taken: 0.225 seconds SELECT uid, element.elementId FROM foobar LATERAL VIEW EXPLODE(elements) e AS element; -- Total MapReduce CPU Time Spent: 1 seconds 920 msec -- OK -- Time taken: 25.905 seconds {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
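The HIVE-8870.1.patch comment above says the fix lowercases query columns before searching for the struct column. A minimal sketch of that idea, assuming a hypothetical `find_field` helper (this is not Hive's actual `ExprNodeFieldEvaluator` code): the ORC schema stores the field as `elementid`, while the query references `elementId`, so a case-sensitive lookup misses and the evaluator dereferences null.

```python
# Illustrative sketch (hypothetical helper, not Hive's code): resolve a
# query column against struct field names, ignoring case, as the patch
# description suggests.

def find_field(struct_fields, query_column):
    """Return the matching stored field name, or None if absent."""
    wanted = query_column.lower()
    for name in struct_fields:
        if name.lower() == wanted:
            return name
    return None

# ORC stores the lowercase field name; the query uses mixed case.
fields = ["elementid", "foo"]
assert find_field(fields, "elementId") == "elementid"
# A strict, case-sensitive membership test would miss -> null -> NPE.
assert "elementId" not in fields
```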
[jira] [Updated] (HIVE-8870) errors when selecting a struct field within an array from ORC based tables
[ https://issues.apache.org/jira/browse/HIVE-8870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-8870: -- Status: Patch Available (was: Open) errors when selecting a struct field within an array from ORC based tables -- Key: HIVE-8870 URL: https://issues.apache.org/jira/browse/HIVE-8870 Project: Hive Issue Type: Bug Components: File Formats, Query Processor Affects Versions: 0.13.0, 0.14.0 Environment: HDP 2.1 / HDP 2.2 (YARN, but no Tez) Reporter: Michael Haeusler Assignee: Sergio Peña Attachments: HIVE-8870.1.patch When using ORC as storage for a table, we get errors on selecting a struct field within an array. These errors do not appear with the default format. {code:sql} CREATE TABLE `foobar_orc`( `uid` bigint, `elements` array<struct<elementid:bigint,foo:struct<bar:string>>>) STORED AS ORC; {code} When selecting from this _empty_ table, we get a direct NPE within the Hive CLI: {code:sql} SELECT elements.elementId FROM foobar_orc; -- FAILED: RuntimeException java.lang.NullPointerException {code} A more real-world query produces a RuntimeException / NullPointerException in the mapper: {code:sql} SELECT uid, element.elementId FROM foobar_orc LATERAL VIEW EXPLODE(elements) e AS element; Error: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) [...] Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.ExprNodeFieldEvaluator.initialize(ExprNodeFieldEvaluator.java:61) [...] 
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask {code} Both queries run fine on a non-ORC table: {code:sql} CREATE TABLE `foobar`( `uid` bigint, `elements` array<struct<elementid:bigint,foo:struct<bar:string>>>); SELECT elements.elementId FROM foobar; -- OK -- Time taken: 0.225 seconds SELECT uid, element.elementId FROM foobar LATERAL VIEW EXPLODE(elements) e AS element; -- Total MapReduce CPU Time Spent: 1 seconds 920 msec -- OK -- Time taken: 25.905 seconds {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8909) Hive doesn't correctly read Parquet nested types
[ https://issues.apache.org/jira/browse/HIVE-8909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14220018#comment-14220018 ] Sergio Peña commented on HIVE-8909: --- [~rdblue] There are a few parquet query tests in the following paths: ql/src/test/queries/clientpositive/parquet_*.q ql/src/test/results/clientpositive/parquet_*.q.out The data files that are used by those query tests are here (just read the *.q file to know which one is used): data/files/* Hive doesn't correctly read Parquet nested types Key: HIVE-8909 URL: https://issues.apache.org/jira/browse/HIVE-8909 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Ryan Blue Assignee: Ryan Blue Attachments: HIVE-8909-1.patch, HIVE-8909-2.patch, HIVE-8909.2.patch Parquet's Avro and Thrift object models don't produce the same parquet type representation for lists and maps that Hive does. In the Parquet community, we've defined what should be written and backward-compatibility rules for existing data written by parquet-avro and parquet-thrift in PARQUET-113. We need to implement those rules in the Hive Converter classes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
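The comment above says each qtest's data file can be discovered by reading the `.q` file itself. As a rough sketch of that workflow (hypothetical helper, assuming the common Hive qtest convention of `LOAD DATA LOCAL INPATH '.../data/files/...'` statements), one could scan a `.q` file for the file names it loads:

```python
# Hypothetical helper: find which files under data/files/ a Hive qtest
# uses, by scanning its LOAD DATA ... INPATH statements.
import re

def data_files_used(qfile_text):
    # Capture the final path component inside each quoted INPATH argument.
    return re.findall(r"INPATH\s+'[^']*?([^'/]+)'", qfile_text, re.IGNORECASE)

sample = "LOAD DATA LOCAL INPATH '../../data/files/parquet_types.txt' INTO TABLE t;"
assert data_files_used(sample) == ["parquet_types.txt"]
```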
[jira] [Updated] (HIVE-8926) Projections that only swap input columns are identified incorrectly as identity projections
[ https://issues.apache.org/jira/browse/HIVE-8926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesús Camacho Rodríguez updated HIVE-8926: -- Attachment: HIVE-8926.01.patch Now SelStarNoCompute is checked in the configuration object, as the field SelStarNoCompute in the operator is not initialized until the initializeOp method is called. I also added an additional check on the length of colList. Projections that only swap input columns are identified incorrectly as identity projections --- Key: HIVE-8926 URL: https://issues.apache.org/jira/browse/HIVE-8926 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 0.15.0 Reporter: Jesús Camacho Rodríguez Assignee: Jesús Camacho Rodríguez Fix For: 0.15.0 Attachments: HIVE-8926.01.patch, HIVE-8926.patch Projection operators that only swap the input columns in the tuples are identified as identity projections, and thus incorrectly deleted from the plan by the _identity project removal_ optimization. This bug was introduced in HIVE-8435. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
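The HIVE-8926 description above can be made concrete with a small sketch (hypothetical function, not Hive's `SelectOperator` code): a projection is only an identity if every output column sits in its original input position, so a column swap like `[b, a]` must not qualify, and the length check the patch adds rules out strict-subset projections as well.

```python
# Illustrative sketch of the identity-projection test described in
# HIVE-8926 (hypothetical helper names, not Hive's actual code).

def is_identity_projection(col_list, input_cols):
    # Length check added by the patch: a strict subset is not an identity.
    if len(col_list) != len(input_cols):
        return False
    # Position check: [b, a] over [a, b] merely swaps columns, so it
    # must not be removed by the identity-project-removal optimization.
    return all(out == inp for out, inp in zip(col_list, input_cols))

assert is_identity_projection(["a", "b"], ["a", "b"]) is True
assert is_identity_projection(["b", "a"], ["a", "b"]) is False  # the HIVE-8926 bug case
assert is_identity_projection(["a"], ["a", "b"]) is False       # caught by the length check
```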
[jira] [Commented] (HIVE-8909) Hive doesn't correctly read Parquet nested types
[ https://issues.apache.org/jira/browse/HIVE-8909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14221230#comment-14221230 ] Sergio Peña commented on HIVE-8909: --- Hey [~rdblue] Could you post the patch in the review board? Hive doesn't correctly read Parquet nested types Key: HIVE-8909 URL: https://issues.apache.org/jira/browse/HIVE-8909 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Ryan Blue Assignee: Ryan Blue Attachments: HIVE-8909-1.patch, HIVE-8909-2.patch, HIVE-8909.2.patch, HIVE-8909.3.patch, parquet-test-data.tar.gz Parquet's Avro and Thrift object models don't produce the same parquet type representation for lists and maps that Hive does. In the Parquet community, we've defined what should be written and backward-compatibility rules for existing data written by parquet-avro and parquet-thrift in PARQUET-113. We need to implement those rules in the Hive Converter classes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8435) Add identity project remover optimization
[ https://issues.apache.org/jira/browse/HIVE-8435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14221276#comment-14221276 ] Jesús Camacho Rodríguez commented on HIVE-8435: --- [~sershe], sure, I'll help with HIVE-8395, it's not a problem, one way or another we knew there was going to be a lot of reviewing work. Just let me know where to start. By the way, HIVE-8926 will introduce a few more changes in test results (~10 tests). Add identity project remover optimization - Key: HIVE-8435 URL: https://issues.apache.org/jira/browse/HIVE-8435 Project: Hive Issue Type: New Feature Components: Logical Optimizer Affects Versions: 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0 Reporter: Ashutosh Chauhan Assignee: Jesús Camacho Rodríguez Labels: TODOC15 Fix For: 0.15.0 Attachments: HIVE-8435.02.patch, HIVE-8435.03.patch, HIVE-8435.03.patch, HIVE-8435.04.patch, HIVE-8435.05.patch, HIVE-8435.05.patch, HIVE-8435.06.patch, HIVE-8435.07.patch, HIVE-8435.08.patch, HIVE-8435.09.patch, HIVE-8435.1.patch, HIVE-8435.patch In some cases there is an identity project in plan which is useless. Better to optimize it away to avoid evaluating it without any benefit at runtime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8909) Hive doesn't correctly read Parquet nested types
[ https://issues.apache.org/jira/browse/HIVE-8909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14221386#comment-14221386 ] Sergio Peña commented on HIVE-8909: --- Looks good [~rdblue]. I ran the tests locally and they run correctly as well. +1 Hive doesn't correctly read Parquet nested types Key: HIVE-8909 URL: https://issues.apache.org/jira/browse/HIVE-8909 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Ryan Blue Assignee: Ryan Blue Attachments: HIVE-8909-1.patch, HIVE-8909-2.patch, HIVE-8909.2.patch, HIVE-8909.3.patch, HIVE-8909.4.patch, parquet-test-data.tar.gz Parquet's Avro and Thrift object models don't produce the same parquet type representation for lists and maps that Hive does. In the Parquet community, we've defined what should be written and backward-compatibility rules for existing data written by parquet-avro and parquet-thrift in PARQUET-113. We need to implement those rules in the Hive Converter classes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8945) Allow user to read encrypted read-only tables only if the scratch directory is encrypted
Sergio Peña created HIVE-8945: - Summary: Allow user to read encrypted read-only tables only if the scratch directory is encrypted Key: HIVE-8945 URL: https://issues.apache.org/jira/browse/HIVE-8945 Project: Hive Issue Type: Sub-task Reporter: Sergio Peña Assignee: Sergio Peña With the changes for HDFS encryption, Hive creates a staging directory inside table locations. If a user accesses a table with read-only access, the staging directory is created in the old scratch directory (hive.exec.scratchdir). For security reasons, this does not work if the accessed table is encrypted: we don't want encrypted data to be written to an unencrypted zone. But what if the scratch directory is itself encrypted? Then we should allow the access. This bug is to fix that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
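The HIVE-8945 rule above reduces to a small decision: staging for a read of an encrypted table is only safe when the scratch directory is itself in an encryption zone. A minimal sketch of that decision, with hypothetical names (not Hive's actual code):

```python
# Hedged sketch of the HIVE-8945 rule (hypothetical helper, not Hive's
# code): data read from an encrypted table must never be staged in an
# unencrypted scratch directory.

def staging_allowed(table_encrypted, scratch_encrypted):
    # A plain table can stage anywhere; an encrypted table may only
    # stage into an encrypted scratch directory.
    return (not table_encrypted) or scratch_encrypted

assert staging_allowed(False, False) is True   # plain table, plain scratch: fine
assert staging_allowed(True, False) is False   # encrypted table, plain scratch: deny
assert staging_allowed(True, True) is True     # both encrypted: allow the access
```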
[jira] [Commented] (HIVE-8909) Hive doesn't correctly read Parquet nested types
[ https://issues.apache.org/jira/browse/HIVE-8909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14221428#comment-14221428 ] Sergio Peña commented on HIVE-8909: --- They didn't {noformat} mvn test -Phadoop-2 -Dtest=TestCliDriver -Dqfile=parquet_array_of_optional_elements.q,parquet_array_of_required_elements.q,parquet_array_of_single_field_struct.q,parquet_array_of_structs.q,parquet_array_of_unannotated_groups.q,parquet_array_of_unannotated_primitives.q,parquet_avro_array_of_primitives.q,parquet_avro_array_of_single_field_struct.q,parquet_nested_complex.q,parquet_thrift_array_of_primitives.q,parquet_thrift_array_of_single_field_struct.q --- T E S T S --- Running org.apache.hadoop.hive.cli.TestCliDriver Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 80.949 sec - in org.apache.hadoop.hive.cli.TestCliDriver Results : Tests run: 12, Failures: 0, Errors: 0, Skipped: 0 {noformat} I ran the first two tests (parquet_array_null_element,parquet_array_of_multi_field_struct) manually before running the 12 tests, and they passed. Hive doesn't correctly read Parquet nested types Key: HIVE-8909 URL: https://issues.apache.org/jira/browse/HIVE-8909 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Ryan Blue Assignee: Ryan Blue Attachments: HIVE-8909-1.patch, HIVE-8909-2.patch, HIVE-8909.2.patch, HIVE-8909.3.patch, HIVE-8909.4.patch, parquet-test-data.tar.gz Parquet's Avro and Thrift object models don't produce the same parquet type representation for lists and maps that Hive does. In the Parquet community, we've defined what should be written and backward-compatibility rules for existing data written by parquet-avro and parquet-thrift in PARQUET-113. We need to implement those rules in the Hive Converter classes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8945) Allow user to read encrypted read-only tables only if the scratch directory is encrypted
[ https://issues.apache.org/jira/browse/HIVE-8945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-8945: -- Status: Patch Available (was: Open) Allow user to read encrypted read-only tables only if the scratch directory is encrypted Key: HIVE-8945 URL: https://issues.apache.org/jira/browse/HIVE-8945 Project: Hive Issue Type: Sub-task Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-8945.1.patch With the changes for HDFS encryption, Hive creates a staging directory inside table locations. If a user accesses a table with read-only access, the staging directory is created in the old scratch directory (hive.exec.scratchdir). For security reasons, this does not work if the accessed table is encrypted: we don't want encrypted data to be written to an unencrypted zone. But what if the scratch directory is itself encrypted? Then we should allow the access. This bug is to fix that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8945) Allow user to read encrypted read-only tables only if the scratch directory is encrypted
[ https://issues.apache.org/jira/browse/HIVE-8945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-8945: -- Attachment: HIVE-8945.1.patch Allow user to read encrypted read-only tables only if the scratch directory is encrypted Key: HIVE-8945 URL: https://issues.apache.org/jira/browse/HIVE-8945 Project: Hive Issue Type: Sub-task Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-8945.1.patch With the changes for HDFS encryption, Hive creates a staging directory inside table locations. If a user accesses a table with read-only access, the staging directory is created in the old scratch directory (hive.exec.scratchdir). For security reasons, this does not work if the accessed table is encrypted: we don't want encrypted data to be written to an unencrypted zone. But what if the scratch directory is itself encrypted? Then we should allow the access. This bug is to fix that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8909) Hive doesn't correctly read Parquet nested types
[ https://issues.apache.org/jira/browse/HIVE-8909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14221540#comment-14221540 ] Sergio Peña commented on HIVE-8909: --- +1 again :) Hive doesn't correctly read Parquet nested types Key: HIVE-8909 URL: https://issues.apache.org/jira/browse/HIVE-8909 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Ryan Blue Assignee: Ryan Blue Attachments: HIVE-8909-1.patch, HIVE-8909-2.patch, HIVE-8909.2.patch, HIVE-8909.3.patch, HIVE-8909.4.patch, HIVE-8909.5.patch, HIVE-8909.6.patch, parquet-test-data.tar.gz Parquet's Avro and Thrift object models don't produce the same parquet type representation for lists and maps that Hive does. In the Parquet community, we've defined what should be written and backward-compatibility rules for existing data written by parquet-avro and parquet-thrift in PARQUET-113. We need to implement those rules in the Hive Converter classes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8926) Projections that only swap input columns are identified incorrectly as identity projections
[ https://issues.apache.org/jira/browse/HIVE-8926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesús Camacho Rodríguez updated HIVE-8926: -- Attachment: HIVE-8926.01.patch Re-uploading for tests... Projections that only swap input columns are identified incorrectly as identity projections --- Key: HIVE-8926 URL: https://issues.apache.org/jira/browse/HIVE-8926 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 0.15.0 Reporter: Jesús Camacho Rodríguez Assignee: Jesús Camacho Rodríguez Fix For: 0.15.0 Attachments: HIVE-8926.01.patch, HIVE-8926.01.patch, HIVE-8926.patch Projection operators that only swap the input columns in the tuples are identified as identity projections, and thus incorrectly deleted from the plan by the _identity project removal_ optimization. This bug was introduced in HIVE-8435. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8926) Projections that only swap input columns are identified incorrectly as identity projections
[ https://issues.apache.org/jira/browse/HIVE-8926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesús Camacho Rodríguez updated HIVE-8926: -- Attachment: (was: HIVE-8926.01.patch) Projections that only swap input columns are identified incorrectly as identity projections --- Key: HIVE-8926 URL: https://issues.apache.org/jira/browse/HIVE-8926 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 0.15.0 Reporter: Jesús Camacho Rodríguez Assignee: Jesús Camacho Rodríguez Fix For: 0.15.0 Attachments: HIVE-8926.01.patch, HIVE-8926.patch Projection operators that only swap the input columns in the tuples are identified as identity projections, and thus incorrectly deleted from the plan by the _identity project removal_ optimization. This bug was introduced in HIVE-8435. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8926) Projections that only swap input columns are identified incorrectly as identity projections
[ https://issues.apache.org/jira/browse/HIVE-8926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesús Camacho Rodríguez updated HIVE-8926: -- Status: In Progress (was: Patch Available) Projections that only swap input columns are identified incorrectly as identity projections --- Key: HIVE-8926 URL: https://issues.apache.org/jira/browse/HIVE-8926 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 0.15.0 Reporter: Jesús Camacho Rodríguez Assignee: Jesús Camacho Rodríguez Fix For: 0.15.0 Attachments: HIVE-8926.01.patch, HIVE-8926.patch Projection operators that only swap the input columns in the tuples are identified as identity projections, and thus incorrectly deleted from the plan by the _identity project removal_ optimization. This bug was introduced in HIVE-8435. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8926) Projections that only swap input columns are identified incorrectly as identity projections
[ https://issues.apache.org/jira/browse/HIVE-8926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesús Camacho Rodríguez updated HIVE-8926: -- Status: Patch Available (was: In Progress) Projections that only swap input columns are identified incorrectly as identity projections --- Key: HIVE-8926 URL: https://issues.apache.org/jira/browse/HIVE-8926 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 0.15.0 Reporter: Jesús Camacho Rodríguez Assignee: Jesús Camacho Rodríguez Fix For: 0.15.0 Attachments: HIVE-8926.01.patch, HIVE-8926.patch Projection operators that only swap the input columns in the tuples are identified as identity projections, and thus incorrectly deleted from the plan by the _identity project removal_ optimization. This bug was introduced in HIVE-8435. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8857) hive release has SNAPSHOT dependency, which is not on maven central
[ https://issues.apache.org/jira/browse/HIVE-8857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14222926#comment-14222926 ] André Kelpe commented on HIVE-8857: --- There are two things here: having a SNAPSHOT dep in a release is a problem to begin with, but having a Hive release with broken dependencies on Maven Central is even more problematic. How am I as a user supposed to know that I need to add a random snapshot repository to get my Hive working? hive release has SNAPSHOT dependency, which is not on maven central --- Key: HIVE-8857 URL: https://issues.apache.org/jira/browse/HIVE-8857 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: André Kelpe I just tried building a project which uses hive-exec as a dependency, and it bails out, since Hive 0.14.0 introduced a SNAPSHOT dependency on Apache Calcite, which is not on Maven Central. Do we have to include another repository now? Besides that, it also seems problematic to rely on a SNAPSHOT dependency, which can change at any time. 
{code} :compileJava Download http://repo1.maven.org/maven2/org/apache/hive/hive-exec/0.14.0/hive-exec-0.14.0.pom Download http://repo1.maven.org/maven2/org/apache/hive/hive/0.14.0/hive-0.14.0.pom Download http://repo1.maven.org/maven2/org/apache/hive/hive-ant/0.14.0/hive-ant-0.14.0.pom Download http://repo1.maven.org/maven2/org/apache/hive/hive-metastore/0.14.0/hive-metastore-0.14.0.pom Download http://repo1.maven.org/maven2/org/apache/hive/hive-shims/0.14.0/hive-shims-0.14.0.pom Download http://repo1.maven.org/maven2/org/fusesource/jansi/jansi/1.11/jansi-1.11.pom Download http://repo1.maven.org/maven2/org/fusesource/jansi/jansi-project/1.11/jansi-project-1.11.pom Download http://repo1.maven.org/maven2/org/fusesource/fusesource-pom/1.8/fusesource-pom-1.8.pom Download http://repo1.maven.org/maven2/org/apache/hive/hive-serde/0.14.0/hive-serde-0.14.0.pom Download http://repo1.maven.org/maven2/org/apache/hive/shims/hive-shims-common/0.14.0/hive-shims-common-0.14.0.pom Download http://repo1.maven.org/maven2/org/apache/hive/shims/hive-shims-common-secure/0.14.0/hive-shims-common-secure-0.14.0.pom Download http://repo1.maven.org/maven2/org/apache/hive/shims/hive-shims-0.20/0.14.0/hive-shims-0.20-0.14.0.pom Download http://repo1.maven.org/maven2/org/apache/hive/shims/hive-shims-0.20S/0.14.0/hive-shims-0.20S-0.14.0.pom Download http://repo1.maven.org/maven2/org/apache/hive/shims/hive-shims-0.23/0.14.0/hive-shims-0.23-0.14.0.pom Download http://repo1.maven.org/maven2/org/apache/hive/hive-common/0.14.0/hive-common-0.14.0.pom Download http://repo1.maven.org/maven2/org/apache/curator/curator-framework/2.6.0/curator-framework-2.6.0.pom Download http://repo1.maven.org/maven2/org/apache/curator/apache-curator/2.6.0/apache-curator-2.6.0.pom Download http://repo1.maven.org/maven2/org/apache/curator/curator-client/2.6.0/curator-client-2.6.0.pom Download http://repo1.maven.org/maven2/org/slf4j/slf4j-api/1.7.6/slf4j-api-1.7.6.pom Download 
http://repo1.maven.org/maven2/org/slf4j/slf4j-parent/1.7.6/slf4j-parent-1.7.6.pom FAILURE: Build failed with an exception. * What went wrong: Could not resolve all dependencies for configuration ':provided'. Could not find org.apache.calcite:calcite-core:0.9.2-incubating-SNAPSHOT. Required by: cascading:cascading-hive:1.1.0-wip-dev org.apache.hive:hive-exec:0.14.0 Could not find org.apache.calcite:calcite-avatica:0.9.2-incubating-SNAPSHOT. Required by: cascading:cascading-hive:1.1.0-wip-dev org.apache.hive:hive-exec:0.14.0 * Try: Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. BUILD FAILED Total time: 16.956 secs {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
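The build failure above comes from release artifacts declaring `-SNAPSHOT` dependency versions, which are not published to Maven Central and are mutable over time. As a hedged illustration (a hypothetical release sanity check, not part of Hive's build), one could flag such coordinates before publishing:

```python
# Hypothetical pre-release check illustrating the HIVE-8857 complaint:
# a released POM should never declare -SNAPSHOT dependency versions.

def snapshot_deps(dependencies):
    """Return group:artifact:version strings whose version is a SNAPSHOT."""
    return [f"{g}:{a}:{v}" for (g, a, v) in dependencies
            if v.endswith("-SNAPSHOT")]

deps = [
    ("org.apache.calcite", "calcite-core", "0.9.2-incubating-SNAPSHOT"),
    ("org.slf4j", "slf4j-api", "1.7.6"),
]
assert snapshot_deps(deps) == ["org.apache.calcite:calcite-core:0.9.2-incubating-SNAPSHOT"]
```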
[jira] [Updated] (HIVE-8870) errors when selecting a struct field within an array from ORC based tables
[ https://issues.apache.org/jira/browse/HIVE-8870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-8870: -- Status: Open (was: Patch Available) errors when selecting a struct field within an array from ORC based tables -- Key: HIVE-8870 URL: https://issues.apache.org/jira/browse/HIVE-8870 Project: Hive Issue Type: Bug Components: File Formats, Query Processor Affects Versions: 0.13.0, 0.14.0 Environment: HDP 2.1 / HDP 2.2 (YARN, but no Tez) Reporter: Michael Haeusler Assignee: Sergio Peña Attachments: HIVE-8870.1.patch When using ORC as storage for a table, we get errors on selecting a struct field within an array. These errors do not appear with the default format. {code:sql} CREATE TABLE `foobar_orc`( `uid` bigint, `elements` array<struct<elementid:bigint,foo:struct<bar:string>>>) STORED AS ORC; {code} When selecting from this _empty_ table, we get a direct NPE within the Hive CLI: {code:sql} SELECT elements.elementId FROM foobar_orc; -- FAILED: RuntimeException java.lang.NullPointerException {code} A more real-world query produces a RuntimeException / NullPointerException in the mapper: {code:sql} SELECT uid, element.elementId FROM foobar_orc LATERAL VIEW EXPLODE(elements) e AS element; Error: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) [...] Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.ExprNodeFieldEvaluator.initialize(ExprNodeFieldEvaluator.java:61) [...] 
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask {code} Both queries run fine on a non-ORC table: {code:sql} CREATE TABLE `foobar`( `uid` bigint, `elements` array<struct<elementid:bigint,foo:struct<bar:string>>>); SELECT elements.elementId FROM foobar; -- OK -- Time taken: 0.225 seconds SELECT uid, element.elementId FROM foobar LATERAL VIEW EXPLODE(elements) e AS element; -- Total MapReduce CPU Time Spent: 1 seconds 920 msec -- OK -- Time taken: 25.905 seconds {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8870) errors when selecting a struct field within an array from ORC based tables
[ https://issues.apache.org/jira/browse/HIVE-8870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-8870: -- Status: Patch Available (was: Open) errors when selecting a struct field within an array from ORC based tables -- Key: HIVE-8870 URL: https://issues.apache.org/jira/browse/HIVE-8870 Project: Hive Issue Type: Bug Components: File Formats, Query Processor Affects Versions: 0.13.0, 0.14.0 Environment: HDP 2.1 / HDP 2.2 (YARN, but no Tez) Reporter: Michael Haeusler Assignee: Sergio Peña Attachments: HIVE-8870.2.patch When using ORC as storage for a table, we get errors on selecting a struct field within an array. These errors do not appear with the default format. {code:sql} CREATE TABLE `foobar_orc`( `uid` bigint, `elements` array<struct<elementid:bigint,foo:struct<bar:string>>>) STORED AS ORC; {code} When selecting from this _empty_ table, we get a direct NPE within the Hive CLI: {code:sql} SELECT elements.elementId FROM foobar_orc; -- FAILED: RuntimeException java.lang.NullPointerException {code} A more real-world query produces a RuntimeException / NullPointerException in the mapper: {code:sql} SELECT uid, element.elementId FROM foobar_orc LATERAL VIEW EXPLODE(elements) e AS element; Error: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) [...] Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.ExprNodeFieldEvaluator.initialize(ExprNodeFieldEvaluator.java:61) [...] 
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask {code} Both queries run fine on a non-ORC table: {code:sql} CREATE TABLE `foobar`( `uid` bigint, `elements` array<struct<elementid:bigint,foo:struct<bar:string>>>); SELECT elements.elementId FROM foobar; -- OK -- Time taken: 0.225 seconds SELECT uid, element.elementId FROM foobar LATERAL VIEW EXPLODE(elements) e AS element; -- Total MapReduce CPU Time Spent: 1 seconds 920 msec -- OK -- Time taken: 25.905 seconds {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8870) errors when selecting a struct field within an array from ORC based tables
[ https://issues.apache.org/jira/browse/HIVE-8870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-8870: -- Attachment: (was: HIVE-8870.1.patch) errors when selecting a struct field within an array from ORC based tables -- Key: HIVE-8870 URL: https://issues.apache.org/jira/browse/HIVE-8870 Project: Hive Issue Type: Bug Components: File Formats, Query Processor Affects Versions: 0.13.0, 0.14.0 Environment: HDP 2.1 / HDP 2.2 (YARN, but no Tez) Reporter: Michael Haeusler Assignee: Sergio Peña Attachments: HIVE-8870.2.patch When using ORC as storage for a table, we get errors on selecting a struct field within an array. These errors do not appear with the default format. {code:sql} CREATE TABLE `foobar_orc`( `uid` bigint, `elements` array<struct<elementid:bigint,foo:struct<bar:string>>>) STORED AS ORC; {code} When selecting from this _empty_ table, we get a direct NPE within the Hive CLI: {code:sql} SELECT elements.elementId FROM foobar_orc; -- FAILED: RuntimeException java.lang.NullPointerException {code} A more real-world query produces a RuntimeException / NullPointerException in the mapper: {code:sql} SELECT uid, element.elementId FROM foobar_orc LATERAL VIEW EXPLODE(elements) e AS element; Error: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) [...] Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.ExprNodeFieldEvaluator.initialize(ExprNodeFieldEvaluator.java:61) [...] 
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask {code} Both queries run fine on a non-ORC table: {code:sql} CREATE TABLE `foobar`( `uid` bigint, `elements` array<struct<elementid:bigint,foo:struct<bar:string>>>); SELECT elements.elementId FROM foobar; -- OK -- Time taken: 0.225 seconds SELECT uid, element.elementId FROM foobar LATERAL VIEW EXPLODE(elements) e AS element; -- Total MapReduce CPU Time Spent: 1 seconds 920 msec -- OK -- Time taken: 25.905 seconds {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8870) errors when selecting a struct field within an array from ORC based tables
[ https://issues.apache.org/jira/browse/HIVE-8870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-8870: -- Attachment: HIVE-8870.2.patch errors when selecting a struct field within an array from ORC based tables -- Key: HIVE-8870 URL: https://issues.apache.org/jira/browse/HIVE-8870 Project: Hive Issue Type: Bug Components: File Formats, Query Processor Affects Versions: 0.13.0, 0.14.0 Environment: HDP 2.1 / HDP 2.2 (YARN, but no Tez) Reporter: Michael Haeusler Assignee: Sergio Peña Attachments: HIVE-8870.2.patch When using ORC as storage for a table, we get errors on selecting a struct field within an array. These errors do not appear with the default format. {code:sql} CREATE TABLE `foobar_orc`( `uid` bigint, `elements` array<struct<elementid:bigint,foo:struct<bar:string>>>) STORED AS ORC; {code} When selecting from this _empty_ table, we get a direct NPE within the Hive CLI: {code:sql} SELECT elements.elementId FROM foobar_orc; -- FAILED: RuntimeException java.lang.NullPointerException {code} A more real-world query produces a RuntimeException / NullPointerException in the mapper: {code:sql} SELECT uid, element.elementId FROM foobar_orc LATERAL VIEW EXPLODE(elements) e AS element; Error: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) [...] Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.ExprNodeFieldEvaluator.initialize(ExprNodeFieldEvaluator.java:61) [...] 
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask {code} Both queries run fine on a non-ORC table: {code:sql} CREATE TABLE `foobar`( `uid` bigint, `elements` array<struct<elementid:bigint,foo:struct<bar:string>>>); SELECT elements.elementId FROM foobar; -- OK -- Time taken: 0.225 seconds SELECT uid, element.elementId FROM foobar LATERAL VIEW EXPLODE(elements) e AS element; -- Total MapReduce CPU Time Spent: 1 seconds 920 msec -- OK -- Time taken: 25.905 seconds {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-6914) parquet-hive cannot write nested map (map value is map)
[ https://issues.apache.org/jira/browse/HIVE-6914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-6914: -- Attachment: HIVE-6914.4.patch Here's the file with the .q.out updated parquet-hive cannot write nested map (map value is map) --- Key: HIVE-6914 URL: https://issues.apache.org/jira/browse/HIVE-6914 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.12.0, 0.13.0 Reporter: Tongjie Chen Assignee: Sergio Peña Labels: parquet, serialization Attachments: HIVE-6914.1.patch, HIVE-6914.1.patch, HIVE-6914.2.patch, HIVE-6914.3.patch, HIVE-6914.4.patch, NestedMap.parquet // table schema (identical for both plain text version and parquet version) hive> desc text_mmap; m map<string,map<string,string>> // sample nested map entry {level1:{level2_key1:value1,level2_key2:value2}} The following query will fail: insert overwrite table parquet_mmap select * from text_mmap; Caused by: parquet.io.ParquetEncodingException: This should be an ArrayWritable or MapWritable: org.apache.hadoop.hive.ql.io.parquet.writable.BinaryWritable@f2f8106 at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeData(DataWritableWriter.java:85) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeArray(DataWritableWriter.java:118) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeData(DataWritableWriter.java:80) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeData(DataWritableWriter.java:82) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:55) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:59) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:31) at parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:115) at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:81) at 
parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:37) at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:77) at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:90) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:622) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540) ... 9 more -- This message was sent by Atlassian JIRA (v6.3.4#6332)
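For reference, a minimal reproduction can be sketched as follows. The column type is an assumption reconstructed from the sample entry (a map whose values are themselves maps of strings), and parquet_mmap is assumed to mirror text_mmap's schema:

{code:sql}
-- Plain-text source table; the nested map type here is an assumption
-- inferred from the sample entry in the report.
CREATE TABLE text_mmap (m map<string,map<string,string>>);

-- Parquet target table with the identical schema.
CREATE TABLE parquet_mmap (m map<string,map<string,string>>)
STORED AS PARQUET;

-- This is the statement that triggers the ParquetEncodingException.
INSERT OVERWRITE TABLE parquet_mmap SELECT * FROM text_mmap;
{code}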
[jira] [Updated] (HIVE-6914) parquet-hive cannot write nested map (map value is map)
[ https://issues.apache.org/jira/browse/HIVE-6914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergio Peña updated HIVE-6914:
------------------------------
    Status: Open  (was: Patch Available)
[jira] [Updated] (HIVE-6914) parquet-hive cannot write nested map (map value is map)
[ https://issues.apache.org/jira/browse/HIVE-6914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergio Peña updated HIVE-6914:
------------------------------
    Status: Patch Available  (was: Open)
[jira] [Created] (HIVE-8960) ParsingException in the WHERE statement with a Sub Query
Rémy SAISSY created HIVE-8960:
------------------------------
             Summary: ParsingException in the WHERE statement with a Sub Query
                 Key: HIVE-8960
                 URL: https://issues.apache.org/jira/browse/HIVE-8960
             Project: Hive
          Issue Type: Bug
          Components: Parser
    Affects Versions: 0.13.0
         Environment: Secured HDP 2.1.3 with Hive 0.13.0
            Reporter: Rémy SAISSY

Comparison with a subquery in a WHERE statement does not work. Given that id_chargement is an integer:

USE db1;
SELECT * FROM tbl1 a WHERE a.id_chargement (SELECT b.id_chargement FROM tbl2 b);

returns the following parsing error:

Error: Error while compiling statement: FAILED: ParseException line 1:88 cannot recognize input near 'SELECT' 'b' '.' in expression specification (state=42000,code=4)
java.sql.SQLException: Error while compiling statement: FAILED: ParseException line 1:88 cannot recognize input near 'SELECT' 'b' '.' in expression specification
	at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:121)
	at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:109)
	at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:231)
	at org.apache.hive.beeline.Commands.execute(Commands.java:736)
	at org.apache.hive.beeline.Commands.sql(Commands.java:657)
	at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:804)
	at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:659)
	at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:368)
	at org.apache.hive.beeline.BeeLine.main(BeeLine.java:351)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:601)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
[jira] [Updated] (HIVE-8960) ParsingException in the WHERE statement with a Sub Query
[ https://issues.apache.org/jira/browse/HIVE-8960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rémy SAISSY updated HIVE-8960:
------------------------------
    Description:
Comparison with a subquery in a WHERE statement does not work. Given that id_chargement is an integer:

USE db1;
SELECT * FROM tbl1 a WHERE a.id_chargement (SELECT MAX(b.id_chargement) FROM tbl2 b);
or
SELECT * FROM tbl1 a WHERE a.id_chargement (SELECT b.id_chargement FROM tbl2 b LIMIT 1);

Both return the following parsing error:

Error: Error while compiling statement: FAILED: ParseException line 1:88 cannot recognize input near 'SELECT' 'b' '.' in expression specification (state=42000,code=4)
java.sql.SQLException: Error while compiling statement: FAILED: ParseException line 1:88 cannot recognize input near 'SELECT' 'b' '.' in expression specification
	at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:121)
	at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:109)
	at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:231)
	at org.apache.hive.beeline.Commands.execute(Commands.java:736)
	at org.apache.hive.beeline.Commands.sql(Commands.java:657)
	at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:804)
	at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:659)
	at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:368)
	at org.apache.hive.beeline.BeeLine.main(BeeLine.java:351)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:601)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
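Hive 0.13 supports subqueries in the WHERE clause only through IN, NOT IN, and EXISTS predicates; a subquery used directly in a comparison expression is rejected by the parser, which is why both variants fail. A common rewrite, sketched here with the table and column names from the report and assuming the intended comparison was equality against the maximum, moves the scalar subquery into a derived table and joins on it:

{code:sql}
-- Sketch of a workaround: compute the scalar in a derived table and
-- join on it with an equality condition (the only join condition kind
-- Hive 0.13 supports).
SELECT a.*
FROM tbl1 a
JOIN (SELECT MAX(id_chargement) AS max_id FROM tbl2) b
  ON a.id_chargement = b.max_id;
{code}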
[jira] [Commented] (HIVE-8870) errors when selecting a struct field within an array from ORC based tables
[ https://issues.apache.org/jira/browse/HIVE-8870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224751#comment-14224751 ]

Sergio Peña commented on HIVE-8870:
-----------------------------------
Hi [~hagleitn], I've heard you have experience with ORC. Could you give me a quick review of the attached patch? It is a trivial change.

errors when selecting a struct field within an array from ORC based tables
--------------------------------------------------------------------------
                 Key: HIVE-8870
                 URL: https://issues.apache.org/jira/browse/HIVE-8870
             Project: Hive
          Issue Type: Bug
          Components: File Formats, Query Processor
    Affects Versions: 0.13.0, 0.14.0
         Environment: HDP 2.1 / HDP 2.2 (YARN, but no Tez)
            Reporter: Michael Haeusler
            Assignee: Sergio Peña
         Attachments: HIVE-8870.2.patch

When using ORC as storage for a table, we get errors when selecting a struct field within an array. These errors do not appear with the default format.
{code:sql}
CREATE TABLE `foobar_orc`(
  `uid` bigint,
  `elements` array<struct<elementid:bigint,foo:struct<bar:string>>>)
STORED AS ORC;
{code}
When selecting from this _empty_ table, we get a direct NPE within the Hive CLI:
{code:sql}
SELECT elements.elementId FROM foobar_orc;
-- FAILED: RuntimeException java.lang.NullPointerException
{code}
A more real-world query produces a RuntimeException / NullPointerException in the mapper:
{code:sql}
SELECT uid, element.elementId FROM foobar_orc LATERAL VIEW EXPLODE(elements) e AS element;
Error: java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
	[...]
Caused by: java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.exec.ExprNodeFieldEvaluator.initialize(ExprNodeFieldEvaluator.java:61)
	[...]
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
{code}
Both queries run fine on a non-ORC table:
{code:sql}
CREATE TABLE `foobar`(
  `uid` bigint,
  `elements` array<struct<elementid:bigint,foo:struct<bar:string>>>);

SELECT elements.elementId FROM foobar;
-- OK
-- Time taken: 0.225 seconds

SELECT uid, element.elementId FROM foobar LATERAL VIEW EXPLODE(elements) e AS element;
-- Total MapReduce CPU Time Spent: 1 seconds 920 msec
-- OK
-- Time taken: 25.905 seconds
{code}
[jira] [Commented] (HIVE-7300) When creating database by specifying location, .db is not created
[ https://issues.apache.org/jira/browse/HIVE-7300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225285#comment-14225285 ]

Sergio Peña commented on HIVE-7300:
-----------------------------------
Hi [~sourabhpotnis],
The LOCATION keyword tells Hive where the database you're creating will reside. In the example you wrote, you specified the location as /addh0010/hive/addh0011/warehouse, so all tables will be stored directly in that directory. If you want a separate directory for the database, create it first and point LOCATION at it:
# hdfs dfs -mkdir /addh0010/hive/addh0011/warehouse/test_loc.db
On Hive:
# CREATE DATABASE test_loc LOCATION '/addh0010/hive/addh0011/warehouse/test_loc.db';

When creating database by specifying location, .db is not created
-----------------------------------------------------------------
                 Key: HIVE-7300
                 URL: https://issues.apache.org/jira/browse/HIVE-7300
             Project: Hive
          Issue Type: Bug
            Reporter: sourabh potnis
              Labels: .db, database, location

When I create a database without specifying a location, e.g. create database test; it is created in /apps/hive/warehouse/ as /apps/hive/warehouse/test.db. But when I create a database by specifying a location, e.g. create database test_loc location '/addh0010/hive/addh0011/warehouse'; the database is created but /addh0010/hive/addh0011/warehouse/test_loc.db is not. So if a user creates two tables with the same name in two different databases pointing at the same location, we cannot be sure which table was created. Therefore, when a database is created with a location, a .db directory with that database name should be created at that location.
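The advice above can be condensed into one statement; the paths are the ones from the report, and the essential point is that the LOCATION must itself include the per-database directory (conventionally with the .db suffix) so tables from different databases cannot collide in a shared directory:

{code:sql}
-- Point the database at its own directory, including the .db suffix.
-- Depending on permissions, the directory may need to be created first
-- with: hdfs dfs -mkdir /addh0010/hive/addh0011/warehouse/test_loc.db
CREATE DATABASE test_loc
LOCATION '/addh0010/hive/addh0011/warehouse/test_loc.db';
{code}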
[jira] [Resolved] (HIVE-7300) When creating database by specifying location, .db is not created
[ https://issues.apache.org/jira/browse/HIVE-7300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergio Peña resolved HIVE-7300.
-------------------------------
    Resolution: Invalid
[jira] [Assigned] (HIVE-8974) Upgrade to Calcite 1.0.0-SNAPSHOT (with lots of renames)
[ https://issues.apache.org/jira/browse/HIVE-8974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesús Camacho Rodríguez reassigned HIVE-8974:
---------------------------------------------
    Assignee: Jesús Camacho Rodríguez  (was: Gunther Hagleitner)

Upgrade to Calcite 1.0.0-SNAPSHOT (with lots of renames)
--------------------------------------------------------
                 Key: HIVE-8974
                 URL: https://issues.apache.org/jira/browse/HIVE-8974
             Project: Hive
          Issue Type: Task
            Reporter: Julian Hyde
            Assignee: Jesús Camacho Rodríguez

Calcite recently (after 0.9.2, before 1.0.0) reorganized its package structure and renamed a lot of classes. CALCITE-296 has the details, including a description of the before/after mapping. This task is to upgrade to the version of Calcite that has the renamed packages; there is a 1.0.0-SNAPSHOT in Apache Nexus. Calcite functionality has not changed significantly, so the rename should be straightforward. This task should be completed ASAP, before Calcite moves on.
[jira] [Created] (HIVE-8987) slf4j classpath collision when starting hive metastore
André Kelpe created HIVE-8987:
------------------------------
             Summary: slf4j classpath collision when starting hive metastore
                 Key: HIVE-8987
                 URL: https://issues.apache.org/jira/browse/HIVE-8987
             Project: Hive
          Issue Type: Bug
    Affects Versions: 0.14.0
         Environment: Apache Hadoop 2.5.2 and Apache Hive 0.14
            Reporter: André Kelpe

The latest release introduced a collision on the classpath. When I start the metastore, I see an slf4j error:
{code}
apache-hive-0.14.0-bin/bin/hive --service metastore
Starting Hive Metastore Server
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hadoop-2.5.2/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/vagrant/apache-hive-0.14.0-bin/lib/hive-jdbc-0.14.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
{code}
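As the linked SLF4J page explains, this message is a warning rather than a failure: SLF4J picks one of the bindings and continues. The root cause here is that hive-jdbc-0.14.0-standalone.jar bundles its own StaticLoggerBinder alongside the one in Hadoop's slf4j-log4j12 jar. A sketch of how to confirm the duplicate bindings, assuming the jar paths from the log above:

{code}
# Confirm that both jars ship an SLF4J binding class.
unzip -l /opt/hadoop-2.5.2/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar | grep StaticLoggerBinder
unzip -l /vagrant/apache-hive-0.14.0-bin/lib/hive-jdbc-0.14.0-standalone.jar | grep StaticLoggerBinder
{code}

Removing or excluding one of the two bindings from the effective classpath silences the warning.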
[jira] [Commented] (HIVE-8870) errors when selecting a struct field within an array from ORC based tables
[ https://issues.apache.org/jira/browse/HIVE-8870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14232140#comment-14232140 ]

Sergio Peña commented on HIVE-8870:
-----------------------------------
Hi [~owen.omalley],
Could you help me review this simple patch? [~brocknoland] told me you have experience with ORC. Thanks!
[jira] [Updated] (HIVE-8870) errors when selecting a struct field within an array from ORC based tables
[ https://issues.apache.org/jira/browse/HIVE-8870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergio Peña updated HIVE-8870:
------------------------------
    Attachment: (was: HIVE-8870.2.patch)
[jira] [Updated] (HIVE-8870) errors when selecting a struct field within an array from ORC based tables
[ https://issues.apache.org/jira/browse/HIVE-8870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergio Peña updated HIVE-8870:
------------------------------
    Status: Patch Available  (was: Open)
[jira] [Updated] (HIVE-8870) errors when selecting a struct field within an array from ORC based tables
[ https://issues.apache.org/jira/browse/HIVE-8870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergio Peña updated HIVE-8870:
------------------------------
    Status: Open  (was: Patch Available)
[jira] [Updated] (HIVE-8870) errors when selecting a struct field within an array from ORC based tables
[ https://issues.apache.org/jira/browse/HIVE-8870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergio Peña updated HIVE-8870:
------------------------------
    Attachment: HIVE-8870.3.patch