[jira] [Assigned] (HIVE-10515) Create tests to cover existing (supported) Hive CLI functionality
[ https://issues.apache.org/jira/browse/HIVE-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu reassigned HIVE-10515: --- Assignee: Ferdinand Xu > Create tests to cover existing (supported) Hive CLI functionality > - > > Key: HIVE-10515 > URL: https://issues.apache.org/jira/browse/HIVE-10515 > Project: Hive > Issue Type: Sub-task > Components: CLI >Affects Versions: 0.10.0 >Reporter: Xuefu Zhang >Assignee: Ferdinand Xu > > After removing HiveServer1, Hive CLI's functionality is reduced to its > original use case, a thick client application. Let's identify this so that we > maintain it when the implementation is changed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10623) Implement hive cli options using beeline functionality
[ https://issues.apache.org/jira/browse/HIVE-10623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-10623: Attachment: HIVE-10623.patch Hi [~xuefuz], could you help review this jira? Thank you! > Implement hive cli options using beeline functionality > -- > > Key: HIVE-10623 > URL: https://issues.apache.org/jira/browse/HIVE-10623 > Project: Hive > Issue Type: Sub-task > Components: CLI >Reporter: Ferdinand Xu >Assignee: Ferdinand Xu > Attachments: HIVE-10623.patch > > > We need to support the original hive cli options for the purpose of backwards > compatibility.
[jira] [Updated] (HIVE-9644) CASE comparison operator rotation optimization
[ https://issues.apache.org/jira/browse/HIVE-9644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-9644: --- Attachment: HIVE-9644.2.patch Extended patch with folding of when udf. > CASE comparison operator rotation optimization > -- > > Key: HIVE-9644 > URL: https://issues.apache.org/jira/browse/HIVE-9644 > Project: Hive > Issue Type: Bug > Components: Logical Optimizer >Affects Versions: 0.14.0, 1.0.0, 1.1.0 >Reporter: Gopal V >Assignee: Ashutosh Chauhan > Attachments: HIVE-9644.1.patch, HIVE-9644.2.patch, HIVE-9644.patch > > > Constant folding for queries doesn't kick in for some automatically generated > query patterns which look like this. > {code} > hive> explain select count(1) from store_sales where (case ss_sold_date when > '1998-01-01' then 1 else null end)=1; > {code} > This should get rewritten by pushing the equality into the case branches. > {code} > select count(1) from store_sales where (case ss_sold_date when '1998-01-01' > then 1=1 else null=1 end); > {code} > Ending up with a simplified filter condition, resolving itself as > {code} > select count(1) from store_sales where ss_sold_date= '1998-01-01' ; > {code}
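The equivalence behind this rotation can be sketched outside Hive. Below is a minimal Java model (the class and method names are ours, not Hive's) of why pushing the equality into the CASE branches and folding preserves the filter's semantics: NULL=1 evaluates to NULL in SQL, and a WHERE clause drops NULL rows, which a boolean filter treats as false.

```java
import java.util.Objects;

public class CaseRotationDemo {
    // Original filter: (CASE ss_sold_date WHEN '1998-01-01' THEN 1 ELSE NULL END) = 1
    static boolean originalFilter(String ssSoldDate) {
        Integer caseResult = Objects.equals(ssSoldDate, "1998-01-01") ? 1 : null;
        // NULL = 1 is NULL in SQL; a WHERE clause drops NULL rows, i.e. false here.
        return caseResult != null && caseResult == 1;
    }

    // Folded filter after the rotation: ss_sold_date = '1998-01-01'
    static boolean foldedFilter(String ssSoldDate) {
        return Objects.equals(ssSoldDate, "1998-01-01");
    }

    public static void main(String[] args) {
        String[] samples = {"1998-01-01", "1999-12-31", null};
        for (String s : samples) {
            if (originalFilter(s) != foldedFilter(s)) {
                throw new AssertionError("filters disagree for " + s);
            }
        }
        System.out.println("filters agree on all samples");
    }
}
```

The two predicates agree on matching rows, non-matching rows, and NULL input, which is why the rewrite to `ss_sold_date = '1998-01-01'` is safe.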
[jira] [Commented] (HIVE-8065) Support HDFS encryption functionality on Hive
[ https://issues.apache.org/jira/browse/HIVE-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529998#comment-14529998 ] Brock Noland commented on HIVE-8065: In that case the results of the query are staged in ez1. > Support HDFS encryption functionality on Hive > - > > Key: HIVE-8065 > URL: https://issues.apache.org/jira/browse/HIVE-8065 > Project: Hive > Issue Type: Improvement >Affects Versions: 0.13.1 >Reporter: Sergio Peña >Assignee: Sergio Peña > Labels: Hive-Scrum > > The new encryption support on HDFS makes Hive incompatible and unusable when > this feature is used. > HDFS encryption is designed so that a user can configure different > encryption zones (or directories) for multi-tenant environments. An > encryption zone has an exclusive encryption key, such as AES-128 or AES-256. > Because of security compliance, HDFS does not allow moving/renaming files > between encryption zones. Renames are allowed only inside the same encryption > zone. A copy is allowed between encryption zones. > See HDFS-6134 for more details about HDFS encryption design. > Hive currently uses a scratch directory (like /tmp/$user/$random). This > scratch directory is used for the output of intermediate data (between MR > jobs) and for the final output of the hive query which is later moved to the > table directory location. > If Hive tables are in different encryption zones than the scratch directory, > then Hive won't be able to rename those files/directories, and it will make > Hive unusable. > To handle this problem, we can change the scratch directory of the > query/statement to be inside the same encryption zone of the table directory > location. This way, the renaming process will be successful. > Also, for statements that move files between encryption zones (i.e. LOAD > DATA), a copy may be executed instead of a rename. This will cause an > overhead when copying large data files, but it won't break the encryption on > Hive. 
> Another security concern to consider is when using joins in selects. If Hive joins > different tables with different encryption key strengths, then the results of > the select might break the security compliance of the tables. Let's say two > tables with 128 bits and 256 bits encryption are joined, then the temporary > results might be stored in the 128 bits encryption zone. This will conflict > with the security compliance of the table encrypted with 256 bits. > To fix this, Hive should be able to select the scratch directory that is more > secured/encrypted in order to save the intermediate data temporarily with no > compliance issues. > For instance: > {noformat} > SELECT * FROM table-aes128 t1 JOIN table-aes256 t2 WHERE t1.id == t2.id; > {noformat} > - This should use a scratch directory (or staging directory) inside the > table-aes256 table location. > {noformat} > INSERT OVERWRITE TABLE table-unencrypted SELECT * FROM table-aes1; > {noformat} > - This should use a scratch directory inside the table-aes1 location. > {noformat} > FROM table-unencrypted > INSERT OVERWRITE TABLE table-aes128 SELECT id, name > INSERT OVERWRITE TABLE table-aes256 SELECT id, name > {noformat} > - This should use a scratch directory on each of the tables' locations. > - The first SELECT will have its scratch directory on table-aes128 directory. > - The second SELECT will have its scratch directory on table-aes256 directory.
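The "pick the more secured/encrypted scratch directory" rule described above can be sketched as follows. This is a hypothetical illustration under assumed names (`TableLoc`, `chooseScratchDir`, the `keyBits` field, and the `.hive-staging` suffix are ours), not Hive's actual implementation:

```java
import java.util.Comparator;
import java.util.List;

public class ScratchDirChooser {
    // Hypothetical model of a table location and the key strength of its
    // encryption zone (0 = unencrypted). Not a Hive API.
    record TableLoc(String location, int keyBits) {}

    // Stage inside the most strongly encrypted zone among the tables a
    // query touches, so intermediate results never land in a weaker zone.
    static String chooseScratchDir(List<TableLoc> tables) {
        TableLoc strongest = tables.stream()
                .max(Comparator.comparingInt(TableLoc::keyBits))
                .orElseThrow();
        return strongest.location() + "/.hive-staging";
    }

    public static void main(String[] args) {
        List<TableLoc> joined = List.of(
                new TableLoc("/warehouse/table-aes128", 128),
                new TableLoc("/warehouse/table-aes256", 256));
        // The AES-256 zone wins, matching the first example above.
        System.out.println(chooseScratchDir(joined));
    }
}
```

For the multi-insert example, the same selection would run once per INSERT branch, yielding a staging directory under each target table's location.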
[jira] [Commented] (HIVE-10592) ORC file dump in JSON format
[ https://issues.apache.org/jira/browse/HIVE-10592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529991#comment-14529991 ] Hive QA commented on HIVE-10592: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12730381/HIVE-10592.3.patch {color:red}ERROR:{color} -1 due to 24 failed/errored test(s), 8901 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_parts org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_unencrypted_tbl org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_with_different_encryption_keys org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_load_data_to_encrypted_tables org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_select_read_only_encrypted_tbl org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_disallow_transform org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_droppartition org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_sba_drop_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_alterpart_loc org.apache.hadoop.hive.ql.security.TestStorageBasedClientSideAuthorizationProvider.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropDatabase org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropPartition 
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropTable org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropView org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProvider.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProviderWithACL.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbFailure org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbSuccess org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableFailure org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableSuccess org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessing org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessingCustomSetWhitelistAppend {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3747/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3747/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3747/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 24 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12730381 - PreCommit-HIVE-TRUNK-Build > ORC file dump in JSON format > > > Key: HIVE-10592 > URL: https://issues.apache.org/jira/browse/HIVE-10592 > Project: Hive > Issue Type: New Feature >Affects Versions: 1.3.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-10592.1.patch, HIVE-10592.2.patch, > HIVE-10592.3.patch, HIVE-10592.4.patch > > > ORC file dump uses a custom format. It will be useful to dump ORC metadata in json > format so that other tools can be built on top of it.
[jira] [Commented] (HIVE-10609) Vectorization : Q64 fails with ClassCastException
[ https://issues.apache.org/jira/browse/HIVE-10609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529972#comment-14529972 ] Matt McCline commented on HIVE-10609: - This doesn't fail on my combined build of HIVE-9743 and HIVE-10565. Will verify again when those JIRAs go in. > Vectorization : Q64 fails with ClassCastException > - > > Key: HIVE-10609 > URL: https://issues.apache.org/jira/browse/HIVE-10609 > Project: Hive > Issue Type: Bug > Components: Vectorization >Affects Versions: 1.2.0 >Reporter: Mostafa Mokhtar >Assignee: Matt McCline > Fix For: 1.2.0 > > > TPC-DS Q64 fails with ClassCastException. > Query > {code} > select cs1.product_name ,cs1.store_name ,cs1.store_zip ,cs1.b_street_number > ,cs1.b_streen_name ,cs1.b_city > ,cs1.b_zip ,cs1.c_street_number ,cs1.c_street_name ,cs1.c_city > ,cs1.c_zip ,cs1.syear ,cs1.cnt > ,cs1.s1 ,cs1.s2 ,cs1.s3 > ,cs2.s1 ,cs2.s2 ,cs2.s3 ,cs2.syear ,cs2.cnt > from > (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as > store_name > ,s_zip as store_zip ,ad1.ca_street_number as b_street_number > ,ad1.ca_street_name as b_streen_name > ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as > c_street_number > ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip > as c_zip > ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) > as cnt > ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 > ,sum(ss_coupon_amt) as s3 > FROM store_sales > JOIN store_returns ON store_sales.ss_item_sk = > store_returns.sr_item_sk and store_sales.ss_ticket_number = > store_returns.sr_ticket_number > JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk > JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk > JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk > JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk > JOIN store ON store_sales.ss_store_sk = store.s_store_sk > JOIN 
customer_demographics cd1 ON store_sales.ss_cdemo_sk= > cd1.cd_demo_sk > JOIN customer_demographics cd2 ON customer.c_current_cdemo_sk = > cd2.cd_demo_sk > JOIN promotion ON store_sales.ss_promo_sk = promotion.p_promo_sk > JOIN household_demographics hd1 ON store_sales.ss_hdemo_sk = > hd1.hd_demo_sk > JOIN household_demographics hd2 ON customer.c_current_hdemo_sk = > hd2.hd_demo_sk > JOIN customer_address ad1 ON store_sales.ss_addr_sk = > ad1.ca_address_sk > JOIN customer_address ad2 ON customer.c_current_addr_sk = > ad2.ca_address_sk > JOIN income_band ib1 ON hd1.hd_income_band_sk = ib1.ib_income_band_sk > JOIN income_band ib2 ON hd2.hd_income_band_sk = ib2.ib_income_band_sk > JOIN item ON store_sales.ss_item_sk = item.i_item_sk > JOIN > (select cs_item_sk > ,sum(cs_ext_list_price) as > sale,sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit) as refund > from catalog_sales JOIN catalog_returns > ON catalog_sales.cs_item_sk = catalog_returns.cr_item_sk > and catalog_sales.cs_order_number = catalog_returns.cr_order_number > group by cs_item_sk > having > sum(cs_ext_list_price)>2*sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit)) > cs_ui > ON store_sales.ss_item_sk = cs_ui.cs_item_sk > WHERE > cd1.cd_marital_status <> cd2.cd_marital_status and > i_color in ('maroon','burnished','dim','steel','navajo','chocolate') > and > i_current_price between 35 and 35 + 10 and > i_current_price between 35 + 1 and 35 + 15 > group by i_product_name ,i_item_sk ,s_store_name ,s_zip ,ad1.ca_street_number >,ad1.ca_street_name ,ad1.ca_city ,ad1.ca_zip ,ad2.ca_street_number >,ad2.ca_street_name ,ad2.ca_city ,ad2.ca_zip ,d1.d_year ,d2.d_year > ,d3.d_year > ) cs1 > JOIN > (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as > store_name > ,s_zip as store_zip ,ad1.ca_street_number as b_street_number > ,ad1.ca_street_name as b_streen_name > ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as > c_street_number > ,ad2.ca_street_name as 
c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip > as c_zip > ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) > as cnt > ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 > ,sum(ss_coupon_amt) as s3 > FROM store_sales > JOIN store_returns ON store_sales.ss_item_sk = > store_returns.sr_item_sk and store_sales.ss_ticket_number = > store_returns.sr_ticket_number > JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk > JOIN
[jira] [Commented] (HIVE-10618) Fix invocation of toString on byteArray in VerifyFast (250, 254)
[ https://issues.apache.org/jira/browse/HIVE-10618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529958#comment-14529958 ] Prasanth Jayachandran commented on HIVE-10618: -- +1 > Fix invocation of toString on byteArray in VerifyFast (250, 254) > > > Key: HIVE-10618 > URL: https://issues.apache.org/jira/browse/HIVE-10618 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Reporter: Alexander Pivovarov >Assignee: Alexander Pivovarov >Priority: Minor > Attachments: rb33877.patch > > > Arrays.toString(byteArray) can be used to convert byte[] to a string
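For reference, a standalone sketch of the difference between the two calls (the demo class name is ours; the behavior is standard Java):

```java
import java.util.Arrays;

public class ByteArrayToStringDemo {
    public static void main(String[] args) {
        byte[] bytes = {10, 20, 30};
        // Object#toString on an array yields a type tag plus identity hash,
        // something like "[B@6d06d69c" -- not the contents.
        System.out.println(bytes.toString().startsWith("[B@")); // true
        // Arrays.toString renders the elements.
        System.out.println(Arrays.toString(bytes)); // [10, 20, 30]
    }
}
```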
[jira] [Commented] (HIVE-10620) ZooKeeperHiveLock overrides equal() method but not hashcode()
[ https://issues.apache.org/jira/browse/HIVE-10620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529931#comment-14529931 ] Ashutosh Chauhan commented on HIVE-10620: - +1 > ZooKeeperHiveLock overrides equal() method but not hashcode() > - > > Key: HIVE-10620 > URL: https://issues.apache.org/jira/browse/HIVE-10620 > Project: Hive > Issue Type: Bug >Affects Versions: 1.0.0 >Reporter: Chaoyu Tang >Assignee: Chaoyu Tang > Attachments: HIVE-10620.patch > > > ZooKeeperHiveLock overrides the public boolean equals(Object o) method but > does not for public int hashCode(). It violates the Java contract and may > cause unexpected results.
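A minimal sketch (illustrative lock-key classes of ours, not the actual ZooKeeperHiveLock code) of the "unexpected results" the description warns about: hash-based collections locate entries by hashCode() first, so overriding only equals() makes lookups silently miss.

```java
import java.util.HashSet;
import java.util.Set;

public class HashContractDemo {
    // equals without hashCode: two equal instances usually land in
    // different hash buckets, so contains() misses despite equality.
    static final class BrokenKey {
        final String path;
        BrokenKey(String path) { this.path = path; }
        @Override public boolean equals(Object o) {
            return o instanceof BrokenKey && ((BrokenKey) o).path.equals(path);
        }
        // hashCode() intentionally NOT overridden.
    }

    // Fix: derive hashCode from the same fields equals compares.
    static final class FixedKey {
        final String path;
        FixedKey(String path) { this.path = path; }
        @Override public boolean equals(Object o) {
            return o instanceof FixedKey && ((FixedKey) o).path.equals(path);
        }
        @Override public int hashCode() { return path.hashCode(); }
    }

    public static void main(String[] args) {
        Set<BrokenKey> broken = new HashSet<>();
        broken.add(new BrokenKey("/hive/zklock-0001"));
        // Almost always false despite equals() returning true:
        System.out.println(broken.contains(new BrokenKey("/hive/zklock-0001")));

        Set<FixedKey> fixed = new HashSet<>();
        fixed.add(new FixedKey("/hive/zklock-0001"));
        System.out.println(fixed.contains(new FixedKey("/hive/zklock-0001"))); // true
    }
}
```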
[jira] [Updated] (HIVE-10621) serde typeinfo equals methods are not symmetric
[ https://issues.apache.org/jira/browse/HIVE-10621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-10621: --- Attachment: rb33880.patch patch #1 > serde typeinfo equals methods are not symmetric > --- > > Key: HIVE-10621 > URL: https://issues.apache.org/jira/browse/HIVE-10621 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Reporter: Alexander Pivovarov >Assignee: Alexander Pivovarov >Priority: Minor > Attachments: rb33880.patch > > > A correct equals method implementation should start with > {code} > if (this == other) { > return true; > } > if (other == null || getClass() != other.getClass()) { > return false; > } > {code} > DecimalTypeInfo, PrimitiveTypeInfo, VarcharTypeInfo, CharTypeInfo, > HiveDecimalWritable equals method implementation starts with > {code} > if (other == null || !(other instanceof )) { > return false > } > {code} > - first of all, the check for null is redundant > - the second issue is that the "other instanceof " check is not > symmetric. > The contract of equals() implies that a.equals(b) is true if and only if > b.equals(a) is true. > The current implementation violates this contract. > e.g. > DecimalTypeInfo instanceof PrimitiveTypeInfo is true > but > PrimitiveTypeInfo instanceof DecimalTypeInfo is false > See more details here > http://stackoverflow.com/questions/6518534/equals-method-overrides-equals-in-superclass-and-may-not-be-symmetric
[jira] [Updated] (HIVE-10621) serde typeinfo equals methods are not symmetric
[ https://issues.apache.org/jira/browse/HIVE-10621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-10621: --- Description: A correct equals method implementation should start with {code} if (this == other) { return true; } if (other == null || getClass() != other.getClass()) { return false; } {code} DecimalTypeInfo, PrimitiveTypeInfo, VarcharTypeInfo, CharTypeInfo, HiveDecimalWritable equals method implementation starts with {code} if (other == null || !(other instanceof )) { return false } {code} - first of all, the check for null is redundant - the second issue is that the "other instanceof " check is not symmetric. The contract of equals() implies that a.equals(b) is true if and only if b.equals(a) is true. The current implementation violates this contract. e.g. DecimalTypeInfo instanceof PrimitiveTypeInfo is true but PrimitiveTypeInfo instanceof DecimalTypeInfo is false See more details here http://stackoverflow.com/questions/6518534/equals-method-overrides-equals-in-superclass-and-may-not-be-symmetric > serde typeinfo equals methods are not symmetric > --- > > Key: HIVE-10621 > URL: https://issues.apache.org/jira/browse/HIVE-10621 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Reporter: Alexander Pivovarov >Assignee: Alexander Pivovarov >Priority: Minor > > A correct equals method implementation should start with > {code} > if (this == other) { > return true; > } > if (other == null || getClass() != other.getClass()) { > return false; > } > {code} > DecimalTypeInfo, PrimitiveTypeInfo, VarcharTypeInfo, CharTypeInfo, > HiveDecimalWritable equals method implementation starts with > {code} > if (other == null || !(other instanceof )) { > return false > } > {code} > - first of all, the check for null is redundant > - the second issue is that the "other instanceof " check is not > symmetric. > The contract of equals() implies that a.equals(b) is true if and only if > b.equals(a) is true. > The current implementation violates this contract. > e.g. > DecimalTypeInfo instanceof PrimitiveTypeInfo is true > but > PrimitiveTypeInfo instanceof DecimalTypeInfo is false > See more details here > http://stackoverflow.com/questions/6518534/equals-method-overrides-equals-in-superclass-and-may-not-be-symmetric
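The asymmetry described in this issue can be reproduced with two small illustrative classes (stand-ins of ours, not the actual TypeInfo hierarchy): an instanceof-based equals in a base class accepts subclass instances, while the subclass's equals rejects base instances, so the two directions disagree.

```java
public class EqualsSymmetryDemo {
    // Base uses the problematic instanceof pattern.
    static class Base {
        final int x;
        Base(int x) { this.x = x; }
        @Override public boolean equals(Object o) {
            return o instanceof Base && ((Base) o).x == x;
        }
        @Override public int hashCode() { return x; }
    }

    // Sub adds state and also uses instanceof.
    static class Sub extends Base {
        final int y;
        Sub(int x, int y) { super(x); this.y = y; }
        @Override public boolean equals(Object o) {
            return o instanceof Sub && super.equals(o) && ((Sub) o).y == y;
        }
        @Override public int hashCode() { return 31 * x + y; }
    }

    public static void main(String[] args) {
        Base base = new Base(1);
        Sub sub = new Sub(1, 2);
        // instanceof-based equals is asymmetric across the hierarchy:
        System.out.println(base.equals(sub)); // true
        System.out.println(sub.equals(base)); // false
        // Replacing the instanceof check in Base with
        // "o != null && getClass() == o.getClass()" makes both
        // comparisons false, restoring symmetry.
    }
}
```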
[jira] [Commented] (HIVE-10592) ORC file dump in JSON format
[ https://issues.apache.org/jira/browse/HIVE-10592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529907#comment-14529907 ] Prasanth Jayachandran commented on HIVE-10592: -- Added multifile support in the new patch. The output will now look like {code}./bin/hive --orcfiledump --json --pretty file:///app/warehouse/alltypes_bloom/00_0 file:///app/warehouse/alltypes_orc/00_0{code} {code} {"orcFileDumps": [ { "fileName": "file:\/\/\/app\/warehouse\/alltypes_bloom\/00_0", "fileVersion": "0.12", "writerVersion": "HIVE_8732", "numberOfRows": 3, "compression": "ZLIB", ... }, { "fileName": "file:\/\/\/app\/warehouse\/alltypes_orc\/00_0", "fileVersion": "0.12", "writerVersion": "HIVE_8732", "numberOfRows": 2, "compression": "ZLIB", ... } ]} {code} > ORC file dump in JSON format > > > Key: HIVE-10592 > URL: https://issues.apache.org/jira/browse/HIVE-10592 > Project: Hive > Issue Type: New Feature >Affects Versions: 1.3.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-10592.1.patch, HIVE-10592.2.patch, > HIVE-10592.3.patch, HIVE-10592.4.patch > > > ORC file dump uses a custom format. It will be useful to dump ORC metadata in json > format so that other tools can be built on top of it.
[jira] [Updated] (HIVE-10592) ORC file dump in JSON format
[ https://issues.apache.org/jira/browse/HIVE-10592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-10592: - Attachment: HIVE-10592.4.patch > ORC file dump in JSON format > > > Key: HIVE-10592 > URL: https://issues.apache.org/jira/browse/HIVE-10592 > Project: Hive > Issue Type: New Feature >Affects Versions: 1.3.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-10592.1.patch, HIVE-10592.2.patch, > HIVE-10592.3.patch, HIVE-10592.4.patch > > > ORC file dump uses a custom format. It will be useful to dump ORC metadata in json > format so that other tools can be built on top of it.
[jira] [Updated] (HIVE-10563) MiniTezCliDriver tests ordering issues
[ https://issues.apache.org/jira/browse/HIVE-10563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-10563: - Attachment: HIVE-10563.2.patch > MiniTezCliDriver tests ordering issues > -- > > Key: HIVE-10563 > URL: https://issues.apache.org/jira/browse/HIVE-10563 > Project: Hive > Issue Type: Bug >Reporter: Hari Sankar Sivarama Subramaniyan >Assignee: Hari Sankar Sivarama Subramaniyan > Attachments: HIVE-10563.1.patch, HIVE-10563.2.patch > > > There are a bunch of tests related to TestMiniTezCliDriver which give > ordering issues when run on Centos/Windows/OSX
[jira] [Commented] (HIVE-10607) Combination of ReducesinkDedup + TopN optimization yields incorrect result if there are multiple GBY in reducer
[ https://issues.apache.org/jira/browse/HIVE-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529903#comment-14529903 ] Hive QA commented on HIVE-10607: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12730369/HIVE-10607.patch {color:red}ERROR:{color} -1 due to 25 failed/errored test(s), 8900 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_parts org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_unencrypted_tbl org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_with_different_encryption_keys org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_load_data_to_encrypted_tables org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_select_read_only_encrypted_tbl org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_disallow_transform org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_droppartition org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_sba_drop_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_alterpart_loc org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_limit_pushdown org.apache.hadoop.hive.ql.security.TestStorageBasedClientSideAuthorizationProvider.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropDatabase org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropPartition 
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropTable org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropView org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProvider.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProviderWithACL.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbFailure org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbSuccess org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableFailure org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableSuccess org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessing org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessingCustomSetWhitelistAppend {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3746/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3746/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3746/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 25 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12730369 - PreCommit-HIVE-TRUNK-Build > Combination of ReducesinkDedup + TopN optimization yields incorrect result if > there are multiple GBY in reducer > --- > > Key: HIVE-10607 > URL: https://issues.apache.org/jira/browse/HIVE-10607 > Project: Hive > Issue Type: Bug > Components: Logical Optimizer, Tez >Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.1.0 >Reporter: Ashutosh Chauhan >Assignee: Ashutosh Chauhan > Attachments: HIVE-10607.patch > > > {code:sql} > select ctinyint, count(cdouble) from (select ctinyint, cdouble from > alltypesorc group by ctinyint, cdouble) t1 group by ctinyint order by > ctinyint limit 20; > {code} > This gives different result sets depending on which set of optimizations is > on. In particular, in the .q test environment the following two invocations will give > you different result sets: > {code} > * mvn test -Phadoop-2 -Dtest.output.overwrite=true > -Dtest=TestMiniTezCliDriver -Dqfile=test.q > -Dhive.optimize.reducededuplication.min.reducer=1 > -Dhive.limit.pushdown.memory.usage=0.3f > * mvn t
[jira] [Commented] (HIVE-8065) Support HDFS encryption functionality on Hive
[ https://issues.apache.org/jira/browse/HIVE-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529892#comment-14529892 ] Eugene Koifman commented on HIVE-8065: -- How come the move restriction is not an issue for something like Insert Overwrite tableEZ1 select * from tableEZ2 inner join tableEZ3? > Support HDFS encryption functionality on Hive > - > > Key: HIVE-8065 > URL: https://issues.apache.org/jira/browse/HIVE-8065 > Project: Hive > Issue Type: Improvement >Affects Versions: 0.13.1 >Reporter: Sergio Peña >Assignee: Sergio Peña > Labels: Hive-Scrum > > The new encryption support on HDFS makes Hive incompatible and unusable when > this feature is used. > HDFS encryption is designed so that a user can configure different > encryption zones (or directories) for multi-tenant environments. An > encryption zone has an exclusive encryption key, such as AES-128 or AES-256. > Because of security compliance, HDFS does not allow moving/renaming files > between encryption zones. Renames are allowed only inside the same encryption > zone. A copy is allowed between encryption zones. > See HDFS-6134 for more details about HDFS encryption design. > Hive currently uses a scratch directory (like /tmp/$user/$random). This > scratch directory is used for the output of intermediate data (between MR > jobs) and for the final output of the hive query which is later moved to the > table directory location. > If Hive tables are in different encryption zones than the scratch directory, > then Hive won't be able to rename those files/directories, and it will make > Hive unusable. > To handle this problem, we can change the scratch directory of the > query/statement to be inside the same encryption zone of the table directory > location. This way, the renaming process will be successful. > Also, for statements that move files between encryption zones (i.e. LOAD > DATA), a copy may be executed instead of a rename. 
This will cause an > overhead when copying large data files, but it won't break the encryption on > Hive. > Another security concern to consider is when using joins in selects. If Hive joins > different tables with different encryption key strengths, then the results of > the select might break the security compliance of the tables. Let's say two > tables with 128 bits and 256 bits encryption are joined, then the temporary > results might be stored in the 128 bits encryption zone. This will conflict > with the security compliance of the table encrypted with 256 bits. > To fix this, Hive should be able to select the scratch directory that is more > secured/encrypted in order to save the intermediate data temporarily with no > compliance issues. > For instance: > {noformat} > SELECT * FROM table-aes128 t1 JOIN table-aes256 t2 WHERE t1.id == t2.id; > {noformat} > - This should use a scratch directory (or staging directory) inside the > table-aes256 table location. > {noformat} > INSERT OVERWRITE TABLE table-unencrypted SELECT * FROM table-aes1; > {noformat} > - This should use a scratch directory inside the table-aes1 location. > {noformat} > FROM table-unencrypted > INSERT OVERWRITE TABLE table-aes128 SELECT id, name > INSERT OVERWRITE TABLE table-aes256 SELECT id, name > {noformat} > - This should use a scratch directory on each of the tables' locations. > - The first SELECT will have its scratch directory on table-aes128 directory. > - The second SELECT will have its scratch directory on table-aes256 directory.
[jira] [Updated] (HIVE-10620) ZooKeeperHiveLock overrides equal() method but not hashcode()
[ https://issues.apache.org/jira/browse/HIVE-10620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chaoyu Tang updated HIVE-10620: --- Attachment: HIVE-10620.patch [~szehon] [~ashutoshc] could you review the code? Thanks > ZooKeeperHiveLock overrides equal() method but not hashcode() > - > > Key: HIVE-10620 > URL: https://issues.apache.org/jira/browse/HIVE-10620 > Project: Hive > Issue Type: Bug >Affects Versions: 1.0.0 >Reporter: Chaoyu Tang >Assignee: Chaoyu Tang > Attachments: HIVE-10620.patch > > > ZooKeeperHiveLock overrides the public boolean equals(Object o) method but > does not for public int hashCode(). It violates the Java contract and may > cause unexpected results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10620) ZooKeeperHiveLock overrides equal() method but not hashcode()
[ https://issues.apache.org/jira/browse/HIVE-10620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chaoyu Tang updated HIVE-10620: --- Description: ZooKeeperHiveLock overrides the public boolean equals(Object o) method but does not for public int hashCode(). It violates the Java contract and may cause unexpected results. (was: ZooKeeperHiveLock overrides the public boolean equals(Object o) method but does not for public int hashCode(). It violates the Java contract that equal and may cause unexpected results.) > ZooKeeperHiveLock overrides equal() method but not hashcode() > - > > Key: HIVE-10620 > URL: https://issues.apache.org/jira/browse/HIVE-10620 > Project: Hive > Issue Type: Bug >Affects Versions: 1.0.0 >Reporter: Chaoyu Tang >Assignee: Chaoyu Tang > > ZooKeeperHiveLock overrides the public boolean equals(Object o) method but > does not for public int hashCode(). It violates the Java contract and may > cause unexpected results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
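The equals/hashCode contract violated in HIVE-10620 can be shown with a toy class. This is not ZooKeeperHiveLock itself; the class and field names are assumptions for illustration only.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Toy illustration of the Java contract: whenever equals() is overridden,
// hashCode() must be overridden consistently, or hash-based collections
// (HashMap, HashSet) silently misbehave because equal objects land in
// different buckets.
public class LockKey {
    private final String path;

    LockKey(String path) { this.path = path; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof LockKey)) return false;
        return Objects.equals(path, ((LockKey) o).path);
    }

    // Without this override, two equal LockKeys would usually hash to
    // different buckets and Set.contains()/remove() could fail to find them.
    @Override
    public int hashCode() {
        return Objects.hashCode(path);
    }

    public static void main(String[] args) {
        Set<LockKey> held = new HashSet<>();
        held.add(new LockKey("/hive/locks/tbl1"));
        System.out.println(held.contains(new LockKey("/hive/locks/tbl1"))); // true
    }
}
```

If the hashCode() override is deleted, the lookup above is only true by coincidence of identity hashes, which is exactly the "unexpected results" the report warns about.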
[jira] [Updated] (HIVE-10619) Fix ConcurrentHashMap.get in MetadataListStructObjectInspector.getInstance (52)
[ https://issues.apache.org/jira/browse/HIVE-10619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-10619: --- Attachment: rb33878.patch patch #1 > Fix ConcurrentHashMap.get in MetadataListStructObjectInspector.getInstance > (52) > --- > > Key: HIVE-10619 > URL: https://issues.apache.org/jira/browse/HIVE-10619 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Reporter: Alexander Pivovarov >Assignee: Alexander Pivovarov >Priority: Minor > Attachments: rb33878.patch > > > cached.get(columnNames) should be replaced with cached.get(key) in the code > block below > {code} > cached = new ConcurrentHashMap<ArrayList<List<String>>, > MetadataListStructObjectInspector>(); > public static MetadataListStructObjectInspector getInstance( > List<String> columnNames) { > ArrayList<List<String>> key = new ArrayList<List<String>>(1); > key.add(columnNames); > MetadataListStructObjectInspector result = cached.get(columnNames); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
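The bug in HIVE-10619 reduces to looking up a map with the wrong key type: the cache is keyed by the wrapper list, not by the raw column-name list. A minimal standalone reproduction (using plain Object as the cached value, since this sketch does not depend on Hive classes):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

// Reduced illustration of the bug: entries are stored under the wrapped key
// (an ArrayList containing the column-name list), so a lookup with the raw
// columnNames list always misses and the cache never produces a hit.
public class CacheKeyBug {
    static final ConcurrentHashMap<ArrayList<List<String>>, Object> cached =
            new ConcurrentHashMap<>();

    public static void main(String[] args) {
        List<String> columnNames = Arrays.asList("col0", "col1");
        ArrayList<List<String>> key = new ArrayList<>(1);
        key.add(columnNames);
        cached.put(key, new Object());

        System.out.println(cached.get(columnNames)); // null -- the bug: wrong key
        System.out.println(cached.get(key) != null); // true -- the fix: use `key`
    }
}
```

ConcurrentHashMap.get accepts any Object, so the wrong-key lookup compiles cleanly; it just never matches, which is why the defect survived until spotted by inspection.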
[jira] [Updated] (HIVE-10618) Fix invocation of toString on byteArray in VerifyFast (250, 254)
[ https://issues.apache.org/jira/browse/HIVE-10618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-10618: --- Attachment: rb33877.patch patch #1 > Fix invocation of toString on byteArray in VerifyFast (250, 254) > > > Key: HIVE-10618 > URL: https://issues.apache.org/jira/browse/HIVE-10618 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Reporter: Alexander Pivovarov >Assignee: Alexander Pivovarov >Priority: Minor > Attachments: rb33877.patch > > > Arrays.toString(byteArray) can be used to convert byte[] to string -- This message was sent by Atlassian JIRA (v6.3.4#6332)
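The HIVE-10618 fix rests on a standard Java pitfall: arrays inherit Object.toString(), which prints a type tag plus identity hash rather than the contents. A small demonstration (the array values are arbitrary):

```java
import java.util.Arrays;

// byte[].toString() yields something like "[B@1b6d3586" (type descriptor and
// identity hash), not the element values; Arrays.toString(byte[]) produces
// the readable form the verification code in VerifyFast presumably wanted.
public class ByteArrayToString {
    public static void main(String[] args) {
        byte[] byteArray = {1, 2, 3};
        System.out.println(byteArray.toString().startsWith("[B@")); // true: identity string
        System.out.println(Arrays.toString(byteArray));             // [1, 2, 3]
    }
}
```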
[jira] [Commented] (HIVE-10539) set default value of hive.repl.task.factory
[ https://issues.apache.org/jira/browse/HIVE-10539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529819#comment-14529819 ] Hive QA commented on HIVE-10539: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12730349/HIVE-10539.3.patch {color:red}ERROR:{color} -1 due to 24 failed/errored test(s), 8900 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_parts org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_unencrypted_tbl org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_with_different_encryption_keys org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_load_data_to_encrypted_tables org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_select_read_only_encrypted_tbl org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_disallow_transform org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_droppartition org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_sba_drop_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_alterpart_loc org.apache.hadoop.hive.ql.security.TestStorageBasedClientSideAuthorizationProvider.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropDatabase org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropPartition 
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropTable org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropView org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProvider.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProviderWithACL.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbFailure org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbSuccess org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableFailure org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableSuccess org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessing org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessingCustomSetWhitelistAppend {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3745/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3745/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3745/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 24 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12730349 - PreCommit-HIVE-TRUNK-Build > set default value of hive.repl.task.factory > --- > > Key: HIVE-10539 > URL: https://issues.apache.org/jira/browse/HIVE-10539 > Project: Hive > Issue Type: Bug >Reporter: Thejas M Nair >Assignee: Thejas M Nair > Attachments: HIVE-10539.1.patch, HIVE-10539.2.patch, > HIVE-10539.3.patch > > > hive.repl.task.factory does not have a default value set. It should be set to > org.apache.hive.hcatalog.api.repl.exim.EximReplicationTaskFactory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10538) Fix NPE in FileSinkOperator from hashcode mismatch
[ https://issues.apache.org/jira/browse/HIVE-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Slawski updated HIVE-10538: - Attachment: HIVE-10538.2.patch I've attached the second revision of the patch, which updates the failed Spark qtests. > Fix NPE in FileSinkOperator from hashcode mismatch > -- > > Key: HIVE-10538 > URL: https://issues.apache.org/jira/browse/HIVE-10538 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 1.0.0, 1.2.0 >Reporter: Peter Slawski >Assignee: Peter Slawski >Priority: Critical > Fix For: 1.2.0, 1.3.0 > > Attachments: HIVE-10538.1.patch, HIVE-10538.1.patch, > HIVE-10538.1.patch, HIVE-10538.2.patch > > > A Null Pointer Exception occurs in FileSinkOperator when using bucketed > tables and distribute by with multiFileSpray enabled. The following query > snippet reproduces this issue: > {code} > set hive.enforce.bucketing = true; > set hive.exec.reducers.max = 20; > create table bucket_a(key int, value_a string) clustered by (key) into 256 > buckets; > create table bucket_b(key int, value_b string) clustered by (key) into 256 > buckets; > create table bucket_ab(key int, value_a string, value_b string) clustered by > (key) into 256 buckets; > -- Insert data into bucket_a and bucket_b > insert overwrite table bucket_ab > select a.key, a.value_a, b.value_b from bucket_a a join bucket_b b on (a.key > = b.key) distribute by key; > {code} > The following stack trace is logged. 
> {code} > 2015-04-29 12:54:12,841 FATAL [pool-110-thread-1]: ExecReducer > (ExecReducer.java:reduce(255)) - > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row (tag=0) {"key":{},"value":{"_col0":"113","_col1":"val_113"}} > at > org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244) > at > org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392) > at > org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.exec.FileSinkOperator.findWriterOffset(FileSinkOperator.java:819) > at > org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:747) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837) > at > org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88) > at > org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235) > ... 8 more > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
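The "hashcode mismatch" behind the NPE above can be sketched abstractly. This is a hypothetical simplification, not Hive's real FileSinkOperator: with multi-file spray, each row's writer slot is chosen as hash(key) mod numFiles, and if the hash used to route rows differs from the hash used when the writer slots were populated, the computed offset points at a slot that holds null.

```java
// Toy sketch of the failure mode: two different hash functions for the same
// key yield two different writer offsets, so the row dereferences a writer
// slot that was never populated -- the null that becomes the NPE.
public class WriterOffsetSketch {
    static int findWriterOffset(int keyHash, int numFiles) {
        // mask the sign bit so the offset is non-negative, then spray
        return (keyHash & Integer.MAX_VALUE) % numFiles;
    }

    public static void main(String[] args) {
        Object[] writers = new Object[20];            // only some slots populated
        writers[findWriterOffset("113".hashCode(), 20)] = new Object();

        // Same logical key, different hash function => different slot.
        int otherHash = Integer.parseInt("113");      // stand-in for the mismatched hash
        Object w = writers[findWriterOffset(otherHash, 20)];
        System.out.println(w == null);                // true: the slot is empty
    }
}
```

In the real operator the fix is to make both sides agree on one hash function, so routing and writer allocation always compute the same offset.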
[jira] [Commented] (HIVE-9743) Incorrect result set for vectorized left outer join
[ https://issues.apache.org/jira/browse/HIVE-9743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529775#comment-14529775 ] Matt McCline commented on HIVE-9743: [~vikram.dixit] Ok, SMB removed. I think this one is good to go as soon as the Apache tests pass. > Incorrect result set for vectorized left outer join > --- > > Key: HIVE-9743 > URL: https://issues.apache.org/jira/browse/HIVE-9743 > Project: Hive > Issue Type: Bug > Components: SQL >Affects Versions: 0.14.0 >Reporter: N Campbell >Assignee: Matt McCline > Attachments: HIVE-9743.01.patch, HIVE-9743.02.patch, > HIVE-9743.03.patch, HIVE-9743.04.patch, HIVE-9743.05.patch, > HIVE-9743.06.patch, HIVE-9743.08.patch, HIVE-9743.09.patch > > > This query is supposed to return 3 rows and will when run without Tez but > returns 2 rows when run with Tez. > select tjoin1.rnum, tjoin1.c1, tjoin1.c2, tjoin2.c2 as c2j2 from tjoin1 left > outer join tjoin2 on ( tjoin1.c1 = tjoin2.c1 and tjoin1.c2 > 15 ) > tjoin1.rnum tjoin1.c1 tjoin1.c2 c2j2 > 1 20 25 > 2 50 > instead of > tjoin1.rnum tjoin1.c1 tjoin1.c2 c2j2 > 0 10 15 > 1 20 25 > 2 50 > create table if not exists TJOIN1 (RNUM int , C1 int, C2 int) > STORED AS orc ; > 0|10|15 > 1|20|25 > 2|\N|50 > create table if not exists TJOIN2 (RNUM int , C1 int, C2 char(2)) > ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' > STORED AS TEXTFILE ; > 0|10|BB > 1|15|DD > 2|\N|EE > 3|10|FF -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9743) Incorrect result set for vectorized left outer join
[ https://issues.apache.org/jira/browse/HIVE-9743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-9743: --- Attachment: HIVE-9743.09.patch > Incorrect result set for vectorized left outer join > --- > > Key: HIVE-9743 > URL: https://issues.apache.org/jira/browse/HIVE-9743 > Project: Hive > Issue Type: Bug > Components: SQL >Affects Versions: 0.14.0 >Reporter: N Campbell >Assignee: Matt McCline > Attachments: HIVE-9743.01.patch, HIVE-9743.02.patch, > HIVE-9743.03.patch, HIVE-9743.04.patch, HIVE-9743.05.patch, > HIVE-9743.06.patch, HIVE-9743.08.patch, HIVE-9743.09.patch > > > This query is supposed to return 3 rows and will when run without Tez but > returns 2 rows when run with Tez. > select tjoin1.rnum, tjoin1.c1, tjoin1.c2, tjoin2.c2 as c2j2 from tjoin1 left > outer join tjoin2 on ( tjoin1.c1 = tjoin2.c1 and tjoin1.c2 > 15 ) > tjoin1.rnum tjoin1.c1 tjoin1.c2 c2j2 > 1 20 25 > 2 50 > instead of > tjoin1.rnum tjoin1.c1 tjoin1.c2 c2j2 > 0 10 15 > 1 20 25 > 2 50 > create table if not exists TJOIN1 (RNUM int , C1 int, C2 int) > STORED AS orc ; > 0|10|15 > 1|20|25 > 2|\N|50 > create table if not exists TJOIN2 (RNUM int , C1 int, C2 char(2)) > ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' > STORED AS TEXTFILE ; > 0|10|BB > 1|15|DD > 2|\N|EE > 3|10|FF -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10565) LLAP: Native Vector Map Join doesn't handle filtering and matching on LEFT OUTER JOIN repeated key correctly
[ https://issues.apache.org/jira/browse/HIVE-10565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-10565: Attachment: HIVE-10565.07.patch > LLAP: Native Vector Map Join doesn't handle filtering and matching on LEFT > OUTER JOIN repeated key correctly > > > Key: HIVE-10565 > URL: https://issues.apache.org/jira/browse/HIVE-10565 > Project: Hive > Issue Type: Sub-task > Components: Hive >Affects Versions: 1.2.0 >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Fix For: 1.2.0, 1.3.0 > > Attachments: HIVE-10565.01.patch, HIVE-10565.02.patch, > HIVE-10565.03.patch, HIVE-10565.04.patch, HIVE-10565.05.patch, > HIVE-10565.06.patch, HIVE-10565.07.patch > > > Filtering can knock out some of the rows for a repeated key, but those > knocked out rows need to be included in the LEFT OUTER JOIN result and are > currently not when only some rows are filtered out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10605) Make hive version number update automatically in webhcat-default.xml during hive tar generation
[ https://issues.apache.org/jira/browse/HIVE-10605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529749#comment-14529749 ] Hive QA commented on HIVE-10605: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12730340/HIVE-10605.patch {color:red}ERROR:{color} -1 due to 25 failed/errored test(s), 8900 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_parts org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_unencrypted_tbl org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_with_different_encryption_keys org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_load_data_to_encrypted_tables org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_select_read_only_encrypted_tbl org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_disallow_transform org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_droppartition org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_sba_drop_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_alterpart_loc org.apache.hadoop.hive.ql.security.TestStorageBasedClientSideAuthorizationProvider.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropDatabase org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropPartition 
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropTable org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropView org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProvider.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProviderWithACL.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbFailure org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbSuccess org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableFailure org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableSuccess org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessing org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessingCustomSetWhitelistAppend org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3744/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3744/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3744/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 25 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12730340 - PreCommit-HIVE-TRUNK-Build > Make hive version number update automatically in webhcat-default.xml during > hive tar generation > --- > > Key: HIVE-10605 > URL: https://issues.apache.org/jira/browse/HIVE-10605 > Project: Hive > Issue Type: Bug > Components: WebHCat >Affects Versions: 1.3.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Fix For: 1.3.0 > > Attachments: HIVE-10605.patch > > > so we don't have to do HIVE-10604 on each release -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10595) Dropping a table can cause NPEs in the compactor
[ https://issues.apache.org/jira/browse/HIVE-10595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529725#comment-14529725 ] Eugene Koifman commented on HIVE-10595: --- I'm not sure I understand how this works. The Initiator (if the table/partition is no longer there) will not add anything to the compaction queue. So then there is nothing for the Worker/Cleaner to do in this case. How will the data from TXNS, COMPLETED_TXN_COMPONENTS, and TXN_COMPONENTS that relates to these tables get cleaned up? > Dropping a table can cause NPEs in the compactor > > > Key: HIVE-10595 > URL: https://issues.apache.org/jira/browse/HIVE-10595 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 0.14.0, 1.0.0, 1.1.0 >Reporter: Alan Gates >Assignee: Alan Gates > Attachments: HIVE-10595.patch > > > Reproduction: > # start metastore with compactor off > # insert enough entries in a table to trigger a compaction > # drop the table > # stop metastore > # restart metastore with compactor on > Result: NPE in the compactor threads. I suspect this would also happen if > the inserts and drops were done between runs of the compactor, but I > haven't proven it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName
[ https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-9392: -- Attachment: HIVE-9392.4.patch > JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to > column names having duplicated fqColumnName > > > Key: HIVE-9392 > URL: https://issues.apache.org/jira/browse/HIVE-9392 > Project: Hive > Issue Type: Bug > Components: Physical Optimizer >Affects Versions: 0.14.0 >Reporter: Mostafa Mokhtar >Assignee: Pengcheng Xiong >Priority: Critical > Attachments: HIVE-9392.1.patch, HIVE-9392.2.patch, HIVE-9392.3.patch, > HIVE-9392.4.patch > > > In JoinStatsRule.process the join column statistics are stored in the HashMap > joinedColStats. The key used, ColStatistics.fqColName, is > duplicated between join columns in the same vertex; as a result, distinctVals > ends up containing duplicated values, which negatively affects the join > cardinality estimation. > The duplicate keys are usually named KEY.reducesinkkey0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9534) incorrect result set for query that projects a windowed aggregate
[ https://issues.apache.org/jira/browse/HIVE-9534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529716#comment-14529716 ] Chaoyu Tang commented on HIVE-9534: --- Oracle 11.2 treats avg(distinct tsint.csint) over () as an analytic function instead of an aggregation function, so the query returns 4 rows of 2.5. Note, there is no order by clause or window clause inside the parentheses of "over". Could you try a query like "select avg(distinct tsint.csint) over (order by rnum rows between 1 preceding and 1 following) from tsint" to see if it works in Oracle 12c? It did not work in 11.2. > incorrect result set for query that projects a windowed aggregate > - > > Key: HIVE-9534 > URL: https://issues.apache.org/jira/browse/HIVE-9534 > Project: Hive > Issue Type: Bug > Components: SQL >Reporter: N Campbell >Assignee: Chaoyu Tang > > Result set returned by Hive has one row instead of 5 > {code} > select avg(distinct tsint.csint) over () from tsint > create table if not exists TSINT (RNUM int , CSINT smallint) > ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' > STORED AS TEXTFILE; > 0|\N > 1|-1 > 2|0 > 3|1 > 4|10 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9743) Incorrect result set for vectorized left outer join
[ https://issues.apache.org/jira/browse/HIVE-9743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529711#comment-14529711 ] Matt McCline commented on HIVE-9743: Given lack of time, I think I'll pull the SMB changes since the regular map join case repro is very clear. > Incorrect result set for vectorized left outer join > --- > > Key: HIVE-9743 > URL: https://issues.apache.org/jira/browse/HIVE-9743 > Project: Hive > Issue Type: Bug > Components: SQL >Affects Versions: 0.14.0 >Reporter: N Campbell >Assignee: Matt McCline > Attachments: HIVE-9743.01.patch, HIVE-9743.02.patch, > HIVE-9743.03.patch, HIVE-9743.04.patch, HIVE-9743.05.patch, > HIVE-9743.06.patch, HIVE-9743.08.patch > > > This query is supposed to return 3 rows and will when run without Tez but > returns 2 rows when run with Tez. > select tjoin1.rnum, tjoin1.c1, tjoin1.c2, tjoin2.c2 as c2j2 from tjoin1 left > outer join tjoin2 on ( tjoin1.c1 = tjoin2.c1 and tjoin1.c2 > 15 ) > tjoin1.rnum tjoin1.c1 tjoin1.c2 c2j2 > 1 20 25 > 2 50 > instead of > tjoin1.rnum tjoin1.c1 tjoin1.c2 c2j2 > 0 10 15 > 1 20 25 > 2 50 > create table if not exists TJOIN1 (RNUM int , C1 int, C2 int) > STORED AS orc ; > 0|10|15 > 1|20|25 > 2|\N|50 > create table if not exists TJOIN2 (RNUM int , C1 int, C2 char(2)) > ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' > STORED AS TEXTFILE ; > 0|10|BB > 1|15|DD > 2|\N|EE > 3|10|FF -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10538) Fix NPE in FileSinkOperator from hashcode mismatch
[ https://issues.apache.org/jira/browse/HIVE-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529707#comment-14529707 ] Peter Slawski commented on HIVE-10538: -- Great, I've been working on just that. I'll be able to post an updated patch tomorrow. > Fix NPE in FileSinkOperator from hashcode mismatch > -- > > Key: HIVE-10538 > URL: https://issues.apache.org/jira/browse/HIVE-10538 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 1.0.0, 1.2.0 >Reporter: Peter Slawski >Assignee: Peter Slawski >Priority: Critical > Fix For: 1.2.0, 1.3.0 > > Attachments: HIVE-10538.1.patch, HIVE-10538.1.patch, > HIVE-10538.1.patch > > > A Null Pointer Exception occurs in FileSinkOperator when using bucketed > tables and distribute by with multiFileSpray enabled. The following query > snippet reproduces this issue: > {code} > set hive.enforce.bucketing = true; > set hive.exec.reducers.max = 20; > create table bucket_a(key int, value_a string) clustered by (key) into 256 > buckets; > create table bucket_b(key int, value_b string) clustered by (key) into 256 > buckets; > create table bucket_ab(key int, value_a string, value_b string) clustered by > (key) into 256 buckets; > -- Insert data into bucket_a and bucket_b > insert overwrite table bucket_ab > select a.key, a.value_a, b.value_b from bucket_a a join bucket_b b on (a.key > = b.key) distribute by key; > {code} > The following stack trace is logged. 
> {code} > 2015-04-29 12:54:12,841 FATAL [pool-110-thread-1]: ExecReducer > (ExecReducer.java:reduce(255)) - > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row (tag=0) {"key":{},"value":{"_col0":"113","_col1":"val_113"}} > at > org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244) > at > org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392) > at > org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.exec.FileSinkOperator.findWriterOffset(FileSinkOperator.java:819) > at > org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:747) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837) > at > org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88) > at > org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235) > ... 8 more > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10591) Support limited integer type promotion in ORC
[ https://issues.apache.org/jira/browse/HIVE-10591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-10591: - Attachment: HIVE-10591.3.patch Added fix for acid test failures. > Support limited integer type promotion in ORC > - > > Key: HIVE-10591 > URL: https://issues.apache.org/jira/browse/HIVE-10591 > Project: Hive > Issue Type: New Feature >Affects Versions: 1.3.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-10591.1.patch, HIVE-10591.2.patch, > HIVE-10591.2.patch, HIVE-10591.3.patch > > > ORC currently does not support schema-on-read. If we alter an ORC table with > 'int' type to 'bigint' and query the altered table, a ClassCastException > will be thrown, as the schema on read from the table descriptor will expect > LongWritable whereas ORC will return IntWritable based on the file schema stored > within the ORC file. OrcSerde currently doesn't do any type conversions or type > promotions in the inner loop, for performance reasons. Since smallints, ints and > bigints are stored in the same way in ORC, it will be possible to allow such > type promotions without hurting performance. The following type promotions can be > supported without any casting > smallint -> int > smallint -> bigint > int -> bigint > Tinyint promotion is not possible without casting, as tinyints are stored > using the RLE byte writer whereas smallints, ints and bigints are stored using the > RLE integer writer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
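The "stored in the same way" argument above can be illustrated with a toy base-128 varint encoder. This is an assumption for illustration only (ORC's actual RLEv2 encoding is more elaborate): because an integer run-length encoding serializes the value rather than the declared width, a column written as smallint or int produces the same bytes it would as bigint, so widening on read needs no file rewrite. Tinyints go through a separate byte-oriented writer, which is why they are excluded.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Toy varint encoder: low 7 bits per byte, high bit set as a continuation
// flag. The declared Java width (short/int/long) of the input value does not
// change the encoded bytes -- only the value does.
public class VarintWidth {
    static byte[] varint(long v) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        do {
            byte b = (byte) (v & 0x7f);
            v >>>= 7;
            if (v != 0) b |= (byte) 0x80; // more bytes follow
            out.write(b);
        } while (v != 0);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        short s = 1234;
        int i = 1234;
        long l = 1234L;
        // Identical bytes regardless of declared integer width.
        System.out.println(Arrays.equals(varint(s), varint(i))
                && Arrays.equals(varint(i), varint(l))); // true
    }
}
```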
[jira] [Updated] (HIVE-10614) schemaTool upgrade from 0.14.0 to 1.3.0 causes failure
[ https://issues.apache.org/jira/browse/HIVE-10614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-10614: - Attachment: HIVE-10614.1.master.patch [~thejas] Thanks for the review, added HIVE-10614.1.master.patch for the master branch > schemaTool upgrade from 0.14.0 to 1.3.0 causes failure > -- > > Key: HIVE-10614 > URL: https://issues.apache.org/jira/browse/HIVE-10614 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Hari Sankar Sivarama Subramaniyan >Assignee: Hari Sankar Sivarama Subramaniyan >Priority: Critical > Attachments: HIVE-10614.1.master.patch, HIVE-10614.1.patch > > > ./schematool -dbType mysql -upgradeSchemaFrom 0.14.0 -verbose > {code} > ++--+ > | >| > ++--+ > | < HIVE-7018 Remove Table and Partition tables column LINK_TARGET_ID from > Mysql for other DBs do not have it > | > ++--+ > 1 row selected (0.004 seconds) > 0: jdbc:mysql://node-1.example.com/hive> DROP PROCEDURE IF EXISTS > RM_TLBS_LINKID > No rows affected (0.005 seconds) > 0: jdbc:mysql://node-1.example.com/hive> DROP PROCEDURE IF EXISTS > RM_PARTITIONS_LINKID > No rows affected (0.006 seconds) > 0: jdbc:mysql://node-1.example.com/hive> DROP PROCEDURE IF EXISTS RM_LINKID > No rows affected (0.002 seconds) > 0: jdbc:mysql://node-1.example.com/hive> CREATE PROCEDURE RM_TLBS_LINKID() > BEGIN IF EXISTS (SELECT * FROM `INFORMATION_SCHEMA`.`COLUMNS` WHERE > `TABLE_NAME` = 'TBLS' AND `COLUMN_NAME` = 'LINK_TARGET_ID') THEN ALTER TABLE > `TBLS` DROP FOREIGN KEY `TBLS_FK3` ; ALTER TABLE `TBLS` DROP KEY `TBLS_N51` ; > ALTER TABLE `TBLS` DROP COLUMN `LINK_TARGET_ID` ; END IF; END > Error: You have an error in your SQL syntax; check the manual that > corresponds to your MySQL server version for the right syntax to use near '' > at line 1 (state=42000,code=1064) > Closing: 0: jdbc:mysql://node-1.example.com/hive?createDatabaseIfNotExist=true > org.apache.hadoop.hive.metastore.HiveMetaException: Upgrade FAILED! 
Metastore > state would be inconsistent !! > org.apache.hadoop.hive.metastore.HiveMetaException: Upgrade FAILED! Metastore > state would be inconsistent !! > at > org.apache.hive.beeline.HiveSchemaTool.doUpgrade(HiveSchemaTool.java:229) > at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:468) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.run(RunJar.java:221) > at org.apache.hadoop.util.RunJar.main(RunJar.java:136) > Caused by: java.io.IOException: Schema script failed, errorcode 2 > at > org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:355) > at > org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:326) > at > org.apache.hive.beeline.HiveSchemaTool.doUpgrade(HiveSchemaTool.java:224) > {code} > Looks like HIVE-7018 has introduced stored procedure as part of mysql upgrade > script and it is causing issues with schematool upgrade. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10538) Fix NPE in FileSinkOperator from hashcode mismatch
[ https://issues.apache.org/jira/browse/HIVE-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529690#comment-14529690 ] Prasanth Jayachandran commented on HIVE-10538: -- The result difference seems to be an expected change because of hashcode difference. [~petersla] Can you put an updated patch by running the tests again with "-Dtest.output.overwrite=true" option? This will overwrite the q.out files. > Fix NPE in FileSinkOperator from hashcode mismatch > -- > > Key: HIVE-10538 > URL: https://issues.apache.org/jira/browse/HIVE-10538 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 1.0.0, 1.2.0 >Reporter: Peter Slawski >Assignee: Peter Slawski >Priority: Critical > Fix For: 1.2.0, 1.3.0 > > Attachments: HIVE-10538.1.patch, HIVE-10538.1.patch, > HIVE-10538.1.patch > > > A Null Pointer Exception occurs when in FileSinkOperator when using bucketed > tables and distribute by with multiFileSpray enabled. The following snippet > query reproduces this issue: > {code} > set hive.enforce.bucketing = true; > set hive.exec.reducers.max = 20; > create table bucket_a(key int, value_a string) clustered by (key) into 256 > buckets; > create table bucket_b(key int, value_b string) clustered by (key) into 256 > buckets; > create table bucket_ab(key int, value_a string, value_b string) clustered by > (key) into 256 buckets; > -- Insert data into bucket_a and bucket_b > insert overwrite table bucket_ab > select a.key, a.value_a, b.value_b from bucket_a a join bucket_b b on (a.key > = b.key) distribute by key; > {code} > The following stack trace is logged. 
> {code} > 2015-04-29 12:54:12,841 FATAL [pool-110-thread-1]: ExecReducer > (ExecReducer.java:reduce(255)) - > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row (tag=0) {"key":{},"value":{"_col0":"113","_col1":"val_113"}} > at > org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244) > at > org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392) > at > org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.exec.FileSinkOperator.findWriterOffset(FileSinkOperator.java:819) > at > org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:747) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837) > at > org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88) > at > org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235) > ... 8 more > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
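The stack trace above can be illustrated with a toy model. This is an assumed simplification, not the real FileSinkOperator code: if the hash used when creating the per-bucket writers differs from the hash used when routing a row, the lookup misses and the later dereference is the NullPointerException reported at FileSinkOperator.java:819. The 31-multiplier below is only an illustrative stand-in for "a different hash function".

```java
import java.util.HashMap;
import java.util.Map;

// Toy sketch of a writer lookup keyed by bucket number. When the routing
// hash disagrees with the hash used at writer-creation time, get() returns
// null and any use of the result throws NullPointerException.
public class WriterOffsetSketch {
    public static void main(String[] args) {
        int buckets = 256;
        int key = 113;  // the key from the failing row in the log
        Map<Integer, String> writers = new HashMap<>();
        writers.put(key % buckets, "writer-for-113");      // created with one hash
        String w = writers.get((key * 31) % buckets);      // routed with another: 175 != 113
        System.out.println(w);                             // null -> NPE when dereferenced
    }
}
```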
[jira] [Updated] (HIVE-10617) LLAP: allocator occasionally has a spurious failure to allocate due to "partitioned" locking and has to retry
[ https://issues.apache.org/jira/browse/HIVE-10617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-10617: Summary: LLAP: allocator occasionally has a spurious failure to allocate due to "partitioned" locking and has to retry (was: LLAP: fix allocator concurrency rarely causing spurious failure to allocate due to "partitioned" locking) > LLAP: allocator occasionally has a spurious failure to allocate due to > "partitioned" locking and has to retry > - > > Key: HIVE-10617 > URL: https://issues.apache.org/jira/browse/HIVE-10617 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > > See HIVE-10482 and the comment in code. Right now this is worked around by > retrying. > Simple case - thread can reserve memory from manager and bounce between > checking arena 1 and arena 2 for memory as other threads allocate and > deallocate from respective arenas in reverse order, making it look like > there's no memory. More importantly this can happen when buddy blocks are > split when lots of stuff is allocated. > This can be solved either with some form of helping (esp. for split case) or > by making allocator an "actor" (or set of actors, one per 1-N arenas that > they would own), to satisfy alloc requests more deterministically (and also > get rid of most sync). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
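The "simple case" in the description can be replayed deterministically. The sketch below is a toy single-threaded simulation under assumed simplifications, not LLAP's allocator: the single free block moves "behind" a scanner that checks arenas one at a time, so the scan reports no memory even though memory was available throughout, which is why the current code retries.

```java
// Deterministic replay of the interleaving: a scanner visits arena 0, then
// arena 1; between the two visits another thread frees into arena 0 and
// allocates from arena 1. Each individual check is correct, but the scan
// as a whole spuriously concludes that nothing is free.
public class ArenaScanSketch {
    public static void main(String[] args) {
        boolean[] free = { false, true };  // the one free block sits in arena 1
        boolean found = false;

        found |= free[0];                  // step 1: scanner checks arena 0 -> empty
        free[0] = true; free[1] = false;   // step 2: the free block "moves" to arena 0
        found |= free[1];                  // step 3: scanner checks arena 1 -> empty

        System.out.println(found);         // false: spurious allocation failure
    }
}
```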
[jira] [Commented] (HIVE-10614) schemaTool upgrade from 0.14.0 to 1.3.0 causes failure
[ https://issues.apache.org/jira/browse/HIVE-10614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529684#comment-14529684 ] Thejas M Nair commented on HIVE-10614: -- +1 for current patch, it would work with 1.2 branch. We need another one for master (that also has similar change for hive-schema-1.3.0.mysql.sql) > schemaTool upgrade from 0.14.0 to 1.3.0 causes failure > -- > > Key: HIVE-10614 > URL: https://issues.apache.org/jira/browse/HIVE-10614 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Hari Sankar Sivarama Subramaniyan >Assignee: Hari Sankar Sivarama Subramaniyan >Priority: Critical > Attachments: HIVE-10614.1.patch > > > ./schematool -dbType mysql -upgradeSchemaFrom 0.14.0 -verbose > {code} > ++--+ > | >| > ++--+ > | < HIVE-7018 Remove Table and Partition tables column LINK_TARGET_ID from > Mysql for other DBs do not have it > | > ++--+ > 1 row selected (0.004 seconds) > 0: jdbc:mysql://node-1.example.com/hive> DROP PROCEDURE IF EXISTS > RM_TLBS_LINKID > No rows affected (0.005 seconds) > 0: jdbc:mysql://node-1.example.com/hive> DROP PROCEDURE IF EXISTS > RM_PARTITIONS_LINKID > No rows affected (0.006 seconds) > 0: jdbc:mysql://node-1.example.com/hive> DROP PROCEDURE IF EXISTS RM_LINKID > No rows affected (0.002 seconds) > 0: jdbc:mysql://node-1.example.com/hive> CREATE PROCEDURE RM_TLBS_LINKID() > BEGIN IF EXISTS (SELECT * FROM `INFORMATION_SCHEMA`.`COLUMNS` WHERE > `TABLE_NAME` = 'TBLS' AND `COLUMN_NAME` = 'LINK_TARGET_ID') THEN ALTER TABLE > `TBLS` DROP FOREIGN KEY `TBLS_FK3` ; ALTER TABLE `TBLS` DROP KEY `TBLS_N51` ; > ALTER TABLE `TBLS` DROP COLUMN `LINK_TARGET_ID` ; END IF; END > Error: You have an error in your SQL syntax; check the manual that > corresponds to your MySQL server version for the right syntax to use near '' > at line 1 (state=42000,code=1064) > Closing: 0: jdbc:mysql://node-1.example.com/hive?createDatabaseIfNotExist=true > 
org.apache.hadoop.hive.metastore.HiveMetaException: Upgrade FAILED! Metastore > state would be inconsistent !! > org.apache.hadoop.hive.metastore.HiveMetaException: Upgrade FAILED! Metastore > state would be inconsistent !! > at > org.apache.hive.beeline.HiveSchemaTool.doUpgrade(HiveSchemaTool.java:229) > at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:468) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.run(RunJar.java:221) > at org.apache.hadoop.util.RunJar.main(RunJar.java:136) > Caused by: java.io.IOException: Schema script failed, errorcode 2 > at > org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:355) > at > org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:326) > at > org.apache.hive.beeline.HiveSchemaTool.doUpgrade(HiveSchemaTool.java:224) > {code} > Looks like HIVE-7018 has introduced stored procedure as part of mysql upgrade > script and it is causing issues with schematool upgrade. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7018) Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but not others
[ https://issues.apache.org/jira/browse/HIVE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529676#comment-14529676 ] Thejas M Nair commented on HIVE-7018: - I think the change here was in the right direction, however it breaks the preferred way to upgrade hive (using schematool). This is a release blocker for 1.2.0. . A patch to revert the changes here has been uploaded to HIVE-10614 . I think we should go ahead with that, and reopen this jira after it is committed. Once the schematool/beeline breakage is fixed, this change can go back into hive. > Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but > not others > - > > Key: HIVE-7018 > URL: https://issues.apache.org/jira/browse/HIVE-7018 > Project: Hive > Issue Type: Bug >Reporter: Brock Noland >Assignee: Yongzhi Chen > Fix For: 1.2.0 > > Attachments: HIVE-7018.1.patch, HIVE-7018.2.patch > > > It appears that at least postgres and oracle do not have the LINK_TARGET_ID > column while mysql does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10538) Fix NPE in FileSinkOperator from hashcode mismatch
[ https://issues.apache.org/jira/browse/HIVE-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529673#comment-14529673 ] Peter Slawski commented on HIVE-10538: -- The Spark driver failures are caused by this change. This would be expected if a row's hashcode affected its ordering in Spark. This patch makes it so that HiveKey's hashcode outputted from ReduceSinkOperator is no longer always multiplied by 31 (as explained previously). Also, for at least those failed qtests, the row ordering/output in the expected output differs across MapRed, Tez, and Spark. So, execution engine affects ordering. >From >[spark/groupby_complex_types_multi_single_reducer.q.out#L221|https://github.com/apache/hive/blob/master/ql/src/test/results/clientpositive/spark/groupby_complex_types_multi_single_reducer.q.out#L221] {code} POSTHOOK: query: SELECT DEST2.* FROM DEST2 POSTHOOK: type: QUERY POSTHOOK: Input: default@dest2 A masked pattern was here {"120":"val_120"} 2 {"129":"val_129"} 2 {"160":"val_160"} 1 {"26":"val_26"} 2 {"27":"val_27"} 1 {"288":"val_288"} 2 {"298":"val_298"} 3 {"30":"val_30"} 1 {"311":"val_311"} 3 {"74":"val_74"} 1 {code} >From >[groupby_complex_types_multi_single_reducer.q.out#L240|https://github.com/apache/hive/blob/master/ql/src/test/results/clientpositive/groupby_complex_types_multi_single_reducer.q.out#L240] {code} POSTHOOK: query: SELECT DEST2.* FROM DEST2 POSTHOOK: type: QUERY POSTHOOK: Input: default@dest2 A masked pattern was here {"0":"val_0"} 3 {"10":"val_10"} 1 {"100":"val_100"} 2 {"103":"val_103"} 2 {"104":"val_104"} 2 {"105":"val_105"} 1 {"11":"val_11"} 1 {"111":"val_111"} 1 {"113":"val_113"} 2 {"114":"val_114"} 1 {code} > Fix NPE in FileSinkOperator from hashcode mismatch > -- > > Key: HIVE-10538 > URL: https://issues.apache.org/jira/browse/HIVE-10538 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 1.0.0, 1.2.0 >Reporter: Peter Slawski >Assignee: Peter Slawski >Priority: 
Critical > Fix For: 1.2.0, 1.3.0 > > Attachments: HIVE-10538.1.patch, HIVE-10538.1.patch, > HIVE-10538.1.patch > > > A Null Pointer Exception occurs when in FileSinkOperator when using bucketed > tables and distribute by with multiFileSpray enabled. The following snippet > query reproduces this issue: > {code} > set hive.enforce.bucketing = true; > set hive.exec.reducers.max = 20; > create table bucket_a(key int, value_a string) clustered by (key) into 256 > buckets; > create table bucket_b(key int, value_b string) clustered by (key) into 256 > buckets; > create table bucket_ab(key int, value_a string, value_b string) clustered by > (key) into 256 buckets; > -- Insert data into bucket_a and bucket_b > insert overwrite table bucket_ab > select a.key, a.value_a, b.value_b from bucket_a a join bucket_b b on (a.key > = b.key) distribute by key; > {code} > The following stack trace is logged. > {code} > 2015-04-29 12:54:12,841 FATAL [pool-110-thread-1]: ExecReducer > (ExecReducer.java:reduce(255)) - > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row (tag=0) {"key":{},"value":{"_col0":"113","_col1":"val_113"}} > at > org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244) > at > org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392) > at > org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.exec.FileSinkOperator.findWriterOffset(FileSinkOperator.java:819) > at > 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:747) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837) > at > org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88) > at > org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235) > ... 8 more > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
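The comment above can be made concrete with a small sketch. It assumes, as the comment states, that the old HiveKey hashcode was effectively the new one multiplied by 31, and it uses a plain modulo partitioner for illustration (the real partitioners differ in detail): the same keys land in different reducers before and after the patch, which reorders unsorted output without changing its content.

```java
// Hypothetical before/after routing for a few keys with 20 reducers
// (hive.exec.reducers.max = 20 in the repro). Only the hash changes;
// every row still reaches exactly one reducer, so results are equivalent
// up to ordering, which is why the q.out files need regeneration.
public class HashRoutingSketch {
    static int partitionOld(int h, int n) { return Math.abs((h * 31) % n); }
    static int partitionNew(int h, int n) { return Math.abs(h % n); }

    public static void main(String[] args) {
        int n = 20;
        for (int h : new int[] { 113, 120, 129 }) {
            System.out.println(h + ": old=" + partitionOld(h, n)
                                 + " new=" + partitionNew(h, n));
        }
        // 113: old=3  new=13
        // 120: old=0  new=0
        // 129: old=19 new=9
    }
}
```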
[jira] [Commented] (HIVE-10617) LLAP: fix allocator concurrency rarely causing spurious failure to allocate due to "partitioned" locking
[ https://issues.apache.org/jira/browse/HIVE-10617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529664#comment-14529664 ] Sergey Shelukhin commented on HIVE-10617: - Will do this later. With 6 executors x 8 nodes I see an average of 0.5 allocation retries (not task retries) per 1000 tasks in a query reading entire lineitem from TPCH 1Tb scale. So it's annoying to have these retries, but not super important. > LLAP: fix allocator concurrency rarely causing spurious failure to allocate > due to "partitioned" locking > > > Key: HIVE-10617 > URL: https://issues.apache.org/jira/browse/HIVE-10617 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > > See HIVE-10482 and the comment in code. Right now this is worked around by > retrying. > Simple case - thread can reserve memory from manager and bounce between > checking arena 1 and arena 2 for memory as other threads allocate and > deallocate from respective arenas in reverse order, making it look like > there's no memory. More importantly this can happen when buddy blocks are > split when lots of stuff is allocated. > This can be solved either with some form of helping (esp. for split case) or > by making allocator an "actor" (or set of actors, one per 1-N arenas that > they would own), to satisfy alloc requests more deterministically (and also > get rid of most sync). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-10617) LLAP: fix allocator concurrency rarely causing spurious failure to allocate due to "partitioned" locking
[ https://issues.apache.org/jira/browse/HIVE-10617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin reassigned HIVE-10617: --- Assignee: Sergey Shelukhin > LLAP: fix allocator concurrency rarely causing spurious failure to allocate > due to "partitioned" locking > > > Key: HIVE-10617 > URL: https://issues.apache.org/jira/browse/HIVE-10617 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > > See HIVE-10482 and the comment in code. Right now this is worked around by > retrying. > Simple case - thread can reserve memory from manager and bounce between > checking arena 1 and arena 2 for memory as other threads allocate and > deallocate from respective arenas in reverse order, making it look like > there's no memory. More importantly this can happen when buddy blocks are > split when lots of stuff is allocated. > This can be solved either with some form of helping (esp. for split case) or > by making allocator an "actor" (or set of actors, one per 1-N arenas that > they would own), to satisfy alloc requests more deterministically (and also > get rid of most sync). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10617) LLAP: fix allocator concurrency rarely causing spurious failure to allocate due to "partitioned" locking
[ https://issues.apache.org/jira/browse/HIVE-10617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-10617: Description: See HIVE-10482 and the comment in code. Right now this is worked around by retrying. Simple case - thread can reserve memory from manager and bounce between checking arena 1 and arena 2 for memory as other threads allocate and deallocate from respective arenas in reverse order, making it look like there's no memory. More importantly this can happen when buddy blocks are split when lots of stuff is allocated. This can be solved either with some form of helping (esp. for split case) or by making allocator an "actor" (or set of actors, one per 1-N arenas that they would own), to satisfy alloc requests more deterministically (and also get rid of most sync). was: See HIVE-10482 and the comment in code. Simple case - thread can reserve memory from manager and bounce between checking arena 1 and arena 2 for memory as other threads allocate and deallocate from respective arenas in reverse order, making it look like there's no memory. More importantly this can happen when buddy blocks are split when lots of stuff is allocated. This can be solved either with some form of helping (esp. for split case) or by making allocator an "actor" (or set of actors, one per 1-N arenas that they would own), to satisfy alloc requests more deterministically (and also get rid of most sync). > LLAP: fix allocator concurrency rarely causing spurious failure to allocate > due to "partitioned" locking > > > Key: HIVE-10617 > URL: https://issues.apache.org/jira/browse/HIVE-10617 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin > > See HIVE-10482 and the comment in code. Right now this is worked around by > retrying. 
> Simple case - thread can reserve memory from manager and bounce between > checking arena 1 and arena 2 for memory as other threads allocate and > deallocate from respective arenas in reverse order, making it look like > there's no memory. More importantly this can happen when buddy blocks are > split when lots of stuff is allocated. > This can be solved either with some form of helping (esp. for split case) or > by making allocator an "actor" (or set of actors, one per 1-N arenas that > they would own), to satisfy alloc requests more deterministically (and also > get rid of most sync). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10617) LLAP: fix allocator concurrency rarely causing spurious failure to allocate due to "partitioned" locking
[ https://issues.apache.org/jira/browse/HIVE-10617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-10617: Summary: LLAP: fix allocator concurrency rarely causing spurious failure to allocate due to "partitioned" locking (was: fix allocator concurrency rarely causing spurious failure to allocate due to "partitioned" locking) > LLAP: fix allocator concurrency rarely causing spurious failure to allocate > due to "partitioned" locking > > > Key: HIVE-10617 > URL: https://issues.apache.org/jira/browse/HIVE-10617 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin > > See HIVE-10482 and the comment in code. > Simple case - thread can reserve memory from manager and bounce between > checking arena 1 and arena 2 for memory as other threads allocate and > deallocate from respective arenas in reverse order, making it look like > there's no memory. More importantly this can happen when buddy blocks are > split when lots of stuff is allocated. > This can be solved either with some form of helping (esp. for split case) or > by making allocator an "actor" (or set of actors, one per 1-N arenas that > they would own), to satisfy alloc requests more deterministically (and also > get rid of most sync). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10614) schemaTool upgrade from 0.14.0 to 1.3.0 causes failure
[ https://issues.apache.org/jira/browse/HIVE-10614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529657#comment-14529657 ] Hive QA commented on HIVE-10614: {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12730668/HIVE-10614.1.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-METASTORE-Test/43/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-METASTORE-Test/43/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-METASTORE-Test-43/ This message is automatically generated. ATTACHMENT ID: 12730668 - PreCommit-HIVE-METASTORE-Test > schemaTool upgrade from 0.14.0 to 1.3.0 causes failure > -- > > Key: HIVE-10614 > URL: https://issues.apache.org/jira/browse/HIVE-10614 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Hari Sankar Sivarama Subramaniyan >Assignee: Hari Sankar Sivarama Subramaniyan >Priority: Critical > Attachments: HIVE-10614.1.patch > > > ./schematool -dbType mysql -upgradeSchemaFrom 0.14.0 -verbose > {code} > ++--+ > | >| > ++--+ > | < HIVE-7018 Remove Table and Partition tables column LINK_TARGET_ID from > Mysql for other DBs do not have it > | > ++--+ > 1 row selected (0.004 seconds) > 0: jdbc:mysql://node-1.example.com/hive> DROP PROCEDURE IF EXISTS > RM_TLBS_LINKID > No rows affected (0.005 seconds) > 0: jdbc:mysql://node-1.example.com/hive> DROP PROCEDURE IF EXISTS > RM_PARTITIONS_LINKID > No rows affected (0.006 seconds) > 0: jdbc:mysql://node-1.example.com/hive> DROP PROCEDURE IF EXISTS RM_LINKID > No rows affected (0.002 seconds) > 0: jdbc:mysql://node-1.example.com/hive> CREATE PROCEDURE RM_TLBS_LINKID() > BEGIN IF EXISTS (SELECT * FROM `INFORMATION_SCHEMA`.`COLUMNS` WHERE > `TABLE_NAME` = 'TBLS' AND `COLUMN_NAME` = 'LINK_TARGET_ID') THEN ALTER TABLE 
> `TBLS` DROP FOREIGN KEY `TBLS_FK3` ; ALTER TABLE `TBLS` DROP KEY `TBLS_N51` ; > ALTER TABLE `TBLS` DROP COLUMN `LINK_TARGET_ID` ; END IF; END > Error: You have an error in your SQL syntax; check the manual that > corresponds to your MySQL server version for the right syntax to use near '' > at line 1 (state=42000,code=1064) > Closing: 0: jdbc:mysql://node-1.example.com/hive?createDatabaseIfNotExist=true > org.apache.hadoop.hive.metastore.HiveMetaException: Upgrade FAILED! Metastore > state would be inconsistent !! > org.apache.hadoop.hive.metastore.HiveMetaException: Upgrade FAILED! Metastore > state would be inconsistent !! > at > org.apache.hive.beeline.HiveSchemaTool.doUpgrade(HiveSchemaTool.java:229) > at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:468) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.run(RunJar.java:221) > at org.apache.hadoop.util.RunJar.main(RunJar.java:136) > Caused by: java.io.IOException: Schema script failed, errorcode 2 > at > org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:355) > at > org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:326) > at > org.apache.hive.beeline.HiveSchemaTool.doUpgrade(HiveSchemaTool.java:224) > {code} > Looks like HIVE-7018 has introduced stored procedure as part of mysql upgrade > script and it is causing issues with schematool upgrade. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-10482) LLAP: AsertionError cannot allocate when reading from orc
[ https://issues.apache.org/jira/browse/HIVE-10482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin resolved HIVE-10482. - Resolution: Fixed committed a workaround > LLAP: AsertionError cannot allocate when reading from orc > - > > Key: HIVE-10482 > URL: https://issues.apache.org/jira/browse/HIVE-10482 > Project: Hive > Issue Type: Sub-task >Reporter: Siddharth Seth >Assignee: Sergey Shelukhin > Fix For: llap > > > This was from a run of tpch query 1. [~sershe] - not sure if you've already > seen this. Creating a jira so that it doesn't get lost. > {code} > 2015-04-24 13:11:54,180 > [TezTaskRunner_attempt_1429683757595_0326_4_00_000199_0(container_1_0326_01_003216_sseth_20150424131137_8ec6200c-77c8-43ea-a6a3-a0ab1da6e1ac:4_Map > 1_199_0)] ERROR org.apache.hadoop.hive.ql.exec.tez.TezProcessor: > org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: > java.io.IOException: java.lang.AssertionError: Cannot allocate > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:74) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:314) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:329) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172) > at > 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.IOException: java.io.IOException: > java.lang.AssertionError: Cannot allocate > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:355) > at > org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79) > at > org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:137) > at > org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:62) > ... 
16 more > Caused by: java.io.IOException: java.lang.AssertionError: Cannot allocate > at > org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.rethrowErrorIfAny(LlapInputFormat.java:257) > at > org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.nextCvb(LlapInputFormat.java:209) > at > org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:147) > at > org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:97) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350) > ... 22 more > Caused by: java.lang.AssertionError: Cannot allocate > at > org.apache.hadoop.hive.ql.io.orc.InStream.readEncodedStream(InStream.java:761) > at > org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:441) > at > org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataRea
[jira] [Updated] (HIVE-10614) schemaTool upgrade from 0.14.0 to 1.3.0 causes failure
[ https://issues.apache.org/jira/browse/HIVE-10614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-10614: - Attachment: HIVE-10614.1.patch This happens because schematool runs via beeline and when there is a ";" in the command, beeline interprets it as the command terminator. Stored procedures use ";" as the delimiter between statements, thus the entire stored procedure does not get sent to MySQL as a single command, hence the above error. I am uploading a patch to back out the fix for HIVE-7018 for now. Once we have the fix for HIVE-7018 working with schematool, we can add them back. The task mentioned in the previous line can be done via a follow-up jira. cc-ing [~sushanth], [~thejas] for reviewing the change. Thanks Hari > schemaTool upgrade from 0.14.0 to 1.3.0 causes failure > -- > > Key: HIVE-10614 > URL: https://issues.apache.org/jira/browse/HIVE-10614 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Hari Sankar Sivarama Subramaniyan >Assignee: Hari Sankar Sivarama Subramaniyan >Priority: Critical > Attachments: HIVE-10614.1.patch > > > ./schematool -dbType mysql -upgradeSchemaFrom 0.14.0 -verbose > {code} > ++--+ > | >| > ++--+ > | < HIVE-7018 Remove Table and Partition tables column LINK_TARGET_ID from > Mysql for other DBs do not have it > | > ++--+ > 1 row selected (0.004 seconds) > 0: jdbc:mysql://node-1.example.com/hive> DROP PROCEDURE IF EXISTS > RM_TLBS_LINKID > No rows affected (0.005 seconds) > 0: jdbc:mysql://node-1.example.com/hive> DROP PROCEDURE IF EXISTS > RM_PARTITIONS_LINKID > No rows affected (0.006 seconds) > 0: jdbc:mysql://node-1.example.com/hive> DROP PROCEDURE IF EXISTS RM_LINKID > No rows affected (0.002 seconds) > 0: jdbc:mysql://node-1.example.com/hive> CREATE PROCEDURE RM_TLBS_LINKID() > BEGIN IF EXISTS (SELECT * FROM `INFORMATION_SCHEMA`.`COLUMNS` WHERE > `TABLE_NAME` = 'TBLS' AND `COLUMN_NAME` = 'LINK_TARGET_ID') THEN ALTER TABLE > `TBLS` DROP 
FOREIGN KEY `TBLS_FK3` ; ALTER TABLE `TBLS` DROP KEY `TBLS_N51` ; > ALTER TABLE `TBLS` DROP COLUMN `LINK_TARGET_ID` ; END IF; END > Error: You have an error in your SQL syntax; check the manual that > corresponds to your MySQL server version for the right syntax to use near '' > at line 1 (state=42000,code=1064) > Closing: 0: jdbc:mysql://node-1.example.com/hive?createDatabaseIfNotExist=true > org.apache.hadoop.hive.metastore.HiveMetaException: Upgrade FAILED! Metastore > state would be inconsistent !! > org.apache.hadoop.hive.metastore.HiveMetaException: Upgrade FAILED! Metastore > state would be inconsistent !! > at > org.apache.hive.beeline.HiveSchemaTool.doUpgrade(HiveSchemaTool.java:229) > at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:468) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.run(RunJar.java:221) > at org.apache.hadoop.util.RunJar.main(RunJar.java:136) > Caused by: java.io.IOException: Schema script failed, errorcode 2 > at > org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:355) > at > org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:326) > at > org.apache.hive.beeline.HiveSchemaTool.doUpgrade(HiveSchemaTool.java:224) > {code} > Looks like HIVE-7018 has introduced stored procedure as part of mysql upgrade > script and it is causing issues with schematool upgrade. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
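The delimiter problem described in the comment above can be shown with a minimal sketch. This is NOT Beeline's actual implementation; it is an assumed simplification of any client that treats every ";" as a command terminator. A MySQL stored procedure body legitimately contains ";" between its inner statements, so such a client cuts the single CREATE PROCEDURE into fragments and the first fragment reaches the server without its closing END.

```java
import java.util.Arrays;

// Naive ';'-based command splitting applied to a stored procedure
// (procedure text shortened from the RM_TLBS_LINKID example above).
// One logical statement becomes three fragments; the first ends mid-body,
// which matches the "error in your SQL syntax ... near ''" in the log.
public class SemicolonSplitSketch {
    public static void main(String[] args) {
        String proc = "CREATE PROCEDURE RM_TLBS_LINKID() BEGIN "
                    + "ALTER TABLE `TBLS` DROP COLUMN `LINK_TARGET_ID` ; "
                    + "END IF; END";
        String[] fragments = proc.split(";");
        System.out.println(fragments.length);   // 3 fragments from 1 statement
        System.out.println(Arrays.toString(fragments));
    }
}
```

This is also why the MySQL command-line client requires redefining the delimiter (e.g. DELIMITER $$) before defining procedures, an escape hatch Beeline does not offer here.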
[jira] [Commented] (HIVE-10604) update webhcat-default.xml with 1.2 version numbers
[ https://issues.apache.org/jira/browse/HIVE-10604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529633#comment-14529633 ] Hive QA commented on HIVE-10604: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12730316/HIVE-10604.patch {color:red}ERROR:{color} -1 due to 24 failed/errored test(s), 8900 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_parts org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_unencrypted_tbl org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_with_different_encryption_keys org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_load_data_to_encrypted_tables org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_select_read_only_encrypted_tbl org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_disallow_transform org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_droppartition org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_sba_drop_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_alterpart_loc org.apache.hadoop.hive.ql.security.TestStorageBasedClientSideAuthorizationProvider.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropDatabase org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropPartition 
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropTable org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropView org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProvider.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProviderWithACL.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbFailure org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbSuccess org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableFailure org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableSuccess org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessing org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessingCustomSetWhitelistAppend {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3743/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3743/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3743/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 24 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12730316 - PreCommit-HIVE-TRUNK-Build > update webhcat-default.xml with 1.2 version numbers > --- > > Key: HIVE-10604 > URL: https://issues.apache.org/jira/browse/HIVE-10604 > Project: Hive > Issue Type: Bug > Components: WebHCat >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Minor > Fix For: 1.2.0 > > Attachments: HIVE-10604.patch > > > no precommit tests -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10616) TypeInfoUtils doesn't handle DECIMAL with just precision specified
[ https://issues.apache.org/jira/browse/HIVE-10616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Friedrich updated HIVE-10616: Attachment: HIVE-10616.1.patch > TypeInfoUtils doesn't handle DECIMAL with just precision specified > -- > > Key: HIVE-10616 > URL: https://issues.apache.org/jira/browse/HIVE-10616 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Affects Versions: 1.0.0 >Reporter: Thomas Friedrich >Assignee: Thomas Friedrich >Priority: Minor > Attachments: HIVE-10616.1.patch > > > The parseType method in TypeInfoUtils doesn't handle decimal types with just > precision specified although that's a valid type definition. > As a result, TypeInfoUtils.getTypeInfoFromTypeString will always return > decimal(10,0) for any decimal() string. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
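The gap described above is in type-string parsing: "decimal(5)" is a legal type definition (scale defaults to 0), but a parser that only matches the two-parameter form falls back to decimal(10,0). A hedged sketch of what handling an optional scale looks like; this is a simplified stand-in, not the actual TypeInfoUtils code:

```python
import re

def parse_decimal(type_string, default_precision=10, default_scale=0):
    """Parse 'decimal', 'decimal(p)', or 'decimal(p,s)'.
    Scale is optional and defaults to 0, precision defaults to 10."""
    m = re.fullmatch(r"decimal(?:\((\d+)(?:,(\d+))?\))?",
                     type_string.strip().lower())
    if not m:
        raise ValueError("not a decimal type: " + type_string)
    precision = int(m.group(1)) if m.group(1) else default_precision
    scale = int(m.group(2)) if m.group(2) else default_scale
    return precision, scale
```

With this shape, parse_decimal("decimal(5)") yields precision 5 with scale 0 instead of silently degrading to the (10,0) default.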
[jira] [Commented] (HIVE-6679) HiveServer2 should support configurable the server side socket timeout and keepalive for various transports types where applicable
[ https://issues.apache.org/jira/browse/HIVE-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529631#comment-14529631 ] Thejas M Nair commented on HIVE-6679: - +1. Just a minor comment. Can you also update the description of HIVE_SERVER2_TCP_SOCKET_BLOCKING_TIMEOUT to say that it's applicable only in binary mode, and for http mode, the equivalent is hive.server2.thrift.http.max.idle.time? > HiveServer2 should support configurable the server side socket timeout and > keepalive for various transports types where applicable > -- > > Key: HIVE-6679 > URL: https://issues.apache.org/jira/browse/HIVE-6679 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.2.0, 1.1.0 >Reporter: Prasad Mujumdar >Assignee: Navis > Labels: TODOC1.0, TODOC15 > Fix For: 1.2.0 > > Attachments: HIVE-6679.1.patch.txt, HIVE-6679.2.patch.txt, > HIVE-6679.3.patch, HIVE-6679.4.patch, HIVE-6679.5.patch, HIVE-6679.6.patch > > > HiveServer2 should support configurable the server side socket read timeout > and TCP keep-alive option. Metastore server already support this (and the so > is the old hive server). > We now have multiple client connectivity options like Kerberos, Delegation > Token (Digest-MD5), Plain SASL, Plain SASL with SSL and raw sockets. The > configuration should be applicable to all types (if possible). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName
[ https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529626#comment-14529626 ] Pengcheng Xiong commented on HIVE-9392: --- rename the patch to get QA run. > JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to > column names having duplicated fqColumnName > > > Key: HIVE-9392 > URL: https://issues.apache.org/jira/browse/HIVE-9392 > Project: Hive > Issue Type: Bug > Components: Physical Optimizer >Affects Versions: 0.14.0 >Reporter: Mostafa Mokhtar >Assignee: Prasanth Jayachandran >Priority: Critical > Attachments: HIVE-9392.1.patch, HIVE-9392.2.patch, HIVE-9392.3.patch > > > In JoinStatsRule.process the join column statistics are stored in HashMap > joinedColStats, the key used which is the ColStatistics.fqColName is > duplicated between join column in the same vertex, as a result distinctVals > ends up having duplicated values which negatively affects the join > cardinality estimation. > The duplicate keys are usually named KEY.reducesinkkey0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
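The collision pattern described in the issue is easy to reproduce: when two join columns in the same vertex both map to a key like "KEY.reducesinkkey0", a plain map silently overwrites one column's NDV with the other's. A small illustration with hypothetical numbers (not Hive's actual stats code):

```python
# Column stats keyed by fully-qualified column name, as in JoinStatsRule.
# Two different join columns in the same vertex collide on one fqColName.
joined_col_stats = {}
joined_col_stats["KEY.reducesinkkey0"] = 1000000  # NDV of the first join column
joined_col_stats["KEY.reducesinkkey0"] = 42       # second column clobbers the first

distinct_vals = list(joined_col_stats.values())
# Only one NDV survives, so downstream cardinality math uses the wrong value.
```

Disambiguating the key (e.g. by prefixing the operator or vertex id) keeps both entries and restores a sane distinctVals list.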
[jira] [Updated] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName
[ https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-9392: -- Attachment: HIVE-9392.3.patch > JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to > column names having duplicated fqColumnName > > > Key: HIVE-9392 > URL: https://issues.apache.org/jira/browse/HIVE-9392 > Project: Hive > Issue Type: Bug > Components: Physical Optimizer >Affects Versions: 0.14.0 >Reporter: Mostafa Mokhtar >Assignee: Prasanth Jayachandran >Priority: Critical > Attachments: HIVE-9392.1.patch, HIVE-9392.2.patch, HIVE-9392.3.patch > > > In JoinStatsRule.process the join column statistics are stored in HashMap > joinedColStats, the key used which is the ColStatistics.fqColName is > duplicated between join column in the same vertex, as a result distinctVals > ends up having duplicated values which negatively affects the join > cardinality estimation. > The duplicate keys are usually named KEY.reducesinkkey0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName
[ https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-9392: -- Attachment: (was: HIVE-9392.01.patch) > JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to > column names having duplicated fqColumnName > > > Key: HIVE-9392 > URL: https://issues.apache.org/jira/browse/HIVE-9392 > Project: Hive > Issue Type: Bug > Components: Physical Optimizer >Affects Versions: 0.14.0 >Reporter: Mostafa Mokhtar >Assignee: Prasanth Jayachandran >Priority: Critical > Attachments: HIVE-9392.1.patch, HIVE-9392.2.patch, HIVE-9392.3.patch > > > In JoinStatsRule.process the join column statistics are stored in HashMap > joinedColStats, the key used which is the ColStatistics.fqColName is > duplicated between join column in the same vertex, as a result distinctVals > ends up having duplicated values which negatively affects the join > cardinality estimation. > The duplicate keys are usually named KEY.reducesinkkey0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName
[ https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529623#comment-14529623 ] Pengcheng Xiong commented on HIVE-9392: --- [~mmokhtar], could you please take a look? Thanks. > JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to > column names having duplicated fqColumnName > > > Key: HIVE-9392 > URL: https://issues.apache.org/jira/browse/HIVE-9392 > Project: Hive > Issue Type: Bug > Components: Physical Optimizer >Affects Versions: 0.14.0 >Reporter: Mostafa Mokhtar >Assignee: Prasanth Jayachandran >Priority: Critical > Attachments: HIVE-9392.01.patch, HIVE-9392.1.patch, HIVE-9392.2.patch > > > In JoinStatsRule.process the join column statistics are stored in HashMap > joinedColStats, the key used which is the ColStatistics.fqColName is > duplicated between join column in the same vertex, as a result distinctVals > ends up having duplicated values which negatively affects the join > cardinality estimation. > The duplicate keys are usually named KEY.reducesinkkey0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName
[ https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-9392: -- Attachment: HIVE-9392.01.patch After discussing with [~jpullokkaran], we assume that this patch will solve the problem. And we already tried TPCDS 70,89 to confirm > JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to > column names having duplicated fqColumnName > > > Key: HIVE-9392 > URL: https://issues.apache.org/jira/browse/HIVE-9392 > Project: Hive > Issue Type: Bug > Components: Physical Optimizer >Affects Versions: 0.14.0 >Reporter: Mostafa Mokhtar >Assignee: Prasanth Jayachandran >Priority: Critical > Attachments: HIVE-9392.01.patch, HIVE-9392.1.patch, HIVE-9392.2.patch > > > In JoinStatsRule.process the join column statistics are stored in HashMap > joinedColStats, the key used which is the ColStatistics.fqColName is > duplicated between join column in the same vertex, as a result distinctVals > ends up having duplicated values which negatively affects the join > cardinality estimation. > The duplicate keys are usually named KEY.reducesinkkey0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9534) incorrect result set for query that projects a windowed aggregate
[ https://issues.apache.org/jira/browse/HIVE-9534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529608#comment-14529608 ] N Campbell commented on HIVE-9534: -- Re: your comment about Oracle: running "select avg(distinct tsint.csint) over () from tsint" against the values null, -1, 0, 1, 10, Oracle Database 12c Enterprise Edition (12.1.0.2.0) returns 2.5, 2.5, 2.5, 2.5, 2.5. > incorrect result set for query that projects a windowed aggregate > - > > Key: HIVE-9534 > URL: https://issues.apache.org/jira/browse/HIVE-9534 > Project: Hive > Issue Type: Bug > Components: SQL >Reporter: N Campbell >Assignee: Chaoyu Tang > > Result set returned by Hive has one row instead of 5 > {code} > select avg(distinct tsint.csint) over () from tsint > create table if not exists TSINT (RNUM int , CSINT smallint) > ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' > STORED AS TEXTFILE; > 0|\N > 1|-1 > 2|0 > 3|1 > 4|10 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
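The expected semantics are easy to check by hand: an unpartitioned OVER () window produces one output row per input row, each carrying the same avg(distinct) computed over the non-null values. A quick sketch with the tsint data from the issue:

```python
csint = [None, -1, 0, 1, 10]  # tsint.csint values from the issue

# avg(distinct) ignores NULLs and duplicates.
non_null_distinct = set(v for v in csint if v is not None)
avg_distinct = sum(non_null_distinct) / len(non_null_distinct)

# An unpartitioned OVER () window repeats the aggregate on every input row,
# so the result set has 5 rows, not 1.
result = [avg_distinct for _ in csint]
```

This matches the Oracle behavior quoted above (2.5 repeated five times), while the buggy Hive result collapses to a single row.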
[jira] [Updated] (HIVE-10213) MapReduce jobs using dynamic-partitioning fail on commit.
[ https://issues.apache.org/jira/browse/HIVE-10213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-10213: Attachment: MapRedExample.java Thanks for the commit, [~sushanth]. I just verified this fix again with the YHive-13 and trunk, using the attached program. (The code reshuffles the specified 'source' data into a differently partitioned 'target' table, using dynamic partitioning.) Here's the exception-trace for the bug. I've verified that the partitioning happens correctly with this patch applied: {code} Error: java.io.IOException: No callback registered for TaskAttemptID:attempt_1428474791204_201112_m_00_0@hdfs://crystalmyth.myth.net:8020/tmp/myth/mythdb/foobar_partitioned_dt_grid/_DYN0.6055391511914422/dt=__HIVE_DEFAULT_PARTITION__/grid=__HIVE_DEFAULT_PARTITION__ at org.apache.hive.hcatalog.mapreduce.TaskCommitContextRegistry.commitTask(TaskCommitContextRegistry.java:74) at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.commitTask(FileOutputCommitterContainer.java:143) at org.apache.hadoop.mapred.Task.commit(Task.java:1163) at org.apache.hadoop.mapred.Task.done(Task.java:1025) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:345) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) {code} > MapReduce jobs using dynamic-partitioning fail on commit. 
> - > > Key: HIVE-10213 > URL: https://issues.apache.org/jira/browse/HIVE-10213 > Project: Hive > Issue Type: Bug > Components: HCatalog >Reporter: Mithun Radhakrishnan >Assignee: Mithun Radhakrishnan > Fix For: 1.2.0 > > Attachments: HIVE-10213.1.patch, MapRedExample.java > > > I recently ran into a problem in {{TaskCommitContextRegistry}}, when using > dynamic-partitions. > Consider a MapReduce program that reads HCatRecords from a table (using > HCatInputFormat), and then writes to another table (with identical schema), > using HCatOutputFormat. The Map-task fails with the following exception: > {code} > Error: java.io.IOException: No callback registered for > TaskAttemptID:attempt_1426589008676_509707_m_00_0@hdfs://crystalmyth.myth.net:8020/user/mithunr/mythdb/target/_DYN0.6784154320609959/grid=__HIVE_DEFAULT_PARTITION__/dt=__HIVE_DEFAULT_PARTITION__ > at > org.apache.hive.hcatalog.mapreduce.TaskCommitContextRegistry.commitTask(TaskCommitContextRegistry.java:56) > at > org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.commitTask(FileOutputCommitterContainer.java:139) > at org.apache.hadoop.mapred.Task.commit(Task.java:1163) > at org.apache.hadoop.mapred.Task.done(Task.java:1025) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:345) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > {code} > {{TaskCommitContextRegistry::commitTask()}} uses call-backs registered from > {{DynamicPartitionFileRecordWriter}}. But in case {{HCatInputFormat}} and > {{HCatOutputFormat}} are both used in the same job, the > {{DynamicPartitionFileRecordWriter}} might only be exercised in the Reducer. 
> I'm relaxing the IOException and logging a warning message instead of just > failing. > (I'll post the fix shortly.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
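The fix described in the issue (relax the IOException into a warning when no callback was registered for a task attempt) can be sketched as a lookup that degrades gracefully. This is a simplified stand-in for TaskCommitContextRegistry, not the real class:

```python
import logging

log = logging.getLogger("TaskCommitContextRegistry")

class TaskCommitContextRegistry:
    """Simplified sketch: commit callbacks registered per task-attempt key."""
    def __init__(self):
        self._callbacks = {}

    def register(self, key, callback):
        self._callbacks[key] = callback

    def commit_task(self, key):
        callback = self._callbacks.get(key)
        if callback is None:
            # Before the fix this raised "No callback registered for ...";
            # the patch logs a warning and continues instead, since a task
            # that never exercised the dynamic-partition writer has nothing
            # to commit.
            log.warning("No callback registered for %s; skipping commit", key)
            return False
        callback()
        return True

registry = TaskCommitContextRegistry()
committed = []
registry.register("attempt_1_m_00_0", lambda: committed.append("attempt_1_m_00_0"))
ok = registry.commit_task("attempt_1_m_00_0")
missing = registry.commit_task("attempt_2_m_00_0")  # no callback: warns, no failure
```

The key names here are hypothetical; the point is only that a missing registration becomes a no-op with a warning rather than a job-killing exception.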
[jira] [Updated] (HIVE-6679) HiveServer2 should support configurable the server side socket timeout and keepalive for various transports types where applicable
[ https://issues.apache.org/jira/browse/HIVE-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-6679: --- Fix Version/s: (was: 1.1.0) 1.2.0 > HiveServer2 should support configurable the server side socket timeout and > keepalive for various transports types where applicable > -- > > Key: HIVE-6679 > URL: https://issues.apache.org/jira/browse/HIVE-6679 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.2.0, 1.1.0 >Reporter: Prasad Mujumdar >Assignee: Navis > Labels: TODOC1.0, TODOC15 > Fix For: 1.2.0 > > Attachments: HIVE-6679.1.patch.txt, HIVE-6679.2.patch.txt, > HIVE-6679.3.patch, HIVE-6679.4.patch, HIVE-6679.5.patch, HIVE-6679.6.patch > > > HiveServer2 should support configurable the server side socket read timeout > and TCP keep-alive option. Metastore server already support this (and the so > is the old hive server). > We now have multiple client connectivity options like Kerberos, Delegation > Token (Digest-MD5), Plain SASL, Plain SASL with SSL and raw sockets. The > configuration should be applicable to all types (if possible). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-6679) HiveServer2 should support configurable the server side socket timeout and keepalive for various transports types where applicable
[ https://issues.apache.org/jira/browse/HIVE-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-6679: --- Affects Version/s: 1.1.0 1.0.0 > HiveServer2 should support configurable the server side socket timeout and > keepalive for various transports types where applicable > -- > > Key: HIVE-6679 > URL: https://issues.apache.org/jira/browse/HIVE-6679 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.2.0, 1.1.0 >Reporter: Prasad Mujumdar >Assignee: Navis > Labels: TODOC1.0, TODOC15 > Fix For: 1.2.0 > > Attachments: HIVE-6679.1.patch.txt, HIVE-6679.2.patch.txt, > HIVE-6679.3.patch, HIVE-6679.4.patch, HIVE-6679.5.patch, HIVE-6679.6.patch > > > HiveServer2 should support configurable the server side socket read timeout > and TCP keep-alive option. Metastore server already support this (and the so > is the old hive server). > We now have multiple client connectivity options like Kerberos, Delegation > Token (Digest-MD5), Plain SASL, Plain SASL with SSL and raw sockets. The > configuration should be applicable to all types (if possible). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-6679) HiveServer2 should support configurable the server side socket timeout and keepalive for various transports types where applicable
[ https://issues.apache.org/jira/browse/HIVE-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-6679: --- Affects Version/s: 1.2.0 > HiveServer2 should support configurable the server side socket timeout and > keepalive for various transports types where applicable > -- > > Key: HIVE-6679 > URL: https://issues.apache.org/jira/browse/HIVE-6679 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.2.0, 1.1.0 >Reporter: Prasad Mujumdar >Assignee: Navis > Labels: TODOC1.0, TODOC15 > Fix For: 1.2.0 > > Attachments: HIVE-6679.1.patch.txt, HIVE-6679.2.patch.txt, > HIVE-6679.3.patch, HIVE-6679.4.patch, HIVE-6679.5.patch, HIVE-6679.6.patch > > > HiveServer2 should support configurable the server side socket read timeout > and TCP keep-alive option. Metastore server already support this (and the so > is the old hive server). > We now have multiple client connectivity options like Kerberos, Delegation > Token (Digest-MD5), Plain SASL, Plain SASL with SSL and raw sockets. The > configuration should be applicable to all types (if possible). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-10547) CBO (Calcite Return Path) : genFileSinkPlan uses wrong partition col to create FS
[ https://issues.apache.org/jira/browse/HIVE-10547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong resolved HIVE-10547. Resolution: Fixed > CBO (Calcite Return Path) : genFileSinkPlan uses wrong partition col to > create FS > - > > Key: HIVE-10547 > URL: https://issues.apache.org/jira/browse/HIVE-10547 > Project: Hive > Issue Type: Sub-task > Components: CBO >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Fix For: 1.2.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (HIVE-10564) webhcat should use webhcat-site.xml properties for controller job submission
[ https://issues.apache.org/jira/browse/HIVE-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman reopened HIVE-10564: --- Unfortunately this has unexpected side effects. Every time a job is submitted, various properties are passed on the command line using -Dfoo=bar. This change causes the AppConfig Configuration object to accumulate the union of all these properties, so Job N+1 includes properties that belong to previous jobs. For example, if you run a job with "-D templeton.statusdir=TestSqoop_1" and then another job that does not specify "statusdir", the 2nd job will write to TestSqoop_1. This will cause a major problem. > webhcat should use webhcat-site.xml properties for controller job submission > > > Key: HIVE-10564 > URL: https://issues.apache.org/jira/browse/HIVE-10564 > Project: Hive > Issue Type: Bug >Reporter: Thejas M Nair >Assignee: Thejas M Nair > Labels: TODOC1.2 > Fix For: 1.2.0 > > Attachments: HIVE-10564.1.patch > > > webhcat should use webhcat-site.xml in configuration for the > TempletonController map-only job that it launches. This will allow users to > set any MR/hdfs properties that they want to see used for the controller job. > NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10526) CBO (Calcite Return Path): HiveCost epsilon comparison should take row count in to account
[ https://issues.apache.org/jira/browse/HIVE-10526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-10526: Attachment: HIVE-10526.2.patch Reuploading .1.patch as .2.patch > CBO (Calcite Return Path): HiveCost epsilon comparison should take row count > in to account > -- > > Key: HIVE-10526 > URL: https://issues.apache.org/jira/browse/HIVE-10526 > Project: Hive > Issue Type: Sub-task > Components: CBO >Affects Versions: 0.12.0 >Reporter: Laljo John Pullokkaran >Assignee: Laljo John Pullokkaran > Fix For: 1.2.0 > > Attachments: HIVE-10526.1.patch, HIVE-10526.2.patch, HIVE-10526.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10526) CBO (Calcite Return Path): HiveCost epsilon comparison should take row count in to account
[ https://issues.apache.org/jira/browse/HIVE-10526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529584#comment-14529584 ] Sushanth Sowmyan commented on HIVE-10526: - I don't see this picked up in the test commit queue, and it's possible it'll fail out saying it's already processed this file, so I'm going to re-upload .1.patch as .2.patch and manually submit this into the queue. > CBO (Calcite Return Path): HiveCost epsilon comparison should take row count > in to account > -- > > Key: HIVE-10526 > URL: https://issues.apache.org/jira/browse/HIVE-10526 > Project: Hive > Issue Type: Sub-task > Components: CBO >Affects Versions: 0.12.0 >Reporter: Laljo John Pullokkaran >Assignee: Laljo John Pullokkaran > Fix For: 1.2.0 > > Attachments: HIVE-10526.1.patch, HIVE-10526.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9451) Add max size of column dictionaries to ORC metadata
[ https://issues.apache.org/jira/browse/HIVE-9451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529580#comment-14529580 ] Sushanth Sowmyan commented on HIVE-9451: Okay, thanks for the update. Will wait to hear more. :) > Add max size of column dictionaries to ORC metadata > --- > > Key: HIVE-9451 > URL: https://issues.apache.org/jira/browse/HIVE-9451 > Project: Hive > Issue Type: Improvement >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Labels: ORC > Fix For: 1.2.0 > > Attachments: HIVE-9451.patch, HIVE-9451.patch > > > To predict the amount of memory required to read an ORC file we need to know > the size of the dictionaries for the columns that we are reading. I propose > adding the number of bytes for each column's dictionary to the stripe's > column statistics. The file's column statistics would have the maximum > dictionary size for each column. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10565) LLAP: Native Vector Map Join doesn't handle filtering and matching on LEFT OUTER JOIN repeated key correctly
[ https://issues.apache.org/jira/browse/HIVE-10565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529578#comment-14529578 ] Vikram Dixit K commented on HIVE-10565: --- I am reviewing this one. > LLAP: Native Vector Map Join doesn't handle filtering and matching on LEFT > OUTER JOIN repeated key correctly > > > Key: HIVE-10565 > URL: https://issues.apache.org/jira/browse/HIVE-10565 > Project: Hive > Issue Type: Sub-task > Components: Hive >Affects Versions: 1.2.0 >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Fix For: 1.2.0, 1.3.0 > > Attachments: HIVE-10565.01.patch, HIVE-10565.02.patch, > HIVE-10565.03.patch, HIVE-10565.04.patch, HIVE-10565.05.patch, > HIVE-10565.06.patch > > > Filtering can knock out some of the rows for a repeated key, but those > knocked out rows need to be included in the LEFT OUTER JOIN result and are > currently not when only some rows are filtered out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10565) LLAP: Native Vector Map Join doesn't handle filtering and matching on LEFT OUTER JOIN repeated key correctly
[ https://issues.apache.org/jira/browse/HIVE-10565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529572#comment-14529572 ] Sushanth Sowmyan commented on HIVE-10565: - Hi Matt, who would be the ideal person to review this patch? > LLAP: Native Vector Map Join doesn't handle filtering and matching on LEFT > OUTER JOIN repeated key correctly > > > Key: HIVE-10565 > URL: https://issues.apache.org/jira/browse/HIVE-10565 > Project: Hive > Issue Type: Sub-task > Components: Hive >Affects Versions: 1.2.0 >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Fix For: 1.2.0, 1.3.0 > > Attachments: HIVE-10565.01.patch, HIVE-10565.02.patch, > HIVE-10565.03.patch, HIVE-10565.04.patch, HIVE-10565.05.patch, > HIVE-10565.06.patch > > > Filtering can knock out some of the rows for a repeated key, but those > knocked out rows need to be included in the LEFT OUTER JOIN result and are > currently not when only some rows are filtered out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9451) Add max size of column dictionaries to ORC metadata
[ https://issues.apache.org/jira/browse/HIVE-9451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529570#comment-14529570 ] Prasanth Jayachandran commented on HIVE-9451: - No.. The test failures look related. [~owen.omalley] Can you take a look at the test failures? I am assuming all these are related to file size differences. > Add max size of column dictionaries to ORC metadata > --- > > Key: HIVE-9451 > URL: https://issues.apache.org/jira/browse/HIVE-9451 > Project: Hive > Issue Type: Improvement >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Labels: ORC > Fix For: 1.2.0 > > Attachments: HIVE-9451.patch, HIVE-9451.patch > > > To predict the amount of memory required to read an ORC file we need to know > the size of the dictionaries for the columns that we are reading. I propose > adding the number of bytes for each column's dictionary to the stripe's > column statistics. The file's column statistics would have the maximum > dictionary size for each column. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9451) Add max size of column dictionaries to ORC metadata
[ https://issues.apache.org/jira/browse/HIVE-9451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529566#comment-14529566 ] Sushanth Sowmyan commented on HIVE-9451: Hi, given the previous +1 pending tests, and tests having run, do the tests look okay to commit? > Add max size of column dictionaries to ORC metadata > --- > > Key: HIVE-9451 > URL: https://issues.apache.org/jira/browse/HIVE-9451 > Project: Hive > Issue Type: Improvement >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Labels: ORC > Fix For: 1.2.0 > > Attachments: HIVE-9451.patch, HIVE-9451.patch > > > To predict the amount of memory required to read an ORC file we need to know > the size of the dictionaries for the columns that we are reading. I propose > adding the number of bytes for each column's dictionary to the stripe's > column statistics. The file's column statistics would have the maximum > dictionary size for each column. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9743) Incorrect result set for vectorized left outer join
[ https://issues.apache.org/jira/browse/HIVE-9743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529560#comment-14529560 ] Vikram Dixit K commented on HIVE-9743: -- That seems to be because, with SMB, there is full delegation to the base class. I am not sure if we need the SMB changes at all. > Incorrect result set for vectorized left outer join > --- > > Key: HIVE-9743 > URL: https://issues.apache.org/jira/browse/HIVE-9743 > Project: Hive > Issue Type: Bug > Components: SQL >Affects Versions: 0.14.0 >Reporter: N Campbell >Assignee: Matt McCline > Attachments: HIVE-9743.01.patch, HIVE-9743.02.patch, > HIVE-9743.03.patch, HIVE-9743.04.patch, HIVE-9743.05.patch, > HIVE-9743.06.patch, HIVE-9743.08.patch > > > This query is supposed to return 3 rows and will when run without Tez but > returns 2 rows when run with Tez. > select tjoin1.rnum, tjoin1.c1, tjoin1.c2, tjoin2.c2 as c2j2 from tjoin1 left > outer join tjoin2 on ( tjoin1.c1 = tjoin2.c1 and tjoin1.c2 > 15 ) > tjoin1.rnum tjoin1.c1 tjoin1.c2 c2j2 > 1 20 25 > 2 50 > instead of > tjoin1.rnum tjoin1.c1 tjoin1.c2 c2j2 > 0 10 15 > 1 20 25 > 2 50 > create table if not exists TJOIN1 (RNUM int , C1 int, C2 int) > STORED AS orc ; > 0|10|15 > 1|20|25 > 2|\N|50 > create table if not exists TJOIN2 (RNUM int , C1 int, C2 char(2)) > ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' > STORED AS TEXTFILE ; > 0|10|BB > 1|15|DD > 2|\N|EE > 3|10|FF -- This message was sent by Atlassian JIRA (v6.3.4#6332)
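The expected behaviour in the issue is standard LEFT OUTER JOIN semantics: a left row whose ON condition never matches (here because tjoin1.c2 > 15 filters it out, or because no key matches) must still appear once, NULL-extended. A plain-Python check using the tjoin1/tjoin2 data quoted above (a reference computation, not Hive's vectorized code path):

```python
tjoin1 = [(0, 10, 15), (1, 20, 25), (2, None, 50)]              # (rnum, c1, c2)
tjoin2 = [(0, 10, "BB"), (1, 15, "DD"), (2, None, "EE"), (3, 10, "FF")]

result = []
for rnum, c1, c2 in tjoin1:
    # ON clause: tjoin1.c1 = tjoin2.c1 AND tjoin1.c2 > 15
    matches = [r for r in tjoin2
               if c1 is not None and r[1] == c1
               and c2 is not None and c2 > 15]
    if matches:
        result.extend((rnum, c1, c2, r[2]) for r in matches)
    else:
        result.append((rnum, c1, c2, None))  # unmatched left row is preserved
```

For this data every left row is unmatched, so the correct result is all 3 rows NULL-extended; the bug drops the row where the ON-clause filter (c2 > 15) eliminated the match.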
[jira] [Assigned] (HIVE-8769) Physical optimizer : Incorrect CE results in a shuffle join instead of a Map join (PK/FK pattern not detected)
[ https://issues.apache.org/jira/browse/HIVE-8769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong reassigned HIVE-8769: - Assignee: Pengcheng Xiong (was: Prasanth Jayachandran) > Physical optimizer : Incorrect CE results in a shuffle join instead of a Map > join (PK/FK pattern not detected) > -- > > Key: HIVE-8769 > URL: https://issues.apache.org/jira/browse/HIVE-8769 > Project: Hive > Issue Type: Bug > Components: Physical Optimizer >Affects Versions: 0.14.0 >Reporter: Mostafa Mokhtar >Assignee: Pengcheng Xiong > Fix For: 1.2.0 > > > TPC-DS Q82 is running slower than hive 13 because the join type is not > correct. > The estimate for item x inventory x date_dim is 227 Million rows while the > actual is 3K rows. > Hive 13 finishes in 753 seconds. > Hive 14 finishes in 1,267 seconds. > Hive 14 + force map join finished in 431 seconds. > Query > {code} > select i_item_id >,i_item_desc >,i_current_price > from item, inventory, date_dim, store_sales > where i_current_price between 30 and 30+30 > and inv_item_sk = i_item_sk > and d_date_sk=inv_date_sk > and d_date between '2002-05-30' and '2002-07-30' > and i_manufact_id in (437,129,727,663) > and inv_quantity_on_hand between 100 and 500 > and ss_item_sk = i_item_sk > group by i_item_id,i_item_desc,i_current_price > order by i_item_id > limit 100 > {code} > Plan > {code} > STAGE PLANS: > Stage: Stage-1 > Tez > Edges: > Map 7 <- Map 1 (BROADCAST_EDGE), Map 2 (BROADCAST_EDGE) > Reducer 4 <- Map 3 (SIMPLE_EDGE), Map 7 (SIMPLE_EDGE) > Reducer 5 <- Reducer 4 (SIMPLE_EDGE) > Reducer 6 <- Reducer 5 (SIMPLE_EDGE) > DagName: mmokhtar_20141106005353_7a2eb8df-12ff-4fe9-89b4-30f1e4e3fb90:1 > Vertices: > Map 1 > Map Operator Tree: > TableScan > alias: item > filterExpr: ((i_current_price BETWEEN 30 AND 60 and > (i_manufact_id) IN (437, 129, 727, 663)) and i_item_sk is not null) (type: > boolean) > Statistics: Num rows: 462000 Data size: 663862160 Basic > stats: COMPLETE Column stats: COMPLETE > 
Filter Operator > predicate: ((i_current_price BETWEEN 30 AND 60 and > (i_manufact_id) IN (437, 129, 727, 663)) and i_item_sk is not null) (type: > boolean) > Statistics: Num rows: 115500 Data size: 34185680 Basic > stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: i_item_sk (type: int), i_item_id (type: > string), i_item_desc (type: string), i_current_price (type: float) > outputColumnNames: _col0, _col1, _col2, _col3 > Statistics: Num rows: 115500 Data size: 33724832 Basic > stats: COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: _col0 (type: int) > sort order: + > Map-reduce partition columns: _col0 (type: int) > Statistics: Num rows: 115500 Data size: 33724832 > Basic stats: COMPLETE Column stats: COMPLETE > value expressions: _col1 (type: string), _col2 (type: > string), _col3 (type: float) > Execution mode: vectorized > Map 2 > Map Operator Tree: > TableScan > alias: date_dim > filterExpr: (d_date BETWEEN '2002-05-30' AND '2002-07-30' > and d_date_sk is not null) (type: boolean) > Statistics: Num rows: 73049 Data size: 81741831 Basic > stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: (d_date BETWEEN '2002-05-30' AND '2002-07-30' > and d_date_sk is not null) (type: boolean) > Statistics: Num rows: 36524 Data size: 3579352 Basic > stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: d_date_sk (type: int) > outputColumnNames: _col0 > Statistics: Num rows: 36524 Data size: 146096 Basic > stats: COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: _col0 (type: int) > sort order: + > Map-reduce partition columns: _col0 (type: int) > Statistics: Num rows: 36524 Data size: 146096 Basic > stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: _col0 (type: int) > out
[jira] [Commented] (HIVE-10526) CBO (Calcite Return Path): HiveCost epsilon comparison should take row count in to account
[ https://issues.apache.org/jira/browse/HIVE-10526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529502#comment-14529502 ] Laljo John Pullokkaran commented on HIVE-10526: --- I uploaded a modified patch last week; for some reason the QA run didn't kick in. > CBO (Calcite Return Path): HiveCost epsilon comparison should take row count > in to account > -- > > Key: HIVE-10526 > URL: https://issues.apache.org/jira/browse/HIVE-10526 > Project: Hive > Issue Type: Sub-task > Components: CBO >Affects Versions: 0.12.0 >Reporter: Laljo John Pullokkaran >Assignee: Laljo John Pullokkaran > Fix For: 1.2.0 > > Attachments: HIVE-10526.1.patch, HIVE-10526.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
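For context, the kind of comparison this issue's title describes (treat two costs as equal when they differ by less than an epsilon, and then use row count to break the tie) could be sketched as below. This is a hedged illustration only: the real HiveCost class is not shown, and the `isCheaper` helper and its signature are hypothetical.

```java
public class CostCompare {
    static final double EPSILON = 1e-5;

    /**
     * True if plan a is strictly cheaper than plan b. Costs within a relative
     * epsilon of each other are considered equal, and in that case the plan
     * producing fewer rows wins - so row count is taken into account instead
     * of being ignored by the epsilon check.
     */
    static boolean isCheaper(double aCost, double aRows, double bCost, double bRows) {
        double diff = aCost - bCost;
        double scale = Math.max(1.0, Math.max(Math.abs(aCost), Math.abs(bCost)));
        if (Math.abs(diff) > EPSILON * scale) {
            return diff < 0;  // costs differ meaningfully
        }
        return aRows < bRows; // "equal" cost: tie-break on row count
    }

    public static void main(String[] args) {
        // Nearly identical costs: the plan with fewer rows is preferred.
        System.out.println(isCheaper(100.0, 10, 100.0000001, 50)); // true
        // Clearly more expensive plan loses regardless of row count.
        System.out.println(isCheaper(200.0, 10, 100.0, 50));       // false
    }
}
```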
[jira] [Commented] (HIVE-8769) Physical optimizer : Incorrect CE results in a shuffle join instead of a Map join (PK/FK pattern not detected)
[ https://issues.apache.org/jira/browse/HIVE-8769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529498#comment-14529498 ] Pengcheng Xiong commented on HIVE-8769: --- [~mohammedmostafa], I ran it with 1GB TPCDS and it seems that the PKFK is detected, although my plan is different from yours. {code} Vertex dependency in root stage Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 5 (SIMPLE_EDGE), Map 6 (SIMPLE_EDGE) Reducer 3 <- Reducer 2 (SIMPLE_EDGE) Reducer 4 <- Reducer 3 (SIMPLE_EDGE) Map 6 <- Map 7 (BROADCAST_EDGE) Stage-0 Fetch Operator limit:100 Stage-1 Reducer 4 File Output Operator [FS_33] compressed:false Statistics:Num rows: 100 Data size: 39600 Basic stats: COMPLETE Column stats: COMPLETE table:{"serde:":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe","input format:":"org.apache.hadoop.mapred.TextInputFormat","output format:":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"} Limit [LIM_32] Number of rows:100 Statistics:Num rows: 100 Data size: 39600 Basic stats: COMPLETE Column stats: COMPLETE Select Operator [SEL_31] | outputColumnNames:["_col0","_col1","_col2"] | Statistics:Num rows: 142657470 Data size: 56492358120 Basic stats: COMPLETE Column stats: COMPLETE |<-Reducer 3 [SIMPLE_EDGE] Reduce Output Operator [RS_30] key expressions:_col0 (type: string) sort order:+ Statistics:Num rows: 142657470 Data size: 56492358120 Basic stats: COMPLETE Column stats: COMPLETE value expressions:_col1 (type: string), _col2 (type: decimal(7,2)) Group By Operator [GBY_28] | keys:KEY._col0 (type: string), KEY._col1 (type: string), KEY._col2 (type: decimal(7,2)) | outputColumnNames:["_col0","_col1","_col2"] | Statistics:Num rows: 142657470 Data size: 56492358120 Basic stats: COMPLETE Column stats: COMPLETE |<-Reducer 2 [SIMPLE_EDGE] Reduce Output Operator [RS_27] key expressions:_col0 (type: string), _col1 (type: string), _col2 (type: decimal(7,2)) Map-reduce partition columns:_col0 (type: string), _col1 (type: string), 
_col2 (type: decimal(7,2)) sort order:+++ Statistics:Num rows: 142657470 Data size: 56492358120 Basic stats: COMPLETE Column stats: COMPLETE Group By Operator [GBY_26] keys:_col0 (type: string), _col1 (type: string), _col2 (type: decimal(7,2)) outputColumnNames:["_col0","_col1","_col2"] Statistics:Num rows: 142657470 Data size: 56492358120 Basic stats: COMPLETE Column stats: COMPLETE Select Operator [SEL_24] outputColumnNames:["_col0","_col1","_col2"] Statistics:Num rows: 142657470 Data size: 56492358120 Basic stats: COMPLETE Column stats: COMPLETE Merge Join Operator [MERGEJOIN_49] | condition map:[{"":"Inner Join 0 to 1"},{"":"Inner Join 1 to 2"}] | keys:{"2":"_col1 (type: int)","1":"_col0 (type: int)","0":"_col0 (type: int)"} | outputColumnNames:["_col2","_col3","_col4"] | Statistics:Num rows: 142657470 Data size: 56492358120 Basic stats: COMPLETE Column stats: COMPLETE |<-Map 1 [SIMPLE_EDGE] | Reduce Output Operator [RS_18] | key expressions:_col0 (type: int) | Map-reduce partition columns:_col0 (type: int) | sort order:+ | Statistics:Num rows: 2880404 Data size: 11521616 Basic stats: COMPLETE Column stats: COMPLETE | Select Operator [SEL_1] |outputColumnNames:["_col0"] |Statistics:Num rows: 2880404 Data size: 11521616 Basic stats: COMPLETE Column stats: COMPLETE |Filter Operator [FIL_44] | predicate:ss_item_sk is not null (type: boolean) | Statistics:Num rows: 2880404 Data size: 11521616 Basic stats: COMPLETE Column stats: COMPLETE | TableScan [TS_0] | ali
[jira] [Commented] (HIVE-10482) LLAP: AsertionError cannot allocate when reading from orc
[ https://issues.apache.org/jira/browse/HIVE-10482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529493#comment-14529493 ] Sergey Shelukhin commented on HIVE-10482: - This happens when BuddyAllocator has one block of memory larger than the target allocation. When memory is reserved and several threads go to allocate, they start from the target size and then try to split larger sizes. If several threads try to split the block at the same time, one will split it and re-add the remainder to lower-level lists (e.g. the 768k left over from a 1Mb block, after using 256k, will be added as one 512k block and one 256k block), but by the time the split is done, the others are waiting on the lock for the 1Mb-block list and will never again look at the lower-level lists. There are several ways to fix this:
1. Add some sort of "helping" to get threads to provide blocks to other threads after a split. This is very complex (many special cases) and may have perf overhead in the common case; in the general case it also may not solve similar issues, e.g. with multiple arenas, where we examine full arena 1, then go to non-full arena 2, meanwhile someone allocates from 2 and deallocates to 1, so we are screwed again.
2. Make the allocator use an "actor-like" model (remove all sync and have an allocator thread that serves a request queue).
3. Use a retry loop that retries as long as any changes have happened since the last attempt.
Not sure yet whether 2 or 3 is best.
> LLAP: AsertionError cannot allocate when reading from orc > - > > Key: HIVE-10482 > URL: https://issues.apache.org/jira/browse/HIVE-10482 > Project: Hive > Issue Type: Sub-task >Reporter: Siddharth Seth >Assignee: Sergey Shelukhin > Fix For: llap > > > This was from a run of tpch query 1. [~sershe] - not sure if you've already > seen this. Creating a jira so that it doesn't get lost. 
> {code} > 2015-04-24 13:11:54,180 > [TezTaskRunner_attempt_1429683757595_0326_4_00_000199_0(container_1_0326_01_003216_sseth_20150424131137_8ec6200c-77c8-43ea-a6a3-a0ab1da6e1ac:4_Map > 1_199_0)] ERROR org.apache.hadoop.hive.ql.exec.tez.TezProcessor: > org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: > java.io.IOException: java.lang.AssertionError: Cannot allocate > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:74) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:314) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:329) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.IOException: java.io.IOException: > java.lang.AssertionError: Cannot allocate > at > 
org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:355) > at > org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79) > at > org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:137) >
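The retry-loop fix Sergey describes above (retry the whole free-list scan as long as any allocator state has changed since the last attempt, so a thread that missed a concurrently split block gets another look at the lower-level lists) could be sketched as below. This is a minimal illustrative sketch, not the actual BuddyAllocator code; all class and method names here are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.atomic.AtomicLong;

public class RetryAllocator {
    // freeLists[i] holds addresses of free blocks of size (1 << (minSizeLog2 + i)).
    private final Deque<Long>[] freeLists;
    private final int minSizeLog2;
    private final AtomicLong version = new AtomicLong(); // bumped on every list change

    @SuppressWarnings("unchecked")
    RetryAllocator(int levels, int minSizeLog2) {
        this.freeLists = new Deque[levels];
        for (int i = 0; i < levels; i++) freeLists[i] = new ArrayDeque<>();
        this.minSizeLog2 = minSizeLog2;
    }

    synchronized void free(int level, long addr) {
        freeLists[level].addLast(addr);
        version.incrementAndGet();
    }

    /** Scan all levels once; rescan while any other thread changed allocator state. */
    Long allocate(int level) {
        long seen;
        do {
            seen = version.get();
            Long addr = tryOnce(level);
            if (addr != null) return addr;
            // Another thread may have split a big block and re-added the remainder
            // to a list we already passed; if the version moved, scan again.
        } while (version.get() != seen);
        return null; // genuinely out of memory
    }

    private synchronized Long tryOnce(int level) {
        for (int l = level; l < freeLists.length; l++) {
            Long addr = freeLists[l].pollFirst();
            if (addr == null) continue;
            // Split down to the target size, re-adding buddies to lower lists.
            for (int cur = l; cur > level; cur--) {
                freeLists[cur - 1].addLast(addr + (1L << (minSizeLog2 + cur - 1)));
                version.incrementAndGet();
            }
            return addr;
        }
        return null;
    }

    public static void main(String[] args) {
        RetryAllocator a = new RetryAllocator(3, 18); // levels: 256k, 512k, 1Mb
        a.free(2, 0);                                 // one free 1Mb block at address 0
        System.out.println(a.allocate(0)); // 0: uses 256k, leaves a 512k and a 256k buddy free
        System.out.println(a.allocate(1)); // 524288: the 512k buddy
    }
}
```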
[jira] [Commented] (HIVE-10506) CBO (Calcite Return Path): Disallow return path to be enable if CBO is off
[ https://issues.apache.org/jira/browse/HIVE-10506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529489#comment-14529489 ] Laljo John Pullokkaran commented on HIVE-10506: --- +1 > CBO (Calcite Return Path): Disallow return path to be enable if CBO is off > -- > > Key: HIVE-10506 > URL: https://issues.apache.org/jira/browse/HIVE-10506 > Project: Hive > Issue Type: Sub-task > Components: CBO >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Fix For: 1.2.0 > > Attachments: HIVE-10506.01.patch, HIVE-10506.patch > > > If hive.cbo.enable=false and hive.cbo.returnpath=true then some optimizations > would kick in. It's quite possible that in customer environment, they might > end up in these scenarios; we should prevent it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
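The guard this issue asks for could be sketched as a simple configuration check: the return-path flag is honored only when CBO itself is on. The property names follow the description above (`hive.cbo.enable`, `hive.cbo.returnpath`), but the helper itself is hypothetical, not actual Hive code.

```java
import java.util.Properties;

public class CboConfigGuard {
    /**
     * Effective return-path setting: force it off whenever CBO is disabled,
     * so none of the return-path optimizations can kick in by accident.
     */
    static boolean effectiveReturnPath(Properties conf) {
        boolean cboEnabled = Boolean.parseBoolean(conf.getProperty("hive.cbo.enable", "true"));
        boolean returnPath = Boolean.parseBoolean(conf.getProperty("hive.cbo.returnpath", "false"));
        return cboEnabled && returnPath;
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty("hive.cbo.enable", "false");
        p.setProperty("hive.cbo.returnpath", "true");
        // The misconfiguration from the issue description is neutralized.
        System.out.println(effectiveReturnPath(p)); // false
    }
}
```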
[jira] [Commented] (HIVE-10591) Support limited integer type promotion in ORC
[ https://issues.apache.org/jira/browse/HIVE-10591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529491#comment-14529491 ] Hive QA commented on HIVE-10591: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12730595/HIVE-10591.2.patch {color:red}ERROR:{color} -1 due to 53 failed/errored test(s), 8901 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_join org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_vectorization org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_parts org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_all_non_partitioned org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_tmp_table org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_where_no_match org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_where_non_partitioned org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_update_delete org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_transform_acid org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_update_all_non_partitioned org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_update_tmp_table org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_update_where_no_match org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_update_where_non_partitioned org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_unencrypted_tbl org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_with_different_encryption_keys 
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_load_data_to_encrypted_tables org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_select_read_only_encrypted_tbl org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_all_non_partitioned org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_tmp_table org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_where_no_match org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_where_non_partitioned org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert_update_delete org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_update_all_non_partitioned org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_update_tmp_table org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_update_where_no_match org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_update_where_non_partitioned org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_disallow_transform org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_droppartition org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_sba_drop_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_alterpart_loc org.apache.hadoop.hive.ql.TestTxnCommands2.testBucketizedInputFormat org.apache.hadoop.hive.ql.TestTxnCommands2.testDeleteIn org.apache.hadoop.hive.ql.TestTxnCommands2.testUpdateMixedCase org.apache.hadoop.hive.ql.security.TestStorageBasedClientSideAuthorizationProvider.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropDatabase org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropPartition org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropTable 
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropView org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProvider.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProviderWithACL.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbFailure org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbSuccess org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableFailure org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableSuccess org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessing org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessingCustomSetWhitelistAppend org.apache.hadoop.hive.ql.txn.compactor.TestCompactor.majorCompactAfterAbort org.apache.hadoop.hive.ql.txn.compa
[jira] [Commented] (HIVE-10482) LLAP: AsertionError cannot allocate when reading from orc
[ https://issues.apache.org/jira/browse/HIVE-10482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529476#comment-14529476 ] Sergey Shelukhin commented on HIVE-10482: - I found the issue, not clear how to fix this yet though > LLAP: AsertionError cannot allocate when reading from orc > - > > Key: HIVE-10482 > URL: https://issues.apache.org/jira/browse/HIVE-10482 > Project: Hive > Issue Type: Sub-task >Reporter: Siddharth Seth >Assignee: Sergey Shelukhin > Fix For: llap > > > This was from a run of tpch query 1. [~sershe] - not sure if you've already > seen this. Creating a jira so that it doesn't get lost. > {code} > 2015-04-24 13:11:54,180 > [TezTaskRunner_attempt_1429683757595_0326_4_00_000199_0(container_1_0326_01_003216_sseth_20150424131137_8ec6200c-77c8-43ea-a6a3-a0ab1da6e1ac:4_Map > 1_199_0)] ERROR org.apache.hadoop.hive.ql.exec.tez.TezProcessor: > org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: > java.io.IOException: java.lang.AssertionError: Cannot allocate > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:74) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:314) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:329) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at > 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.IOException: java.io.IOException: > java.lang.AssertionError: Cannot allocate > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:355) > at > org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79) > at > org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:137) > at > org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:62) > ... 
16 more > Caused by: java.io.IOException: java.lang.AssertionError: Cannot allocate > at > org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.rethrowErrorIfAny(LlapInputFormat.java:257) > at > org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.nextCvb(LlapInputFormat.java:209) > at > org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:147) > at > org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:97) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350) > ... 22 more > Caused by: java.lang.AssertionError: Cannot allocate > at > org.apache.hadoop.hive.ql.io.orc.InStream.readEncodedStream(InStream.java:761) > at > org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:441) >
[jira] [Commented] (HIVE-9743) Incorrect result set for vectorized left outer join
[ https://issues.apache.org/jira/browse/HIVE-9743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529477#comment-14529477 ] Matt McCline commented on HIVE-9743: [~vikram.dixit] I removed the annotations and the MR vector_left_outer_join3.q.out and fiddled with environment variables so that it now has "Sorted Merge Bucket Map Join Operator" operators; Tez has "Merge Join Operator" as you said. The original LEFT OUTER JOIN problem does not repro with vector_left_outer_join3.q though. > Incorrect result set for vectorized left outer join > --- > > Key: HIVE-9743 > URL: https://issues.apache.org/jira/browse/HIVE-9743 > Project: Hive > Issue Type: Bug > Components: SQL >Affects Versions: 0.14.0 >Reporter: N Campbell >Assignee: Matt McCline > Attachments: HIVE-9743.01.patch, HIVE-9743.02.patch, > HIVE-9743.03.patch, HIVE-9743.04.patch, HIVE-9743.05.patch, > HIVE-9743.06.patch, HIVE-9743.08.patch > > > This query is supposed to return 3 rows and will when run without Tez but > returns 2 rows when run with Tez. > select tjoin1.rnum, tjoin1.c1, tjoin1.c2, tjoin2.c2 as c2j2 from tjoin1 left > outer join tjoin2 on ( tjoin1.c1 = tjoin2.c1 and tjoin1.c2 > 15 ) > tjoin1.rnum tjoin1.c1 tjoin1.c2 c2j2 > 1 20 25 > 2 50 > instead of > tjoin1.rnum tjoin1.c1 tjoin1.c2 c2j2 > 0 10 15 > 1 20 25 > 2 50 > create table if not exists TJOIN1 (RNUM int , C1 int, C2 int) > STORED AS orc ; > 0|10|15 > 1|20|25 > 2|\N|50 > create table if not exists TJOIN2 (RNUM int , C1 int, C2 char(2)) > ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' > STORED AS TEXTFILE ; > 0|10|BB > 1|15|DD > 2|\N|EE > 3|10|FF -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10610) hive command fails to get hadoop version
[ https://issues.apache.org/jira/browse/HIVE-10610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-10610: Assignee: Shwetha G S > hive command fails to get hadoop version > > > Key: HIVE-10610 > URL: https://issues.apache.org/jira/browse/HIVE-10610 > Project: Hive > Issue Type: Bug >Reporter: Shwetha G S >Assignee: Shwetha G S > Attachments: HIVE-10610.patch > > > NO PRECOMMIT TESTS > If debug level logging is enabled, hive command fails with the following > exception: > {noformat} > apache-hive-1.2.0-SNAPSHOT-bin$ ./bin/hive > Unable to determine Hadoop version information from 13:54:07,683 > 'hadoop version' returned: > 2015-05-05 13:54:08,014 DEBUG - [main:] ~ version: 2.5.0-cdh5.3.3 > (VersionInfo:171) Hadoop 2.5.0-cdh5.3.3 Subversion > http://github.com/cloudera/hadoop -r 82a65209d6e9e4a2b41fdbcd8190c7ea38730627 > Compiled by jenkins on 2015-04-08T22:00Z Compiled with protoc 2.5.0 From > source with checksum 1531e104cdad7489656f44875f3334b This command was run > using > /Users/sshivalingamurthy/installs/hadoop-2.5.0-cdh5.3.3/share/hadoop/common/hadoop-common-2.5.0-cdh5.3.3.jar > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
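The failure above happens because debug log lines get interleaved with the `hadoop version` output, so naively taking the first line of that output no longer yields the version. One robust approach is to scan the output for the line that actually starts with "Hadoop ". The real logic lives in the `hive` launcher shell script; this Java helper is purely illustrative.

```java
public class HadoopVersionParse {
    /** Extract the version from `hadoop version` output, skipping interleaved log lines. */
    static String parseVersion(String output) {
        for (String line : output.split("\n")) {
            String t = line.trim();
            if (t.startsWith("Hadoop ")) {
                return t.substring("Hadoop ".length()).trim();
            }
        }
        return null; // genuinely unable to determine the version
    }

    public static void main(String[] args) {
        // Sample output with a DEBUG line interleaved, as in the report above.
        String noisy = "2015-05-05 13:54:08,014 DEBUG - [main:] ~ version: 2.5.0-cdh5.3.3 (VersionInfo:171)\n"
                     + "Hadoop 2.5.0-cdh5.3.3\n"
                     + "Subversion http://github.com/cloudera/hadoop\n";
        System.out.println(parseVersion(noisy)); // 2.5.0-cdh5.3.3
    }
}
```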
[jira] [Commented] (HIVE-10615) LLAP: Invalid containerId prefix
[ https://issues.apache.org/jira/browse/HIVE-10615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529473#comment-14529473 ] Prasanth Jayachandran commented on HIVE-10615: -- [~sseth] fyi.. > LLAP: Invalid containerId prefix > > > Key: HIVE-10615 > URL: https://issues.apache.org/jira/browse/HIVE-10615 > Project: Hive > Issue Type: Sub-task >Affects Versions: llap >Reporter: Prasanth Jayachandran > > I encountered this error when I ran a simple query in llap mode today. > {code}org.apache.hadoop.ipc.RemoteException(java.io.IOException): > java.lang.IllegalArgumentException: Invalid ContainerId prefix: > at > org.apache.hadoop.yarn.api.records.ContainerId.fromString(ContainerId.java:211) > at > org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:178) > at > org.apache.tez.dag.app.TezTaskCommunicatorImpl$TezTaskUmbilicalProtocolImpl.heartbeat(TezTaskCommunicatorImpl.java:311) > at > org.apache.hadoop.hive.llap.tezplugins.LlapTaskCommunicator$LlapTaskUmbilicalProtocolImpl.heartbeat(LlapTaskCommunicator.java:398) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.ipc.WritableRpcEngine$Server$WritableRpcInvoker.call(WritableRpcEngine.java:514) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) > at 
org.apache.hadoop.ipc.Client.call(Client.java:1468) > at org.apache.hadoop.ipc.Client.call(Client.java:1399) > at > org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:244) > at com.sun.proxy.$Proxy14.heartbeat(Unknown Source) > at > org.apache.hadoop.hive.llap.daemon.impl.LlapTaskReporter$HeartbeatCallable.heartbeat(LlapTaskReporter.java:256) > at > org.apache.hadoop.hive.llap.daemon.impl.LlapTaskReporter$HeartbeatCallable.call(LlapTaskReporter.java:184) > at > org.apache.hadoop.hive.llap.daemon.impl.LlapTaskReporter$HeartbeatCallable.call(LlapTaskReporter.java:126) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > 15/05/05 15:24:22 [Task-Executor-0] INFO task.TezTaskRunner : Interrupted > while waiting for task to complete. Interrupting task > 15/05/05 15:24:22 [TezTaskRunner_attempt_1430816501738_0034_1_00_00_0] > INFO task.TezTaskRunner : Encounted an error while executing task: > attempt_1430816501738_0034_1_00_00_0 > java.lang.InterruptedException > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2052) > at > java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) > at > java.util.concurrent.ExecutorCompletionService.take(ExecutorCompletionService.java:193) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.initialize(LogicalIOProcessorRuntimeTask.java:218) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:177) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172) > at 
java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(Thr
[jira] [Commented] (HIVE-8065) Support HDFS encryption functionality on Hive
[ https://issues.apache.org/jira/browse/HIVE-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529461#comment-14529461 ] Brock Noland commented on HIVE-8065: bq. have you considered creating a single encrypted staging dir for all queries to use instead of creating new ones under the table namespace? (this could be owned by Hive and encrypted with Hive's key). If so, why did you choose the current design? This approach does not work since you cannot move files across encryption zones. > Support HDFS encryption functionality on Hive > - > > Key: HIVE-8065 > URL: https://issues.apache.org/jira/browse/HIVE-8065 > Project: Hive > Issue Type: Improvement >Affects Versions: 0.13.1 >Reporter: Sergio Peña >Assignee: Sergio Peña > Labels: Hive-Scrum > > The new encryption support on HDFS makes Hive incompatible and unusable when > this feature is used. > HDFS encryption is designed so that an user can configure different > encryption zones (or directories) for multi-tenant environments. An > encryption zone has an exclusive encryption key, such as AES-128 or AES-256. > Because of security compliance, the HDFS does not allow to move/rename files > between encryption zones. Renames are allowed only inside the same encryption > zone. A copy is allowed between encryption zones. > See HDFS-6134 for more details about HDFS encryption design. > Hive currently uses a scratch directory (like /tmp/$user/$random). This > scratch directory is used for the output of intermediate data (between MR > jobs) and for the final output of the hive query which is later moved to the > table directory location. > If Hive tables are in different encryption zones than the scratch directory, > then Hive won't be able to renames those files/directories, and it will make > Hive unusable. > To handle this problem, we can change the scratch directory of the > query/statement to be inside the same encryption zone of the table directory > location. 
This way, the renaming process will be successful. > Also, for statements that move files between encryption zones (i.e. LOAD > DATA), a copy may be executed instead of a rename. This will cause an > overhead when copying large data files, but it won't break the encryption on > Hive. > Another security consideration arises when joining tables. If Hive joins > different tables with different encryption key strengths, then the results of > the select might break the security compliance of the tables. Say two > tables with 128-bit and 256-bit encryption are joined; the temporary > results might then be stored in the 128-bit encryption zone. This temporarily conflicts > with the compliance requirements of the table encrypted with 256 bits. > To fix this, Hive should be able to select the most strongly > encrypted scratch directory in order to store the intermediate data temporarily with no > compliance issues. > For instance: > {noformat} > SELECT * FROM table-aes128 t1 JOIN table-aes256 t2 WHERE t1.id == t2.id; > {noformat} > - This should use a scratch directory (or staging directory) inside the > table-aes256 table location. > {noformat} > INSERT OVERWRITE TABLE table-unencrypted SELECT * FROM table-aes1; > {noformat} > - This should use a scratch directory inside the table-aes1 location. > {noformat} > FROM table-unencrypted > INSERT OVERWRITE TABLE table-aes128 SELECT id, name > INSERT OVERWRITE TABLE table-aes256 SELECT id, name > {noformat} > - This should use a scratch directory in each of the tables' locations. > - The first SELECT will have its scratch directory in the table-aes128 directory. > - The second SELECT will have its scratch directory in the table-aes256 directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
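The zone-selection rule sketched in the description above — stage intermediate data in the most strongly encrypted zone touched by the query — can be illustrated with a short sketch. `Zone` and `chooseScratchZone` here are hypothetical stand-ins for illustration, not Hive's actual metastore or HDFS client API:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ScratchDirChooser {
    // Hypothetical model of an encryption zone: a path plus its key length in bits.
    static final class Zone {
        final String path;
        final int keyBits; // 0 = unencrypted

        Zone(String path, int keyBits) {
            this.path = path;
            this.keyBits = keyBits;
        }
    }

    // Pick the zone with the strongest key, so intermediate data never lands
    // in a weaker zone than any table that feeds the query.
    static Zone chooseScratchZone(List<Zone> tablesInQuery) {
        return tablesInQuery.stream()
                .max(Comparator.comparingInt(z -> z.keyBits))
                .orElseThrow(IllegalArgumentException::new);
    }

    public static void main(String[] args) {
        Zone aes128 = new Zone("/warehouse/table-aes128", 128);
        Zone aes256 = new Zone("/warehouse/table-aes256", 256);
        // For the join example in the description, staging goes under table-aes256.
        System.out.println(chooseScratchZone(Arrays.asList(aes128, aes256)).path);
    }
}
```

For the INSERT OVERWRITE example, the same rule applied to {table-unencrypted, table-aes1} would select the encrypted table's location, matching the description.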
[jira] [Updated] (HIVE-9743) Incorrect result set for vectorized left outer join
[ https://issues.apache.org/jira/browse/HIVE-9743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-9743: --- Attachment: HIVE-9743.08.patch > Incorrect result set for vectorized left outer join > --- > > Key: HIVE-9743 > URL: https://issues.apache.org/jira/browse/HIVE-9743 > Project: Hive > Issue Type: Bug > Components: SQL >Affects Versions: 0.14.0 >Reporter: N Campbell >Assignee: Matt McCline > Attachments: HIVE-9743.01.patch, HIVE-9743.02.patch, > HIVE-9743.03.patch, HIVE-9743.04.patch, HIVE-9743.05.patch, > HIVE-9743.06.patch, HIVE-9743.08.patch > > > This query is supposed to return 3 rows and will when run without Tez but > returns 2 rows when run with Tez. > select tjoin1.rnum, tjoin1.c1, tjoin1.c2, tjoin2.c2 as c2j2 from tjoin1 left > outer join tjoin2 on ( tjoin1.c1 = tjoin2.c1 and tjoin1.c2 > 15 ) > tjoin1.rnum tjoin1.c1 tjoin1.c2 c2j2 > 1 20 25 > 2 50 > instead of > tjoin1.rnum tjoin1.c1 tjoin1.c2 c2j2 > 0 10 15 > 1 20 25 > 2 50 > create table if not exists TJOIN1 (RNUM int , C1 int, C2 int) > STORED AS orc ; > 0|10|15 > 1|20|25 > 2|\N|50 > create table if not exists TJOIN2 (RNUM int , C1 int, C2 char(2)) > ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' > STORED AS TEXTFILE ; > 0|10|BB > 1|15|DD > 2|\N|EE > 3|10|FF -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9743) Incorrect result set for vectorized left outer join
[ https://issues.apache.org/jira/browse/HIVE-9743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-9743: --- Attachment: (was: HIVE-9743.07.patch) > Incorrect result set for vectorized left outer join > --- > > Key: HIVE-9743 > URL: https://issues.apache.org/jira/browse/HIVE-9743 > Project: Hive > Issue Type: Bug > Components: SQL >Affects Versions: 0.14.0 >Reporter: N Campbell >Assignee: Matt McCline > Attachments: HIVE-9743.01.patch, HIVE-9743.02.patch, > HIVE-9743.03.patch, HIVE-9743.04.patch, HIVE-9743.05.patch, HIVE-9743.06.patch > > > This query is supposed to return 3 rows and will when run without Tez but > returns 2 rows when run with Tez. > select tjoin1.rnum, tjoin1.c1, tjoin1.c2, tjoin2.c2 as c2j2 from tjoin1 left > outer join tjoin2 on ( tjoin1.c1 = tjoin2.c1 and tjoin1.c2 > 15 ) > tjoin1.rnum tjoin1.c1 tjoin1.c2 c2j2 > 1 20 25 > 2 50 > instead of > tjoin1.rnum tjoin1.c1 tjoin1.c2 c2j2 > 0 10 15 > 1 20 25 > 2 50 > create table if not exists TJOIN1 (RNUM int , C1 int, C2 int) > STORED AS orc ; > 0|10|15 > 1|20|25 > 2|\N|50 > create table if not exists TJOIN2 (RNUM int , C1 int, C2 char(2)) > ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' > STORED AS TEXTFILE ; > 0|10|BB > 1|15|DD > 2|\N|EE > 3|10|FF -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10213) MapReduce jobs using dynamic-partitioning fail on commit.
[ https://issues.apache.org/jira/browse/HIVE-10213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529452#comment-14529452 ] Sushanth Sowmyan commented on HIVE-10213: - This patch set off some warning flags for me with regards to the traditional M-R usecase, but it's because it's been a while since I looked at this piece of code. The traditional M-R usecase is still fine, because the DynamicPartitionFileRecordWriterContainer.close() will register an appropriate TaskCommitterProxy, and a commit on the OutputCommitter will be called in the same process scope, thus making it okay. For pig-based optimizations also, it'd continue to be okay as the singleton retains it in memory. +1, and I'm okay with committing this patch as-is, tests have already run on this, and this section of code has not changed since then. > MapReduce jobs using dynamic-partitioning fail on commit. > - > > Key: HIVE-10213 > URL: https://issues.apache.org/jira/browse/HIVE-10213 > Project: Hive > Issue Type: Bug > Components: HCatalog >Reporter: Mithun Radhakrishnan >Assignee: Mithun Radhakrishnan > Attachments: HIVE-10213.1.patch > > > I recently ran into a problem in {{TaskCommitContextRegistry}}, when using > dynamic-partitions. > Consider a MapReduce program that reads HCatRecords from a table (using > HCatInputFormat), and then writes to another table (with identical schema), > using HCatOutputFormat. 
The Map-task fails with the following exception: > {code} > Error: java.io.IOException: No callback registered for > TaskAttemptID:attempt_1426589008676_509707_m_00_0@hdfs://crystalmyth.myth.net:8020/user/mithunr/mythdb/target/_DYN0.6784154320609959/grid=__HIVE_DEFAULT_PARTITION__/dt=__HIVE_DEFAULT_PARTITION__ > at > org.apache.hive.hcatalog.mapreduce.TaskCommitContextRegistry.commitTask(TaskCommitContextRegistry.java:56) > at > org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.commitTask(FileOutputCommitterContainer.java:139) > at org.apache.hadoop.mapred.Task.commit(Task.java:1163) > at org.apache.hadoop.mapred.Task.done(Task.java:1025) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:345) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > {code} > {{TaskCommitContextRegistry::commitTask()}} uses call-backs registered from > {{DynamicPartitionFileRecordWriter}}. But when {{HCatInputFormat}} and > {{HCatOutputFormat}} are both used in the same job, the > {{DynamicPartitionFileRecordWriter}} might only be exercised in the Reducer. > I'm relaxing the IOException and logging a warning message instead of just > failing. > (I'll post the fix shortly.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
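The behavior change described — looking up a per-path commit callback and merely warning when none was registered — can be sketched as follows. `TaskCommitRegistry` is a simplified, hypothetical stand-in for HCatalog's `TaskCommitContextRegistry`, not its real code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TaskCommitRegistry {
    // Callbacks keyed by output path, registered by the record writer on close().
    private final Map<String, Runnable> callbacks = new ConcurrentHashMap<>();

    public TaskCommitRegistry register(String outputPath, Runnable onCommit) {
        callbacks.put(outputPath, onCommit);
        return this;
    }

    // Before the fix: a missing callback raised IOException and failed the task.
    // After the fix: log a warning and continue, since a task that only read
    // via HCatInputFormat may never have registered a writer-side callback.
    public boolean commitTask(String outputPath) {
        Runnable cb = callbacks.get(outputPath);
        if (cb == null) {
            System.err.println("WARN: no callback registered for " + outputPath);
            return false;
        }
        cb.run();
        return true;
    }

    public static void main(String[] args) {
        TaskCommitRegistry registry = new TaskCommitRegistry();
        registry.register("/out/part-1", () -> System.out.println("committed /out/part-1"));
        registry.commitTask("/out/part-1"); // runs the callback
        registry.commitTask("/out/part-2"); // only warns, no task failure
    }
}
```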
[jira] [Commented] (HIVE-10608) Fix useless 'if' stamement in RetryingMetaStoreClient (135)
[ https://issues.apache.org/jira/browse/HIVE-10608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529430#comment-14529430 ] Szehon Ho commented on HIVE-10608: -- +1 > Fix useless 'if' stamement in RetryingMetaStoreClient (135) > --- > > Key: HIVE-10608 > URL: https://issues.apache.org/jira/browse/HIVE-10608 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Alexander Pivovarov >Assignee: Alexander Pivovarov >Priority: Minor > Attachments: rb33861.patch > > > "if" statement below is useless because it ends with ; > {code} > } catch (MetaException e) { > if (e.getMessage().matches("(?s).*(IO|TTransport)Exception.*")); > caughtException = e; > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
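For context, the stray semicolon forms Java's empty statement, so the `caughtException = e;` line below it runs unconditionally. A minimal sketch of the bug and one plausible correction — the committed patch may differ:

```java
public class EmptyIfDemo {
    // The pattern quoted from RetryingMetaStoreClient (line 135).
    static final String RETRIABLE = "(?s).*(IO|TTransport)Exception.*";

    // Buggy shape: the stray ';' makes the if-body empty, so the
    // assignment below it executes unconditionally.
    static boolean buggyTreatsAsRetriable(String message) {
        boolean caught = false;
        if (message.matches(RETRIABLE)); // empty statement!
        caught = true;                   // always executes
        return caught;
    }

    // One plausible correction (a sketch, not necessarily the committed
    // patch): actually guard the decision with the match.
    static boolean fixedTreatsAsRetriable(String message) {
        return message.matches(RETRIABLE);
    }

    public static void main(String[] args) {
        String unrelated = "Table not found";
        // The bug treats even non-transport failures as retriable:
        System.out.println(buggyTreatsAsRetriable(unrelated)); // true
        System.out.println(fixedTreatsAsRetriable(unrelated)); // false
    }
}
```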
[jira] [Commented] (HIVE-8065) Support HDFS encryption functionality on Hive
[ https://issues.apache.org/jira/browse/HIVE-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529378#comment-14529378 ] Eugene Koifman commented on HIVE-8065: -- [~spena], when implementing this, have you considered creating a single encrypted staging dir for all queries to use instead of creating new ones under the table namespace? (this could be owned by Hive and encrypted with Hive's key). If so, why did you choose the current design? Some possible issues with the current design: Requires write permission on the table dir. delete-on-exit (on stagingdir) is not completely reliable as far as I know; this may leave files around. In a query like "SELECT * FROM table-aes128 t1 JOIN table-aes256 t2 WHERE t1.id == t2.id;" when the staging dir is created under table-aes256, someone who has a key for this EZ may read data (in theory at least) that came from table-aes128 even if they don't have a key for the EZ which contains table-aes128. thanks > Support HDFS encryption functionality on Hive > - > > Key: HIVE-8065 > URL: https://issues.apache.org/jira/browse/HIVE-8065 > Project: Hive > Issue Type: Improvement >Affects Versions: 0.13.1 >Reporter: Sergio Peña >Assignee: Sergio Peña > Labels: Hive-Scrum > > The new encryption support on HDFS makes Hive incompatible and unusable when > this feature is used. > HDFS encryption is designed so that a user can configure different > encryption zones (or directories) for multi-tenant environments. An > encryption zone has an exclusive encryption key, such as AES-128 or AES-256. > Because of security compliance, HDFS does not allow moving/renaming files > between encryption zones. Renames are allowed only inside the same encryption > zone. A copy is allowed between encryption zones. > See HDFS-6134 for more details about the HDFS encryption design. > Hive currently uses a scratch directory (like /tmp/$user/$random). 
This > scratch directory is used for the output of intermediate data (between MR > jobs) and for the final output of the hive query, which is later moved to the > table directory location. > If Hive tables are in different encryption zones than the scratch directory, > then Hive won't be able to rename those files/directories, and it will make > Hive unusable. > To handle this problem, we can change the scratch directory of the > query/statement to be inside the same encryption zone as the table directory > location. This way, the renaming process will be successful. > Also, for statements that move files between encryption zones (i.e. LOAD > DATA), a copy may be executed instead of a rename. This will cause an > overhead when copying large data files, but it won't break the encryption on > Hive. > Another security consideration arises when joining tables. If Hive joins > different tables with different encryption key strengths, then the results of > the select might break the security compliance of the tables. Say two > tables with 128-bit and 256-bit encryption are joined; the temporary > results might then be stored in the 128-bit encryption zone. This temporarily conflicts > with the compliance requirements of the table encrypted with 256 bits. > To fix this, Hive should be able to select the most strongly > encrypted scratch directory in order to store the intermediate data temporarily with no > compliance issues. > For instance: > {noformat} > SELECT * FROM table-aes128 t1 JOIN table-aes256 t2 WHERE t1.id == t2.id; > {noformat} > - This should use a scratch directory (or staging directory) inside the > table-aes256 table location. > {noformat} > INSERT OVERWRITE TABLE table-unencrypted SELECT * FROM table-aes1; > {noformat} > - This should use a scratch directory inside the table-aes1 location. 
> {noformat} > FROM table-unencrypted > INSERT OVERWRITE TABLE table-aes128 SELECT id, name > INSERT OVERWRITE TABLE table-aes256 SELECT id, name > {noformat} > - This should use a scratch directory on each of the tables locations. > - The first SELECT will have its scratch directory on table-aes128 directory. > - The second SELECT will have its scratch directory on table-aes256 directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7018) Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but not others
[ https://issues.apache.org/jira/browse/HIVE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529366#comment-14529366 ] Thejas M Nair commented on HIVE-7018: - This change breaks schematool upgrade - See HIVE-10614 > Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but > not others > - > > Key: HIVE-7018 > URL: https://issues.apache.org/jira/browse/HIVE-7018 > Project: Hive > Issue Type: Bug >Reporter: Brock Noland >Assignee: Yongzhi Chen > Fix For: 1.2.0 > > Attachments: HIVE-7018.1.patch, HIVE-7018.2.patch > > > It appears that at least postgres and oracle do not have the LINK_TARGET_ID > column while mysql does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9534) incorrect result set for query that projects a windowed aggregate
[ https://issues.apache.org/jira/browse/HIVE-9534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529304#comment-14529304 ] Chaoyu Tang commented on HIVE-9534: --- Looks like the over(analytic_clause) part is ignored in a query with distinct in Hive: {code} function @init { gParent.pushMsg("function specification", state); } @after { gParent.popMsg(state); } : functionName LPAREN ( (STAR) => (star=STAR) | (dist=KW_DISTINCT)? (selectExpression (COMMA selectExpression)*)? ) RPAREN (KW_OVER ws=window_specification)? -> {$star != null}? ^(TOK_FUNCTIONSTAR functionName $ws?) -> {$dist == null}? ^(TOK_FUNCTION functionName (selectExpression+)? $ws?) -> ^(TOK_FUNCTIONDI functionName (selectExpression+)?) ; {code} In queries like select avg(distinct col1) over() from testwindow; or select avg(distinct col1) over(order by col2 rows between 1 preceding and 1 following) from testwindow; the over(...) is totally ignored. So I am going to fix this issue by throwing an unsupported-operation error. > incorrect result set for query that projects a windowed aggregate > - > > Key: HIVE-9534 > URL: https://issues.apache.org/jira/browse/HIVE-9534 > Project: Hive > Issue Type: Bug > Components: SQL >Reporter: N Campbell >Assignee: Chaoyu Tang > > The result set returned by Hive has one row instead of 5 > {code} > select avg(distinct tsint.csint) over () from tsint > create table if not exists TSINT (RNUM int , CSINT smallint) > ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' > STORED AS TEXTFILE; > 0|\N > 1|-1 > 2|0 > 3|1 > 4|10 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
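For the TSINT sample data above, the expected semantics would be avg(distinct csint) = (-1 + 0 + 1 + 10) / 4 = 2.5, projected onto all 5 rows: the NULL is excluded from the aggregate but still yields an output row. A sketch of that semantics in plain Java:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class AvgDistinctOverDemo {
    // avg(DISTINCT col) OVER (): one identical value projected per input row.
    // SQL NULLs (modelled here as Java nulls) are excluded from the aggregate
    // but still produce an output row.
    static List<Double> avgDistinctOver(List<Integer> column) {
        double avg = column.stream()
                .filter(v -> v != null)
                .distinct()
                .mapToInt(Integer::intValue)
                .average()
                .orElse(Double.NaN);
        return column.stream().map(v -> avg).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // csint values from the TSINT sample data: \N, -1, 0, 1, 10
        List<Double> result = avgDistinctOver(Arrays.asList(null, -1, 0, 1, 10));
        System.out.println(result.size() + " rows: " + result); // 5 rows of 2.5
    }
}
```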
[jira] [Commented] (HIVE-10592) ORC file dump in JSON format
[ https://issues.apache.org/jira/browse/HIVE-10592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529258#comment-14529258 ] Gopal V commented on HIVE-10592: LGTM - +1 But a follow-up usability JIRA is advised for the multi-file output scenario - you need to produce an array of JSON objects instead of a JSON object of arrays. Since there will be no consumers for this output until some output exists, we can iterate on this after writing some analysis scripts once this makes it into the build. As an example of the difficulty in keeping JSON object walkers simple, try running {code} ./dist/hive/bin/hive --service orcfiledump -j -p /apps/hive/warehouse/tpcds5_bin_partitioned_orc_200.db/customer_demographics/00_0 /apps/hive/warehouse/tpcds5_bin_partitioned_orc_200.db/customer_demographics/01_0 { "fileName": [ "\/apps\/hive\/warehouse\/tpcds5_bin_partitioned_orc_200.db\/customer_demographics\/00_0", "\/apps\/hive\/warehouse\/tpcds5_bin_partitioned_orc_200.db\/customer_demographics\/01_0" ], "fileVersion": [ "0.12", "0.12" ], "writerVersion": [ "HIVE_8732", "HIVE_8732" ], ... {code} > ORC file dump in JSON format > > > Key: HIVE-10592 > URL: https://issues.apache.org/jira/browse/HIVE-10592 > Project: Hive > Issue Type: New Feature >Affects Versions: 1.3.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-10592.1.patch, HIVE-10592.2.patch, > HIVE-10592.3.patch > > > ORC file dump uses a custom format. It will be useful to dump ORC metadata in JSON > format so that other tools can be built on top of it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
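The shape difference Gopal describes: with N files, an object of arrays forces consumers to walk parallel arrays by index, while an array of objects keeps each file's metadata together. A rough sketch of the transposition, with plain Java maps and made-up file names standing in for real JSON handling:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class JsonShapeDemo {
    // Transpose {"fileName": [a, b], "fileVersion": [x, y]} into
    // [{"fileName": a, "fileVersion": x}, {"fileName": b, "fileVersion": y}]
    // so each element describes exactly one file.
    static List<Map<String, String>> toArrayOfObjects(Map<String, List<String>> objectOfArrays) {
        int n = objectOfArrays.values().iterator().next().size();
        List<Map<String, String>> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Map<String, String> perFile = new LinkedHashMap<>();
            for (Map.Entry<String, List<String>> e : objectOfArrays.entrySet()) {
                perFile.put(e.getKey(), e.getValue().get(i));
            }
            out.add(perFile);
        }
        return out;
    }

    // Sample in the object-of-arrays shape; file names are hypothetical.
    static Map<String, List<String>> sample() {
        Map<String, List<String>> dump = new LinkedHashMap<>();
        dump.put("fileName", Arrays.asList("/warehouse/cd/file_a", "/warehouse/cd/file_b"));
        dump.put("fileVersion", Arrays.asList("0.12", "0.12"));
        dump.put("writerVersion", Arrays.asList("HIVE_8732", "HIVE_8732"));
        return dump;
    }

    public static void main(String[] args) {
        System.out.println(toArrayOfObjects(sample()));
    }
}
```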
[jira] [Commented] (HIVE-10595) Dropping a table can cause NPEs in the compactor
[ https://issues.apache.org/jira/browse/HIVE-10595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529236#comment-14529236 ] Sushanth Sowmyan commented on HIVE-10595: - And by that, I mean +1 for inclusion to branch-1.2 > Dropping a table can cause NPEs in the compactor > > > Key: HIVE-10595 > URL: https://issues.apache.org/jira/browse/HIVE-10595 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 0.14.0, 1.0.0, 1.1.0 >Reporter: Alan Gates >Assignee: Alan Gates > Attachments: HIVE-10595.patch > > > Reproduction: > # start metastore with compactor off > # insert enough entries in a table to trigger a compaction > # drop the table > # stop metastore > # restart metastore with compactor on > Result: NPE in the compactor threads. I suspect this would also happen if > the inserts and drops were done in between a run of the compactor, but I > haven't proven it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10595) Dropping a table can cause NPEs in the compactor
[ https://issues.apache.org/jira/browse/HIVE-10595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529235#comment-14529235 ] Sushanth Sowmyan commented on HIVE-10595: - >From the description, this is an NPE in a pretty core feature, so this >qualifies as an outage. Added to >https://cwiki.apache.org/confluence/display/Hive/Hive+1.2+Release+Status > Dropping a table can cause NPEs in the compactor > > > Key: HIVE-10595 > URL: https://issues.apache.org/jira/browse/HIVE-10595 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 0.14.0, 1.0.0, 1.1.0 >Reporter: Alan Gates >Assignee: Alan Gates > Attachments: HIVE-10595.patch > > > Reproduction: > # start metastore with compactor off > # insert enough entries in a table to trigger a compaction > # drop the table > # stop metastore > # restart metastore with compactor on > Result: NPE in the compactor threads. I suspect this would also happen if > the inserts and drops were done in between a run of the compactor, but I > haven't proven it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10613) HCatSchemaUtils getHCatFieldSchema should include field comment
[ https://issues.apache.org/jira/browse/HIVE-10613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Friedrich updated HIVE-10613: Attachment: HIVE-10613.1.patch > HCatSchemaUtils getHCatFieldSchema should include field comment > --- > > Key: HIVE-10613 > URL: https://issues.apache.org/jira/browse/HIVE-10613 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 1.0.0 >Reporter: Thomas Friedrich >Assignee: Thomas Friedrich >Priority: Minor > Attachments: HIVE-10613.1.patch > > > HCatSchemaUtils.getHCatFieldSchema converts a FieldSchema to a > HCatFieldSchema. Instead of initializing the comment property from the > FieldSchema object, the comment in the HCatFieldSchema is always set to null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10595) Dropping a table can cause NPEs in the compactor
[ https://issues.apache.org/jira/browse/HIVE-10595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529230#comment-14529230 ] Alan Gates commented on HIVE-10595: --- [~sushanth], I'd like to add this to 1.2 as it can jam the compactor. It's already patch available, so I should be able to get it in quickly. [~ekoifman], would you have time to review this? > Dropping a table can cause NPEs in the compactor > > > Key: HIVE-10595 > URL: https://issues.apache.org/jira/browse/HIVE-10595 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 0.14.0, 1.0.0, 1.1.0 >Reporter: Alan Gates >Assignee: Alan Gates > Attachments: HIVE-10595.patch > > > Reproduction: > # start metastore with compactor off > # insert enough entries in a table to trigger a compaction > # drop the table > # stop metastore > # restart metastore with compactor on > Result: NPE in the compactor threads. I suspect this would also happen if > the inserts and drops were done in between a run of the compactor, but I > haven't proven it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10608) Fix useless 'if' stamement in RetryingMetaStoreClient (135)
[ https://issues.apache.org/jira/browse/HIVE-10608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529228#comment-14529228 ] Chaoyu Tang commented on HIVE-10608: +1 (non-binding) > Fix useless 'if' stamement in RetryingMetaStoreClient (135) > --- > > Key: HIVE-10608 > URL: https://issues.apache.org/jira/browse/HIVE-10608 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Alexander Pivovarov >Assignee: Alexander Pivovarov >Priority: Minor > Attachments: rb33861.patch > > > "if" statement below is useless because it ends with ; > {code} > } catch (MetaException e) { > if (e.getMessage().matches("(?s).*(IO|TTransport)Exception.*")); > caughtException = e; > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10614) schemaTool upgrade from 0.14.0 to 1.3.0 causes failure
[ https://issues.apache.org/jira/browse/HIVE-10614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529229#comment-14529229 ] Sushanth Sowmyan commented on HIVE-10614: - Marked as outage, approved for 1.2 > schemaTool upgrade from 0.14.0 to 1.3.0 causes failure > -- > > Key: HIVE-10614 > URL: https://issues.apache.org/jira/browse/HIVE-10614 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Hari Sankar Sivarama Subramaniyan >Assignee: Hari Sankar Sivarama Subramaniyan >Priority: Critical > > ./schematool -dbType mysql -upgradeSchemaFrom 0.14.0 -verbose > {code} > ++--+ > | >| > ++--+ > | < HIVE-7018 Remove Table and Partition tables column LINK_TARGET_ID from > Mysql for other DBs do not have it > | > ++--+ > 1 row selected (0.004 seconds) > 0: jdbc:mysql://node-1.example.com/hive> DROP PROCEDURE IF EXISTS > RM_TLBS_LINKID > No rows affected (0.005 seconds) > 0: jdbc:mysql://node-1.example.com/hive> DROP PROCEDURE IF EXISTS > RM_PARTITIONS_LINKID > No rows affected (0.006 seconds) > 0: jdbc:mysql://node-1.example.com/hive> DROP PROCEDURE IF EXISTS RM_LINKID > No rows affected (0.002 seconds) > 0: jdbc:mysql://node-1.example.com/hive> CREATE PROCEDURE RM_TLBS_LINKID() > BEGIN IF EXISTS (SELECT * FROM `INFORMATION_SCHEMA`.`COLUMNS` WHERE > `TABLE_NAME` = 'TBLS' AND `COLUMN_NAME` = 'LINK_TARGET_ID') THEN ALTER TABLE > `TBLS` DROP FOREIGN KEY `TBLS_FK3` ; ALTER TABLE `TBLS` DROP KEY `TBLS_N51` ; > ALTER TABLE `TBLS` DROP COLUMN `LINK_TARGET_ID` ; END IF; END > Error: You have an error in your SQL syntax; check the manual that > corresponds to your MySQL server version for the right syntax to use near '' > at line 1 (state=42000,code=1064) > Closing: 0: jdbc:mysql://node-1.example.com/hive?createDatabaseIfNotExist=true > org.apache.hadoop.hive.metastore.HiveMetaException: Upgrade FAILED! Metastore > state would be inconsistent !! > org.apache.hadoop.hive.metastore.HiveMetaException: Upgrade FAILED! 
Metastore > state would be inconsistent !! > at > org.apache.hive.beeline.HiveSchemaTool.doUpgrade(HiveSchemaTool.java:229) > at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:468) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.run(RunJar.java:221) > at org.apache.hadoop.util.RunJar.main(RunJar.java:136) > Caused by: java.io.IOException: Schema script failed, errorcode 2 > at > org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:355) > at > org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:326) > at > org.apache.hive.beeline.HiveSchemaTool.doUpgrade(HiveSchemaTool.java:224) > {code} > Looks like HIVE-7018 has introduced a stored procedure as part of the mysql upgrade > script, and it is causing issues with the schematool upgrade. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10591) Support limited integer type promotion in ORC
[ https://issues.apache.org/jira/browse/HIVE-10591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-10591: - Attachment: HIVE-10591.2.patch Attaching again due to build machine error. > Support limited integer type promotion in ORC > - > > Key: HIVE-10591 > URL: https://issues.apache.org/jira/browse/HIVE-10591 > Project: Hive > Issue Type: New Feature >Affects Versions: 1.3.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-10591.1.patch, HIVE-10591.2.patch, > HIVE-10591.2.patch > > > ORC currently does not support schema-on-read. If we alter an ORC table with > 'int' type to 'bigint' and query the altered table, a ClassCastException > will be thrown, as the schema on read from the table descriptor will expect > LongWritable whereas ORC will return IntWritable based on the file schema stored > within the ORC file. OrcSerde currently doesn't do any type conversions or type > promotions for performance reasons in the inner loop. Since smallints, ints and > bigints are stored in the same way in ORC, it should be possible to allow such > type promotions without hurting performance. The following type promotions can be > supported without any casting: > smallint -> int > smallint -> bigint > int -> bigint > Tinyint promotion is not possible without casting, as tinyints are stored > using the RLE byte writer whereas smallints, ints and bigints are stored using > the RLE integer writer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10608) Fix useless 'if' stamement in RetryingMetaStoreClient (135)
[ https://issues.apache.org/jira/browse/HIVE-10608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-10608: --- Attachment: rb33861.patch patch #1 > Fix useless 'if' stamement in RetryingMetaStoreClient (135) > --- > > Key: HIVE-10608 > URL: https://issues.apache.org/jira/browse/HIVE-10608 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Alexander Pivovarov >Assignee: Alexander Pivovarov >Priority: Minor > Attachments: rb33861.patch > > > "if" statement below is useless because it ends with ; > {code} > } catch (MetaException e) { > if (e.getMessage().matches("(?s).*(IO|TTransport)Exception.*")); > caughtException = e; > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10610) hive command fails to get hadoop version
[ https://issues.apache.org/jira/browse/HIVE-10610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529202#comment-14529202 ] Sushanth Sowmyan commented on HIVE-10610: - +1. Also approved for 1.2, since this is a trivial patch. Editing description to note "NO PRECOMMIT TESTS" > hive command fails to get hadoop version > > > Key: HIVE-10610 > URL: https://issues.apache.org/jira/browse/HIVE-10610 > Project: Hive > Issue Type: Bug >Reporter: Shwetha G S > Attachments: HIVE-10610.patch > > > If debug level logging is enabled, hive command fails with the following > exception: > {noformat} > apache-hive-1.2.0-SNAPSHOT-bin$ ./bin/hive > Unable to determine Hadoop version information from 13:54:07,683 > 'hadoop version' returned: > 2015-05-05 13:54:08,014 DEBUG - [main:] ~ version: 2.5.0-cdh5.3.3 > (VersionInfo:171) Hadoop 2.5.0-cdh5.3.3 Subversion > http://github.com/cloudera/hadoop -r 82a65209d6e9e4a2b41fdbcd8190c7ea38730627 > Compiled by jenkins on 2015-04-08T22:00Z Compiled with protoc 2.5.0 From > source with checksum 1531e104cdad7489656f44875f3334b This command was run > using > /Users/sshivalingamurthy/installs/hadoop-2.5.0-cdh5.3.3/share/hadoop/common/hadoop-common-2.5.0-cdh5.3.3.jar > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10610) hive command fails to get hadoop version
[ https://issues.apache.org/jira/browse/HIVE-10610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-10610: Description: NO PRECOMMIT TESTS If debug level logging is enabled, hive command fails with the following exception: {noformat} apache-hive-1.2.0-SNAPSHOT-bin$ ./bin/hive Unable to determine Hadoop version information from 13:54:07,683 'hadoop version' returned: 2015-05-05 13:54:08,014 DEBUG - [main:] ~ version: 2.5.0-cdh5.3.3 (VersionInfo:171) Hadoop 2.5.0-cdh5.3.3 Subversion http://github.com/cloudera/hadoop -r 82a65209d6e9e4a2b41fdbcd8190c7ea38730627 Compiled by jenkins on 2015-04-08T22:00Z Compiled with protoc 2.5.0 From source with checksum 1531e104cdad7489656f44875f3334b This command was run using /Users/sshivalingamurthy/installs/hadoop-2.5.0-cdh5.3.3/share/hadoop/common/hadoop-common-2.5.0-cdh5.3.3.jar {noformat} was: If debug level logging is enabled, hive command fails with the following exception: {noformat} apache-hive-1.2.0-SNAPSHOT-bin$ ./bin/hive Unable to determine Hadoop version information from 13:54:07,683 'hadoop version' returned: 2015-05-05 13:54:08,014 DEBUG - [main:] ~ version: 2.5.0-cdh5.3.3 (VersionInfo:171) Hadoop 2.5.0-cdh5.3.3 Subversion http://github.com/cloudera/hadoop -r 82a65209d6e9e4a2b41fdbcd8190c7ea38730627 Compiled by jenkins on 2015-04-08T22:00Z Compiled with protoc 2.5.0 From source with checksum 1531e104cdad7489656f44875f3334b This command was run using /Users/sshivalingamurthy/installs/hadoop-2.5.0-cdh5.3.3/share/hadoop/common/hadoop-common-2.5.0-cdh5.3.3.jar {noformat} > hive command fails to get hadoop version > > > Key: HIVE-10610 > URL: https://issues.apache.org/jira/browse/HIVE-10610 > Project: Hive > Issue Type: Bug >Reporter: Shwetha G S > Attachments: HIVE-10610.patch > > > NO PRECOMMIT TESTS > If debug level logging is enabled, hive command fails with the following > exception: > {noformat} > apache-hive-1.2.0-SNAPSHOT-bin$ ./bin/hive > Unable to determine Hadoop 
version information from 13:54:07,683 > 'hadoop version' returned: > 2015-05-05 13:54:08,014 DEBUG - [main:] ~ version: 2.5.0-cdh5.3.3 > (VersionInfo:171) Hadoop 2.5.0-cdh5.3.3 Subversion > http://github.com/cloudera/hadoop -r 82a65209d6e9e4a2b41fdbcd8190c7ea38730627 > Compiled by jenkins on 2015-04-08T22:00Z Compiled with protoc 2.5.0 From > source with checksum 1531e104cdad7489656f44875f3334b This command was run > using > /Users/sshivalingamurthy/installs/hadoop-2.5.0-cdh5.3.3/share/hadoop/common/hadoop-common-2.5.0-cdh5.3.3.jar > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10591) Support limited integer type promotion in ORC
[ https://issues.apache.org/jira/browse/HIVE-10591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529200#comment-14529200 ] Hive QA commented on HIVE-10591:

{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12730389/HIVE-10591.2.patch

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3738/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3738/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3738/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Tests exited with: ExecutionException: java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: resource batch-exec.vm not found.
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12730389 - PreCommit-HIVE-TRUNK-Build

> Support limited integer type promotion in ORC
>
>             Key: HIVE-10591
>             URL: https://issues.apache.org/jira/browse/HIVE-10591
>         Project: Hive
>      Issue Type: New Feature
> Affects Versions: 1.3.0
>        Reporter: Prasanth Jayachandran
>        Assignee: Prasanth Jayachandran
>    Attachments: HIVE-10591.1.patch, HIVE-10591.2.patch
>
> ORC currently does not support schema-on-read. If we alter an ORC table's 'int' column to 'bigint' and then query the altered table, a ClassCastException will be thrown, as the schema on read from the table descriptor will expect a LongWritable whereas ORC will return an IntWritable based on the file schema stored within the ORC file. OrcSerde currently doesn't do any type conversions or type promotions, for performance reasons in the inner loop.
> Since smallints, ints and bigints are stored in the same way in ORC, it should be possible to allow such type promotions without hurting performance. The following type promotions can be supported without any casting:
> smallint -> int
> smallint -> bigint
> int -> bigint
> Tinyint promotion is not possible without casting, as tinyints are stored using the RLE byte writer whereas smallints, ints and bigints are stored using the RLE integer writer.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
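The widening rules described in the ticket can be modeled in a few lines. This is a hypothetical Python sketch for illustration only (the names `WIDENING_OK` and `read_with_promotion` are invented and do not appear in Hive's OrcSerde or the ORC reader):

```python
# Toy model of the promotion rules described above; hypothetical sketch,
# not Hive's actual OrcSerde/ORC reader code.
WIDENING_OK = {
    ("smallint", "int"),
    ("smallint", "bigint"),
    ("int", "bigint"),
}

def read_with_promotion(value: int, file_type: str, reader_type: str) -> int:
    """Return `value` under the reader schema, allowing only safe widenings."""
    if file_type == reader_type or (file_type, reader_type) in WIDENING_OK:
        # smallint/int/bigint share ORC's RLE integer encoding, so the
        # stored value is already a valid value of the wider reader type.
        return value
    # tinyint uses the RLE byte writer, so e.g. tinyint -> int would
    # require an actual cast and is rejected here.
    raise TypeError(f"unsupported promotion: {file_type} -> {reader_type}")

print(read_with_promotion(42, "int", "bigint"))  # 42
```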
[jira] [Commented] (HIVE-10607) Combination of ReducesinkDedup + TopN optimization yields incorrect result if there are multiple GBY in reducer
[ https://issues.apache.org/jira/browse/HIVE-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529191#comment-14529191 ] Sushanth Sowmyan commented on HIVE-10607:

HIVE-10607 is currently marked as an outage in https://cwiki.apache.org/confluence/display/Hive/Hive+1.2+Release+Status, so yes, this is approved. Please go ahead and commit.

> Combination of ReducesinkDedup + TopN optimization yields incorrect result if there are multiple GBY in reducer
>
>             Key: HIVE-10607
>             URL: https://issues.apache.org/jira/browse/HIVE-10607
>         Project: Hive
>      Issue Type: Bug
>      Components: Logical Optimizer, Tez
> Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.1.0
>        Reporter: Ashutosh Chauhan
>        Assignee: Ashutosh Chauhan
>    Attachments: HIVE-10607.patch
>
> {code:sql}
> select ctinyint, count(cdouble) from (select ctinyint, cdouble from alltypesorc group by ctinyint, cdouble) t1 group by ctinyint order by ctinyint limit 20;
> {code}
> This query gives different result sets depending on which optimizations are enabled. In particular, in the .q test environment the following two invocations will give you different result sets:
> {code}
> * mvn test -Phadoop-2 -Dtest.output.overwrite=true -Dtest=TestMiniTezCliDriver -Dqfile=test.q -Dhive.optimize.reducededuplication.min.reducer=1 -Dhive.limit.pushdown.memory.usage=0.3f
> * mvn test -Phadoop-2 -Dtest.output.overwrite=true -Dtest=TestMiniTezCliDriver -Dqfile=test.q
> {code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
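The wrong-result mechanism can be illustrated with a toy model (hypothetical data and function names; this is not Hive's actual operator pipeline): when the TopN limit is pushed into the reduce sink, rows are truncated *before* the second group-by, so the outer counts are computed over a partial row set.

```python
from collections import Counter

# Hypothetical rows: distinct (ctinyint, cdouble) pairs, i.e. the output
# of the inner "group by ctinyint, cdouble". Five keys, ten values each.
rows = [(t, d) for t in range(5) for d in range(10)]

def correct_plan(rows, n):
    # Aggregate everything first, then order by key and apply the limit
    # at the very end, as the query semantics require.
    counts = Counter(t for t, _ in rows)
    return sorted(counts.items())[:n]

def buggy_plan(rows, n):
    # TopN pushed into the reduce sink: only the first n rows (by key)
    # ever reach the second group-by, so the counts are truncated.
    top_rows = sorted(rows)[:n]
    counts = Counter(t for t, _ in top_rows)
    return sorted(counts.items())[:n]

print(correct_plan(rows, 3))  # [(0, 10), (1, 10), (2, 10)]
print(buggy_plan(rows, 3))    # [(0, 3)]
```

Disabling either ReduceSink deduplication or the TopN pushdown (as the two mvn invocations in the ticket do) corresponds to choosing between these two plan shapes.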
[jira] [Commented] (HIVE-10563) MiniTezCliDriver tests ordering issues
[ https://issues.apache.org/jira/browse/HIVE-10563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529181#comment-14529181 ] Thejas M Nair commented on HIVE-10563:

Note that the sort directive will not work for queries that involve limit. There is at least one such query in this case. But for the other queries, the sort directive would be better.

> MiniTezCliDriver tests ordering issues
>
>          Key: HIVE-10563
>          URL: https://issues.apache.org/jira/browse/HIVE-10563
>      Project: Hive
>   Issue Type: Bug
>     Reporter: Hari Sankar Sivarama Subramaniyan
>     Assignee: Hari Sankar Sivarama Subramaniyan
>  Attachments: HIVE-10563.1.patch
>
> There are a bunch of tests related to TestMiniTezCliDriver which give ordering issues when run on Centos/Windows/OSX.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
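Why the sort directive helps everywhere except LIMIT queries can be sketched with a toy model (hypothetical functions, not the actual .q test harness): sorting the captured output stabilizes a nondeterministically *ordered* result, but with LIMIT the *set* of returned rows itself depends on engine ordering, so post-sorting cannot make it deterministic.

```python
import random

rows = list(range(10))

def query_without_limit(seed: int):
    # The engine may emit rows in any order (modeled by shuffling), but
    # the row set is fixed, so sorting the output stabilizes the test.
    rng = random.Random(seed)
    out = rows[:]
    rng.shuffle(out)
    return sorted(out)

def query_with_limit(seed: int, n: int = 3):
    # LIMIT keeps the first n rows in engine order; sorting afterwards
    # cannot undo the fact that the chosen row set varies run to run.
    rng = random.Random(seed)
    out = rows[:]
    rng.shuffle(out)
    return sorted(out[:n])

# Sorted full results agree regardless of engine ordering...
print(query_without_limit(1) == query_without_limit(2))  # True
# ...but sorted LIMIT results generally do not.
print(query_with_limit(1) == query_with_limit(2))
```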
[jira] [Assigned] (HIVE-10060) Provide more informative stage description in Spark Web UI [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang reassigned HIVE-10060:

Assignee: Jimmy Xiang

> Provide more informative stage description in Spark Web UI [Spark Branch]
>
>          Key: HIVE-10060
>          URL: https://issues.apache.org/jira/browse/HIVE-10060
>      Project: Hive
>   Issue Type: Sub-task
>   Components: spark-branch
> Affects Versions: 1.2.0
>     Reporter: Chao Sun
>     Assignee: Jimmy Xiang
>     Priority: Minor
>
> Currently, for HoS in the Spark Web UI, stage information is displayed like the following:
> Description:
> foreachAsync at RemoteHiveSparkClient.java:254
> org.apache.spark.api.java.JavaPairRDD.foreachAsync(JavaPairRDD.scala:45)
> org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient$JobStatusJob.call(RemoteHiveSparkClient.java:254)
> org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:366)
> It would be better to provide more useful information, such as which part of the query this stage is about. This appears to be implemented in SparkSQL.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)