[jira] [Assigned] (HIVE-10515) Create tests to cover existing (supported) Hive CLI functionality
[ https://issues.apache.org/jira/browse/HIVE-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu reassigned HIVE-10515: --- Assignee: Ferdinand Xu > Create tests to cover existing (supported) Hive CLI functionality > - > > Key: HIVE-10515 > URL: https://issues.apache.org/jira/browse/HIVE-10515 > Project: Hive > Issue Type: Sub-task > Components: CLI >Affects Versions: 0.10.0 >Reporter: Xuefu Zhang >Assignee: Ferdinand Xu > > After removing HiveServer1, Hive CLI's functionality is reduced to its > original use case, a thick client application. Let's identify this so that we > maintain it when the implementation is changed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10623) Implement hive cli options using beeline functionality
[ https://issues.apache.org/jira/browse/HIVE-10623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-10623: Attachment: HIVE-10623.patch Hi [~xuefuz], could you help review this jira? Thank you! > Implement hive cli options using beeline functionality > -- > > Key: HIVE-10623 > URL: https://issues.apache.org/jira/browse/HIVE-10623 > Project: Hive > Issue Type: Sub-task > Components: CLI >Reporter: Ferdinand Xu >Assignee: Ferdinand Xu > Attachments: HIVE-10623.patch > > > We need to support the original hive cli options for the purpose of backwards > compatibility.
[jira] [Updated] (HIVE-9644) CASE comparison operator rotation optimization
[ https://issues.apache.org/jira/browse/HIVE-9644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-9644: --- Attachment: HIVE-9644.2.patch Extended patch with folding of when udf. > CASE comparison operator rotation optimization > -- > > Key: HIVE-9644 > URL: https://issues.apache.org/jira/browse/HIVE-9644 > Project: Hive > Issue Type: Bug > Components: Logical Optimizer >Affects Versions: 0.14.0, 1.0.0, 1.1.0 >Reporter: Gopal V >Assignee: Ashutosh Chauhan > Attachments: HIVE-9644.1.patch, HIVE-9644.2.patch, HIVE-9644.patch > > > Constant folding for queries doesn't kick in for some automatically generated > query patterns which look like this. > {code} > hive> explain select count(1) from store_sales where (case ss_sold_date when > '1998-01-01' then 1 else null end)=1; > {code} > This should get rewritten by pushing the equality into the case branches. > {code} > select count(1) from store_sales where (case ss_sold_date when '1998-01-01' > then 1=1 else null=1 end); > {code} > Ending up with a simplified filter condition, resolving itself as > {code} > select count(1) from store_sales where ss_sold_date= '1998-01-01' ; > {code}
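The equivalence behind this rotation can be sketched outside Hive. Below is a minimal Java model (the class and method names are ours, not Hive's) of why pushing the equality into the CASE branches and folding preserves the filter's semantics: NULL=1 evaluates to NULL in SQL, and a WHERE clause drops NULL rows, which a boolean filter treats as false.

```java
import java.util.Objects;

public class CaseRotationDemo {
    // Original filter: (CASE ss_sold_date WHEN '1998-01-01' THEN 1 ELSE NULL END) = 1
    static boolean originalFilter(String ssSoldDate) {
        Integer caseResult = Objects.equals(ssSoldDate, "1998-01-01") ? 1 : null;
        // NULL = 1 is NULL in SQL; a WHERE clause drops NULL rows, i.e. false here.
        return caseResult != null && caseResult == 1;
    }

    // Folded filter after the rotation: ss_sold_date = '1998-01-01'
    static boolean foldedFilter(String ssSoldDate) {
        return Objects.equals(ssSoldDate, "1998-01-01");
    }

    public static void main(String[] args) {
        String[] samples = {"1998-01-01", "1999-12-31", null};
        for (String s : samples) {
            if (originalFilter(s) != foldedFilter(s)) {
                throw new AssertionError("filters disagree for " + s);
            }
        }
        System.out.println("filters agree on all samples");
    }
}
```

The two predicates agree on matching rows, non-matching rows, and NULL input, which is why the rewrite to `ss_sold_date = '1998-01-01'` is safe.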
[jira] [Commented] (HIVE-8065) Support HDFS encryption functionality on Hive
[ https://issues.apache.org/jira/browse/HIVE-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529998#comment-14529998 ] Brock Noland commented on HIVE-8065: In that case the results of the query are staged in ez1. > Support HDFS encryption functionality on Hive > - > > Key: HIVE-8065 > URL: https://issues.apache.org/jira/browse/HIVE-8065 > Project: Hive > Issue Type: Improvement >Affects Versions: 0.13.1 >Reporter: Sergio Peña >Assignee: Sergio Peña > Labels: Hive-Scrum > > The new encryption support on HDFS makes Hive incompatible and unusable when > this feature is used. > HDFS encryption is designed so that a user can configure different > encryption zones (or directories) for multi-tenant environments. An > encryption zone has an exclusive encryption key, such as AES-128 or AES-256. > Because of security compliance, HDFS does not allow moving/renaming files > between encryption zones. Renames are allowed only inside the same encryption > zone. A copy is allowed between encryption zones. > See HDFS-6134 for more details about HDFS encryption design. > Hive currently uses a scratch directory (like /tmp/$user/$random). This > scratch directory is used for the output of intermediate data (between MR > jobs) and for the final output of the hive query which is later moved to the > table directory location. > If Hive tables are in different encryption zones than the scratch directory, > then Hive won't be able to rename those files/directories, and it will make > Hive unusable. > To handle this problem, we can change the scratch directory of the > query/statement to be inside the same encryption zone of the table directory > location. This way, the renaming process will be successful. > Also, for statements that move files between encryption zones (i.e. LOAD > DATA), a copy may be executed instead of a rename. This will cause an > overhead when copying large data files, but it won't break the encryption on > Hive. 
> Another security concern to consider is when using joins in selects. If Hive joins > different tables with different encryption key strengths, then the results of > the select might break the security compliance of the tables. Let's say two > tables with 128 bits and 256 bits encryption are joined, then the temporary > results might be stored in the 128 bits encryption zone. This will conflict > with the security compliance of the table encrypted with 256 bits. > To fix this, Hive should be able to select the scratch directory that is more > secured/encrypted in order to save the intermediate data temporarily with no > compliance issues. > For instance: > {noformat} > SELECT * FROM table-aes128 t1 JOIN table-aes256 t2 WHERE t1.id == t2.id; > {noformat} > - This should use a scratch directory (or staging directory) inside the > table-aes256 table location. > {noformat} > INSERT OVERWRITE TABLE table-unencrypted SELECT * FROM table-aes1; > {noformat} > - This should use a scratch directory inside the table-aes1 location. > {noformat} > FROM table-unencrypted > INSERT OVERWRITE TABLE table-aes128 SELECT id, name > INSERT OVERWRITE TABLE table-aes256 SELECT id, name > {noformat} > - This should use a scratch directory on each of the tables' locations. > - The first SELECT will have its scratch directory on table-aes128 directory. > - The second SELECT will have its scratch directory on table-aes256 directory.
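The "pick the more secured/encrypted scratch directory" rule described above can be sketched as follows. This is a hypothetical illustration under assumed names (`TableLoc`, `chooseScratchDir`, the `keyBits` field, and the `.hive-staging` suffix are ours), not Hive's actual implementation:

```java
import java.util.Comparator;
import java.util.List;

public class ScratchDirChooser {
    // Hypothetical model of a table location and the key strength of its
    // encryption zone (0 = unencrypted). Not a Hive API.
    record TableLoc(String location, int keyBits) {}

    // Stage inside the most strongly encrypted zone among the tables a
    // query touches, so intermediate results never land in a weaker zone.
    static String chooseScratchDir(List<TableLoc> tables) {
        TableLoc strongest = tables.stream()
                .max(Comparator.comparingInt(TableLoc::keyBits))
                .orElseThrow();
        return strongest.location() + "/.hive-staging";
    }

    public static void main(String[] args) {
        List<TableLoc> joined = List.of(
                new TableLoc("/warehouse/table-aes128", 128),
                new TableLoc("/warehouse/table-aes256", 256));
        // The AES-256 zone wins, matching the first example above.
        System.out.println(chooseScratchDir(joined));
    }
}
```

For the multi-insert example, the same selection would run once per INSERT branch, yielding a staging directory under each target table's location.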
[jira] [Commented] (HIVE-10592) ORC file dump in JSON format
[ https://issues.apache.org/jira/browse/HIVE-10592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529991#comment-14529991 ] Hive QA commented on HIVE-10592: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12730381/HIVE-10592.3.patch {color:red}ERROR:{color} -1 due to 24 failed/errored test(s), 8901 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_parts org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_unencrypted_tbl org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_with_different_encryption_keys org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_load_data_to_encrypted_tables org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_select_read_only_encrypted_tbl org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_disallow_transform org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_droppartition org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_sba_drop_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_alterpart_loc org.apache.hadoop.hive.ql.security.TestStorageBasedClientSideAuthorizationProvider.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropDatabase org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropPartition 
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropTable org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropView org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProvider.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProviderWithACL.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbFailure org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbSuccess org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableFailure org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableSuccess org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessing org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessingCustomSetWhitelistAppend {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3747/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3747/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3747/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 24 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12730381 - PreCommit-HIVE-TRUNK-Build > ORC file dump in JSON format > > > Key: HIVE-10592 > URL: https://issues.apache.org/jira/browse/HIVE-10592 > Project: Hive > Issue Type: New Feature >Affects Versions: 1.3.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-10592.1.patch, HIVE-10592.2.patch, > HIVE-10592.3.patch, HIVE-10592.4.patch > > > ORC file dump uses a custom format. It will be useful to dump ORC metadata in json > format so that other tools can be built on top of it.
[jira] [Commented] (HIVE-10609) Vectorization : Q64 fails with ClassCastException
[ https://issues.apache.org/jira/browse/HIVE-10609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529972#comment-14529972 ] Matt McCline commented on HIVE-10609: - This doesn't fail on my combined build of HIVE-9743 and HIVE-10565. Will verify again when those JIRAs go in. > Vectorization : Q64 fails with ClassCastException > - > > Key: HIVE-10609 > URL: https://issues.apache.org/jira/browse/HIVE-10609 > Project: Hive > Issue Type: Bug > Components: Vectorization >Affects Versions: 1.2.0 >Reporter: Mostafa Mokhtar >Assignee: Matt McCline > Fix For: 1.2.0 > > > TPC-DS Q64 fails with ClassCastException. > Query > {code} > select cs1.product_name ,cs1.store_name ,cs1.store_zip ,cs1.b_street_number > ,cs1.b_streen_name ,cs1.b_city > ,cs1.b_zip ,cs1.c_street_number ,cs1.c_street_name ,cs1.c_city > ,cs1.c_zip ,cs1.syear ,cs1.cnt > ,cs1.s1 ,cs1.s2 ,cs1.s3 > ,cs2.s1 ,cs2.s2 ,cs2.s3 ,cs2.syear ,cs2.cnt > from > (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as > store_name > ,s_zip as store_zip ,ad1.ca_street_number as b_street_number > ,ad1.ca_street_name as b_streen_name > ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as > c_street_number > ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip > as c_zip > ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) > as cnt > ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 > ,sum(ss_coupon_amt) as s3 > FROM store_sales > JOIN store_returns ON store_sales.ss_item_sk = > store_returns.sr_item_sk and store_sales.ss_ticket_number = > store_returns.sr_ticket_number > JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk > JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk > JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk > JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk > JOIN store ON store_sales.ss_store_sk = store.s_store_sk > JOIN 
customer_demographics cd1 ON store_sales.ss_cdemo_sk= > cd1.cd_demo_sk > JOIN customer_demographics cd2 ON customer.c_current_cdemo_sk = > cd2.cd_demo_sk > JOIN promotion ON store_sales.ss_promo_sk = promotion.p_promo_sk > JOIN household_demographics hd1 ON store_sales.ss_hdemo_sk = > hd1.hd_demo_sk > JOIN household_demographics hd2 ON customer.c_current_hdemo_sk = > hd2.hd_demo_sk > JOIN customer_address ad1 ON store_sales.ss_addr_sk = > ad1.ca_address_sk > JOIN customer_address ad2 ON customer.c_current_addr_sk = > ad2.ca_address_sk > JOIN income_band ib1 ON hd1.hd_income_band_sk = ib1.ib_income_band_sk > JOIN income_band ib2 ON hd2.hd_income_band_sk = ib2.ib_income_band_sk > JOIN item ON store_sales.ss_item_sk = item.i_item_sk > JOIN > (select cs_item_sk > ,sum(cs_ext_list_price) as > sale,sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit) as refund > from catalog_sales JOIN catalog_returns > ON catalog_sales.cs_item_sk = catalog_returns.cr_item_sk > and catalog_sales.cs_order_number = catalog_returns.cr_order_number > group by cs_item_sk > having > sum(cs_ext_list_price)>2*sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit)) > cs_ui > ON store_sales.ss_item_sk = cs_ui.cs_item_sk > WHERE > cd1.cd_marital_status <> cd2.cd_marital_status and > i_color in ('maroon','burnished','dim','steel','navajo','chocolate') > and > i_current_price between 35 and 35 + 10 and > i_current_price between 35 + 1 and 35 + 15 > group by i_product_name ,i_item_sk ,s_store_name ,s_zip ,ad1.ca_street_number >,ad1.ca_street_name ,ad1.ca_city ,ad1.ca_zip ,ad2.ca_street_number >,ad2.ca_street_name ,ad2.ca_city ,ad2.ca_zip ,d1.d_year ,d2.d_year > ,d3.d_year > ) cs1 > JOIN > (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as > store_name > ,s_zip as store_zip ,ad1.ca_street_number as b_street_number > ,ad1.ca_street_name as b_streen_name > ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as > c_street_number > ,ad2.ca_street_name as 
c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip > as c_zip > ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) > as cnt > ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 > ,sum(ss_coupon_amt) as s3 > FROM store_sales > JOIN store_returns ON store_sales.ss_item_sk = > store_returns.sr_item_sk and store_sales.ss_ticket_number = > store_returns.sr_ticket_number > JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk > JOIN
[jira] [Commented] (HIVE-10618) Fix invocation of toString on byteArray in VerifyFast (250, 254)
[ https://issues.apache.org/jira/browse/HIVE-10618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529958#comment-14529958 ] Prasanth Jayachandran commented on HIVE-10618: -- +1 > Fix invocation of toString on byteArray in VerifyFast (250, 254) > > > Key: HIVE-10618 > URL: https://issues.apache.org/jira/browse/HIVE-10618 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Reporter: Alexander Pivovarov >Assignee: Alexander Pivovarov >Priority: Minor > Attachments: rb33877.patch > > > Arrays.toString(byteArray) can be used to convert byte[] to a string
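For reference, a standalone sketch of the difference between the two calls (the demo class name is ours; the behavior is standard Java):

```java
import java.util.Arrays;

public class ByteArrayToStringDemo {
    public static void main(String[] args) {
        byte[] bytes = {10, 20, 30};
        // Object#toString on an array yields a type tag plus identity hash,
        // something like "[B@6d06d69c" -- not the contents.
        System.out.println(bytes.toString().startsWith("[B@")); // true
        // Arrays.toString renders the elements.
        System.out.println(Arrays.toString(bytes)); // [10, 20, 30]
    }
}
```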
[jira] [Commented] (HIVE-10620) ZooKeeperHiveLock overrides equal() method but not hashcode()
[ https://issues.apache.org/jira/browse/HIVE-10620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529931#comment-14529931 ] Ashutosh Chauhan commented on HIVE-10620: - +1 > ZooKeeperHiveLock overrides equal() method but not hashcode() > - > > Key: HIVE-10620 > URL: https://issues.apache.org/jira/browse/HIVE-10620 > Project: Hive > Issue Type: Bug >Affects Versions: 1.0.0 >Reporter: Chaoyu Tang >Assignee: Chaoyu Tang > Attachments: HIVE-10620.patch > > > ZooKeeperHiveLock overrides the public boolean equals(Object o) method but > does not for public int hashCode(). It violates the Java contract and may > cause unexpected results.
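A minimal sketch (illustrative lock-key classes of ours, not the actual ZooKeeperHiveLock code) of the "unexpected results" the description warns about: hash-based collections locate entries by hashCode() first, so overriding only equals() makes lookups silently miss.

```java
import java.util.HashSet;
import java.util.Set;

public class HashContractDemo {
    // equals without hashCode: two equal instances usually land in
    // different hash buckets, so contains() misses despite equality.
    static final class BrokenKey {
        final String path;
        BrokenKey(String path) { this.path = path; }
        @Override public boolean equals(Object o) {
            return o instanceof BrokenKey && ((BrokenKey) o).path.equals(path);
        }
        // hashCode() intentionally NOT overridden.
    }

    // Fix: derive hashCode from the same fields equals compares.
    static final class FixedKey {
        final String path;
        FixedKey(String path) { this.path = path; }
        @Override public boolean equals(Object o) {
            return o instanceof FixedKey && ((FixedKey) o).path.equals(path);
        }
        @Override public int hashCode() { return path.hashCode(); }
    }

    public static void main(String[] args) {
        Set<BrokenKey> broken = new HashSet<>();
        broken.add(new BrokenKey("/hive/zklock-0001"));
        // Almost always false despite equals() returning true:
        System.out.println(broken.contains(new BrokenKey("/hive/zklock-0001")));

        Set<FixedKey> fixed = new HashSet<>();
        fixed.add(new FixedKey("/hive/zklock-0001"));
        System.out.println(fixed.contains(new FixedKey("/hive/zklock-0001"))); // true
    }
}
```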
[jira] [Updated] (HIVE-10621) serde typeinfo equals methods are not symmetric
[ https://issues.apache.org/jira/browse/HIVE-10621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-10621: --- Attachment: rb33880.patch patch #1 > serde typeinfo equals methods are not symmetric > --- > > Key: HIVE-10621 > URL: https://issues.apache.org/jira/browse/HIVE-10621 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Reporter: Alexander Pivovarov >Assignee: Alexander Pivovarov >Priority: Minor > Attachments: rb33880.patch > > > A correct equals method implementation should start with > {code} > if (this == other) { > return true; > } > if (other == null || getClass() != other.getClass()) { > return false; > } > {code} > DecimalTypeInfo, PrimitiveTypeInfo, VarcharTypeInfo, CharTypeInfo, > HiveDecimalWritable equals method implementation starts with > {code} > if (other == null || !(other instanceof )) { > return false > } > {code} > - first of all, the check for null is redundant > - the second issue is that the "other instanceof " check is not > symmetric. > The contract of equals() implies that a.equals(b) is true if and only if > b.equals(a) is true. > The current implementation violates this contract. > e.g. > DecimalTypeInfo instanceof PrimitiveTypeInfo is true > but > PrimitiveTypeInfo instanceof DecimalTypeInfo is false > See more details here > http://stackoverflow.com/questions/6518534/equals-method-overrides-equals-in-superclass-and-may-not-be-symmetric
[jira] [Updated] (HIVE-10621) serde typeinfo equals methods are not symmetric
[ https://issues.apache.org/jira/browse/HIVE-10621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-10621: --- Description: A correct equals method implementation should start with {code} if (this == other) { return true; } if (other == null || getClass() != other.getClass()) { return false; } {code} DecimalTypeInfo, PrimitiveTypeInfo, VarcharTypeInfo, CharTypeInfo, HiveDecimalWritable equals method implementation starts with {code} if (other == null || !(other instanceof )) { return false } {code} - first of all, the check for null is redundant - the second issue is that the "other instanceof " check is not symmetric. The contract of equals() implies that a.equals(b) is true if and only if b.equals(a) is true. The current implementation violates this contract. e.g. DecimalTypeInfo instanceof PrimitiveTypeInfo is true but PrimitiveTypeInfo instanceof DecimalTypeInfo is false See more details here http://stackoverflow.com/questions/6518534/equals-method-overrides-equals-in-superclass-and-may-not-be-symmetric > serde typeinfo equals methods are not symmetric > --- > > Key: HIVE-10621 > URL: https://issues.apache.org/jira/browse/HIVE-10621 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Reporter: Alexander Pivovarov >Assignee: Alexander Pivovarov >Priority: Minor > > A correct equals method implementation should start with > {code} > if (this == other) { > return true; > } > if (other == null || getClass() != other.getClass()) { > return false; > } > {code} > DecimalTypeInfo, PrimitiveTypeInfo, VarcharTypeInfo, CharTypeInfo, > HiveDecimalWritable equals method implementation starts with > {code} > if (other == null || !(other instanceof )) { > return false > } > {code} > - first of all, the check for null is redundant > - the second issue is that the "other instanceof " check is not > symmetric. > The contract of equals() implies that a.equals(b) is true if and only if > b.equals(a) is true. > The current implementation violates this contract. > e.g. > DecimalTypeInfo instanceof PrimitiveTypeInfo is true > but > PrimitiveTypeInfo instanceof DecimalTypeInfo is false > See more details here > http://stackoverflow.com/questions/6518534/equals-method-overrides-equals-in-superclass-and-may-not-be-symmetric
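The asymmetry described in this issue can be reproduced with two small illustrative classes (stand-ins of ours, not the actual TypeInfo hierarchy): an instanceof-based equals in a base class accepts subclass instances, while the subclass's equals rejects base instances, so the two directions disagree.

```java
public class EqualsSymmetryDemo {
    // Base uses the problematic instanceof pattern.
    static class Base {
        final int x;
        Base(int x) { this.x = x; }
        @Override public boolean equals(Object o) {
            return o instanceof Base && ((Base) o).x == x;
        }
        @Override public int hashCode() { return x; }
    }

    // Sub adds state and also uses instanceof.
    static class Sub extends Base {
        final int y;
        Sub(int x, int y) { super(x); this.y = y; }
        @Override public boolean equals(Object o) {
            return o instanceof Sub && super.equals(o) && ((Sub) o).y == y;
        }
        @Override public int hashCode() { return 31 * x + y; }
    }

    public static void main(String[] args) {
        Base base = new Base(1);
        Sub sub = new Sub(1, 2);
        // instanceof-based equals is asymmetric across the hierarchy:
        System.out.println(base.equals(sub)); // true
        System.out.println(sub.equals(base)); // false
        // Replacing the instanceof check in Base with
        // "o != null && getClass() == o.getClass()" makes both
        // comparisons false, restoring symmetry.
    }
}
```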
[jira] [Commented] (HIVE-10592) ORC file dump in JSON format
[ https://issues.apache.org/jira/browse/HIVE-10592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529907#comment-14529907 ] Prasanth Jayachandran commented on HIVE-10592: -- Added multifile support in the new patch. The output will now look like {code}./bin/hive --orcfiledump --json --pretty file:///app/warehouse/alltypes_bloom/00_0 file:///app/warehouse/alltypes_orc/00_0{code} {code} {"orcFileDumps": [ { "fileName": "file:\/\/\/app\/warehouse\/alltypes_bloom\/00_0", "fileVersion": "0.12", "writerVersion": "HIVE_8732", "numberOfRows": 3, "compression": "ZLIB", ... }, { "fileName": "file:\/\/\/app\/warehouse\/alltypes_orc\/00_0", "fileVersion": "0.12", "writerVersion": "HIVE_8732", "numberOfRows": 2, "compression": "ZLIB", ... } ]} {code} > ORC file dump in JSON format > > > Key: HIVE-10592 > URL: https://issues.apache.org/jira/browse/HIVE-10592 > Project: Hive > Issue Type: New Feature >Affects Versions: 1.3.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-10592.1.patch, HIVE-10592.2.patch, > HIVE-10592.3.patch, HIVE-10592.4.patch > > > ORC file dump uses a custom format. It will be useful to dump ORC metadata in json > format so that other tools can be built on top of it.
[jira] [Updated] (HIVE-10592) ORC file dump in JSON format
[ https://issues.apache.org/jira/browse/HIVE-10592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-10592: - Attachment: HIVE-10592.4.patch > ORC file dump in JSON format > > > Key: HIVE-10592 > URL: https://issues.apache.org/jira/browse/HIVE-10592 > Project: Hive > Issue Type: New Feature >Affects Versions: 1.3.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-10592.1.patch, HIVE-10592.2.patch, > HIVE-10592.3.patch, HIVE-10592.4.patch > > > ORC file dump uses a custom format. It will be useful to dump ORC metadata in json > format so that other tools can be built on top of it.
[jira] [Updated] (HIVE-10563) MiniTezCliDriver tests ordering issues
[ https://issues.apache.org/jira/browse/HIVE-10563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-10563: - Attachment: HIVE-10563.2.patch > MiniTezCliDriver tests ordering issues > -- > > Key: HIVE-10563 > URL: https://issues.apache.org/jira/browse/HIVE-10563 > Project: Hive > Issue Type: Bug >Reporter: Hari Sankar Sivarama Subramaniyan >Assignee: Hari Sankar Sivarama Subramaniyan > Attachments: HIVE-10563.1.patch, HIVE-10563.2.patch > > > There are a bunch of tests related to TestMiniTezCliDriver which give > ordering issues when run on Centos/Windows/OSX
[jira] [Commented] (HIVE-10607) Combination of ReducesinkDedup + TopN optimization yields incorrect result if there are multiple GBY in reducer
[ https://issues.apache.org/jira/browse/HIVE-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529903#comment-14529903 ] Hive QA commented on HIVE-10607: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12730369/HIVE-10607.patch {color:red}ERROR:{color} -1 due to 25 failed/errored test(s), 8900 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_parts org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_unencrypted_tbl org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_with_different_encryption_keys org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_load_data_to_encrypted_tables org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_select_read_only_encrypted_tbl org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_disallow_transform org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_droppartition org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_sba_drop_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_alterpart_loc org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_limit_pushdown org.apache.hadoop.hive.ql.security.TestStorageBasedClientSideAuthorizationProvider.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropDatabase org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropPartition 
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropTable org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropView org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProvider.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProviderWithACL.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbFailure org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbSuccess org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableFailure org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableSuccess org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessing org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessingCustomSetWhitelistAppend {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3746/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3746/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3746/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 25 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12730369 - PreCommit-HIVE-TRUNK-Build > Combination of ReducesinkDedup + TopN optimization yields incorrect result if > there are multiple GBY in reducer > --- > > Key: HIVE-10607 > URL: https://issues.apache.org/jira/browse/HIVE-10607 > Project: Hive > Issue Type: Bug > Components: Logical Optimizer, Tez >Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.1.0 >Reporter: Ashutosh Chauhan >Assignee: Ashutosh Chauhan > Attachments: HIVE-10607.patch > > > {code:sql} > select ctinyint, count(cdouble) from (select ctinyint, cdouble from > alltypesorc group by ctinyint, cdouble) t1 group by ctinyint order by > ctinyint limit 20; > {code} > This gives different result sets depending on which set of optimizations is > on. In particular, in the .q test environment the following two invocations will give > you different result sets: > {code} > * mvn test -Phadoop-2 -Dtest.output.overwrite=true > -Dtest=TestMiniTezCliDriver -Dqfile=test.q > -Dhive.optimize.reducededuplication.min.reducer=1 > -Dhive.limit.pushdown.memory.usage=0.3f > * mvn t
[jira] [Commented] (HIVE-8065) Support HDFS encryption functionality on Hive
[ https://issues.apache.org/jira/browse/HIVE-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529892#comment-14529892 ] Eugene Koifman commented on HIVE-8065: -- How come the move restriction is not an issue for something like Insert Overwrite tableEZ1 select * from tableEZ2 inner join tableEZ3? > Support HDFS encryption functionality on Hive > - > > Key: HIVE-8065 > URL: https://issues.apache.org/jira/browse/HIVE-8065 > Project: Hive > Issue Type: Improvement >Affects Versions: 0.13.1 >Reporter: Sergio Peña >Assignee: Sergio Peña > Labels: Hive-Scrum > > The new encryption support on HDFS makes Hive incompatible and unusable when > this feature is used. > HDFS encryption is designed so that a user can configure different > encryption zones (or directories) for multi-tenant environments. An > encryption zone has an exclusive encryption key, such as AES-128 or AES-256. > Because of security compliance, HDFS does not allow moving/renaming files > between encryption zones. Renames are allowed only inside the same encryption > zone. A copy is allowed between encryption zones. > See HDFS-6134 for more details about HDFS encryption design. > Hive currently uses a scratch directory (like /tmp/$user/$random). This > scratch directory is used for the output of intermediate data (between MR > jobs) and for the final output of the hive query which is later moved to the > table directory location. > If Hive tables are in different encryption zones than the scratch directory, > then Hive won't be able to rename those files/directories, and it will make > Hive unusable. > To handle this problem, we can change the scratch directory of the > query/statement to be inside the same encryption zone of the table directory > location. This way, the renaming process will be successful. > Also, for statements that move files between encryption zones (i.e. LOAD > DATA), a copy may be executed instead of a rename. 
This will cause an > overhead when copying large data files, but it won't break the encryption on > Hive. > Another security concern to consider is when using joins in selects. If Hive joins > different tables with different encryption key strengths, then the results of > the select might break the security compliance of the tables. Let's say two > tables with 128 bits and 256 bits encryption are joined, then the temporary > results might be stored in the 128 bits encryption zone. This will conflict > with the security compliance of the table encrypted with 256 bits. > To fix this, Hive should be able to select the scratch directory that is more > secured/encrypted in order to save the intermediate data temporarily with no > compliance issues. > For instance: > {noformat} > SELECT * FROM table-aes128 t1 JOIN table-aes256 t2 WHERE t1.id == t2.id; > {noformat} > - This should use a scratch directory (or staging directory) inside the > table-aes256 table location. > {noformat} > INSERT OVERWRITE TABLE table-unencrypted SELECT * FROM table-aes1; > {noformat} > - This should use a scratch directory inside the table-aes1 location. > {noformat} > FROM table-unencrypted > INSERT OVERWRITE TABLE table-aes128 SELECT id, name > INSERT OVERWRITE TABLE table-aes256 SELECT id, name > {noformat} > - This should use a scratch directory on each of the tables' locations. > - The first SELECT will have its scratch directory on table-aes128 directory. > - The second SELECT will have its scratch directory on table-aes256 directory.
[jira] [Updated] (HIVE-10620) ZooKeeperHiveLock overrides equal() method but not hashcode()
[ https://issues.apache.org/jira/browse/HIVE-10620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chaoyu Tang updated HIVE-10620: --- Attachment: HIVE-10620.patch [~szehon] [~ashutoshc] could you review the code? Thanks > ZooKeeperHiveLock overrides equal() method but not hashcode() > - > > Key: HIVE-10620 > URL: https://issues.apache.org/jira/browse/HIVE-10620 > Project: Hive > Issue Type: Bug >Affects Versions: 1.0.0 >Reporter: Chaoyu Tang >Assignee: Chaoyu Tang > Attachments: HIVE-10620.patch > > > ZooKeeperHiveLock overrides the public boolean equals(Object o) method but > does not for public int hashCode(). It violates the Java contract and may > cause unexpected results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10620) ZooKeeperHiveLock overrides equal() method but not hashcode()
[ https://issues.apache.org/jira/browse/HIVE-10620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chaoyu Tang updated HIVE-10620: --- Description: ZooKeeperHiveLock overrides the public boolean equals(Object o) method but does not for public int hashCode(). It violates the Java contract and may cause unexpected results. (was: ZooKeeperHiveLock overrides the public boolean equals(Object o) method but does not for public int hashCode(). It violates the Java contract that equal and may cause unexpected results.) > ZooKeeperHiveLock overrides equal() method but not hashcode() > - > > Key: HIVE-10620 > URL: https://issues.apache.org/jira/browse/HIVE-10620 > Project: Hive > Issue Type: Bug >Affects Versions: 1.0.0 >Reporter: Chaoyu Tang >Assignee: Chaoyu Tang > > ZooKeeperHiveLock overrides the public boolean equals(Object o) method but > does not for public int hashCode(). It violates the Java contract and may > cause unexpected results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
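The equals/hashCode contract violated in HIVE-10620 can be shown with a toy class. This is not ZooKeeperHiveLock itself; the class and field names are assumptions for illustration only.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Toy illustration of the Java contract: whenever equals() is overridden,
// hashCode() must be overridden consistently, or hash-based collections
// (HashMap, HashSet) silently misbehave because equal objects land in
// different buckets.
public class LockKey {
    private final String path;

    LockKey(String path) { this.path = path; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof LockKey)) return false;
        return Objects.equals(path, ((LockKey) o).path);
    }

    // Without this override, two equal LockKeys would usually hash to
    // different buckets and Set.contains()/remove() could fail to find them.
    @Override
    public int hashCode() {
        return Objects.hashCode(path);
    }

    public static void main(String[] args) {
        Set<LockKey> held = new HashSet<>();
        held.add(new LockKey("/hive/locks/tbl1"));
        System.out.println(held.contains(new LockKey("/hive/locks/tbl1"))); // true
    }
}
```

If the hashCode() override is deleted, the lookup above is only true by coincidence of identity hashes, which is exactly the "unexpected results" the report warns about.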
[jira] [Updated] (HIVE-10619) Fix ConcurrentHashMap.get in MetadataListStructObjectInspector.getInstance (52)
[ https://issues.apache.org/jira/browse/HIVE-10619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-10619: --- Attachment: rb33878.patch patch #1 > Fix ConcurrentHashMap.get in MetadataListStructObjectInspector.getInstance > (52) > --- > > Key: HIVE-10619 > URL: https://issues.apache.org/jira/browse/HIVE-10619 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Reporter: Alexander Pivovarov >Assignee: Alexander Pivovarov >Priority: Minor > Attachments: rb33878.patch > > > cached.get(columnNames) should be replaced with cached.get(key) in the code > block below > {code} > cached = new ConcurrentHashMap<ArrayList<List<String>>, > MetadataListStructObjectInspector>(); > public static MetadataListStructObjectInspector getInstance( > List<String> columnNames) { > ArrayList<List<String>> key = new ArrayList<List<String>>(1); > key.add(columnNames); > MetadataListStructObjectInspector result = cached.get(columnNames); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
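The bug in HIVE-10619 reduces to looking up a map with the wrong key type: the cache is keyed by the wrapper list, not by the raw column-name list. A minimal standalone reproduction (using plain Object as the cached value, since this sketch does not depend on Hive classes):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

// Reduced illustration of the bug: entries are stored under the wrapped key
// (an ArrayList containing the column-name list), so a lookup with the raw
// columnNames list always misses and the cache never produces a hit.
public class CacheKeyBug {
    static final ConcurrentHashMap<ArrayList<List<String>>, Object> cached =
            new ConcurrentHashMap<>();

    public static void main(String[] args) {
        List<String> columnNames = Arrays.asList("col0", "col1");
        ArrayList<List<String>> key = new ArrayList<>(1);
        key.add(columnNames);
        cached.put(key, new Object());

        System.out.println(cached.get(columnNames)); // null -- the bug: wrong key
        System.out.println(cached.get(key) != null); // true -- the fix: use `key`
    }
}
```

ConcurrentHashMap.get accepts any Object, so the wrong-key lookup compiles cleanly; it just never matches, which is why the defect survived until spotted by inspection.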
[jira] [Updated] (HIVE-10618) Fix invocation of toString on byteArray in VerifyFast (250, 254)
[ https://issues.apache.org/jira/browse/HIVE-10618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-10618: --- Attachment: rb33877.patch patch #1 > Fix invocation of toString on byteArray in VerifyFast (250, 254) > > > Key: HIVE-10618 > URL: https://issues.apache.org/jira/browse/HIVE-10618 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Reporter: Alexander Pivovarov >Assignee: Alexander Pivovarov >Priority: Minor > Attachments: rb33877.patch > > > Arrays.toString(byteArray) can be used to convert byte[] to string -- This message was sent by Atlassian JIRA (v6.3.4#6332)
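The HIVE-10618 fix rests on a standard Java pitfall: arrays inherit Object.toString(), which prints a type tag plus identity hash rather than the contents. A small demonstration (the array values are arbitrary):

```java
import java.util.Arrays;

// byte[].toString() yields something like "[B@1b6d3586" (type descriptor and
// identity hash), not the element values; Arrays.toString(byte[]) produces
// the readable form the verification code in VerifyFast presumably wanted.
public class ByteArrayToString {
    public static void main(String[] args) {
        byte[] byteArray = {1, 2, 3};
        System.out.println(byteArray.toString().startsWith("[B@")); // true: identity string
        System.out.println(Arrays.toString(byteArray));             // [1, 2, 3]
    }
}
```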
[jira] [Commented] (HIVE-10539) set default value of hive.repl.task.factory
[ https://issues.apache.org/jira/browse/HIVE-10539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529819#comment-14529819 ] Hive QA commented on HIVE-10539: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12730349/HIVE-10539.3.patch {color:red}ERROR:{color} -1 due to 24 failed/errored test(s), 8900 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_parts org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_unencrypted_tbl org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_with_different_encryption_keys org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_load_data_to_encrypted_tables org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_select_read_only_encrypted_tbl org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_disallow_transform org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_droppartition org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_sba_drop_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_alterpart_loc org.apache.hadoop.hive.ql.security.TestStorageBasedClientSideAuthorizationProvider.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropDatabase org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropPartition 
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropTable org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropView org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProvider.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProviderWithACL.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbFailure org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbSuccess org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableFailure org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableSuccess org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessing org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessingCustomSetWhitelistAppend {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3745/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3745/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3745/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 24 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12730349 - PreCommit-HIVE-TRUNK-Build > set default value of hive.repl.task.factory > --- > > Key: HIVE-10539 > URL: https://issues.apache.org/jira/browse/HIVE-10539 > Project: Hive > Issue Type: Bug >Reporter: Thejas M Nair >Assignee: Thejas M Nair > Attachments: HIVE-10539.1.patch, HIVE-10539.2.patch, > HIVE-10539.3.patch > > > hive.repl.task.factory does not have a default value set. It should be set to > org.apache.hive.hcatalog.api.repl.exim.EximReplicationTaskFactory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10538) Fix NPE in FileSinkOperator from hashcode mismatch
[ https://issues.apache.org/jira/browse/HIVE-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Slawski updated HIVE-10538: - Attachment: HIVE-10538.2.patch I've attached the second revision of the patch, which updates the failed Spark qtests. > Fix NPE in FileSinkOperator from hashcode mismatch > -- > > Key: HIVE-10538 > URL: https://issues.apache.org/jira/browse/HIVE-10538 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 1.0.0, 1.2.0 >Reporter: Peter Slawski >Assignee: Peter Slawski >Priority: Critical > Fix For: 1.2.0, 1.3.0 > > Attachments: HIVE-10538.1.patch, HIVE-10538.1.patch, > HIVE-10538.1.patch, HIVE-10538.2.patch > > > A Null Pointer Exception occurs in FileSinkOperator when using bucketed > tables and distribute by with multiFileSpray enabled. The following query > snippet reproduces this issue: > {code} > set hive.enforce.bucketing = true; > set hive.exec.reducers.max = 20; > create table bucket_a(key int, value_a string) clustered by (key) into 256 > buckets; > create table bucket_b(key int, value_b string) clustered by (key) into 256 > buckets; > create table bucket_ab(key int, value_a string, value_b string) clustered by > (key) into 256 buckets; > -- Insert data into bucket_a and bucket_b > insert overwrite table bucket_ab > select a.key, a.value_a, b.value_b from bucket_a a join bucket_b b on (a.key > = b.key) distribute by key; > {code} > The following stack trace is logged. 
> {code} > 2015-04-29 12:54:12,841 FATAL [pool-110-thread-1]: ExecReducer > (ExecReducer.java:reduce(255)) - > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row (tag=0) {"key":{},"value":{"_col0":"113","_col1":"val_113"}} > at > org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244) > at > org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392) > at > org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.exec.FileSinkOperator.findWriterOffset(FileSinkOperator.java:819) > at > org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:747) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837) > at > org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88) > at > org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235) > ... 8 more > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
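The "hashcode mismatch" behind the NPE above can be sketched abstractly. This is a hypothetical simplification, not Hive's real FileSinkOperator: with multi-file spray, each row's writer slot is chosen as hash(key) mod numFiles, and if the hash used to route rows differs from the hash used when the writer slots were populated, the computed offset points at a slot that holds null.

```java
// Toy sketch of the failure mode: two different hash functions for the same
// key yield two different writer offsets, so the row dereferences a writer
// slot that was never populated -- the null that becomes the NPE.
public class WriterOffsetSketch {
    static int findWriterOffset(int keyHash, int numFiles) {
        // mask the sign bit so the offset is non-negative, then spray
        return (keyHash & Integer.MAX_VALUE) % numFiles;
    }

    public static void main(String[] args) {
        Object[] writers = new Object[20];            // only some slots populated
        writers[findWriterOffset("113".hashCode(), 20)] = new Object();

        // Same logical key, different hash function => different slot.
        int otherHash = Integer.parseInt("113");      // stand-in for the mismatched hash
        Object w = writers[findWriterOffset(otherHash, 20)];
        System.out.println(w == null);                // true: the slot is empty
    }
}
```

In the real operator the fix is to make both sides agree on one hash function, so routing and writer allocation always compute the same offset.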
[jira] [Commented] (HIVE-9743) Incorrect result set for vectorized left outer join
[ https://issues.apache.org/jira/browse/HIVE-9743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529775#comment-14529775 ] Matt McCline commented on HIVE-9743: [~vikram.dixit] Ok, SMB removed. I think this one is good to go as soon as the Apache tests pass. > Incorrect result set for vectorized left outer join > --- > > Key: HIVE-9743 > URL: https://issues.apache.org/jira/browse/HIVE-9743 > Project: Hive > Issue Type: Bug > Components: SQL >Affects Versions: 0.14.0 >Reporter: N Campbell >Assignee: Matt McCline > Attachments: HIVE-9743.01.patch, HIVE-9743.02.patch, > HIVE-9743.03.patch, HIVE-9743.04.patch, HIVE-9743.05.patch, > HIVE-9743.06.patch, HIVE-9743.08.patch, HIVE-9743.09.patch > > > This query is supposed to return 3 rows and will when run without Tez but > returns 2 rows when run with Tez. > select tjoin1.rnum, tjoin1.c1, tjoin1.c2, tjoin2.c2 as c2j2 from tjoin1 left > outer join tjoin2 on ( tjoin1.c1 = tjoin2.c1 and tjoin1.c2 > 15 ) > tjoin1.rnum tjoin1.c1 tjoin1.c2 c2j2 > 1 20 25 > 2 50 > instead of > tjoin1.rnum tjoin1.c1 tjoin1.c2 c2j2 > 0 10 15 > 1 20 25 > 2 50 > create table if not exists TJOIN1 (RNUM int , C1 int, C2 int) > STORED AS orc ; > 0|10|15 > 1|20|25 > 2|\N|50 > create table if not exists TJOIN2 (RNUM int , C1 int, C2 char(2)) > ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' > STORED AS TEXTFILE ; > 0|10|BB > 1|15|DD > 2|\N|EE > 3|10|FF -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9743) Incorrect result set for vectorized left outer join
[ https://issues.apache.org/jira/browse/HIVE-9743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-9743: --- Attachment: HIVE-9743.09.patch > Incorrect result set for vectorized left outer join > --- > > Key: HIVE-9743 > URL: https://issues.apache.org/jira/browse/HIVE-9743 > Project: Hive > Issue Type: Bug > Components: SQL >Affects Versions: 0.14.0 >Reporter: N Campbell >Assignee: Matt McCline > Attachments: HIVE-9743.01.patch, HIVE-9743.02.patch, > HIVE-9743.03.patch, HIVE-9743.04.patch, HIVE-9743.05.patch, > HIVE-9743.06.patch, HIVE-9743.08.patch, HIVE-9743.09.patch > > > This query is supposed to return 3 rows and will when run without Tez but > returns 2 rows when run with Tez. > select tjoin1.rnum, tjoin1.c1, tjoin1.c2, tjoin2.c2 as c2j2 from tjoin1 left > outer join tjoin2 on ( tjoin1.c1 = tjoin2.c1 and tjoin1.c2 > 15 ) > tjoin1.rnum tjoin1.c1 tjoin1.c2 c2j2 > 1 20 25 > 2 50 > instead of > tjoin1.rnum tjoin1.c1 tjoin1.c2 c2j2 > 0 10 15 > 1 20 25 > 2 50 > create table if not exists TJOIN1 (RNUM int , C1 int, C2 int) > STORED AS orc ; > 0|10|15 > 1|20|25 > 2|\N|50 > create table if not exists TJOIN2 (RNUM int , C1 int, C2 char(2)) > ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' > STORED AS TEXTFILE ; > 0|10|BB > 1|15|DD > 2|\N|EE > 3|10|FF -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10565) LLAP: Native Vector Map Join doesn't handle filtering and matching on LEFT OUTER JOIN repeated key correctly
[ https://issues.apache.org/jira/browse/HIVE-10565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-10565: Attachment: HIVE-10565.07.patch > LLAP: Native Vector Map Join doesn't handle filtering and matching on LEFT > OUTER JOIN repeated key correctly > > > Key: HIVE-10565 > URL: https://issues.apache.org/jira/browse/HIVE-10565 > Project: Hive > Issue Type: Sub-task > Components: Hive >Affects Versions: 1.2.0 >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Fix For: 1.2.0, 1.3.0 > > Attachments: HIVE-10565.01.patch, HIVE-10565.02.patch, > HIVE-10565.03.patch, HIVE-10565.04.patch, HIVE-10565.05.patch, > HIVE-10565.06.patch, HIVE-10565.07.patch > > > Filtering can knock out some of the rows for a repeated key, but those > knocked out rows need to be included in the LEFT OUTER JOIN result and are > currently not when only some rows are filtered out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10605) Make hive version number update automatically in webhcat-default.xml during hive tar generation
[ https://issues.apache.org/jira/browse/HIVE-10605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529749#comment-14529749 ] Hive QA commented on HIVE-10605: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12730340/HIVE-10605.patch {color:red}ERROR:{color} -1 due to 25 failed/errored test(s), 8900 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_parts org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_unencrypted_tbl org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_with_different_encryption_keys org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_load_data_to_encrypted_tables org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_select_read_only_encrypted_tbl org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_disallow_transform org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_droppartition org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_sba_drop_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_alterpart_loc org.apache.hadoop.hive.ql.security.TestStorageBasedClientSideAuthorizationProvider.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropDatabase org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropPartition 
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropTable org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropView org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProvider.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProviderWithACL.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbFailure org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbSuccess org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableFailure org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableSuccess org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessing org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessingCustomSetWhitelistAppend org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3744/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3744/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3744/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 25 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12730340 - PreCommit-HIVE-TRUNK-Build > Make hive version number update automatically in webhcat-default.xml during > hive tar generation > --- > > Key: HIVE-10605 > URL: https://issues.apache.org/jira/browse/HIVE-10605 > Project: Hive > Issue Type: Bug > Components: WebHCat >Affects Versions: 1.3.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Fix For: 1.3.0 > > Attachments: HIVE-10605.patch > > > so we don't have to do HIVE-10604 on each release -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10595) Dropping a table can cause NPEs in the compactor
[ https://issues.apache.org/jira/browse/HIVE-10595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529725#comment-14529725 ] Eugene Koifman commented on HIVE-10595: --- I'm not sure I understand how this works. The Initiator (if the table/partition is no longer there) will not add anything to the compaction queue. So then there is nothing for the Worker/Cleaner to do in this case. How will the data from TXNS, COMPLETED_TXN_COMPONENTS, and TXN_COMPONENTS that relates to these tables get cleaned up? > Dropping a table can cause NPEs in the compactor > > > Key: HIVE-10595 > URL: https://issues.apache.org/jira/browse/HIVE-10595 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 0.14.0, 1.0.0, 1.1.0 >Reporter: Alan Gates >Assignee: Alan Gates > Attachments: HIVE-10595.patch > > > Reproduction: > # start metastore with compactor off > # insert enough entries in a table to trigger a compaction > # drop the table > # stop metastore > # restart metastore with compactor on > Result: NPE in the compactor threads. I suspect this would also happen if > the inserts and drops were done between runs of the compactor, but I > haven't proven it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName
[ https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-9392: -- Attachment: HIVE-9392.4.patch > JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to > column names having duplicated fqColumnName > > > Key: HIVE-9392 > URL: https://issues.apache.org/jira/browse/HIVE-9392 > Project: Hive > Issue Type: Bug > Components: Physical Optimizer >Affects Versions: 0.14.0 >Reporter: Mostafa Mokhtar >Assignee: Pengcheng Xiong >Priority: Critical > Attachments: HIVE-9392.1.patch, HIVE-9392.2.patch, HIVE-9392.3.patch, > HIVE-9392.4.patch > > > In JoinStatsRule.process the join column statistics are stored in the HashMap > joinedColStats. The key used, ColStatistics.fqColName, is > duplicated between join columns in the same vertex; as a result, distinctVals > ends up containing duplicated values, which negatively affects the join > cardinality estimation. > The duplicate keys are usually named KEY.reducesinkkey0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9534) incorrect result set for query that projects a windowed aggregate
[ https://issues.apache.org/jira/browse/HIVE-9534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529716#comment-14529716 ] Chaoyu Tang commented on HIVE-9534: --- Oracle 11.2 treats avg(distinct tsint.csint) over () as an analytic function instead of an aggregation function, so the query returns 4 rows of 2.5. Note, there is no order by clause or window clause inside the parentheses of "over". Could you try a query like "select avg(distinct tsint.csint) over (order by rnum rows between 1 preceding and 1 following) from tsint" to see if it works in Oracle 12c? It did not work in 11.2. > incorrect result set for query that projects a windowed aggregate > - > > Key: HIVE-9534 > URL: https://issues.apache.org/jira/browse/HIVE-9534 > Project: Hive > Issue Type: Bug > Components: SQL >Reporter: N Campbell >Assignee: Chaoyu Tang > > Result set returned by Hive has one row instead of 5 > {code} > select avg(distinct tsint.csint) over () from tsint > create table if not exists TSINT (RNUM int , CSINT smallint) > ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' > STORED AS TEXTFILE; > 0|\N > 1|-1 > 2|0 > 3|1 > 4|10 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9743) Incorrect result set for vectorized left outer join
[ https://issues.apache.org/jira/browse/HIVE-9743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529711#comment-14529711 ] Matt McCline commented on HIVE-9743: Given lack of time, I think I'll pull the SMB changes since the regular map join case repro is very clear. > Incorrect result set for vectorized left outer join > --- > > Key: HIVE-9743 > URL: https://issues.apache.org/jira/browse/HIVE-9743 > Project: Hive > Issue Type: Bug > Components: SQL >Affects Versions: 0.14.0 >Reporter: N Campbell >Assignee: Matt McCline > Attachments: HIVE-9743.01.patch, HIVE-9743.02.patch, > HIVE-9743.03.patch, HIVE-9743.04.patch, HIVE-9743.05.patch, > HIVE-9743.06.patch, HIVE-9743.08.patch > > > This query is supposed to return 3 rows and will when run without Tez but > returns 2 rows when run with Tez. > select tjoin1.rnum, tjoin1.c1, tjoin1.c2, tjoin2.c2 as c2j2 from tjoin1 left > outer join tjoin2 on ( tjoin1.c1 = tjoin2.c1 and tjoin1.c2 > 15 ) > tjoin1.rnum tjoin1.c1 tjoin1.c2 c2j2 > 1 20 25 > 2 50 > instead of > tjoin1.rnum tjoin1.c1 tjoin1.c2 c2j2 > 0 10 15 > 1 20 25 > 2 50 > create table if not exists TJOIN1 (RNUM int , C1 int, C2 int) > STORED AS orc ; > 0|10|15 > 1|20|25 > 2|\N|50 > create table if not exists TJOIN2 (RNUM int , C1 int, C2 char(2)) > ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' > STORED AS TEXTFILE ; > 0|10|BB > 1|15|DD > 2|\N|EE > 3|10|FF -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10538) Fix NPE in FileSinkOperator from hashcode mismatch
[ https://issues.apache.org/jira/browse/HIVE-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529707#comment-14529707 ] Peter Slawski commented on HIVE-10538: -- Great, I've been working on just that. I'll be able to post an updated patch tomorrow. > Fix NPE in FileSinkOperator from hashcode mismatch > -- > > Key: HIVE-10538 > URL: https://issues.apache.org/jira/browse/HIVE-10538 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 1.0.0, 1.2.0 >Reporter: Peter Slawski >Assignee: Peter Slawski >Priority: Critical > Fix For: 1.2.0, 1.3.0 > > Attachments: HIVE-10538.1.patch, HIVE-10538.1.patch, > HIVE-10538.1.patch > > > A Null Pointer Exception occurs in FileSinkOperator when using bucketed > tables and distribute by with multiFileSpray enabled. The following query > snippet reproduces this issue: > {code} > set hive.enforce.bucketing = true; > set hive.exec.reducers.max = 20; > create table bucket_a(key int, value_a string) clustered by (key) into 256 > buckets; > create table bucket_b(key int, value_b string) clustered by (key) into 256 > buckets; > create table bucket_ab(key int, value_a string, value_b string) clustered by > (key) into 256 buckets; > -- Insert data into bucket_a and bucket_b > insert overwrite table bucket_ab > select a.key, a.value_a, b.value_b from bucket_a a join bucket_b b on (a.key > = b.key) distribute by key; > {code} > The following stack trace is logged. 
> {code} > 2015-04-29 12:54:12,841 FATAL [pool-110-thread-1]: ExecReducer > (ExecReducer.java:reduce(255)) - > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row (tag=0) {"key":{},"value":{"_col0":"113","_col1":"val_113"}} > at > org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244) > at > org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392) > at > org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.exec.FileSinkOperator.findWriterOffset(FileSinkOperator.java:819) > at > org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:747) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837) > at > org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88) > at > org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235) > ... 8 more > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10591) Support limited integer type promotion in ORC
[ https://issues.apache.org/jira/browse/HIVE-10591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-10591: - Attachment: HIVE-10591.3.patch Added fix for acid test failures. > Support limited integer type promotion in ORC > - > > Key: HIVE-10591 > URL: https://issues.apache.org/jira/browse/HIVE-10591 > Project: Hive > Issue Type: New Feature >Affects Versions: 1.3.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-10591.1.patch, HIVE-10591.2.patch, > HIVE-10591.2.patch, HIVE-10591.3.patch > > > ORC currently does not support schema-on-read. If we alter an ORC table with > 'int' type to 'bigint' and query the altered table, a ClassCastException > will be thrown, as the schema on read from the table descriptor will expect > LongWritable whereas ORC will return IntWritable based on the file schema stored > within the ORC file. OrcSerde currently doesn't do any type conversions or type > promotions in the inner loop, for performance reasons. Since smallints, ints and > bigints are stored in the same way in ORC, it will be possible to allow such > type promotions without hurting performance. The following type promotions can be > supported without any casting > smallint -> int > smallint -> bigint > int -> bigint > Tinyint promotion is not possible without casting, as tinyints are stored > using the RLE byte writer whereas smallints, ints and bigints are stored using the > RLE integer writer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
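The "stored in the same way" argument above can be illustrated with a toy base-128 varint encoder. This is an assumption for illustration only (ORC's actual RLEv2 encoding is more elaborate): because an integer run-length encoding serializes the value rather than the declared width, a column written as smallint or int produces the same bytes it would as bigint, so widening on read needs no file rewrite. Tinyints go through a separate byte-oriented writer, which is why they are excluded.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Toy varint encoder: low 7 bits per byte, high bit set as a continuation
// flag. The declared Java width (short/int/long) of the input value does not
// change the encoded bytes -- only the value does.
public class VarintWidth {
    static byte[] varint(long v) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        do {
            byte b = (byte) (v & 0x7f);
            v >>>= 7;
            if (v != 0) b |= (byte) 0x80; // more bytes follow
            out.write(b);
        } while (v != 0);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        short s = 1234;
        int i = 1234;
        long l = 1234L;
        // Identical bytes regardless of declared integer width.
        System.out.println(Arrays.equals(varint(s), varint(i))
                && Arrays.equals(varint(i), varint(l))); // true
    }
}
```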
[jira] [Updated] (HIVE-10614) schemaTool upgrade from 0.14.0 to 1.3.0 causes failure
[ https://issues.apache.org/jira/browse/HIVE-10614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-10614: - Attachment: HIVE-10614.1.master.patch [~thejas] Thanks for the review, added HIVE-10614.1.master.patch for the master branch > schemaTool upgrade from 0.14.0 to 1.3.0 causes failure > -- > > Key: HIVE-10614 > URL: https://issues.apache.org/jira/browse/HIVE-10614 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Hari Sankar Sivarama Subramaniyan >Assignee: Hari Sankar Sivarama Subramaniyan >Priority: Critical > Attachments: HIVE-10614.1.master.patch, HIVE-10614.1.patch > > > ./schematool -dbType mysql -upgradeSchemaFrom 0.14.0 -verbose > {code} > ++--+ > | >| > ++--+ > | < HIVE-7018 Remove Table and Partition tables column LINK_TARGET_ID from > Mysql for other DBs do not have it > | > ++--+ > 1 row selected (0.004 seconds) > 0: jdbc:mysql://node-1.example.com/hive> DROP PROCEDURE IF EXISTS > RM_TLBS_LINKID > No rows affected (0.005 seconds) > 0: jdbc:mysql://node-1.example.com/hive> DROP PROCEDURE IF EXISTS > RM_PARTITIONS_LINKID > No rows affected (0.006 seconds) > 0: jdbc:mysql://node-1.example.com/hive> DROP PROCEDURE IF EXISTS RM_LINKID > No rows affected (0.002 seconds) > 0: jdbc:mysql://node-1.example.com/hive> CREATE PROCEDURE RM_TLBS_LINKID() > BEGIN IF EXISTS (SELECT * FROM `INFORMATION_SCHEMA`.`COLUMNS` WHERE > `TABLE_NAME` = 'TBLS' AND `COLUMN_NAME` = 'LINK_TARGET_ID') THEN ALTER TABLE > `TBLS` DROP FOREIGN KEY `TBLS_FK3` ; ALTER TABLE `TBLS` DROP KEY `TBLS_N51` ; > ALTER TABLE `TBLS` DROP COLUMN `LINK_TARGET_ID` ; END IF; END > Error: You have an error in your SQL syntax; check the manual that > corresponds to your MySQL server version for the right syntax to use near '' > at line 1 (state=42000,code=1064) > Closing: 0: jdbc:mysql://node-1.example.com/hive?createDatabaseIfNotExist=true > org.apache.hadoop.hive.metastore.HiveMetaException: Upgrade FAILED! 
Metastore > state would be inconsistent !! > org.apache.hadoop.hive.metastore.HiveMetaException: Upgrade FAILED! Metastore > state would be inconsistent !! > at > org.apache.hive.beeline.HiveSchemaTool.doUpgrade(HiveSchemaTool.java:229) > at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:468) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.run(RunJar.java:221) > at org.apache.hadoop.util.RunJar.main(RunJar.java:136) > Caused by: java.io.IOException: Schema script failed, errorcode 2 > at > org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:355) > at > org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:326) > at > org.apache.hive.beeline.HiveSchemaTool.doUpgrade(HiveSchemaTool.java:224) > {code} > Looks like HIVE-7018 has introduced stored procedure as part of mysql upgrade > script and it is causing issues with schematool upgrade. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10538) Fix NPE in FileSinkOperator from hashcode mismatch
[ https://issues.apache.org/jira/browse/HIVE-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529690#comment-14529690 ] Prasanth Jayachandran commented on HIVE-10538: -- The result difference seems to be an expected change because of hashcode difference. [~petersla] Can you put an updated patch by running the tests again with "-Dtest.output.overwrite=true" option? This will overwrite the q.out files. > Fix NPE in FileSinkOperator from hashcode mismatch > -- > > Key: HIVE-10538 > URL: https://issues.apache.org/jira/browse/HIVE-10538 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 1.0.0, 1.2.0 >Reporter: Peter Slawski >Assignee: Peter Slawski >Priority: Critical > Fix For: 1.2.0, 1.3.0 > > Attachments: HIVE-10538.1.patch, HIVE-10538.1.patch, > HIVE-10538.1.patch > > > A Null Pointer Exception occurs when in FileSinkOperator when using bucketed > tables and distribute by with multiFileSpray enabled. The following snippet > query reproduces this issue: > {code} > set hive.enforce.bucketing = true; > set hive.exec.reducers.max = 20; > create table bucket_a(key int, value_a string) clustered by (key) into 256 > buckets; > create table bucket_b(key int, value_b string) clustered by (key) into 256 > buckets; > create table bucket_ab(key int, value_a string, value_b string) clustered by > (key) into 256 buckets; > -- Insert data into bucket_a and bucket_b > insert overwrite table bucket_ab > select a.key, a.value_a, b.value_b from bucket_a a join bucket_b b on (a.key > = b.key) distribute by key; > {code} > The following stack trace is logged. 
> {code} > 2015-04-29 12:54:12,841 FATAL [pool-110-thread-1]: ExecReducer > (ExecReducer.java:reduce(255)) - > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row (tag=0) {"key":{},"value":{"_col0":"113","_col1":"val_113"}} > at > org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244) > at > org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392) > at > org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.exec.FileSinkOperator.findWriterOffset(FileSinkOperator.java:819) > at > org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:747) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837) > at > org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88) > at > org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235) > ... 8 more > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
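The stack trace above can be illustrated with a toy model. This is an assumed simplification, not the real FileSinkOperator code: if the hash used when creating the per-bucket writers differs from the hash used when routing a row, the lookup misses and the later dereference is the NullPointerException reported at FileSinkOperator.java:819. The 31-multiplier below is only an illustrative stand-in for "a different hash function".

```java
import java.util.HashMap;
import java.util.Map;

// Toy sketch of a writer lookup keyed by bucket number. When the routing
// hash disagrees with the hash used at writer-creation time, get() returns
// null and any use of the result throws NullPointerException.
public class WriterOffsetSketch {
    public static void main(String[] args) {
        int buckets = 256;
        int key = 113;  // the key from the failing row in the log
        Map<Integer, String> writers = new HashMap<>();
        writers.put(key % buckets, "writer-for-113");      // created with one hash
        String w = writers.get((key * 31) % buckets);      // routed with another: 175 != 113
        System.out.println(w);                             // null -> NPE when dereferenced
    }
}
```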
[jira] [Updated] (HIVE-10617) LLAP: allocator occasionally has a spurious failure to allocate due to "partitioned" locking and has to retry
[ https://issues.apache.org/jira/browse/HIVE-10617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-10617: Summary: LLAP: allocator occasionally has a spurious failure to allocate due to "partitioned" locking and has to retry (was: LLAP: fix allocator concurrency rarely causing spurious failure to allocate due to "partitioned" locking) > LLAP: allocator occasionally has a spurious failure to allocate due to > "partitioned" locking and has to retry > - > > Key: HIVE-10617 > URL: https://issues.apache.org/jira/browse/HIVE-10617 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > > See HIVE-10482 and the comment in code. Right now this is worked around by > retrying. > Simple case - thread can reserve memory from manager and bounce between > checking arena 1 and arena 2 for memory as other threads allocate and > deallocate from respective arenas in reverse order, making it look like > there's no memory. More importantly this can happen when buddy blocks are > split when lots of stuff is allocated. > This can be solved either with some form of helping (esp. for split case) or > by making allocator an "actor" (or set of actors, one per 1-N arenas that > they would own), to satisfy alloc requests more deterministically (and also > get rid of most sync). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
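The "simple case" in the description can be replayed deterministically. The sketch below is a toy single-threaded simulation under assumed simplifications, not LLAP's allocator: the single free block moves "behind" a scanner that checks arenas one at a time, so the scan reports no memory even though memory was available throughout, which is why the current code retries.

```java
// Deterministic replay of the interleaving: a scanner visits arena 0, then
// arena 1; between the two visits another thread frees into arena 0 and
// allocates from arena 1. Each individual check is correct, but the scan
// as a whole spuriously concludes that nothing is free.
public class ArenaScanSketch {
    public static void main(String[] args) {
        boolean[] free = { false, true };  // the one free block sits in arena 1
        boolean found = false;

        found |= free[0];                  // step 1: scanner checks arena 0 -> empty
        free[0] = true; free[1] = false;   // step 2: the free block "moves" to arena 0
        found |= free[1];                  // step 3: scanner checks arena 1 -> empty

        System.out.println(found);         // false: spurious allocation failure
    }
}
```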
[jira] [Commented] (HIVE-10614) schemaTool upgrade from 0.14.0 to 1.3.0 causes failure
[ https://issues.apache.org/jira/browse/HIVE-10614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529684#comment-14529684 ] Thejas M Nair commented on HIVE-10614: -- +1 for current patch, it would work with 1.2 branch. We need another one for master (that also has similar change for hive-schema-1.3.0.mysql.sql) > schemaTool upgrade from 0.14.0 to 1.3.0 causes failure > -- > > Key: HIVE-10614 > URL: https://issues.apache.org/jira/browse/HIVE-10614 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Hari Sankar Sivarama Subramaniyan >Assignee: Hari Sankar Sivarama Subramaniyan >Priority: Critical > Attachments: HIVE-10614.1.patch > > > ./schematool -dbType mysql -upgradeSchemaFrom 0.14.0 -verbose > {code} > ++--+ > | >| > ++--+ > | < HIVE-7018 Remove Table and Partition tables column LINK_TARGET_ID from > Mysql for other DBs do not have it > | > ++--+ > 1 row selected (0.004 seconds) > 0: jdbc:mysql://node-1.example.com/hive> DROP PROCEDURE IF EXISTS > RM_TLBS_LINKID > No rows affected (0.005 seconds) > 0: jdbc:mysql://node-1.example.com/hive> DROP PROCEDURE IF EXISTS > RM_PARTITIONS_LINKID > No rows affected (0.006 seconds) > 0: jdbc:mysql://node-1.example.com/hive> DROP PROCEDURE IF EXISTS RM_LINKID > No rows affected (0.002 seconds) > 0: jdbc:mysql://node-1.example.com/hive> CREATE PROCEDURE RM_TLBS_LINKID() > BEGIN IF EXISTS (SELECT * FROM `INFORMATION_SCHEMA`.`COLUMNS` WHERE > `TABLE_NAME` = 'TBLS' AND `COLUMN_NAME` = 'LINK_TARGET_ID') THEN ALTER TABLE > `TBLS` DROP FOREIGN KEY `TBLS_FK3` ; ALTER TABLE `TBLS` DROP KEY `TBLS_N51` ; > ALTER TABLE `TBLS` DROP COLUMN `LINK_TARGET_ID` ; END IF; END > Error: You have an error in your SQL syntax; check the manual that > corresponds to your MySQL server version for the right syntax to use near '' > at line 1 (state=42000,code=1064) > Closing: 0: jdbc:mysql://node-1.example.com/hive?createDatabaseIfNotExist=true > 
org.apache.hadoop.hive.metastore.HiveMetaException: Upgrade FAILED! Metastore > state would be inconsistent !! > org.apache.hadoop.hive.metastore.HiveMetaException: Upgrade FAILED! Metastore > state would be inconsistent !! > at > org.apache.hive.beeline.HiveSchemaTool.doUpgrade(HiveSchemaTool.java:229) > at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:468) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.run(RunJar.java:221) > at org.apache.hadoop.util.RunJar.main(RunJar.java:136) > Caused by: java.io.IOException: Schema script failed, errorcode 2 > at > org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:355) > at > org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:326) > at > org.apache.hive.beeline.HiveSchemaTool.doUpgrade(HiveSchemaTool.java:224) > {code} > Looks like HIVE-7018 has introduced stored procedure as part of mysql upgrade > script and it is causing issues with schematool upgrade. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7018) Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but not others
[ https://issues.apache.org/jira/browse/HIVE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529676#comment-14529676 ] Thejas M Nair commented on HIVE-7018: - I think the change here was in the right direction, however it breaks the preferred way to upgrade hive (using schematool). This is a release blocker for 1.2.0. . A patch to revert the changes here has been uploaded to HIVE-10614 . I think we should go ahead with that, and reopen this jira after it is committed. Once the schematool/beeline breakage is fixed, this change can go back into hive. > Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but > not others > - > > Key: HIVE-7018 > URL: https://issues.apache.org/jira/browse/HIVE-7018 > Project: Hive > Issue Type: Bug >Reporter: Brock Noland >Assignee: Yongzhi Chen > Fix For: 1.2.0 > > Attachments: HIVE-7018.1.patch, HIVE-7018.2.patch > > > It appears that at least postgres and oracle do not have the LINK_TARGET_ID > column while mysql does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10538) Fix NPE in FileSinkOperator from hashcode mismatch
[ https://issues.apache.org/jira/browse/HIVE-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529673#comment-14529673 ] Peter Slawski commented on HIVE-10538: -- The Spark driver failures are caused by this change. This would be expected if a row's hashcode affected its ordering in Spark. This patch makes it so that HiveKey's hashcode outputted from ReduceSinkOperator is no longer always multiplied by 31 (as explained previously). Also, for at least those failed qtests, the row ordering/output in the expected output differs across MapRed, Tez, and Spark. So, execution engine affects ordering. >From >[spark/groupby_complex_types_multi_single_reducer.q.out#L221|https://github.com/apache/hive/blob/master/ql/src/test/results/clientpositive/spark/groupby_complex_types_multi_single_reducer.q.out#L221] {code} POSTHOOK: query: SELECT DEST2.* FROM DEST2 POSTHOOK: type: QUERY POSTHOOK: Input: default@dest2 A masked pattern was here {"120":"val_120"} 2 {"129":"val_129"} 2 {"160":"val_160"} 1 {"26":"val_26"} 2 {"27":"val_27"} 1 {"288":"val_288"} 2 {"298":"val_298"} 3 {"30":"val_30"} 1 {"311":"val_311"} 3 {"74":"val_74"} 1 {code} >From >[groupby_complex_types_multi_single_reducer.q.out#L240|https://github.com/apache/hive/blob/master/ql/src/test/results/clientpositive/groupby_complex_types_multi_single_reducer.q.out#L240] {code} POSTHOOK: query: SELECT DEST2.* FROM DEST2 POSTHOOK: type: QUERY POSTHOOK: Input: default@dest2 A masked pattern was here {"0":"val_0"} 3 {"10":"val_10"} 1 {"100":"val_100"} 2 {"103":"val_103"} 2 {"104":"val_104"} 2 {"105":"val_105"} 1 {"11":"val_11"} 1 {"111":"val_111"} 1 {"113":"val_113"} 2 {"114":"val_114"} 1 {code} > Fix NPE in FileSinkOperator from hashcode mismatch > -- > > Key: HIVE-10538 > URL: https://issues.apache.org/jira/browse/HIVE-10538 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 1.0.0, 1.2.0 >Reporter: Peter Slawski >Assignee: Peter Slawski >Priority: 
Critical > Fix For: 1.2.0, 1.3.0 > > Attachments: HIVE-10538.1.patch, HIVE-10538.1.patch, > HIVE-10538.1.patch > > > A Null Pointer Exception occurs when in FileSinkOperator when using bucketed > tables and distribute by with multiFileSpray enabled. The following snippet > query reproduces this issue: > {code} > set hive.enforce.bucketing = true; > set hive.exec.reducers.max = 20; > create table bucket_a(key int, value_a string) clustered by (key) into 256 > buckets; > create table bucket_b(key int, value_b string) clustered by (key) into 256 > buckets; > create table bucket_ab(key int, value_a string, value_b string) clustered by > (key) into 256 buckets; > -- Insert data into bucket_a and bucket_b > insert overwrite table bucket_ab > select a.key, a.value_a, b.value_b from bucket_a a join bucket_b b on (a.key > = b.key) distribute by key; > {code} > The following stack trace is logged. > {code} > 2015-04-29 12:54:12,841 FATAL [pool-110-thread-1]: ExecReducer > (ExecReducer.java:reduce(255)) - > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row (tag=0) {"key":{},"value":{"_col0":"113","_col1":"val_113"}} > at > org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244) > at > org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392) > at > org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.exec.FileSinkOperator.findWriterOffset(FileSinkOperator.java:819) > at > 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:747) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837) > at > org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88) > at > org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235) > ... 8 more > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
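The comment above can be made concrete with a small sketch. It assumes, as the comment states, that the old HiveKey hashcode was effectively the new one multiplied by 31, and it uses a plain modulo partitioner for illustration (the real partitioners differ in detail): the same keys land in different reducers before and after the patch, which reorders unsorted output without changing its content.

```java
// Hypothetical before/after routing for a few keys with 20 reducers
// (hive.exec.reducers.max = 20 in the repro). Only the hash changes;
// every row still reaches exactly one reducer, so results are equivalent
// up to ordering, which is why the q.out files need regeneration.
public class HashRoutingSketch {
    static int partitionOld(int h, int n) { return Math.abs((h * 31) % n); }
    static int partitionNew(int h, int n) { return Math.abs(h % n); }

    public static void main(String[] args) {
        int n = 20;
        for (int h : new int[] { 113, 120, 129 }) {
            System.out.println(h + ": old=" + partitionOld(h, n)
                                 + " new=" + partitionNew(h, n));
        }
        // 113: old=3  new=13
        // 120: old=0  new=0
        // 129: old=19 new=9
    }
}
```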
[jira] [Commented] (HIVE-10617) LLAP: fix allocator concurrency rarely causing spurious failure to allocate due to "partitioned" locking
[ https://issues.apache.org/jira/browse/HIVE-10617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529664#comment-14529664 ] Sergey Shelukhin commented on HIVE-10617: - Will do this later. With 6 executors x 8 nodes I see an average of 0.5 allocation retries (not task retries) per 1000 tasks in a query reading entire lineitem from TPCH 1Tb scale. So it's annoying to have these retries, but not super important. > LLAP: fix allocator concurrency rarely causing spurious failure to allocate > due to "partitioned" locking > > > Key: HIVE-10617 > URL: https://issues.apache.org/jira/browse/HIVE-10617 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > > See HIVE-10482 and the comment in code. Right now this is worked around by > retrying. > Simple case - thread can reserve memory from manager and bounce between > checking arena 1 and arena 2 for memory as other threads allocate and > deallocate from respective arenas in reverse order, making it look like > there's no memory. More importantly this can happen when buddy blocks are > split when lots of stuff is allocated. > This can be solved either with some form of helping (esp. for split case) or > by making allocator an "actor" (or set of actors, one per 1-N arenas that > they would own), to satisfy alloc requests more deterministically (and also > get rid of most sync). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-10617) LLAP: fix allocator concurrency rarely causing spurious failure to allocate due to "partitioned" locking
[ https://issues.apache.org/jira/browse/HIVE-10617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin reassigned HIVE-10617: --- Assignee: Sergey Shelukhin > LLAP: fix allocator concurrency rarely causing spurious failure to allocate > due to "partitioned" locking > > > Key: HIVE-10617 > URL: https://issues.apache.org/jira/browse/HIVE-10617 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > > See HIVE-10482 and the comment in code. Right now this is worked around by > retrying. > Simple case - thread can reserve memory from manager and bounce between > checking arena 1 and arena 2 for memory as other threads allocate and > deallocate from respective arenas in reverse order, making it look like > there's no memory. More importantly this can happen when buddy blocks are > split when lots of stuff is allocated. > This can be solved either with some form of helping (esp. for split case) or > by making allocator an "actor" (or set of actors, one per 1-N arenas that > they would own), to satisfy alloc requests more deterministically (and also > get rid of most sync). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10617) LLAP: fix allocator concurrency rarely causing spurious failure to allocate due to "partitioned" locking
[ https://issues.apache.org/jira/browse/HIVE-10617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-10617: Description: See HIVE-10482 and the comment in code. Right now this is worked around by retrying. Simple case - thread can reserve memory from manager and bounce between checking arena 1 and arena 2 for memory as other threads allocate and deallocate from respective arenas in reverse order, making it look like there's no memory. More importantly this can happen when buddy blocks are split when lots of stuff is allocated. This can be solved either with some form of helping (esp. for split case) or by making allocator an "actor" (or set of actors, one per 1-N arenas that they would own), to satisfy alloc requests more deterministically (and also get rid of most sync). was: See HIVE-10482 and the comment in code. Simple case - thread can reserve memory from manager and bounce between checking arena 1 and arena 2 for memory as other threads allocate and deallocate from respective arenas in reverse order, making it look like there's no memory. More importantly this can happen when buddy blocks are split when lots of stuff is allocated. This can be solved either with some form of helping (esp. for split case) or by making allocator an "actor" (or set of actors, one per 1-N arenas that they would own), to satisfy alloc requests more deterministically (and also get rid of most sync). > LLAP: fix allocator concurrency rarely causing spurious failure to allocate > due to "partitioned" locking > > > Key: HIVE-10617 > URL: https://issues.apache.org/jira/browse/HIVE-10617 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin > > See HIVE-10482 and the comment in code. Right now this is worked around by > retrying. 
> Simple case - thread can reserve memory from manager and bounce between > checking arena 1 and arena 2 for memory as other threads allocate and > deallocate from respective arenas in reverse order, making it look like > there's no memory. More importantly this can happen when buddy blocks are > split when lots of stuff is allocated. > This can be solved either with some form of helping (esp. for split case) or > by making allocator an "actor" (or set of actors, one per 1-N arenas that > they would own), to satisfy alloc requests more deterministically (and also > get rid of most sync). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10617) LLAP: fix allocator concurrency rarely causing spurious failure to allocate due to "partitioned" locking
[ https://issues.apache.org/jira/browse/HIVE-10617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-10617: Summary: LLAP: fix allocator concurrency rarely causing spurious failure to allocate due to "partitioned" locking (was: fix allocator concurrency rarely causing spurious failure to allocate due to "partitioned" locking) > LLAP: fix allocator concurrency rarely causing spurious failure to allocate > due to "partitioned" locking > > > Key: HIVE-10617 > URL: https://issues.apache.org/jira/browse/HIVE-10617 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin > > See HIVE-10482 and the comment in code. > Simple case - thread can reserve memory from manager and bounce between > checking arena 1 and arena 2 for memory as other threads allocate and > deallocate from respective arenas in reverse order, making it look like > there's no memory. More importantly this can happen when buddy blocks are > split when lots of stuff is allocated. > This can be solved either with some form of helping (esp. for split case) or > by making allocator an "actor" (or set of actors, one per 1-N arenas that > they would own), to satisfy alloc requests more deterministically (and also > get rid of most sync). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10614) schemaTool upgrade from 0.14.0 to 1.3.0 causes failure
[ https://issues.apache.org/jira/browse/HIVE-10614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529657#comment-14529657 ] Hive QA commented on HIVE-10614: {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12730668/HIVE-10614.1.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-METASTORE-Test/43/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-METASTORE-Test/43/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-METASTORE-Test-43/ This message is automatically generated. ATTACHMENT ID: 12730668 - PreCommit-HIVE-METASTORE-Test > schemaTool upgrade from 0.14.0 to 1.3.0 causes failure > -- > > Key: HIVE-10614 > URL: https://issues.apache.org/jira/browse/HIVE-10614 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Hari Sankar Sivarama Subramaniyan >Assignee: Hari Sankar Sivarama Subramaniyan >Priority: Critical > Attachments: HIVE-10614.1.patch > > > ./schematool -dbType mysql -upgradeSchemaFrom 0.14.0 -verbose > {code} > ++--+ > | >| > ++--+ > | < HIVE-7018 Remove Table and Partition tables column LINK_TARGET_ID from > Mysql for other DBs do not have it > | > ++--+ > 1 row selected (0.004 seconds) > 0: jdbc:mysql://node-1.example.com/hive> DROP PROCEDURE IF EXISTS > RM_TLBS_LINKID > No rows affected (0.005 seconds) > 0: jdbc:mysql://node-1.example.com/hive> DROP PROCEDURE IF EXISTS > RM_PARTITIONS_LINKID > No rows affected (0.006 seconds) > 0: jdbc:mysql://node-1.example.com/hive> DROP PROCEDURE IF EXISTS RM_LINKID > No rows affected (0.002 seconds) > 0: jdbc:mysql://node-1.example.com/hive> CREATE PROCEDURE RM_TLBS_LINKID() > BEGIN IF EXISTS (SELECT * FROM `INFORMATION_SCHEMA`.`COLUMNS` WHERE > `TABLE_NAME` = 'TBLS' AND `COLUMN_NAME` = 'LINK_TARGET_ID') THEN ALTER TABLE 
> `TBLS` DROP FOREIGN KEY `TBLS_FK3` ; ALTER TABLE `TBLS` DROP KEY `TBLS_N51` ; > ALTER TABLE `TBLS` DROP COLUMN `LINK_TARGET_ID` ; END IF; END > Error: You have an error in your SQL syntax; check the manual that > corresponds to your MySQL server version for the right syntax to use near '' > at line 1 (state=42000,code=1064) > Closing: 0: jdbc:mysql://node-1.example.com/hive?createDatabaseIfNotExist=true > org.apache.hadoop.hive.metastore.HiveMetaException: Upgrade FAILED! Metastore > state would be inconsistent !! > org.apache.hadoop.hive.metastore.HiveMetaException: Upgrade FAILED! Metastore > state would be inconsistent !! > at > org.apache.hive.beeline.HiveSchemaTool.doUpgrade(HiveSchemaTool.java:229) > at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:468) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.run(RunJar.java:221) > at org.apache.hadoop.util.RunJar.main(RunJar.java:136) > Caused by: java.io.IOException: Schema script failed, errorcode 2 > at > org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:355) > at > org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:326) > at > org.apache.hive.beeline.HiveSchemaTool.doUpgrade(HiveSchemaTool.java:224) > {code} > Looks like HIVE-7018 has introduced stored procedure as part of mysql upgrade > script and it is causing issues with schematool upgrade. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-10482) LLAP: AsertionError cannot allocate when reading from orc
[ https://issues.apache.org/jira/browse/HIVE-10482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin resolved HIVE-10482. - Resolution: Fixed committed a workaround > LLAP: AsertionError cannot allocate when reading from orc > - > > Key: HIVE-10482 > URL: https://issues.apache.org/jira/browse/HIVE-10482 > Project: Hive > Issue Type: Sub-task >Reporter: Siddharth Seth >Assignee: Sergey Shelukhin > Fix For: llap > > > This was from a run of tpch query 1. [~sershe] - not sure if you've already > seen this. Creating a jira so that it doesn't get lost. > {code} > 2015-04-24 13:11:54,180 > [TezTaskRunner_attempt_1429683757595_0326_4_00_000199_0(container_1_0326_01_003216_sseth_20150424131137_8ec6200c-77c8-43ea-a6a3-a0ab1da6e1ac:4_Map > 1_199_0)] ERROR org.apache.hadoop.hive.ql.exec.tez.TezProcessor: > org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: > java.io.IOException: java.lang.AssertionError: Cannot allocate > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:74) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:314) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:329) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172) > at > 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.IOException: java.io.IOException: > java.lang.AssertionError: Cannot allocate > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:355) > at > org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79) > at > org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:137) > at > org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:62) > ... 
16 more > Caused by: java.io.IOException: java.lang.AssertionError: Cannot allocate > at > org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.rethrowErrorIfAny(LlapInputFormat.java:257) > at > org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.nextCvb(LlapInputFormat.java:209) > at > org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:147) > at > org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:97) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350) > ... 22 more > Caused by: java.lang.AssertionError: Cannot allocate > at > org.apache.hadoop.hive.ql.io.orc.InStream.readEncodedStream(InStream.java:761) > at > org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:441) > at > org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataRea
[jira] [Updated] (HIVE-10614) schemaTool upgrade from 0.14.0 to 1.3.0 causes failure
[ https://issues.apache.org/jira/browse/HIVE-10614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-10614: - Attachment: HIVE-10614.1.patch This happens because schematool runs via beeline and when there is a ";" in the command, beeline interprets it as the command terminator. Stored procedures use ";" as the delimiter between statements, thus the entire stored procedure does not get sent to MySQL as a single command, hence the above error. I am uploading a patch to back out the fix for HIVE-7018 for now. Once we have the fix for HIVE-7018 working with schematool, we can add them back. The task mentioned in the previous line can be done via a follow-up jira. cc-ing [~sushanth], [~thejas] for reviewing the change. Thanks Hari > schemaTool upgrade from 0.14.0 to 1.3.0 causes failure > -- > > Key: HIVE-10614 > URL: https://issues.apache.org/jira/browse/HIVE-10614 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Hari Sankar Sivarama Subramaniyan >Assignee: Hari Sankar Sivarama Subramaniyan >Priority: Critical > Attachments: HIVE-10614.1.patch > > > ./schematool -dbType mysql -upgradeSchemaFrom 0.14.0 -verbose > {code} > ++--+ > | >| > ++--+ > | < HIVE-7018 Remove Table and Partition tables column LINK_TARGET_ID from > Mysql for other DBs do not have it > | > ++--+ > 1 row selected (0.004 seconds) > 0: jdbc:mysql://node-1.example.com/hive> DROP PROCEDURE IF EXISTS > RM_TLBS_LINKID > No rows affected (0.005 seconds) > 0: jdbc:mysql://node-1.example.com/hive> DROP PROCEDURE IF EXISTS > RM_PARTITIONS_LINKID > No rows affected (0.006 seconds) > 0: jdbc:mysql://node-1.example.com/hive> DROP PROCEDURE IF EXISTS RM_LINKID > No rows affected (0.002 seconds) > 0: jdbc:mysql://node-1.example.com/hive> CREATE PROCEDURE RM_TLBS_LINKID() > BEGIN IF EXISTS (SELECT * FROM `INFORMATION_SCHEMA`.`COLUMNS` WHERE > `TABLE_NAME` = 'TBLS' AND `COLUMN_NAME` = 'LINK_TARGET_ID') THEN ALTER TABLE > `TBLS` DROP 
FOREIGN KEY `TBLS_FK3` ; ALTER TABLE `TBLS` DROP KEY `TBLS_N51` ; > ALTER TABLE `TBLS` DROP COLUMN `LINK_TARGET_ID` ; END IF; END > Error: You have an error in your SQL syntax; check the manual that > corresponds to your MySQL server version for the right syntax to use near '' > at line 1 (state=42000,code=1064) > Closing: 0: jdbc:mysql://node-1.example.com/hive?createDatabaseIfNotExist=true > org.apache.hadoop.hive.metastore.HiveMetaException: Upgrade FAILED! Metastore > state would be inconsistent !! > org.apache.hadoop.hive.metastore.HiveMetaException: Upgrade FAILED! Metastore > state would be inconsistent !! > at > org.apache.hive.beeline.HiveSchemaTool.doUpgrade(HiveSchemaTool.java:229) > at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:468) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.run(RunJar.java:221) > at org.apache.hadoop.util.RunJar.main(RunJar.java:136) > Caused by: java.io.IOException: Schema script failed, errorcode 2 > at > org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:355) > at > org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:326) > at > org.apache.hive.beeline.HiveSchemaTool.doUpgrade(HiveSchemaTool.java:224) > {code} > Looks like HIVE-7018 has introduced stored procedure as part of mysql upgrade > script and it is causing issues with schematool upgrade. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
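The delimiter problem described in the comment above can be shown with a minimal sketch. This is NOT Beeline's actual implementation; it is an assumed simplification of any client that treats every ";" as a command terminator. A MySQL stored procedure body legitimately contains ";" between its inner statements, so such a client cuts the single CREATE PROCEDURE into fragments and the first fragment reaches the server without its closing END.

```java
import java.util.Arrays;

// Naive ';'-based command splitting applied to a stored procedure
// (procedure text shortened from the RM_TLBS_LINKID example above).
// One logical statement becomes three fragments; the first ends mid-body,
// which matches the "error in your SQL syntax ... near ''" in the log.
public class SemicolonSplitSketch {
    public static void main(String[] args) {
        String proc = "CREATE PROCEDURE RM_TLBS_LINKID() BEGIN "
                    + "ALTER TABLE `TBLS` DROP COLUMN `LINK_TARGET_ID` ; "
                    + "END IF; END";
        String[] fragments = proc.split(";");
        System.out.println(fragments.length);   // 3 fragments from 1 statement
        System.out.println(Arrays.toString(fragments));
    }
}
```

This is also why the MySQL command-line client requires redefining the delimiter (e.g. DELIMITER $$) before defining procedures, an escape hatch Beeline does not offer here.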
[jira] [Commented] (HIVE-10604) update webhcat-default.xml with 1.2 version numbers
[ https://issues.apache.org/jira/browse/HIVE-10604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529633#comment-14529633 ] Hive QA commented on HIVE-10604: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12730316/HIVE-10604.patch {color:red}ERROR:{color} -1 due to 24 failed/errored test(s), 8900 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_parts org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_unencrypted_tbl org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_with_different_encryption_keys org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_load_data_to_encrypted_tables org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_select_read_only_encrypted_tbl org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_disallow_transform org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_droppartition org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_sba_drop_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_alterpart_loc org.apache.hadoop.hive.ql.security.TestStorageBasedClientSideAuthorizationProvider.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropDatabase org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropPartition 
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropTable org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropView org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProvider.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProviderWithACL.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbFailure org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbSuccess org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableFailure org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableSuccess org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessing org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessingCustomSetWhitelistAppend {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3743/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3743/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3743/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 24 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12730316 - PreCommit-HIVE-TRUNK-Build > update webhcat-default.xml with 1.2 version numbers > --- > > Key: HIVE-10604 > URL: https://issues.apache.org/jira/browse/HIVE-10604 > Project: Hive > Issue Type: Bug > Components: WebHCat >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Minor > Fix For: 1.2.0 > > Attachments: HIVE-10604.patch > > > no precommit tests -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10616) TypeInfoUtils doesn't handle DECIMAL with just precision specified
[ https://issues.apache.org/jira/browse/HIVE-10616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Friedrich updated HIVE-10616: Attachment: HIVE-10616.1.patch > TypeInfoUtils doesn't handle DECIMAL with just precision specified > -- > > Key: HIVE-10616 > URL: https://issues.apache.org/jira/browse/HIVE-10616 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Affects Versions: 1.0.0 >Reporter: Thomas Friedrich >Assignee: Thomas Friedrich >Priority: Minor > Attachments: HIVE-10616.1.patch > > > The parseType method in TypeInfoUtils doesn't handle decimal types with just > precision specified although that's a valid type definition. > As a result, TypeInfoUtils.getTypeInfoFromTypeString will always return > decimal(10,0) for any decimal() string. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
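The gap described above is in type-string parsing: "decimal(5)" is a legal type definition (scale defaults to 0), but a parser that only matches the two-parameter form falls back to decimal(10,0). A hedged sketch of what handling an optional scale looks like; this is a simplified stand-in, not the actual TypeInfoUtils code:

```python
import re

def parse_decimal(type_string, default_precision=10, default_scale=0):
    """Parse 'decimal', 'decimal(p)', or 'decimal(p,s)'.
    Scale is optional and defaults to 0, precision defaults to 10."""
    m = re.fullmatch(r"decimal(?:\((\d+)(?:,(\d+))?\))?",
                     type_string.strip().lower())
    if not m:
        raise ValueError("not a decimal type: " + type_string)
    precision = int(m.group(1)) if m.group(1) else default_precision
    scale = int(m.group(2)) if m.group(2) else default_scale
    return precision, scale
```

With this shape, parse_decimal("decimal(5)") yields precision 5 with scale 0 instead of silently degrading to the (10,0) default.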
[jira] [Commented] (HIVE-6679) HiveServer2 should support configurable the server side socket timeout and keepalive for various transports types where applicable
[ https://issues.apache.org/jira/browse/HIVE-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529631#comment-14529631 ] Thejas M Nair commented on HIVE-6679: - +1. Just a minor comment. Can you also update the description of HIVE_SERVER2_TCP_SOCKET_BLOCKING_TIMEOUT to say that it's applicable only in binary mode, and for http mode, the equivalent is hive.server2.thrift.http.max.idle.time? > HiveServer2 should support configurable the server side socket timeout and > keepalive for various transports types where applicable > -- > > Key: HIVE-6679 > URL: https://issues.apache.org/jira/browse/HIVE-6679 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.2.0, 1.1.0 >Reporter: Prasad Mujumdar >Assignee: Navis > Labels: TODOC1.0, TODOC15 > Fix For: 1.2.0 > > Attachments: HIVE-6679.1.patch.txt, HIVE-6679.2.patch.txt, > HIVE-6679.3.patch, HIVE-6679.4.patch, HIVE-6679.5.patch, HIVE-6679.6.patch > > > HiveServer2 should support configurable the server side socket read timeout > and TCP keep-alive option. Metastore server already support this (and the so > is the old hive server). > We now have multiple client connectivity options like Kerberos, Delegation > Token (Digest-MD5), Plain SASL, Plain SASL with SSL and raw sockets. The > configuration should be applicable to all types (if possible). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName
[ https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529626#comment-14529626 ] Pengcheng Xiong commented on HIVE-9392: --- rename the patch to get QA run. > JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to > column names having duplicated fqColumnName > > > Key: HIVE-9392 > URL: https://issues.apache.org/jira/browse/HIVE-9392 > Project: Hive > Issue Type: Bug > Components: Physical Optimizer >Affects Versions: 0.14.0 >Reporter: Mostafa Mokhtar >Assignee: Prasanth Jayachandran >Priority: Critical > Attachments: HIVE-9392.1.patch, HIVE-9392.2.patch, HIVE-9392.3.patch > > > In JoinStatsRule.process the join column statistics are stored in HashMap > joinedColStats, the key used which is the ColStatistics.fqColName is > duplicated between join column in the same vertex, as a result distinctVals > ends up having duplicated values which negatively affects the join > cardinality estimation. > The duplicate keys are usually named KEY.reducesinkkey0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
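The collision pattern described in the issue is easy to reproduce: when two join columns in the same vertex both map to a key like "KEY.reducesinkkey0", a plain map silently overwrites one column's NDV with the other's. A small illustration with hypothetical numbers (not Hive's actual stats code):

```python
# Column stats keyed by fully-qualified column name, as in JoinStatsRule.
# Two different join columns in the same vertex collide on one fqColName.
joined_col_stats = {}
joined_col_stats["KEY.reducesinkkey0"] = 1000000  # NDV of the first join column
joined_col_stats["KEY.reducesinkkey0"] = 42       # second column clobbers the first

distinct_vals = list(joined_col_stats.values())
# Only one NDV survives, so downstream cardinality math uses the wrong value.
```

Disambiguating the key (e.g. by prefixing the operator or vertex id) keeps both entries and restores a sane distinctVals list.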
[jira] [Updated] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName
[ https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-9392: -- Attachment: HIVE-9392.3.patch > JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to > column names having duplicated fqColumnName > > > Key: HIVE-9392 > URL: https://issues.apache.org/jira/browse/HIVE-9392 > Project: Hive > Issue Type: Bug > Components: Physical Optimizer >Affects Versions: 0.14.0 >Reporter: Mostafa Mokhtar >Assignee: Prasanth Jayachandran >Priority: Critical > Attachments: HIVE-9392.1.patch, HIVE-9392.2.patch, HIVE-9392.3.patch > > > In JoinStatsRule.process the join column statistics are stored in HashMap > joinedColStats, the key used which is the ColStatistics.fqColName is > duplicated between join column in the same vertex, as a result distinctVals > ends up having duplicated values which negatively affects the join > cardinality estimation. > The duplicate keys are usually named KEY.reducesinkkey0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName
[ https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-9392: -- Attachment: (was: HIVE-9392.01.patch) > JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to > column names having duplicated fqColumnName > > > Key: HIVE-9392 > URL: https://issues.apache.org/jira/browse/HIVE-9392 > Project: Hive > Issue Type: Bug > Components: Physical Optimizer >Affects Versions: 0.14.0 >Reporter: Mostafa Mokhtar >Assignee: Prasanth Jayachandran >Priority: Critical > Attachments: HIVE-9392.1.patch, HIVE-9392.2.patch, HIVE-9392.3.patch > > > In JoinStatsRule.process the join column statistics are stored in HashMap > joinedColStats, the key used which is the ColStatistics.fqColName is > duplicated between join column in the same vertex, as a result distinctVals > ends up having duplicated values which negatively affects the join > cardinality estimation. > The duplicate keys are usually named KEY.reducesinkkey0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName
[ https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529623#comment-14529623 ] Pengcheng Xiong commented on HIVE-9392: --- [~mmokhtar], could you please take a look? Thanks. > JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to > column names having duplicated fqColumnName > > > Key: HIVE-9392 > URL: https://issues.apache.org/jira/browse/HIVE-9392 > Project: Hive > Issue Type: Bug > Components: Physical Optimizer >Affects Versions: 0.14.0 >Reporter: Mostafa Mokhtar >Assignee: Prasanth Jayachandran >Priority: Critical > Attachments: HIVE-9392.01.patch, HIVE-9392.1.patch, HIVE-9392.2.patch > > > In JoinStatsRule.process the join column statistics are stored in HashMap > joinedColStats, the key used which is the ColStatistics.fqColName is > duplicated between join column in the same vertex, as a result distinctVals > ends up having duplicated values which negatively affects the join > cardinality estimation. > The duplicate keys are usually named KEY.reducesinkkey0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName
[ https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-9392: -- Attachment: HIVE-9392.01.patch After discussing with [~jpullokkaran], we assume that this patch will solve the problem. And we already tried TPCDS 70,89 to confirm > JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to > column names having duplicated fqColumnName > > > Key: HIVE-9392 > URL: https://issues.apache.org/jira/browse/HIVE-9392 > Project: Hive > Issue Type: Bug > Components: Physical Optimizer >Affects Versions: 0.14.0 >Reporter: Mostafa Mokhtar >Assignee: Prasanth Jayachandran >Priority: Critical > Attachments: HIVE-9392.01.patch, HIVE-9392.1.patch, HIVE-9392.2.patch > > > In JoinStatsRule.process the join column statistics are stored in HashMap > joinedColStats, the key used which is the ColStatistics.fqColName is > duplicated between join column in the same vertex, as a result distinctVals > ends up having duplicated values which negatively affects the join > cardinality estimation. > The duplicate keys are usually named KEY.reducesinkkey0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9534) incorrect result set for query that projects a windowed aggregate
[ https://issues.apache.org/jira/browse/HIVE-9534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529608#comment-14529608 ] N Campbell commented on HIVE-9534: -- Re: your comment about Oracle: running "select avg(distinct tsint.csint) over () from tsint" against the values null, -1, 0, 1, 10, Oracle Database 12c Enterprise Edition (12.1.0.2.0) returns 2.5, 2.5, 2.5, 2.5, 2.5. > incorrect result set for query that projects a windowed aggregate > - > > Key: HIVE-9534 > URL: https://issues.apache.org/jira/browse/HIVE-9534 > Project: Hive > Issue Type: Bug > Components: SQL >Reporter: N Campbell >Assignee: Chaoyu Tang > > Result set returned by Hive has one row instead of 5 > {code} > select avg(distinct tsint.csint) over () from tsint > create table if not exists TSINT (RNUM int , CSINT smallint) > ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' > STORED AS TEXTFILE; > 0|\N > 1|-1 > 2|0 > 3|1 > 4|10 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
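The expected semantics are easy to check by hand: an unpartitioned OVER () window produces one output row per input row, each carrying the same avg(distinct) computed over the non-null values. A quick sketch with the tsint data from the issue:

```python
csint = [None, -1, 0, 1, 10]  # tsint.csint values from the issue

# avg(distinct) ignores NULLs and duplicates.
non_null_distinct = set(v for v in csint if v is not None)
avg_distinct = sum(non_null_distinct) / len(non_null_distinct)

# An unpartitioned OVER () window repeats the aggregate on every input row,
# so the result set has 5 rows, not 1.
result = [avg_distinct for _ in csint]
```

This matches the Oracle behavior quoted above (2.5 repeated five times), while the buggy Hive result collapses to a single row.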
[jira] [Updated] (HIVE-10213) MapReduce jobs using dynamic-partitioning fail on commit.
[ https://issues.apache.org/jira/browse/HIVE-10213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-10213: Attachment: MapRedExample.java Thanks for the commit, [~sushanth]. I just verified this fix again with the YHive-13 and trunk, using the attached program. (The code reshuffles the specified 'source' data into a differently partitioned 'target' table, using dynamic partitioning.) Here's the exception-trace for the bug. I've verified that the partitioning happens correctly with this patch applied: {code} Error: java.io.IOException: No callback registered for TaskAttemptID:attempt_1428474791204_201112_m_00_0@hdfs://crystalmyth.myth.net:8020/tmp/myth/mythdb/foobar_partitioned_dt_grid/_DYN0.6055391511914422/dt=__HIVE_DEFAULT_PARTITION__/grid=__HIVE_DEFAULT_PARTITION__ at org.apache.hive.hcatalog.mapreduce.TaskCommitContextRegistry.commitTask(TaskCommitContextRegistry.java:74) at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.commitTask(FileOutputCommitterContainer.java:143) at org.apache.hadoop.mapred.Task.commit(Task.java:1163) at org.apache.hadoop.mapred.Task.done(Task.java:1025) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:345) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) {code} > MapReduce jobs using dynamic-partitioning fail on commit. 
> - > > Key: HIVE-10213 > URL: https://issues.apache.org/jira/browse/HIVE-10213 > Project: Hive > Issue Type: Bug > Components: HCatalog >Reporter: Mithun Radhakrishnan >Assignee: Mithun Radhakrishnan > Fix For: 1.2.0 > > Attachments: HIVE-10213.1.patch, MapRedExample.java > > > I recently ran into a problem in {{TaskCommitContextRegistry}}, when using > dynamic-partitions. > Consider a MapReduce program that reads HCatRecords from a table (using > HCatInputFormat), and then writes to another table (with identical schema), > using HCatOutputFormat. The Map-task fails with the following exception: > {code} > Error: java.io.IOException: No callback registered for > TaskAttemptID:attempt_1426589008676_509707_m_00_0@hdfs://crystalmyth.myth.net:8020/user/mithunr/mythdb/target/_DYN0.6784154320609959/grid=__HIVE_DEFAULT_PARTITION__/dt=__HIVE_DEFAULT_PARTITION__ > at > org.apache.hive.hcatalog.mapreduce.TaskCommitContextRegistry.commitTask(TaskCommitContextRegistry.java:56) > at > org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.commitTask(FileOutputCommitterContainer.java:139) > at org.apache.hadoop.mapred.Task.commit(Task.java:1163) > at org.apache.hadoop.mapred.Task.done(Task.java:1025) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:345) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > {code} > {{TaskCommitContextRegistry::commitTask()}} uses call-backs registered from > {{DynamicPartitionFileRecordWriter}}. But in case {{HCatInputFormat}} and > {{HCatOutputFormat}} are both used in the same job, the > {{DynamicPartitionFileRecordWriter}} might only be exercised in the Reducer. 
> I'm relaxing the IOException and logging a warning message instead of just > failing. > (I'll post the fix shortly.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
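The fix described in the issue (relax the IOException into a warning when no callback was registered for a task attempt) can be sketched as a lookup that degrades gracefully. This is a simplified stand-in for TaskCommitContextRegistry, not the real class:

```python
import logging

log = logging.getLogger("TaskCommitContextRegistry")

class TaskCommitContextRegistry:
    """Simplified sketch: commit callbacks registered per task-attempt key."""
    def __init__(self):
        self._callbacks = {}

    def register(self, key, callback):
        self._callbacks[key] = callback

    def commit_task(self, key):
        callback = self._callbacks.get(key)
        if callback is None:
            # Before the fix this raised "No callback registered for ...";
            # the patch logs a warning and continues instead, since a task
            # that never exercised the dynamic-partition writer has nothing
            # to commit.
            log.warning("No callback registered for %s; skipping commit", key)
            return False
        callback()
        return True

registry = TaskCommitContextRegistry()
committed = []
registry.register("attempt_1_m_00_0", lambda: committed.append("attempt_1_m_00_0"))
ok = registry.commit_task("attempt_1_m_00_0")
missing = registry.commit_task("attempt_2_m_00_0")  # no callback: warns, no failure
```

The key names here are hypothetical; the point is only that a missing registration becomes a no-op with a warning rather than a job-killing exception.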
[jira] [Updated] (HIVE-6679) HiveServer2 should support configurable the server side socket timeout and keepalive for various transports types where applicable
[ https://issues.apache.org/jira/browse/HIVE-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-6679: --- Fix Version/s: (was: 1.1.0) 1.2.0 > HiveServer2 should support configurable the server side socket timeout and > keepalive for various transports types where applicable > -- > > Key: HIVE-6679 > URL: https://issues.apache.org/jira/browse/HIVE-6679 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.2.0, 1.1.0 >Reporter: Prasad Mujumdar >Assignee: Navis > Labels: TODOC1.0, TODOC15 > Fix For: 1.2.0 > > Attachments: HIVE-6679.1.patch.txt, HIVE-6679.2.patch.txt, > HIVE-6679.3.patch, HIVE-6679.4.patch, HIVE-6679.5.patch, HIVE-6679.6.patch > > > HiveServer2 should support configurable the server side socket read timeout > and TCP keep-alive option. Metastore server already support this (and the so > is the old hive server). > We now have multiple client connectivity options like Kerberos, Delegation > Token (Digest-MD5), Plain SASL, Plain SASL with SSL and raw sockets. The > configuration should be applicable to all types (if possible). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-6679) HiveServer2 should support configurable the server side socket timeout and keepalive for various transports types where applicable
[ https://issues.apache.org/jira/browse/HIVE-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-6679: --- Affects Version/s: 1.1.0 1.0.0 > HiveServer2 should support configurable the server side socket timeout and > keepalive for various transports types where applicable > -- > > Key: HIVE-6679 > URL: https://issues.apache.org/jira/browse/HIVE-6679 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.2.0, 1.1.0 >Reporter: Prasad Mujumdar >Assignee: Navis > Labels: TODOC1.0, TODOC15 > Fix For: 1.2.0 > > Attachments: HIVE-6679.1.patch.txt, HIVE-6679.2.patch.txt, > HIVE-6679.3.patch, HIVE-6679.4.patch, HIVE-6679.5.patch, HIVE-6679.6.patch > > > HiveServer2 should support configurable the server side socket read timeout > and TCP keep-alive option. Metastore server already support this (and the so > is the old hive server). > We now have multiple client connectivity options like Kerberos, Delegation > Token (Digest-MD5), Plain SASL, Plain SASL with SSL and raw sockets. The > configuration should be applicable to all types (if possible). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-6679) HiveServer2 should support configurable the server side socket timeout and keepalive for various transports types where applicable
[ https://issues.apache.org/jira/browse/HIVE-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-6679: --- Affects Version/s: 1.2.0 > HiveServer2 should support configurable the server side socket timeout and > keepalive for various transports types where applicable > -- > > Key: HIVE-6679 > URL: https://issues.apache.org/jira/browse/HIVE-6679 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.2.0, 1.1.0 >Reporter: Prasad Mujumdar >Assignee: Navis > Labels: TODOC1.0, TODOC15 > Fix For: 1.2.0 > > Attachments: HIVE-6679.1.patch.txt, HIVE-6679.2.patch.txt, > HIVE-6679.3.patch, HIVE-6679.4.patch, HIVE-6679.5.patch, HIVE-6679.6.patch > > > HiveServer2 should support configurable the server side socket read timeout > and TCP keep-alive option. Metastore server already support this (and the so > is the old hive server). > We now have multiple client connectivity options like Kerberos, Delegation > Token (Digest-MD5), Plain SASL, Plain SASL with SSL and raw sockets. The > configuration should be applicable to all types (if possible). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-10547) CBO (Calcite Return Path) : genFileSinkPlan uses wrong partition col to create FS
[ https://issues.apache.org/jira/browse/HIVE-10547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong resolved HIVE-10547. Resolution: Fixed > CBO (Calcite Return Path) : genFileSinkPlan uses wrong partition col to > create FS > - > > Key: HIVE-10547 > URL: https://issues.apache.org/jira/browse/HIVE-10547 > Project: Hive > Issue Type: Sub-task > Components: CBO >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Fix For: 1.2.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (HIVE-10564) webhcat should use webhcat-site.xml properties for controller job submission
[ https://issues.apache.org/jira/browse/HIVE-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman reopened HIVE-10564: --- Unfortunately this has unexpected side effects. Every time a job is submitted, various properties are passed on the command line using -Dfoo=bar. This change causes the AppConfig Configuration object to accumulate the union of all these properties, so Job N+1 includes properties that belong to previous jobs. For example, if you run a job with "-D templeton.statusdir=TestSqoop_1" and then another job that does not specify "statusdir", the 2nd job will write to TestSqoop_1. This will cause a major problem. > webhcat should use webhcat-site.xml properties for controller job submission > > > Key: HIVE-10564 > URL: https://issues.apache.org/jira/browse/HIVE-10564 > Project: Hive > Issue Type: Bug >Reporter: Thejas M Nair >Assignee: Thejas M Nair > Labels: TODOC1.2 > Fix For: 1.2.0 > > Attachments: HIVE-10564.1.patch > > > webhcat should use webhcat-site.xml in configuration for the > TempletonController map-only job that it launches. This will allow users to > set any MR/hdfs properties that they want to see used for the controller job. > NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10526) CBO (Calcite Return Path): HiveCost epsilon comparison should take row count in to account
[ https://issues.apache.org/jira/browse/HIVE-10526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-10526: Attachment: HIVE-10526.2.patch Reuploading .1.patch as .2.patch > CBO (Calcite Return Path): HiveCost epsilon comparison should take row count > in to account > -- > > Key: HIVE-10526 > URL: https://issues.apache.org/jira/browse/HIVE-10526 > Project: Hive > Issue Type: Sub-task > Components: CBO >Affects Versions: 0.12.0 >Reporter: Laljo John Pullokkaran >Assignee: Laljo John Pullokkaran > Fix For: 1.2.0 > > Attachments: HIVE-10526.1.patch, HIVE-10526.2.patch, HIVE-10526.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10526) CBO (Calcite Return Path): HiveCost epsilon comparison should take row count in to account
[ https://issues.apache.org/jira/browse/HIVE-10526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529584#comment-14529584 ] Sushanth Sowmyan commented on HIVE-10526: - I don't see this picked up in the test commit queue, and it's possible it'll fail out saying it's already processed this file, so I'm going to re-upload .1.patch as .2.patch and manually submit this into the queue. > CBO (Calcite Return Path): HiveCost epsilon comparison should take row count > in to account > -- > > Key: HIVE-10526 > URL: https://issues.apache.org/jira/browse/HIVE-10526 > Project: Hive > Issue Type: Sub-task > Components: CBO >Affects Versions: 0.12.0 >Reporter: Laljo John Pullokkaran >Assignee: Laljo John Pullokkaran > Fix For: 1.2.0 > > Attachments: HIVE-10526.1.patch, HIVE-10526.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9451) Add max size of column dictionaries to ORC metadata
[ https://issues.apache.org/jira/browse/HIVE-9451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529580#comment-14529580 ] Sushanth Sowmyan commented on HIVE-9451: Okay, thanks for the update. Will wait to hear more. :) > Add max size of column dictionaries to ORC metadata > --- > > Key: HIVE-9451 > URL: https://issues.apache.org/jira/browse/HIVE-9451 > Project: Hive > Issue Type: Improvement >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Labels: ORC > Fix For: 1.2.0 > > Attachments: HIVE-9451.patch, HIVE-9451.patch > > > To predict the amount of memory required to read an ORC file we need to know > the size of the dictionaries for the columns that we are reading. I propose > adding the number of bytes for each column's dictionary to the stripe's > column statistics. The file's column statistics would have the maximum > dictionary size for each column. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10565) LLAP: Native Vector Map Join doesn't handle filtering and matching on LEFT OUTER JOIN repeated key correctly
[ https://issues.apache.org/jira/browse/HIVE-10565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529578#comment-14529578 ] Vikram Dixit K commented on HIVE-10565: --- I am reviewing this one. > LLAP: Native Vector Map Join doesn't handle filtering and matching on LEFT > OUTER JOIN repeated key correctly > > > Key: HIVE-10565 > URL: https://issues.apache.org/jira/browse/HIVE-10565 > Project: Hive > Issue Type: Sub-task > Components: Hive >Affects Versions: 1.2.0 >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Fix For: 1.2.0, 1.3.0 > > Attachments: HIVE-10565.01.patch, HIVE-10565.02.patch, > HIVE-10565.03.patch, HIVE-10565.04.patch, HIVE-10565.05.patch, > HIVE-10565.06.patch > > > Filtering can knock out some of the rows for a repeated key, but those > knocked out rows need to be included in the LEFT OUTER JOIN result and are > currently not when only some rows are filtered out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10565) LLAP: Native Vector Map Join doesn't handle filtering and matching on LEFT OUTER JOIN repeated key correctly
[ https://issues.apache.org/jira/browse/HIVE-10565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529572#comment-14529572 ] Sushanth Sowmyan commented on HIVE-10565: - Hi Matt, who would be the ideal person to review this patch? > LLAP: Native Vector Map Join doesn't handle filtering and matching on LEFT > OUTER JOIN repeated key correctly > > > Key: HIVE-10565 > URL: https://issues.apache.org/jira/browse/HIVE-10565 > Project: Hive > Issue Type: Sub-task > Components: Hive >Affects Versions: 1.2.0 >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Fix For: 1.2.0, 1.3.0 > > Attachments: HIVE-10565.01.patch, HIVE-10565.02.patch, > HIVE-10565.03.patch, HIVE-10565.04.patch, HIVE-10565.05.patch, > HIVE-10565.06.patch > > > Filtering can knock out some of the rows for a repeated key, but those > knocked out rows need to be included in the LEFT OUTER JOIN result and are > currently not when only some rows are filtered out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9451) Add max size of column dictionaries to ORC metadata
[ https://issues.apache.org/jira/browse/HIVE-9451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529570#comment-14529570 ] Prasanth Jayachandran commented on HIVE-9451: - No.. The test failures look related. [~owen.omalley] Can you take a look at the test failures? I am assuming all these are related to file size differences. > Add max size of column dictionaries to ORC metadata > --- > > Key: HIVE-9451 > URL: https://issues.apache.org/jira/browse/HIVE-9451 > Project: Hive > Issue Type: Improvement >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Labels: ORC > Fix For: 1.2.0 > > Attachments: HIVE-9451.patch, HIVE-9451.patch > > > To predict the amount of memory required to read an ORC file we need to know > the size of the dictionaries for the columns that we are reading. I propose > adding the number of bytes for each column's dictionary to the stripe's > column statistics. The file's column statistics would have the maximum > dictionary size for each column. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9451) Add max size of column dictionaries to ORC metadata
[ https://issues.apache.org/jira/browse/HIVE-9451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529566#comment-14529566 ] Sushanth Sowmyan commented on HIVE-9451: Hi, given the previous +1 pending tests, and tests having run, do the tests look okay to commit? > Add max size of column dictionaries to ORC metadata > --- > > Key: HIVE-9451 > URL: https://issues.apache.org/jira/browse/HIVE-9451 > Project: Hive > Issue Type: Improvement >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Labels: ORC > Fix For: 1.2.0 > > Attachments: HIVE-9451.patch, HIVE-9451.patch > > > To predict the amount of memory required to read an ORC file we need to know > the size of the dictionaries for the columns that we are reading. I propose > adding the number of bytes for each column's dictionary to the stripe's > column statistics. The file's column statistics would have the maximum > dictionary size for each column. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9743) Incorrect result set for vectorized left outer join
[ https://issues.apache.org/jira/browse/HIVE-9743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529560#comment-14529560 ] Vikram Dixit K commented on HIVE-9743: -- That seems to be because, with SMB, there is full delegation to the base class. I am not sure if we need the SMB changes at all. > Incorrect result set for vectorized left outer join > --- > > Key: HIVE-9743 > URL: https://issues.apache.org/jira/browse/HIVE-9743 > Project: Hive > Issue Type: Bug > Components: SQL >Affects Versions: 0.14.0 >Reporter: N Campbell >Assignee: Matt McCline > Attachments: HIVE-9743.01.patch, HIVE-9743.02.patch, > HIVE-9743.03.patch, HIVE-9743.04.patch, HIVE-9743.05.patch, > HIVE-9743.06.patch, HIVE-9743.08.patch > > > This query is supposed to return 3 rows and will when run without Tez but > returns 2 rows when run with Tez. > select tjoin1.rnum, tjoin1.c1, tjoin1.c2, tjoin2.c2 as c2j2 from tjoin1 left > outer join tjoin2 on ( tjoin1.c1 = tjoin2.c1 and tjoin1.c2 > 15 ) > tjoin1.rnum tjoin1.c1 tjoin1.c2 c2j2 > 1 20 25 > 2 50 > instead of > tjoin1.rnum tjoin1.c1 tjoin1.c2 c2j2 > 0 10 15 > 1 20 25 > 2 50 > create table if not exists TJOIN1 (RNUM int , C1 int, C2 int) > STORED AS orc ; > 0|10|15 > 1|20|25 > 2|\N|50 > create table if not exists TJOIN2 (RNUM int , C1 int, C2 char(2)) > ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' > STORED AS TEXTFILE ; > 0|10|BB > 1|15|DD > 2|\N|EE > 3|10|FF -- This message was sent by Atlassian JIRA (v6.3.4#6332)
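The expected behaviour in the issue is standard LEFT OUTER JOIN semantics: a left row whose ON condition never matches (here because tjoin1.c2 > 15 filters it out, or because no key matches) must still appear once, NULL-extended. A plain-Python check using the tjoin1/tjoin2 data quoted above (a reference computation, not Hive's vectorized code path):

```python
tjoin1 = [(0, 10, 15), (1, 20, 25), (2, None, 50)]              # (rnum, c1, c2)
tjoin2 = [(0, 10, "BB"), (1, 15, "DD"), (2, None, "EE"), (3, 10, "FF")]

result = []
for rnum, c1, c2 in tjoin1:
    # ON clause: tjoin1.c1 = tjoin2.c1 AND tjoin1.c2 > 15
    matches = [r for r in tjoin2
               if c1 is not None and r[1] == c1
               and c2 is not None and c2 > 15]
    if matches:
        result.extend((rnum, c1, c2, r[2]) for r in matches)
    else:
        result.append((rnum, c1, c2, None))  # unmatched left row is preserved
```

For this data every left row is unmatched, so the correct result is all 3 rows NULL-extended; the bug drops the row where the ON-clause filter (c2 > 15) eliminated the match.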
[jira] [Assigned] (HIVE-8769) Physical optimizer : Incorrect CE results in a shuffle join instead of a Map join (PK/FK pattern not detected)
[ https://issues.apache.org/jira/browse/HIVE-8769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong reassigned HIVE-8769: - Assignee: Pengcheng Xiong (was: Prasanth Jayachandran) > Physical optimizer : Incorrect CE results in a shuffle join instead of a Map > join (PK/FK pattern not detected) > -- > > Key: HIVE-8769 > URL: https://issues.apache.org/jira/browse/HIVE-8769 > Project: Hive > Issue Type: Bug > Components: Physical Optimizer >Affects Versions: 0.14.0 >Reporter: Mostafa Mokhtar >Assignee: Pengcheng Xiong > Fix For: 1.2.0 > > > TPC-DS Q82 is running slower than hive 13 because the join type is not > correct. > The estimate for item x inventory x date_dim is 227 Million rows while the > actual is 3K rows. > Hive 13 finishes in 753 seconds. > Hive 14 finishes in 1,267 seconds. > Hive 14 + force map join finished in 431 seconds. > Query > {code} > select i_item_id >,i_item_desc >,i_current_price > from item, inventory, date_dim, store_sales > where i_current_price between 30 and 30+30 > and inv_item_sk = i_item_sk > and d_date_sk=inv_date_sk > and d_date between '2002-05-30' and '2002-07-30' > and i_manufact_id in (437,129,727,663) > and inv_quantity_on_hand between 100 and 500 > and ss_item_sk = i_item_sk > group by i_item_id,i_item_desc,i_current_price > order by i_item_id > limit 100 > {code} > Plan > {code} > STAGE PLANS: > Stage: Stage-1 > Tez > Edges: > Map 7 <- Map 1 (BROADCAST_EDGE), Map 2 (BROADCAST_EDGE) > Reducer 4 <- Map 3 (SIMPLE_EDGE), Map 7 (SIMPLE_EDGE) > Reducer 5 <- Reducer 4 (SIMPLE_EDGE) > Reducer 6 <- Reducer 5 (SIMPLE_EDGE) > DagName: mmokhtar_20141106005353_7a2eb8df-12ff-4fe9-89b4-30f1e4e3fb90:1 > Vertices: > Map 1 > Map Operator Tree: > TableScan > alias: item > filterExpr: ((i_current_price BETWEEN 30 AND 60 and > (i_manufact_id) IN (437, 129, 727, 663)) and i_item_sk is not null) (type: > boolean) > Statistics: Num rows: 462000 Data size: 663862160 Basic > stats: COMPLETE Column stats: COMPLETE > 
Filter Operator > predicate: ((i_current_price BETWEEN 30 AND 60 and > (i_manufact_id) IN (437, 129, 727, 663)) and i_item_sk is not null) (type: > boolean) > Statistics: Num rows: 115500 Data size: 34185680 Basic > stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: i_item_sk (type: int), i_item_id (type: > string), i_item_desc (type: string), i_current_price (type: float) > outputColumnNames: _col0, _col1, _col2, _col3 > Statistics: Num rows: 115500 Data size: 33724832 Basic > stats: COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: _col0 (type: int) > sort order: + > Map-reduce partition columns: _col0 (type: int) > Statistics: Num rows: 115500 Data size: 33724832 > Basic stats: COMPLETE Column stats: COMPLETE > value expressions: _col1 (type: string), _col2 (type: > string), _col3 (type: float) > Execution mode: vectorized > Map 2 > Map Operator Tree: > TableScan > alias: date_dim > filterExpr: (d_date BETWEEN '2002-05-30' AND '2002-07-30' > and d_date_sk is not null) (type: boolean) > Statistics: Num rows: 73049 Data size: 81741831 Basic > stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: (d_date BETWEEN '2002-05-30' AND '2002-07-30' > and d_date_sk is not null) (type: boolean) > Statistics: Num rows: 36524 Data size: 3579352 Basic > stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: d_date_sk (type: int) > outputColumnNames: _col0 > Statistics: Num rows: 36524 Data size: 146096 Basic > stats: COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: _col0 (type: int) > sort order: + > Map-reduce partition columns: _col0 (type: int) > Statistics: Num rows: 36524 Data size: 146096 Basic > stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: _col0 (type: int) > out
[jira] [Commented] (HIVE-10526) CBO (Calcite Return Path): HiveCost epsilon comparison should take row count in to account
[ https://issues.apache.org/jira/browse/HIVE-10526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529502#comment-14529502 ] Laljo John Pullokkaran commented on HIVE-10526: --- I uploaded a modified patch last week; for some reason the QA run didn't kick in. > CBO (Calcite Return Path): HiveCost epsilon comparison should take row count > in to account > -- > > Key: HIVE-10526 > URL: https://issues.apache.org/jira/browse/HIVE-10526 > Project: Hive > Issue Type: Sub-task > Components: CBO >Affects Versions: 0.12.0 >Reporter: Laljo John Pullokkaran >Assignee: Laljo John Pullokkaran > Fix For: 1.2.0 > > Attachments: HIVE-10526.1.patch, HIVE-10526.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
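For context, the kind of comparison this issue's title describes (treat two costs as equal when they differ by less than an epsilon, and then use row count to break the tie) could be sketched as below. This is a hedged illustration only: the real HiveCost class is not shown, and the `isCheaper` helper and its signature are hypothetical.

```java
public class CostCompare {
    static final double EPSILON = 1e-5;

    /**
     * True if plan a is strictly cheaper than plan b. Costs within a relative
     * epsilon of each other are considered equal, and in that case the plan
     * producing fewer rows wins - so row count is taken into account instead
     * of being ignored by the epsilon check.
     */
    static boolean isCheaper(double aCost, double aRows, double bCost, double bRows) {
        double diff = aCost - bCost;
        double scale = Math.max(1.0, Math.max(Math.abs(aCost), Math.abs(bCost)));
        if (Math.abs(diff) > EPSILON * scale) {
            return diff < 0;  // costs differ meaningfully
        }
        return aRows < bRows; // "equal" cost: tie-break on row count
    }

    public static void main(String[] args) {
        // Nearly identical costs: the plan with fewer rows is preferred.
        System.out.println(isCheaper(100.0, 10, 100.0000001, 50)); // true
        // Clearly more expensive plan loses regardless of row count.
        System.out.println(isCheaper(200.0, 10, 100.0, 50));       // false
    }
}
```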
[jira] [Commented] (HIVE-8769) Physical optimizer : Incorrect CE results in a shuffle join instead of a Map join (PK/FK pattern not detected)
[ https://issues.apache.org/jira/browse/HIVE-8769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529498#comment-14529498 ] Pengcheng Xiong commented on HIVE-8769: --- [~mohammedmostafa], I ran it with 1GB TPCDS and it seems that the PKFK is detected, although my plan is different from yours. {code} Vertex dependency in root stage Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 5 (SIMPLE_EDGE), Map 6 (SIMPLE_EDGE) Reducer 3 <- Reducer 2 (SIMPLE_EDGE) Reducer 4 <- Reducer 3 (SIMPLE_EDGE) Map 6 <- Map 7 (BROADCAST_EDGE) Stage-0 Fetch Operator limit:100 Stage-1 Reducer 4 File Output Operator [FS_33] compressed:false Statistics:Num rows: 100 Data size: 39600 Basic stats: COMPLETE Column stats: COMPLETE table:{"serde:":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe","input format:":"org.apache.hadoop.mapred.TextInputFormat","output format:":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"} Limit [LIM_32] Number of rows:100 Statistics:Num rows: 100 Data size: 39600 Basic stats: COMPLETE Column stats: COMPLETE Select Operator [SEL_31] | outputColumnNames:["_col0","_col1","_col2"] | Statistics:Num rows: 142657470 Data size: 56492358120 Basic stats: COMPLETE Column stats: COMPLETE |<-Reducer 3 [SIMPLE_EDGE] Reduce Output Operator [RS_30] key expressions:_col0 (type: string) sort order:+ Statistics:Num rows: 142657470 Data size: 56492358120 Basic stats: COMPLETE Column stats: COMPLETE value expressions:_col1 (type: string), _col2 (type: decimal(7,2)) Group By Operator [GBY_28] | keys:KEY._col0 (type: string), KEY._col1 (type: string), KEY._col2 (type: decimal(7,2)) | outputColumnNames:["_col0","_col1","_col2"] | Statistics:Num rows: 142657470 Data size: 56492358120 Basic stats: COMPLETE Column stats: COMPLETE |<-Reducer 2 [SIMPLE_EDGE] Reduce Output Operator [RS_27] key expressions:_col0 (type: string), _col1 (type: string), _col2 (type: decimal(7,2)) Map-reduce partition columns:_col0 (type: string), _col1 (type: string), 
_col2 (type: decimal(7,2)) sort order:+++ Statistics:Num rows: 142657470 Data size: 56492358120 Basic stats: COMPLETE Column stats: COMPLETE Group By Operator [GBY_26] keys:_col0 (type: string), _col1 (type: string), _col2 (type: decimal(7,2)) outputColumnNames:["_col0","_col1","_col2"] Statistics:Num rows: 142657470 Data size: 56492358120 Basic stats: COMPLETE Column stats: COMPLETE Select Operator [SEL_24] outputColumnNames:["_col0","_col1","_col2"] Statistics:Num rows: 142657470 Data size: 56492358120 Basic stats: COMPLETE Column stats: COMPLETE Merge Join Operator [MERGEJOIN_49] | condition map:[{"":"Inner Join 0 to 1"},{"":"Inner Join 1 to 2"}] | keys:{"2":"_col1 (type: int)","1":"_col0 (type: int)","0":"_col0 (type: int)"} | outputColumnNames:["_col2","_col3","_col4"] | Statistics:Num rows: 142657470 Data size: 56492358120 Basic stats: COMPLETE Column stats: COMPLETE |<-Map 1 [SIMPLE_EDGE] | Reduce Output Operator [RS_18] | key expressions:_col0 (type: int) | Map-reduce partition columns:_col0 (type: int) | sort order:+ | Statistics:Num rows: 2880404 Data size: 11521616 Basic stats: COMPLETE Column stats: COMPLETE | Select Operator [SEL_1] |outputColumnNames:["_col0"] |Statistics:Num rows: 2880404 Data size: 11521616 Basic stats: COMPLETE Column stats: COMPLETE |Filter Operator [FIL_44] | predicate:ss_item_sk is not null (type: boolean) | Statistics:Num rows: 2880404 Data size: 11521616 Basic stats: COMPLETE Column stats: COMPLETE | TableScan [TS_0] | ali
[jira] [Commented] (HIVE-10482) LLAP: AsertionError cannot allocate when reading from orc
[ https://issues.apache.org/jira/browse/HIVE-10482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529493#comment-14529493 ] Sergey Shelukhin commented on HIVE-10482: - This happens when BuddyAllocator has one block of memory larger than the target allocation. When memory is reserved and several threads go to allocate, they start from the target size and then try to split larger sizes. If several threads try to split the block at the same time, one will split it and re-add the remainder to lower-level lists (e.g. the 768k left over from a 1Mb block, after using 256k, will be added as one 512k block and one 256k block), but by the time the split is done, the others are waiting on the lock for the 1Mb-block list and will never again look at the lower-level lists. There are several ways to fix this:
1. Add some sort of "helping" to get threads to provide blocks to other threads after a split. This is very complex (many special cases) and may have perf overhead in the common case; in the general case it also may not solve similar issues, e.g. with multiple arenas, where we examine full arena 1, then go to non-full arena 2, meanwhile someone allocates from 2 and deallocates to 1, so we are screwed again.
2. Make the allocator use an "actor-like" model (remove all sync and have an allocator thread that serves a request queue).
3. Use a retry loop that retries as long as any changes have happened since the last attempt.
Not sure yet whether 2 or 3 is best.
> LLAP: AsertionError cannot allocate when reading from orc > - > > Key: HIVE-10482 > URL: https://issues.apache.org/jira/browse/HIVE-10482 > Project: Hive > Issue Type: Sub-task >Reporter: Siddharth Seth >Assignee: Sergey Shelukhin > Fix For: llap > > > This was from a run of tpch query 1. [~sershe] - not sure if you've already > seen this. Creating a jira so that it doesn't get lost. 
> {code} > 2015-04-24 13:11:54,180 > [TezTaskRunner_attempt_1429683757595_0326_4_00_000199_0(container_1_0326_01_003216_sseth_20150424131137_8ec6200c-77c8-43ea-a6a3-a0ab1da6e1ac:4_Map > 1_199_0)] ERROR org.apache.hadoop.hive.ql.exec.tez.TezProcessor: > org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: > java.io.IOException: java.lang.AssertionError: Cannot allocate > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:74) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:314) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:329) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.IOException: java.io.IOException: > java.lang.AssertionError: Cannot allocate > at > 
org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:355) > at > org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79) > at > org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:137) >
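The retry-loop fix Sergey describes above (retry the whole free-list scan as long as any allocator state has changed since the last attempt, so a thread that missed a concurrently split block gets another look at the lower-level lists) could be sketched as below. This is a minimal illustrative sketch, not the actual BuddyAllocator code; all class and method names here are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.atomic.AtomicLong;

public class RetryAllocator {
    // freeLists[i] holds addresses of free blocks of size (1 << (minSizeLog2 + i)).
    private final Deque<Long>[] freeLists;
    private final int minSizeLog2;
    private final AtomicLong version = new AtomicLong(); // bumped on every list change

    @SuppressWarnings("unchecked")
    RetryAllocator(int levels, int minSizeLog2) {
        this.freeLists = new Deque[levels];
        for (int i = 0; i < levels; i++) freeLists[i] = new ArrayDeque<>();
        this.minSizeLog2 = minSizeLog2;
    }

    synchronized void free(int level, long addr) {
        freeLists[level].addLast(addr);
        version.incrementAndGet();
    }

    /** Scan all levels once; rescan while any other thread changed allocator state. */
    Long allocate(int level) {
        long seen;
        do {
            seen = version.get();
            Long addr = tryOnce(level);
            if (addr != null) return addr;
            // Another thread may have split a big block and re-added the remainder
            // to a list we already passed; if the version moved, scan again.
        } while (version.get() != seen);
        return null; // genuinely out of memory
    }

    private synchronized Long tryOnce(int level) {
        for (int l = level; l < freeLists.length; l++) {
            Long addr = freeLists[l].pollFirst();
            if (addr == null) continue;
            // Split down to the target size, re-adding buddies to lower lists.
            for (int cur = l; cur > level; cur--) {
                freeLists[cur - 1].addLast(addr + (1L << (minSizeLog2 + cur - 1)));
                version.incrementAndGet();
            }
            return addr;
        }
        return null;
    }

    public static void main(String[] args) {
        RetryAllocator a = new RetryAllocator(3, 18); // levels: 256k, 512k, 1Mb
        a.free(2, 0);                                 // one free 1Mb block at address 0
        System.out.println(a.allocate(0)); // 0: uses 256k, leaves a 512k and a 256k buddy free
        System.out.println(a.allocate(1)); // 524288: the 512k buddy
    }
}
```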
[jira] [Commented] (HIVE-10506) CBO (Calcite Return Path): Disallow return path to be enable if CBO is off
[ https://issues.apache.org/jira/browse/HIVE-10506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529489#comment-14529489 ] Laljo John Pullokkaran commented on HIVE-10506: --- +1 > CBO (Calcite Return Path): Disallow return path to be enable if CBO is off > -- > > Key: HIVE-10506 > URL: https://issues.apache.org/jira/browse/HIVE-10506 > Project: Hive > Issue Type: Sub-task > Components: CBO >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Fix For: 1.2.0 > > Attachments: HIVE-10506.01.patch, HIVE-10506.patch > > > If hive.cbo.enable=false and hive.cbo.returnpath=true then some optimizations > would kick in. It's quite possible that in customer environment, they might > end up in these scenarios; we should prevent it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
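The guard this issue asks for could be sketched as a simple configuration check: the return-path flag is honored only when CBO itself is on. The property names follow the description above (`hive.cbo.enable`, `hive.cbo.returnpath`), but the helper itself is hypothetical, not actual Hive code.

```java
import java.util.Properties;

public class CboConfigGuard {
    /**
     * Effective return-path setting: force it off whenever CBO is disabled,
     * so none of the return-path optimizations can kick in by accident.
     */
    static boolean effectiveReturnPath(Properties conf) {
        boolean cboEnabled = Boolean.parseBoolean(conf.getProperty("hive.cbo.enable", "true"));
        boolean returnPath = Boolean.parseBoolean(conf.getProperty("hive.cbo.returnpath", "false"));
        return cboEnabled && returnPath;
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty("hive.cbo.enable", "false");
        p.setProperty("hive.cbo.returnpath", "true");
        // The misconfiguration from the issue description is neutralized.
        System.out.println(effectiveReturnPath(p)); // false
    }
}
```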
[jira] [Commented] (HIVE-10591) Support limited integer type promotion in ORC
[ https://issues.apache.org/jira/browse/HIVE-10591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529491#comment-14529491 ] Hive QA commented on HIVE-10591: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12730595/HIVE-10591.2.patch {color:red}ERROR:{color} -1 due to 53 failed/errored test(s), 8901 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_join org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_vectorization org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_parts org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_all_non_partitioned org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_tmp_table org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_where_no_match org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_where_non_partitioned org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_update_delete org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_transform_acid org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_update_all_non_partitioned org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_update_tmp_table org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_update_where_no_match org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_update_where_non_partitioned org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_unencrypted_tbl org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_with_different_encryption_keys 
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_load_data_to_encrypted_tables org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_select_read_only_encrypted_tbl org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_all_non_partitioned org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_tmp_table org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_where_no_match org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_where_non_partitioned org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert_update_delete org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_update_all_non_partitioned org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_update_tmp_table org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_update_where_no_match org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_update_where_non_partitioned org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_disallow_transform org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_droppartition org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_sba_drop_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_alterpart_loc org.apache.hadoop.hive.ql.TestTxnCommands2.testBucketizedInputFormat org.apache.hadoop.hive.ql.TestTxnCommands2.testDeleteIn org.apache.hadoop.hive.ql.TestTxnCommands2.testUpdateMixedCase org.apache.hadoop.hive.ql.security.TestStorageBasedClientSideAuthorizationProvider.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropDatabase org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropPartition org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropTable 
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropView org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProvider.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProviderWithACL.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbFailure org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbSuccess org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableFailure org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableSuccess org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessing org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessingCustomSetWhitelistAppend org.apache.hadoop.hive.ql.txn.compactor.TestCompactor.majorCompactAfterAbort org.apache.hadoop.hive.ql.txn.compa
[jira] [Commented] (HIVE-10482) LLAP: AsertionError cannot allocate when reading from orc
[ https://issues.apache.org/jira/browse/HIVE-10482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529476#comment-14529476 ] Sergey Shelukhin commented on HIVE-10482: - I found the issue, not clear how to fix this yet though > LLAP: AsertionError cannot allocate when reading from orc > - > > Key: HIVE-10482 > URL: https://issues.apache.org/jira/browse/HIVE-10482 > Project: Hive > Issue Type: Sub-task >Reporter: Siddharth Seth >Assignee: Sergey Shelukhin > Fix For: llap > > > This was from a run of tpch query 1. [~sershe] - not sure if you've already > seen this. Creating a jira so that it doesn't get lost. > {code} > 2015-04-24 13:11:54,180 > [TezTaskRunner_attempt_1429683757595_0326_4_00_000199_0(container_1_0326_01_003216_sseth_20150424131137_8ec6200c-77c8-43ea-a6a3-a0ab1da6e1ac:4_Map > 1_199_0)] ERROR org.apache.hadoop.hive.ql.exec.tez.TezProcessor: > org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: > java.io.IOException: java.lang.AssertionError: Cannot allocate > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:74) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:314) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:329) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at > 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.IOException: java.io.IOException: > java.lang.AssertionError: Cannot allocate > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:355) > at > org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79) > at > org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:137) > at > org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:62) > ... 
16 more > Caused by: java.io.IOException: java.lang.AssertionError: Cannot allocate > at > org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.rethrowErrorIfAny(LlapInputFormat.java:257) > at > org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.nextCvb(LlapInputFormat.java:209) > at > org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:147) > at > org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:97) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350) > ... 22 more > Caused by: java.lang.AssertionError: Cannot allocate > at > org.apache.hadoop.hive.ql.io.orc.InStream.readEncodedStream(InStream.java:761) > at > org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:441) >
[jira] [Commented] (HIVE-9743) Incorrect result set for vectorized left outer join
[ https://issues.apache.org/jira/browse/HIVE-9743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529477#comment-14529477 ] Matt McCline commented on HIVE-9743: [~vikram.dixit] I removed the annotations and the MR vector_left_outer_join3.q.out and fiddled with environment variables so that it now has "Sorted Merge Bucket Map Join Operator" operators; Tez has "Merge Join Operator" as you said. The original LEFT OUTER JOIN problem does not repro with vector_left_outer_join3.q though. > Incorrect result set for vectorized left outer join > --- > > Key: HIVE-9743 > URL: https://issues.apache.org/jira/browse/HIVE-9743 > Project: Hive > Issue Type: Bug > Components: SQL >Affects Versions: 0.14.0 >Reporter: N Campbell >Assignee: Matt McCline > Attachments: HIVE-9743.01.patch, HIVE-9743.02.patch, > HIVE-9743.03.patch, HIVE-9743.04.patch, HIVE-9743.05.patch, > HIVE-9743.06.patch, HIVE-9743.08.patch > > > This query is supposed to return 3 rows and will when run without Tez but > returns 2 rows when run with Tez. > select tjoin1.rnum, tjoin1.c1, tjoin1.c2, tjoin2.c2 as c2j2 from tjoin1 left > outer join tjoin2 on ( tjoin1.c1 = tjoin2.c1 and tjoin1.c2 > 15 ) > tjoin1.rnum tjoin1.c1 tjoin1.c2 c2j2 > 1 20 25 > 2 50 > instead of > tjoin1.rnum tjoin1.c1 tjoin1.c2 c2j2 > 0 10 15 > 1 20 25 > 2 50 > create table if not exists TJOIN1 (RNUM int , C1 int, C2 int) > STORED AS orc ; > 0|10|15 > 1|20|25 > 2|\N|50 > create table if not exists TJOIN2 (RNUM int , C1 int, C2 char(2)) > ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' > STORED AS TEXTFILE ; > 0|10|BB > 1|15|DD > 2|\N|EE > 3|10|FF -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10610) hive command fails to get hadoop version
[ https://issues.apache.org/jira/browse/HIVE-10610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-10610: Assignee: Shwetha G S > hive command fails to get hadoop version > > > Key: HIVE-10610 > URL: https://issues.apache.org/jira/browse/HIVE-10610 > Project: Hive > Issue Type: Bug >Reporter: Shwetha G S >Assignee: Shwetha G S > Attachments: HIVE-10610.patch > > > NO PRECOMMIT TESTS > If debug level logging is enabled, hive command fails with the following > exception: > {noformat} > apache-hive-1.2.0-SNAPSHOT-bin$ ./bin/hive > Unable to determine Hadoop version information from 13:54:07,683 > 'hadoop version' returned: > 2015-05-05 13:54:08,014 DEBUG - [main:] ~ version: 2.5.0-cdh5.3.3 > (VersionInfo:171) Hadoop 2.5.0-cdh5.3.3 Subversion > http://github.com/cloudera/hadoop -r 82a65209d6e9e4a2b41fdbcd8190c7ea38730627 > Compiled by jenkins on 2015-04-08T22:00Z Compiled with protoc 2.5.0 From > source with checksum 1531e104cdad7489656f44875f3334b This command was run > using > /Users/sshivalingamurthy/installs/hadoop-2.5.0-cdh5.3.3/share/hadoop/common/hadoop-common-2.5.0-cdh5.3.3.jar > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
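The failure above happens because debug log lines get interleaved with the `hadoop version` output, so naively taking the first line of that output no longer yields the version. One robust approach is to scan the output for the line that actually starts with "Hadoop ". The real logic lives in the `hive` launcher shell script; this Java helper is purely illustrative.

```java
public class HadoopVersionParse {
    /** Extract the version from `hadoop version` output, skipping interleaved log lines. */
    static String parseVersion(String output) {
        for (String line : output.split("\n")) {
            String t = line.trim();
            if (t.startsWith("Hadoop ")) {
                return t.substring("Hadoop ".length()).trim();
            }
        }
        return null; // genuinely unable to determine the version
    }

    public static void main(String[] args) {
        // Sample output with a DEBUG line interleaved, as in the report above.
        String noisy = "2015-05-05 13:54:08,014 DEBUG - [main:] ~ version: 2.5.0-cdh5.3.3 (VersionInfo:171)\n"
                     + "Hadoop 2.5.0-cdh5.3.3\n"
                     + "Subversion http://github.com/cloudera/hadoop\n";
        System.out.println(parseVersion(noisy)); // 2.5.0-cdh5.3.3
    }
}
```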
[jira] [Commented] (HIVE-10615) LLAP: Invalid containerId prefix
[ https://issues.apache.org/jira/browse/HIVE-10615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529473#comment-14529473 ] Prasanth Jayachandran commented on HIVE-10615: -- [~sseth] fyi.. > LLAP: Invalid containerId prefix > > > Key: HIVE-10615 > URL: https://issues.apache.org/jira/browse/HIVE-10615 > Project: Hive > Issue Type: Sub-task >Affects Versions: llap >Reporter: Prasanth Jayachandran > > I encountered this error when I ran a simple query in llap mode today. > {code}org.apache.hadoop.ipc.RemoteException(java.io.IOException): > java.lang.IllegalArgumentException: Invalid ContainerId prefix: > at > org.apache.hadoop.yarn.api.records.ContainerId.fromString(ContainerId.java:211) > at > org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:178) > at > org.apache.tez.dag.app.TezTaskCommunicatorImpl$TezTaskUmbilicalProtocolImpl.heartbeat(TezTaskCommunicatorImpl.java:311) > at > org.apache.hadoop.hive.llap.tezplugins.LlapTaskCommunicator$LlapTaskUmbilicalProtocolImpl.heartbeat(LlapTaskCommunicator.java:398) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.ipc.WritableRpcEngine$Server$WritableRpcInvoker.call(WritableRpcEngine.java:514) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) > at 
org.apache.hadoop.ipc.Client.call(Client.java:1468) > at org.apache.hadoop.ipc.Client.call(Client.java:1399) > at > org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:244) > at com.sun.proxy.$Proxy14.heartbeat(Unknown Source) > at > org.apache.hadoop.hive.llap.daemon.impl.LlapTaskReporter$HeartbeatCallable.heartbeat(LlapTaskReporter.java:256) > at > org.apache.hadoop.hive.llap.daemon.impl.LlapTaskReporter$HeartbeatCallable.call(LlapTaskReporter.java:184) > at > org.apache.hadoop.hive.llap.daemon.impl.LlapTaskReporter$HeartbeatCallable.call(LlapTaskReporter.java:126) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > 15/05/05 15:24:22 [Task-Executor-0] INFO task.TezTaskRunner : Interrupted > while waiting for task to complete. Interrupting task > 15/05/05 15:24:22 [TezTaskRunner_attempt_1430816501738_0034_1_00_00_0] > INFO task.TezTaskRunner : Encounted an error while executing task: > attempt_1430816501738_0034_1_00_00_0 > java.lang.InterruptedException > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2052) > at > java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) > at > java.util.concurrent.ExecutorCompletionService.take(ExecutorCompletionService.java:193) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.initialize(LogicalIOProcessorRuntimeTask.java:218) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:177) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172) > at 
java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(Thr
[jira] [Commented] (HIVE-8065) Support HDFS encryption functionality on Hive
[ https://issues.apache.org/jira/browse/HIVE-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529461#comment-14529461 ] Brock Noland commented on HIVE-8065: bq. have you considered creating a single encrypted staging dir for all queries to use instead of creating new ones under the table namespace? (this could be owned by Hive and encrypted with Hive's key). If so, why did you choose the current design? This approach does not work since you cannot move files across encryption zones. > Support HDFS encryption functionality on Hive > - > > Key: HIVE-8065 > URL: https://issues.apache.org/jira/browse/HIVE-8065 > Project: Hive > Issue Type: Improvement >Affects Versions: 0.13.1 >Reporter: Sergio Peña >Assignee: Sergio Peña > Labels: Hive-Scrum > > The new encryption support on HDFS makes Hive incompatible and unusable when > this feature is used. > HDFS encryption is designed so that an user can configure different > encryption zones (or directories) for multi-tenant environments. An > encryption zone has an exclusive encryption key, such as AES-128 or AES-256. > Because of security compliance, the HDFS does not allow to move/rename files > between encryption zones. Renames are allowed only inside the same encryption > zone. A copy is allowed between encryption zones. > See HDFS-6134 for more details about HDFS encryption design. > Hive currently uses a scratch directory (like /tmp/$user/$random). This > scratch directory is used for the output of intermediate data (between MR > jobs) and for the final output of the hive query which is later moved to the > table directory location. > If Hive tables are in different encryption zones than the scratch directory, > then Hive won't be able to renames those files/directories, and it will make > Hive unusable. > To handle this problem, we can change the scratch directory of the > query/statement to be inside the same encryption zone of the table directory > location. 
This way, the renaming process will be successful. > Also, for statements that move files between encryption zones (i.e. LOAD > DATA), a copy may be executed instead of a rename. This will cause an > overhead when copying large data files, but it won't break the encryption on > Hive. > Another security consideration arises when joining tables. If Hive joins > different tables with different encryption key strengths, then the results of > the select might break the security compliance of the tables. Say two > tables with 128-bit and 256-bit encryption are joined; the temporary > results might then be stored in the 128-bit encryption zone. This temporarily conflicts > with the compliance requirements of the table encrypted with 256 bits. > To fix this, Hive should be able to select the most strongly > encrypted scratch directory in order to store the intermediate data temporarily with no > compliance issues. > For instance: > {noformat} > SELECT * FROM table-aes128 t1 JOIN table-aes256 t2 WHERE t1.id == t2.id; > {noformat} > - This should use a scratch directory (or staging directory) inside the > table-aes256 table location. > {noformat} > INSERT OVERWRITE TABLE table-unencrypted SELECT * FROM table-aes1; > {noformat} > - This should use a scratch directory inside the table-aes1 location. > {noformat} > FROM table-unencrypted > INSERT OVERWRITE TABLE table-aes128 SELECT id, name > INSERT OVERWRITE TABLE table-aes256 SELECT id, name > {noformat} > - This should use a scratch directory in each of the tables' locations. > - The first SELECT will have its scratch directory in the table-aes128 directory. > - The second SELECT will have its scratch directory in the table-aes256 directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
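The zone-selection rule sketched in the description above — stage intermediate data in the most strongly encrypted zone touched by the query — can be illustrated with a short sketch. `Zone` and `chooseScratchZone` here are hypothetical stand-ins for illustration, not Hive's actual metastore or HDFS client API:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ScratchDirChooser {
    // Hypothetical model of an encryption zone: a path plus its key length in bits.
    static final class Zone {
        final String path;
        final int keyBits; // 0 = unencrypted

        Zone(String path, int keyBits) {
            this.path = path;
            this.keyBits = keyBits;
        }
    }

    // Pick the zone with the strongest key, so intermediate data never lands
    // in a weaker zone than any table that feeds the query.
    static Zone chooseScratchZone(List<Zone> tablesInQuery) {
        return tablesInQuery.stream()
                .max(Comparator.comparingInt(z -> z.keyBits))
                .orElseThrow(IllegalArgumentException::new);
    }

    public static void main(String[] args) {
        Zone aes128 = new Zone("/warehouse/table-aes128", 128);
        Zone aes256 = new Zone("/warehouse/table-aes256", 256);
        // For the join example in the description, staging goes under table-aes256.
        System.out.println(chooseScratchZone(Arrays.asList(aes128, aes256)).path);
    }
}
```

For the INSERT OVERWRITE example, the same rule applied to {table-unencrypted, table-aes1} would select the encrypted table's location, matching the description.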
[jira] [Updated] (HIVE-9743) Incorrect result set for vectorized left outer join
[ https://issues.apache.org/jira/browse/HIVE-9743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-9743: --- Attachment: HIVE-9743.08.patch > Incorrect result set for vectorized left outer join > --- > > Key: HIVE-9743 > URL: https://issues.apache.org/jira/browse/HIVE-9743 > Project: Hive > Issue Type: Bug > Components: SQL >Affects Versions: 0.14.0 >Reporter: N Campbell >Assignee: Matt McCline > Attachments: HIVE-9743.01.patch, HIVE-9743.02.patch, > HIVE-9743.03.patch, HIVE-9743.04.patch, HIVE-9743.05.patch, > HIVE-9743.06.patch, HIVE-9743.08.patch > > > This query is supposed to return 3 rows and will when run without Tez but > returns 2 rows when run with Tez. > select tjoin1.rnum, tjoin1.c1, tjoin1.c2, tjoin2.c2 as c2j2 from tjoin1 left > outer join tjoin2 on ( tjoin1.c1 = tjoin2.c1 and tjoin1.c2 > 15 ) > tjoin1.rnum tjoin1.c1 tjoin1.c2 c2j2 > 1 20 25 > 2 50 > instead of > tjoin1.rnum tjoin1.c1 tjoin1.c2 c2j2 > 0 10 15 > 1 20 25 > 2 50 > create table if not exists TJOIN1 (RNUM int , C1 int, C2 int) > STORED AS orc ; > 0|10|15 > 1|20|25 > 2|\N|50 > create table if not exists TJOIN2 (RNUM int , C1 int, C2 char(2)) > ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' > STORED AS TEXTFILE ; > 0|10|BB > 1|15|DD > 2|\N|EE > 3|10|FF -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9743) Incorrect result set for vectorized left outer join
[ https://issues.apache.org/jira/browse/HIVE-9743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-9743: --- Attachment: (was: HIVE-9743.07.patch) > Incorrect result set for vectorized left outer join > --- > > Key: HIVE-9743 > URL: https://issues.apache.org/jira/browse/HIVE-9743 > Project: Hive > Issue Type: Bug > Components: SQL >Affects Versions: 0.14.0 >Reporter: N Campbell >Assignee: Matt McCline > Attachments: HIVE-9743.01.patch, HIVE-9743.02.patch, > HIVE-9743.03.patch, HIVE-9743.04.patch, HIVE-9743.05.patch, HIVE-9743.06.patch > > > This query is supposed to return 3 rows and will when run without Tez but > returns 2 rows when run with Tez. > select tjoin1.rnum, tjoin1.c1, tjoin1.c2, tjoin2.c2 as c2j2 from tjoin1 left > outer join tjoin2 on ( tjoin1.c1 = tjoin2.c1 and tjoin1.c2 > 15 ) > tjoin1.rnum tjoin1.c1 tjoin1.c2 c2j2 > 1 20 25 > 2 50 > instead of > tjoin1.rnum tjoin1.c1 tjoin1.c2 c2j2 > 0 10 15 > 1 20 25 > 2 50 > create table if not exists TJOIN1 (RNUM int , C1 int, C2 int) > STORED AS orc ; > 0|10|15 > 1|20|25 > 2|\N|50 > create table if not exists TJOIN2 (RNUM int , C1 int, C2 char(2)) > ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' > STORED AS TEXTFILE ; > 0|10|BB > 1|15|DD > 2|\N|EE > 3|10|FF -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10213) MapReduce jobs using dynamic-partitioning fail on commit.
[ https://issues.apache.org/jira/browse/HIVE-10213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529452#comment-14529452 ] Sushanth Sowmyan commented on HIVE-10213: - This patch set off some warning flags for me with regards to the traditional M-R usecase, but it's because it's been a while since I looked at this piece of code. The traditional M-R usecase is still fine, because the DynamicPartitionFileRecordWriterContainer.close() will register an appropriate TaskCommitterProxy, and a commit on the OutputCommitter will be called in the same process scope, thus making it okay. For pig-based optimizations also, it'd continue to be okay as the singleton retains it in memory. +1, and I'm okay with committing this patch as-is, tests have already run on this, and this section of code has not changed since then. > MapReduce jobs using dynamic-partitioning fail on commit. > - > > Key: HIVE-10213 > URL: https://issues.apache.org/jira/browse/HIVE-10213 > Project: Hive > Issue Type: Bug > Components: HCatalog >Reporter: Mithun Radhakrishnan >Assignee: Mithun Radhakrishnan > Attachments: HIVE-10213.1.patch > > > I recently ran into a problem in {{TaskCommitContextRegistry}}, when using > dynamic-partitions. > Consider a MapReduce program that reads HCatRecords from a table (using > HCatInputFormat), and then writes to another table (with identical schema), > using HCatOutputFormat. 
The Map-task fails with the following exception: > {code} > Error: java.io.IOException: No callback registered for > TaskAttemptID:attempt_1426589008676_509707_m_00_0@hdfs://crystalmyth.myth.net:8020/user/mithunr/mythdb/target/_DYN0.6784154320609959/grid=__HIVE_DEFAULT_PARTITION__/dt=__HIVE_DEFAULT_PARTITION__ > at > org.apache.hive.hcatalog.mapreduce.TaskCommitContextRegistry.commitTask(TaskCommitContextRegistry.java:56) > at > org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.commitTask(FileOutputCommitterContainer.java:139) > at org.apache.hadoop.mapred.Task.commit(Task.java:1163) > at org.apache.hadoop.mapred.Task.done(Task.java:1025) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:345) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > {code} > {{TaskCommitContextRegistry::commitTask()}} uses call-backs registered from > {{DynamicPartitionFileRecordWriter}}. But when {{HCatInputFormat}} and > {{HCatOutputFormat}} are both used in the same job, the > {{DynamicPartitionFileRecordWriter}} might only be exercised in the Reducer. > I'm relaxing the IOException and logging a warning message instead of just > failing. > (I'll post the fix shortly.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
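The behavior change described — looking up a per-path commit callback and merely warning when none was registered — can be sketched as follows. `TaskCommitRegistry` is a simplified, hypothetical stand-in for HCatalog's `TaskCommitContextRegistry`, not its real code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TaskCommitRegistry {
    // Callbacks keyed by output path, registered by the record writer on close().
    private final Map<String, Runnable> callbacks = new ConcurrentHashMap<>();

    public TaskCommitRegistry register(String outputPath, Runnable onCommit) {
        callbacks.put(outputPath, onCommit);
        return this;
    }

    // Before the fix: a missing callback raised IOException and failed the task.
    // After the fix: log a warning and continue, since a task that only read
    // via HCatInputFormat may never have registered a writer-side callback.
    public boolean commitTask(String outputPath) {
        Runnable cb = callbacks.get(outputPath);
        if (cb == null) {
            System.err.println("WARN: no callback registered for " + outputPath);
            return false;
        }
        cb.run();
        return true;
    }

    public static void main(String[] args) {
        TaskCommitRegistry registry = new TaskCommitRegistry();
        registry.register("/out/part-1", () -> System.out.println("committed /out/part-1"));
        registry.commitTask("/out/part-1"); // runs the callback
        registry.commitTask("/out/part-2"); // only warns, no task failure
    }
}
```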
[jira] [Commented] (HIVE-10608) Fix useless 'if' stamement in RetryingMetaStoreClient (135)
[ https://issues.apache.org/jira/browse/HIVE-10608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529430#comment-14529430 ] Szehon Ho commented on HIVE-10608: -- +1 > Fix useless 'if' stamement in RetryingMetaStoreClient (135) > --- > > Key: HIVE-10608 > URL: https://issues.apache.org/jira/browse/HIVE-10608 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Alexander Pivovarov >Assignee: Alexander Pivovarov >Priority: Minor > Attachments: rb33861.patch > > > "if" statement below is useless because it ends with ; > {code} > } catch (MetaException e) { > if (e.getMessage().matches("(?s).*(IO|TTransport)Exception.*")); > caughtException = e; > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
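For context, the stray semicolon forms Java's empty statement, so the `caughtException = e;` line below it runs unconditionally. A minimal sketch of the bug and one plausible correction — the committed patch may differ:

```java
public class EmptyIfDemo {
    // The pattern quoted from RetryingMetaStoreClient (line 135).
    static final String RETRIABLE = "(?s).*(IO|TTransport)Exception.*";

    // Buggy shape: the stray ';' makes the if-body empty, so the
    // assignment below it executes unconditionally.
    static boolean buggyTreatsAsRetriable(String message) {
        boolean caught = false;
        if (message.matches(RETRIABLE)); // empty statement!
        caught = true;                   // always executes
        return caught;
    }

    // One plausible correction (a sketch, not necessarily the committed
    // patch): actually guard the decision with the match.
    static boolean fixedTreatsAsRetriable(String message) {
        return message.matches(RETRIABLE);
    }

    public static void main(String[] args) {
        String unrelated = "Table not found";
        // The bug treats even non-transport failures as retriable:
        System.out.println(buggyTreatsAsRetriable(unrelated)); // true
        System.out.println(fixedTreatsAsRetriable(unrelated)); // false
    }
}
```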
[jira] [Commented] (HIVE-8065) Support HDFS encryption functionality on Hive
[ https://issues.apache.org/jira/browse/HIVE-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529378#comment-14529378 ] Eugene Koifman commented on HIVE-8065: -- [~spena], when implementing this, have you considered creating a single encrypted staging dir for all queries to use instead of creating new ones under the table namespace? (this could be owned by Hive and encrypted with Hive's key). If so, why did you choose the current design? Some possible issues with the current design: Requires write permission on the table dir. delete-on-exit (on stagingdir) is not completely reliable as far as I know; this may leave files around. In a query like "SELECT * FROM table-aes128 t1 JOIN table-aes256 t2 WHERE t1.id == t2.id;" when the staging dir is created under table-aes256, someone who has a key for this EZ may read data (in theory at least) that came from table-aes128 even if they don't have a key for the EZ which contains table-aes128. thanks > Support HDFS encryption functionality on Hive > - > > Key: HIVE-8065 > URL: https://issues.apache.org/jira/browse/HIVE-8065 > Project: Hive > Issue Type: Improvement >Affects Versions: 0.13.1 >Reporter: Sergio Peña >Assignee: Sergio Peña > Labels: Hive-Scrum > > The new encryption support on HDFS makes Hive incompatible and unusable when > this feature is used. > HDFS encryption is designed so that a user can configure different > encryption zones (or directories) for multi-tenant environments. An > encryption zone has an exclusive encryption key, such as AES-128 or AES-256. > Because of security compliance, HDFS does not allow moving/renaming files > between encryption zones. Renames are allowed only inside the same encryption > zone. A copy is allowed between encryption zones. > See HDFS-6134 for more details about the HDFS encryption design. > Hive currently uses a scratch directory (like /tmp/$user/$random). 
This > scratch directory is used for the output of intermediate data (between MR > jobs) and for the final output of the hive query, which is later moved to the > table directory location. > If Hive tables are in different encryption zones than the scratch directory, > then Hive won't be able to rename those files/directories, and it will make > Hive unusable. > To handle this problem, we can change the scratch directory of the > query/statement to be inside the same encryption zone as the table directory > location. This way, the renaming process will be successful. > Also, for statements that move files between encryption zones (i.e. LOAD > DATA), a copy may be executed instead of a rename. This will cause an > overhead when copying large data files, but it won't break the encryption on > Hive. > Another security consideration arises when joining tables. If Hive joins > different tables with different encryption key strengths, then the results of > the select might break the security compliance of the tables. Say two > tables with 128-bit and 256-bit encryption are joined; the temporary > results might then be stored in the 128-bit encryption zone. This temporarily conflicts > with the compliance requirements of the table encrypted with 256 bits. > To fix this, Hive should be able to select the most strongly > encrypted scratch directory in order to store the intermediate data temporarily with no > compliance issues. > For instance: > {noformat} > SELECT * FROM table-aes128 t1 JOIN table-aes256 t2 WHERE t1.id == t2.id; > {noformat} > - This should use a scratch directory (or staging directory) inside the > table-aes256 table location. > {noformat} > INSERT OVERWRITE TABLE table-unencrypted SELECT * FROM table-aes1; > {noformat} > - This should use a scratch directory inside the table-aes1 location. 
> {noformat} > FROM table-unencrypted > INSERT OVERWRITE TABLE table-aes128 SELECT id, name > INSERT OVERWRITE TABLE table-aes256 SELECT id, name > {noformat} > - This should use a scratch directory on each of the tables locations. > - The first SELECT will have its scratch directory on table-aes128 directory. > - The second SELECT will have its scratch directory on table-aes256 directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7018) Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but not others
[ https://issues.apache.org/jira/browse/HIVE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529366#comment-14529366 ] Thejas M Nair commented on HIVE-7018: - This change breaks schematool upgrade - See HIVE-10614 > Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but > not others > - > > Key: HIVE-7018 > URL: https://issues.apache.org/jira/browse/HIVE-7018 > Project: Hive > Issue Type: Bug >Reporter: Brock Noland >Assignee: Yongzhi Chen > Fix For: 1.2.0 > > Attachments: HIVE-7018.1.patch, HIVE-7018.2.patch > > > It appears that at least postgres and oracle do not have the LINK_TARGET_ID > column while mysql does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9534) incorrect result set for query that projects a windowed aggregate
[ https://issues.apache.org/jira/browse/HIVE-9534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529304#comment-14529304 ] Chaoyu Tang commented on HIVE-9534: --- Looks like the over(analytic_clause) part is ignored in a query with distinct in Hive: {code} function @init { gParent.pushMsg("function specification", state); } @after { gParent.popMsg(state); } : functionName LPAREN ( (STAR) => (star=STAR) | (dist=KW_DISTINCT)? (selectExpression (COMMA selectExpression)*)? ) RPAREN (KW_OVER ws=window_specification)? -> {$star != null}? ^(TOK_FUNCTIONSTAR functionName $ws?) -> {$dist == null}? ^(TOK_FUNCTION functionName (selectExpression+)? $ws?) -> ^(TOK_FUNCTIONDI functionName (selectExpression+)?) ; {code} In queries like select avg(distinct col1) over() from testwindow; or select avg(distinct col1) over(order by col2 rows between 1 preceding and 1 following) from testwindow; the over(...) is totally ignored. So I am going to fix this issue by throwing an unsupported-operation error. > incorrect result set for query that projects a windowed aggregate > - > > Key: HIVE-9534 > URL: https://issues.apache.org/jira/browse/HIVE-9534 > Project: Hive > Issue Type: Bug > Components: SQL >Reporter: N Campbell >Assignee: Chaoyu Tang > > The result set returned by Hive has one row instead of 5 > {code} > select avg(distinct tsint.csint) over () from tsint > create table if not exists TSINT (RNUM int , CSINT smallint) > ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' > STORED AS TEXTFILE; > 0|\N > 1|-1 > 2|0 > 3|1 > 4|10 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
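For the TSINT sample data above, the expected semantics would be avg(distinct csint) = (-1 + 0 + 1 + 10) / 4 = 2.5, projected onto all 5 rows: the NULL is excluded from the aggregate but still yields an output row. A sketch of that semantics in plain Java:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class AvgDistinctOverDemo {
    // avg(DISTINCT col) OVER (): one identical value projected per input row.
    // SQL NULLs (modelled here as Java nulls) are excluded from the aggregate
    // but still produce an output row.
    static List<Double> avgDistinctOver(List<Integer> column) {
        double avg = column.stream()
                .filter(v -> v != null)
                .distinct()
                .mapToInt(Integer::intValue)
                .average()
                .orElse(Double.NaN);
        return column.stream().map(v -> avg).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // csint values from the TSINT sample data: \N, -1, 0, 1, 10
        List<Double> result = avgDistinctOver(Arrays.asList(null, -1, 0, 1, 10));
        System.out.println(result.size() + " rows: " + result); // 5 rows of 2.5
    }
}
```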
[jira] [Commented] (HIVE-10592) ORC file dump in JSON format
[ https://issues.apache.org/jira/browse/HIVE-10592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529258#comment-14529258 ] Gopal V commented on HIVE-10592: LGTM - +1 But a follow-up usability JIRA is advised for the multi-file output scenario - you need to produce an array of JSON objects instead of a JSON object of arrays. Since there will be no consumers for this output until some output exists, we can iterate on this after writing some analysis scripts once this makes it into the build. As an example of the difficulty in keeping JSON object walkers simple, try running {code} ./dist/hive/bin/hive --service orcfiledump -j -p /apps/hive/warehouse/tpcds5_bin_partitioned_orc_200.db/customer_demographics/00_0 /apps/hive/warehouse/tpcds5_bin_partitioned_orc_200.db/customer_demographics/01_0 { "fileName": [ "\/apps\/hive\/warehouse\/tpcds5_bin_partitioned_orc_200.db\/customer_demographics\/00_0", "\/apps\/hive\/warehouse\/tpcds5_bin_partitioned_orc_200.db\/customer_demographics\/01_0" ], "fileVersion": [ "0.12", "0.12" ], "writerVersion": [ "HIVE_8732", "HIVE_8732" ], ... {code} > ORC file dump in JSON format > > > Key: HIVE-10592 > URL: https://issues.apache.org/jira/browse/HIVE-10592 > Project: Hive > Issue Type: New Feature >Affects Versions: 1.3.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-10592.1.patch, HIVE-10592.2.patch, > HIVE-10592.3.patch > > > ORC file dump uses a custom format. It will be useful to dump ORC metadata in JSON > format so that other tools can be built on top of it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
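The shape difference Gopal describes: with N files, an object of arrays forces consumers to walk parallel arrays by index, while an array of objects keeps each file's metadata together. A rough sketch of the transposition, with plain Java maps and made-up file names standing in for real JSON handling:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class JsonShapeDemo {
    // Transpose {"fileName": [a, b], "fileVersion": [x, y]} into
    // [{"fileName": a, "fileVersion": x}, {"fileName": b, "fileVersion": y}]
    // so each element describes exactly one file.
    static List<Map<String, String>> toArrayOfObjects(Map<String, List<String>> objectOfArrays) {
        int n = objectOfArrays.values().iterator().next().size();
        List<Map<String, String>> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Map<String, String> perFile = new LinkedHashMap<>();
            for (Map.Entry<String, List<String>> e : objectOfArrays.entrySet()) {
                perFile.put(e.getKey(), e.getValue().get(i));
            }
            out.add(perFile);
        }
        return out;
    }

    // Sample in the object-of-arrays shape; file names are hypothetical.
    static Map<String, List<String>> sample() {
        Map<String, List<String>> dump = new LinkedHashMap<>();
        dump.put("fileName", Arrays.asList("/warehouse/cd/file_a", "/warehouse/cd/file_b"));
        dump.put("fileVersion", Arrays.asList("0.12", "0.12"));
        dump.put("writerVersion", Arrays.asList("HIVE_8732", "HIVE_8732"));
        return dump;
    }

    public static void main(String[] args) {
        System.out.println(toArrayOfObjects(sample()));
    }
}
```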
[jira] [Commented] (HIVE-10595) Dropping a table can cause NPEs in the compactor
[ https://issues.apache.org/jira/browse/HIVE-10595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529236#comment-14529236 ] Sushanth Sowmyan commented on HIVE-10595: - And by that, I mean +1 for inclusion to branch-1.2 > Dropping a table can cause NPEs in the compactor > > > Key: HIVE-10595 > URL: https://issues.apache.org/jira/browse/HIVE-10595 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 0.14.0, 1.0.0, 1.1.0 >Reporter: Alan Gates >Assignee: Alan Gates > Attachments: HIVE-10595.patch > > > Reproduction: > # start metastore with compactor off > # insert enough entries in a table to trigger a compaction > # drop the table > # stop metastore > # restart metastore with compactor on > Result: NPE in the compactor threads. I suspect this would also happen if > the inserts and drops were done in between a run of the compactor, but I > haven't proven it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10595) Dropping a table can cause NPEs in the compactor
[ https://issues.apache.org/jira/browse/HIVE-10595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529235#comment-14529235 ] Sushanth Sowmyan commented on HIVE-10595: - >From the description, this is an NPE in a pretty core feature, so this >qualifies as an outage. Added to >https://cwiki.apache.org/confluence/display/Hive/Hive+1.2+Release+Status > Dropping a table can cause NPEs in the compactor > > > Key: HIVE-10595 > URL: https://issues.apache.org/jira/browse/HIVE-10595 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 0.14.0, 1.0.0, 1.1.0 >Reporter: Alan Gates >Assignee: Alan Gates > Attachments: HIVE-10595.patch > > > Reproduction: > # start metastore with compactor off > # insert enough entries in a table to trigger a compaction > # drop the table > # stop metastore > # restart metastore with compactor on > Result: NPE in the compactor threads. I suspect this would also happen if > the inserts and drops were done in between a run of the compactor, but I > haven't proven it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10613) HCatSchemaUtils getHCatFieldSchema should include field comment
[ https://issues.apache.org/jira/browse/HIVE-10613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Friedrich updated HIVE-10613: Attachment: HIVE-10613.1.patch > HCatSchemaUtils getHCatFieldSchema should include field comment > --- > > Key: HIVE-10613 > URL: https://issues.apache.org/jira/browse/HIVE-10613 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 1.0.0 >Reporter: Thomas Friedrich >Assignee: Thomas Friedrich >Priority: Minor > Attachments: HIVE-10613.1.patch > > > HCatSchemaUtils.getHCatFieldSchema converts a FieldSchema to a > HCatFieldSchema. Instead of initializing the comment property from the > FieldSchema object, the comment in the HCatFieldSchema is always set to null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10595) Dropping a table can cause NPEs in the compactor
[ https://issues.apache.org/jira/browse/HIVE-10595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529230#comment-14529230 ] Alan Gates commented on HIVE-10595: --- [~sushanth], I'd like to add this to 1.2 as it can jam the compactor. It's already patch available, so I should be able to get it in quickly. [~ekoifman], would you have time to review this? > Dropping a table can cause NPEs in the compactor > > > Key: HIVE-10595 > URL: https://issues.apache.org/jira/browse/HIVE-10595 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 0.14.0, 1.0.0, 1.1.0 >Reporter: Alan Gates >Assignee: Alan Gates > Attachments: HIVE-10595.patch > > > Reproduction: > # start metastore with compactor off > # insert enough entries in a table to trigger a compaction > # drop the table > # stop metastore > # restart metastore with compactor on > Result: NPE in the compactor threads. I suspect this would also happen if > the inserts and drops were done in between a run of the compactor, but I > haven't proven it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10608) Fix useless 'if' stamement in RetryingMetaStoreClient (135)
[ https://issues.apache.org/jira/browse/HIVE-10608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529228#comment-14529228 ] Chaoyu Tang commented on HIVE-10608: +1 (non-binding) > Fix useless 'if' stamement in RetryingMetaStoreClient (135) > --- > > Key: HIVE-10608 > URL: https://issues.apache.org/jira/browse/HIVE-10608 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Alexander Pivovarov >Assignee: Alexander Pivovarov >Priority: Minor > Attachments: rb33861.patch > > > "if" statement below is useless because it ends with ; > {code} > } catch (MetaException e) { > if (e.getMessage().matches("(?s).*(IO|TTransport)Exception.*")); > caughtException = e; > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10614) schemaTool upgrade from 0.14.0 to 1.3.0 causes failure
[ https://issues.apache.org/jira/browse/HIVE-10614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529229#comment-14529229 ] Sushanth Sowmyan commented on HIVE-10614: - Marked as outage, approved for 1.2 > schemaTool upgrade from 0.14.0 to 1.3.0 causes failure > -- > > Key: HIVE-10614 > URL: https://issues.apache.org/jira/browse/HIVE-10614 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Hari Sankar Sivarama Subramaniyan >Assignee: Hari Sankar Sivarama Subramaniyan >Priority: Critical > > ./schematool -dbType mysql -upgradeSchemaFrom 0.14.0 -verbose > {code} > ++--+ > | >| > ++--+ > | < HIVE-7018 Remove Table and Partition tables column LINK_TARGET_ID from > Mysql for other DBs do not have it > | > ++--+ > 1 row selected (0.004 seconds) > 0: jdbc:mysql://node-1.example.com/hive> DROP PROCEDURE IF EXISTS > RM_TLBS_LINKID > No rows affected (0.005 seconds) > 0: jdbc:mysql://node-1.example.com/hive> DROP PROCEDURE IF EXISTS > RM_PARTITIONS_LINKID > No rows affected (0.006 seconds) > 0: jdbc:mysql://node-1.example.com/hive> DROP PROCEDURE IF EXISTS RM_LINKID > No rows affected (0.002 seconds) > 0: jdbc:mysql://node-1.example.com/hive> CREATE PROCEDURE RM_TLBS_LINKID() > BEGIN IF EXISTS (SELECT * FROM `INFORMATION_SCHEMA`.`COLUMNS` WHERE > `TABLE_NAME` = 'TBLS' AND `COLUMN_NAME` = 'LINK_TARGET_ID') THEN ALTER TABLE > `TBLS` DROP FOREIGN KEY `TBLS_FK3` ; ALTER TABLE `TBLS` DROP KEY `TBLS_N51` ; > ALTER TABLE `TBLS` DROP COLUMN `LINK_TARGET_ID` ; END IF; END > Error: You have an error in your SQL syntax; check the manual that > corresponds to your MySQL server version for the right syntax to use near '' > at line 1 (state=42000,code=1064) > Closing: 0: jdbc:mysql://node-1.example.com/hive?createDatabaseIfNotExist=true > org.apache.hadoop.hive.metastore.HiveMetaException: Upgrade FAILED! Metastore > state would be inconsistent !! > org.apache.hadoop.hive.metastore.HiveMetaException: Upgrade FAILED! 
Metastore > state would be inconsistent !! > at > org.apache.hive.beeline.HiveSchemaTool.doUpgrade(HiveSchemaTool.java:229) > at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:468) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.run(RunJar.java:221) > at org.apache.hadoop.util.RunJar.main(RunJar.java:136) > Caused by: java.io.IOException: Schema script failed, errorcode 2 > at > org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:355) > at > org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:326) > at > org.apache.hive.beeline.HiveSchemaTool.doUpgrade(HiveSchemaTool.java:224) > {code} > Looks like HIVE-7018 has introduced a stored procedure as part of the mysql upgrade > script, and it is causing issues with the schematool upgrade. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10591) Support limited integer type promotion in ORC
[ https://issues.apache.org/jira/browse/HIVE-10591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-10591: - Attachment: HIVE-10591.2.patch Attaching again due to build machine error. > Support limited integer type promotion in ORC > - > > Key: HIVE-10591 > URL: https://issues.apache.org/jira/browse/HIVE-10591 > Project: Hive > Issue Type: New Feature >Affects Versions: 1.3.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-10591.1.patch, HIVE-10591.2.patch, > HIVE-10591.2.patch > > > ORC currently does not support schema-on-read. If we alter an ORC table with > 'int' type to 'bigint' and query the altered table, a ClassCastException > will be thrown, as the schema on read from the table descriptor will expect > LongWritable whereas ORC will return IntWritable based on the file schema stored > within the ORC file. OrcSerde currently doesn't do any type conversions or type > promotions for performance reasons in the inner loop. Since smallints, ints and > bigints are stored in the same way in ORC, it should be possible to allow such > type promotions without hurting performance. The following type promotions can be > supported without any casting: > smallint -> int > smallint -> bigint > int -> bigint > Tinyint promotion is not possible without casting, as tinyints are stored > using the RLE byte writer whereas smallints, ints and bigints are stored using > the RLE integer writer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10608) Fix useless 'if' stamement in RetryingMetaStoreClient (135)
[ https://issues.apache.org/jira/browse/HIVE-10608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-10608: --- Attachment: rb33861.patch patch #1 > Fix useless 'if' stamement in RetryingMetaStoreClient (135) > --- > > Key: HIVE-10608 > URL: https://issues.apache.org/jira/browse/HIVE-10608 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Alexander Pivovarov >Assignee: Alexander Pivovarov >Priority: Minor > Attachments: rb33861.patch > > > "if" statement below is useless because it ends with ; > {code} > } catch (MetaException e) { > if (e.getMessage().matches("(?s).*(IO|TTransport)Exception.*")); > caughtException = e; > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10610) hive command fails to get hadoop version
[ https://issues.apache.org/jira/browse/HIVE-10610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529202#comment-14529202 ] Sushanth Sowmyan commented on HIVE-10610: - +1. Also approved for 1.2, since this is a trivial patch. Editing description to note "NO PRECOMMIT TESTS" > hive command fails to get hadoop version > > > Key: HIVE-10610 > URL: https://issues.apache.org/jira/browse/HIVE-10610 > Project: Hive > Issue Type: Bug >Reporter: Shwetha G S > Attachments: HIVE-10610.patch > > > If debug level logging is enabled, hive command fails with the following > exception: > {noformat} > apache-hive-1.2.0-SNAPSHOT-bin$ ./bin/hive > Unable to determine Hadoop version information from 13:54:07,683 > 'hadoop version' returned: > 2015-05-05 13:54:08,014 DEBUG - [main:] ~ version: 2.5.0-cdh5.3.3 > (VersionInfo:171) Hadoop 2.5.0-cdh5.3.3 Subversion > http://github.com/cloudera/hadoop -r 82a65209d6e9e4a2b41fdbcd8190c7ea38730627 > Compiled by jenkins on 2015-04-08T22:00Z Compiled with protoc 2.5.0 From > source with checksum 1531e104cdad7489656f44875f3334b This command was run > using > /Users/sshivalingamurthy/installs/hadoop-2.5.0-cdh5.3.3/share/hadoop/common/hadoop-common-2.5.0-cdh5.3.3.jar > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10610) hive command fails to get hadoop version
[ https://issues.apache.org/jira/browse/HIVE-10610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-10610: Description: NO PRECOMMIT TESTS If debug level logging is enabled, hive command fails with the following exception: {noformat} apache-hive-1.2.0-SNAPSHOT-bin$ ./bin/hive Unable to determine Hadoop version information from 13:54:07,683 'hadoop version' returned: 2015-05-05 13:54:08,014 DEBUG - [main:] ~ version: 2.5.0-cdh5.3.3 (VersionInfo:171) Hadoop 2.5.0-cdh5.3.3 Subversion http://github.com/cloudera/hadoop -r 82a65209d6e9e4a2b41fdbcd8190c7ea38730627 Compiled by jenkins on 2015-04-08T22:00Z Compiled with protoc 2.5.0 From source with checksum 1531e104cdad7489656f44875f3334b This command was run using /Users/sshivalingamurthy/installs/hadoop-2.5.0-cdh5.3.3/share/hadoop/common/hadoop-common-2.5.0-cdh5.3.3.jar {noformat} was: If debug level logging is enabled, hive command fails with the following exception: {noformat} apache-hive-1.2.0-SNAPSHOT-bin$ ./bin/hive Unable to determine Hadoop version information from 13:54:07,683 'hadoop version' returned: 2015-05-05 13:54:08,014 DEBUG - [main:] ~ version: 2.5.0-cdh5.3.3 (VersionInfo:171) Hadoop 2.5.0-cdh5.3.3 Subversion http://github.com/cloudera/hadoop -r 82a65209d6e9e4a2b41fdbcd8190c7ea38730627 Compiled by jenkins on 2015-04-08T22:00Z Compiled with protoc 2.5.0 From source with checksum 1531e104cdad7489656f44875f3334b This command was run using /Users/sshivalingamurthy/installs/hadoop-2.5.0-cdh5.3.3/share/hadoop/common/hadoop-common-2.5.0-cdh5.3.3.jar {noformat} > hive command fails to get hadoop version > > > Key: HIVE-10610 > URL: https://issues.apache.org/jira/browse/HIVE-10610 > Project: Hive > Issue Type: Bug >Reporter: Shwetha G S > Attachments: HIVE-10610.patch > > > NO PRECOMMIT TESTS > If debug level logging is enabled, hive command fails with the following > exception: > {noformat} > apache-hive-1.2.0-SNAPSHOT-bin$ ./bin/hive > Unable to determine Hadoop 
version information from 13:54:07,683 > 'hadoop version' returned: > 2015-05-05 13:54:08,014 DEBUG - [main:] ~ version: 2.5.0-cdh5.3.3 > (VersionInfo:171) Hadoop 2.5.0-cdh5.3.3 Subversion > http://github.com/cloudera/hadoop -r 82a65209d6e9e4a2b41fdbcd8190c7ea38730627 > Compiled by jenkins on 2015-04-08T22:00Z Compiled with protoc 2.5.0 From > source with checksum 1531e104cdad7489656f44875f3334b This command was run > using > /Users/sshivalingamurthy/installs/hadoop-2.5.0-cdh5.3.3/share/hadoop/common/hadoop-common-2.5.0-cdh5.3.3.jar > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10591) Support limited integer type promotion in ORC
[ https://issues.apache.org/jira/browse/HIVE-10591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529200#comment-14529200 ] Hive QA commented on HIVE-10591:

{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12730389/HIVE-10591.2.patch

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3738/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3738/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3738/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Tests exited with: ExecutionException: java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: resource batch-exec.vm not found.
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12730389 - PreCommit-HIVE-TRUNK-Build

> Support limited integer type promotion in ORC
>
>             Key: HIVE-10591
>             URL: https://issues.apache.org/jira/browse/HIVE-10591
>         Project: Hive
>      Issue Type: New Feature
> Affects Versions: 1.3.0
>        Reporter: Prasanth Jayachandran
>        Assignee: Prasanth Jayachandran
>    Attachments: HIVE-10591.1.patch, HIVE-10591.2.patch
>
> ORC currently does not support schema-on-read. If we alter an ORC table's 'int' column to 'bigint' and then query the altered table, a ClassCastException will be thrown, as the schema on read from the table descriptor will expect a LongWritable whereas ORC will return an IntWritable based on the file schema stored within the ORC file. OrcSerde currently doesn't do any type conversions or type promotions, for performance reasons in the inner loop.
> Since smallints, ints and bigints are stored in the same way in ORC, it should be possible to allow such type promotions without hurting performance. The following type promotions can be supported without any casting:
> smallint -> int
> smallint -> bigint
> int -> bigint
> Tinyint promotion is not possible without casting, as tinyints are stored using the RLE byte writer whereas smallints, ints and bigints are stored using the RLE integer writer.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
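The widening rules described in the ticket can be modeled in a few lines. This is a hypothetical Python sketch for illustration only (the names `WIDENING_OK` and `read_with_promotion` are invented and do not appear in Hive's OrcSerde or the ORC reader):

```python
# Toy model of the promotion rules described above; hypothetical sketch,
# not Hive's actual OrcSerde/ORC reader code.
WIDENING_OK = {
    ("smallint", "int"),
    ("smallint", "bigint"),
    ("int", "bigint"),
}

def read_with_promotion(value: int, file_type: str, reader_type: str) -> int:
    """Return `value` under the reader schema, allowing only safe widenings."""
    if file_type == reader_type or (file_type, reader_type) in WIDENING_OK:
        # smallint/int/bigint share ORC's RLE integer encoding, so the
        # stored value is already a valid value of the wider reader type.
        return value
    # tinyint uses the RLE byte writer, so e.g. tinyint -> int would
    # require an actual cast and is rejected here.
    raise TypeError(f"unsupported promotion: {file_type} -> {reader_type}")

print(read_with_promotion(42, "int", "bigint"))  # 42
```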
[jira] [Commented] (HIVE-10607) Combination of ReducesinkDedup + TopN optimization yields incorrect result if there are multiple GBY in reducer
[ https://issues.apache.org/jira/browse/HIVE-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529191#comment-14529191 ] Sushanth Sowmyan commented on HIVE-10607:

HIVE-10607 is currently marked as an outage in https://cwiki.apache.org/confluence/display/Hive/Hive+1.2+Release+Status, so yes, this is approved. Please go ahead and commit.

> Combination of ReducesinkDedup + TopN optimization yields incorrect result if there are multiple GBY in reducer
>
>             Key: HIVE-10607
>             URL: https://issues.apache.org/jira/browse/HIVE-10607
>         Project: Hive
>      Issue Type: Bug
>      Components: Logical Optimizer, Tez
> Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.1.0
>        Reporter: Ashutosh Chauhan
>        Assignee: Ashutosh Chauhan
>    Attachments: HIVE-10607.patch
>
> {code:sql}
> select ctinyint, count(cdouble) from (select ctinyint, cdouble from alltypesorc group by ctinyint, cdouble) t1 group by ctinyint order by ctinyint limit 20;
> {code}
> This query gives different result sets depending on which optimizations are enabled. In particular, in the .q test environment the following two invocations will give you different result sets:
> {code}
> * mvn test -Phadoop-2 -Dtest.output.overwrite=true -Dtest=TestMiniTezCliDriver -Dqfile=test.q -Dhive.optimize.reducededuplication.min.reducer=1 -Dhive.limit.pushdown.memory.usage=0.3f
> * mvn test -Phadoop-2 -Dtest.output.overwrite=true -Dtest=TestMiniTezCliDriver -Dqfile=test.q
> {code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
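The wrong-result mechanism can be illustrated with a toy model (hypothetical data and function names; this is not Hive's actual operator pipeline): when the TopN limit is pushed into the reduce sink, rows are truncated *before* the second group-by, so the outer counts are computed over a partial row set.

```python
from collections import Counter

# Hypothetical rows: distinct (ctinyint, cdouble) pairs, i.e. the output
# of the inner "group by ctinyint, cdouble". Five keys, ten values each.
rows = [(t, d) for t in range(5) for d in range(10)]

def correct_plan(rows, n):
    # Aggregate everything first, then order by key and apply the limit
    # at the very end, as the query semantics require.
    counts = Counter(t for t, _ in rows)
    return sorted(counts.items())[:n]

def buggy_plan(rows, n):
    # TopN pushed into the reduce sink: only the first n rows (by key)
    # ever reach the second group-by, so the counts are truncated.
    top_rows = sorted(rows)[:n]
    counts = Counter(t for t, _ in top_rows)
    return sorted(counts.items())[:n]

print(correct_plan(rows, 3))  # [(0, 10), (1, 10), (2, 10)]
print(buggy_plan(rows, 3))    # [(0, 3)]
```

Disabling either ReduceSink deduplication or the TopN pushdown (as the two mvn invocations in the ticket do) corresponds to choosing between these two plan shapes.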
[jira] [Commented] (HIVE-10563) MiniTezCliDriver tests ordering issues
[ https://issues.apache.org/jira/browse/HIVE-10563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529181#comment-14529181 ] Thejas M Nair commented on HIVE-10563:

Note that the sort directive will not work for queries that involve limit. There is at least one such query in this case. But for the other queries, the sort directive would be better.

> MiniTezCliDriver tests ordering issues
>
>          Key: HIVE-10563
>          URL: https://issues.apache.org/jira/browse/HIVE-10563
>      Project: Hive
>   Issue Type: Bug
>     Reporter: Hari Sankar Sivarama Subramaniyan
>     Assignee: Hari Sankar Sivarama Subramaniyan
>  Attachments: HIVE-10563.1.patch
>
> There are a bunch of tests related to TestMiniTezCliDriver which give ordering issues when run on Centos/Windows/OSX.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
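Why the sort directive helps everywhere except LIMIT queries can be sketched with a toy model (hypothetical functions, not the actual .q test harness): sorting the captured output stabilizes a nondeterministically *ordered* result, but with LIMIT the *set* of returned rows itself depends on engine ordering, so post-sorting cannot make it deterministic.

```python
import random

rows = list(range(10))

def query_without_limit(seed: int):
    # The engine may emit rows in any order (modeled by shuffling), but
    # the row set is fixed, so sorting the output stabilizes the test.
    rng = random.Random(seed)
    out = rows[:]
    rng.shuffle(out)
    return sorted(out)

def query_with_limit(seed: int, n: int = 3):
    # LIMIT keeps the first n rows in engine order; sorting afterwards
    # cannot undo the fact that the chosen row set varies run to run.
    rng = random.Random(seed)
    out = rows[:]
    rng.shuffle(out)
    return sorted(out[:n])

# Sorted full results agree regardless of engine ordering...
print(query_without_limit(1) == query_without_limit(2))  # True
# ...but sorted LIMIT results generally do not.
print(query_with_limit(1) == query_with_limit(2))
```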
[jira] [Assigned] (HIVE-10060) Provide more informative stage description in Spark Web UI [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang reassigned HIVE-10060:

Assignee: Jimmy Xiang

> Provide more informative stage description in Spark Web UI [Spark Branch]
>
>          Key: HIVE-10060
>          URL: https://issues.apache.org/jira/browse/HIVE-10060
>      Project: Hive
>   Issue Type: Sub-task
>   Components: spark-branch
> Affects Versions: 1.2.0
>     Reporter: Chao Sun
>     Assignee: Jimmy Xiang
>     Priority: Minor
>
> Currently, for HoS in the Spark Web UI, stage information is displayed like the following:
> Description:
> foreachAsync at RemoteHiveSparkClient.java:254
> org.apache.spark.api.java.JavaPairRDD.foreachAsync(JavaPairRDD.scala:45)
> org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient$JobStatusJob.call(RemoteHiveSparkClient.java:254)
> org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:366)
> It would be better to provide more useful information, such as which part of the query this stage is about. This appears to be implemented in SparkSQL.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)