[jira] [Created] (HIVE-16158) Correct mistake in documentation for ALTER TABLE … ADD/REPLACE COLUMNS CASCADE

2017-03-09 Thread Illya Yalovyy (JIRA)
Illya Yalovyy created HIVE-16158:


 Summary: Correct mistake in documentation for ALTER TABLE … 
ADD/REPLACE COLUMNS CASCADE
 Key: HIVE-16158
 URL: https://issues.apache.org/jira/browse/HIVE-16158
 Project: Hive
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.0.0
Reporter: Illya Yalovyy


The current documentation says that the CASCADE keyword was introduced in the Hive 0.15 
release. That information is incorrect and confuses users. The feature was 
actually released in Hive 1.1.0 (HIVE-8839).

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Add/ReplaceColumns
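
For context, the feature in question is the CASCADE clause of ALTER TABLE ... ADD/REPLACE COLUMNS, 
available since Hive 1.1.0. A minimal illustration (table and column names are invented for the example):
{code:sql}
-- Add a column and propagate the schema change to the metadata of all
-- existing partitions (supported since Hive 1.1.0, per HIVE-8839).
ALTER TABLE sales ADD COLUMNS (discount DECIMAL(10,2)) CASCADE;

-- Without CASCADE (RESTRICT is the default) the change affects only the
-- table metadata and partitions created afterwards.
ALTER TABLE sales ADD COLUMNS (note STRING) RESTRICT;
{code}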



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-15076) Improve scalability of LDAP authentication provider group filter

2016-10-26 Thread Illya Yalovyy (JIRA)
Illya Yalovyy created HIVE-15076:


 Summary: Improve scalability of LDAP authentication provider group 
filter
 Key: HIVE-15076
 URL: https://issues.apache.org/jira/browse/HIVE-15076
 Project: Hive
  Issue Type: Improvement
  Components: Authentication
Affects Versions: 2.1.0
Reporter: Illya Yalovyy
Assignee: Illya Yalovyy


The current implementation uses the following algorithm:
# For a given user, find all groups the user is a member of (a list of LDAP 
groups is constructed as a result of that request).
# Match this list of groups against the provided group filter.
 
The time/memory complexity of this approach is O(N) on the client side, where N is the 
number of groups the user is a member of. On a large directory (800+ groups 
per user) we can observe up to 2x performance degradation, as well as failures caused 
by the size of the LDAP response (LDAP: error code 4 - Sizelimit Exceeded).
 
Some directory services (Microsoft Active Directory, for instance) provide a 
virtual attribute on the user object that contains the list of groups the user 
belongs to. This attribute can be used to quickly determine whether the user 
passes or fails the group filter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14927) Remove code duplication from tests in TestLdapAtnProviderWithMiniDS

2016-10-11 Thread Illya Yalovyy (JIRA)
Illya Yalovyy created HIVE-14927:


 Summary: Remove code duplication from tests in 
TestLdapAtnProviderWithMiniDS
 Key: HIVE-14927
 URL: https://issues.apache.org/jira/browse/HIVE-14927
 Project: Hive
  Issue Type: Improvement
  Components: Test
Reporter: Illya Yalovyy
Assignee: Illya Yalovyy


* Extract the inner class User and implement a proper builder for it.
* Extract all common code into an LdapAuthenticationTestCase class:
   * setting up the test case
   * executing the test case
   * validating the result



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14875) Enhancement and refactoring of TestLdapAtnProviderWithMiniDS

2016-10-03 Thread Illya Yalovyy (JIRA)
Illya Yalovyy created HIVE-14875:


 Summary: Enhancement and refactoring of 
TestLdapAtnProviderWithMiniDS
 Key: HIVE-14875
 URL: https://issues.apache.org/jira/browse/HIVE-14875
 Project: Hive
  Issue Type: Test
  Components: Authentication, Tests
Reporter: Illya Yalovyy
Assignee: Illya Yalovyy


This issue makes the following enhancements to TestLdapAtnProviderWithMiniDS:
 
* Extract the defined LDIFs to a resource file.
* Remove unneeded attributes defined in each LDIF entry, such as:
  * sn (Surname) and givenName from group entries
  * distinguishedName from all entries, as this attribute serves more
    as a parent type of many other attributes
* Remove setting ExtensibleObject as an objectClass for all LDAP entries,
  as that is not needed. This objectClass would allow adding any
  attribute to an entry.
* Add the missing uid attribute to group entries whose DN refers to a uid
  attribute.
* Add the missing uidObject objectClass to entries that have the uid attribute.
* Explicitly set the organizationalPerson objectClass on user entries, as
  they use the inetOrgPerson objectClass, which is a subclass of
  organizationalPerson.
* Create indexes on the cn and uid attributes, as they are commonly
  queried.
* Remove unused variables and imports.
* Fix givenName for user3.
* Other minor code cleanup.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14713) LDAP Authentication Provider should be covered with unit tests

2016-09-07 Thread Illya Yalovyy (JIRA)
Illya Yalovyy created HIVE-14713:


 Summary: LDAP Authentication Provider should be covered with unit 
tests
 Key: HIVE-14713
 URL: https://issues.apache.org/jira/browse/HIVE-14713
 Project: Hive
  Issue Type: Test
  Components: Authentication, Tests
Affects Versions: 2.1.0
Reporter: Illya Yalovyy
Assignee: Illya Yalovyy


Currently the LdapAuthenticationProviderImpl class is not covered by unit tests. 
To make this class testable, some minor refactoring will be required.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13510) Dynamic partitioning doesn’t work when remote metastore is used

2016-04-13 Thread Illya Yalovyy (JIRA)
Illya Yalovyy created HIVE-13510:


 Summary: Dynamic partitioning doesn’t work when remote metastore 
is used
 Key: HIVE-13510
 URL: https://issues.apache.org/jira/browse/HIVE-13510
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 2.1.0
 Environment: Hadoop 2.7.1
Reporter: Illya Yalovyy
Assignee: Illya Yalovyy
Priority: Critical


*Steps to reproduce:*
# Configure remote metastore (hive.metastore.uris)
# Create table t1 (a string);
# Create table t2 (a string) partitioned by (b string);
# set hive.exec.dynamic.partition.mode=nonstrict;
# Insert overwrite table t2 partition (b) select a,a from t1;
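
The steps above, collected into a script that can be pasted into the Hive CLI (step 1 is 
cluster configuration and is assumed to be in place already):
{code:sql}
-- Assumes hive.metastore.uris already points at a remote metastore (step 1).
create table t1 (a string);
create table t2 (a string) partitioned by (b string);
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table t2 partition (b) select a, a from t1;
{code}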

*Result:*
{noformat}
FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: 
org.apache.thrift.TApplicationException: getMetaConf failed: unknown result
16/04/13 15:04:51 [c679e424-2501-4347-8146-cf1b1cae217c main]: ERROR ql.Driver: 
FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: 
org.apache.thrift.TApplicationException: getMetaConf failed: unknown result
org.apache.hadoop.hive.ql.parse.SemanticException: 
org.apache.hadoop.hive.ql.metadata.HiveException: 
org.apache.thrift.TApplicationException: getMetaConf failed: unknown result
at 
org.apache.hadoop.hive.ql.plan.DynamicPartitionCtx.<init>(DynamicPartitionCtx.java:84)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:6550)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:9315)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:9204)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:10071)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9949)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:10607)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:358)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10618)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:233)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:245)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:476)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:318)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1192)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1287)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1118)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1106)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:236)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:187)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:339)
at 
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:748)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:721)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:648)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
org.apache.thrift.TApplicationException: getMetaConf failed: unknown result
at org.apache.hadoop.hive.ql.metadata.Hive.getMetaConf(Hive.java:3493)
at 
org.apache.hadoop.hive.ql.plan.DynamicPartitionCtx.<init>(DynamicPartitionCtx.java:82)
... 29 more
Caused by: org.apache.thrift.TApplicationException: getMetaConf failed: unknown 
result
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_getMetaConf(ThriftHiveMetastore.java:666)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.getMetaConf(ThriftHiveMetastore.java:646)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getMetaConf(HiveMetaStoreClient.java:550)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
        ...
{noformat}

[jira] [Created] (HIVE-13185) orc.ReaderImp.ensureOrcFooter() method fails on small text files with IndexOutOfBoundsException

2016-02-29 Thread Illya Yalovyy (JIRA)
Illya Yalovyy created HIVE-13185:


 Summary: orc.ReaderImp.ensureOrcFooter() method fails on small 
text files with IndexOutOfBoundsException
 Key: HIVE-13185
 URL: https://issues.apache.org/jira/browse/HIVE-13185
 Project: Hive
  Issue Type: Bug
  Components: ORC
Affects Versions: 2.1.0
Reporter: Illya Yalovyy


Steps to reproduce:
1. Create a Text source table with one line of data:
{code}
create table src (id int);
insert overwrite table src values (1);
{code}
2. Create a target table:
{code}
create table trg (id int);
{code}
3. Try to load the small text file into the target table:
{code}
load data inpath 'user/hive/warehouse/src/00_0' into table trg;
{code}

*Error message:*
{quote}
FAILED: SemanticException Unable to load data to destination table. Error: 
java.lang.IndexOutOfBoundsException
{quote}

*Stack trace:*
{noformat}
org.apache.hadoop.hive.ql.parse.SemanticException: Unable to load data to 
destination table. Error: java.lang.IndexOutOfBoundsException
at 
org.apache.hadoop.hive.ql.parse.LoadSemanticAnalyzer.ensureFileFormatsMatch(LoadSemanticAnalyzer.java:340)
at 
org.apache.hadoop.hive.ql.parse.LoadSemanticAnalyzer.analyzeInternal(LoadSemanticAnalyzer.java:224)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:242)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:481)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:317)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1190)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1285)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1116)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1104)
...
{noformat}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-12715) Unit test for HIVE-10685 fix

2015-12-19 Thread Illya Yalovyy (JIRA)
Illya Yalovyy created HIVE-12715:


 Summary: Unit test for HIVE-10685 fix
 Key: HIVE-12715
 URL: https://issues.apache.org/jira/browse/HIVE-12715
 Project: Hive
  Issue Type: Test
Reporter: Illya Yalovyy


It seems the bugfix provided for HIVE-10685 is not covered by tests. This 
tricky scenario can occur not only when a table gets concatenated but also in 
some other use cases. I'm going to implement a unit test for it.
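
For reference, a minimal sketch of the concatenation operation mentioned above (table names 
and the partition spec are invented for illustration):
{code:sql}
-- Merge small ORC files into fewer, larger files (the table must be
-- stored as ORC or RCFile).
ALTER TABLE orc_events CONCATENATE;                           -- unpartitioned table
ALTER TABLE orc_logs PARTITION (dt='2015-12-19') CONCATENATE; -- single partition
{code}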



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11882) Fetch optimizer should stop source files traversal once it exceeds the hive.fetch.task.conversion.threshold

2015-09-18 Thread Illya Yalovyy (JIRA)
Illya Yalovyy created HIVE-11882:


 Summary: Fetch optimizer should stop source files traversal once 
it exceeds the hive.fetch.task.conversion.threshold
 Key: HIVE-11882
 URL: https://issues.apache.org/jira/browse/HIVE-11882
 Project: Hive
  Issue Type: Improvement
  Components: Physical Optimizer
Affects Versions: 1.0.0
Reporter: Illya Yalovyy


Hive 1.0's fetch optimizer tries to convert queries of the form "select ... 
from ... where ... limit ..." into a fetch task (see the 
hive.fetch.task.conversion property). This optimization gets the lengths of all 
the files in the specified partition and compares the total against a 
threshold value to determine whether it should use a fetch task or not (see the 
hive.fetch.task.conversion.threshold property). The main problem is that, while 
summing up the file lengths, the fetch optimizer doesn't seem to stop once the 
total exceeds hive.fetch.task.conversion.threshold. This works fine on HDFS, but 
could cause significant performance degradation on other supported file systems.
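
For reference, a minimal sketch of the configuration knobs involved and a query shape the 
optimizer may convert to a fetch task (table and column names below are invented; values 
are examples only, thresholds are in bytes):
{code:sql}
-- Enable fetch-task conversion for simple queries (example value).
SET hive.fetch.task.conversion=more;
-- Convert only when the total size of the input files is below this
-- threshold, in bytes (example value: 1 GB).
SET hive.fetch.task.conversion.threshold=1073741824;

-- A query of this shape is a conversion candidate; computing the total
-- input size is where the file traversal described above happens.
SELECT col1 FROM some_table WHERE part_col = 'x' LIMIT 10;
{code}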



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11791) Add test for HIVE-10122

2015-09-10 Thread Illya Yalovyy (JIRA)
Illya Yalovyy created HIVE-11791:


 Summary: Add test for HIVE-10122
 Key: HIVE-11791
 URL: https://issues.apache.org/jira/browse/HIVE-11791
 Project: Hive
  Issue Type: Test
  Components: Metastore
Affects Versions: 1.1.0
Reporter: Illya Yalovyy
Priority: Minor


Unit tests for PartitionPruner.compactExpr()



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11583) When PTF is used over a large partitions result could be corrupted

2015-08-17 Thread Illya Yalovyy (JIRA)
Illya Yalovyy created HIVE-11583:


 Summary: When PTF is used over a large partitions result could be 
corrupted
 Key: HIVE-11583
 URL: https://issues.apache.org/jira/browse/HIVE-11583
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Affects Versions: 1.2.1, 1.2.0, 1.0.0, 0.13.1, 0.14.0, 0.14.1
 Environment: Hadoop 2.6 + Apache hive built from trunk

Reporter: Illya Yalovyy
Priority: Critical


Dataset:
 The window has 50001 records (2 blocks on disk and 1 block in memory).
 The size of the second block is 32 MB (2 splits).

Result:
When the last block is read from disk, only the first split is actually loaded; 
the second split is missed. The total count of the result dataset is correct, 
but some records are missing and others are duplicated.

Example:
{code:sql}
CREATE TABLE ptf_big_src (
  id INT,
  key STRING,
  grp STRING,
  value STRING
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

LOAD DATA LOCAL INPATH '../../data/files/ptf_3blocks.txt.gz' OVERWRITE INTO 
TABLE ptf_big_src;

SELECT grp, COUNT(1) cnt FROM ptf_big_src GROUP BY grp ORDER BY cnt DESC;
---
-- A  25000
-- B  20000
-- C   5001
---

CREATE TABLE ptf_big_trg AS
SELECT *, row_number() OVER (PARTITION BY key ORDER BY grp) grp_num FROM ptf_big_src;

SELECT grp, COUNT(1) cnt FROM ptf_big_trg GROUP BY grp ORDER BY cnt DESC;
---
-- A  34296
-- B  15704
-- C      1
---
{code}
Counts by 'grp' are incorrect!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10980) Merge of dynamic partitions loads all data to default partition

2015-06-10 Thread Illya Yalovyy (JIRA)
Illya Yalovyy created HIVE-10980:


 Summary: Merge of dynamic partitions loads all data to default 
partition
 Key: HIVE-10980
 URL: https://issues.apache.org/jira/browse/HIVE-10980
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 0.14.0
 Environment: HDP 2.2.4 (also reproduced on apache hive built from 
trunk) 
Reporter: Illya Yalovyy


Conditions that lead to the issue:
1. Partition columns have different types
2. Both static and dynamic partitions are used in the query
3. Dynamically generated partitions require merge

Result: Final data is loaded to __HIVE_DEFAULT_PARTITION__.

Steps to reproduce:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=strict;
set hive.optimize.sort.dynamic.partition=false;
set hive.merge.mapfiles=true;
set hive.merge.mapredfiles=true;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;

create external table sdp (
  dataint bigint,
  hour int,
  req string,
  cid string,
  caid string
)
row format delimited
fields terminated by ',';

load data local inpath '../../data/files/dynpartdata1.txt' into table sdp;
load data local inpath '../../data/files/dynpartdata2.txt' into table sdp;
...
load data local inpath '../../data/files/dynpartdataN.txt' into table sdp;

create table tdp (cid string, caid string)
partitioned by (dataint bigint, hour int, req string);

insert overwrite table tdp partition (dataint=20150316, hour=16, req)
select cid, caid, req from sdp where dataint=20150316 and hour=16;

select * from tdp order by caid;
show partitions tdp;

Example of the input file:
20150316,16,reqA,clusterIdA,cacheId1
20150316,16,reqB,clusterIdB,cacheId2 
20150316,16,reqA,clusterIdC,cacheId3  
20150316,16,reqD,clusterIdD,cacheId4
20150316,16,reqA,clusterIdA,cacheId5  

Actual result:
clusterIdA    cacheId1    20150316    16    __HIVE_DEFAULT_PARTITION__
clusterIdA    cacheId1    20150316    16    __HIVE_DEFAULT_PARTITION__
clusterIdB    cacheId2    20150316    16    __HIVE_DEFAULT_PARTITION__
clusterIdC    cacheId3    20150316    16    __HIVE_DEFAULT_PARTITION__
clusterIdD    cacheId4    20150316    16    __HIVE_DEFAULT_PARTITION__
clusterIdA    cacheId5    20150316    16    __HIVE_DEFAULT_PARTITION__
clusterIdD    cacheId8    20150316    16    __HIVE_DEFAULT_PARTITION__
clusterIdB    cacheId9    20150316    16    __HIVE_DEFAULT_PARTITION__

dataint=20150316/hour=16/req=__HIVE_DEFAULT_PARTITION__



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-9225) Windowing functions are not executing efficiently when the window is identical

2014-12-29 Thread Illya Yalovyy (JIRA)
Illya Yalovyy created HIVE-9225:
---

 Summary: Windowing functions are not executing efficiently when 
the window is identical
 Key: HIVE-9225
 URL: https://issues.apache.org/jira/browse/HIVE-9225
 Project: Hive
  Issue Type: Improvement
  Components: PTF-Windowing
Affects Versions: 0.13.0
 Environment: Linux
Reporter: Illya Yalovyy


The Hive optimizer and runtime are not smart enough to recognize when the 
window specification is the same. Even when the window is identical, the windowing is 
re-executed for each function, causing the runtime to increase proportionally to the 
number of windows.

Example:
{code:sql}
select code,
       min(emp) over (partition by code order by emp range between current row and 3 following)
from sample_big limit 10;
{code}
*Time taken: 1h:36m:12s*

{code:sql}
select code,
       min(emp) over (partition by code order by emp range between current row and 3 following),
       max(emp) over (partition by code order by emp range between current row and 3 following),
       min(salary) over (partition by code order by emp range between current row and 3 following),
       max(salary) over (partition by code order by emp range between current row and 3 following)
from sample_big limit 10;
{code}
*Time taken: 4h:0m:37s*



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)