Review Request 68099: SerDe to support Teradata Binary Format

2018-07-29 Thread Lu Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68099/
---

Review request for hive and Carl Steinbach.


Bugs: HIVE-20225
https://issues.apache.org/jira/browse/HIVE-20225


Repository: hive-git


Description
---

When TPT or BTEQ is used to export data from Teradata, Teradata writes binary 
files whose layout is determined by the table schema.

A custom SerDe is needed to read these files directly from Hive.

CREATE EXTERNAL TABLE `TABLE1`(
...)
PARTITIONED BY (
...)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.contrib.serde2.teradata.TeradataBinarySerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.contrib.fileformat.teradata.TeradataBinaryFileInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.contrib.fileformat.teradata.TeradataBinaryFileOutputFormat'
LOCATION ...;

SELECT * FROM `TABLE1`;

Problem Statement:

Currently, the fastest way to export data from Teradata is TPT. However, Hive 
cannot directly use the exported binary files because it has no SerDe for this 
format.

Result:

With this SerDe, Hive can operate on exported Teradata binary format files 
transparently.
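The description does not spell out the file layout. As a rough illustration of what a record reader and SerDe for such files have to handle, the sketch below splits a byte stream into records and checks per-column NULL flags. The layout is an assumption, not taken from the patch: each record is a 2-byte little-endian length, that many payload bytes, and a 0x0A end-of-record byte, and the payload starts with ceil(columns / 8) indicator bytes whose bits flag NULL columns, most significant bit first. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; the record layout below is an assumption, not the patch's format spec. */
public class TeradataRecordSketch {

    /** Reads one record payload, or returns null at a clean end of stream. */
    static byte[] readRecord(DataInputStream in) throws IOException {
        int lo = in.read();
        if (lo < 0) {
            return null; // no more records
        }
        int hi = in.read();
        if (hi < 0) {
            throw new EOFException("truncated length prefix");
        }
        int length = (hi << 8) | lo; // assumed 2-byte little-endian length
        byte[] payload = new byte[length];
        in.readFully(payload); // throws EOFException if the payload is short
        int eor = in.read();
        if (eor != 0x0A) {
            throw new IOException("missing end-of-record byte 0x0A");
        }
        return payload;
    }

    /** True if column {@code col} is flagged NULL in the leading indicator bytes (assumed MSB-first). */
    static boolean isNull(byte[] payload, int col) {
        int b = payload[col / 8] & 0xFF;
        return (b & (0x80 >>> (col % 8))) != 0;
    }

    public static void main(String[] args) throws IOException {
        // One record with 3 columns: indicator byte 0x40 marks column 1 as NULL,
        // followed by 2 bytes of made-up column data.
        byte[] file = {0x03, 0x00, 0x40, 0x11, 0x22, 0x0A};
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
        List<byte[]> records = new ArrayList<>();
        byte[] rec;
        while ((rec = readRecord(in)) != null) {
            records.add(rec);
        }
        System.out.println(records.size());            // 1
        System.out.println(isNull(records.get(0), 0)); // false
        System.out.println(isNull(records.get(0), 1)); // true
    }
}
```

Real exports may use a different length-prefix width or omit the indicator bytes depending on the export mode, so the constants above should be checked against the Teradata documentation before relying on them.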


Diffs
-

  
  contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/teradata/TeradataBinaryFileInputFormat.java PRE-CREATION
  contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/teradata/TeradataBinaryFileOutputFormat.java PRE-CREATION
  contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/teradata/TeradataBinaryRecordReader.java PRE-CREATION
  contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinaryDataInputStream.java PRE-CREATION
  contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinaryDataOutputStream.java PRE-CREATION
  contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinarySerde.java PRE-CREATION
  contrib/src/test/org/apache/hadoop/hive/contrib/serde2/TestGeneralFunctions.java PRE-CREATION
  contrib/src/test/org/apache/hadoop/hive/contrib/serde2/TestTeradataBinarySerdeForDate.java PRE-CREATION
  contrib/src/test/org/apache/hadoop/hive/contrib/serde2/TestTeradataBinarySerdeForDecimal.java PRE-CREATION
  contrib/src/test/org/apache/hadoop/hive/contrib/serde2/TestTeradataBinarySerdeForTimeStamp.java PRE-CREATION
  contrib/src/test/org/apache/hadoop/hive/contrib/serde2/TestTeradataBinarySerdeGeneral.java PRE-CREATION


Diff: https://reviews.apache.org/r/68099/diff/1/


Testing
---

JUnit tests have been added for the serialization and deserialization functions.
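The test classes listed in the diff cover DATE, DECIMAL and TIMESTAMP values. As an illustration of the DATE case only: Teradata stores a DATE as the integer (year - 1900) * 10000 + month * 100 + day, so 2018-07-29 is 1180729. The sketch below decodes and encodes that value, assuming the exported field is a 4-byte little-endian integer; the class and method names are hypothetical and this is not the patch's code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.LocalDate;

/** Hypothetical helper for Teradata's integer DATE encoding. */
public class TeradataDateSketch {

    /** Decodes (year - 1900) * 10000 + month * 100 + day into a LocalDate. */
    static LocalDate fromTeradataInt(int value) {
        // floorDiv/floorMod keep pre-1900 (negative) encodings correct.
        int year = Math.floorDiv(value, 10000) + 1900;
        int monthDay = Math.floorMod(value, 10000);
        return LocalDate.of(year, monthDay / 100, monthDay % 100);
    }

    /** Encodes a LocalDate with the same formula. */
    static int toTeradataInt(LocalDate date) {
        return (date.getYear() - 1900) * 10000 + date.getMonthValue() * 100 + date.getDayOfMonth();
    }

    /** Reads a 4-byte DATE field, assuming little-endian byte order in the export file. */
    static LocalDate readDate(byte[] field) {
        int value = ByteBuffer.wrap(field).order(ByteOrder.LITTLE_ENDIAN).getInt();
        return fromTeradataInt(value);
    }

    public static void main(String[] args) {
        System.out.println(toTeradataInt(LocalDate.of(2018, 7, 29))); // 1180729
        System.out.println(fromTeradataInt(-8769));                   // 1899-12-31
    }
}
```

Round-tripping values on both sides of 1900, as in `main`, is the kind of case worth covering in a date test.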


Thanks,

Lu Li



[jira] [Created] (HIVE-20268) Please add OWASP Dependency Check to the build (pom.xml) of HIVE and all associated components

2018-07-29 Thread Albert Baker (JIRA)
Albert Baker created HIVE-20268:
---

 Summary: Please add OWASP Dependency Check to the build (pom.xml) 
of HIVE and all associated components
 Key: HIVE-20268
 URL: https://issues.apache.org/jira/browse/HIVE-20268
 Project: Hive
  Issue Type: New Feature
  Components: Authentication, Authorization, Build Infrastructure, CLI, 
distribution, Encryption, Hive, Parser, Security, Serializers/Deserializers, 
Testing Infrastructure
Affects Versions: All Versions
 Environment: All development, build, test, environments.
Reporter: Albert Baker


Please add OWASP Dependency Check to the build (pom.xml). OWASP DC makes an 
outbound REST call to MITRE Common Vulnerabilities & Exposures (CVE) to look up 
each dependent .jar and list any known vulnerabilities for it. This step is 
needed because a manual MITRE CVE lookup on the main component does not check 
for vulnerabilities in its components or in dependent libraries.

OWASP Dependency Check (https://www.owasp.org/index.php/OWASP_Dependency_Check) 
has plug-ins for most Java build tools (Ant, Maven, Ivy, Gradle).

Also, add the appropriate command to the nightly build to generate a report of 
all known vulnerabilities in any third-party libraries/dependencies that get 
pulled in. Example:
mvn -Powasp -Dtest=false -DfailIfNoTests=false clean aggregate

Generating this report nightly/weekly will tell the project's development team 
whether any dependent libraries have reported known vulnerabilities. Project 
teams that keep up with removing vulnerabilities on a weekly basis help protect 
businesses that rely on these open source components.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[ANNOUNCE] New committer: Slim Bouguerra

2018-07-29 Thread Ashutosh Chauhan
Apache Hive's Project Management Committee (PMC) has invited Slim Bouguerra
to become a committer, and we are pleased to announce that he has accepted.

Slim, welcome, thank you for your contributions, and we look forward to your
further interactions with the community!

Ashutosh Chauhan (on behalf of the Apache Hive PMC)


[ANNOUNCE] New PMC Member : Vineet Garg

2018-07-29 Thread Ashutosh Chauhan
On behalf of the Hive PMC I am delighted to announce Vineet Garg is joining
Hive PMC.
Thanks Vineet for all your contributions till now. Looking forward to many
more.

Welcome, Vineet!

Thanks,
Ashutosh


[jira] [Created] (HIVE-20267) Expanding WebUI to include form to dynamically config log levels

2018-07-29 Thread Zoltan Chovan (JIRA)
Zoltan Chovan created HIVE-20267:


 Summary: Expanding WebUI to include form to dynamically config log 
levels 
 Key: HIVE-20267
 URL: https://issues.apache.org/jira/browse/HIVE-20267
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Chovan
Assignee: Zoltan Chovan


Building on the ability to change log levels at runtime, the web UI can be 
extended to interact with the Log4j2ConfiguratorServlet, so that the servlet can 
be used directly and users/admins don't need to run curl commands from the 
command line.





Re: [ANNOUNCE] New PMC Member : Peter Vary

2018-07-29 Thread Vineet Garg
Congratulations Peter!

> On Jul 26, 2018, at 11:25 AM, Ashutosh Chauhan  wrote:
> 
> On behalf of the Hive PMC I am delighted to announce Peter Vary is joining
> Hive PMC.
> Thanks Peter for all your contributions till now. Looking forward to many
> more.
> 
> Welcome, Peter!
> 
> Thanks,
> Ashutosh



Re: [ANNOUNCE] New PMC Member : Sahil Takiar

2018-07-29 Thread Vineet Garg
Congratulations Sahil!

> On Jul 26, 2018, at 11:28 AM, Ashutosh Chauhan  wrote:
> 
> On behalf of the Hive PMC I am delighted to announce Sahil Takiar is
> joining Hive PMC.
> Thanks Sahil for all your contributions till now. Looking forward to many
> more.
> 
> Welcome, Sahil!
> 
> Thanks,
> Ashutosh



Re: [ANNOUNCE] New PMC Member : Vihang Karajgaonkar

2018-07-29 Thread Vineet Garg
Congratulations Vihang!

> On Jul 26, 2018, at 11:27 AM, Ashutosh Chauhan  wrote:
> 
> On behalf of the Hive PMC I am delighted to announce Vihang  Karajgaonkar
> is joining Hive PMC.
> Thanks Vihang for all your contributions till now. Looking forward to many
> more.
> 
> Welcome, Vihang!
> 
> Thanks,
> Ashutosh



Re: [VOTE] Apache Hive 3.1.0 Release Candidate 1

2018-07-29 Thread Vineet Garg
With 3 +1 votes and no -1, the vote passes. I’ll prepare the release now.

Thanks to the PMC members for testing and voting.

Vineet 

> On Jul 26, 2018, at 9:15 AM, Jesus Camacho Rodriguez 
>  wrote:
> 
> +1
> 
> * Built from sources and ran tests
> * Checked signatures
> 
> -Jesús
> 
> On 7/26/18, 1:41 AM, "Peter Vary"  wrote:
> 
>Never done this before, but here is what I did:
>- Built from source, added my usual configs, and ran some basic query tests
>- Downloaded the artifacts from the URL Vineet provided
>- Checked sha and asc signatures
>- Checked that the source tar.gz contains the same files as my repository
> 
>If this is considered enough, then +1 from me too.
> 
>> On Jul 26, 2018, at 06:57, Vineet Garg  wrote:
>> 
>> Thanks for voting Vihang and Ashutosh. 
>> 
>> I need 2 more votes for the release. Hive PMC members please test and vote.
>> 
>> Thanks,
>> Vineet
>> 
>>> On Jul 24, 2018, at 4:05 PM, Ashutosh Chauhan  wrote:
>>> 
>>> Built from sources.
>>> Ran some unit tests.
>>> Checked sha checksums.
>>> 
>>> Everything looks good.
>>> +1
>>> 
>>> 
>>> On Tue, Jul 24, 2018 at 2:27 PM Vihang Karajgaonkar
>>>  wrote:
>>> 
>>>> - Built the source and ran some basic hive commands
>>>> - Built standalone-metastore and deployed it with a non-hive
>>>> application. Tested basic metastore operations like create, alter, drop
>>>> on tables and partitions
>>>> - Verified the signature for the tar files
>>>>
>>>> RC1 looks good to me.
>>>>
>>>> +1 (non-binding)
>>>>
>>>> On Mon, Jul 23, 2018 at 4:28 PM, Vineet Garg wrote:
>>>>
>>>>> Apache Hive 3.1.0 Release Candidate 1 is available here:
>>>>>
>>>>> http://people.apache.org/~vgarg/apache-hive-3.1.0-rc-1
>>>>>
>>>>> Maven artifacts are available here:
>>>>>
>>>>> https://repository.apache.org/content/repositories/orgapachehive-1090/
>>>>>
>>>>> Source tag: https://github.com/apache/hive/tree/release-3.1.0-rc1
>>>>>
>>>>> Voting will conclude in 72 hours.
>>>>>
>>>>> Hive PMC Members: Please test and vote.
>>>>>
>>>>> Thanks.
>>>>>
>>>>
>> 
> 
> 
> 
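Several voters report checking the release artifacts' sha checksums. A minimal sketch of that step in Java, assuming the published checksum file holds a hex digest as its first whitespace-separated token (sha256sum output style) and that SHA-256 is the algorithm; the thread does not say which format or algorithm the RC used. The class name is hypothetical, and verifying the .asc signatures still requires GPG, which this sketch does not cover.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical checksum verifier for a downloaded release artifact. */
public class ChecksumSketch {

    /** Streams the file through the digest and returns lowercase hex. */
    static String hexDigest(Path file, String algorithm) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest()) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    /** Compares with the first token of the published digest file (assumed format). */
    static boolean matches(Path artifact, Path digestFile, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        String published = new String(Files.readAllBytes(digestFile), StandardCharsets.UTF_8)
                .trim().split("\\s+")[0];
        return published.equalsIgnoreCase(hexDigest(artifact, algorithm));
    }

    public static void main(String[] args) throws Exception {
        Path artifact = Paths.get(args[0]); // the downloaded tarball
        Path digest = Paths.get(args[1]);   // its published checksum file
        System.out.println(matches(artifact, digest, "SHA-256") ? "OK" : "MISMATCH");
    }
}
```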



Re: Review Request 68011: HIVE-19770 Support for CBO for queries with multiple same columns in select

2018-07-29 Thread Vineet Garg


> On July 24, 2018, 11:34 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/bucketsortoptimize_insert_4.q.out
> > Lines 84-85 (patched)
> > 
> >
> > This extra operator will result in perf loss. We do have an 
> > optimization rule to remove identity projects. Is that not able to optimize 
> > this select now?
> 
> Vineet Garg wrote:
> I am not sure. This query is now going through CBO resulting in this 
> extra select operator. I can open a jira to further investigate this.

@Ashutosh - isn't this extra Select operator beneficial in this case, since the 
Join operator now operates on fewer columns (virtual and other columns are 
eliminated by the Select)?


> On July 24, 2018, 11:34 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/llap/enforce_constraint_notnull.q.out
> > Line 3732 (original), 3729 (patched)
> > 
> >
> > Now we are shuffling an extra constant column between vertices. This 
> > will result in perf loss.
> 
> Vineet Garg wrote:
> This again is a result of going through CBO. I'll open a jira to 
> investigate this.

HIVE-20266


> On July 24, 2018, 11:34 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/llap/vector_windowing.q.out
> > Lines 210 (patched)
> > 
> >
> > Seems like RSDeDup optimization failed to merge 2 RSs in this case.
> 
> Vineet Garg wrote:
> I had noticed this and had a comment about this in the jira. I plan to 
> open a jira to investigate this.

HIVE-20265


- Vineet


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68011/#review206425
---


On July 27, 2018, 11:14 p.m., Vineet Garg wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68011/
> ---
> 
> (Updated July 27, 2018, 11:14 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Jesús Camacho Rodríguez.
> 
> 
> Bugs: HIVE-19770
> https://issues.apache.org/jira/browse/HIVE-19770
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> See Jira
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java f008c4dfae 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/RowResolver.java 37c841fbd1 
>   ql/src/test/queries/clientpositive/masking_8.q 94e4106101 
>   ql/src/test/results/clientnegative/ambiguous_col.q.out a2915a4a5d 
>   ql/src/test/results/clientnegative/create_view_failure5.q.out d79dc64a30 
>   ql/src/test/results/clientpositive/bucketsortoptimize_insert_4.q.out 6c45fcb7ac 
>   ql/src/test/results/clientpositive/char_udf1.q.out 69d76d7269 
>   ql/src/test/results/clientpositive/keyword_2.q.out f1d63b6e5f 
>   ql/src/test/results/clientpositive/llap/enforce_constraint_notnull.q.out f707ab47be 
>   ql/src/test/results/clientpositive/llap/explainanalyze_2.q.out ab86821f07 
>   ql/src/test/results/clientpositive/llap/explainuser_2.q.out 5f5f5f6015 
>   ql/src/test/results/clientpositive/llap/limit_join_transpose.q.out ebaac18127 
>   ql/src/test/results/clientpositive/llap/schema_evol_orc_acid_part.q.out 97752f3c25 
>   ql/src/test/results/clientpositive/llap/schema_evol_orc_acid_part_llap_io.q.out 23c33a3141 
>   ql/src/test/results/clientpositive/llap/schema_evol_orc_acid_part_update.q.out eeabb8cc61 
>   ql/src/test/results/clientpositive/llap/schema_evol_orc_acid_part_update_llap_io.q.out f15a144a96 
>   ql/src/test/results/clientpositive/llap/schema_evol_orc_acid_table.q.out a043b679ae 
>   ql/src/test/results/clientpositive/llap/schema_evol_orc_acid_table_llap_io.q.out 35c1fae6d0 
>   ql/src/test/results/clientpositive/llap/schema_evol_orc_acid_table_update.q.out 730d3d2312 
>   ql/src/test/results/clientpositive/llap/schema_evol_orc_acid_table_update_llap_io.q.out 95bfa2507d 
>   ql/src/test/results/clientpositive/llap/schema_evol_orc_acidvec_part_llap_io.q.out 7e1cce3f4f 
>   ql/src/test/results/clientpositive/llap/schema_evol_orc_acidvec_part_update.q.out 242b95e603 
>   ql/src/test/results/clientpositive/llap/schema_evol_orc_acidvec_part_update_llap_io.q.out 53cb8fc8c4 
>   ql/src/test/results/clientpositive/llap/schema_evol_orc_acidvec_table.q.out 219ad7a82e 
>   ql/src/test/results/clientpositive/llap/schema_evol_orc_acidvec_table_llap_io.q.out ce9fe84d1e 
>   ql/src/test/results/clientpositive/llap/schema_evol_orc_acidvec_table_update.q.out f8df92faf7 
>   
> 

[jira] [Created] (HIVE-20266) Extra column is being shuffled in cbo as compared to non-cbo

2018-07-29 Thread Vineet Garg (JIRA)
Vineet Garg created HIVE-20266:
--

 Summary: Extra column is being shuffled in cbo as compared to 
non-cbo
 Key: HIVE-20266
 URL: https://issues.apache.org/jira/browse/HIVE-20266
 Project: Hive
  Issue Type: Improvement
  Components: Query Planning
Reporter: Vineet Garg
Assignee: Vineet Garg


{code:sql}
CREATE TABLE tablePartitioned (a STRING NOT NULL ENFORCED, b STRING, c STRING 
NOT NULL ENFORCED) PARTITIONED BY (p1 STRING, p2 INT NOT NULL DISABLE);
{code}

{code:sql}
explain INSERT INTO tablePartitioned partition(p1, p2) select key, value, 
value, key as p1, 3 as p2 from src limit 10;
{code}





[jira] [Created] (HIVE-20265) PTF operator has an extra reducer in CBO

2018-07-29 Thread Vineet Garg (JIRA)
Vineet Garg created HIVE-20265:
--

 Summary: PTF operator has an extra reducer in CBO
 Key: HIVE-20265
 URL: https://issues.apache.org/jira/browse/HIVE-20265
 Project: Hive
  Issue Type: Improvement
  Components: Query Planning
Reporter: Vineet Garg
Assignee: Vineet Garg


{code:sql}
explain vectorization detail
select p_mfgr, p_name, p_size, 
min(p_retailprice),
rank() over(distribute by p_mfgr sort by p_name) as r,
dense_rank() over(distribute by p_mfgr sort by p_name) as dr,
p_size, p_size - lag(p_size,1,p_size) over(distribute by p_mfgr sort by p_name) 
as deltaSz
from part
group by p_mfgr, p_name, p_size
{code}

The above query generates an extra reducer when CBO is enabled.





[jira] [Created] (HIVE-20264) Bootstrap repl dump with concurrent write and drop of ACID table makes target inconsistent.

2018-07-29 Thread Sankar Hariappan (JIRA)
Sankar Hariappan created HIVE-20264:
---

 Summary: Bootstrap repl dump with concurrent write and drop of 
ACID table makes target inconsistent.
 Key: HIVE-20264
 URL: https://issues.apache.org/jira/browse/HIVE-20264
 Project: Hive
  Issue Type: Sub-task
  Components: HiveServer2, repl
Affects Versions: 4.0.0, 3.2.0
Reporter: Sankar Hariappan
Assignee: Sankar Hariappan


Consider the following sequence during a bootstrap dump of ACID tables:
 - Get lastReplId = last event ID logged.
 - Current session (Thread-1), REPL DUMP -> Open txn (Txn1) - Event-10
 - Another session (Thread-2), Open txn (Txn2) - Event-11
 - Thread-2 -> Insert data (T1.D1) to ACID table. - Event-12
 - Thread-2 -> Commit Txn (Txn2) - Event-13
 - Thread-2 -> Drop table (T1) - Event-14
 - Thread-1 -> Dump ACID tables using the validTxnList of Txn1. This step skips 
all data written by txns > Txn1, so T1 will be missing.
 - Thread-1 -> Commit Txn (Txn1)
 - REPL LOAD from bootstrap dump will skip T1.
 - Incremental REPL DUMP will start from Event-10 and hence replay the write ID 
allocation for table T1, while DROP TABLE (T1) is idempotent (a no-op, since T1 
does not exist at the target). So, entries for T1 remain in the TXN_TO_WRITE_ID 
and NEXT_WRITE_ID metastore tables at the target.
 - If another table with the same name T1 is later created at the source and 
replicated, readers at the target may see incorrect data for T1.
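Why the bootstrap dump misses T1 comes down to snapshot visibility: writes from a transaction allocated after the dump's snapshot are never visible to it. A toy check in the spirit of Hive's ValidTxnList, heavily simplified (no aborted transactions, no write-id lists); the class and method names are hypothetical and are not Hive's API.

```java
import java.util.Collections;
import java.util.Set;

/** Toy snapshot check; hypothetical names, not Hive's ValidTxnList API. */
public class SnapshotVisibilitySketch {
    private final long highWatermark; // highest txn id allocated when the snapshot was taken
    private final Set<Long> openTxns; // txns still open at snapshot time

    SnapshotVisibilitySketch(long highWatermark, Set<Long> openTxns) {
        this.highWatermark = highWatermark;
        this.openTxns = openTxns;
    }

    /** Visible only if allocated no later than the high watermark and not open at snapshot time. */
    boolean isVisible(long txnId) {
        return txnId <= highWatermark && !openTxns.contains(txnId);
    }

    public static void main(String[] args) {
        long txn1 = 1L; // REPL DUMP's transaction (Thread-1)
        long txn2 = 2L; // Thread-2: insert into T1, commit, drop T1
        // Snapshot taken when Txn1 opened: Txn2 did not exist yet.
        SnapshotVisibilitySketch snapshot =
                new SnapshotVisibilitySketch(txn1, Collections.emptySet());
        System.out.println(snapshot.isVisible(txn2)); // false
    }
}
```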

A couple of proposals:
1. Make write ID allocation idempotent. This is not possible: the table doesn't 
exist, and an MM table import may allocate a write ID before creating the table, 
so the two cases cannot be distinguished.
2. Make the Drop table event remove entries from the TXN_TO_WRITE_ID and 
NEXT_WRITE_ID tables regardless of whether the table exists at the target.
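To make proposal 2 concrete, a toy in-memory model of the target's write-id bookkeeping. The maps only stand in for the TXN_TO_WRITE_ID and NEXT_WRITE_ID metastore tables; the class and method names are hypothetical and this is not the metastore's schema or API. It shows the stale counter a recreated T1 inherits when the drop is a plain idempotent no-op, and the clean state when the drop also clears write-id entries.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of write-id bookkeeping at the replication target; all names are hypothetical. */
public class WriteIdCleanupSketch {
    final Set<String> tables = new HashSet<>();                         // tables that exist at the target
    final Map<String, Long> nextWriteId = new HashMap<>();              // stands in for NEXT_WRITE_ID
    final Map<String, Map<Long, Long>> txnToWriteId = new HashMap<>();  // stands in for TXN_TO_WRITE_ID

    /** Allocates the table's next write id for a txn; it does not require the table to exist. */
    long allocateWriteId(long txnId, String table) {
        long writeId = nextWriteId.getOrDefault(table, 1L);
        nextWriteId.put(table, writeId + 1);
        txnToWriteId.computeIfAbsent(table, t -> new HashMap<>()).put(txnId, writeId);
        return writeId;
    }

    /** Behaviour described in the report: dropping a missing table leaves write-id state behind. */
    void dropTable(String table) {
        tables.remove(table);
    }

    /** Proposal 2: clear write-id state whether or not the table exists. */
    void dropTableAndWriteIds(String table) {
        tables.remove(table);
        nextWriteId.remove(table);
        txnToWriteId.remove(table);
    }

    public static void main(String[] args) {
        WriteIdCleanupSketch target = new WriteIdCleanupSketch();
        target.allocateWriteId(2L, "default.t1"); // replayed from the incremental dump; T1 never existed here
        target.dropTable("default.t1");
        // A new T1 replicated later inherits the stale counter.
        System.out.println(target.allocateWriteId(7L, "default.t1")); // 2

        WriteIdCleanupSketch fixed = new WriteIdCleanupSketch();
        fixed.allocateWriteId(2L, "default.t1");
        fixed.dropTableAndWriteIds("default.t1");
        System.out.println(fixed.allocateWriteId(7L, "default.t1")); // 1
    }
}
```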


