[jira] [Created] (HIVE-24819) CombineHiveInputFormat format seems to be returning row count in the multiple of Maps

2021-02-23 Thread Jitender Kumar (Jira)
Jitender Kumar created HIVE-24819:
-

 Summary: CombineHiveInputFormat format seems to be returning row 
count in the multiple of Maps 
 Key: HIVE-24819
 URL: https://issues.apache.org/jira/browse/HIVE-24819
 Project: Hive
  Issue Type: Bug
 Environment: Apache Hive (version 3.1.0.3.1.0.0-78)
Driver: Hive JDBC (version 3.1.0.3.1.0.0-78)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.0.3.1.0.0-78 by Apache Hive
Reporter: Jitender Kumar


Hi Team,

This is the first time I am filing a bug in Apache Jira, so pardon me if I 
am unintentionally breaking any protocols. 

I am facing the following issue (on a multi-node cluster) when I set 
hive.tez.input.format to  org.apache.hadoop.hive.ql.io.CombineHiveInputFormat. 

Just for demonstration purposes, I will be executing the following query for 
multiple cases. 

_select count(1) from dbname.personal_data_rc tablesample(1000 rows);_

*Case1*

mapred.map.tasks=2

hive.tez.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat

*Output*

1000

*Case 2*

mapred.map.tasks=2

hive.tez.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat

*Output*

2000

*Case 3*

mapred.map.tasks=3

hive.tez.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat

*Output*

3000

When mapred.map.tasks is set higher than 3, the output remains the same, i.e. a multiple of 3. 

Can you help me understand why, with TABLESAMPLE set to 1000 rows, the query 
returns more rows than that? Is there another property that must be used 
with CombineHiveInputFormat, or is this an issue with CombineHiveInputFormat itself? 

I have tried to look for a solution, but in the end I had to come here. Please 
share your input as soon as possible, as one of our clients is looking for a 
solution or explanation. 
For now, as a workaround, we have changed the setting to the following:
*hive.tez.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat*
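
The multiples reported above would be consistent with the row limit being applied to each input split rather than to the query as a whole, which is how the Hive sampling documentation describes TABLESAMPLE(n ROWS), as far as I can tell. A minimal Python sketch of that arithmetic follows; the function name and the split sizes are made up for illustration, and the actual number of splits in each case was not measured in this report.

```python
def sampled_row_count(split_sizes, rows_per_split):
    """Total rows read if the TABLESAMPLE row limit is applied per split.

    split_sizes: hypothetical number of rows available in each input split.
    rows_per_split: the n in TABLESAMPLE(n ROWS).
    """
    # Each split contributes at most rows_per_split rows.
    return sum(min(size, rows_per_split) for size in split_sizes)


if __name__ == "__main__":
    # Assumed split layouts only; the real layouts depend on the input format.
    print(sampled_row_count([900_000], 1000))
    print(sampled_row_count([450_000, 450_000], 1000))
    print(sampled_row_count([300_000, 300_000, 300_000], 1000))
```

Under this assumption the total grows with the number of splits, so an input format that produces a different number of splits would change the count even though the query text is identical.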

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Need help to create 2.3.9 release in Hive JIRA

2021-02-23 Thread Chao Sun
Bump this again. Can someone create the 2.3.9 release in JIRA, please?

On Thu, Jan 28, 2021 at 10:00 AM Chao Sun  wrote:

> Bump this, also cc Owen who helped me last time (sorry for directly
> emailing you).
>
> On Tue, Jan 19, 2021 at 4:07 PM Chao Sun  wrote:
>
>> Hi,
>>
>> Can someone help me to create 2.3.9 release in Hive JIRA so that we can
>> use that as fixed or targeted version? Thanks.
>>
>> Best,
>> Chao
>>
>


[jira] [Created] (HIVE-24818) REPL LOAD (Bootstrap ) of views with partitions fails

2021-02-23 Thread Anurag Shekhar (Jira)
Anurag Shekhar created HIVE-24818:
-

 Summary: REPL LOAD (Bootstrap ) of views with partitions fails 
 Key: HIVE-24818
 URL: https://issues.apache.org/jira/browse/HIVE-24818
 Project: Hive
  Issue Type: Bug
  Components: repl
Reporter: Anurag Shekhar
Assignee: Anurag Shekhar








[jira] [Created] (HIVE-24817) "not in" clause returns incorrect data when there is coercion

2021-02-23 Thread Steve Carlin (Jira)
Steve Carlin created HIVE-24817:
---

 Summary: "not in" clause returns incorrect data when there is 
coercion
 Key: HIVE-24817
 URL: https://issues.apache.org/jira/browse/HIVE-24817
 Project: Hive
  Issue Type: Bug
  Components: CBO
Reporter: Steve Carlin


When a query's where clause checks an integer column with "not in" against a 
decimal value, the decimal value is being changed to null, 
causing incorrect results.

This is a sample query of a failure:
select count(*) from my_tbl where int_col not in (355.8);

Since the int_col can never be 355.8, one would expect all the rows to be 
returned, but it is changing the 355.8 into a null value causing no rows to be 
returned.
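
Why a null on the right-hand side yields zero rows follows from standard SQL three-valued logic: `x NOT IN (NULL)` evaluates to unknown rather than true, so the where clause rejects every row. The sketch below illustrates this with Python's built-in sqlite3 module, not with Hive; the helper name and the sample values are assumptions made for the illustration, and only the table and column names come from the query above.

```python
import sqlite3


def count_not_in(values, rhs_sql):
    """Count rows of my_tbl whose int_col is NOT IN (rhs_sql).

    rhs_sql is pasted into the statement as a literal list, so it must be
    trusted input; this helper is for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_tbl (int_col INTEGER)")
    conn.executemany("INSERT INTO my_tbl VALUES (?)", [(v,) for v in values])
    (count,) = conn.execute(
        f"SELECT count(*) FROM my_tbl WHERE int_col NOT IN ({rhs_sql})"
    ).fetchone()
    conn.close()
    return count


if __name__ == "__main__":
    rows = [1, 2, 355]
    # Decimal literal kept as-is: no integer equals 355.8, so all rows pass.
    print(count_not_in(rows, "355.8"))
    # Literal replaced by NULL: the predicate is unknown for every row.
    print(count_not_in(rows, "NULL"))
```

The second call mirrors the behaviour described in this report: once the literal has become null, no row can satisfy the predicate.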





[jira] [Created] (HIVE-24816) Upgrade jackson to 2.10.5.1 or 2.11.0+ due to CVE-2020-25649

2021-02-23 Thread Sai Hemanth Gantasala (Jira)
Sai Hemanth Gantasala created HIVE-24816:


 Summary: Upgrade jackson to 2.10.5.1 or 2.11.0+ due to 
CVE-2020-25649
 Key: HIVE-24816
 URL: https://issues.apache.org/jira/browse/HIVE-24816
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Sai Hemanth Gantasala
Assignee: Sai Hemanth Gantasala


Currently, Hive pulls in the Jackson 2.10.5 jar. Please upgrade to 
2.10.5.1 or 2.11.0+ due to CVE-2020-25649.





[jira] [Created] (HIVE-24815) Remove "IDXS" Table from Metastore Schema

2021-02-23 Thread Hunter Logan (Jira)
Hunter Logan created HIVE-24815:
---

 Summary: Remove "IDXS" Table from Metastore Schema
 Key: HIVE-24815
 URL: https://issues.apache.org/jira/browse/HIVE-24815
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Standalone Metastore
Affects Versions: 3.1.2, 3.1.1, 3.0.0, 3.1.0, 3.2.0, 4.0.0
Reporter: Hunter Logan


In Hive 3, the rarely used "INDEXES" feature was removed from the DDL:

https://issues.apache.org/jira/browse/HIVE-18448

 

There are a few issues here:
 # The Standalone-Metastore schemas for Hive 3+ all include the "IDXS" table, 
which has no function.
 ** 
[https://github.com/apache/hive/tree/master/standalone-metastore/metastore-server/src/main/sql/mysql]
 # The upgrade schemas from 2.x -> 3.x do not do any cleanup of the IDXS table
 ** If a user used the "INDEXES" feature in 2.x and then upgrades their 
metastore to 3.x+ they cannot drop any table that has an index on it due to 
"IDXS_FK1" constraint since the TBLS entry is referenced in the IDXS table
 ** Since INDEX is no longer in the DDL they cannot run any command from Hive 
to drop the index.
 ** Users can manually connect to the metastore and either drop the IDXS table 
or the foreign key constraint

 

Since indexes provide no benefits in Hive 3+ it should be fine to drop them 
completely in the schema upgrade scripts. At the very least the 2.x -> 3.x+ 
scripts should drop the fk constraint.
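
The blocking behaviour and the manual workaround described above can be reproduced in miniature. The sketch below uses Python's built-in sqlite3 module with a heavily simplified, assumed two-table layout; only the names TBLS, IDXS and IDXS_FK1 come from this report, while the column names, sample rows and helper functions are hypothetical and do not reflect the real metastore schema.

```python
import sqlite3


def make_metastore():
    """Build a toy metastore with one table that still has a leftover index row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.execute("CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, TBL_NAME TEXT)")
    conn.execute(
        "CREATE TABLE IDXS ("
        " INDEX_ID INTEGER PRIMARY KEY,"
        " ORIG_TBL_ID INTEGER,"
        " CONSTRAINT IDXS_FK1 FOREIGN KEY (ORIG_TBL_ID) REFERENCES TBLS (TBL_ID))"
    )
    conn.execute("INSERT INTO TBLS VALUES (1, 'sales')")
    conn.execute("INSERT INTO IDXS VALUES (10, 1)")  # index left over from 2.x
    conn.commit()
    return conn


def try_drop_table(conn, tbl_id):
    """Return True if the TBLS row could be deleted, False if a constraint blocked it."""
    try:
        conn.execute("DELETE FROM TBLS WHERE TBL_ID = ?", (tbl_id,))
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    conn = make_metastore()
    print(try_drop_table(conn, 1))    # blocked while IDXS references the row
    conn.execute("DROP TABLE IDXS")   # the manual cleanup described above
    print(try_drop_table(conn, 1))    # succeeds once IDXS is gone
```

An upgrade script that drops either the table or just the constraint would have the same effect as the manual step shown here.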





[jira] [Created] (HIVE-24814) Harmonize Hive Date-Time Formats

2021-02-23 Thread David Mollitor (Jira)
David Mollitor created HIVE-24814:
-

 Summary: Harmonize Hive Date-Time Formats
 Key: HIVE-24814
 URL: https://issues.apache.org/jira/browse/HIVE-24814
 Project: Hive
  Issue Type: Improvement
Reporter: David Mollitor
Assignee: David Mollitor


Harmonize Hive on JDK date-time formats.





[jira] [Created] (HIVE-24813) thrift regeneration is failing with cannot find symbol TABLE_IS_CTAS

2021-02-23 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-24813:


 Summary: thrift regeneration is failing with cannot find symbol 
TABLE_IS_CTAS
 Key: HIVE-24813
 URL: https://issues.apache.org/jira/browse/HIVE-24813
 Project: Hive
  Issue Type: Bug
  Components: Standalone Metastore
Reporter: Attila Magyar
Assignee: Attila Magyar
 Fix For: 4.0.0


{code:java}
[ERROR] /Users/amagyar/development/hive/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java:[2145,34] cannot find symbol
[ERROR]   symbol:   variable TABLE_IS_CTAS
[ERROR]   location: class org.apache.hadoop.hive.metastore.HMSHandler
[ERROR] /Users/amagyar/development/hive/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetastoreDefaultTransformer.java:[591,58] cannot find symbol
[ERROR]   symbol:   variable TABLE_IS_CTAS
[ERROR]   location: class org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer
[ERROR] -> [Help 1]
{code}





Re: [EXTERNAL] Hive meetup

2021-02-23 Thread Mass Dosage
I'm interested, I'd like to propose talking about future releases and
making these more regular as well as the absolute pain that the Hive build
is with all its flaky unit tests. I know some work has been done on this in
the past but I think it's a huge barrier to new developers, especially
casual ones who want to fix a small bug but can never get all the tests to
pass. Hive-Iceberg is another good topic.

On Tue, 23 Feb 2021 at 11:20, Peter Vary  wrote:

> +1 for the meetup
>
> If the team is interested, we can talk about Hive-Iceberg integration
>
> Thanks,
> Peter
>
> > On Feb 23, 2021, at 04:34, Aasha  wrote:
> >
> > +1
> >
> >> On 22-Feb-2021, at 11:54 PM, Matt McCline wrote:
> >>
> >> Definitely interested.
> >>
> >> -Original Message-
> >> From: Zoltan Haindrich 
> >> Sent: Monday, February 22, 2021 10:17 AM
> >> To: dev@hive.apache.org
> >> Subject: [EXTERNAL] Hive meetup
> >>
> >> Hey All!
> >>
> >> It was quite some time ago when we had a meetup - and in these covid
> >> times it would be online-only anyway :) We were mentioning this lately
> >> here and there at Cloudera.
> >> I think we could have a few talks spanning 2-3 hours or so.
> >>
> >> Is there any interest in it?
> >>
> >> I would be happy to talk about how hive-test-kube works and how
> >> hive-dev-box is employed during testing.
> >>
> >> cheers,
> >> Zoltan
>
>


Re: Any plan for new hive 3 or 4 release?

2021-02-23 Thread Mass Dosage
I would love to see a Hive 3.1 release which is capable of being used on
Java 11 like Hive 2 is.

What is the main difference going to be between Hive 3 and 4? The removal
of MR?

On Mon, 22 Feb 2021 at 16:46, Zoltan Haindrich  wrote:

> Hey Michel!
>
> Yes it was a long time ago we had a release; we have quite a few new
> features in master.
> I think we have been scaring people for some time now that we will be dropping
> MR support... I think we should do that.
>
> I would really like to see a new Hive release in the near future as well -
> there is no way for users to even try out new features.
> I was planning to add nightly builds to package the latest master's state
> into a deployable artifact - I think a service like that may help pretest our
> next release; I think it won't take much to do it so I'll probably throw it
> together in the next couple of days!
>
> cheers,
> Zoltan
>
> On 2/21/21 2:27 PM, Michel Sumbul wrote:
> > Hi Guys,
> >
> > If I'm not wrong, the last release of Hive 3.x is 18 months old.
> > I wanted to ask if you had any roadmap / plan to release a new version of
> > Hive 3.x or Hive 4?
> >
> > Thanks,
> > Michel
> >
>


[jira] [Created] (HIVE-24812) Disable sharedworkoptimizer remove semijoin by default

2021-02-23 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-24812:
---

 Summary: Disable sharedworkoptimizer remove semijoin by default
 Key: HIVE-24812
 URL: https://issues.apache.org/jira/browse/HIVE-24812
 Project: Hive
  Issue Type: Sub-task
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


SJ removal backfired a bit when I was testing, because of the additional 
opportunities parallel edges may enable: it increases the amount of shuffled 
memory and/or can even make MJ broadcast inputs larger.

Set hive.optimize.shared.work.semijoin=false by default for now.

Right now it's better to leave dppunion to pick up these cases instead of 
removing the SJ fully - after HIVE-24376 we might re-enable it.






Re: [EXTERNAL] Hive meetup

2021-02-23 Thread Peter Vary
+1 for the meetup

If the team is interested, we can talk about Hive-Iceberg integration

Thanks,
Peter

> On Feb 23, 2021, at 04:34, Aasha  wrote:
> 
> +1
> 
>> On 22-Feb-2021, at 11:54 PM, Matt McCline wrote:
>> 
>> Definitely interested.
>> 
>> -Original Message-
>> From: Zoltan Haindrich  
>> Sent: Monday, February 22, 2021 10:17 AM
>> To: dev@hive.apache.org
>> Subject: [EXTERNAL] Hive meetup
>> 
>> Hey All!
>> 
>> It was quite some time ago when we had a meetup - and in these covid times 
>> it would be online-only anyway :) We were mentioning this lately here and 
>> there at Cloudera.
>> I think we could have a few talks spanning 2-3 hours or so.
>> 
>> Is there any interest in it?
>> 
>> I would be happy to talk about how hive-test-kube works and how hive-dev-box 
>> is employed during testing.
>> 
>> cheers,
>> Zoltan