Re: [ANNOUNCE] Apache Hive 4.0.0 Released

2024-03-31 Thread Battula, Brahma Reddy
Thank you for your hard work and dedication in releasing Apache Hive version 
4.0.0.

Congratulations to the entire team on this achievement. Keep up the great work!

Is this considered GA?

It also looks like the following location needs to be updated:
https://hive.apache.org/general/downloads/


From: Denys Kuzmenko 
Date: Saturday, March 30, 2024 at 00:07
To: user@hive.apache.org , d...@hive.apache.org 

Subject: [ANNOUNCE] Apache Hive 4.0.0 Released

The Apache Hive team is proud to announce the release of Apache Hive
version 4.0.0.

The Apache Hive (TM) data warehouse software facilitates querying and
managing large datasets residing in distributed storage. Built on top
of Apache Hadoop (TM), it provides, among others:

* Tools to enable easy data extract/transform/load (ETL)

* A mechanism to impose structure on a variety of data formats

* Access to files stored either directly in Apache HDFS (TM) or in other
  data storage systems such as Apache HBase (TM)

* Query execution via Apache Hadoop MapReduce, Apache Tez and Apache Spark
  frameworks. (MapReduce is deprecated, and Spark has been removed so the text
  needs to be modified depending on the release version)

For Hive release details and downloads, please visit:
https://hive.apache.org/downloads.html

Hive 4.0.0 Release Notes are available here:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343343&styleName=Text&projectId=12310843

We would like to thank the many contributors who made this release
possible.

Regards,

The Apache Hive Team


Re: Announce: Hive-MR3 with Celeborn,

2023-10-31 Thread Battula, Brahma Reddy
Thanks for bringing this up. Good to see that it supports Spark and Flink.

Have you done a comparison between Uniffle and Celeborn?


On 30/10/23, 8:01 AM, "Keyong Zhou" <zho...@apache.org> wrote:


Great to hear this! It's encouraging that Celeborn helps MR3.


Celeborn is a general purpose remote shuffle service that stores and serves
shuffle data (and other intermediate data in the future) to help compute engines
better use disaggregated architecture, as well as become more efficient and
stable for huge shuffle sized jobs.


Currently Celeborn supports Hive on MR, and I think integrating with MR3
provides a good example to support Hive on Tez.


Thanks,
Keyong Zhou


On 2023/10/24 12:08:54 Sungwoo Park wrote:
> Hi Hive users,
>
> Before the impending release of MR3 1.8, we would like to announce the
> release of Hive-MR3 with Celeborn (Hive 3.1.3 on MR3 1.8 with Celeborn
> 0.3.1).
>
> Apache Celeborn [1] is a remote shuffle service, similar to Magnet [2] and
> Apache Uniffle [3] (which was discussed in this Hive mailing list a while
> ago). Celeborn officially supports Spark and Flink, and we have implemented
> an MR3-extension for Celeborn.
>
> In addition to all the benefits of using a remote shuffle service,
> Hive-MR3-Celeborn supports direct processing of mapper output on the
> reducer side, which means that reducers do not store mapper output on local
> disks (for unordered edges). In this way, Hive-MR3-Celeborn can eliminate
> over 95% of local disk writes when tested on the 10TB TPC-DS benchmark.
> This can be particularly useful when running Hive-MR3 on public clouds
> where fast local disk storage is expensive or not available.
>
> We have documented the usage of Hive-MR3-Celeborn in [4]. You can download
> Hive-MR3-Celeborn in [5].
>
> FYI, MR3 is an execution engine providing native support for Hadoop,
> Kubernetes, and standalone mode [6]. Hive-MR3, its main application,
> provides the performance of LLAP yet is very easy to install and operate.
> If you are using Hive-Tez for running ETL jobs, switching to Hive-MR3 will
> give you a much higher throughput thanks to its advanced resource sharing
> model.
>
> We have recently opened a Slack channel. If interested, please join the
> Slack channel and ask any question on MR3:
>
> https://join.slack.com/t/mr3-help/shared_invite/zt-1wpqztk35-AN8JRDznTkvxFIjtvhmiNg
>
> Thank you,
>
> --- Sungwoo
>
> [1] https://celeborn.apache.org/
> [2] https://www.vldb.org/pvldb/vol13/p3382-shen.pdf
> [3] https://uniffle.apache.org/
> [4] https://mr3docs.datamonad.com/docs/mr3/features/celeborn/
> [5] https://github.com/mr3project/mr3-release/releases/tag/v1.8
> [6] https://mr3docs.datamonad.com/
>





Re: [ANNOUNCE] Apache Hive 4.0.0-beta-1 Released

2023-08-21 Thread Battula, Brahma Reddy
Nice!! Thanks to all who made this happen.

Is there a draft plan for the 4.0.0 GA? (If it has already been discussed,
please provide the reference.)


On 15/08/23, 12:13 PM, "Stamatis Zampetakis" <zabe...@apache.org> wrote:


The Apache Hive team is proud to announce the release of Apache Hive
version 4.0.0-beta-1.


The Apache Hive (TM) data warehouse software facilitates querying and
managing large datasets residing in distributed storage. Built on top
of Apache Hadoop (TM), it provides, among others:


* Tools to enable easy data extract/transform/load (ETL)


* A mechanism to impose structure on a variety of data formats


* Access to files stored either directly in Apache HDFS (TM) or in other
data storage systems such as Apache HBase (TM)


* Query execution via the Apache Tez framework.


For Hive release details and downloads, please visit:
https://hive.apache.org/downloads.html 


Hive 4.0.0-beta-1 Release Notes are available here:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12353351&styleName=Text&projectId=12310843



We would like to thank the many contributors who made this release
possible.


Regards,


The Apache Hive Team





Re: TPCDS query degrade with hive-3.1.2 because of wrong estimation for reducers

2022-10-03 Thread Battula, Brahma Reddy
https://issues.apache.org/jira/browse/HIVE-23485

On Sun, Oct 2, 2022 at 1:56 PM Battula, Brahma Reddy <bbatt...@visa.com> wrote:
+ Also attaching the hs2 logs.

From: "Battula, Brahma Reddy" <bbatt...@visa.com>
Date: Sunday, 2 October 2022 at 2:16 AM
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: TPCDS query degrade with hive-3.1.2 because of wrong estimation for
reducers

Hi All,

We ran TPC-DS queries against hive-3.1.2 and trunk (a slightly older version).
(In the attached files, the suffix “a” is trunk and “v” is 3.1.2.)

The query execution time is higher in hive-3.1.2 because the estimated number
of reducers is lower (8) than in the trunk version, where it is 46.

All the Hive/Tez/YARN configs are the same in both clusters. Even the hardware
resources are the same, and the query planner is also the same.

The stats in the reduce sink phase do not look the same.

HIVE_TRUNK_CODE - 2022-09-26T05:58:23,786 INFO  
[07243354-f941-419d-8908-45009762e67d HiveServer2-Handler-Pool: Thread-168]: 
optimizer.ConvertJoinMapJoin (:()) - Join input#1; onlineDataSize:   9628; 
Statistics: Num rows:  359 Data size: 4308  Basic stats: COMPLETE Column stats: 
COMPLETE
HIVE_3.1.2_CODE - 2022-09-27T03:39:45,116 INFO  
[2fd1493c-f1a0-4874-acac-58f28e9c21ea HiveServer2-Handler-Pool: Thread-134]: 
optimizer.ConvertJoinMapJoin (:()) - Join input#1; onlineDataSize: 325856; 
Statistics: Num rows: 8116 Data size: 97392 Basic stats: COMPLETE Column stats: 
COMPLETE

Any idea why the reducers are getting underestimated?
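
As a quick sanity check on the two ConvertJoinMapJoin log lines above, the sketch below pulls out the numeric fields and compares them between the two builds. The regular expression, the dictionary keys, and the `parse_join_stats` helper are illustrative assumptions, not anything provided by Hive; the log text itself is copied from the message and re-joined onto single lines.

```python
import re

# Log excerpts copied from the message above, re-joined onto single lines.
LOG_LINES = {
    "trunk": (
        "optimizer.ConvertJoinMapJoin (:()) - Join input#1; "
        "onlineDataSize:   9628; Statistics: Num rows:  359 "
        "Data size: 4308  Basic stats: COMPLETE Column stats: COMPLETE"
    ),
    "3.1.2": (
        "optimizer.ConvertJoinMapJoin (:()) - Join input#1; "
        "onlineDataSize: 325856; Statistics: Num rows: 8116 "
        "Data size: 97392 Basic stats: COMPLETE Column stats: COMPLETE"
    ),
}

# Hypothetical pattern written for exactly this log layout.
STATS_RE = re.compile(
    r"onlineDataSize:\s*(?P<online_data_size>\d+);\s*"
    r"Statistics:\s*Num rows:\s*(?P<num_rows>\d+)\s+"
    r"Data size:\s*(?P<data_size>\d+)"
)


def parse_join_stats(line):
    """Return the numeric join-input statistics found in one log line."""
    match = STATS_RE.search(line)
    if match is None:
        raise ValueError("no join statistics found in line")
    return {name: int(value) for name, value in match.groupdict().items()}


stats = {build: parse_join_stats(line) for build, line in LOG_LINES.items()}

# Compare each field between the two builds.
for field in ("online_data_size", "num_rows", "data_size"):
    v312, vtrunk = stats["3.1.2"][field], stats["trunk"][field]
    print(f"{field}: 3.1.2={v312} trunk={vtrunk} ratio={v312 / vtrunk:.1f}x")
```

This only compares what the optimizer logged for this join input; it does not reproduce how Hive or Tez derives the reducer count.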






hs2_masked_trunk.log
Description: hs2_masked_trunk.log


Re: [DISCUSS] End of life for Hive 1.x, 2.x, 3.x

2022-05-10 Thread Battula, Brahma Reddy
I agree with Peter and Chao Sun.

We are also using Hive 3.x, and we might contribute bug fixes.

I am also +1 on 1.x EOL, as it is hard to maintain so many releases and it is
time for users to migrate to 2.x and 3.x.


On 09/05/22, 10:51 PM, "Chao Sun"  wrote:

Agree with Peter above. I know quite a few projects such as Spark,
Iceberg and Trino/Presto are depending on Hive 2.x and 3.x, and
periodically they may need new fixes in these. Upgrading them to use
4.x seems not an option for now since the core classified artifact has
been removed and the shading issue has to be solved before they can
consume the new jar.

On Mon, May 9, 2022 at 4:10 AM Peter Vary  wrote:
>
> Hi Team,
>
> My experience with the Iceberg community shows that there is a sizeable
> user base around Hive 2.x. I have seen patches and contributions to Hive
> 2.3.x branches, and the tests are in much better shape there.
>
> I would definitely vote for EOL of Hive 1.x, but until we have a stable 4.x,
> I would be cautious about slashing the 2.x and 3.x branches.
>
> Just my 2 cents.
>
> Peter
>
> On 2022. May 9., at 10:51, Alessandro Solimando wrote:
>
> Hi Stamatis,
> thanks for bringing up this topic, I basically agree with everything you
> wrote.
>
> I just wanted to add that this kind of proposal might sound harsh,
> because in many contexts upgrading is a complex process, but it's in
> nobody's interest to keep release branches that are missing important
> fixes/improvements and that might not meet the quality standards that
> people expect, as mentioned.
>
> Since we don't yet have a stable 4.x release (only alpha for now), we
> might want to keep supporting the 3.x branch until the first 4.x stable
> release and EOL < 3.x branches, WDYT?
>
> Best regards,
> Alessandro
>
> On Fri, 6 May 2022 at 23:14, Stamatis Zampetakis wrote:
>>
>> Hi all,
>>
>> The current master has many critical bug fixes as well as important
>> performance improvements that are not backported (and most likely never
>> will be) to the maintenance branches.
>>
>> Backporting changes from master usually requires adapting the code and
>> tests in question, making it a non-trivial and time-consuming task.
>>
>> The ASF bylaws require PMCs to deliver high quality software which
>> satisfies certain criteria. Cutting new releases from maintenance branches
>> with known critical bugs is not compliant with the ASF.
>>
>> CI is unstable in all maintenance branches, making the quality of a
>> release questionable and merging new PRs rather difficult. Enabling and
>> running it frequently in all maintenance branches would require a large
>> amount of resources on top of what we already need for master.
>>
>> History has shown that it is very difficult or impossible to properly
>> maintain multiple release branches for Hive.
>>
>> I think it would be in the best interest of the project if the PMC
>> decided to drop support for maintenance branches and focused on releasing
>> exclusively from master.
>>
>> This mail is related to the discussion about the release cadence [1]
>> since it would certainly help make Hive releases more regular. I decided
>> to start a separate thread to avoid mixing multiple topics together.
>>
>> Looking forward to your thoughts.
>>
>> Best,
>> Stamatis
>>
>> [1] https://lists.apache.org/thread/n245dd23kb2v3qrrfp280w3pto89khxj
>>
>



RE: Patches to Hive 3.1.2,

2021-11-23 Thread Battula, Brahma Reddy
Thanks, Sungwoo Park!!.
It looks like 3.1.2 was released on 26 August 2019. Are there any plans for 3.1.3?

How about cherry-picking the following critical issues to branch-3.1 and releasing?

From: Sungwoo Park 
Sent: Thursday, August 12, 2021 9:21 PM
To: user@hive.apache.org
Subject: Patches to Hive 3.1.2,

Hello Hive users,

We have updated the repository that backports patches to Hive 3.1.2. Now it 
backports about 350 patches from the master branch to branch-3.1 of November 
2020. You can ignore the last two commits which add MR3 backend and remove Hive 
on Spark.

https://github.com/mr3project/hive-mr3

The focus is mainly on fixing bugs in Hive 3.1.2 and stabilizing the 
performance when using AWS S3. We will keep backporting more patches, so if you 
think important patches are missing, please feel free to create issues.

Hope you find it useful!

--- Sungwoo


Re: Hive servers restarting every few hours

2021-11-22 Thread Battula, Brahma Reddy
Thanks, Sungwoo Park.

IMO, we should backport HIVE-21206 to branch-3.1.


From: Sungwoo Park 
Date: Wednesday, 13 October 2021 at 12:28 PM
To: user@hive.apache.org 
Subject: Re: Hive servers restarting every few hours
Hi,

For 1, Hive 3.1.2 has a bug which leaks Metastore connections. This was 
reported in HIVE-20600:

https://issues.apache.org/jira/browse/HIVE-20600

You might reproduce the bug by inserting values into a table and checking the 
number of connections, e.g.:
0: jdbc:hive2://blue0:9852/> CREATE TABLE leak_test (id int, value string);
0: jdbc:hive2://blue0:9852/> insert into leak_test values (1, 'hello'), (2, 
'world');
...
0: jdbc:hive2://blue0:9852/> insert into leak_test values (1, 'hello'), (2, 
'world');

2021-08-09T02:15:04,263  INFO [HiveServer2-Background-Pool: Thread-250] 
metastore.HiveMetaStoreClient: Closed a connection to metastore, current 
connections: 20
2021-08-09T02:15:04,269  INFO [HiveServer2-Background-Pool: Thread-250] 
metastore.HiveMetaStoreClient: Opened a connection to metastore, current 
connections: 21
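
To watch the count over a longer log, the "current connections" figure can be extracted from log lines like the two above. The sketch below is a minimal, assumed approach: the regular expression and the helper names (`connection_events`, `peak_connections`) are illustrative and only match the message format shown in this mail. A count that keeps climbing across many statements would be consistent with the leak described.

```python
import re

# Matches the HiveMetaStoreClient messages shown above (assumed format).
CONN_RE = re.compile(
    r"(Opened|Closed) a connection to metastore, current connections: (\d+)"
)


def connection_events(log_text):
    """Return (action, count) pairs in the order they appear in the log."""
    # Mail clients wrap long log lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    return [(m.group(1), int(m.group(2))) for m in CONN_RE.finditer(flat)]


def peak_connections(events):
    """Highest connection count seen, or 0 if there were no events."""
    return max((count for _, count in events), default=0)


# Sample taken from the wrapped log lines quoted in this message.
SAMPLE = """
2021-08-09T02:15:04,263  INFO [HiveServer2-Background-Pool: Thread-250]
metastore.HiveMetaStoreClient: Closed a connection to metastore, current
connections: 20
2021-08-09T02:15:04,269  INFO [HiveServer2-Background-Pool: Thread-250]
metastore.HiveMetaStoreClient: Opened a connection to metastore, current
connections: 21
"""

events = connection_events(SAMPLE)
print(events)
print("peak:", peak_connections(events))
```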

Applying HIVE-21206 can fix the bug:

https://issues.apache.org/jira/browse/HIVE-21206

--- Sungwoo


On Mon, Oct 11, 2021 at 8:34 PM Manikaran Kathuria
<kathuriamanika...@gmail.com> wrote:
Hi,
I hope everyone is doing good during this pandemic. I have some questions 
related to hive server configuration. In our current set up, we are running 6 
hive server instances on k8s pods. We are using hive version 3.1.2 with Java 8. 
The container memory associated with each pod is 24G. We are observing that the 
hive servers are crashing with the OOM Java heap error. We have set the max 
heap size to 12G. We are using Parallel GC collectors i.e., PS Scavenge and PS 
MarkSweep for young gen and the old gen GCs respectively. Following are our 
observations-
1. The connections to hive metastore kept increasing. Before the server 
crashed, we have seen the number of connections to metastore as high as 1.2k. 
Connection leakage?
2. We have also observed that a few times the servers crashed because the
container memory was full. As we have set the max heap size to 12G, the servers
crashing because native memory was full felt strange. On digging into the
process map of another instance using high native memory (chart of the memory
used by hive server attached), we found that the memory was allocated in
multiple 64M blocks. These 64M blocks are called arenas. We can limit the
memory growth by using jemalloc instead of malloc from glibc, or by setting the
maximum number of allowed arenas. Is this a common issue in hive servers? Any
recommendations on how to solve this issue of high native memory usage?
3. Another observation: when the hive servers restarted, we found that the old
gen space of the heap was full, but the memory committed to the young gen was
much less than the maximum memory allocated to the young gen pool. To be
specific about one of the instances: total heap: 12G; Old Gen memory used: 8G;
Young Gen used: 360M (Committed: 708M, Max: 4G). [Chart of heap memory usage
attached]. This results in consecutive full GCs before the server crashes.
Should we consider using some other GC? Any recommendations or tuning
suggestions?
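
The heap figures in point 3 can be put side by side with a few lines of arithmetic. This is only a sketch: it assumes the heap is split into just a young and an old generation, so that the old generation can grow to at most the total heap minus the young generation's maximum, and it uses the rounded figures from this message. The variable names are illustrative.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

# Figures quoted in point 3 (rounded as in the message).
heap_max = 12 * GIB
old_used = 8 * GIB
young_used = 360 * MIB
young_committed = 708 * MIB
young_max = 4 * GIB

# Assumption: heap = young gen + old gen, so the old gen ceiling is what
# remains after the young gen's maximum is set aside.
old_max = heap_max - young_max

old_occupancy = old_used / old_max
young_committed_share = young_committed / young_max

print(f"old gen ceiling: {old_max / GIB:.0f}G")
print(f"old gen occupancy: {old_occupancy:.0%}")
print(f"young gen committed vs max: {young_committed_share:.0%}")
```

Under these assumptions the old generation sits at its ceiling, which would be consistent with the back-to-back full GCs described, while most of the young generation's maximum was never committed.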
Please find the attached charts.
Any help would be highly appreciated.

Thanks,
Manikaran Kathuria


Re: Future release of hive

2021-09-17 Thread Battula, Brahma Reddy

Can you please give more details on the issues you faced with hive-3.1.2 and
ranger-2.1.0?


From: Antoine DUBOIS 
Date: Tuesday, 14 September 2021 at 6:20 PM
To: user@hive.apache.org 
Subject: Future release of hive
Hello
After trying to use Hive 3.1.2 with Ranger for several weeks, I have stopped.
It seems way too complicated and tedious.
I wonder when, or even if, there will be any more releases in the 3.0 branch.
I wonder if Hive 3.0 was just an experiment, as it seems maintenance is not
really there.
Is there any plan for Hive 4.0, or should I use Hive 2.8, knowing I'm using
Hadoop 3?
Any insight on the Hive release cycle would be awesome.

I hope you have a nice day.

Antoine DUBOIS



Re: Any best practices for hive upgrade from 1.2.1 to 3.1.2

2021-06-10 Thread Battula, Brahma Reddy
Thanks for the prompt reply.

Do you have any references or blog posts on this, such as the issues faced
during the upgrade?

If possible, could the Hive community arrange a sync-up call?




From: Julien Tane 
Date: Thursday, 10 June 2021 at 2:27 PM
To: user@hive.apache.org 
Subject: AW: Any best practices for hive upgrade from 1.2.1 to 3.1.2

Hello,

we upgraded our processes to a new cluster, HDP 2.5 (more or less Hive
1.2.1) --> HDP 3.1 (Hive 3), and a lot of things were problematic.

Mainly, you need to make sure you take care of the int -> String issue, because
the new version of Hive expects the right data type.

So I would not expect an upgrade without problems, but YMMV.

my 2c,

Julien




Julien Tane
Big Data Engineer


____________
From: Battula, Brahma Reddy
Sent: Thursday, 10 June 2021 09:43:03
To: d...@hive.apache.org; user@hive.apache.org
Subject: Any best practices for hive upgrade from 1.2.1 to 3.1.2

Hi All,

We are planning to upgrade Hive from 1.2.1 to 3.1.2. Are there any best
practices we can follow?

Is there any chance of upgrading without downtime (if we disable new features
like managed tables)?


Thanks.





Any best practices for hive upgrade from 1.2.1 to 3.1.2

2021-06-10 Thread Battula, Brahma Reddy
Hi All,

We are planning to upgrade Hive from 1.2.1 to 3.1.2. Are there any best
practices we can follow?

Is there any chance of upgrading without downtime (if we disable new features
like managed tables)?


Thanks.