Re: hiveserver2 hangs

2015-09-06 Thread Sanjeev Verma
*I am getting the following exception when the HS2 is crashing, any idea
why it has happening*

"pool-1-thread-121" prio=4 tid=19283 RUNNABLE
at java.lang.OutOfMemoryError.(OutOfMemoryError.java:48)
at java.util.Arrays.copyOf(Arrays.java:2271)
Local Variable: byte[]#1
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutput
Stream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
Local Variable: org.apache.thrift.TByteArrayOutputStream#42
Local Variable: byte[]#5378
at org.apache.thrift.transport.TSaslTransport.write(TSaslTransport.java:446)
at org.apache.thrift.transport.TSaslServerTransport.write(TSasl
ServerTransport.java:41)
at org.apache.thrift.protocol.TBinaryProtocol.writeI32(TBinaryP
rotocol.java:163)
at org.apache.thrift.protocol.TBinaryProtocol.writeString(TBina
ryProtocol.java:186)
Local Variable: byte[]#2
at org.apache.hive.service.cli.thrift.TStringColumn$TStringColu
mnStandardScheme.write(TStringColumn.java:490)
Local Variable: java.util.ArrayList$Itr#1
at org.apache.hive.service.cli.thrift.TStringColumn$TStringColu
mnStandardScheme.write(TStringColumn.java:433)
Local Variable: org.apache.hive.service.cli.thrift.TStringColumn$TStringColu
mnStandardScheme#1
at org.apache.hive.service.cli.thrift.TStringColumn.write(TStri
ngColumn.java:371)
at org.apache.hive.service.cli.thrift.TColumn.standardSchemeWri
teValue(TColumn.java:381)
Local Variable: org.apache.hive.service.cli.thrift.TColumn#504
Local Variable: org.apache.hive.service.cli.thrift.TStringColumn#453
at org.apache.thrift.TUnion$TUnionStandardScheme.write(TUnion.java:244)
at org.apache.thrift.TUnion$TUnionStandardScheme.write(TUnion.java:213)
at org.apache.thrift.TUnion.write(TUnion.java:152)



On Fri, Aug 21, 2015 at 6:16 AM, kulkarni.swar...@gmail.com <
kulkarni.swar...@gmail.com> wrote:

> Sanjeev,
>
> One possibility is that you are running into[1] which affects hive 0.13.
> Is it possible for you to apply the patch on [1] and see if it fixes your
> problem?
>
> [1] https://issues.apache.org/jira/browse/HIVE-10410
>
> On Thu, Aug 20, 2015 at 6:12 PM, Sanjeev Verma 
> wrote:
>
>> We are using hive-0.13 with hadoop1.
>>
>> On Thu, Aug 20, 2015 at 11:49 AM, kulkarni.swar...@gmail.com <
>> kulkarni.swar...@gmail.com> wrote:
>>
>>> Sanjeev,
>>>
>>> Can you tell me more details about your hive version/hadoop version etc.
>>>
>>> On Wed, Aug 19, 2015 at 1:35 PM, Sanjeev Verma <
>>> sanjeev.verm...@gmail.com> wrote:
>>>
 Can somebody gives me some pointer to looked upon?

 On Wed, Aug 19, 2015 at 9:26 AM, Sanjeev Verma <
 sanjeev.verm...@gmail.com> wrote:

> Hi
> We are experiencing a strange problem with the hiveserver2, in one of
> the job it gets the GC limit exceed from mapred task and hangs even having
> enough heap available.we are not able to identify what causing this issue.
> Could anybody help me identify the issue and let me know what pointers
> I need to looked up.
>
> Thanks
>


>>>
>>>
>>> --
>>> Swarnim
>>>
>>
>>
>
>
> --
> Swarnim
>


RE: Bucketing- Identify Number of Buckets

2015-09-06 Thread Mich Talebzadeh
Hi Saurabh.

 

Bucketing in Hive refers to hash partitioning where a hashing function is
applied. Likewise an RDBMS, Hive will apply a linear hashing algorithm to
prevent data from clustering within specific partitions. Hashing is very
effective if the column selected for bucketing has very high selectivity
like an ID column where selectivity (select
count(distinct(column))/count(column) ) = 1.  In this case, the created
partitions/ files will be as evenly sized as possible. In a nutshell
bucketing is a method to get data evenly distributed over many
partitions/files.  One should define the number of buckets by a power of two
-- 2^n,  like 2, 4, 8, 16 etc to achieve best results. Again bucketing will
help concurrency in Hive. It may even allow a partition wise join i.e. a
join between two tables that are bucketed on the same column with the same
number of buckets (anyone has tried this?).

 

You need to work out the selectivity of column you are using for bucketing
using the above formulae or something similar then decide on the number of
buckets say clustered by (object_id) into 256 buckets.

 

HTH

 

Mich Talebzadeh

 

Sybase ASE 15 Gold Medal Award 2008

A Winning Strategy: Running the most Critical Financial Data on ASE 15

http://login.sybase.com/files/Product_Overviews/ASE-Winning-Strategy-091908.
pdf

Author of the books "A Practitioner's Guide to Upgrading to Sybase ASE 15",
ISBN 978-0-9563693-0-7. 

co-author "Sybase Transact SQL Guidelines Best Practices", ISBN
978-0-9759693-0-4

Publications due shortly:

Complex Event Processing in Heterogeneous Environments, ISBN:
978-0-9563693-3-8

Oracle and Sybase, Concepts and Contrasts, ISBN: 978-0-9563693-1-4, volume
one out shortly

 

http://talebzadehmich.wordpress.com

 

NOTE: The information in this email is proprietary and confidential. This
message is for the designated recipient only, if you are not the intended
recipient, you should destroy it immediately. Any information in this
message shall not be understood as given or endorsed by Peridale Technology
Ltd, its subsidiaries or their employees, unless expressly so stated. It is
the responsibility of the recipient to ensure that this email is virus free,
therefore neither Peridale Ltd, its subsidiaries nor their employees accept
any responsibility.

 

-Original Message-
From: Db-Blog [mailto:mpp.databa...@gmail.com] 
Sent: 06 September 2015 21:22
To: user@hive.apache.org
Subject: Re: Bucketing- Identify Number of Buckets

 

Details of Hive Version:

I am using Hive -14.0 with Tez as execution engine. 

 

Thanks,

Saurabh

 

Sent from my iPhone, please avoid typos.

 

> On 07-Sep-2015, at 1:51 am, Db-Blog < 
mpp.databa...@gmail.com> wrote:

> 

> Hi, 

> 

> I need to join two big tables in hive. The join key is the grain of both
these tables, hence clustering and sorting on the same will provide
significant performance optimisation while joining.  

> 

> However, i am not sure how to calculate the exact number of buckets while
creating these tables. Can someone please share any pointers on the same? 

> 

> Planning to keep these Clustered and Sorted tables as parquet/orc- for
columnar storage and better compression. 

> 

> Thanks,

> Saurabh

> 

> Sent from my iPhone, please avoid typos.



Re: Bucketing- Identify Number of Buckets

2015-09-06 Thread Db-Blog
Details of Hive Version:
I am using Hive -14.0 with Tez as execution engine. 

Thanks,
Saurabh

Sent from my iPhone, please avoid typos.

> On 07-Sep-2015, at 1:51 am, Db-Blog  wrote:
> 
> Hi, 
> 
> I need to join two big tables in hive. The join key is the grain of both 
> these tables, hence clustering and sorting on the same will provide 
> significant performance optimisation while joining.  
> 
> However, i am not sure how to calculate the exact number of buckets while 
> creating these tables. Can someone please share any pointers on the same? 
> 
> Planning to keep these Clustered and Sorted tables as parquet/orc- for 
> columnar storage and better compression. 
> 
> Thanks,
> Saurabh
> 
> Sent from my iPhone, please avoid typos.


Bucketing- Identify Number of Buckets

2015-09-06 Thread Db-Blog
Hi, 

I need to join two big tables in hive. The join key is the grain of both these 
tables, hence clustering and sorting on the same will provide significant 
performance optimisation while joining.  

However, i am not sure how to calculate the exact number of buckets while 
creating these tables. Can someone please share any pointers on the same? 

Planning to keep these Clustered and Sorted tables as parquet/orc- for columnar 
storage and better compression. 

Thanks,
Saurabh

Sent from my iPhone, please avoid typos.