Running multiple HMS pointing to the same MySQL

2020-02-14 Thread William Shen
Hi,

We're thinking about running multiple instances of Hive Metastore Server
(pointing to the same MySQL store) to tackle the HMS load issue we're
experiencing. We're thinking about having read-only use cases contact one
HMS and the mixed/heavier use cases contact the other HMS. In the past, we
have only used the second HMS as a backup to the primary HMS.

Are there any problems or gotchas with having multiple HMS instances used
as primary servers at the same time?


Thank you


Re: Query Failures

2020-02-14 Thread David Mollitor
Hive has many optimizations.  One is that it will load the data directly
from storage (HDFS) if it's a trivial query.  For example:

SELECT * FROM my_table LIMIT 10;

In natural language it says "give me any ten rows (if available) from the
table."  You don't need the overhead of launching a full MapReduce job for
this; Hive can just read the rows from the file directly.
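
One way to check which path a given query takes is to ask Hive for its
plan. The following is a minimal sketch, assuming a hypothetical table
`my_table` with a column `col_a`, and a Hive version that has the
`hive.fetch.task.conversion` setting; the exact plan output varies by
version and configuration.

```sql
-- Print the current fetch-conversion setting, which controls which
-- simple queries Hive serves with a plain fetch instead of a job.
SET hive.fetch.task.conversion;

-- Trivial query: with fetch conversion active, the plan should show
-- a fetch stage and no map-reduce stage.
EXPLAIN SELECT * FROM my_table LIMIT 10;

-- Same query with a filter (my_table and col_a are hypothetical names).
-- If a map-reduce stage appears here, this query launches a job.
EXPLAIN SELECT * FROM my_table WHERE col_a = 'x' LIMIT 10;
```

Comparing the two plans should show whether a particular query is being
read directly from storage or handed off to a job.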

Adding predicates to the query requires a MapReduce job to do the heavy
lifting.  The error message you're getting is probably the result of a
failed MapReduce job.  Nine times out of ten, the problem is that the
mappers/reducers are not granted enough memory for their YARN containers.
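
If memory does turn out to be the cause, the container and heap sizes can
be raised for a single session before re-running the failing query. This
is a minimal sketch, assuming a YARN-based (MRv2) cluster; the numbers are
illustrative placeholders rather than recommendations, and `my_table` and
`col_a` are hypothetical names.

```sql
-- Container size requested from YARN for each map task, in MB
-- (illustrative value).
SET mapreduce.map.memory.mb=4096;
-- JVM heap for the mapper, kept below the container size to leave
-- room for non-heap memory (roughly 80% here, an assumption).
SET mapreduce.map.java.opts=-Xmx3276m;

-- Same idea for reduce tasks.
SET mapreduce.reduce.memory.mb=8192;
SET mapreduce.reduce.java.opts=-Xmx6553m;

-- Re-run the kind of query that was failing (hypothetical example).
SELECT col_a, COUNT(*) FROM my_table GROUP BY col_a;
```

Requests larger than the cluster's `yarn.scheduler.maximum-allocation-mb`
will not be granted, so that limit may need checking with the cluster
administrators. The failed task's logs in YARN should confirm whether the
error was actually an out-of-memory condition before any values are
changed permanently.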

On Tue, Feb 11, 2020, 10:41 AM Pau Tallada  wrote:

> Hi,
>
> Do you have more complete tracebacks?
>
> Charles Givre wrote on Tue, Feb 11, 2020 at 2:54:
>
>> Hello Everyone!
>> I recently joined a project that has a Hive/Impala installation and we
>> are experiencing a significant number of query failures.  We are using an
>> older version of Hive, and unfortunately there's nothing I can do about
>> that, but I'm wondering how I can make Hive handle queries better to
>> give our users a better experience.
>>
>> For example, I can execute a basic SELECT * query or SELECT 
>> query without issues.
>>
>> However, if I attempt to:
>> 1.  Add filters
>> 2.  Do a SELECT DISTINCT
>> 3.  Perform basic aggregation
>>
>> I get errors like this: Execution Error, return code 1 from
>> org.apache.hadoop.hive.ql.exec.mr.MapRedTask.
>>
>> Could someone point me to some good guides for querying Hive and/or
>> assisting my engineers in preventing these errors?
>> Thanks,
>>
>>
>
> --
> --
> Pau Tallada Crespí
> Dep. d'Astrofísica i Cosmologia
> Port d'Informació Científica (PIC)
> Tel: +34 93 170 2729
> --
>
>


Re: Query Failures

2020-02-14 Thread David Mollitor
https://community.cloudera.com/t5/Support-Questions/Map-and-Reduce-Error-Java-heap-space/td-p/45874
