Re: How do I determine a library mismatch between jdbc client and server?

2016-09-28 Thread Stephen Sprague
you might just end up using your own heuristics.  if the port is "alive"
(i.e. you can see it listening via netstat, or telnet to it) but you can't
connect... then you've got yourself a problem.

kinda like a bootstrapping problem, eh? you need to connect to get the
version but you can't connect if you don't have the right version.

my 2 cents.
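
A minimal sketch of that heuristic, assuming all you know is the server's host and port. The function names `probe_port` and `diagnose` are made up for illustration and are not part of Hive or any JDBC driver. The idea is to separate "nothing is listening" from "something is listening but the JDBC handshake still fails", since only the second case points toward mismatched jars (or mismatched auth/transport settings).

```python
import socket


def probe_port(host, port, timeout=3.0):
    """Classify a TCP endpoint as 'open', 'refused', or 'unreachable'."""
    try:
        # A plain TCP connect; no Hive or Thrift traffic is sent.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except OSError:
        # Covers timeouts, DNS failures, and routing errors.
        return "unreachable"


def diagnose(host, port, jdbc_connected, timeout=3.0):
    """Combine the port probe with the outcome of the JDBC attempt.

    jdbc_connected is whatever your own client code observed (hypothetical
    input); this function does not open a JDBC connection itself.
    """
    if jdbc_connected:
        return "ok"
    state = probe_port(host, port, timeout)
    if state == "open":
        return "port alive but JDBC failed: suspect client/server mismatch"
    return "port " + state + ": suspect network or server-down problem"
```

Calling `diagnose(host, 10000, jdbc_connected=False)` after a failed connection attempt (10000 being the default HiveServer2 port) gives a first hint. It cannot prove a version mismatch; it only rules out the more ordinary causes of a timeout.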

On Wed, Sep 28, 2016 at 4:13 PM, Bear Giles wrote:

> Hi, I'm trying to do development in an environment where we have a mixed
> bag of clusters. Some Cloudera, some Hortonworks, different versions of
> each, etc.
>
> (I don't know if we'll see this mix in the field but we need the variety
> for testing our software.)
>
> Sometimes the poor overworked developer (cough) grabs the wrong jars for
> the server. In at least one case there was no indication of what the
> problem was - all I saw was a connection timeout. That caused some
> confusion since mismatched jars were pretty far down my list of likely
> causes for that error.
>
> Is there a standard way to query for the remote version before trying to
> establish a JDBC connection? I know that I can check the Connection's
> DatabaseMetaData information once I've established a connection but that
> doesn't help me when I'm getting nothing but a timeout exception.
>
>
>
> Bear Giles
>


How do I determine a library mismatch between jdbc client and server?

2016-09-28 Thread Bear Giles
Hi, I'm trying to do development in an environment where we have a mixed
bag of clusters. Some Cloudera, some Hortonworks, different versions of
each, etc.

(I don't know if we'll see this mix in the field but we need the variety
for testing our software.)

Sometimes the poor overworked developer (cough) grabs the wrong jars for
the server. In at least one case there was no indication of what the
problem was - all I saw was a connection timeout. That caused some
confusion since mismatched jars were pretty far down my list of likely
causes for that error.

Is there a standard way to query for the remote version before trying to
establish a JDBC connection? I know that I can check the Connection's
DatabaseMetaData information once I've established a connection but that
doesn't help me when I'm getting nothing but a timeout exception.



Bear Giles



Re: Hive queries rejected under heavy load

2016-09-28 Thread Stephen Sprague
gotta start by looking at the logs, and run the local client to rule out
HS2 (HiveServer2) as the cause.   perhaps run hive like this:

$ hive -hiveconf hive.root.logger=DEBUG,console

do you see any smoking gun?

On Wed, Sep 28, 2016 at 7:34 AM, Jose Rozanec wrote:

> Hi,
>
> We have a Hive cluster (Hive 2.1.0 + Tez 0.8.4) which works well for most
> queries. For some heavy ones, though, we observe that they sometimes
> execute and sometimes get rejected. We are not sure why they are rejected
> instead of being enqueued to wait until resources in the cluster are
> available again. We notice that the connection waits for a minute and, if
> it fails to obtain resources, drops the query.
> Looking at the configuration parameters, it is not clear to us whether this
> can be changed. Has anyone had a similar experience, and could you give us
> some guidance?
>
> Thank you in advance,
>
> Joze.
>
>
>


Re: how to dynamically find out hiveserver2's host name?

2016-09-28 Thread Vihang Karajgaonkar
One of the changes (HIVE-14063) which I have been working on can potentially 
solve your problem. The change essentially detects the hiveserver2 host from 
the hive-site.xml. The change is still in review and not yet available but you 
can do something similar.
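
As a rough sketch of "something similar" (not the HIVE-14063 implementation, which is not shown in this thread), a script could read the Thrift host and port out of hive-site.xml and build the JDBC URL from them. The helper names and the `/etc/hive/conf/hive-site.xml` default path are assumptions for illustration; `hive.server2.thrift.bind.host` and `hive.server2.thrift.port` are the standard HiveServer2 properties, but the bind host is often left unset, in which case this falls back to a caller-supplied default.

```python
import xml.etree.ElementTree as ET


def read_hive_site(path):
    """Return the <property> entries of a Hadoop-style config file as a dict."""
    props = {}
    for prop in ET.parse(path).getroot().iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def hiveserver2_url(path="/etc/hive/conf/hive-site.xml",
                    default_host="localhost", database="default"):
    """Build a jdbc:hive2:// URL from hive-site.xml (assumed location)."""
    props = read_hive_site(path)
    # Fall back to defaults when the property is missing or empty.
    host = props.get("hive.server2.thrift.bind.host") or default_host
    port = props.get("hive.server2.thrift.port") or "10000"
    return "jdbc:hive2://{}:{}/{}".format(host, port, database)
```

With this, the same script can run unchanged in DEV/QA/PROD as long as each machine carries its own cluster's hive-site.xml. Clusters that use ZooKeeper service discovery for HiveServer2 would need a different lookup.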


> On Sep 28, 2016, at 8:55 AM, Frank Luo wrote:
> 
> I am trying to use one set of scripts for different Hadoop clusters in
> different environments, for example DEV/QA/PROD environments with
> corresponding clusters.
>
> The difficulty I am facing is that the host name of hiveserver2 is part of
> the connection URL, which has to vary between environments. The most
> straightforward way is to create one file per environment, which is not
> very “smart”.
>
> So my question is whether it is possible to dynamically figure out the host
> name if I am on one of the machines in a cluster and have access to all the
> config files. If so, how do I do it?
>  
> Thx
>  
>  



how to dynamically find out hiveserver2's host name?

2016-09-28 Thread Frank Luo
I am trying to use one set of scripts for different Hadoop clusters in
different environments, for example DEV/QA/PROD environments with corresponding
clusters.

The difficulty I am facing is that the host name of hiveserver2 is part of
the connection URL, which has to vary between environments. The most
straightforward way is to create one file per environment, which is not
very “smart”.

So my question is whether it is possible to dynamically figure out the host
name if I am on one of the machines in a cluster and have access to all the
config files. If so, how do I do it?

Thx





RE: How to obtain concurrent query executions

2016-09-28 Thread Frank Luo
If you are using Hadoop 2.7 or newer, you can use
mapreduce.job.running.map.limit and mapreduce.job.running.reduce.limit to
cap the number of map and reduce tasks that run simultaneously within each job.

Another way is to use the YARN scheduler to limit queue size.
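
A small sketch of how the two per-job limits might be applied from a script: it only renders the `SET` statements to prepend to a Hive session. The helper name `task_limit_settings` and the example values are made up for illustration. These are MapReduce-side properties, so presumably they only matter for queries that run on the MapReduce engine rather than Tez.

```python
def task_limit_settings(max_maps, max_reduces):
    """Render Hive SET statements capping concurrent tasks per job."""
    if max_maps <= 0 or max_reduces <= 0:
        raise ValueError("limits must be positive integers")
    return [
        "SET mapreduce.job.running.map.limit={};".format(int(max_maps)),
        "SET mapreduce.job.running.reduce.limit={};".format(int(max_reduces)),
    ]


def with_task_limits(query, max_maps, max_reduces):
    """Prefix a query with the limit settings, one statement per line."""
    return "\n".join(task_limit_settings(max_maps, max_reduces) + [query])
```

For example, `with_task_limits("SELECT count(*) FROM t;", 20, 5)` yields a three-line script that can be passed to the Hive client; the table name `t` and the numbers are placeholders.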

From: Jose Rozanec [mailto:jose.roza...@mercadolibre.com]
Sent: Tuesday, September 27, 2016 5:54 PM
To: user@hive.apache.org
Subject: How to obtain concurrent query executions

Hi,

We have a Hive cluster. We notice that some queries consume all resources,
which is not desirable to us, since we want to grant some degree of parallelism
to incoming ones: any incoming query should be able to make at least some
progress, not just wait for the big one to finish.

Is there a way to do so? We use Hive 2.1.0 with the Tez engine.

Thank you in advance,

Joze.



Hive queries rejected under heavy load

2016-09-28 Thread Jose Rozanec
Hi,

We have a Hive cluster (Hive 2.1.0 + Tez 0.8.4) which works well for most
queries. For some heavy ones, though, we observe that they sometimes
execute and sometimes get rejected. We are not sure why they are rejected
instead of being enqueued to wait until resources in the cluster are
available again. We notice that the connection waits for a minute and, if
it fails to obtain resources, drops the query.
Looking at the configuration parameters, it is not clear to us whether this
can be changed. Has anyone had a similar experience, and could you give us
some guidance?

Thank you in advance,

Joze.


hiveserver2 hostname

2016-09-28 Thread Shylaja H. Nagenhalli
How do I dynamically get the configured hiveserver2 hostname for a cluster?

Thanks, Shyla



Re: Configure hiveserver2 logs

2016-09-28 Thread Chetna C
Hi Kishore,
Setting the above 3 parameters enables the query/operation logs. If you are
looking for the HiveServer2 process log, you'll need to configure it when
launching the HiveServer2 process, with the following JVM params:
"-Dhive.log.dir= -Dhive.log.file=hive-server2.log"

You can refer to the "Hive Logging" section on the GettingStarted page for
more details.

Thanks,
Chetna Chaudhari

On 26 September 2016 at 21:26, kishore kumar wrote:

> Hi Hive Users,
>
> Using Hive version 1.2,
>
> I am connecting to HiveServer2 via a JDBC connection. Could anyone suggest
> how to configure the log file?
>
> Following this link: https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
>
> I set these 3 parameter values:
>
> - hive.server2.logging.operation.enabled = true
> - hive.server2.logging.operation.log.location = /${username}/logs
> - hive.server2.logging.operation.level = VERBOSE
>
> Am I missing something? Kindly help me. I couldn't see any log file being
> generated in the defined location.
>
> --
> Thanks,
> KK.
>


Re: Query consuming all resources

2016-09-28 Thread Per Ullberg
What Jörn said. We use the capacity scheduler to be able to give priority
to some user groups over others.

Regards
/Pelle

On Wednesday, September 28, 2016, Jörn Franke wrote:

> You need to configure queues in YARN and use the Fair Scheduler. From your
> use case it looks like you also need to configure pre-emption.
>
> > On 28 Sep 2016, at 00:52, Jose Rozanec wrote:
> >
> > Hi,
> >
> > We have a Hive cluster. We notice that some queries consume all
> > resources, which is not desirable to us, since we want to grant some degree
> > of parallelism to incoming ones: any incoming query should be able to make
> > at least some progress, not just wait for the big one to finish.
> >
> > Is there a way to do so? We use Hive 2.1.0 with the Tez engine.
> >
> > Thank you in advance,
> >
> > Joze.
>



Re: Query consuming all resources

2016-09-28 Thread Jörn Franke
You need to configure queues in YARN and use the Fair Scheduler. From your use
case it looks like you also need to configure pre-emption.

> On 28 Sep 2016, at 00:52, Jose Rozanec wrote:
> 
> Hi, 
> 
> We have a Hive cluster. We notice that some queries consume all resources,
> which is not desirable to us, since we want to grant some degree of
> parallelism to incoming ones: any incoming query should be able to make at
> least some progress, not just wait for the big one to finish.
>
> Is there a way to do so? We use Hive 2.1.0 with the Tez engine.
> 
> Thank you in advance,
> 
> Joze.