Re: How to obtain concurrent query executions

2016-09-29 Thread Jose Rozanec
Thanks!

2016-09-28 12:55 GMT-03:00 Frank Luo <j...@merkleinc.com>:

> If you are using Hadoop 2.7 or newer, you can use
> mapreduce.job.running.map.limit and mapreduce.job.running.reduce.limit to
> restrict map and reduce tasks at each job level.
>
> Another way is to use Scheduler to limit queue size.
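For reference, a minimal sketch of how these settings might be applied per
session when running on the MapReduce engine; the property names come from
Hadoop 2.7+, and the limits and queue name below are only illustrative:

SET mapreduce.job.running.map.limit=20;     -- cap concurrently running map tasks per job
SET mapreduce.job.running.reduce.limit=10;  -- cap concurrently running reduce tasks per job
SET mapreduce.job.queuename=adhoc;          -- route the job to a capacity-limited scheduler queue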
>
>
>
> *From:* Jose Rozanec [mailto:jose.roza...@mercadolibre.com]
> *Sent:* Tuesday, September 27, 2016 5:54 PM
> *To:* user@hive.apache.org
> *Subject:* How to obtain concurrent query executions
>
>
>
> Hi,
>
>
>
> We have a Hive cluster. We notice that some queries consume all the resources,
> which is not desirable to us, since we want to grant some degree of
> parallelism to incoming ones: any incoming query should be able to make at
> least some progress, not just wait for the big one to finish.
>
>
>
> Is there a way to do so? We use Hive 2.1.0 with the Tez engine.
>
>
>
> Thank you in advance,
>
>
>
> Joze.
>
>


Hive queries rejected under heavy load

2016-09-28 Thread Jose Rozanec
Hi,

We have a Hive cluster (Hive 2.1.0 + Tez 0.8.4) which works well for most
queries. However, some heavy ones sometimes execute fine and sometimes get
rejected. We are not sure why they get rejected instead of being enqueued and
waiting for execution until resources in the cluster are available again. We
notice that the connection waits for a minute and, if it fails to obtain
resources, drops the query.
Looking at the configuration parameters, it is not clear to us whether this
can be changed. Did anyone have a similar experience and can provide us some
guidance?

Thank you in advance,

Joze.
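One knob that may be involved here, assuming the one-minute wait corresponds
to the Tez session client timeout, is tez.session.client.timeout.secs; a
sketch with an illustrative value:

SET tez.session.client.timeout.secs=600;  -- wait up to 10 minutes for a Tez AM instead of giving up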


How to obtain concurrent query executions

2016-09-27 Thread Jose Rozanec
Hi,

We have a Hive cluster. We notice that some queries consume all the resources,
which is not desirable to us, since we want to grant some degree of
parallelism to incoming ones: any incoming query should be able to make at
least some progress, not just wait for the big one to finish.

Is there a way to do so? We use Hive 2.1.0 with the Tez engine.

Thank you in advance,

Joze.
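With the Tez engine, one common approach is to run queries in separate YARN
scheduler queues with capped capacities, so a single large query cannot take
the whole cluster; a minimal sketch, assuming a queue named 'adhoc' has
already been defined in the Capacity or Fair Scheduler:

SET tez.queue.name=adhoc;  -- run this session's Tez DAGs in the capped 'adhoc' queue
-- On HiveServer2, hive.server2.tez.default.queues can spread default Tez
-- sessions across several such queues.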


Query consuming all resources

2016-09-27 Thread Jose Rozanec
Hi,

We have a Hive cluster. We notice that some queries consume all the resources,
which is not desirable to us, since we want to grant some degree of
parallelism to incoming ones: any incoming query should be able to make at
least some progress, not just wait for the big one to finish.

Is there a way to do so? We use Hive 2.1.0 with the Tez engine.

Thank you in advance,

Joze.


Upgrading Metastore schema 2.0.0->2.1.0

2016-06-29 Thread Jose Rozanec
Hi all,

Upgrading DB schema from 2.0.0 to 2.1.0 is causing an error. Did anyone
experience similar issues?

Below are the command and the stack trace.

Thanks,

*./schematool -dbType mysql -upgradeSchemaFrom 2.0.0*
Starting upgrade metastore schema from version 2.0.0 to 2.1.0
Upgrade script upgrade-2.0.0-to-2.1.0.mysql.sql
Error: Duplicate key name 'CONSTRAINTS_PARENT_TABLE_ID_INDEX'
Query is : CREATE INDEX `CONSTRAINTS_PARENT_TABLE_ID_INDEX` ON
KEY_CONSTRAINTS (`PARENT_TBL_ID`) USING BTREE (state=42000,code=1061)
org.apache.hadoop.hive.metastore.HiveMetaException: Upgrade FAILED!
Metastore state would be inconsistent !!
Underlying cause: java.io.IOException : Schema script failed, errorcode 2
Use --verbose for detailed stacktrace.
*** schemaTool failed ***
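If the index was already created by an earlier or partial run of the upgrade
scripts, one way forward is to drop it directly in MySQL and re-run
schematool; a sketch, assuming the metastore database is named hive (adjust
to your installation):

SHOW INDEX FROM hive.KEY_CONSTRAINTS WHERE Key_name = 'CONSTRAINTS_PARENT_TABLE_ID_INDEX';
-- If the index is listed, drop it so the upgrade script can recreate it:
DROP INDEX CONSTRAINTS_PARENT_TABLE_ID_INDEX ON hive.KEY_CONSTRAINTS;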


LDAPS jdbc connection string

2016-06-22 Thread Jose Rozanec
Hi,

We set up a Hive cluster with LDAP and we are able to authenticate and use
it from beeline without issues:


beeline> !connect jdbc:hive2://localhost:1/default
Connecting to jdbc:hive2://localhost:1/default
Enter username for jdbc:hive2://localhost:1/default: uid=,ou=People,ou=Mexico,dc=ms,dc=com
Enter password for jdbc:hive2://localhost:1/default:

When trying to connect via JDBC, we get rejected ("*Peer indicated failure:
Error validating the login*"). We are not sure where the issue lies, but we
suspect we may not be passing the LDAP parameters as expected. Did anyone face
a similar issue?

Below is the Groovy snippet we use to connect to Hive:

@GrabConfig(systemClassLoader= true)
@Grab(group='org.apache.hive', module='hive-jdbc', version='2.0.0')
@Grab(group='org.apache.hadoop', module='hadoop-common', version='2.7.2')
@Grab(group='org.apache.commons', module='commons-csv', version='1.1')
import groovy.sql.Sql;
TimeZone.setDefault(TimeZone.getTimeZone('UTC'))
def master = "masterip"
def port = 1
def jdbcurl = String.format("jdbc:hive2://%s:%s/default", master, port)
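// The full LDAP bind DN is passed as the JDBC user and the LDAP password as the JDBC password.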
def db = [url:jdbcurl, user:'uid=,ou=People,ou=Mexicodc=ms,dc=com',
password:'ourpassword!', driver:'org.apache.hive.jdbc.HiveDriver']
def sql = Sql.newInstance(db.url, db.user, db.password, db.driver)
def query = "select * from ourtable limit 10"
sql.execute query

println "thanks! :)"


Re: LDAPS (Secure LDAP) Hive configuration

2016-06-16 Thread Jose Rozanec
Hi,

Yes, that is correct. We have LDAPS configured on 636, and the certificate is
available only on that port. 443 is not enabled in our case, and that should
not matter, since communication happens only on 636.



2016-06-15 23:20 GMT-03:00 Anurag Tangri <tangri.anu...@gmail.com>:

>
> Hey Joze,
> LDAPS uses a different port, like 636 or something. The default port does
> not work as far as I remember.
>
> Could you check if it is something along those lines?
>
> Thanks,
> Anurag Tangri
>
> Sent from my iPhone
>
> On Jun 15, 2016, at 3:01 PM, Jose Rozanec <jose.roza...@mercadolibre.com>
> wrote:
>
> Hi,
>
> We upgraded to 2.1.0, but we still cannot get it working: we get "LDAP:
> error code 34 - invalid DN". We double-checked the DN configuration, and
> the LDAP team agrees it is OK.
> We then configured the SSL parameters as well (hive.server2.use.SSL,
> hive.server2.keystore.path, hive.server2.keystore.password), so that Hive
> would know where the truststore is located and its password, but in that
> case we get the following error: "SSLException: Unrecognized SSL message,
> plaintext connection". Our LDAP server does not expose the SSL certificate
> on the default port (443), but on the port where LDAPS is configured. Could
> that cause some trouble?
>
> We would value any insight or guidance from those who already worked on
> this.
>
> Thanks!
>
> Joze.
>
>
>
>
>
> 2016-06-13 9:45 GMT-03:00 Jose Rozanec <jose.roza...@mercadolibre.com>:
>
>> Thank you for the quick response. Will try upgrading to version 2.1.0
>>
>> Thanks!
>>
>> 2016-06-13 4:34 GMT-03:00 Oleksiy S <osayankin.superu...@gmail.com>:
>>
>>> Hello,
>>>>
>>>> We are working on a Hive 2.0.0 cluster to configure LDAPS
>>>> authentication, but we get some errors preventing successful
>>>> authentication.
>>>> Does anyone have some insight on how to solve this?
>>>>
>>>> *The problem*
>>>> The errors we get are (first is most frequent):
>>>> - sun.security.provider.certpath.SunCertPathBuilderException: unable to
>>>> find valid certification path to requested target
>>>> - javax.naming.InvalidNameException: [LDAP: error code 34 - invalid DN]
>>>>
>>>> *Our config*
>>>> We configured the certificate by obtaining a jssecacerts file and
>>>> overriding Java's default on the master, as specified in this post
>>>> <http://nodsw.com/blog/leeland/2006/12/06-no-more-unable-find-valid-certification-path-requested-target>
>>>> .
>>>>
>>>> *hive-site.xml* has the following properties:
>>>> <property>
>>>>   <name>hive.server2.authentication</name>
>>>>   <value>LDAP</value>
>>>> </property>
>>>> <property>
>>>>   <name>hive.server2.authentication.ldap.url</name>
>>>>   <value>ldaps://ip:port</value>
>>>> </property>
>>>> <property>
>>>>   <name>hive.server2.authentication.ldap.baseDN</name>
>>>>   <value>dc=net,dc=com</value>
>>>> </property>
>>>>
>>>> Thanks!
>>>>
>>>> Joze.
>>>>
>>>
>>>
>>> This issue is fixed here
>>> https://issues.apache.org/jira/browse/HIVE-12885
>>>
>>> On Fri, Jun 10, 2016 at 10:41 PM, Jose Rozanec <
>>> jose.roza...@mercadolibre.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> We are working on a Hive 2.0.0 cluster to configure LDAPS
>>>> authentication, but we get some errors preventing successful
>>>> authentication.
>>>> Does anyone have some insight on how to solve this?
>>>>
>>>> *The problem*
>>>> The errors we get are (first is most frequent):
>>>> - sun.security.provider.certpath.SunCertPathBuilderException: unable to
>>>> find valid certification path to requested target
>>>> - javax.naming.InvalidNameException: [LDAP: error code 34 - invalid DN]
>>>>
>>>> *Our config*
>>>> We configured the certificate by obtaining a jssecacerts file and
>>>> overriding Java's default on the master, as specified in this post
>>>> <http://nodsw.com/blog/leeland/2006/12/06-no-more-unable-find-valid-certification-path-requested-target>
>>>> .
>>>>
>>>> *hive-site.xml* has the following properties:
>>>> <property>
>>>>   <name>hive.server2.authentication</name>
>>>>   <value>LDAP</value>
>>>> </property>
>>>> <property>
>>>>   <name>hive.server2.authentication.ldap.url</name>
>>>>   <value>ldaps://ip:port</value>
>>>> </property>
>>>> <property>
>>>>   <name>hive.server2.authentication.ldap.baseDN</name>
>>>>   <value>dc=net,dc=com</value>
>>>> </property>
>>>>
>>>> Thanks!
>>>>
>>>> Joze.
>>>>
>>>
>>>
>>>
>>> --
>>> Oleksiy
>>>
>>
>>
>


Re: LDAPS (Secure LDAP) Hive configuration

2016-06-15 Thread Jose Rozanec
Hi,

We upgraded to 2.1.0, but we still cannot get it working: we get "LDAP:
error code 34 - invalid DN". We double-checked the DN configuration, and
the LDAP team agrees it is OK.
We then configured the SSL parameters as well (hive.server2.use.SSL,
hive.server2.keystore.path, hive.server2.keystore.password), so that Hive
would know where the truststore is located and its password, but in that
case we get the following error: "SSLException: Unrecognized SSL message,
plaintext connection". Our LDAP server does not expose the SSL certificate
on the default port (443), but on the port where LDAPS is configured. Could
that cause some trouble?

We would value any insight or guidance from those who already worked on
this.

Thanks!

Joze.





2016-06-13 9:45 GMT-03:00 Jose Rozanec <jose.roza...@mercadolibre.com>:

> Thank you for the quick response. Will try upgrading to version 2.1.0
>
> Thanks!
>
> 2016-06-13 4:34 GMT-03:00 Oleksiy S <osayankin.superu...@gmail.com>:
>
>> Hello,
>>>
>>> We are working on a Hive 2.0.0 cluster to configure LDAPS
>>> authentication, but we get some errors preventing successful
>>> authentication.
>>> Does anyone have some insight on how to solve this?
>>>
>>> *The problem*
>>> The errors we get are (first is most frequent):
>>> - sun.security.provider.certpath.SunCertPathBuilderException: unable to
>>> find valid certification path to requested target
>>> - javax.naming.InvalidNameException: [LDAP: error code 34 - invalid DN]
>>>
>>> *Our config*
>>> We configured the certificate by obtaining a jssecacerts file and
>>> overriding Java's default on the master, as specified in this post
>>> <http://nodsw.com/blog/leeland/2006/12/06-no-more-unable-find-valid-certification-path-requested-target>
>>> .
>>>
>>> *hive-site.xml* has the following properties:
>>> <property>
>>>   <name>hive.server2.authentication</name>
>>>   <value>LDAP</value>
>>> </property>
>>> <property>
>>>   <name>hive.server2.authentication.ldap.url</name>
>>>   <value>ldaps://ip:port</value>
>>> </property>
>>> <property>
>>>   <name>hive.server2.authentication.ldap.baseDN</name>
>>>   <value>dc=net,dc=com</value>
>>> </property>
>>>
>>> Thanks!
>>>
>>> Joze.
>>>
>>
>>
>> This issue is fixed here https://issues.apache.org/jira/browse/HIVE-12885
>>
>> On Fri, Jun 10, 2016 at 10:41 PM, Jose Rozanec <
>> jose.roza...@mercadolibre.com> wrote:
>>
>>> Hello,
>>>
>>> We are working on a Hive 2.0.0 cluster to configure LDAPS
>>> authentication, but we get some errors preventing successful
>>> authentication.
>>> Does anyone have some insight on how to solve this?
>>>
>>> *The problem*
>>> The errors we get are (first is most frequent):
>>> - sun.security.provider.certpath.SunCertPathBuilderException: unable to
>>> find valid certification path to requested target
>>> - javax.naming.InvalidNameException: [LDAP: error code 34 - invalid DN]
>>>
>>> *Our config*
>>> We configured the certificate by obtaining a jssecacerts file and
>>> overriding Java's default on the master, as specified in this post
>>> <http://nodsw.com/blog/leeland/2006/12/06-no-more-unable-find-valid-certification-path-requested-target>
>>> .
>>>
>>> *hive-site.xml* has the following properties:
>>> <property>
>>>   <name>hive.server2.authentication</name>
>>>   <value>LDAP</value>
>>> </property>
>>> <property>
>>>   <name>hive.server2.authentication.ldap.url</name>
>>>   <value>ldaps://ip:port</value>
>>> </property>
>>> <property>
>>>   <name>hive.server2.authentication.ldap.baseDN</name>
>>>   <value>dc=net,dc=com</value>
>>> </property>
>>>
>>> Thanks!
>>>
>>> Joze.
>>>
>>
>>
>>
>> --
>> Oleksiy
>>
>
>


Re: Query fails if condition placed on Parquet struct field

2016-05-03 Thread Jose Rozanec
Hi!

It is not due to memory allocation. I found that I am able to run the
query OK if I rewrite it as:

select *a.user_agent* from (SELECT *device.user_agent* as *user_agent* FROM
sometable WHERE ds >= '2016-03-30 00' AND ds <= '2016-03-30 01')a where
*a.user_agent* LIKE 'Mozilla%'  LIMIT 1;

I see the number of mappers and the execution time are almost the same, but
this way we are able to execute it OK and get the results.
Any ideas why this may happen?



2016-05-03 17:02 GMT-03:00 Haas, Nichole <nichole.h...@concur.com>:

> What are your memory allocations set to?  When using something as expensive
> as LIKE and a date range together, I often have to increase my standard
> memory allocation.
>
> Try changing your memory allocation settings to:
> mapreduce.map.memory.mb = 2048
> mapreduce.map.java.opts = -Xmx1500m
>
> In HUE, this is the settings tab and you enter them manually.  I’m unsure
> about command line.
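From the command line, the same properties can typically be set per session
before running the query; the values below mirror the ones suggested above:

SET mapreduce.map.memory.mb=2048;       -- container size for map tasks, in MB
SET mapreduce.map.java.opts=-Xmx1500m;  -- map-task JVM heap; keep it below the container size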
>
>
> From: Jose Rozanec <jose.roza...@mercadolibre.com>
> Reply-To: "user@hive.apache.org" <user@hive.apache.org>
> Date: Tuesday, May 3, 2016 at 12:45 PM
> To: "user@hive.apache.org" <user@hive.apache.org>
> Subject: Query fails if condition placed on Parquet struct field
>
> Hello,
>
> We are running queries on Hive against parquet files.
> In the schema definition, we have a parquet struct called device with a
> string field user_agent.
>
> If we run query from Example 1, it returns results as expected.
> If we run query from Example 2, execution fails and exits with error.
>
> Did anyone face a similar case?
>
> Thanks!
>
> *Example 1:*
> SELECT *device.user_agent* FROM sometable WHERE ds >= '2016-03-30 00' AND
> ds <= '2016-03-30 01' LIMIT 1;
>
> *Example 2:*
> SELECT *device.user_agent* FROM sometable WHERE ds >= '2016-03-30 00' AND
> ds <= '2016-03-30 01' AND *device.user_agent* LIKE 'Mozilla%'  LIMIT 1;
>
>
> The error and trace we get is:
>
> Exception from container-launch.
> FAILED: Execution Error, return code 2 from
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask
> Container exited with a non-zero exit code 1
>
> *Stack trace: ExitCodeException exitCode=1:*
> *at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)*
> *at org.apache.hadoop.util.Shell.run(Shell.java:456)*
> *at
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)*
> *at
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)*
> *at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)*
> *at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)*
> *at java.util.concurrent.FutureTask.run(FutureTask.java:262)*
> *at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)*
> *at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)*
> *at java.lang.Thread.run(Thread.java:745)*
>
>
>


Query fails if condition placed on Parquet struct field

2016-05-03 Thread Jose Rozanec
Hello,

We are running queries on Hive against parquet files.
In the schema definition, we have a parquet struct called device with a
string field user_agent.

If we run query from Example 1, it returns results as expected.
If we run query from Example 2, execution fails and exits with error.

Did anyone face a similar case?

Thanks!

*Example 1:*
SELECT *device.user_agent* FROM sometable WHERE ds >= '2016-03-30 00' AND
ds <= '2016-03-30 01' LIMIT 1;

*Example 2:*
SELECT *device.user_agent* FROM sometable WHERE ds >= '2016-03-30 00' AND
ds <= '2016-03-30 01' AND *device.user_agent* LIKE 'Mozilla%'  LIMIT 1;


The error and trace we get is:

Exception from container-launch.
FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.mr.MapRedTask
Container exited with a non-zero exit code 1

*Stack trace: ExitCodeException exitCode=1:*
* at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)*
* at org.apache.hadoop.util.Shell.run(Shell.java:456)*
* at
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)*
* at
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)*
* at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)*
* at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)*
* at java.util.concurrent.FutureTask.run(FutureTask.java:262)*
* at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)*
* at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)*
* at java.lang.Thread.run(Thread.java:745)*


Re: Hue 3.7.1 issue at EMR 4.3.0: fails to install Hive Editor

2016-04-04 Thread Jose Rozanec
Hello,

Just for the record: we found that a property in /etc/hue/conf.empty/hue.ini
did not match the Hive configuration: in Hive we had authentication set to a
value different from NONE, while in the Hue configuration we found
security_enabled=false.
After changing the value to security_enabled=true and restarting the process,
we got it running as expected.

2016-04-04 13:00 GMT-03:00 Jose Rozanec <jose.roza...@mercadolibre.com>:

> Hello,
>
> This morning we started getting an issue with the Hue instance when creating
> an EMR cluster with Hive/Hue. At this link we provide the trace we get when
> attempting to set up Hue's Hive Editor: http://pastebin.com/LTN15k7r
> Did anyone face this? A few days ago we had no issues creating a cluster
> with these same configurations.
>
> Thank you in advance,
>


Hue 3.7.1 issue at EMR 4.3.0: fails to install Hive Editor

2016-04-04 Thread Jose Rozanec
Hello,

This morning we started getting an issue with the Hue instance when creating
an EMR cluster with Hive/Hue. At this link we provide the trace we get when
attempting to set up Hue's Hive Editor: http://pastebin.com/LTN15k7r
Did anyone face this? A few days ago we had no issues creating a cluster
with these same configurations.

Thank you in advance,