On 5/4/2011 7:48 AM, Paul Ingles wrote:
For future reference I've posted a little more about our setup here:
http://oobaloo.co.uk/multiple-connections-with-hive
On Tue, May 3, 2011 at 8:01 PM, Paul Ingles <p...@oobaloo.co.uk> wrote:
Nothing specifically about our Hive setup although some of us at
Forward have blogged bits and pieces about Hive + Hadoop and have
a few Hadoop/Hive related libs on our GitHub account:
https://github.com/forward.
I've blogged a few bits (http://www.oobaloo.co.uk/) as has one of
my colleagues
(http://blog.fingertap.org/post/1255463384/hive-thrift-client).
Another colleague also presented a little about our setup during a
Hadoop meetup last summer
(http://skillsmatter.com/podcast/home/hadoop-in-context-1591). The
numbers Andy mentioned will be a little out of date but it does
include some screenshots of a few of the surrounding apps we built
that connect to Hive and Hadoop (including a web based Hive query
tool + work queue).
I had a quick search through the mailing lists when we had
connection problems but I think most of it was discussed/resolved
during a chat I had with Shevek from Karmasphere at a London pub
following a Hadoop meetup :)
If you're interested, I've posted a gist
(https://gist.github.com/953926) that contains our HAProxy config;
clients connect to 10000 and are balanced between :10001 and
:10005 on 2 servers (so actually 10 backend servers).
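
To make the arithmetic in the layout above concrete, here is a minimal Python sketch. It is not the HAProxy config from the gist: the host names are hypothetical placeholders (the message only says "2 servers"), the port range is assumed to be inclusive, and round-robin is assumed as the balancing policy purely for illustration.

```python
from itertools import cycle, product

# Hypothetical host names; the message only says "2 servers".
HOSTS = ["hive-a.example.internal", "hive-b.example.internal"]
# Ports 10001 through 10005 inclusive (assumed), five per server.
PORTS = range(10001, 10006)


def backends(hosts, ports):
    """Return every (host, port) pair where a HiveServer instance would listen."""
    return list(product(hosts, ports))


def round_robin(pool):
    """Hand out backends in turn, forever (an assumed balancing policy)."""
    return cycle(pool)


pool = backends(HOSTS, PORTS)   # 2 hosts x 5 ports = 10 backends
picker = round_robin(pool)
first_three = [next(picker) for _ in range(3)]
```

Clients would still only ever see the single front-end port (10000); the pool of ten is what sits behind it.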
I'd be happy to talk more about our experience; feel free to send me
an email off list if you'd like.
On 3 May 2011, at 19:18, Matthew Rathbone wrote:
> Hey Paul,
>
> I'd be very interested in reading about your Hadoop/Hive setup.
> Do you have a blog post or anything describing it, or some of the
> issues you've had with Hive?
>
> --
> Matthew Rathbone
> Foursquare | Software Engineer | Server Engineering Team
> matt...@foursquare.com | @rathboma | 4sq
>
> On Tuesday, May 3, 2011 at 2:15 PM, Paul Ingles wrote:
>> HiveServer does seem to support multiple connections but I think
>> it still has thread-safety problems
>> (https://issues.apache.org/jira/browse/HIVE-80).
>>
>> We (www.forward.co.uk) have certainly had instability problems with
>> the Thrift server in the past and now run 5 or so instances behind
>> the HAProxy load balancer (http://haproxy.1wt.eu/). Since we did
>> that it's been significantly better.
>>
>> I think the JDBC server still operates using Thrift to connect to
>> the HiveServer, so I would expect it to have similar problems (but
>> I may have got that wrong :)
>>
>>
>> On 3 May 2011, at 18:59, Matthew Rathbone wrote:
>>
>>> Even if it is single threaded it certainly seems to support
>>> multiple connections.
>>>
>>> We run 5 workers, all connected at the same time and each executing
>>> a different query (with a different connection per worker).
>>>
>>> Hope that helps
>>>
>>> Matthew
>>> On Tuesday, May 3, 2011 at 1:40 PM, V.Senthil Kumar wrote:
>>>> Thanks Matthew. The wiki page
>>>> http://wiki.apache.org/hadoop/Hive/HiveServer says it's single
>>>> threaded. I have a queue of queries which gets added to dynamically
>>>> all the time. By the time I have run 1 query using 1 JDBC
>>>> connection, more queries have been added to the queue and a backlog
>>>> builds up. That's why I was wondering whether I can run two or more
>>>> instances to avoid having a big backlog in the queue.
>>>>
>>>>
>>>>
>>>> ----- Original Message ----
>>>> From: Matthew Rathbone <matt...@foursquare.com>
>>>> To: user@hive.apache.org
>>>> Sent: Tue, May 3, 2011 7:46:49 AM
>>>> Subject: Re: HIVE Server multiple instances
>>>>
>>>> Why would you want to run two? I think it is multithreaded, so you
>>>> can query it from two different connections.
>>>>
>>>> --
>>>> Matthew Rathbone
>>>> Foursquare | Software Engineer | Server Engineering Team
>>>> matt...@foursquare.com | @rathboma | 4sq
>>>>
>>>> On Monday, May 2, 2011 at 6:41 PM, V.Senthil Kumar wrote:
>>>>> Hello,
>>>>>
>>>>> I have one instance of the Hive JDBC server running on port 10000.
>>>>> Can I run another instance on a different port? Would it cause a
>>>>> concurrency issue on the underlying data warehouse files? Please
>>>>> clarify.
>>>>>
>>>>> Thanks,
>>>>> V.Senthil Kumar
>>
>
Wow, good piece of information.
Thanks for sharing it.
--
Marcos Luís Ortíz Valmaseda
Software Engineer (Large-Scaled Distributed Systems)
University of Information Sciences,
La Habana, Cuba
Linux User # 418229
http://about.me/marcosortiz