Re: HIVE Server multiple instances

Marcos Ortiz Wed, 04 May 2011 06:11:33 -0700

On 5/4/2011 7:48 AM, Paul Ingles wrote:

For future reference I've posted a little more about our setup here: http://oobaloo.co.uk/multiple-connections-with-hive

On Tue, May 3, 2011 at 8:01 PM, Paul Ingles <p...@oobaloo.co.uk> wrote:


    Nothing specifically about our Hive setup although some of us at
    Forward have blogged bits and pieces about Hive + Hadoop and have
    a few Hadoop/Hive related libs on our GitHub account:
    https://github.com/forward.

    I've blogged a few bits (http://www.oobaloo.co.uk/) as has one of
    my colleagues
    (http://blog.fingertap.org/post/1255463384/hive-thrift-client).

    Another colleague also presented a little about our setup during a
    Hadoop meetup last summer
    (http://skillsmatter.com/podcast/home/hadoop-in-context-1591). The
    numbers Andy mentioned will be a little out of date but it does
    include some screenshots of a few of the surrounding apps we built
    that connect to Hive and Hadoop (including a web based Hive query
    tool + work queue).

    I had a quick search through the mailing lists when we had
    connection problems but I think most of it was discussed/resolved
    during a chat I had with Shevek from Karmasphere at a London pub
    following a Hadoop meetup :)

    If you're interested, I've posted a gist
    (https://gist.github.com/953926) that contains our HAProxy config;
    clients connect to 10000 and are balanced between :10001 and
    :10005 on 2 servers (so actually 10 backend servers).
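
The port layout described above (one front-end port, five back-end ports on each of two servers) can be illustrated with a minimal Python sketch. This is not the HAProxy config from the gist: the host names are hypothetical, and plain round-robin is an assumption, since the thread does not say which balancing algorithm is used.

```python
from itertools import cycle

# Hypothetical host names; the thread only says "2 servers".
HOSTS = ["hive-a.example", "hive-b.example"]
# Back-end ports 10001 through 10005 inclusive, as described in the thread.
PORTS = range(10001, 10006)


def backends(hosts=HOSTS, ports=PORTS):
    """Return every (host, port) pair a balancer could forward to."""
    return [(host, port) for host in hosts for port in ports]


def round_robin(pool):
    """Return an endless iterator that rotates through the pool."""
    return cycle(pool)


pool = backends()
picker = round_robin(pool)
# Simulate twelve incoming client connections on the front-end port.
first_twelve = [next(picker) for _ in range(12)]
```

Each simulated connection is handed to the next back end in turn, so the eleventh connection wraps around to the first HiveServer instance again.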

    I'd be happy to talk more about our experience; feel free to send me
    an email off-list if you'd like.


    On 3 May 2011, at 19:18, Matthew Rathbone wrote:

    > Hey Paul,
    >
    > I'd be very interested in reading about your Hadoop/Hive setup.
    > Do you have a blog post or anything describing this setup, or some
    > of the issues you've had with Hive?
    >
    > --
    > Matthew Rathbone
    > Foursquare | Software Engineer | Server Engineering Team
    > matt...@foursquare.com | @rathboma | 4sq
    >
    > On Tuesday, May 3, 2011 at 2:15 PM, Paul Ingles wrote:
    >> HiveServer does seem to support multiple connections but I think
    >> it still has thread-safety problems
    >> (https://issues.apache.org/jira/browse/HIVE-80).
    >>
    >> We've (www.forward.co.uk) certainly had instability problems with
    >> the thrift server in the past and now run 5 or so instances behind
    >> the HAProxy load-balancer (http://haproxy.1wt.eu/). Since we did
    >> that it's been significantly better.
    >>
    >> I think the JDBC server still operates using thrift to connect
    >> to the HiveServer so I would expect it to have similar problems
    >> (but I may have got that wrong :)
    >>
    >>
    >> On 3 May 2011, at 18:59, Matthew Rathbone wrote:
    >>
    >>> Even if it is single-threaded it certainly seems to support
    >>> multiple connections.
    >>>
    >>> We run 5 workers all connected at the same time, each executing a
    >>> different query (with a different connection per worker).
    >>>
    >>> Hope that helps
    >>>
    >>> Matthew
    >>> On Tuesday, May 3, 2011 at 1:40 PM, V.Senthil Kumar wrote:
    >>>> Thanks Matthew. The wiki page
    >>>> http://wiki.apache.org/hadoop/Hive/HiveServer says it's
    >>>> single-threaded. I have a queue of queries that is added to
    >>>> dynamically all the time. By the time I run 1 query using 1 JDBC
    >>>> connection, more queries have been added to the queue and a
    >>>> backlog builds up. That's why I was wondering whether I can run
    >>>> two or more instances to avoid having a big backlog in the queue.
    >>>>
    >>>>
    >>>>
    >>>> ----- Original Message ----
    >>>> From: Matthew Rathbone <matt...@foursquare.com>
    >>>> To: user@hive.apache.org
    >>>> Sent: Tue, May 3, 2011 7:46:49 AM
    >>>> Subject: Re: HIVE Server multiple instances
    >>>>
    >>>> Why would you want to run two? I think it is multithreaded,
    >>>> so you can query it from two different connections
    >>>>
    >>>> --
    >>>> Matthew Rathbone
    >>>> Foursquare | Software Engineer | Server Engineering Team
    >>>> matt...@foursquare.com | @rathboma | 4sq
    >>>>
    >>>> On Monday, May 2, 2011 at 6:41 PM, V.Senthil Kumar wrote:
    >>>>> Hello,
    >>>>>
    >>>>> I have one instance of the Hive JDBC server running on port
    >>>>> 10000. Can I run another instance on a different port? Would it
    >>>>> cause a concurrency issue on the underlying data warehouse
    >>>>> files? Please clarify.
    >>>>>
    >>>>> Thanks,
    >>>>> V.Senthil Kumar
    >>
    >

Wow, good piece of information.
Thanks for sharing it.

--
Marcos Luís Ortíz Valmaseda
 Software Engineer (Large-Scaled Distributed Systems)
 University of Information Sciences,
 La Habana, Cuba
 Linux User # 418229
 http://about.me/marcosortiz
