Konstantin Pelykh wrote:
Thanks for the suggestion. Below are some details explaining the reason
for such a balancer:
I'm basing my application on the accumulo-wikipedia example, so there can be
multiple partitions per tablet. Some partitions are larger, others are
smaller.

Are you talking about the "sharded" table or the "inverted index" table? Assuming you mean the "sharded" table (given your mention of partitions), a skew here implies a poor choice of a partitioning algorithm. How are you choosing the partitions at ingest time? Hash-based? Something else?

A good hash used to generate your partitions at ingest time should prevent such skew at query time.
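
As a rough illustration of the hash-based approach mentioned above, the sketch below derives a partition number from a document ID using an MD5 digest. The class name PartitionChooser, the method partitionFor, and the "doc-N" IDs are hypothetical and are not taken from the wikipedia example; this is only one way such a hash might be applied at ingest time.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Accumulo or the wikipedia example.
final class PartitionChooser {

    // Maps a document ID to a partition in [0, numPartitions).
    static int partitionFor(String docId, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(docId.getBytes(StandardCharsets.UTF_8));
            // Use the first four digest bytes as a well-mixed 32-bit hash.
            int hash = ByteBuffer.wrap(digest).getInt();
            // floorMod keeps the result non-negative even for negative hashes.
            return Math.floorMod(hash, numPartitions);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    public static void main(String[] args) {
        int numPartitions = 8; // assumed value, for illustration only
        int[] counts = new int[numPartitions];
        for (int i = 0; i < 100_000; i++) {
            counts[partitionFor("doc-" + i, numPartitions)]++;
        }
        // Print how many documents landed in each partition.
        for (int p = 0; p < numPartitions; p++) {
            System.out.println("partition " + p + ": " + counts[p]);
        }
    }
}
```

Because the digest bits are well mixed, the per-partition counts printed by main should come out close to one another. Note that this evens out document counts only; documents that differ greatly in size could still produce uneven entry counts.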

It is possible to split the partition range manually after
ingestion is complete and rely on the default balancer to spread tablets
across the cluster; however, in this case some servers end up overloaded
compared to others.
Currently the slowest server (hosting the largest tablet) determines the final
time for a search query, so I want to distribute entities across the
cluster so that they are well balanced and all servers spend a similar
amount of time processing documents through OptimizedQueryIterators.
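
The goal described above, placing tablets so that each server holds a similar number of entries, can be sketched as a greedy assignment: take tablets from largest to smallest and give each one to the server with the fewest entries so far. This is only the placement logic under stated assumptions, not an implementation of Accumulo's balancer interface. The class EntryCountPlacement and the method assign are hypothetical names, and the entry counts are the example figures from the original question quoted further down.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of entry-based placement; not an Accumulo API.
final class EntryCountPlacement {

    // Returns, for each server index, the list of tablet names assigned to it.
    static List<List<String>> assign(Map<String, Long> entriesPerTablet, int numServers) {
        if (numServers <= 0) {
            throw new IllegalArgumentException("numServers must be positive");
        }
        List<List<String>> placement = new ArrayList<>();
        for (int i = 0; i < numServers; i++) {
            placement.add(new ArrayList<>());
        }
        long[] load = new long[numServers]; // entries assigned to each server so far

        // Sort tablets by entry count, largest first.
        List<Map.Entry<String, Long>> tablets = new ArrayList<>(entriesPerTablet.entrySet());
        tablets.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));

        for (Map.Entry<String, Long> tablet : tablets) {
            // Pick the server with the smallest load (lowest index wins ties).
            int target = 0;
            for (int s = 1; s < numServers; s++) {
                if (load[s] < load[target]) {
                    target = s;
                }
            }
            placement.get(target).add(tablet.getKey());
            load[target] += tablet.getValue();
        }
        return placement;
    }

    public static void main(String[] args) {
        Map<String, Long> tablets = new LinkedHashMap<>();
        tablets.put("tablet1", 15_000_000L);
        tablets.put("tablet2", 5_000_000L);
        tablets.put("tablet3", 8_000_000L);
        System.out.println(assign(tablets, 2));
    }
}
```

With the three example tablets and two servers, the first server receives tablet1 and the second receives tablet3 and tablet2, which is the placement asked for in the original question. The sketch only computes a target placement; how quickly tablets are actually moved toward it is a separate decision.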

Konstantin
--------
Big Data / Search Consultant
LinkedIn: linkedin.com/in/kpelykh <http://www.linkedin.com/in/kpelykh>
Website: www.kpelykh.com <http://www.kpelykh.com>

On Wed, Jul 29, 2015 at 9:18 PM, mohit.kaushik <[email protected]
<mailto:[email protected]>> wrote:

    If I understand you correctly, for this purpose you can simply
    pre-split tables based on range to evenly distribute data across
    tablets.
    
https://accumulo.apache.org/1.7/accumulo_user_manual.html#_pre_splitting_tables




    On 07/30/2015 07:46 AM, Konstantin Pelykh wrote:
    In this specific case, ingest happens only once. It's a write-once,
    read-many type of application, so with such a balancer I would want
    to balance tablets based on the number of entities after ingest is
    fully complete.


    On Wed, Jul 29, 2015 at 6:06 PM, dlmarion <[email protected]
    <mailto:[email protected]>> wrote:

        Hotspotting was the first thing that came to my mind with the
        proposed balancer. The tservers don't keep all the K/V in
        memory. You are balancing query and live ingest across your
        resources.





        -------- Original message --------
        From: Eric Newton <[email protected]
        <mailto:[email protected]>>
        Date: 07/29/2015 8:46 PM (GMT-05:00)
        To: [email protected] <mailto:[email protected]>
        Subject: Re: Entry-based TableBalancer

        To my knowledge, nobody has written such a balancer.

        In the history of the project, we started writing advanced,
        complicated balancers that moved tablets around much too
        quickly, which degraded performance. After that, we wrote much
        simpler balancers to avoid the chaos. We're moving back to
        more complex balancers, but mostly just to ensure that we
        aren't hotspotting, based on known ingest patterns (date-related,
        for example).

        If you write a new balancer, make it slow to move tablets, and
        very simple.  Avoid over-optimizing tablet placement.

        -Eric

        On Wed, Jul 29, 2015 at 8:20 PM, Konstantin Pelykh
        <[email protected] <mailto:[email protected]>> wrote:

            Hi,

            I'm looking for a tablet balancer which operates based on
            the number of entries per tablet as opposed to the number of
            tablets per tablet server. My goal is to get an even
            distribution of entries across the cluster.

            As an example:

            tablet #1  15M entries
            tablet #2   5M entries
            tablet #3   8M entries

            After balancing tablets I would want to get:

            Server 1 hosts: tablet1
            Server 2 hosts: tablet2, tablet3

            The idea is pretty simple and I believe such a balancer has
            already been developed, so I decided to check before
            reinventing the wheel.

            Thanks!
            Konstantin






    --

    *Mohit Kaushik*
    Software Engineer
    A Square,Plot No. 278, Udyog Vihar, Phase 2, Gurgaon 122016, India
    *Tel:*+91 (124) 4969352 | *Fax:*+91 (124) 4033553


