If I understand you correctly, you can simply pre-split the table on row ranges so that data is distributed evenly across tablets.
https://accumulo.apache.org/1.7/accumulo_user_manual.html#_pre_splitting_tables
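
A minimal sketch of how such split points might be computed, assuming row IDs are zero-padded decimal numbers spread roughly uniformly over a known range (the thread does not say what the row format is, so this is only an assumption). The class and method names are hypothetical and not part of Accumulo. The resulting strings would be wrapped in Text objects and handed to TableOperations.addSplits(tableName, splits).

```java
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper for illustration only; not part of Accumulo.
class SplitPoints {
    // Returns (numTablets - 1) evenly spaced split rows over [0, maxRow),
    // zero-padded to `width` digits so that lexicographic order matches
    // numeric order. Assumes maxRow * numTablets fits in a long.
    static SortedSet<String> evenSplits(long maxRow, int numTablets, int width) {
        if (numTablets < 1 || maxRow < numTablets) {
            throw new IllegalArgumentException("need 1 <= numTablets <= maxRow");
        }
        SortedSet<String> splits = new TreeSet<>();
        for (int i = 1; i < numTablets; i++) {
            long row = maxRow * i / numTablets;
            splits.add(String.format("%0" + width + "d", row));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Four tablets over rows 0000..0999 need three split points.
        System.out.println(evenSplits(1000, 4, 4)); // prints [0250, 0500, 0750]
    }
}
```

Evenly spaced split points only give evenly sized tablets when the rows themselves are evenly distributed; for skewed keys the split points would have to come from a sample of the data instead.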



On 07/30/2015 07:46 AM, Konstantin Pelykh wrote:
In this specific case, ingest happens only once. It's a write-once, read-many type of application, so with such a balancer I would want to balance tablets based on the number of entries after ingest is fully complete.

--------
Big Data / Search Consultant
Cell: +1 (646) 639-3916
E-mail: [email protected]
LinkedIn: linkedin.com/in/kpelykh
Website: www.kpelykh.com

On Wed, Jul 29, 2015 at 6:06 PM, dlmarion <[email protected]> wrote:

    Hotspotting was the first thing that came to my mind with the
    proposed balancer. The tservers don't keep all the K/V in memory.
    You are balancing query and live ingest across your resources.





    -------- Original message --------
    From: Eric Newton <[email protected]>
    Date: 07/29/2015 8:46 PM (GMT-05:00)
    To: [email protected]
    Subject: Re: Entry-based TableBalancer

    To my knowledge, nobody has written such a balancer.

    In the history of the project, we started writing advanced,
    complicated balancers that moved tablets around much too quickly,
    which degraded performance. After that, we wrote much simpler
    balancers to avoid the chaos. We're moving back to more complex
    balancers, but mostly just to ensure that we aren't hotspotting,
    based on known ingest patterns (date-related, for example).

    If you write a new balancer, make it slow to move tablets, and
    very simple.  Avoid over-optimizing tablet placement.
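
One way to read the "slow and simple" advice, as a sketch only: cap the number of tablet moves planned per balancing pass, and stop as soon as the servers are within one tablet of each other. The class below is hypothetical and deliberately avoids Accumulo's own TabletBalancer API; it works on plain tablet counts per server, and the server names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; a real balancer would plug into Accumulo's balancer API.
class SlowBalancerSketch {
    // Plans at most maxMoves single-tablet moves, each from the server
    // hosting the most tablets to the server hosting the fewest.
    static List<String> planMoves(Map<String, Integer> tabletsPerServer, int maxMoves) {
        Map<String, Integer> counts = new TreeMap<>(tabletsPerServer); // work on a copy
        List<String> moves = new ArrayList<>();
        while (moves.size() < maxMoves && !counts.isEmpty()) {
            String most = null;
            String least = null;
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                if (most == null || e.getValue() > counts.get(most)) most = e.getKey();
                if (least == null || e.getValue() < counts.get(least)) least = e.getKey();
            }
            // Good enough: do not chase a perfect placement.
            if (counts.get(most) - counts.get(least) <= 1) break;
            counts.merge(most, -1, Integer::sum);
            counts.merge(least, 1, Integer::sum);
            moves.add(most + "->" + least);
        }
        return moves;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new TreeMap<>();
        counts.put("a", 5);
        counts.put("b", 1);
        counts.put("c", 3);
        System.out.println(planMoves(counts, 1)); // prints [a->b]
    }
}
```

With a small maxMoves the cluster converges over several passes rather than all at once, which is the point: each pass disturbs little, and the early exit keeps the balancer from over-optimizing.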

    -Eric

    On Wed, Jul 29, 2015 at 8:20 PM, Konstantin Pelykh
    <[email protected]> wrote:

        Hi,

        I'm looking for a tablet balancer which operates based on the
        number of entries per tablet as opposed to the number of tablets
        per tablet server. My goal is to get an even distribution of
        entries across the cluster.

        As an example:

        tablet #1  15M entries
        tablet #2   5M entries
        tablet #3   8M entries

        After balancing tablets I would want to get:

        Server 1 hosts: tablet1
        Server 2 hosts: tablet2, tablet3
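
The placement in this example can be reproduced with a simple greedy pass: sort tablets by entry count, largest first, and give each one to the server that currently holds the fewest entries. This is only a sketch of that heuristic, not an existing Accumulo balancer; the class name is hypothetical, and the entry counts are assumed to be known up front.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of entry-count-based placement; not an Accumulo class.
class EntryBalanceSketch {
    // Returns, for each server index, the tablets assigned to it.
    static List<List<String>> assign(Map<String, Long> entriesPerTablet, int numServers) {
        if (numServers < 1) {
            throw new IllegalArgumentException("numServers must be >= 1");
        }
        List<Map.Entry<String, Long>> tablets = new ArrayList<>(entriesPerTablet.entrySet());
        // Largest tablets first, so the big ones are spread out before the small ones.
        tablets.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));

        List<List<String>> hosted = new ArrayList<>();
        for (int i = 0; i < numServers; i++) {
            hosted.add(new ArrayList<>());
        }
        long[] load = new long[numServers];
        for (Map.Entry<String, Long> t : tablets) {
            int target = 0;
            for (int s = 1; s < numServers; s++) {
                if (load[s] < load[target]) target = s; // least-loaded server so far
            }
            hosted.get(target).add(t.getKey());
            load[target] += t.getValue();
        }
        return hosted;
    }

    public static void main(String[] args) {
        Map<String, Long> entries = new LinkedHashMap<>();
        entries.put("tablet1", 15_000_000L);
        entries.put("tablet2", 5_000_000L);
        entries.put("tablet3", 8_000_000L);
        System.out.println(assign(entries, 2)); // prints [[tablet1], [tablet3, tablet2]]
    }
}
```

For the three tablets above and two servers, this yields tablet1 alone on the first server and tablet2 plus tablet3 on the second, i.e. 15M versus 13M entries.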

        The idea is pretty simple and I believe such a balancer has
        already been developed, so I decided to check before
        reinventing the wheel.

        Thanks!
        Konstantin






--

*Mohit Kaushik*
Software Engineer
A Square,Plot No. 278, Udyog Vihar, Phase 2, Gurgaon 122016, India
*Tel:*+91 (124) 4969352 | *Fax:*+91 (124) 4033553

