You could also use JVisualVM, which can give better reports on benchmarks than manually inspecting jstacks.

Keith Turner wrote:
You can try sampling using jstack as a simple and quick way to profile.
Jstack a process writing rfiles ~10 times, with some pause between.
Then look at a particular thread writing data across the saved jstacks.
Do you see the same code being executed in multiple jstacks?  If so, what
code is that?
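
If comparing the dumps by eye gets tedious, the same sampling idea can be scripted. Below is a rough sketch in Python, not a polished tool: it assumes `jstack` is on the PATH and that the dumps use the usual HotSpot layout (a quoted thread name on the header line, then `at ...` frame lines). The function names, the sample count and the pause length are all made up for illustration.

```python
import re
import subprocess
import time
from collections import Counter

# Thread header lines start with the quoted thread name, e.g. "main" #1 prio=5 ...
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')
# Frame lines look like: <whitespace>at package.Class.method(File.java:123)
FRAME = re.compile(r'^\s+at\s+(?P<frame>[^\s(]+)\(')


def collect_dumps(pid, samples=10, pause_seconds=2.0):
    """Run `jstack <pid>` several times, pausing between runs (values are assumptions)."""
    dumps = []
    for i in range(samples):
        out = subprocess.run(["jstack", str(pid)],
                             capture_output=True, text=True, check=True)
        dumps.append(out.stdout)
        if i < samples - 1:
            time.sleep(pause_seconds)
    return dumps


def top_frames(dump_text):
    """Map each thread name to its topmost stack frame in one dump."""
    result = {}
    current = None
    for line in dump_text.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            current = header.group("name")
            continue
        frame = FRAME.match(line)
        # Only the first frame after a header is the top of that thread's stack.
        if frame and current is not None and current not in result:
            result[current] = frame.group("frame")
    return result


def hot_frames(dumps, thread_name):
    """Count how often each top frame shows up for one thread across all dumps."""
    counts = Counter()
    for dump in dumps:
        frame = top_frames(dump).get(thread_name)
        if frame is not None:
            counts[frame] += 1
    return counts.most_common()
```

Something like `hot_frames(collect_dumps(pid), "name-of-writer-thread")` then lists the frames seen most often at the top of that thread's stack. A frame that appears in most of the samples is a likely hot spot. This only looks at the topmost frame, so it is a crude first pass.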

Sent from phone. Please excuse typos and brevity.

On Jan 3, 2015 12:46 AM, "Ara Ebrahimi" <[email protected]
<mailto:[email protected]>> wrote:

    Hi,

    I’m trying to optimize our map/reduce job which generates RFiles
    using AccumuloFileOutputFormat. We have a specific time window, and
    within that window we need to generate a predefined amount of
    simulation data. We also have an upper bound on the number of cores
    we can use. Disks are also fixed at 4 per node and they are
    all SSDs. So I can’t employ more machines or more disks or cores to
    achieve higher write/s numbers.

    So far we’ve managed to utilize 100% of all available cores and the
    SSD disks are also highly utilized. I’m trying to reduce processing
    time and we are willing to waste more disk space to achieve higher
    data generation speed. The data itself is 10s of columns of
    floating-point numbers, all serialized to fixed 9-byte values which
    don’t lend themselves well to compression. With no compression and
    replication set to 1 we can generate the same amount of data in
    almost half the time. With snappy it’s almost 10% more data
    generation time compared to no compression and almost twice the
    size on disk for all the generated RFiles.

    dataBlockSize doesn’t seem to change anything for non-compressed
    data. indexBlockSize also didn't change anything (tried 64K vs the
    default 128K).

    Any other tricks I could employ to achieve higher write/s numbers?

    Ara.



