Hi Dmitriy,

Definitely we should. I'll put these tips into a new block on that page a bit later.

--
Denis

On 10/29/2015 8:46 AM, Dmitriy Setrakyan wrote:
Should we add some of these performance and tuning tips to our documentation?

http://apacheignite.gridgain.org/docs/performance-tips

D.

On Wed, Oct 28, 2015 at 7:04 AM, Denis Magda <[email protected]> wrote:

    Hi Joe,

    No problem, I'll guide you until we get to the bottom of this.

    Do you start pre-loading the caches with data right after the
    cluster is ready? If so, let's postpone doing this until you have
    a stable cluster with the caches rebalanced and ready to be used.

    Please do the following as the next steps:

    1) Set 'failureDetectionTimeout' to a bigger value (~ 15 secs);

    2) Set CacheConfiguration.setRebalanceTimeout to a value that is
    approximately equal to the time it takes for all the nodes to
    join the topology (~1 minute or so); see the configuration sketch
    after this list.

    3) Enable verbose logging on every node by passing the
    -DIGNITE_QUIET=false parameter in the virtual machine arguments
    list. If you use the ignite.sh script, just pass the '-v' flag.

    4) Enable garbage collection logs on every node by adding the
    following to the virtual machine arguments list: -Xloggc:./gc.log
    -XX:+PrintGCDetails -verbose:gc
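
    For reference, here is a minimal sketch of how points 1) and 2) could
    look in your Spring XML (the exact values are examples only and should
    be tuned to your cluster):

    <!-- 1) on the IgniteConfiguration bean: ~15 second failure detection timeout (ms) -->
    <property name="failureDetectionTimeout" value="15000" />

    <!-- 2) on each CacheConfiguration bean in your config: rebalance timeout
         close to the time all the nodes need to join the topology (~1 minute, in ms) -->
    <property name="rebalanceTimeout" value="60000" />

    Points 3) and 4) are JVM/script arguments rather than Spring properties.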

    Once you have done a test run that takes all the points above
    into account, please gather all the logs (including the garbage
    collection logs) and send them to us for further investigation.

    Regards,
    Denis


    On 10/28/2015 1:40 PM, [email protected] wrote:
    Thanks for the info Denis.

    Removing the failureDetectionTimeout and using the networkTimeout
    seems to allow the nodes to join the topology in about the same
    amount of time. I'm still only having occasional success running
    anything (even just the pi estimator).

    I always seem to see a bunch of warnings. A summary is dumped
    below along with my config at the end; any guidance you can
    provide is appreciated.

    Thanks,
    Joe

    Every node seems to see a bunch of "Retrying preload partition"
    warnings, with the node that has the lowest locNodeOrder showing
    fewer nodes in its 'remaining' list:

    [14:52:38,979][WARN
    ][ignite-#104%sys-null%][GridDhtPartitionsExchangeFuture]
    Retrying preload partition exchange due to timeout [done=false,
    dummy=false, exchId=GridDhtPartitionExchangeId
    [topVer=AffinityTopologyVersion [topVer=62, minorTopVer=0],
    nodeId=fd9620f5, evt=NODE_JOINED], rcvdIds=[], rmtIds=[0ab29a08,
    5216f6ba, f882885f, 0d232f1a, b74f5ebb, 5790761a, 55d2082e,
    b1bf93b3, 2fd79f9f, a899ccce, 3dd74aba, 320d05fd, 0d44a4b3,
    9a00f235, 4426467e, 7837fdfc, e8778da0, 4a988e3e, f8cabdbb,
    494ad6fd, 7c05abfb, 5902c851, c406028e, a0b57685, e213b903,
    c85a0b46, df981c08, 187cd54f, f0b7b298, 94ec7576, 041975f5,
    aecba5d0, 5549256d, f9b5a77a, 596d0df7, 26266d8c, 0e664e25,
    97d112b2, aac08043, 6b81a2b1, 5a2a1012, 534ac94b, b34cb942,
    837785eb, 966d70b2, 3aab732e, 4e34ad89, 6df0ffff, 4c7c3c47,
    85eea5fe, 1c5e2f6b, 3f426f4e, 27a9bef9, cd874e96, dc3256a7,
    4da50521, 1d370c9e, 19c334eb, 24be15dd, 6c922af3, 01ea2812],
    remaining=[0ab29a08, 5216f6ba, f882885f, 0d232f1a, b74f5ebb,
    5790761a, 55d2082e, b1bf93b3, 2fd79f9f, a899ccce, 3dd74aba,
    320d05fd, 0d44a4b3, 9a00f235, 4426467e, 7837fdfc, e8778da0,
    4a988e3e, f8cabdbb, 494ad6fd, 7c05abfb, 5902c851, c406028e,
    a0b57685, e213b903, c85a0b46, df981c08, 187cd54f, f0b7b298,
    94ec7576, 041975f5, aecba5d0, 5549256d, f9b5a77a, 596d0df7,
    26266d8c, 0e664e25, 97d112b2, aac08043, 6b81a2b1, 5a2a1012,
    534ac94b, b34cb942, 837785eb, 966d70b2, 3aab732e, 4e34ad89,
    6df0ffff, 4c7c3c47, 85eea5fe, 1c5e2f6b, 3f426f4e, 27a9bef9,
    cd874e96, dc3256a7, 4da50521, 1d370c9e, 19c334eb, 24be15dd,
    6c922af3, 01ea2812], init=true, initFut=true, ready=true,
    replied=false, added=true, oldest=0d44a4b3, oldestOrder=1,
    evtLatch=0, locNodeOrder=62,
    locNodeId=fd9620f5-3ebb-4a71-a482-73d6a81b1688]


    [14:38:41,893][WARN
    ][ignite-#95%sys-null%][GridDhtPartitionsExchangeFuture] Retrying
    preload partition exchange due to timeout [done=false,
    dummy=false, exchId=GridDhtPartitionExchangeId
    [topVer=AffinityTopologyVersion [topVer=25, minorTopVer=0],
    nodeId=df981c08, evt=NODE_JOINED], rcvdIds=[7c05abfb, b34cb942,
    e213b903, 320d05fd, 5902c851, f0b7b298, 1d370c9e, 0d232f1a,
    494ad6fd, 5a2a1012, b1bf93b3, 55d2082e, 7837fdfc, 85eea5fe,
    4e34ad89, 5790761a, 3f426f4e, aac08043, 187cd54f, 01ea2812,
    c406028e, 24be15dd, 966d70b2], rmtIds=[0d232f1a, 5790761a,
    55d2082e, b1bf93b3, aac08043, 5a2a1012, b34cb942, 320d05fd,
    966d70b2, 4e34ad89, 85eea5fe, 7837fdfc, 3f426f4e, 1d370c9e,
    494ad6fd, 7c05abfb, 5902c851, c406028e, 24be15dd, e213b903,
    df981c08, 187cd54f, f0b7b298, 01ea2812], remaining=[df981c08],
    init=true, initFut=true, ready=true, replied=false, added=true,
    oldest=0d44a4b3, oldestOrder=1, evtLatch=0, locNodeOrder=1,
    locNodeId=0d44a4b3-4d10-4f67-b8bd-005be226b1df]


    I also see a little over half the nodes getting "Still waiting
    for initial partition map exchange" warnings like this:


    [14:39:37,848][WARN ][main][GridCachePartitionExchangeManager]
    Still waiting for initial partition map exchange
    [fut=GridDhtPartitionsExchangeFuture [dummy=false,
    forcePreload=false, reassign=false, discoEvt=DiscoveryEvent
    [evtNode=TcpDiscoveryNode
    [id=27a9bef9-de04-486d-aac0-bfa749e9007d,
    addrs=[0:0:0:0:0:0:0:1%1, 10.148.0.87, 10.159.1.182, 127.0.0.1],
    sockAddrs=[r1i4n10.redacted.com/10.148.0.87:47500,
    /0:0:0:0:0:0:0:1%1:47500, /10.159.1.182:47500,
    /10.148.0.87:47500, /10.159.1.182:47500,
    /127.0.0.1:47500], discPort=47500, order=48, intOrder=48,
    lastExchangeTime=1445974777828, loc=true,
    ver=1.4.0#19691231-sha1:00000000, isClient=false], topVer=48,
    nodeId8=27a9bef9, msg=null, type=NODE_JOINED,
    tstamp=1445974647187], rcvdIds=GridConcurrentHashSet
    [elements=[]], rmtIds=[0ab29a08-9c95-4054-8035-225f5828b3d4,
    0d232f1a-0f46-4798-a39a-63a17dc4dc7f,
    f9b5a77a-a4c1-46aa-872e-aeaca9b76ee3,
    596d0df7-3edf-4078-8f4a-ffa3d96296c6,
    5790761a-aeeb-44d1-9fce-3fee31ef39b7,
    55d2082e-3517-4828-8d47-57b4ed5a41bc,
    26266d8c-cc87-4472-9fa4-c526d6da2233,
    0e664e25-8dde-4df8-966b-53b60f9a1087,
    b1bf93b3-24bb-4520-ade0-31d05a93558d,
    aac08043-875a-485a-ab2c-cd7e66d68f8f,
    2fd79f9f-9590-41d2-962e-004a3d7690b5,
    5a2a1012-0766-448c-9583-25873c305de9,
    534ac94b-8dd1-4fa8-a481-539fa4f4ce55,
    b34cb942-e960-4a00-b4fb-10add6466a93,
    320d05fd-e021-40ac-83bc-62f54756771b,
    0d44a4b3-4d10-4f67-b8bd-005be226b1df,
    837785eb-24e0-496a-a0cc-f795b64b5929,
    9a00f235-3b6a-4be5-b0e3-93cd1beacaf4,
    966d70b2-e1dc-4e20-9876-b63736545abd,
    3aab732e-a075-4b19-9525-e97a1260a4fe,
    4e34ad89-fa46-4503-a599-b8c937ca1f47,
    4c7c3c47-6e5c-4c15-80a9-408192596bc2,
    85eea5fe-9aff-4821-970c-4ce006ee853a,
    7837fdfc-6255-4784-8088-09d4e6e37bb9,
    3f426f4e-2d0c-402a-a4af-9d7656f46484,
    e8778da0-a764-4ad9-afba-8a748564e12a,
    4a988e3e-3434-4271-acd6-af2a1e30524c,
    cd874e96-63cf-41c9-8e8a-75f3223bfe9d,
    f8cabdbb-875a-480b-8b5e-4b5313c5fcbd,
    dc3256a7-ae23-4c2e-b375-55e2884e045d,
    4da50521-aad0-48a4-9f79-858bbc2e6b89,
    1d370c9e-250f-4733-8b8a-7b6f5c6e1b2b,
    494ad6fd-1637-44b8-8d3a-1fa19681ba64,
    7c05abfb-dba1-43c3-a8b1-af504762ec60,
    5902c851-5275-41fd-89c4-cd6390c88670,
    19c334eb-5661-4697-879d-1082571dfef8,
    c406028e-768e-404e-8417-40d2960c4ba3,
    a0b57685-e5dc-498c-99a4-33b1aef32632,
    24be15dd-45f7-4980-b4f8-3176ab67e8f6,
    e213b903-107b-4465-8fe1-78b7b393d631,
    df981c08-148d-4266-9ea7-163168012968,
    187cd54f-396b-4c3c-9bfc-9883ac37f556,
    f0b7b298-6432-477a-85a0-83e29e8c5380,
    94ec7576-7a02-4c08-8739-4e0fc52a3d3a,
    041975f5-990a-4792-b384-eded32966783,
    01ea2812-5005-4152-af2e-2586bf65b4c6,
    aecba5d0-9d9b-4ab6-9018-62f5abb7b809],
    exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion
    [topVer=48, minorTopVer=0], nodeId=27a9bef9, evt=NODE_JOINED],
    init=true, ready=true, replied=false, added=true,
    initFut=GridFutureAdapter [resFlag=2, res=true,
    startTime=1445974657836, endTime=1445974658400,
    ignoreInterrupts=false, lsnr=null, state=DONE], topSnapshot=null,
    lastVer=null, partReleaseFut=GridCompoundFuture [lsnrCalls=3,
    finished=true, rdc=null, init=true,
    res=java.util.concurrent.atomic.AtomicMarkableReference@6b58be0e,
    err=null, done=true, cancelled=false, err=null, futs=[true, true,
    true]], skipPreload=false, clientOnlyExchange=false,
    oldest=0d44a4b3-4d10-4f67-b8bd-005be226b1df, oldestOrder=1,
    evtLatch=0, remaining=[0ab29a08-9c95-4054-8035-225f5828b3d4,
    0d232f1a-0f46-4798-a39a-63a17dc4dc7f,
    f9b5a77a-a4c1-46aa-872e-aeaca9b76ee3,
    596d0df7-3edf-4078-8f4a-ffa3d96296c6,
    5790761a-aeeb-44d1-9fce-3fee31ef39b7,
    55d2082e-3517-4828-8d47-57b4ed5a41bc,
    26266d8c-cc87-4472-9fa4-c526d6da2233,
    0e664e25-8dde-4df8-966b-53b60f9a1087,
    b1bf93b3-24bb-4520-ade0-31d05a93558d,
    aac08043-875a-485a-ab2c-cd7e66d68f8f,
    2fd79f9f-9590-41d2-962e-004a3d7690b5,
    5a2a1012-0766-448c-9583-25873c305de9,
    534ac94b-8dd1-4fa8-a481-539fa4f4ce55,
    b34cb942-e960-4a00-b4fb-10add6466a93,
    320d05fd-e021-40ac-83bc-62f54756771b,
    0d44a4b3-4d10-4f67-b8bd-005be226b1df,
    837785eb-24e0-496a-a0cc-f795b64b5929,
    9a00f235-3b6a-4be5-b0e3-93cd1beacaf4,
    966d70b2-e1dc-4e20-9876-b63736545abd,
    3aab732e-a075-4b19-9525-e97a1260a4fe,
    4e34ad89-fa46-4503-a599-b8c937ca1f47,
    4c7c3c47-6e5c-4c15-80a9-408192596bc2,
    85eea5fe-9aff-4821-970c-4ce006ee853a,
    7837fdfc-6255-4784-8088-09d4e6e37bb9,
    3f426f4e-2d0c-402a-a4af-9d7656f46484,
    e8778da0-a764-4ad9-afba-8a748564e12a,
    4a988e3e-3434-4271-acd6-af2a1e30524c,
    cd874e96-63cf-41c9-8e8a-75f3223bfe9d,
    f8cabdbb-875a-480b-8b5e-4b5313c5fcbd,
    dc3256a7-ae23-4c2e-b375-55e2884e045d,
    4da50521-aad0-48a4-9f79-858bbc2e6b89,
    1d370c9e-250f-4733-8b8a-7b6f5c6e1b2b,
    494ad6fd-1637-44b8-8d3a-1fa19681ba64,
    7c05abfb-dba1-43c3-a8b1-af504762ec60,
    5902c851-5275-41fd-89c4-cd6390c88670,
    19c334eb-5661-4697-879d-1082571dfef8,
    c406028e-768e-404e-8417-40d2960c4ba3,
    a0b57685-e5dc-498c-99a4-33b1aef32632,
    24be15dd-45f7-4980-b4f8-3176ab67e8f6,
    e213b903-107b-4465-8fe1-78b7b393d631,
    df981c08-148d-4266-9ea7-163168012968,
    187cd54f-396b-4c3c-9bfc-9883ac37f556,
    f0b7b298-6432-477a-85a0-83e29e8c5380,
    94ec7576-7a02-4c08-8739-4e0fc52a3d3a,
    041975f5-990a-4792-b384-eded32966783,
    01ea2812-5005-4152-af2e-2586bf65b4c6,
    aecba5d0-9d9b-4ab6-9018-62f5abb7b809], super=GridFutureAdapter
    [resFlag=0, res=null, startTime=1445974657836, endTime=0,
    ignoreInterrupts=false, lsnr=null, state=INIT]]]



    Then, on the occasions when MapReduce jobs fail, I will see one
    node (it isn't always the same node) reporting warnings like
    these:


    [14:52:57,080][WARN
    ][grid-nio-worker-3-#135%null%][TcpCommunicationSpi] Closing NIO
    session because of unhandled exception [cls=class
    o.a.i.i.util.nio.GridNioException, msg=Connection timed out]
    [14:52:59,123][WARN
    ][grid-nio-worker-3-#135%null%][TcpCommunicationSpi] Failed to
    process selector key (will close): GridSelectorNioSessionImpl
    [selectorIdx=3, queueSize=0,
    writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768],
    readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768],
    recovery=GridNioRecoveryDescriptor [acked=3, resendCnt=0,
    rcvCnt=0, reserved=true, lastAck=0, nodeLeft=false,
    node=TcpDiscoveryNode [id=837785eb-24e0-496a-a0cc-f795b64b5929,
    addrs=[0:0:0:0:0:0:0:1%1, 10.148.0.81, 10.159.1.176, 127.0.0.1],
    sockAddrs=[/10.159.1.176:47500,
    /0:0:0:0:0:0:0:1%1:47500, r1i4n4.redacted.com/10.148.0.81:47500,
    /10.148.0.81:47500, /10.159.1.176:47500, /127.0.0.1:47500],
    discPort=47500, order=45, intOrder=45,
    lastExchangeTime=1445974625750, loc=false,
    ver=1.4.0#19691231-sha1:00000000, isClient=false],
    connected=true, connectCnt=1, queueLimit=5120],
    super=GridNioSessionImpl [locAddr=/10.159.1.112:46222,
    rmtAddr=/10.159.1.176:47100, createTime=1445974646591,
    closeTime=0, bytesSent=30217, bytesRcvd=9,
    sndSchedTime=1445975577912, lastSndTime=1445975577912,
    lastRcvTime=1445974655114, readsPaused=false,
    filterChain=FilterChain[filters=[GridNioCodecFilter
    [parser=o.a.i.i.util.nio.GridDirectParser@44de55ba,
    directMode=true], GridConnectionBytesVerifyFilter], accepted=false]]
    [14:52:59,124][WARN
    ][grid-nio-worker-3-#135%null%][TcpCommunicationSpi] Closing NIO
    session because of unhandled exception [cls=class
    o.a.i.i.util.nio.GridNioException, msg=Connection timed out]
    [14:53:00,105][WARN
    ][grid-nio-worker-3-#135%null%][TcpCommunicationSpi] Failed to
    process selector key (will close): GridSelectorNioSessionImpl
    [selectorIdx=3, queueSize=0,
    writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768],
    readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768],
    recovery=GridNioRecoveryDescriptor [acked=0, resendCnt=0,
    rcvCnt=0, reserved=true, lastAck=0, nodeLeft=false,
    node=TcpDiscoveryNode [id=4426467e-b4b4-4912-baa1-d7cc839d9188,
    addrs=[0:0:0:0:0:0:0:1%1, 10.148.0.106, 10.159.1.201, 127.0.0.1],
    sockAddrs=[/10.159.1.201:47500,
    /0:0:0:0:0:0:0:1%1:47500, r1i5n11.redacted.com/10.148.0.106:47500,
    /10.148.0.106:47500, /10.159.1.201:47500, /127.0.0.1:47500],
    discPort=47500, order=57, intOrder=57,
    lastExchangeTime=1445974625790, loc=false,
    ver=1.4.0#19691231-sha1:00000000, isClient=false],
    connected=true, connectCnt=1, queueLimit=5120],
    super=GridNioSessionImpl [locAddr=/10.159.1.112:60869,
    rmtAddr=/10.159.1.201:47100, createTime=1445974654478,
    closeTime=0, bytesSent=22979, bytesRcvd=0,
    sndSchedTime=1445975577912, lastSndTime=1445975577912,
    lastRcvTime=1445974654478, readsPaused=false,
    filterChain=FilterChain[filters=[GridNioCodecFilter
    [parser=o.a.i.i.util.nio.GridDirectParser@44de55ba,
    directMode=true], GridConnectionBytesVerifyFilter], accepted=false]]
    [14:53:00,105][WARN
    ][grid-nio-worker-3-#135%null%][TcpCommunicationSpi] Closing NIO
    session because of unhandled exception [cls=class
    o.a.i.i.util.nio.GridNioException, msg=Connection timed out]


    I've tried adjusting the timeout settings further but haven't had
    much success.

    Here is what my config looks like; it is obviously heavily based
    on the Hadoop example config.


    <?xml version="1.0" encoding="UTF-8"?>
    <beans xmlns="http://www.springframework.org/schema/beans"
           xmlns:ns1="http://www.w3.org/2001/XMLSchema-instance"
           ns1:schemaLocation="http://www.springframework.org/schema/beans
                               http://www.springframework.org/schema/beans/spring-beans.xsd
                               http://www.springframework.org/schema/util
                               http://www.springframework.org/schema/util/spring-util.xsd">
      <description>
        Spring file for Ignite node configuration with IGFS and
        Apache Hadoop map-reduce support enabled.
        Ignite node will start with this configuration by default.
      </description>
      <bean class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer"
            id="propertyConfigurer">
        <property name="systemPropertiesModeName"
    value="SYSTEM_PROPERTIES_MODE_FALLBACK" />
        <property name="searchSystemEnvironment" value="true" />
      </bean>
      <bean abstract="true"
    class="org.apache.ignite.configuration.FileSystemConfiguration"
    id="igfsCfgBase">
        <property name="blockSize" value="#{128 * 1024}" />
        <property name="perNodeBatchSize" value="512" />
        <property name="perNodeParallelBatchCount" value="16" />
        <property name="prefetchBlocks" value="32" />
      </bean>
      <bean abstract="true"
    class="org.apache.ignite.configuration.CacheConfiguration"
    id="dataCacheCfgBase">
        <property name="cacheMode" value="PARTITIONED" />
        <property name="atomicityMode" value="TRANSACTIONAL" />
        <property name="writeSynchronizationMode" value="FULL_SYNC" />
        <property name="backups" value="0" />
        <property name="affinityMapper">
          <bean
    class="org.apache.ignite.igfs.IgfsGroupDataBlocksKeyMapper">
            <constructor-arg value="512" />
          </bean>
        </property>
        <property name="startSize" value="#{100*1024*1024}" />
        <property name="offHeapMaxMemory" value="0" />
      </bean>
      <bean abstract="true"
    class="org.apache.ignite.configuration.CacheConfiguration"
    id="metaCacheCfgBase">
        <property name="cacheMode" value="REPLICATED" />
        <property name="atomicityMode" value="TRANSACTIONAL" />
        <property name="writeSynchronizationMode" value="FULL_SYNC" />
      </bean>
      <bean
    class="org.apache.ignite.configuration.IgniteConfiguration"
    id="grid.cfg">
        <property name="failureDetectionTimeout" value="3000" />
        <property name="hadoopConfiguration">
          <bean
    class="org.apache.ignite.configuration.HadoopConfiguration">
            <property name="finishedJobInfoTtl" value="30000" />
          </bean>
        </property>
        <property name="connectorConfiguration">
          <bean
    class="org.apache.ignite.configuration.ConnectorConfiguration">
            <property name="port" value="11211" />
          </bean>
        </property>
        <property name="fileSystemConfiguration">
          <list>
            <bean
    class="org.apache.ignite.configuration.FileSystemConfiguration"
    parent="igfsCfgBase">
              <property name="name" value="igfs" />
              <property name="metaCacheName" value="igfs-meta" />
              <property name="dataCacheName" value="igfs-data" />
              <property name="ipcEndpointConfiguration">
                <bean
    class="org.apache.ignite.igfs.IgfsIpcEndpointConfiguration">
                  <property name="type" value="TCP" />
                  <property name="host" value="r1i0n12" />
                  <property name="port" value="10500" />
                </bean>
              </property>
            </bean>
          </list>
        </property>
        <property name="cacheConfiguration">
          <list>
            <bean
    class="org.apache.ignite.configuration.CacheConfiguration"
    parent="metaCacheCfgBase">
              <property name="name" value="igfs-meta" />
            </bean>
            <bean
    class="org.apache.ignite.configuration.CacheConfiguration"
    parent="dataCacheCfgBase">
              <property name="name" value="igfs-data" />
            </bean>
          </list>
        </property>
        <property name="includeEventTypes">
          <list>
            <ns2:constant xmlns:ns2="http://www.springframework.org/schema/util"
                static-field="org.apache.ignite.events.EventType.EVT_TASK_FAILED" />
            <ns2:constant xmlns:ns2="http://www.springframework.org/schema/util"
                static-field="org.apache.ignite.events.EventType.EVT_TASK_FINISHED" />
            <ns2:constant xmlns:ns2="http://www.springframework.org/schema/util"
                static-field="org.apache.ignite.events.EventType.EVT_JOB_MAPPED" />
          </list>
        </property>
        <property name="discoverySpi">
          <bean
    class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
            <property name="ipFinder">
              <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
                <property name="addresses">
                  <list>
                    <value>r1i0n12:47500</value>
                  </list>
                </property>
              </bean>
            </property>
          </bean>
        </property>
      </bean>
    </beans>



    Quoting Denis Magda <[email protected]>:

    Hi Joe,

    Great!

    Please see below

    On 10/27/2015 9:37 AM, [email protected] wrote:
    Reducing the port range (to a single port) and lowering
    IgniteConfiguration.setFailureDetectionTimeout to 1000 helped
    speed up all the nodes joining the topology, and I was able to
    run the pi estimator on 64 nodes.


    I suspect that the reason was the number of ports specified in
    the range. For some reason it takes significant time to get a
    response from the TCP/IP stack that a connection can't be
    established on a particular port number.
    Please try to reduce the port range, lower
    TcpDiscoverySpi.setNetworkTimeout, keep
    IgniteConfiguration.setFailureDetectionTimeout at its default
    value, and share the results with us.
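
    To illustrate (the timeout value below is only an example, and the
    host:port is taken from your posted config), the discovery section
    could look like this, with failureDetectionTimeout simply left unset
    so that its default is used:

    <property name="discoverySpi">
      <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
        <!-- example value; the point is to keep it reasonably small -->
        <property name="networkTimeout" value="5000" />
        <property name="ipFinder">
          <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
            <property name="addresses">
              <list>
                <!-- a single host and a single port: no range to iterate over -->
                <value>r1i0n12:47500</value>
              </list>
            </property>
          </bean>
        </property>
      </bean>
    </property>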

    Thanks again for the help, I'm over the current hurdle.
    Joe


    Quoting [email protected]:

    Thanks for the quick response Denis.

    I did a port range of 10 ports. I'll take a look at the
    failureDetectionTimeout and networkTimeout.

    Side question: Is there an easy way to map between the
    programmatic API and the Spring XML properties? For instance, I
    was trying to find the correct XML incantation for
    TcpDiscoverySpi.setMaxMissedHeartbeats(int), and I might have a
    similar issue finding
    IgniteConfiguration.setFailureDetectionTimeout(long). It seems
    like I can usually drop the 'set' and adjust capitalization
    (setFooBar() == <property name="fooBar">).

    Yes, your understanding is correct.
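
    For example, the setter you mention,
    TcpDiscoverySpi.setMaxMissedHeartbeats(int), would be expressed like
    this (the value is arbitrary, just to show the naming convention):

    <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
      <!-- setMaxMissedHeartbeats(3) expressed as a Spring property -->
      <property name="maxMissedHeartbeats" value="3" />
    </bean>

    IgniteConfiguration.setFailureDetectionTimeout(long) maps to a
    "failureDetectionTimeout" property on the IgniteConfiguration bean in
    the same way.
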
    Please pardon my ignorance on terminology: are the nodes I run
    ignite.sh on considered server nodes or cluster nodes? (I would
    have thought they are the same.)

    Actually, we have a notion of server and client nodes. This page
    contains extensive information on the types of nodes:
    https://apacheignite.readme.io/docs/clients-vs-servers

    A cluster node is simply either a server node or a client node.

    Regards,
    Denis
    Thanks,
    Joe

    Quoting Denis Magda <[email protected]>:

    Hi Joe,

    How big is the port range that you specified in your discovery
    configuration for every single node?
    Please take into account that discovery may iterate over every
    port in the range before one node connects to another, and,
    depending on the TCP-related settings of your network, it may
    take significant time before the cluster is assembled.

    Here I would recommend reducing the port range as much as
    possible and playing with the following network-related
    parameters (see the sketch after this list):
    - Try to use the failure detection timeout instead of setting the
    socket, ack and many other timeouts explicitly
    (https://apacheignite.readme.io/docs/cluster-config#failure-detection-timeout);
    - Try to play with TcpDiscoverySpi.networkTimeout, because this
    timeout is taken into account while a cluster node tries to join
    the cluster.
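
    As a rough sketch of the first point (the value below is an example
    only, not a recommendation), a single property on the
    IgniteConfiguration bean takes the place of the explicit socket and
    ack timeouts on the discovery SPI:

    <bean class="org.apache.ignite.configuration.IgniteConfiguration" id="grid.cfg">
      <!-- one failure detection knob instead of explicit socket/ack timeouts (ms) -->
      <property name="failureDetectionTimeout" value="10000" />
    </bean>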

    In order to help you with the hanging compute tasks and to give
    you more specific recommendations regarding the slow join
    process, please provide us with the following:
    - config files for the server and cluster nodes;
    - log files from all the nodes. Please start the nodes with the
    -DIGNITE_QUIET=false virtual machine property. If you start the
    nodes using ignite.sh/bat then just pass '-v' as an argument to
    the script;
    - thread dumps for the nodes that hang waiting for the compute
    tasks to complete.

    Regards,
    Denis

    On 10/26/2015 6:56 AM, [email protected] wrote:
    Hi all,

    I have been experimenting with Ignite and have run into a
    problem scaling up to larger clusters.

    I am playing with only two different use cases: 1) a Hadoop
    MapReduce accelerator, and 2) an in-memory data grid (no
    secondary file system) accessed by frameworks through the HDFS
    API.

    Everything works fine with a smaller cluster (8 nodes), but with
    a larger cluster (64 nodes) it takes a couple of minutes for all
    the nodes to register with the cluster (which would be OK) and
    MapReduce jobs just hang and never return.

    I've compiled the latest Ignite 1.4 (with
    ignite.edition=hadoop) from source, and am using it with
    Hadoop 2.7.1, just trying to run things like the pi estimator
    and wordcount examples.

    I started with the config/hadoop/default-config.xml

    I can't use multicast, so I've configured it to use static
    IP-based discovery with just a single node/port range.

    I've increased the heartbeat frequency to 10000, and that seemed
    to help make things more stable once all the nodes do join the
    cluster. I've also played with increasing both the socket timeout
    and the ack timeout, but that seemed to just make nodes take
    longer to re-attempt joining the cluster after a failed attempt.
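
    For reference, this is roughly what those settings look like in the
    Spring XML; the heartbeat value is the 10000 I mentioned, while the
    socket/ack values below are just placeholders for the various values
    I tried:

    <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
      <!-- heartbeat frequency raised to 10000 ms -->
      <property name="heartbeatFrequency" value="10000" />
      <!-- placeholder values; I experimented with larger socket/ack timeouts -->
      <property name="socketTimeout" value="5000" />
      <property name="ackTimeout" value="5000" />
    </bean>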

    I have access to a couple of different clusters; we allocate
    resources with Slurm, so I get a piece of a cluster to play with
    (hence the no-multicast restriction). The nodes all have fast
    networks (FDR InfiniBand) and a decent amount of memory
    (64GB-128GB), but no local storage (or swap space).

    As mentioned earlier, I disable the secondaryFilesystem.

    Any advice/hints/example xml configs would be extremely
    welcome.


    I also haven't been seeing the expected performance using the
    HDFS API to access Ignite. I've tried both using the HDFS CLI to
    do some simple timings of put/get and a little Java program that
    writes and then reads a file. Even with small files (500MB) that
    should be kept completely in a single node, I only see about
    250MB/s for writes, and reads are much slower than that (4x to
    10x). The writes are better than HDFS (our HDFS is backed by
    pretty poor storage), but reads are much slower. Now I haven't
    tried scaling this at all, but with an 8-node Ignite cluster and
    a single "client" accessing a single file, I would hope for
    something closer to memory speeds. (If you would like me to
    split this into another message to the list, just let me know;
    I'm assuming the cause is the same---I missed a required config
    setting ;-) )

    Thanks in advance for any help,
    Joe












