Re: Distributed queue problem with peerClassLoading enabled

Denis Magda Wed, 24 Feb 2016 07:08:24 -0800

Hi Mateusz,

Please see inline


On 2/17/2016 11:30 AM, mp wrote:

Denis,

Please see below for answers.

Cheers,
-Mateusz
On Tue, Feb 16, 2016 at 10:18 PM, Denis Magda <dma...@gridgain.com<mailto:dma...@gridgain.com>> wrote:
    Hi Mateusz,

    I've revisited the whole discussion from the beginning and should
    say that the solution based on the distributed queue won't work
    for you even if all the issues listed below are fixed.

    Presently you're placing in the queue tasks coming from different
    nodes with different class versions. Even if the tasks are stored
    in the queue in a binary form they have to be deserialized to an
    original form before execution. This will lead to
    ClassNotFoundExceptions in your case.
No, I'm not filling the queue with different class versions. My test case is quite simple. Please check the original description specified in http://apache-ignite-users.70518.x6.nabble.com/Distributed-queue-problem-with-peerClassLoading-enabled-tp1762p1780.html
In 1.4.0 it works as follows:

1. A server node is started.
2. Client node (which acts as a server as well) is started, creates a queue, fills it with 100 tasks, and both nodes are polling tasks from the queue. When the queue is empty (all tasks called), the test ends. At this point the client node leaves the cluster as well.
3. Then you run the same test again *without any code modifications*, ie, exactly the same class on the same client node. I assume that even if the server node already cached the class during the 1st test run, it should be no problem for the 2nd test run. But unfortunately it fails with "ignitetest.Task cannot be cast to ignitetest.Task", which I believe is due to class loaders being different.

In 1.5.0 even the 1st run of the test fails due to a problem you reported in https://issues.apache.org/jira/browse/IGNITE-2339

It will pass the 1st run if you switch to OptimizedMarshaller, which is the default one in 1.4.0. Just set IgniteConfiguration.setMarshaller(new OptimizedMarshaller()) and you will see that everything works the same in 1.5.0.

However, the second run will fail, just as it does in 1.4.0.
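
The "ignitetest.Task cannot be cast to ignitetest.Task" failure described above is the message the JVM produces when two class loaders each define a class with the same name. The following is a minimal plain-Java sketch of that effect only. It does not use Ignite, and it does not show how peer class loading actually assigns class loaders between test runs. LoaderDemo, IsolatingLoader and the nested Task class are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.InputStream;

public class LoaderDemo {
    /** Stand-in for the user's task class (hypothetical). */
    public static class Task {}

    /** Defines application classes itself instead of delegating to the application loader. */
    static class IsolatingLoader extends ClassLoader {
        IsolatingLoader() {
            super(null); // parent is the bootstrap loader only, which cannot see Task
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            String path = name.replace('.', '/') + ".class";
            try (InputStream in = LoaderDemo.class.getClassLoader().getResourceAsStream(path)) {
                if (in == null)
                    throw new ClassNotFoundException(name);
                byte[] bytes = in.readAllBytes();
                return defineClass(name, bytes, 0, bytes.length);
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** Loads Task through a brand-new loader, as if a new deployment had started. */
    static Class<?> loadTaskInFreshLoader() throws ClassNotFoundException {
        return new IsolatingLoader().loadClass(Task.class.getName());
    }

    /** Returns true if an instance created via one loader is rejected by the class from another. */
    static boolean castFailsAcrossLoaders() throws Exception {
        Class<?> first = loadTaskInFreshLoader();
        Class<?> second = loadTaskInFreshLoader();
        Object task = first.getDeclaredConstructor().newInstance();
        try {
            second.cast(task); // same class name, different defining loader
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("cast fails across loaders: " + castFailsAcrossLoaders());
    }
}
```

Both Class objects are built from identical bytes and report the same name, yet the JVM treats them as unrelated types because class identity is the pair of name and defining loader.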

Actually, what I would like to achieve is the following (in 3 steps):
1. Have my original test pass, regardless of how many times I call the test. This is a baseline.
2. Modify my test so that after I finish each test run, I can *modify* the code of Task class, and my change is reflected in the subsequent test run, ie, the server node will process the Task class with its new definition.
3. Multiple client nodes can have their own Task class definitions that run on the cluster at the same time, assuming that they create *different* queues for their tasks. Actually they will also broadcast their own IgniteCallables to the cluster (note that in particular they can broadcast the same IgniteCallable class, each in its own different version). But I'm always assuming that a given queue is created/maintained/filled by one client node therefore it will never mix objects with different versions of the same class.
    My suggestion is the same as it was before. Avoid usage of the
    distributed queue but rather start sending ignite compute tasks
    for execution from the beginning.

    Will this work for you?
In my setup, it is simply more logical to model the computation as IgniteCallables being broadcast to the cluster and then they pull tasks from a distributed queue. Actually, the distributed queue (and other distributed structures provided by Ignite) was one of main reasons that attracted me to try it out. And I really like them and their simple usage :)
Without the use of distributed structures I think managing my computations would probably be more cumbersome. But I will of course think about it, and maybe I will come up with a suitable model.
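
The pull model described above (workers are started everywhere and then drain one shared queue until it is empty) can be sketched locally in plain Java. This is only an analogy under stated assumptions: a ConcurrentLinkedQueue stands in for the distributed queue, threads stand in for the broadcast callables, and PullModelSketch and its run method are hypothetical names. No Ignite API is used.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PullModelSketch {
    /** Starts the given number of pollers on one shared queue and returns how many tasks ran. */
    static int run(int workers, int taskCount) throws InterruptedException {
        Queue<Runnable> queue = new ConcurrentLinkedQueue<>();
        AtomicInteger executed = new AtomicInteger();

        // Fill the queue up front, as the test in this thread does with its 100 tasks.
        for (int i = 0; i < taskCount; i++)
            queue.add(executed::incrementAndGet);

        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int w = 0; w < workers; w++) {
            pool.execute(() -> {
                Runnable task;
                // Each worker keeps polling until the queue is empty.
                while ((task = queue.poll()) != null)
                    task.run();
            });
        }

        pool.shutdown();
        if (!pool.awaitTermination(30, TimeUnit.SECONDS))
            throw new IllegalStateException("workers did not finish in time");
        return executed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("tasks executed: " + run(2, 100));
    }
}
```

Because each poll removes the element it returns, every task runs exactly once no matter which worker picks it up, and faster workers naturally take more tasks.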

It seems I now understand why you decided to use a distributed queue in this particular case. This way you try to send several jobs for execution at once and want them to be executed uniformly by all nodes that are available.

Is this assumption correct?

If I'm right then I suggest you create your own ComputeTask that will be split and mapped to "less" loaded nodes using a load balancer. I've prepared a code sample for you [1]. Please go through it. If anything is unclear then let me know.

Here [2] you can read more on load balancing that is implemented and used by the compute engine out of the box.
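
The split, map and reduce shape of such a task can be modelled in plain Java as a rough sketch. It is not the code from [1] and uses no Ignite API: single-thread executors stand in for nodes, round-robin assignment stands in for whichever load-balancing policy is configured, and SplitMapSketch, split and execute are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SplitMapSketch {
    /** Split step: one job per word, each job returns that word's length. */
    static List<Callable<Integer>> split(String arg) {
        List<Callable<Integer>> jobs = new ArrayList<>();
        for (String word : arg.trim().split("\\s+"))
            jobs.add(word::length);
        return jobs;
    }

    /** Maps jobs to workers round-robin, runs them, and reduces the results by summing. */
    static int execute(String arg, int workers) throws Exception {
        List<ExecutorService> nodes = new ArrayList<>();
        for (int i = 0; i < workers; i++)
            nodes.add(Executors.newSingleThreadExecutor()); // one thread stands in for one node

        try {
            List<Callable<Integer>> jobs = split(arg);
            List<Future<Integer>> results = new ArrayList<>();

            // Map step: job i goes to node i % workers.
            for (int i = 0; i < jobs.size(); i++)
                results.add(nodes.get(i % workers).submit(jobs.get(i)));

            // Reduce step: combine the partial results.
            int sum = 0;
            for (Future<Integer> f : results)
                sum += f.get();
            return sum;
        } finally {
            for (ExecutorService node : nodes)
                node.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(execute("count characters using grid", 3));
    }
}
```

In contrast to the queue-based approach, the work is pushed to the workers when the task starts, so no shared data structure has to hold task objects between runs.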


[1] https://gist.github.com/dmagda/3a78e7d01d9fffd7bc66
[2] https://apacheignite.readme.io/docs/load-balancing

Regards,
Denis



    --
    Denis



    On 2/16/2016 11:33 AM, mp wrote:

    Hi Denis,

    Many thanks! I look forward to 1.6 then.
    Please also consider the following statement made by Dmitriy on
    Nov 03, 2015 (see his message in the thread):

    "With that in mind, we will be removing the requirement for
    caches to work only with SHARED and CONTINUOUS deployment modes,
    so you will be able to use PRIVATE or ISOLATED deployment modes
    to deploy your computations."

    As far as I understand, the above planned change is not covered
    by any Jira ticket.

    Cheers,
    -Mateusz



    On Fri, Feb 12, 2016 at 10:35 PM, Denis Magda
    <dma...@gridgain.com <mailto:dma...@gridgain.com>> wrote:

        Hi Mateusz,

        I assigned both tickets that you have problems with to
        myself. They will be fixed as a part of the next release.
        https://issues.apache.org/jira/browse/IGNITE-2339
        https://issues.apache.org/jira/browse/IGNITE-1823

        There is one more issue that was reproduced locally and
        refers to unexpected cache undeployment when the binary
        marshaller is used.
        https://issues.apache.org/jira/browse/IGNITE-2647

        Thanks for your patience and for your continued interest in
        Ignite.

        Regards,
        Denis


        On 2/12/2016 4:41 PM, mp wrote:

        Hi Denis,

        But my test still fails in version 1.5 with default (ie,
        binary) marshaller. See my message from January 7, and your
        reply in which you mentioned a new Jira ticket for a bug
        concerning the new binary marshaller:
        https://issues.apache.org/jira/browse/IGNITE-2339

        Basically, my test case (see
        https://issues.apache.org/jira/browse/IGNITE-1823 ) fails in
        all of the scenarios I tried:

        1. Binary marshaller + default deployment mode
        2. Binary marshaller + shared deployment mode
        3. Binary marshaller + private deployment mode
        4. Optimized marshaller + default deployment mode
        5. Optimized marshaller + shared deployment mode
        6. Optimized marshaller + private deployment mode

        Would you have any hint/advice on how I could proceed? Is
        there any chance of fixing the issues related to my test case?

        Thanks for your help,
        -Mateusz


        On Wed, Feb 10, 2016 at 4:46 PM, Denis Magda
        <dma...@gridgain.com <mailto:dma...@gridgain.com>> wrote:

            Hi Mateusz,

            In version 1.5 we released the binary objects [1] format
            that allows storing cache data in a class-version-independent
            form. Thus you don't need to have any classes on the server
            side.
            This ability allows dynamic changes to an object's
            structure, and even allows multiple clients with
            different versions of class definitions to co-exist.

            In my understanding if you switch to this format you
            will be able to support your use case.

            If something is unclear don't hesitate to ask.

            [1] https://apacheignite.readme.io/docs/binary-marshaller

            --
            Denis


            On 2/10/2016 4:06 PM, mp wrote:

Hi Denis,

Thanks for your reply.
So, summing up, it seems that in the context of my use
case, version 1.5 does not differ from 1.4? Which means
that I still cannot achieve my goal: different versions
of the same class (from different clients) running on
the cluster at the same time?

As far as I understand this involves:
1. https://issues.apache.org/jira/browse/IGNITE-1823
2. https://issues.apache.org/jira/browse/IGNITE-2339
3. Removing the requirement for caches to work only
with SHARED and CONTINUOUS deployment modes (this was
announced by Dmitriy in

http://apache-ignite-users.70518.x6.nabble.com/Distributed-queue-problem-with-peerClassLoading-enabled-tp1762p1829.html
)

Is there any chance the above use case will be possible
in near future (any upcoming version)?

I really like the API and concept of Ignite. If only I
could achieve the above scenario...

Cheers,
-Mateusz

On Thu, Jan 7, 2016 at 5:25 PM, Denis Magda
<dma...@gridgain.com <mailto:dma...@gridgain.com>> wrote:

Mateusz,

It doesn’t work for now because peerClassLoading
doesn’t work for objects that are stored in the
binary format in a cache.
Starting from 1.5, BinaryMarshaller is the default
one, so all objects are stored in this format in
caches by default.

If you prefer to turn off this behavior you can
set IgniteConfiguration.setMarshaller(new
OptimizedMarshaller()) for every node and your test
should work as before.

--
Denis

                On Jan 7, 2016, at 17:09, mp <mjj...@gmail.com
                <mailto:mjj...@gmail.com>> wrote:

                Hello Denis,

                Thanks a lot for your reply!
                Concerning point 2: does it mean that
                "peerClassLoading" simply does not work in 1.5?
                It used to work (partially) in 1.4 (details
                described earlier in the message thread).

                Cheers,
                -Mateusz



                On Thu, Jan 7, 2016 at 1:38 PM, Denis Magda
                <dma...@gridgain.com <mailto:dma...@gridgain.com>>
                wrote:

                    Hi Mateusz,

                    1. It seems that distributed cache is still
                    *not* available in
                    PRIVATE/ISOLATED modes. Is this correct?

                    Right, it hasn't been fixed yet. I've just
                    followed up the related discussion on the dev
                    list. Please follow it to see the most
                    up-to-date information
                    
http://apache-ignite-developers.2346864.n4.nabble.com/Fwd-Distributed-queue-problem-with-peerClassLoading-enabled-tp4521p6440.html

                    2. When I run my simple test code in the
                    default SHARED mode (the same as
                    specified in
                    https://issues.apache.org/jira/browse/IGNITE-1823
                    jira issue),
                    I still get an error. However the cause
                    exception seems to be different.
                    Please see attached server log.

                    The reason is that there is an attempt to
                    deserialize a binary object stored on a server
                    node and the server node doesn't have the object's
                    class definition in its class path.
                    I've opened a ticket
                    https://issues.apache.org/jira/browse/IGNITE-2339

                    As a workaround you can put the class definition
                    on the server's class path and the problem will
                    disappear.

                    Regards,
                    Denis

                    On 1/7/2016 1:30 PM, mjjp wrote:

                        Hello,

                        I have just downloaded 1.5.0-final to
                        check if my problem has been resolved.
                        Either I'm doing something wrong, or
                        version 1.5 has the same behavior in
                        this context:

                        1. It seems that distributed cache is
                        still *not* available in
                        PRIVATE/ISOLATED modes. Is this correct?

                        2. When I run my simple test code in the
                        default SHARED mode (the same as
                        specified in
                        https://issues.apache.org/jira/browse/IGNITE-1823
                        jira issue),
                        I still get an error. However the cause
                        exception seems to be different.
                        Please see attached server log.

                        Would you be able to check the attached
                        log to verify if this is an expected
                        behavior in 1.5?

                        Cheers,
                        -Mateusz

                        ignite-fd14d572.log
                        
<http://apache-ignite-users.70518.x6.nabble.com/file/n2416/ignite-fd14d572.log>




                        --
                        View this message in context:
                        
http://apache-ignite-users.70518.x6.nabble.com/Distributed-queue-problem-with-peerClassLoading-enabled-tp1762p2416.html
                        Sent from the Apache Ignite Users mailing
                        list archive at Nabble.com
                        <http://nabble.com>.
