Here's my setup:

I have a distributed training cache that contains all data with all
possible features. This cache is updated constantly with new real-world
data to improve accuracy.

When nodes run Linear Regression, each run needs a different subset of
the full training cache, depending on the test case the regression is
for.

So before running Linear Regression on a node, I create a local copy
from the full cache by filtering out the subset that fits the current
test data. Originally I used CacheBasedDatasetBuilder, but I noticed it
creates multiple temporary caches with the same characteristics as the
upstream full training cache, i.e. distributed across all nodes. That's
unnecessary in my case because I need the subset only temporarily on the
worker node.
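
The filter-into-a-local-copy step described above can be sketched roughly as follows. This is only an illustration in plain Java, not Ignite code: the class name LocalSubset and the method copyMatching are hypothetical, and an ordinary java.util.Map stands in for the full training cache. In a real Ignite setup the entries would presumably come from a ScanQuery cursor (whose elements are javax.cache.Cache.Entry, so a small adapter would be needed), and the resulting on-heap map could then be handed to a map-based dataset builder rather than a cache-based one. Both of those points are assumptions about how this would be wired up.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiPredicate;

public class LocalSubset {
    /**
     * Copies every entry accepted by the filter into a plain on-heap map.
     * Nothing cluster-wide is created: the result lives only in this JVM.
     */
    public static <K, V> Map<K, V> copyMatching(
            Iterable<? extends Map.Entry<K, V>> entries,
            BiPredicate<? super K, ? super V> filter) {
        Map<K, V> subset = new HashMap<>();
        for (Map.Entry<K, V> e : entries) {
            if (filter.test(e.getKey(), e.getValue())) {
                subset.put(e.getKey(), e.getValue());
            }
        }
        return subset;
    }

    public static void main(String[] args) {
        // Stand-in for the full training cache: row id -> feature vector.
        Map<Integer, double[]> full = new HashMap<>();
        full.put(1, new double[] {0.5, 10.0});
        full.put(2, new double[] {1.5, 20.0});
        full.put(3, new double[] {2.5, 30.0});

        // Hypothetical test-case filter: keep rows whose first feature exceeds 1.0.
        Map<Integer, double[]> subset =
                copyMatching(full.entrySet(), (id, row) -> row[0] > 1.0);

        System.out.println("rows kept: " + subset.size());
    }
}
```

The trade-off of this approach is that the subset is held on the heap of the worker node, so it only fits when the filtered data is small enough for a single JVM.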



On 30.09.22 01:08, Jeremy McMillan wrote:
I share Stephen's curiosity about the use case. The best compromises
are sensitive to situation and outcomes.

Are you trying to cull training data into training, tuning, and
validation subsets?

Maybe there's a colocation approach that would suffice.

On Thu, Sep 29, 2022, 12:26 Thomas Kramer <[email protected]> wrote:

    Right, I don't want to use CacheMode.LOCAL because it's
    deprecated. Hence my question: what is the alternative for a
    purely local in-memory cache that causes neither the cluster-wide
    lock nor a map exchange event?


    On 29.09.22 14:57, Николай Ижиков wrote:
    You may not want to use LOCAL caches because they have been removed
    in master [1] and will not exist in the next release.

    [1] https://github.com/apache/ignite/commit/01a7d075a5f48016511f6a754538201f12aff4f7


    On 29 Sep 2022, at 15:55, Николай Ижиков
    <[email protected]> wrote:

    Because a node-local cache is created on each server node.

    On 29 Sep 2022, at 15:43, Kramer <[email protected]> wrote:

    Coming back to my original question:
    CacheConfiguration<UUID, BinaryObject> cfg = new CacheConfiguration<>();
    cfg.setCacheMode(CacheMode.REPLICATED);
    cfg.setAffinity(new LocalAffinityFunction());
    Will the above code still trigger a cluster-wide lock with a
    partition map exchange event, even though the cache will be
    hosted on the local node only?
    *Sent:* Tuesday, 27 September 2022 at 18:50
    *From:* "Thomas Kramer" <[email protected]>
    *To:* [email protected]
    *Subject:* Re: Creating local cache without cluster-wide lock
    I'm using CacheBasedDataset to filter a subset from a distributed
    cache of all training data for Linear Regression. By default this
    seems to use the AffinityFunction from the upstream cache to create
    a new temporary cache with every preprocessing trainer and on every
    dataset update. This causes a lot of additional traffic if it
    happens on multiple nodes.

    So I was looking to create local caches for the filtered datasets.


    On 27.09.22 18:30, Stephen Darlington wrote:
    > What are you trying to do? The general solution is to create
    > a long-lived cache and have a run-number or similar as part of
    > the key.
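
The long-lived-cache suggestion above could look roughly like the sketch below. It is an illustration only: RunScopedCache and RunKey are hypothetical names, a ConcurrentHashMap stands in for a cache that is created once and kept for the lifetime of the cluster, and the run number is assumed to be a long. The point is that each run writes under its own key prefix and later removes only its own entries, so no cache has to be created or destroyed per run.

```java
import java.util.Map;
import java.util.Objects;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class RunScopedCache {
    /** Hypothetical composite key: run number plus the original row id. */
    public static final class RunKey {
        final long runId;
        final UUID rowId;

        RunKey(long runId, UUID rowId) {
            this.runId = runId;
            this.rowId = rowId;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof RunKey)) return false;
            RunKey other = (RunKey) o;
            return runId == other.runId && rowId.equals(other.rowId);
        }

        @Override public int hashCode() {
            return Objects.hash(runId, rowId);
        }
    }

    // Stand-in for the single long-lived cache shared by all runs.
    private final Map<RunKey, double[]> store = new ConcurrentHashMap<>();

    public void put(long runId, UUID rowId, double[] features) {
        store.put(new RunKey(runId, rowId), features);
    }

    /** Counts the entries that belong to one run. */
    public long sizeOfRun(long runId) {
        return store.keySet().stream().filter(k -> k.runId == runId).count();
    }

    /** Drops one run's entries; other runs and the cache itself stay in place. */
    public void clearRun(long runId) {
        store.keySet().removeIf(k -> k.runId == runId);
    }

    public static void main(String[] args) {
        RunScopedCache cache = new RunScopedCache();
        UUID row = UUID.randomUUID();
        cache.put(1L, row, new double[] {1.0, 2.0});
        cache.put(2L, row, new double[] {3.0, 4.0});
        cache.clearRun(1L);
        System.out.println("run 2 rows left: " + cache.sizeOfRun(2L));
    }
}
```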
    >
    >> On 27 Sep 2022, at 15:36, Thomas Kramer <[email protected]> wrote:
    >>
    >> I understand creating a new cache dynamically requires a cluster-wide
    >> lock with partition map exchange event to create the cache on all nodes.
    >> This is unnecessary traffic when only working with local caches.
    >>
    >> For local-only caches I assume this wouldn't happen. But CacheMode.LOCAL
    >> is deprecated.
    >>
    >> Is there a way to create a local cache without triggering unnecessary
    >> map exchange events?
    >>
    >> Would this work or does it still create a short global lock on all nodes
    >> not only the local node?
    >>
    >> CacheConfiguration<UUID, BinaryObject> cfg = new CacheConfiguration<>();
    >> cfg.setCacheMode(CacheMode.REPLICATED);
    >> cfg.setAffinity(new LocalAffinityFunction());
    >>

