Is your linear regression library/algorithm map-reduce compatible?

Can you identify which rows/records should be in any particular linear
regression run using data which is present at ingestion (ergo available to
Ignite affinity routing)?
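Jeremy's question can be made concrete. Ordinary least squares is map-reduce friendly because it only needs a handful of additive sums. The sketch below assumes a single feature (y = a + b*x) to keep it short. Each partition computes its sums locally (map), the partial results are added together (reduce), and the normal equations are solved once at the end. `MapReduceRegression` and its methods are hypothetical names, and nothing here uses Ignite's ML API. How the map step would be dispatched to the nodes that own each partition is left out.

```java
/**
 * Sketch: one-feature least-squares regression expressed as map-reduce.
 * Each partition is summarised by additive statistics, so partial results can be
 * computed where the data lives and merged in any order.
 */
final class MapReduceRegression {

    /** Additive sufficient statistics for y = a + b*x. */
    static final class Stats {
        final long n;
        final double sumX, sumY, sumXX, sumXY;

        Stats(long n, double sumX, double sumY, double sumXX, double sumXY) {
            this.n = n;
            this.sumX = sumX;
            this.sumY = sumY;
            this.sumXX = sumXX;
            this.sumXY = sumXY;
        }

        /** Identity element for merge. */
        static final Stats EMPTY = new Stats(0, 0, 0, 0, 0);

        /** Reduce step: statistics from two partitions simply add up. */
        Stats merge(Stats o) {
            return new Stats(n + o.n, sumX + o.sumX, sumY + o.sumY,
                             sumXX + o.sumXX, sumXY + o.sumXY);
        }
    }

    /** Map step: summarise one partition given as parallel arrays xs, ys. */
    static Stats map(double[] xs, double[] ys) {
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int i = 0; i < xs.length; i++) {
            sx += xs[i];
            sy += ys[i];
            sxx += xs[i] * xs[i];
            sxy += xs[i] * ys[i];
        }
        return new Stats(xs.length, sx, sy, sxx, sxy);
    }

    /** Solve the normal equations from merged statistics; returns {intercept, slope}. */
    static double[] solve(Stats s) {
        double denom = s.n * s.sumXX - s.sumX * s.sumX;
        if (s.n < 2 || denom == 0) {
            throw new IllegalArgumentException("degenerate input");
        }
        double slope = (s.n * s.sumXY - s.sumX * s.sumY) / denom;
        double intercept = (s.sumY - slope * s.sumX) / s.n;
        return new double[] {intercept, slope};
    }
}
```

Because `merge` is associative and commutative, partitions can be combined in any order, which is what lets the map step run on whichever nodes already hold the data. With several features the same idea carries over, using X^T X and X^T y as the additive statistics.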


On Fri, Sep 30, 2022, 03:51 Thomas Kramer wrote:

> Here's my setup:
>
> I have a distributed training cache that contains all data with all
> possible features. This cache is updated constantly with new real-world
> data to improve accuracy.
>
> When running linear regression on the nodes, each run needs a different
> subset of that full training cache, depending on the test case the
> regression is needed for.
>
> So before running linear regression on each node, I filter the full cache
> into a local copy containing only the subset that fits the current test data.
> Originally I used CacheBasedDatasetBuilder, but I noticed it creates multiple
> temporary caches with the same characteristics as the upstream full training
> cache, i.e. distributed across all nodes. That's unnecessary in my case because
> I need the subset only temporarily, on the worker node.
>
>
>
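A minimal sketch of the alternative Thomas describes: copy only the matching rows into an ordinary list private to the worker, rather than into a temporary cache. `TrainingRow`, `LocalSubset` and the filtering predicate are hypothetical stand-ins for the real data model. In Ignite the input could come from, for example, a `ScanQuery` over the shared training cache with the cache values mapped to rows; that wiring is an assumption and is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Sketch: build a worker-private subset instead of a temporary distributed cache. */
final class LocalSubset {

    /** Hypothetical training row: a feature vector plus its label. */
    static final class TrainingRow {
        final double[] features;
        final double label;

        TrainingRow(double[] features, double label) {
            this.features = features;
            this.label = label;
        }
    }

    /**
     * Copies the rows accepted by fitsTestCase into a new list owned by the caller,
     * preserving input order. The list is a plain Java object: nothing is registered
     * with the cluster.
     */
    static List<TrainingRow> select(Iterable<TrainingRow> rows, Predicate<TrainingRow> fitsTestCase) {
        List<TrainingRow> subset = new ArrayList<>();
        for (TrainingRow row : rows) {
            if (fitsTestCase.test(row)) {
                subset.add(row);
            }
        }
        return subset;
    }
}
```

Since the list is a plain on-heap object, it never takes part in partition map exchange. The trade-off is that it must fit in the worker's memory and is lost if that node fails.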
> On 30.09.22 01:08, Jeremy McMillan wrote:
>
> I share Stephen's curiosity about the use case. The best compromises are
> sensitive to situation and outcomes.
>
> Are you trying to cull training data into training, tuning, and validation
> subsets?
>
> Maybe there's a colocation approach that would suffice.
>
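If the goal is splitting data into training, tuning and validation subsets, one colocation-friendly option is to derive the subset from the record key alone, so any node can compute it at ingestion time without coordination. This is a sketch under assumptions: UUID keys, CRC32 as the hash, and an arbitrary 80/10/10 split. `SplitAssigner` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;
import java.util.zip.CRC32;

/**
 * Sketch: assign each record to a training, tuning or validation subset
 * deterministically from its key, so the assignment is stable across runs
 * and can be computed wherever the record lives.
 */
final class SplitAssigner {

    enum Subset { TRAINING, TUNING, VALIDATION }

    static Subset assign(UUID key) {
        CRC32 crc = new CRC32();
        crc.update(key.toString().getBytes(StandardCharsets.UTF_8));
        long bucket = crc.getValue() % 100;   // 0..99, same bucket every time for a given key
        if (bucket < 80) {
            return Subset.TRAINING;           // buckets 0..79
        }
        if (bucket < 90) {
            return Subset.TUNING;             // buckets 80..89
        }
        return Subset.VALIDATION;             // buckets 90..99
    }
}
```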
> On Thu, Sep 29, 2022, 12:26 Thomas Kramer wrote:
>
>> Right, I don't want to use CacheMode.LOCAL because it's deprecated. Hence
>> my question: what is the alternative for a purely local in-memory cache
>> that triggers neither the cluster-wide lock nor a partition map exchange event?
>>
>>
>> On 29.09.22 14:57, Николай Ижиков wrote:
>>
>> You probably don't want to use LOCAL caches: they have been removed in
>> master [1] and will not exist in the next release.
>>
>> [1]
>> https://github.com/apache/ignite/commit/01a7d075a5f48016511f6a754538201f12aff4f7
>>
>>
>> On 29 Sep 2022, at 15:55, Николай Ижиков wrote:
>>
>> Because a node-local cache is created on each server node.
>>
>> On 29 Sep 2022, at 15:43, Kramer wrote:
>>
>> Coming back to my original question:
>>
>> CacheConfiguration<UUID, BinaryObject> cfg = new CacheConfiguration<>();
>> cfg.setCacheMode(CacheMode.REPLICATED);
>> cfg.setAffinity(new LocalAffinityFunction());
>>
>> Will the above code still take a cluster-wide lock with a partition map
>> exchange event, even though the cache will be hosted on the local node only?
>>
>>
>>
>> *Sent:* Tuesday, 27 September 2022 at 18:50
>> *From:* "Thomas Kramer"
>> *To:* [email protected]
>> *Subject:* Re: Creating local cache without cluster-wide lock
>> I'm using CacheBasedDataset to filter a subset from a distributed cache
>> of all training data for linear regression. By default, this seems to use
>> the AffinityFunction of the upstream cache to create a new temporary
>> cache for every preprocessing trainer and on every dataset update. This
>> causes a lot of additional traffic when it happens on multiple nodes.
>>
>> So I was looking to create local caches for the filtered datasets.
>>
>>
>> On 27.09.22 18:30, Stephen Darlington wrote:
>> > What are you trying to do? The general solution is to create a
>> > long-lived cache and have a run number or similar as part of the key.
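Stephen's suggestion, sketched in plain Java: one long-lived store whose keys carry the run number, so a new run adds entries instead of creating (and later destroying) a cache. A `ConcurrentHashMap` stands in for the long-lived Ignite cache; `RunKey` and `RunScopedStore` are hypothetical names.

```java
import java.util.Map;
import java.util.Objects;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Sketch of "one long-lived cache, run number in the key". */
final class RunScopedStore {

    /** Composite key: (run number, record id). equals/hashCode cover both fields. */
    static final class RunKey {
        final long runId;
        final UUID recordId;

        RunKey(long runId, UUID recordId) {
            this.runId = runId;
            this.recordId = Objects.requireNonNull(recordId);
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof RunKey)) return false;
            RunKey k = (RunKey) o;
            return runId == k.runId && recordId.equals(k.recordId);
        }

        @Override
        public int hashCode() {
            return Objects.hash(runId, recordId);
        }
    }

    // Stand-in for the long-lived cache; it is created once and never dropped.
    private final Map<RunKey, double[]> cache = new ConcurrentHashMap<>();

    void put(long runId, UUID recordId, double[] row) {
        cache.put(new RunKey(runId, recordId), row);
    }

    /** Number of entries belonging to one run. */
    long size(long runId) {
        return cache.keySet().stream().filter(k -> k.runId == runId).count();
    }

    /** Drop a finished run's entries; the store itself stays in place. */
    void clearRun(long runId) {
        cache.keySet().removeIf(k -> k.runId == runId);
    }
}
```

In Ignite itself, annotating `runId` with `@AffinityKeyMapped` should place all entries of one run in the same partition and hence on the same primary node, which ties in with the colocation idea raised later in the thread. Note that this is the node owning that partition, not necessarily the node that will do the training.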
>> >
>> >> On 27 Sep 2022, at 15:36, Thomas Kramer wrote:
>> >>
>> >> I understand creating a new cache dynamically requires a cluster-wide
>> >> lock with a partition map exchange event to create the cache on all nodes.
>> >> This is unnecessary traffic when only working with local caches.
>> >>
>> >> For local-only caches I assume this wouldn't happen. But CacheMode.LOCAL
>> >> is deprecated.
>> >>
>> >> Is there a way to create a local cache without triggering unnecessary
>> >> map exchange events?
>> >>
>> >> Would this work, or does it still create a short global lock on all
>> >> nodes, not only the local node?
>> >>
>> >> CacheConfiguration<UUID, BinaryObject> cfg = new CacheConfiguration<>();
>> >> cfg.setCacheMode(CacheMode.REPLICATED);
>> >> cfg.setAffinity(new LocalAffinityFunction());
>> >>
>>
>>
>>
>>
