Is your linear regression library/algorithm map-reduce compatible? Can you identify which rows/records should be in any particular linear regression run using data that is present at ingestion (and is therefore available to Ignite affinity routing)?
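To illustrate what "map-reduce compatible" could mean here, below is a minimal sketch in plain Java for simple (one-feature) least squares. It does not use the Ignite ML API; the class and method names are hypothetical. Each node folds its local rows into a small set of sums (map), and the sums from all nodes are added together before solving for the coefficients (reduce), so only a few numbers per node would need to travel over the network.

```java
public class MapReduceRegression {
    /** Mergeable sufficient statistics for fitting y = intercept + slope * x. */
    static final class Stats {
        long n;
        double sx, sy, sxx, sxy;

        // "Map" step: fold one local row into this partition's statistics.
        Stats add(double x, double y) {
            n++;
            sx += x;
            sy += y;
            sxx += x * x;
            sxy += x * y;
            return this;
        }

        // "Reduce" step: combine the statistics of two partitions.
        Stats merge(Stats other) {
            Stats r = new Stats();
            r.n = n + other.n;
            r.sx = sx + other.sx;
            r.sy = sy + other.sy;
            r.sxx = sxx + other.sxx;
            r.sxy = sxy + other.sxy;
            return r;
        }

        // Closed-form ordinary least squares for a single feature.
        double slope() {
            return (n * sxy - sx * sy) / (n * sxx - sx * sx);
        }

        double intercept() {
            return (sy - slope() * sx) / n;
        }
    }

    // Runs on the node that owns the rows; each row is {x, y}.
    static Stats mapPartition(double[][] rows) {
        Stats s = new Stats();
        for (double[] row : rows) {
            s.add(row[0], row[1]);
        }
        return s;
    }

    public static void main(String[] args) {
        // Hypothetical data split across two nodes.
        double[][] nodeA = {{0, 1}, {1, 3}};
        double[][] nodeB = {{2, 5}, {3, 7}};
        Stats total = mapPartition(nodeA).merge(mapPartition(nodeB));
        System.out.println("slope=" + total.slope() + " intercept=" + total.intercept());
    }
}
```

The sample points all lie on y = 2x + 1, so the merged fit recovers slope 2 and intercept 1 without either node seeing the other's rows. If the rows for one regression run can be selected by a field known at ingestion, that field is a candidate affinity key, and the map step could then run where the data already lives.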
On Fri, Sep 30, 2022, 03:51 Thomas Kramer <[email protected]> wrote:

> Here's my setup:
>
> I have a distributed training cache that contains all data with all
> possible features. This cache is updated constantly with new real-world
> data to improve accuracy.
>
> When running Linear Regression on nodes, they will need a different subset
> from that full training cache every time, depending on the test case
> the regression is needed for.
>
> So before Linear Regression on each node, I create from the full cache a
> local copy by filtering a subset that fits the current test data.
> Originally I used CacheBasedDatasetBuilder but noticed it creates multiple
> temporary caches with the same characteristics as the upstream full training
> cache, i.e. distributed on all nodes. That's unnecessary in my case because
> I need the subset only temporarily on the worker node.
>
> On 30.09.22 01:08, Jeremy McMillan wrote:
>
> I share Stephen's curiosity about the use case. The best compromises are
> sensitive to situation and outcomes.
>
> Are you trying to cull training data into training, tuning, and validation
> subsets?
>
> Maybe there's a colocation approach that would suffice.
>
> On Thu, Sep 29, 2022, 12:26 Thomas Kramer <[email protected]> wrote:
>
>> Right, I don't want to use CacheMode.LOCAL because it's deprecated. Thus
>> my question: what will be the alternative for a purely local in-memory
>> cache that causes neither the cluster-wide lock nor a map exchange event?
>>
>> On 29.09.22 14:57, Николай Ижиков wrote:
>>
>> You may not want to use LOCAL caches because they are removed in master
>> [1] and will not exist in the next release.
>>
>> [1]
>> https://github.com/apache/ignite/commit/01a7d075a5f48016511f6a754538201f12aff4f7
>>
>> On 29 Sep 2022, at 15:55, Николай Ижиков <[email protected]> wrote:
>>
>> Because a node-local cache is created on each server node.
>>
>> On 29 Sep 2022, at 15:43, Kramer <[email protected]> wrote:
>>
>> Coming back to my original question:
>>
>> CacheConfiguration<UUID, BinaryObject> cfg = new CacheConfiguration<>();
>> cfg.setCacheMode(CacheMode.REPLICATED);
>> cfg.setAffinity(new LocalAffinityFunction());
>>
>> Will the above code still create a cluster-wide lock with a partition map
>> exchange event even though the cache will be hosted on the local node only?
>>
>> *Sent:* Tuesday, 27 September 2022 at 18:50
>> *From:* "Thomas Kramer" <[email protected]>
>> *To:* [email protected]
>> *Subject:* Re: Creating local cache without cluster-wide lock
>>
>> I'm using CacheBasedDataset to filter a subset from a distributed cache
>> of all training data for Linear Regression. By default this seems to use
>> the AffinityFunction from the upstream cache to create a new temporary
>> cache with every preprocessing trainer and on every dataset update. This
>> causes a lot of additional traffic if it happens on multiple nodes.
>>
>> So I was looking to create local caches for the filtered datasets.
>>
>> On 27.09.22 18:30, Stephen Darlington wrote:
>> > What are you trying to do? The general solution is to create a
>> > long-lived cache and have a run-number or similar as part of the key.
>> >
>> >> On 27 Sep 2022, at 15:36, Thomas Kramer <[email protected]> wrote:
>> >>
>> >> I understand creating a new cache dynamically requires a cluster-wide
>> >> lock with a partition map exchange event to create the cache on all nodes.
>> >> This is unnecessary traffic when only working with local caches.
>> >>
>> >> For local-only caches I assume this wouldn't happen. But CacheMode.LOCAL
>> >> is deprecated.
>> >>
>> >> Is there a way to create a local cache without triggering unnecessary
>> >> map exchange events?
>> >>
>> >> Would this work, or does it still create a short global lock on all nodes,
>> >> not only the local node?
>> >>
>> >> CacheConfiguration<UUID, BinaryObject> cfg = new CacheConfiguration<>();
>> >> cfg.setCacheMode(CacheMode.REPLICATED);
>> >> cfg.setAffinity(new LocalAffinityFunction());
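As a rough illustration of Stephen's suggestion quoted above (one long-lived cache, with a run number as part of the key), here is a minimal plain-Java sketch. The `ConcurrentHashMap` is only a stand-in for a cache that would be created once at startup; `RunKey`, `RunScopedCache` and their methods are hypothetical names, not Ignite API.

```java
import java.util.Map;
import java.util.Objects;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class RunScopedCache {
    /** Hypothetical composite key: the run number scopes a row to one regression run. */
    static final class RunKey {
        final long runId;
        final UUID rowId;

        RunKey(long runId, UUID rowId) {
            this.runId = runId;
            this.rowId = rowId;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof RunKey)) return false;
            RunKey k = (RunKey) o;
            return runId == k.runId && rowId.equals(k.rowId);
        }

        @Override
        public int hashCode() {
            return Objects.hash(runId, rowId);
        }
    }

    // Stand-in for the single long-lived cache (assumption: created once, reused by every run).
    private final Map<RunKey, double[]> cache = new ConcurrentHashMap<>();

    // Store one filtered training row under the given run.
    void put(long runId, UUID rowId, double[] features) {
        cache.put(new RunKey(runId, rowId), features);
    }

    // Count the rows that belong to one run.
    long countForRun(long runId) {
        return cache.keySet().stream().filter(k -> k.runId == runId).count();
    }

    // Drop a finished run's rows instead of destroying a per-run cache.
    void clearRun(long runId) {
        cache.keySet().removeIf(k -> k.runId == runId);
    }

    public static void main(String[] args) {
        RunScopedCache c = new RunScopedCache();
        c.put(1L, UUID.randomUUID(), new double[] {0.5, 1.5});
        c.put(2L, UUID.randomUUID(), new double[] {2.5, 3.5});
        c.clearRun(1L);
        System.out.println("rows left for run 2: " + c.countForRun(2L));
    }
}
```

With this layout no cache is created or destroyed per run; a run's subset is written under its own run number and removed by key when the run finishes.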
