[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training URL: https://github.com/apache/ignite/pull/5767#discussion_r247903540 ## File path: modules/ml/src/main/java/org/apache/ignite/ml/trainers/AdaptableDatasetTrainer.java ## @@ -56,27 +68,46 @@ * @param Type of labels. * @return Instance of this class. */ -public static , L> AdaptableDatasetTrainer of(DatasetTrainer wrapped) { -return new AdaptableDatasetTrainer<>(IgniteFunction.identity(), wrapped, IgniteFunction.identity()); +public static , L> AdaptableDatasetTrainer of( Review comment: Sorry...) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training URL: https://github.com/apache/ignite/pull/5767#discussion_r247901935 ## File path: modules/ml/src/main/java/org/apache/ignite/ml/composition/bagging/BaggedTrainer.java ## @@ -0,0 +1,200 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.ignite.ml.composition.bagging; + +import java.util.Collections; +import java.util.List; +import java.util.Random; +import java.util.stream.Collectors; +import java.util.stream.IntStream; +import java.util.stream.Stream; +import org.apache.ignite.ml.IgniteModel; +import org.apache.ignite.ml.composition.CompositionUtils; +import org.apache.ignite.ml.composition.combinators.parallel.TrainersParallelComposition; +import org.apache.ignite.ml.composition.predictionsaggregator.PredictionsAggregator; +import org.apache.ignite.ml.dataset.DatasetBuilder; +import org.apache.ignite.ml.environment.LearningEnvironmentBuilder; +import org.apache.ignite.ml.math.functions.IgniteBiFunction; +import org.apache.ignite.ml.math.functions.IgniteFunction; +import org.apache.ignite.ml.math.primitives.vector.Vector; +import org.apache.ignite.ml.math.primitives.vector.VectorUtils; +import org.apache.ignite.ml.trainers.AdaptableDatasetTrainer; +import org.apache.ignite.ml.trainers.DatasetTrainer; +import org.apache.ignite.ml.trainers.transformers.BaggingUpstreamTransformer; +import org.apache.ignite.ml.util.Utils; + +/** + * Trainer encapsulating logic of bootstrap aggregating (bagging). + * This trainer accepts some other trainer and returns bagged version of it. + * Resulting model consists of submodels results of which are aggregated by a specified aggregator. + * Bagging is done + * on both samples and features (Samples bagging: https://en.wikipedia.org/wiki/Bootstrap_aggregating, + * Features bagging: https://en.wikipedia.org/wiki/Random_subspace_method). + * + * @param Type of model produced by trainer for which bagged version is created. + * @param Type of labels. + * @param Type of trainer for which bagged version is created. + */ +public class BaggedTrainer, L, T extends DatasetTrainer> extends +DatasetTrainer { +/** Trainer for which bagged version is created. */ +private final DatasetTrainer tr; + +/** Aggregator of submodels results. 
*/ +private final PredictionsAggregator aggregator; + +/** Count of submodels in the ensemble. */ +private final int ensembleSize; + +/** Ratio determining which part of dataset will be taken as subsample for each submodel training. */ +private final double subsampleRatio; + +/** Dimensionality of feature vectors. */ +private final int featuresVectorSize; + +/** Dimension of subspace on which all samples from subsample are projected. */ +private final int featureSubspaceDim; + +/** + * Construct instance of this class with given parameters. + * + * @param tr Trainer for making bagged. + * @param aggregator Aggregator of models. + * @param ensembleSize Size of ensemble. + * @param subsampleRatio Ratio (subsample size) / (initial dataset size). + * @param featuresVectorSize Dimensionality of feature vector. + * @param featureSubspaceDim Dimensionality of feature subspace. + */ +public BaggedTrainer(DatasetTrainer tr, +PredictionsAggregator aggregator, int ensembleSize, double subsampleRatio, int featuresVectorSize, +int featureSubspaceDim) { +this.tr = tr; +this.aggregator = aggregator; +this.ensembleSize = ensembleSize; +this.subsampleRatio = subsampleRatio; +this.featuresVectorSize = featuresVectorSize; +this.featureSubspaceDim = featureSubspaceDim; +} + +/** + * Create trainer bagged trainer. + * + * @return Bagged trainer. + */ +private DatasetTrainer, L> getTrainer() { +List mappings = (featuresVectorSize > 0 && featureSubspaceDim != featuresVectorSize) ? +IntStream.range(0, ensembleSize).mapToObj( +modelIdx -> getMapping( +featuresVectorSize, +featureSubspaceDim, +
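For readers following the thread, the two kinds of bagging named in the Javadoc above (sample bagging and feature-subspace bagging) can be illustrated with a minimal plain-Java sketch. It does not use the Ignite ML API; the class and method names (`BaggingSketch`, `bootstrapIndices`, `featureSubspace`, `project`) are hypothetical and only mirror the roles of `subsampleRatio`, `featuresVectorSize` and `featureSubspaceDim` from the quoted constructor.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.stream.IntStream;

/** Illustrative sketch only; names are hypothetical and not part of Ignite ML. */
final class BaggingSketch {
    /** Sample bagging: draws round(n * subsampleRatio) row indices with replacement. */
    static int[] bootstrapIndices(int n, double subsampleRatio, Random rnd) {
        int size = (int)Math.round(n * subsampleRatio);

        return IntStream.range(0, size).map(i -> rnd.nextInt(n)).toArray();
    }

    /**
     * Feature bagging: picks subspaceDim distinct feature indices out of featuresCnt.
     * Assumes 0 <= subspaceDim <= featuresCnt.
     */
    static int[] featureSubspace(int featuresCnt, int subspaceDim, Random rnd) {
        int[] idx = IntStream.range(0, featuresCnt).toArray();

        // Partial Fisher-Yates shuffle: the first subspaceDim slots become a uniform sample.
        for (int i = 0; i < subspaceDim; i++) {
            int j = i + rnd.nextInt(featuresCnt - i);
            int tmp = idx[i];
            idx[i] = idx[j];
            idx[j] = tmp;
        }

        int[] res = Arrays.copyOf(idx, subspaceDim);
        Arrays.sort(res);

        return res;
    }

    /** Projects a feature vector onto the chosen subspace. */
    static double[] project(double[] features, int[] mapping) {
        return Arrays.stream(mapping).mapToDouble(i -> features[i]).toArray();
    }
}
```

In such a scheme each submodel would get its own row sample and its own feature mapping, and the mapping would have to be applied again at prediction time before the submodel is called.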
[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training URL: https://github.com/apache/ignite/pull/5767#discussion_r247901728 ## File path: modules/ml/src/main/java/org/apache/ignite/ml/composition/stacking/StackedDatasetTrainer.java ## @@ -254,62 +233,23 @@ public StackedDatasetTrainer() { IgniteBiFunction featureExtractor, IgniteBiFunction lbExtractor) { -return update(null, datasetBuilder, featureExtractor, lbExtractor); +return new StackedModel<>(getTrainer().fit(datasetBuilder, featureExtractor, lbExtractor)); } /** {@inheritDoc} */ @Override public StackedModel update(StackedModel mdl, DatasetBuilder datasetBuilder, IgniteBiFunction featureExtractor, IgniteBiFunction lbExtractor) { -return runOnSubmodels( -ensemble -> { -List>> res = new ArrayList<>(); -for (int i = 0; i < ensemble.size(); i++) { -final int j = i; -res.add(() -> { -DatasetTrainer, L> trainer = ensemble.get(j); -return mdl == null ? -trainer.fit(datasetBuilder, featureExtractor, lbExtractor) : -trainer.update(mdl.submodels().get(j), datasetBuilder, featureExtractor, lbExtractor); -}); -} -return res; -}, -(at, extr) -> mdl == null ? -at.fit(datasetBuilder, extr, lbExtractor) : -at.update(mdl.aggregatorModel(), datasetBuilder, extr, lbExtractor), -featureExtractor -); -} -/** {@inheritDoc} */ -@Override public StackedDatasetTrainer withEnvironmentBuilder( -LearningEnvironmentBuilder envBuilder) { -submodelsTrainers = -submodelsTrainers.stream().map(x -> x.withEnvironmentBuilder(envBuilder)).collect(Collectors.toList()); -aggregatorTrainer = aggregatorTrainer.withEnvironmentBuilder(envBuilder); - -return this; +return new StackedModel<>(getTrainer().update(mdl, datasetBuilder, featureExtractor, lbExtractor)); } /** - * - * 1. Obtain models produced by running specified tasks; - * 2. run other specified task on dataset augmented with results of models from step 2. - * + * Get the trainer for stacking. 
 * - @param taskSupplier Function used to generate tasks for first step. - * @param aggregatorProcessor Function used - * @param featureExtractor Feature extractor. - * @param Type of keys in upstream. - * @param Type of values in upstream. - * @return {@link StackedModel}. + * @return Trainer for stacking. */ -private StackedModel runOnSubmodels( -IgniteFunction, L>>, List>>> taskSupplier, -IgniteBiFunction, IgniteBiFunction, AM> aggregatorProcessor, -IgniteBiFunction featureExtractor) { - +private DatasetTrainer, L> getTrainer() { Review comment: Separated consistency checking into a separate method.
[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training URL: https://github.com/apache/ignite/pull/5767#discussion_r247890231 ## File path: modules/ml/src/main/java/org/apache/ignite/ml/composition/combinators/parallel/TrainersParallelComposition.java ## @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.ignite.ml.composition.combinators.parallel; + +import java.util.ArrayList; +import java.util.List; +import java.util.stream.Collectors; +import org.apache.ignite.ml.IgniteModel; +import org.apache.ignite.ml.composition.CompositionUtils; +import org.apache.ignite.ml.dataset.DatasetBuilder; +import org.apache.ignite.ml.environment.parallelism.Promise; +import org.apache.ignite.ml.math.functions.IgniteBiFunction; +import org.apache.ignite.ml.math.functions.IgniteSupplier; +import org.apache.ignite.ml.math.primitives.vector.Vector; +import org.apache.ignite.ml.trainers.DatasetTrainer; + +/** + * This class represents a parallel composition of trainers. + * Parallel composition of trainers is a trainer itself which trains a list of trainers with same + * input and output. Training is done in following manner: + * + * 1. 
Independently train all trainers on the same dataset and get a list of models. + * 2. Combine models produced in step (1) into a {@link ModelsParallelComposition}. + * + * Updating is made in a similar fashion. + * Like in other trainers combinators we avoid to include type of contained trainers in type parameters + * because otherwise compositions of compositions would have a relatively complex generic type which will + * reduce readability. + * + * @param Type of trainers inputs. + * @param Type of trainers outputs. + * @param Type of dataset labels. + */ +public class TrainersParallelComposition extends DatasetTrainer>, L> { +/** List of trainers. */ +private final List, L>> trainers; + +/** + * Construct an instance of this class from a list of trainers. + * + * @param trainers Trainers. + * @param Type of mode + * @param Review comment: Fixed.
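The two training steps described in the Javadoc above (train every trainer independently on the same dataset, then combine the resulting models) can be sketched without the Ignite types. This is a simplified illustration under stated assumptions: the `Trainer` interface below is hypothetical, a model is represented as a plain `Function`, and `CompletableFuture` stands in for the environment's parallelism strategy used in the PR.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Illustrative sketch only; not the Ignite ML API. */
final class ParallelCompositionSketch {
    /** Hypothetical trainer: turns a dataset of type D into a model mapping I to O. */
    interface Trainer<D, I, O> {
        Function<I, O> fit(D dataset);
    }

    /** Trains all trainers on the same dataset and returns one model yielding all outputs. */
    static <D, I, O> Function<I, List<O>> fitAll(List<Trainer<D, I, O>> trainers, D dataset) {
        // Step 1: submit one independent training task per trainer.
        List<CompletableFuture<Function<I, O>>> tasks = trainers.stream()
            .map(tr -> CompletableFuture.supplyAsync(() -> tr.fit(dataset)))
            .collect(Collectors.toList());

        // Wait for all tasks; the order of models matches the order of trainers.
        List<Function<I, O>> mdls = tasks.stream()
            .map(CompletableFuture::join)
            .collect(Collectors.toList());

        // Step 2: combine the models into a single model that applies each of them.
        return x -> mdls.stream().map(m -> m.apply(x)).collect(Collectors.toList());
    }
}
```

Keeping the model order aligned with the trainer order matters for updating: the quoted `update` method pairs `trainers.get(i)` with `submodels().get(i)`.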
[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training URL: https://github.com/apache/ignite/pull/5767#discussion_r247890198 ## File path: modules/ml/src/main/java/org/apache/ignite/ml/composition/combinators/parallel/TrainersParallelComposition.java ## @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.ignite.ml.composition.combinators.parallel; + +import java.util.ArrayList; +import java.util.List; +import java.util.stream.Collectors; +import org.apache.ignite.ml.IgniteModel; +import org.apache.ignite.ml.composition.CompositionUtils; +import org.apache.ignite.ml.dataset.DatasetBuilder; +import org.apache.ignite.ml.environment.parallelism.Promise; +import org.apache.ignite.ml.math.functions.IgniteBiFunction; +import org.apache.ignite.ml.math.functions.IgniteSupplier; +import org.apache.ignite.ml.math.primitives.vector.Vector; +import org.apache.ignite.ml.trainers.DatasetTrainer; + +/** + * This class represents a parallel composition of trainers. + * Parallel composition of trainers is a trainer itself which trains a list of trainers with same + * input and output. Training is done in following manner: + * + * 1. 
Independently train all trainers on the same dataset and get a list of models. + * 2. Combine models produced in step (1) into a {@link ModelsParallelComposition}. + * + * Updating is made in a similar fashion. + * Like in other trainers combinators we avoid to include type of contained trainers in type parameters + * because otherwise compositions of compositions would have a relatively complex generic type which will + * reduce readability. + * + * @param Type of trainers inputs. + * @param Type of trainers outputs. + * @param Type of dataset labels. + */ +public class TrainersParallelComposition extends DatasetTrainer>, L> { +/** List of trainers. */ +private final List, L>> trainers; + +/** + * Construct an instance of this class from a list of trainers. + * + * @param trainers Trainers. + * @param Type of mode + * @param + */ +public , T extends DatasetTrainer, L>> TrainersParallelComposition( +List trainers) { +this.trainers = trainers.stream().map(CompositionUtils::unsafeCoerce).collect(Collectors.toList()); +} + +public static , T extends DatasetTrainer, L> TrainersParallelComposition of(List trainers) { +List, L>> trs = + trainers.stream().map(CompositionUtils::unsafeCoerce).collect(Collectors.toList()); + +return new TrainersParallelComposition<>(trs); +} + +/** {@inheritDoc} */ +@Override public IgniteModel> fit(DatasetBuilder datasetBuilder, +IgniteBiFunction featureExtractor, IgniteBiFunction lbExtractor) { +List>> tasks = trainers.stream() +.map(tr -> (IgniteSupplier>)(() -> tr.fit(datasetBuilder, featureExtractor, lbExtractor))) +.collect(Collectors.toList()); + +List> mdls = environment.parallelismStrategy().submit(tasks).stream() +.map(Promise::unsafeGet) +.collect(Collectors.toList()); + +return new ModelsParallelComposition<>(mdls); +} + +/** {@inheritDoc} */ +@Override public IgniteModel> update(IgniteModel> mdl, DatasetBuilder datasetBuilder, +IgniteBiFunction featureExtractor, IgniteBiFunction lbExtractor) { +// Unsafe. 
+ModelsParallelComposition typedMdl = (ModelsParallelComposition)mdl; + +assert typedMdl.submodels().size() == trainers.size(); +List> mdls = new ArrayList<>(); + +for (int i = 0; i < trainers.size(); i++) +mdls.add(trainers.get(i).update(typedMdl.submodels().get(i), datasetBuilder, featureExtractor, lbExtractor)); + +return new ModelsParallelComposition<>(mdls); +} + +/** {@inheritDoc} */ +@Override protected boolean checkState(IgniteModel> mdl) { Review comment: Fixed.
[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training URL: https://github.com/apache/ignite/pull/5767#discussion_r247889971 ## File path: modules/ml/src/main/java/org/apache/ignite/ml/dataset/UpstreamTransformer.java ## @@ -22,29 +22,15 @@ /** * Interface of transformer of upstream. - * - * @param Type of keys in the upstream. - * @param Type of values in the upstream. */ // TODO: IGNITE-10297: Investigate possibility of API change. @FunctionalInterface -public interface UpstreamTransformer extends Serializable { +public interface UpstreamTransformer extends Serializable { Review comment: We want to emphasize that `UpstreamTransformer` is not meant to change the contents of the upstream, only its form.
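One way to read "change the form, not the contents" is a transformer that may drop or repeat upstream entries (as bagging does when resampling) but never looks inside or modifies an entry. The sketch below is an assumption-laden illustration in plain Java: `UpstreamFormSketch` and `resampler` are hypothetical names, and the real `UpstreamTransformer` signature is not shown in this thread.

```java
import java.util.Random;
import java.util.function.UnaryOperator;
import java.util.stream.Stream;

/** Illustrative sketch only; not the Ignite ML UpstreamTransformer. */
final class UpstreamFormSketch {
    /**
     * Returns a transformer that repeats each entry 0, 1 or 2 times at random.
     * Entries are passed through untouched, so it is generic in the entry type.
     */
    static <T> UnaryOperator<Stream<T>> resampler(long seed) {
        return upstream -> {
            Random rnd = new Random(seed);

            // Only the multiplicity of entries changes; their contents do not.
            return upstream.flatMap(e -> Stream.generate(() -> e).limit(rnd.nextInt(3)));
        };
    }
}
```

Because such a transformer never inspects keys or values, it has no need for key and value type parameters, which is consistent with the `@param` lines being removed in the hunk above.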
[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training URL: https://github.com/apache/ignite/pull/5767#discussion_r247890064 ## File path: modules/ml/src/main/java/org/apache/ignite/ml/composition/stacking/StackedDatasetTrainer.java ## @@ -402,11 +346,12 @@ public StackedDatasetTrainer() { IgniteBiFunction featureExtractor, IgniteBiFunction lbExtractor) { // This method is never called, we override "update" instead. -return null; +throw new IllegalStateException(); } /** {@inheritDoc} */ @Override protected boolean checkState(StackedModel mdl) { -return true; +// Should be never called. +throw new IllegalStateException(); Review comment: Fixed.
[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training URL: https://github.com/apache/ignite/pull/5767#discussion_r247890183 ## File path: modules/ml/src/main/java/org/apache/ignite/ml/composition/combinators/parallel/TrainersParallelComposition.java ## @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.ignite.ml.composition.combinators.parallel; + +import java.util.ArrayList; +import java.util.List; +import java.util.stream.Collectors; +import org.apache.ignite.ml.IgniteModel; +import org.apache.ignite.ml.composition.CompositionUtils; +import org.apache.ignite.ml.dataset.DatasetBuilder; +import org.apache.ignite.ml.environment.parallelism.Promise; +import org.apache.ignite.ml.math.functions.IgniteBiFunction; +import org.apache.ignite.ml.math.functions.IgniteSupplier; +import org.apache.ignite.ml.math.primitives.vector.Vector; +import org.apache.ignite.ml.trainers.DatasetTrainer; + +/** + * This class represents a parallel composition of trainers. + * Parallel composition of trainers is a trainer itself which trains a list of trainers with same + * input and output. Training is done in following manner: + * + * 1. 
Independently train all trainers on the same dataset and get a list of models. + * 2. Combine models produced in step (1) into a {@link ModelsParallelComposition}. + * + * Updating is made in a similar fashion. + * Like in other trainers combinators we avoid to include type of contained trainers in type parameters + * because otherwise compositions of compositions would have a relatively complex generic type which will + * reduce readability. + * + * @param Type of trainers inputs. + * @param Type of trainers outputs. + * @param Type of dataset labels. + */ +public class TrainersParallelComposition extends DatasetTrainer>, L> { +/** List of trainers. */ +private final List, L>> trainers; + +/** + * Construct an instance of this class from a list of trainers. + * + * @param trainers Trainers. + * @param Type of mode + * @param + */ +public , T extends DatasetTrainer, L>> TrainersParallelComposition( +List trainers) { +this.trainers = trainers.stream().map(CompositionUtils::unsafeCoerce).collect(Collectors.toList()); +} + +public static , T extends DatasetTrainer, L> TrainersParallelComposition of(List trainers) { +List, L>> trs = + trainers.stream().map(CompositionUtils::unsafeCoerce).collect(Collectors.toList()); + +return new TrainersParallelComposition<>(trs); +} + +/** {@inheritDoc} */ +@Override public IgniteModel> fit(DatasetBuilder datasetBuilder, +IgniteBiFunction featureExtractor, IgniteBiFunction lbExtractor) { +List>> tasks = trainers.stream() +.map(tr -> (IgniteSupplier>)(() -> tr.fit(datasetBuilder, featureExtractor, lbExtractor))) +.collect(Collectors.toList()); + +List> mdls = environment.parallelismStrategy().submit(tasks).stream() +.map(Promise::unsafeGet) +.collect(Collectors.toList()); + +return new ModelsParallelComposition<>(mdls); +} + +/** {@inheritDoc} */ +@Override public IgniteModel> update(IgniteModel> mdl, DatasetBuilder datasetBuilder, +IgniteBiFunction featureExtractor, IgniteBiFunction lbExtractor) { +// Unsafe. 
+ModelsParallelComposition typedMdl = (ModelsParallelComposition)mdl; + +assert typedMdl.submodels().size() == trainers.size(); +List> mdls = new ArrayList<>(); + +for (int i = 0; i < trainers.size(); i++) +mdls.add(trainers.get(i).update(typedMdl.submodels().get(i), datasetBuilder, featureExtractor, lbExtractor)); + +return new ModelsParallelComposition<>(mdls); +} + +/** {@inheritDoc} */ +@Override protected boolean checkState(IgniteModel> mdl) { +// Never called. +throw new IllegalStateException(); +} + +/** {@inheritDoc} */ Review comment: Fixed.
[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training URL: https://github.com/apache/ignite/pull/5767#discussion_r247890213 ## File path: modules/ml/src/main/java/org/apache/ignite/ml/composition/combinators/parallel/TrainersParallelComposition.java ## @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.ignite.ml.composition.combinators.parallel; + +import java.util.ArrayList; +import java.util.List; +import java.util.stream.Collectors; +import org.apache.ignite.ml.IgniteModel; +import org.apache.ignite.ml.composition.CompositionUtils; +import org.apache.ignite.ml.dataset.DatasetBuilder; +import org.apache.ignite.ml.environment.parallelism.Promise; +import org.apache.ignite.ml.math.functions.IgniteBiFunction; +import org.apache.ignite.ml.math.functions.IgniteSupplier; +import org.apache.ignite.ml.math.primitives.vector.Vector; +import org.apache.ignite.ml.trainers.DatasetTrainer; + +/** + * This class represents a parallel composition of trainers. + * Parallel composition of trainers is a trainer itself which trains a list of trainers with same + * input and output. Training is done in following manner: + * + * 1. 
Independently train all trainers on the same dataset and get a list of models. + * 2. Combine models produced in step (1) into a {@link ModelsParallelComposition}. + * + * Updating is made in a similar fashion. + * Like in other trainers combinators we avoid to include type of contained trainers in type parameters + * because otherwise compositions of compositions would have a relatively complex generic type which will + * reduce readability. + * + * @param Type of trainers inputs. + * @param Type of trainers outputs. + * @param Type of dataset labels. + */ +public class TrainersParallelComposition extends DatasetTrainer>, L> { +/** List of trainers. */ +private final List, L>> trainers; + +/** + * Construct an instance of this class from a list of trainers. + * + * @param trainers Trainers. + * @param Type of mode + * @param + */ +public , T extends DatasetTrainer, L>> TrainersParallelComposition( +List trainers) { +this.trainers = trainers.stream().map(CompositionUtils::unsafeCoerce).collect(Collectors.toList()); +} + +public static , T extends DatasetTrainer, L> TrainersParallelComposition of(List trainers) { Review comment: Fixed.
[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training URL: https://github.com/apache/ignite/pull/5767#discussion_r247890133 ## File path: modules/ml/src/main/java/org/apache/ignite/ml/composition/combinators/sequential/TrainersSequentialComposition.java ## @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.ignite.ml.composition.combinators.sequential; + +import org.apache.ignite.ml.IgniteModel; +import org.apache.ignite.ml.composition.CompositionUtils; +import org.apache.ignite.ml.composition.DatasetMapping; +import org.apache.ignite.ml.dataset.DatasetBuilder; +import org.apache.ignite.ml.math.functions.IgniteBiFunction; +import org.apache.ignite.ml.math.functions.IgniteFunction; +import org.apache.ignite.ml.math.primitives.vector.Vector; +import org.apache.ignite.ml.math.primitives.vector.VectorUtils; +import org.apache.ignite.ml.trainers.DatasetTrainer; +import sun.reflect.generics.reflectiveObjects.NotImplementedException; + +/** + * Sequential composition of trainers. + * Sequential composition of trainers is itself trainer which produces {@link ModelsSequentialComposition}. 
+ * Training is done in following fashion: + * + * 1. First trainer is trained and `mdl1` is produced. + * 2. From `mdl1` {@link DatasetMapping} is constructed. This mapping `dsM` encapsulates dependency between first + * training result and second trainer. + * 3. Second trainer is trained using dataset aquired from application `dsM` to original dataset; `mdl2` is produced. + * 4. `mdl1` and `mdl2` are composed into {@link ModelsSequentialComposition}. + * + * + * @param Type of input of model produced by first trainer. + * @param Type of output of model produced by first trainer. + * @param Type of output of model produced by second trainer. + * @param Type of labels. + */ +public class TrainersSequentialComposition extends DatasetTrainer, L> { +/** First trainer. */ +private DatasetTrainer, L> tr1; + +/** Second trainer. */ +private DatasetTrainer, L> tr2; + +/** Dataset mapping. */ +private IgniteFunction, DatasetMapping> datasetMapping; + +/** + * Construct sequential composition of given two trainers. + * + * @param tr1 First trainer. + * @param tr2 Second trainer. + * @param datasetMapping Dataset mapping. 
+ */ +public TrainersSequentialComposition(DatasetTrainer, L> tr1, +DatasetTrainer, L> tr2, +IgniteFunction, DatasetMapping> datasetMapping) { +this.tr1 = CompositionUtils.unsafeCoerce(tr1); +this.tr2 = CompositionUtils.unsafeCoerce(tr2); +this.datasetMapping = datasetMapping; +} + +/** {@inheritDoc} */ +@Override public ModelsSequentialComposition fit(DatasetBuilder datasetBuilder, +IgniteBiFunction featureExtractor, IgniteBiFunction lbExtractor) { + +IgniteModel mdl1 = tr1.fit(datasetBuilder, featureExtractor, lbExtractor); +DatasetMapping mapping = datasetMapping.apply(mdl1); + +IgniteModel mdl2 = tr2.fit(datasetBuilder, +featureExtractor.andThen(mapping::mapFeatures), +lbExtractor.andThen(mapping::mapLabels)); + +return new ModelsSequentialComposition<>(mdl1, mdl2); +} + +/** {@inheritDoc} */ +@Override public ModelsSequentialComposition update( +ModelsSequentialComposition mdl, DatasetBuilder datasetBuilder, +IgniteBiFunction featureExtractor, IgniteBiFunction lbExtractor) { + +IgniteModel firstUpdated = tr1.update(mdl.firstModel(), datasetBuilder, featureExtractor, lbExtractor); +DatasetMapping mapping = datasetMapping.apply(firstUpdated); + +IgniteModel secondUpdated = tr2.update(mdl.secondModel(), +datasetBuilder, +featureExtractor.andThen(mapping::mapFeatures), +lbExtractor.andThen(mapping::mapLabels)); + +return new ModelsSequentialComposition<>(firstUpdated, secondUpdated); +} + +/** {@inheritDoc} */ +@Override protected boolean checkState(ModelsSequentialComposition mdl) { +// Never called. +throw new Illeg
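The four steps listed in the Javadoc above (train the first trainer, derive a dataset mapping from its model, train the second trainer on the mapped dataset, compose both models) can be sketched in plain Java. This is a simplification under stated assumptions: the `Trainer` interface is hypothetical, models are plain `Function`s, and the dataset mapping is fixed to "replace features with the first model's outputs, keep labels unchanged", whereas the PR passes a function that builds a `DatasetMapping` from the first model.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Illustrative sketch only; not the Ignite ML API. */
final class SequentialCompositionSketch {
    /** Hypothetical trainer: fits a model mapping I to O from features and labels. */
    interface Trainer<I, O, L> {
        Function<I, O> fit(List<I> features, List<L> labels);
    }

    /** Trains tr1, then tr2 on tr1's outputs, and returns the composed model. */
    static <I, O1, O2, L> Function<I, O2> fit(Trainer<I, O1, L> tr1, Trainer<O1, O2, L> tr2,
        List<I> features, List<L> labels) {
        // Step 1: train the first trainer on the original dataset.
        Function<I, O1> mdl1 = tr1.fit(features, labels);

        // Step 2: the mapping assumed here feeds mdl1 outputs forward as features.
        List<O1> mappedFeatures = features.stream().map(mdl1).collect(Collectors.toList());

        // Step 3: train the second trainer on the mapped dataset.
        Function<O1, O2> mdl2 = tr2.fit(mappedFeatures, labels);

        // Step 4: compose the two models into one.
        return mdl1.andThen(mdl2);
    }
}
```

The same order applies to updating in the quoted code: the first model is updated, the mapping is rebuilt from the updated first model, and only then is the second model updated.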
[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training
URL: https://github.com/apache/ignite/pull/5767#discussion_r247889971

## File path: modules/ml/src/main/java/org/apache/ignite/ml/dataset/UpstreamTransformer.java ##
@@ -22,29 +22,15 @@
 /**
  * Interface of transformer of upstream.
- *
- * @param Type of keys in the upstream.
- * @param Type of values in the upstream.
  */
 // TODO: IGNITE-10297: Investigate possibility of API change.
 @FunctionalInterface
-public interface UpstreamTransformer extends Serializable {
+public interface UpstreamTransformer extends Serializable {

Review comment:
We want to emphasize that `UpstreamTransformer` is not meant to change the contents of the upstream, only its form.
[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training
URL: https://github.com/apache/ignite/pull/5767#discussion_r247890081

## File path: modules/ml/src/main/java/org/apache/ignite/ml/composition/stacking/StackedDatasetTrainer.java ##
@@ -402,11 +346,12 @@ public StackedDatasetTrainer() {
 IgniteBiFunction featureExtractor, IgniteBiFunction lbExtractor) {
 // This method is never called, we override "update" instead.
-return null;
+throw new IllegalStateException();

Review comment:
Fixed.
[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training URL: https://github.com/apache/ignite/pull/5767#discussion_r247890104 ## File path: modules/ml/src/main/java/org/apache/ignite/ml/composition/stacking/StackedDatasetTrainer.java ## @@ -322,59 +262,63 @@ public StackedDatasetTrainer() { if (aggregatingInputMerger == null) throw new IllegalStateException("Binary operator used to convert outputs of submodels is not specified"); -List>> mdlSuppliers = taskSupplier.apply(submodelsTrainers); +List, L>> subs = new ArrayList<>(); +if (submodelInput2AggregatingInputConverter != null) { +DatasetTrainer, L> id = DatasetTrainer.identityTrainer(); +DatasetTrainer, L> mappedId = CompositionUtils.unsafeCoerce( + AdaptableDatasetTrainer.of(id).afterTrainedModel(submodelInput2AggregatingInputConverter)); +subs.add(mappedId); +} -List> subMdls = environment.parallelismStrategy().submit(mdlSuppliers).stream() -.map(Promise::unsafeGet) -.collect(Collectors.toList()); +subs.addAll(submodelsTrainers); -// Add new columns consisting in submodels output in features. 
-IgniteBiFunction augmentedExtractor = getFeatureExtractorForAggregator(featureExtractor, -subMdls, -submodelInput2AggregatingInputConverter, +TrainersParallelComposition composition = new TrainersParallelComposition<>(subs); + +IgniteBiFunction>, Vector, Vector> featureMapper = getFeatureExtractorForAggregator( submodelOutput2VectorConverter, vector2SubmodelInputConverter); -AM aggregator = aggregatorProcessor.apply(aggregatorTrainer, augmentedExtractor); +return AdaptableDatasetTrainer +.of(composition) +.afterTrainedModel(lst -> lst.stream().reduce(aggregatingInputMerger).get()) +.andThen(aggregatorTrainer, model -> new DatasetMapping() { +@Override public Vector mapFeatures(Vector v) { +List> models = ((ModelsParallelComposition)model.innerModel()).submodels(); +return featureMapper.apply(models, v); +} -StackedModel res = new StackedModel<>( -aggregator, -aggregatingInputMerger, -submodelInput2AggregatingInputConverter); +@Override public L mapLabels(L lbl) { +return lbl; +} +}).unsafeSimplyTyped(); +} -for (IgniteModel subMdl : subMdls) -res.addSubmodel(subMdl); +/** {@inheritDoc} */ +@Override public StackedDatasetTrainer withEnvironmentBuilder( +LearningEnvironmentBuilder envBuilder) { +submodelsTrainers = +submodelsTrainers.stream().map(x -> x.withEnvironmentBuilder(envBuilder)).collect(Collectors.toList()); +aggregatorTrainer = aggregatorTrainer.withEnvironmentBuilder(envBuilder); -return res; +return this; } /** * Get feature extractor which will be used for aggregator trainer from original feature extractor. * This method is static to make sure that we will not grab context of instance in serialization. * - * @param featureExtractor Original feature extractor. - * @param subMdls Submodels. * @param Type of upstream keys. * @param Type of upstream values. * @return Feature extractor which will be used for aggregator trainer from original feature extractor. 
 */
-private static IgniteBiFunction getFeatureExtractorForAggregator(
-IgniteBiFunction featureExtractor, List> subMdls,
-IgniteFunction submodelInput2AggregatingInputConverter,
+private static IgniteBiFunction>, Vector, Vector> getFeatureExtractorForAggregator(

Review comment:
Fixed.
[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training URL: https://github.com/apache/ignite/pull/5767#discussion_r247885943 ## File path: modules/ml/src/main/java/org/apache/ignite/ml/composition/combinators/parallel/TrainersParallelComposition.java ## @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.ignite.ml.composition.combinators.parallel; + +import java.util.ArrayList; +import java.util.List; +import java.util.stream.Collectors; +import org.apache.ignite.ml.IgniteModel; +import org.apache.ignite.ml.composition.CompositionUtils; +import org.apache.ignite.ml.dataset.DatasetBuilder; +import org.apache.ignite.ml.environment.parallelism.Promise; +import org.apache.ignite.ml.math.functions.IgniteBiFunction; +import org.apache.ignite.ml.math.functions.IgniteSupplier; +import org.apache.ignite.ml.math.primitives.vector.Vector; +import org.apache.ignite.ml.trainers.DatasetTrainer; + +/** + * This class represents a parallel composition of trainers. + * Parallel composition of trainers is a trainer itself which trains a list of trainers with same + * input and output. Training is done in following manner: + * + * 1. 
Independently train all trainers on the same dataset and get a list of models. + * 2. Combine models produced in step (1) into a {@link ModelsParallelComposition}. + * + * Updating is made in a similar fashion. + * Like in other trainers combinators we avoid to include type of contained trainers in type parameters + * because otherwise compositions of compositions would have a relatively complex generic type which will + * reduce readability. + * + * @param Type of trainers inputs. + * @param Type of trainers outputs. + * @param Type of dataset labels. + */ +public class TrainersParallelComposition extends DatasetTrainer>, L> { +/** List of trainers. */ +private final List, L>> trainers; + +/** + * Construct an instance of this class from a list of trainers. + * + * @param trainers Trainers. + * @param Type of mode + * @param + */ +public , T extends DatasetTrainer, L>> TrainersParallelComposition( +List trainers) { +this.trainers = trainers.stream().map(CompositionUtils::unsafeCoerce).collect(Collectors.toList()); +} + +public static , T extends DatasetTrainer, L> TrainersParallelComposition of(List trainers) { +List, L>> trs = + trainers.stream().map(CompositionUtils::unsafeCoerce).collect(Collectors.toList()); + +return new TrainersParallelComposition<>(trs); +} + +/** {@inheritDoc} */ +@Override public IgniteModel> fit(DatasetBuilder datasetBuilder, +IgniteBiFunction featureExtractor, IgniteBiFunction lbExtractor) { +List>> tasks = trainers.stream() +.map(tr -> (IgniteSupplier>)(() -> tr.fit(datasetBuilder, featureExtractor, lbExtractor))) +.collect(Collectors.toList()); + +List> mdls = environment.parallelismStrategy().submit(tasks).stream() +.map(Promise::unsafeGet) +.collect(Collectors.toList()); + +return new ModelsParallelComposition<>(mdls); +} + +/** {@inheritDoc} */ +@Override public IgniteModel> update(IgniteModel> mdl, DatasetBuilder datasetBuilder, +IgniteBiFunction featureExtractor, IgniteBiFunction lbExtractor) { +// Unsafe. 
+ModelsParallelComposition typedMdl = (ModelsParallelComposition)mdl;
+
+assert typedMdl.submodels().size() == trainers.size();
+List> mdls = new ArrayList<>();
+
+for (int i = 0; i < trainers.size(); i++)
+mdls.add(trainers.get(i).update(typedMdl.submodels().get(i), datasetBuilder, featureExtractor, lbExtractor));

Review comment:
Thanks, nice catch, done.
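The two-step scheme described in the Javadoc of `TrainersParallelComposition` (train all trainers independently on the same dataset, then combine the models) can be illustrated with a small standalone sketch. It rests on assumptions: `Trainer`, `fitInParallel` and `demo` are hypothetical names, labels and extractors are omitted, and `CompletableFuture` stands in for the parallelism strategy of the learning environment used in the quoted diff.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ParallelCompositionSketch {
    /** Hypothetical simplified trainer (labels and extractors omitted for brevity). */
    public interface Trainer<I, O> {
        Function<I, O> fit(List<I> data);
    }

    public static <I, O> Function<I, List<O>> fitInParallel(List<Trainer<I, O>> trainers, List<I> data) {
        // Step 1: submit one independent training task per trainer, all on the same data.
        List<CompletableFuture<Function<I, O>>> tasks = trainers.stream()
            .map(tr -> CompletableFuture.supplyAsync(() -> tr.fit(data)))
            .collect(Collectors.toList());

        // Wait for all tasks; the order of models matches the order of trainers.
        List<Function<I, O>> mdls = tasks.stream()
            .map(CompletableFuture::join)
            .collect(Collectors.toList());

        // Step 2: the composed model returns the list of all submodel outputs.
        return x -> mdls.stream().map(m -> m.apply(x)).collect(Collectors.toList());
    }

    /** Toy usage: a mean-centering model and a max-scaling model trained side by side. */
    public static List<Double> demo() {
        List<Double> data = Arrays.asList(1.0, 2.0, 3.0, 6.0);

        Trainer<Double, Double> meanCentering = d -> {
            double mean = d.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
            return x -> x - mean;
        };

        Trainer<Double, Double> maxScaling = d -> {
            double max = d.stream().mapToDouble(Double::doubleValue).max().orElse(1.0);
            return x -> x / max;
        };

        return fitInParallel(Arrays.asList(meanCentering, maxScaling), data).apply(3.0);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [0.0, 0.5]
    }
}
```

Because the list of models keeps the order of the trainers, an update step can pair `trainers.get(i)` with `submodels().get(i)`, which is what the size assertion in the quoted `update` method guards.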
[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training
URL: https://github.com/apache/ignite/pull/5767#discussion_r247885961

## File path: modules/ml/src/main/java/org/apache/ignite/ml/util/Utils.java ##
@@ -130,4 +132,50 @@
 Spliterators.spliteratorUnknownSize(iter, Spliterator.ORDERED), false);
 }
+
+/**
+ * Zips two streams (in the functional sense of zipping), i.e. returns a stream consisting
+ * of the results of applying the zipper to corresponding entries of the two streams.
+ *
+ * @param a First stream.
+ * @param b Second stream.
+ * @param zipper Bi-function combining two streams.
+ * @param Type of first stream entries.
+ * @param Type of second stream entries.
+ * @param Type of zipper output.
+ * @return Two streams zipped together.
+ */
+public static Stream zip(Stream a,

Review comment:
Removed.
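For readers unfamiliar with zipping in the functional sense, the behaviour described in the quoted Javadoc can be sketched as below. This is an independent illustration, not the method that was removed from the pull request; the class name `ZipSketch` and the choice to stop at the shorter stream are assumptions.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.BiFunction;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class ZipSketch {
    /** Applies the zipper to corresponding entries of two streams; stops when the shorter one ends. */
    public static <A, B, C> Stream<C> zip(Stream<A> a, Stream<B> b, BiFunction<A, B, C> zipper) {
        Iterator<A> itA = a.iterator();
        Iterator<B> itB = b.iterator();

        Iterator<C> zipped = new Iterator<C>() {
            @Override public boolean hasNext() {
                return itA.hasNext() && itB.hasNext();
            }

            @Override public C next() {
                return zipper.apply(itA.next(), itB.next());
            }
        };

        // Wrap the zipping iterator back into a sequential, ordered stream.
        return StreamSupport.stream(
            Spliterators.spliteratorUnknownSize(zipped, Spliterator.ORDERED), false);
    }

    public static void main(String[] args) {
        List<String> res = zip(Stream.of(1, 2, 3), Stream.of("a", "b"), (n, s) -> n + s)
            .collect(Collectors.toList());

        System.out.println(res); // [1a, 2b]
    }
}
```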
[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training URL: https://github.com/apache/ignite/pull/5767#discussion_r247885916 ## File path: modules/ml/src/main/java/org/apache/ignite/ml/composition/CompositionUtils.java ## @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.ignite.ml.composition; + +import org.apache.ignite.ml.IgniteModel; +import org.apache.ignite.ml.dataset.DatasetBuilder; +import org.apache.ignite.ml.math.functions.IgniteBiFunction; +import org.apache.ignite.ml.math.primitives.vector.Vector; +import org.apache.ignite.ml.trainers.DatasetTrainer; + +/** + * Various utility functions for trainers composition. + */ +public class CompositionUtils { +/** + * Perform blurring of model type of given trainer to {@code IgniteModel}, where I, O are input and output + * types of original model. + * + * @param trainer Trainer to coerce. + * @param Type of input of model produced by coerced trainer. + * @param Type of output of model produced by coerced trainer. + * @param Type of model produced by coerced trainer. + * @param Type of labels. + * @return Trainer coerced to {@code DatasetTrainer, L>}. 
+ */
+public static , L> DatasetTrainer, L> unsafeCoerce(
+DatasetTrainer trainer) {
+return new DatasetTrainer, L>() {
+/** {@inheritDoc} */
+@Override public IgniteModel fit(DatasetBuilder datasetBuilder,
+IgniteBiFunction featureExtractor, IgniteBiFunction lbExtractor) {
+return trainer.fit(datasetBuilder, featureExtractor, lbExtractor);
+}
+
+/** {@inheritDoc} */
+@Override public IgniteModel update(IgniteModel mdl, DatasetBuilder datasetBuilder,
+IgniteBiFunction featureExtractor, IgniteBiFunction lbExtractor) {
+DatasetTrainer, L> trainer1 = (DatasetTrainer, L>)trainer;
+return trainer1.update(mdl, datasetBuilder, featureExtractor, lbExtractor);
+}
+
+/** {@inheritDoc} */
+@Override protected boolean checkState(IgniteModel mdl) {
+return true;

Review comment:
Agree, done.
[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training
URL: https://github.com/apache/ignite/pull/5767#discussion_r247885899

## File path: modules/ml/src/main/java/org/apache/ignite/ml/trainers/DatasetTrainer.java ##
@@ -362,4 +362,29 @@ public EmptyDatasetException() {
 }
 }

+/**
+ * Returns the trainer which returns identity model.
+ *
+ * @param Type of model input.
+ * @param Type of labels in dataset.
+ * @return Trainer which returns identity model.
+ */
+public static DatasetTrainer, L> identityTrainer() {
+return new DatasetTrainer, L>() {
+@Override public IgniteModel fit(DatasetBuilder datasetBuilder,
+IgniteBiFunction featureExtractor,
+IgniteBiFunction lbExtractor) {
+return x -> x;
+}
+
+@Override protected boolean checkState(IgniteModel mdl) {
+return true;
+}
+
+@Override protected IgniteModel updateModel(IgniteModel mdl, DatasetBuilder datasetBuilder,

Review comment:
Fixed.
[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training
URL: https://github.com/apache/ignite/pull/5767#discussion_r247885884

## File path: modules/ml/src/main/java/org/apache/ignite/ml/trainers/DatasetTrainer.java ##
@@ -362,4 +362,29 @@ public EmptyDatasetException() {
 }
 }

+/**
+ * Returns the trainer which returns identity model.
+ *
+ * @param Type of model input.
+ * @param Type of labels in dataset.
+ * @return Trainer which returns identity model.
+ */
+public static DatasetTrainer, L> identityTrainer() {
+return new DatasetTrainer, L>() {
+@Override public IgniteModel fit(DatasetBuilder datasetBuilder,
+IgniteBiFunction featureExtractor,
+IgniteBiFunction lbExtractor) {
+return x -> x;
+}
+
+@Override protected boolean checkState(IgniteModel mdl) {

Review comment:
Fixed.
[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training
URL: https://github.com/apache/ignite/pull/5767#discussion_r247885840

## File path: modules/ml/src/main/java/org/apache/ignite/ml/math/functions/IgniteFunction.java ##
@@ -18,6 +18,7 @@
 package org.apache.ignite.ml.math.functions;

 import java.io.Serializable;
+import java.util.Objects;

Review comment:
Fixed.
[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training
URL: https://github.com/apache/ignite/pull/5767#discussion_r247885871

## File path: modules/ml/src/main/java/org/apache/ignite/ml/trainers/AdaptableDatasetTrainer.java ##
@@ -56,27 +68,46 @@
  * @param Type of labels.
  * @return Instance of this class.
  */
-public static , L> AdaptableDatasetTrainer of(DatasetTrainer wrapped) {
-return new AdaptableDatasetTrainer<>(IgniteFunction.identity(), wrapped, IgniteFunction.identity());
+public static , L> AdaptableDatasetTrainer of(
+DatasetTrainer wrapped) {
+return new AdaptableDatasetTrainer<>(IgniteFunction.identity(),
+wrapped,
+IgniteFunction.identity(),
+IgniteFunction.identity(),
+IgniteFunction.identity(),
+UpstreamTransformerBuilder.identity());
 }

 /**
  * Construct instance of this class with specified wrapped trainer and converter functions.
  *
  * @param before Function used to convert input type of wrapped trainer.
- * @param wrapped Wrapped trainer.
+ * @param wrapped Wrapped trainer.
  * @param after Function used to convert output type of wrapped trainer.
+ * @param extractor

Review comment:
Fixed.
[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training
URL: https://github.com/apache/ignite/pull/5767#discussion_r247885725

## File path: modules/ml/src/main/java/org/apache/ignite/ml/composition/DatasetMapping.java ##
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.ml.composition;
+
+import org.apache.ignite.ml.math.functions.IgniteFunction;
+import org.apache.ignite.ml.math.primitives.vector.Vector;
+
+/**
+ * This class represents dataset mapping. This is just a tuple of two mappings: one for features and one for labels.
+ *
+ * @param Type of labels before mapping.
+ * @param Type of labels after mapping.
+ */
+public interface DatasetMapping {

Review comment:
For the moment, no.
[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training URL: https://github.com/apache/ignite/pull/5767#discussion_r247885530 ## File path: modules/ml/src/main/java/org/apache/ignite/ml/composition/bagging/BaggedTrainer.java ## @@ -0,0 +1,200 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.ignite.ml.composition.bagging; + +import java.util.Collections; +import java.util.List; +import java.util.Random; +import java.util.stream.Collectors; +import java.util.stream.IntStream; +import java.util.stream.Stream; +import org.apache.ignite.ml.IgniteModel; +import org.apache.ignite.ml.composition.CompositionUtils; +import org.apache.ignite.ml.composition.combinators.parallel.TrainersParallelComposition; +import org.apache.ignite.ml.composition.predictionsaggregator.PredictionsAggregator; +import org.apache.ignite.ml.dataset.DatasetBuilder; +import org.apache.ignite.ml.environment.LearningEnvironmentBuilder; +import org.apache.ignite.ml.math.functions.IgniteBiFunction; +import org.apache.ignite.ml.math.functions.IgniteFunction; +import org.apache.ignite.ml.math.primitives.vector.Vector; +import org.apache.ignite.ml.math.primitives.vector.VectorUtils; +import org.apache.ignite.ml.trainers.AdaptableDatasetTrainer; +import org.apache.ignite.ml.trainers.DatasetTrainer; +import org.apache.ignite.ml.trainers.transformers.BaggingUpstreamTransformer; +import org.apache.ignite.ml.util.Utils; + +/** + * Trainer encapsulating logic of bootstrap aggregating (bagging). + * This trainer accepts some other trainer and returns bagged version of it. + * Resulting model consists of submodels results of which are aggregated by a specified aggregator. + * Bagging is done + * on both samples and features (https://en.wikipedia.org/wiki/Bootstrap_aggregating";>Samples bagging, + * https://en.wikipedia.org/wiki/Random_subspace_method";>Features bagging). + * + * @param Type of model produced by trainer for which bagged version is created. + * @param Type of labels. + * @param Type of trainer for which bagged version is created. + */ +public class BaggedTrainer, L, T extends DatasetTrainer> extends +DatasetTrainer { +/** Trainer for which bagged version is created. */ +private final DatasetTrainer tr; + +/** Aggregator of submodels results. 
*/ +private final PredictionsAggregator aggregator; + +/** Count of submodels in the ensemble. */ +private final int ensembleSize; + +/** Ratio determining which part of dataset will be taken as subsample for each submodel training. */ +private final double subsampleRatio; + +/** Dimensionality of feature vectors. */ +private final int featuresVectorSize; + +/** Dimension of subspace on which all samples from subsample are projected. */ +private final int featureSubspaceDim; + +/** + * Construct instance of this class with given parameters. + * + * @param tr Trainer for making bagged. + * @param aggregator Aggregator of models. + * @param ensembleSize Size of ensemble. + * @param subsampleRatio Ratio (subsample size) / (initial dataset size). + * @param featuresVectorSize Dimensionality of feature vector. + * @param featureSubspaceDim Dimensionality of feature subspace. + */ +public BaggedTrainer(DatasetTrainer tr, +PredictionsAggregator aggregator, int ensembleSize, double subsampleRatio, int featuresVectorSize, +int featureSubspaceDim) { +this.tr = tr; +this.aggregator = aggregator; +this.ensembleSize = ensembleSize; +this.subsampleRatio = subsampleRatio; +this.featuresVectorSize = featuresVectorSize; +this.featureSubspaceDim = featureSubspaceDim; +} + +/** + * Create trainer bagged trainer. + * + * @return Bagged trainer. + */ +private DatasetTrainer, L> getTrainer() { +List mappings = (featuresVectorSize > 0 && featureSubspaceDim != featuresVectorSize) ? +IntStream.range(0, ensembleSize).mapToObj( +modelIdx -> getMapping( +featuresVectorSize, +featureSubspaceDim, +
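The two bagging parameters documented above, `subsampleRatio` and `featureSubspaceDim`, can be illustrated with a small standalone sketch. It is only an approximation under stated assumptions: the class and method names are hypothetical, rows are drawn uniformly with replacement, and the feature subspace is a random subset of indices. The quoted diff is truncated before the corresponding code, so the pull request may implement both steps differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class BaggingSketch {
    /** Picks featureSubspaceDim distinct feature indices out of featuresVectorSize (sorted). */
    public static int[] featureSubspace(int featuresVectorSize, int featureSubspaceDim, long seed) {
        List<Integer> idxs = IntStream.range(0, featuresVectorSize).boxed()
            .collect(Collectors.toCollection(ArrayList::new));

        Collections.shuffle(idxs, new Random(seed));

        return idxs.stream().limit(featureSubspaceDim).sorted().mapToInt(Integer::intValue).toArray();
    }

    /** Projects a feature vector onto the chosen subspace. */
    public static double[] project(double[] features, int[] subspace) {
        return IntStream.of(subspace).mapToDouble(i -> features[i]).toArray();
    }

    /** Bootstrap subsample: draws round(subsampleRatio * n) rows uniformly with replacement. */
    public static <T> List<T> subsample(List<T> data, double subsampleRatio, long seed) {
        Random rnd = new Random(seed);
        int size = (int)Math.round(subsampleRatio * data.size());
        List<T> res = new ArrayList<>(size);

        for (int i = 0; i < size; i++)
            res.add(data.get(rnd.nextInt(data.size())));

        return res;
    }

    public static void main(String[] args) {
        int[] subspace = featureSubspace(5, 2, 42L);
        double[] projected = project(new double[] {10, 11, 12, 13, 14}, subspace);

        System.out.println(projected.length); // 2
    }
}
```

In an ensemble of `ensembleSize` submodels, each submodel would get its own seed, so that every one of them sees a different subsample and a different feature subspace.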
[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training URL: https://github.com/apache/ignite/pull/5767#discussion_r247874720 ## File path: modules/ml/src/main/java/org/apache/ignite/ml/composition/combinators/sequential/TrainersSequentialComposition.java ## @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.ignite.ml.composition.combinators.sequential; + +import org.apache.ignite.ml.IgniteModel; +import org.apache.ignite.ml.composition.CompositionUtils; +import org.apache.ignite.ml.composition.DatasetMapping; +import org.apache.ignite.ml.dataset.DatasetBuilder; +import org.apache.ignite.ml.math.functions.IgniteBiFunction; +import org.apache.ignite.ml.math.functions.IgniteFunction; +import org.apache.ignite.ml.math.primitives.vector.Vector; +import org.apache.ignite.ml.math.primitives.vector.VectorUtils; +import org.apache.ignite.ml.trainers.DatasetTrainer; +import sun.reflect.generics.reflectiveObjects.NotImplementedException; + +/** + * Sequential composition of trainers. + * Sequential composition of trainers is itself trainer which produces {@link ModelsSequentialComposition}. 
+ * Training is done in following fashion: + * + * 1. First trainer is trained and `mdl1` is produced. + * 2. From `mdl1` {@link DatasetMapping} is constructed. This mapping `dsM` encapsulates dependency between first + * training result and second trainer. + * 3. Second trainer is trained using dataset acquired by applying `dsM` to original dataset; `mdl2` is produced. + * 4. `mdl1` and `mdl2` are composed into {@link ModelsSequentialComposition}. + * + * + * @param Type of input of model produced by first trainer. + * @param Type of output of model produced by first trainer. + * @param Type of output of model produced by second trainer. + * @param Type of labels. + */ +public class TrainersSequentialComposition extends DatasetTrainer, L> { +/** First trainer. */ +private DatasetTrainer, L> tr1; + +/** Second trainer. */ +private DatasetTrainer, L> tr2; + +/** Dataset mapping. */ +private IgniteFunction, DatasetMapping> datasetMapping; + +/** + * Construct sequential composition of given two trainers. + * + * @param tr1 First trainer. + * @param tr2 Second trainer. + * @param datasetMapping Dataset mapping. 
+ */ +public TrainersSequentialComposition(DatasetTrainer, L> tr1, +DatasetTrainer, L> tr2, +IgniteFunction, DatasetMapping> datasetMapping) { +this.tr1 = CompositionUtils.unsafeCoerce(tr1); +this.tr2 = CompositionUtils.unsafeCoerce(tr2); +this.datasetMapping = datasetMapping; +} + +/** {@inheritDoc} */ +@Override public ModelsSequentialComposition fit(DatasetBuilder datasetBuilder, +IgniteBiFunction featureExtractor, IgniteBiFunction lbExtractor) { + +IgniteModel mdl1 = tr1.fit(datasetBuilder, featureExtractor, lbExtractor); +DatasetMapping mapping = datasetMapping.apply(mdl1); + +IgniteModel mdl2 = tr2.fit(datasetBuilder, +featureExtractor.andThen(mapping::mapFeatures), +lbExtractor.andThen(mapping::mapLabels)); + +return new ModelsSequentialComposition<>(mdl1, mdl2); +} + +/** {@inheritDoc} */ +@Override public ModelsSequentialComposition update( +ModelsSequentialComposition mdl, DatasetBuilder datasetBuilder, +IgniteBiFunction featureExtractor, IgniteBiFunction lbExtractor) { + +IgniteModel firstUpdated = tr1.update(mdl.firstModel(), datasetBuilder, featureExtractor, lbExtractor); +DatasetMapping mapping = datasetMapping.apply(firstUpdated); + +IgniteModel secondUpdated = tr2.update(mdl.secondModel(), +datasetBuilder, +featureExtractor.andThen(mapping::mapFeatures), +lbExtractor.andThen(mapping::mapLabels)); + +return new ModelsSequentialComposition<>(firstUpdated, secondUpdated); +} + +/** {@inheritDoc} */ +@Override protected boolean checkState(ModelsSequentialComposition mdl) { +// Never called. +throw new Illeg
[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training URL: https://github.com/apache/ignite/pull/5767#discussion_r247885547 ## File path: modules/ml/src/main/java/org/apache/ignite/ml/composition/bagging/BaggedTrainer.java ## @@ -0,0 +1,200 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.ignite.ml.composition.bagging; + +import java.util.Collections; +import java.util.List; +import java.util.Random; +import java.util.stream.Collectors; +import java.util.stream.IntStream; +import java.util.stream.Stream; +import org.apache.ignite.ml.IgniteModel; +import org.apache.ignite.ml.composition.CompositionUtils; +import org.apache.ignite.ml.composition.combinators.parallel.TrainersParallelComposition; +import org.apache.ignite.ml.composition.predictionsaggregator.PredictionsAggregator; +import org.apache.ignite.ml.dataset.DatasetBuilder; +import org.apache.ignite.ml.environment.LearningEnvironmentBuilder; +import org.apache.ignite.ml.math.functions.IgniteBiFunction; +import org.apache.ignite.ml.math.functions.IgniteFunction; +import org.apache.ignite.ml.math.primitives.vector.Vector; +import org.apache.ignite.ml.math.primitives.vector.VectorUtils; +import org.apache.ignite.ml.trainers.AdaptableDatasetTrainer; +import org.apache.ignite.ml.trainers.DatasetTrainer; +import org.apache.ignite.ml.trainers.transformers.BaggingUpstreamTransformer; +import org.apache.ignite.ml.util.Utils; + +/** + * Trainer encapsulating logic of bootstrap aggregating (bagging). + * This trainer accepts some other trainer and returns bagged version of it. + * Resulting model consists of submodels results of which are aggregated by a specified aggregator. + * Bagging is done + * on both samples and features (https://en.wikipedia.org/wiki/Bootstrap_aggregating";>Samples bagging, + * https://en.wikipedia.org/wiki/Random_subspace_method";>Features bagging). + * + * @param Type of model produced by trainer for which bagged version is created. + * @param Type of labels. + * @param Type of trainer for which bagged version is created. + */ +public class BaggedTrainer, L, T extends DatasetTrainer> extends +DatasetTrainer { +/** Trainer for which bagged version is created. */ +private final DatasetTrainer tr; + +/** Aggregator of submodels results. 
*/ +private final PredictionsAggregator aggregator; + +/** Count of submodels in the ensemble. */ +private final int ensembleSize; + +/** Ratio determining which part of dataset will be taken as subsample for each submodel training. */ +private final double subsampleRatio; + +/** Dimensionality of feature vectors. */ +private final int featuresVectorSize; + +/** Dimension of subspace on which all samples from subsample are projected. */ +private final int featureSubspaceDim; + +/** + * Construct instance of this class with given parameters. + * + * @param tr Trainer for making bagged. + * @param aggregator Aggregator of models. + * @param ensembleSize Size of ensemble. + * @param subsampleRatio Ratio (subsample size) / (initial dataset size). + * @param featuresVectorSize Dimensionality of feature vector. + * @param featureSubspaceDim Dimensionality of feature subspace. + */ +public BaggedTrainer(DatasetTrainer tr, +PredictionsAggregator aggregator, int ensembleSize, double subsampleRatio, int featuresVectorSize, +int featureSubspaceDim) { +this.tr = tr; +this.aggregator = aggregator; +this.ensembleSize = ensembleSize; +this.subsampleRatio = subsampleRatio; +this.featuresVectorSize = featuresVectorSize; +this.featureSubspaceDim = featureSubspaceDim; +} + +/** + * Create trainer bagged trainer. + * + * @return Bagged trainer. + */ +private DatasetTrainer, L> getTrainer() { +List mappings = (featuresVectorSize > 0 && featureSubspaceDim != featuresVectorSize) ? +IntStream.range(0, ensembleSize).mapToObj( +modelIdx -> getMapping( +featuresVectorSize, +featureSubspaceDim, +
[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training URL: https://github.com/apache/ignite/pull/5767#discussion_r247874720 ## File path: modules/ml/src/main/java/org/apache/ignite/ml/composition/combinators/sequential/TrainersSequentialComposition.java ## @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.ignite.ml.composition.combinators.sequential; + +import org.apache.ignite.ml.IgniteModel; +import org.apache.ignite.ml.composition.CompositionUtils; +import org.apache.ignite.ml.composition.DatasetMapping; +import org.apache.ignite.ml.dataset.DatasetBuilder; +import org.apache.ignite.ml.math.functions.IgniteBiFunction; +import org.apache.ignite.ml.math.functions.IgniteFunction; +import org.apache.ignite.ml.math.primitives.vector.Vector; +import org.apache.ignite.ml.math.primitives.vector.VectorUtils; +import org.apache.ignite.ml.trainers.DatasetTrainer; +import sun.reflect.generics.reflectiveObjects.NotImplementedException; + +/** + * Sequential composition of trainers. + * Sequential composition of trainers is itself trainer which produces {@link ModelsSequentialComposition}. 
+ * Training is done in the following fashion: + * + * 1. First trainer is trained and `mdl1` is produced. + * 2. From `mdl1` {@link DatasetMapping} is constructed. This mapping `dsM` encapsulates dependency between first + * training result and second trainer. + * 3. Second trainer is trained using dataset acquired by applying `dsM` to original dataset; `mdl2` is produced. + * 4. `mdl1` and `mdl2` are composed into {@link ModelsSequentialComposition}. + * + * + * @param Type of input of model produced by first trainer. + * @param Type of output of model produced by first trainer. + * @param Type of output of model produced by second trainer. + * @param Type of labels. + */ +public class TrainersSequentialComposition extends DatasetTrainer, L> { +/** First trainer. */ +private DatasetTrainer, L> tr1; + +/** Second trainer. */ +private DatasetTrainer, L> tr2; + +/** Dataset mapping. */ +private IgniteFunction, DatasetMapping> datasetMapping; + +/** + * Construct sequential composition of given two trainers. + * + * @param tr1 First trainer. + * @param tr2 Second trainer. + * @param datasetMapping Dataset mapping. 
+ */ +public TrainersSequentialComposition(DatasetTrainer, L> tr1, +DatasetTrainer, L> tr2, +IgniteFunction, DatasetMapping> datasetMapping) { +this.tr1 = CompositionUtils.unsafeCoerce(tr1); +this.tr2 = CompositionUtils.unsafeCoerce(tr2); +this.datasetMapping = datasetMapping; +} + +/** {@inheritDoc} */ +@Override public ModelsSequentialComposition fit(DatasetBuilder datasetBuilder, +IgniteBiFunction featureExtractor, IgniteBiFunction lbExtractor) { + +IgniteModel mdl1 = tr1.fit(datasetBuilder, featureExtractor, lbExtractor); +DatasetMapping mapping = datasetMapping.apply(mdl1); + +IgniteModel mdl2 = tr2.fit(datasetBuilder, +featureExtractor.andThen(mapping::mapFeatures), +lbExtractor.andThen(mapping::mapLabels)); + +return new ModelsSequentialComposition<>(mdl1, mdl2); +} + +/** {@inheritDoc} */ +@Override public ModelsSequentialComposition update( +ModelsSequentialComposition mdl, DatasetBuilder datasetBuilder, +IgniteBiFunction featureExtractor, IgniteBiFunction lbExtractor) { + +IgniteModel firstUpdated = tr1.update(mdl.firstModel(), datasetBuilder, featureExtractor, lbExtractor); +DatasetMapping mapping = datasetMapping.apply(firstUpdated); + +IgniteModel secondUpdated = tr2.update(mdl.secondModel(), +datasetBuilder, +featureExtractor.andThen(mapping::mapFeatures), +lbExtractor.andThen(mapping::mapLabels)); + +return new ModelsSequentialComposition<>(firstUpdated, secondUpdated); +} + +/** {@inheritDoc} */ +@Override protected boolean checkState(ModelsSequentialComposition mdl) { +// Never called. +throw new Illeg
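The four training steps listed in the Javadoc of the quoted diff can be sketched roughly as follows. This is only an illustration under simplifying assumptions: `Mapping`, `Trainer`, `MEAN_TRAINER` and `fitSequential` are hypothetical stand-ins that work on plain `double` arrays, not the Ignite `DatasetMapping`, `DatasetTrainer` or `ModelsSequentialComposition` types, and the final composition is assumed to be "first model, then second model".

```java
import java.util.function.Function;

/** Simplified sketch of sequential trainer composition; all names here are hypothetical. */
final class SequentialCompositionSketch {
    /** Stand-in for a dataset mapping: one mapping for features, one for labels. */
    interface Mapping {
        double mapFeature(double x);
        double mapLabel(double y);
    }

    /** Stand-in for a trainer: fits a model (a function) on parallel feature/label arrays. */
    interface Trainer {
        Function<Double, Double> fit(double[] xs, double[] ys);
    }

    /** Toy trainer producing a constant model that predicts the mean label. */
    static final Trainer MEAN_TRAINER = (xs, ys) -> {
        double sum = 0;
        for (double y : ys)
            sum += y;
        double mean = ys.length == 0 ? 0 : sum / ys.length;
        return x -> mean;
    };

    static Function<Double, Double> fitSequential(Trainer tr1, Trainer tr2,
        Function<Function<Double, Double>, Mapping> datasetMapping, double[] xs, double[] ys) {
        // Step 1: train the first trainer on the original data.
        Function<Double, Double> mdl1 = tr1.fit(xs, ys);

        // Step 2: build the dataset mapping from the first model.
        Mapping mapping = datasetMapping.apply(mdl1);

        // Step 3: train the second trainer on the mapped features and labels.
        double[] xs2 = new double[xs.length];
        double[] ys2 = new double[ys.length];
        for (int i = 0; i < xs.length; i++) {
            xs2[i] = mapping.mapFeature(xs[i]);
            ys2[i] = mapping.mapLabel(ys[i]);
        }
        Function<Double, Double> mdl2 = tr2.fit(xs2, ys2);

        // Step 4: compose both models (assumed here: apply mdl1, then mdl2).
        return mdl1.andThen(mdl2);
    }
}
```

In the quoted diff the mapping is applied lazily through `featureExtractor.andThen(mapping::mapFeatures)` and `lbExtractor.andThen(mapping::mapLabels)` rather than by copying arrays; the sketch materializes the mapped data only to stay self-contained.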
[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training URL: https://github.com/apache/ignite/pull/5767#discussion_r247873164 ## File path: modules/ml/src/main/java/org/apache/ignite/ml/composition/CompositionUtils.java ## @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.ignite.ml.composition; + +import org.apache.ignite.ml.IgniteModel; +import org.apache.ignite.ml.dataset.DatasetBuilder; +import org.apache.ignite.ml.math.functions.IgniteBiFunction; +import org.apache.ignite.ml.math.primitives.vector.Vector; +import org.apache.ignite.ml.trainers.DatasetTrainer; + +/** + * Various utility functions for trainers composition. + */ +public class CompositionUtils { +/** + * Perform blurring of model type of given trainer to {@code IgniteModel}, where I, O are input and output + * types of original model. + * + * @param trainer Trainer to coerce. + * @param Type of input of model produced by coerced trainer. + * @param Type of output of model produced by coerced trainer. + * @param Type of model produced by coerced trainer. + * @param Type of labels. + * @return Trainer coerced to {@code DatasetTrainer, L>}. 
+ */ +public static , L> DatasetTrainer, L> unsafeCoerce( +DatasetTrainer trainer) { +return new DatasetTrainer, L>() { +/** {@inheritDoc} */ +@Override public IgniteModel fit(DatasetBuilder datasetBuilder, +IgniteBiFunction featureExtractor, IgniteBiFunction lbExtractor) { +return trainer.fit(datasetBuilder, featureExtractor, lbExtractor); +} + +/** {@inheritDoc} */ +@Override public IgniteModel update(IgniteModel mdl, DatasetBuilder datasetBuilder, +IgniteBiFunction featureExtractor, IgniteBiFunction lbExtractor) { +DatasetTrainer, L> trainer1 = (DatasetTrainer, L>)trainer; +return trainer1.update(mdl, datasetBuilder, featureExtractor, lbExtractor); +} + +/** {@inheritDoc} */ +@Override protected boolean checkState(IgniteModel mdl) { +return true; +} + +/** {@inheritDoc} */ +@Override protected IgniteModel updateModel(IgniteModel mdl, DatasetBuilder datasetBuilder, +IgniteBiFunction featureExtractor, IgniteBiFunction lbExtractor) { +return null; Review comment: Done, see https://github.com/apache/ignite/pull/5767#discussion_r247873122 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training URL: https://github.com/apache/ignite/pull/5767#discussion_r247873122 ## File path: modules/ml/src/main/java/org/apache/ignite/ml/composition/CompositionUtils.java ## @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.ignite.ml.composition; + +import org.apache.ignite.ml.IgniteModel; +import org.apache.ignite.ml.dataset.DatasetBuilder; +import org.apache.ignite.ml.math.functions.IgniteBiFunction; +import org.apache.ignite.ml.math.primitives.vector.Vector; +import org.apache.ignite.ml.trainers.DatasetTrainer; + +/** + * Various utility functions for trainers composition. + */ +public class CompositionUtils { +/** + * Perform blurring of model type of given trainer to {@code IgniteModel}, where I, O are input and output + * types of original model. + * + * @param trainer Trainer to coerce. + * @param Type of input of model produced by coerced trainer. + * @param Type of output of model produced by coerced trainer. + * @param Type of model produced by coerced trainer. + * @param Type of labels. + * @return Trainer coerced to {@code DatasetTrainer, L>}. 
+ */ +public static , L> DatasetTrainer, L> unsafeCoerce( +DatasetTrainer trainer) { +return new DatasetTrainer, L>() { +/** {@inheritDoc} */ +@Override public IgniteModel fit(DatasetBuilder datasetBuilder, +IgniteBiFunction featureExtractor, IgniteBiFunction lbExtractor) { +return trainer.fit(datasetBuilder, featureExtractor, lbExtractor); +} + +/** {@inheritDoc} */ +@Override public IgniteModel update(IgniteModel mdl, DatasetBuilder datasetBuilder, +IgniteBiFunction featureExtractor, IgniteBiFunction lbExtractor) { +DatasetTrainer, L> trainer1 = (DatasetTrainer, L>)trainer; +return trainer1.update(mdl, datasetBuilder, featureExtractor, lbExtractor); +} + +/** {@inheritDoc} */ +@Override protected boolean checkState(IgniteModel mdl) { +return true; Review comment: This method is never called. It now throws an exception to make that clearer.
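To illustrate the point of the comment above: when a wrapper overrides the public `update` and delegates it, the protected hooks of the wrapper itself are never reached, so throwing from them documents that fact better than returning a dummy value. The sketch below assumes a simplified template-method base class; `Trainer`, `wrap` and the choice of `IllegalStateException` are assumptions made for illustration, not the Ignite API.

```java
/** Sketch of a delegating wrapper whose protected hooks are unreachable; names are hypothetical. */
final class NeverCalledHookSketch {
    /** Simplified stand-in for a trainer base class with a template method. */
    abstract static class Trainer<M> {
        /** Template method: update a valid model in place, otherwise train from scratch. */
        public M update(M mdl) {
            return mdl != null && checkState(mdl) ? updateModel(mdl) : fit();
        }

        public abstract M fit();

        protected abstract boolean checkState(M mdl);

        protected abstract M updateModel(M mdl);
    }

    /** Wraps a trainer and delegates only the public entry points. */
    static <M> Trainer<M> wrap(Trainer<M> delegate) {
        return new Trainer<M>() {
            @Override public M fit() {
                return delegate.fit();
            }

            @Override public M update(M mdl) {
                // Delegation bypasses the wrapper's own checkState/updateModel.
                return delegate.update(mdl);
            }

            @Override protected boolean checkState(M mdl) {
                // Unreachable through the public API of the wrapper.
                throw new IllegalStateException("checkState is never called on the wrapper");
            }

            @Override protected M updateModel(M mdl) {
                // Unreachable through the public API of the wrapper.
                throw new IllegalStateException("updateModel is never called on the wrapper");
            }
        };
    }
}
```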
[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training URL: https://github.com/apache/ignite/pull/5767#discussion_r247870918 ## File path: modules/ml/src/main/java/org/apache/ignite/ml/composition/combinators/parallel/ModelsParallelComposition.java ## @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.ignite.ml.composition.combinators.parallel; + +import java.util.ArrayList; +import java.util.List; +import java.util.stream.Collectors; +import org.apache.ignite.ml.IgniteModel; + +/** + * Parallel composition of models. + * Parallel composition of models is a model which contains a list of submodels with same input and output types. + * Result of prediction in such model is a list of predictions of each of submodels. + * + * @param Type of submodel input. + * @param Type of submodel output. + */ +public class ModelsParallelComposition implements IgniteModel> { +/** List of submodels. */ +private final List> submodels; + +/** + * Construct an instance of this class from list of submodels. + * + * @param submodels List of submodels constituting this model. 
+ */ +public ModelsParallelComposition(List> submodels) { +this.submodels = submodels; +} + +/** {@inheritDoc} */ +@Override public List predict(I i) { +return submodels +.stream() +.map(m -> m.predict(i)) +.collect(Collectors.toList()); +} + +/** + * List of submodels constituting this model. + * + * @return List of submodels constituting this model. + */ +public List> submodels() { +return new ArrayList<>(submodels); Review comment: Yeah, agree.
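For readers without the full file at hand, the behaviour described in the quoted Javadoc (prediction is the list of submodel predictions, and the accessor hands out a copy) can be sketched as below. `ParallelCompositionSketch` is a hypothetical stand-in built on `java.util.function.Function`, not the Ignite `ModelsParallelComposition` class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Sketch of a parallel composition of models; a stand-in, not the Ignite class. */
final class ParallelCompositionSketch<I, O> {
    /** Submodels sharing the same input and output types. */
    private final List<Function<I, O>> submodels;

    ParallelCompositionSketch(List<Function<I, O>> submodels) {
        this.submodels = submodels;
    }

    /** Prediction of the composition is the list of predictions of all submodels, in order. */
    List<O> predict(I input) {
        return submodels
            .stream()
            .map(m -> m.apply(input))
            .collect(Collectors.toList());
    }

    /** Returns a copy, so callers cannot change the composition through the returned list. */
    List<Function<I, O>> submodels() {
        return new ArrayList<>(submodels);
    }
}
```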
[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training URL: https://github.com/apache/ignite/pull/5767#discussion_r247869073 ## File path: modules/ml/src/main/java/org/apache/ignite/ml/composition/bagging/BaggedTrainer.java ## @@ -0,0 +1,200 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.ignite.ml.composition.bagging; + +import java.util.Collections; +import java.util.List; +import java.util.Random; +import java.util.stream.Collectors; +import java.util.stream.IntStream; +import java.util.stream.Stream; +import org.apache.ignite.ml.IgniteModel; +import org.apache.ignite.ml.composition.CompositionUtils; +import org.apache.ignite.ml.composition.combinators.parallel.TrainersParallelComposition; +import org.apache.ignite.ml.composition.predictionsaggregator.PredictionsAggregator; +import org.apache.ignite.ml.dataset.DatasetBuilder; +import org.apache.ignite.ml.environment.LearningEnvironmentBuilder; +import org.apache.ignite.ml.math.functions.IgniteBiFunction; +import org.apache.ignite.ml.math.functions.IgniteFunction; +import org.apache.ignite.ml.math.primitives.vector.Vector; +import org.apache.ignite.ml.math.primitives.vector.VectorUtils; +import org.apache.ignite.ml.trainers.AdaptableDatasetTrainer; +import org.apache.ignite.ml.trainers.DatasetTrainer; +import org.apache.ignite.ml.trainers.transformers.BaggingUpstreamTransformer; +import org.apache.ignite.ml.util.Utils; + +/** + * Trainer encapsulating logic of bootstrap aggregating (bagging). + * This trainer accepts some other trainer and returns bagged version of it. + * Resulting model consists of submodels results of which are aggregated by a specified aggregator. + * Bagging is done + * on both samples and features (https://en.wikipedia.org/wiki/Bootstrap_aggregating";>Samples bagging, + * https://en.wikipedia.org/wiki/Random_subspace_method";>Features bagging). + * + * @param Type of model produced by trainer for which bagged version is created. + * @param Type of labels. + * @param Type of trainer for which bagged version is created. + */ +public class BaggedTrainer, L, T extends DatasetTrainer> extends +DatasetTrainer { +/** Trainer for which bagged version is created. */ +private final DatasetTrainer tr; + +/** Aggregator of submodels results. 
*/ +private final PredictionsAggregator aggregator; + +/** Count of submodels in the ensemble. */ +private final int ensembleSize; + +/** Ratio determining which part of dataset will be taken as subsample for each submodel training. */ +private final double subsampleRatio; + +/** Dimensionality of feature vectors. */ +private final int featuresVectorSize; + +/** Dimension of subspace on which all samples from subsample are projected. */ +private final int featureSubspaceDim; + +/** + * Construct instance of this class with given parameters. + * + * @param tr Trainer for making bagged. + * @param aggregator Aggregator of models. + * @param ensembleSize Size of ensemble. + * @param subsampleRatio Ratio (subsample size) / (initial dataset size). + * @param featuresVectorSize Dimensionality of feature vector. + * @param featureSubspaceDim Dimensionality of feature subspace. + */ +public BaggedTrainer(DatasetTrainer tr, +PredictionsAggregator aggregator, int ensembleSize, double subsampleRatio, int featuresVectorSize, +int featureSubspaceDim) { +this.tr = tr; +this.aggregator = aggregator; +this.ensembleSize = ensembleSize; +this.subsampleRatio = subsampleRatio; +this.featuresVectorSize = featuresVectorSize; +this.featureSubspaceDim = featureSubspaceDim; +} + +/** + * Create trainer bagged trainer. + * + * @return Bagged trainer. + */ +private DatasetTrainer, L> getTrainer() { +List mappings = (featuresVectorSize > 0 && featureSubspaceDim != featuresVectorSize) ? +IntStream.range(0, ensembleSize).mapToObj( +modelIdx -> getMapping( +featuresVectorSize, +featureSubspaceDim, +
[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training URL: https://github.com/apache/ignite/pull/5767#discussion_r247868951 ## File path: modules/ml/src/main/java/org/apache/ignite/ml/composition/bagging/BaggedModel.java ## @@ -0,0 +1,57 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.ignite.ml.composition.bagging; + +import org.apache.ignite.ml.IgniteModel; +import org.apache.ignite.ml.math.primitives.vector.Vector; + +/** + * This class represents model produced by {@link BaggedTrainer}. + * It is a wrapper around inner representation of model produced by {@link BaggedTrainer}. + */ +public class BaggedModel implements IgniteModel { Review comment: Yes, we could do that. But having dropped fully type-safe bagged models because of the heavy-looking generics, I decided to keep at least some type safety by making this wrapper.
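A rough sketch of the trade-off mentioned in the comment above: instead of exposing the nested generic type of the inner model, a thin named wrapper gives callers a distinct type to depend on. `BaggedModelSketch` and its `Function<double[], Double>` inner type are assumptions for illustration only; the real `BaggedModel` wraps Ignite types.

```java
import java.util.function.Function;

/** Sketch of a thin wrapper giving a named type to an otherwise generic inner model. */
final class BaggedModelSketch implements Function<double[], Double> {
    /** Inner representation; its (possibly heavy) generic type stays hidden from callers. */
    private final Function<double[], Double> mdl;

    BaggedModelSketch(Function<double[], Double> mdl) {
        this.mdl = mdl;
    }

    /** Delegates prediction to the inner model. */
    @Override public Double apply(double[] features) {
        return mdl.apply(features);
    }
}
```

Code that should accept only bagged models can then declare `BaggedModelSketch` in its signature, which a plain `Function<double[], Double>` would not enforce.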
[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training URL: https://github.com/apache/ignite/pull/5767#discussion_r247868450 ## File path: modules/ml/src/main/java/org/apache/ignite/ml/composition/DatasetMapping.java ## @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.ignite.ml.composition; + +import org.apache.ignite.ml.math.functions.IgniteFunction; +import org.apache.ignite.ml.math.primitives.vector.Vector; + +/** + * This class represents dataset mapping. This is just a tuple of two mappings: one for features and one for labels. + * + * @param Type of labels before mapping. + * @param Type of labels after mapping. + */ +public interface DatasetMapping { +/** + * Method used to map feature vectors. + * + * @param v Feature vector. + * @return Mapped feature vector. + */ +public default Vector mapFeatures(Vector v) { Review comment: Because there is no sensible default mapping `L1 -> L2`, whereas for `Vector -> Vector` there is the identity (`id`).
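The asymmetry described in the comment above can be sketched as an interface with a default method for features and an abstract method for labels. This is a minimal illustration, assuming `double[]` in place of the Ignite `Vector`; `DatasetMappingSketch.Mapping` is a hypothetical name, not the real `DatasetMapping` interface.

```java
/** Sketch of a dataset mapping with an identity default for features only. */
final class DatasetMappingSketch {
    /**
     * @param <L1> Type of labels before mapping.
     * @param <L2> Type of labels after mapping.
     */
    interface Mapping<L1, L2> {
        /** Features map to the same type, so the identity is a sensible default. */
        default double[] mapFeatures(double[] v) {
            return v;
        }

        /** No sensible default exists for arbitrary L1 and L2, so implementors must provide one. */
        L2 mapLabels(L1 lb);
    }
}
```

Since only `mapLabels` is abstract, such a mapping can be written as a lambda, for example `DatasetMappingSketch.Mapping<Integer, String> m = lb -> "class-" + lb;`, and the features pass through unchanged.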