[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training

2019-01-15 Thread GitBox
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: 
Consistent API for Ensemble training
URL: https://github.com/apache/ignite/pull/5767#discussion_r247903540
 
 

 ##
 File path: modules/ml/src/main/java/org/apache/ignite/ml/trainers/AdaptableDatasetTrainer.java
 ##
 @@ -56,27 +68,46 @@
  * @param <L> Type of labels.
  * @return Instance of this class.
  */
-public static <I, O, M extends IgniteModel<I, O>, L> AdaptableDatasetTrainer<I, O, I, O, M, L> of(DatasetTrainer<M, L> wrapped) {
-    return new AdaptableDatasetTrainer<>(IgniteFunction.identity(), wrapped, IgniteFunction.identity());
+public static <I, O, M extends IgniteModel<I, O>, L> AdaptableDatasetTrainer<I, O, I, O, M, L> of(
 
 Review comment:
   Sorry...)




[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training

2019-01-15 Thread GitBox
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: 
Consistent API for Ensemble training
URL: https://github.com/apache/ignite/pull/5767#discussion_r247901935
 
 

 ##
 File path: modules/ml/src/main/java/org/apache/ignite/ml/composition/bagging/BaggedTrainer.java
 ##
 @@ -0,0 +1,200 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.ml.composition.bagging;
+
+import java.util.Collections;
+import java.util.List;
+import java.util.Random;
+import java.util.stream.Collectors;
+import java.util.stream.IntStream;
+import java.util.stream.Stream;
+import org.apache.ignite.ml.IgniteModel;
+import org.apache.ignite.ml.composition.CompositionUtils;
+import org.apache.ignite.ml.composition.combinators.parallel.TrainersParallelComposition;
+import org.apache.ignite.ml.composition.predictionsaggregator.PredictionsAggregator;
+import org.apache.ignite.ml.dataset.DatasetBuilder;
+import org.apache.ignite.ml.environment.LearningEnvironmentBuilder;
+import org.apache.ignite.ml.math.functions.IgniteBiFunction;
+import org.apache.ignite.ml.math.functions.IgniteFunction;
+import org.apache.ignite.ml.math.primitives.vector.Vector;
+import org.apache.ignite.ml.math.primitives.vector.VectorUtils;
+import org.apache.ignite.ml.trainers.AdaptableDatasetTrainer;
+import org.apache.ignite.ml.trainers.DatasetTrainer;
+import org.apache.ignite.ml.trainers.transformers.BaggingUpstreamTransformer;
+import org.apache.ignite.ml.util.Utils;
+
+/**
+ * Trainer encapsulating logic of bootstrap aggregating (bagging).
+ * This trainer accepts some other trainer and returns bagged version of it.
+ * Resulting model consists of submodels, results of which are aggregated by a specified aggregator.
+ * Bagging is done on both samples and features
+ * (<a href="https://en.wikipedia.org/wiki/Bootstrap_aggregating">Samples bagging</a>,
+ * <a href="https://en.wikipedia.org/wiki/Random_subspace_method">Features bagging</a>).
+ *
+ * @param <M> Type of model produced by trainer for which bagged version is created.
+ * @param <L> Type of labels.
+ * @param <T> Type of trainer for which bagged version is created.
+ */
+public class BaggedTrainer<M extends IgniteModel<Vector, Double>, L, T extends DatasetTrainer<M, L>> extends
+    DatasetTrainer<BaggedModel, L> {
+    /** Trainer for which bagged version is created. */
+    private final DatasetTrainer<M, L> tr;
+
+    /** Aggregator of submodels results. */
+    private final PredictionsAggregator aggregator;
+
+    /** Count of submodels in the ensemble. */
+    private final int ensembleSize;
+
+    /** Ratio determining which part of dataset will be taken as subsample for each submodel training. */
+    private final double subsampleRatio;
+
+    /** Dimensionality of feature vectors. */
+    private final int featuresVectorSize;
+
+    /** Dimension of subspace on which all samples from subsample are projected. */
+    private final int featureSubspaceDim;
+
+    /**
+     * Construct instance of this class with given parameters.
+     *
+     * @param tr Trainer for which bagged version is created.
+     * @param aggregator Aggregator of models.
+     * @param ensembleSize Size of ensemble.
+     * @param subsampleRatio Ratio (subsample size) / (initial dataset size).
+     * @param featuresVectorSize Dimensionality of feature vector.
+     * @param featureSubspaceDim Dimensionality of feature subspace.
+     */
+    public BaggedTrainer(DatasetTrainer<M, L> tr,
+        PredictionsAggregator aggregator, int ensembleSize, double subsampleRatio, int featuresVectorSize,
+        int featureSubspaceDim) {
+        this.tr = tr;
+        this.aggregator = aggregator;
+        this.ensembleSize = ensembleSize;
+        this.subsampleRatio = subsampleRatio;
+        this.featuresVectorSize = featuresVectorSize;
+        this.featureSubspaceDim = featureSubspaceDim;
+    }
+
+    /**
+     * Create bagged trainer.
+     *
+     * @return Bagged trainer.
+     */
+    private DatasetTrainer<IgniteModel<Vector, Double>, L> getTrainer() {
+        List<int[]> mappings = (featuresVectorSize > 0 && featureSubspaceDim != featuresVectorSize) ?
+            IntStream.range(0, ensembleSize).mapToObj(
+                modelIdx -> getMapping(
+                    featuresVectorSize,
+                    featureSubspaceDim,
+ 
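
For context, the feature bagging mentioned in the javadoc above amounts to projecting each feature vector onto a randomly chosen coordinate subspace. A minimal plain-Java sketch of that idea, assuming the mapping produced by `getMapping` is an index array (`FeatureSubspaceSketch`, `getMappingSketch` and `project` are hypothetical names, not part of the PR):

import java.util.Random;

public class FeatureSubspaceSketch {
    /** Choose featureSubspaceDim coordinates out of featuresVectorSize (sampling with replacement for brevity). */
    static int[] getMappingSketch(int featuresVectorSize, int featureSubspaceDim, long seed) {
        Random rnd = new Random(seed);
        int[] mapping = new int[featureSubspaceDim];
        for (int i = 0; i < featureSubspaceDim; i++)
            mapping[i] = rnd.nextInt(featuresVectorSize);
        return mapping;
    }

    /** Project a dense feature vector onto the subspace described by the mapping. */
    static double[] project(double[] features, int[] mapping) {
        double[] res = new double[mapping.length];
        for (int i = 0; i < mapping.length; i++)
            res[i] = features[mapping[i]];
        return res;
    }
}

Each submodel gets its own seed-derived mapping, so the ensemble members see different feature subspaces.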

[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training

2019-01-15 Thread GitBox
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: 
Consistent API for Ensemble training
URL: https://github.com/apache/ignite/pull/5767#discussion_r247901728
 
 

 ##
 File path: modules/ml/src/main/java/org/apache/ignite/ml/composition/stacking/StackedDatasetTrainer.java
 ##
 @@ -254,62 +233,23 @@ public StackedDatasetTrainer() {
     IgniteBiFunction<K, V, Vector> featureExtractor,
     IgniteBiFunction<K, V, L> lbExtractor) {
 
-    return update(null, datasetBuilder, featureExtractor, lbExtractor);
+    return new StackedModel<>(getTrainer().fit(datasetBuilder, featureExtractor, lbExtractor));
 }
 
 /** {@inheritDoc} */
 @Override public <K, V> StackedModel<IS, IA, O, AM> update(StackedModel<IS, IA, O, AM> mdl,
     DatasetBuilder<K, V> datasetBuilder, IgniteBiFunction<K, V, Vector> featureExtractor,
     IgniteBiFunction<K, V, L> lbExtractor) {
-    return runOnSubmodels(
-        ensemble -> {
-            List<IgniteSupplier<IgniteModel<IS, IA>>> res = new ArrayList<>();
-            for (int i = 0; i < ensemble.size(); i++) {
-                final int j = i;
-                res.add(() -> {
-                    DatasetTrainer<IgniteModel<IS, IA>, L> trainer = ensemble.get(j);
-                    return mdl == null ?
-                        trainer.fit(datasetBuilder, featureExtractor, lbExtractor) :
-                        trainer.update(mdl.submodels().get(j), datasetBuilder, featureExtractor, lbExtractor);
-                });
-            }
-            return res;
-        },
-        (at, extr) -> mdl == null ?
-            at.fit(datasetBuilder, extr, lbExtractor) :
-            at.update(mdl.aggregatorModel(), datasetBuilder, extr, lbExtractor),
-        featureExtractor
-    );
-}
 
-/** {@inheritDoc} */
-@Override public StackedDatasetTrainer<IS, IA, O, AM, L> withEnvironmentBuilder(
-    LearningEnvironmentBuilder envBuilder) {
-    submodelsTrainers =
-        submodelsTrainers.stream().map(x -> x.withEnvironmentBuilder(envBuilder)).collect(Collectors.toList());
-    aggregatorTrainer = aggregatorTrainer.withEnvironmentBuilder(envBuilder);
-
-    return this;
+    return new StackedModel<>(getTrainer().update(mdl, datasetBuilder, featureExtractor, lbExtractor));
 }
 
 /**
- * <ol>
- * <li>1. Obtain models produced by running specified tasks;</li>
- * <li>2. run other specified task on dataset augmented with results of models from step 2.</li>
- * </ol>
+ * Get the trainer for stacking.
  *
- * @param taskSupplier Function used to generate tasks for first step.
- * @param aggregatorProcessor Function used
- * @param featureExtractor Feature extractor.
- * @param <K> Type of keys in upstream.
- * @param <V> Type of values in upstream.
- * @return {@link StackedModel}.
+ * @return Trainer for stacking.
  */
-private <K, V> StackedModel<IS, IA, O, AM> runOnSubmodels(
-    IgniteFunction<List<DatasetTrainer<IgniteModel<IS, IA>, L>>, List<IgniteSupplier<IgniteModel<IS, IA>>>> taskSupplier,
-    IgniteBiFunction<DatasetTrainer<AM, L>, IgniteBiFunction<K, V, Vector>, AM> aggregatorProcessor,
-    IgniteBiFunction<K, V, Vector> featureExtractor) {
-
+private DatasetTrainer<StackedModel<IS, IA, O, AM>, L> getTrainer() {
 
 Review comment:
   Separated consistency checking into a separate method.
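
The shape of this refactoring is easy to see outside of Ignite: the composed trainer is built once by a `getTrainer()`-style method, and both `fit` and `update` simply delegate to it, so consistency checks live in one place. A minimal sketch with hypothetical `Trainer` and `StackedTrainerSketch` types standing in for the real classes:

interface Trainer<M> {
    M fit();

    M update(M mdl);
}

/** Both entry points delegate to one composed trainer built by a getTrainer()-style method. */
final class StackedTrainerSketch<M> implements Trainer<M> {
    private final Trainer<M> composed;

    StackedTrainerSketch(Trainer<M> composed) {
        this.composed = composed;
    }

    @Override public M fit() {
        return composed.fit(); // fit delegates to the composition
    }

    @Override public M update(M mdl) {
        return composed.update(mdl); // update delegates too, keeping both paths consistent
    }
}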




[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training

2019-01-15 Thread GitBox
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: 
Consistent API for Ensemble training
URL: https://github.com/apache/ignite/pull/5767#discussion_r247890231
 
 

 ##
 File path: modules/ml/src/main/java/org/apache/ignite/ml/composition/combinators/parallel/TrainersParallelComposition.java
 ##
 @@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.ml.composition.combinators.parallel;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.stream.Collectors;
+import org.apache.ignite.ml.IgniteModel;
+import org.apache.ignite.ml.composition.CompositionUtils;
+import org.apache.ignite.ml.dataset.DatasetBuilder;
+import org.apache.ignite.ml.environment.parallelism.Promise;
+import org.apache.ignite.ml.math.functions.IgniteBiFunction;
+import org.apache.ignite.ml.math.functions.IgniteSupplier;
+import org.apache.ignite.ml.math.primitives.vector.Vector;
+import org.apache.ignite.ml.trainers.DatasetTrainer;
+
+/**
+ * This class represents a parallel composition of trainers.
+ * Parallel composition of trainers is a trainer itself which trains a list of trainers with same
+ * input and output. Training is done in following manner:
+ * <ol>
+ * <li>1. Independently train all trainers on the same dataset and get a list of models.</li>
+ * <li>2. Combine models produced in step (1) into a {@link ModelsParallelComposition}.</li>
+ * </ol>
+ * Updating is made in a similar fashion.
+ * Like in other trainers combinators we avoid to include type of contained trainers in type parameters
+ * because otherwise compositions of compositions would have a relatively complex generic type which will
+ * reduce readability.
+ *
+ * @param <I> Type of trainers inputs.
+ * @param <O> Type of trainers outputs.
+ * @param <L> Type of dataset labels.
+ */
+public class TrainersParallelComposition<I, O, L> extends DatasetTrainer<IgniteModel<I, List<O>>, L> {
+    /** List of trainers. */
+    private final List<DatasetTrainer<IgniteModel<I, O>, L>> trainers;
+
+    /**
+     * Construct an instance of this class from a list of trainers.
+     *
+     * @param trainers Trainers.
+     * @param <M> Type of mode
+     * @param <T>
 
 Review comment:
   Fixed.




[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training

2019-01-15 Thread GitBox
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: 
Consistent API for Ensemble training
URL: https://github.com/apache/ignite/pull/5767#discussion_r247890198
 
 

 ##
 File path: modules/ml/src/main/java/org/apache/ignite/ml/composition/combinators/parallel/TrainersParallelComposition.java
 ##
 @@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.ml.composition.combinators.parallel;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.stream.Collectors;
+import org.apache.ignite.ml.IgniteModel;
+import org.apache.ignite.ml.composition.CompositionUtils;
+import org.apache.ignite.ml.dataset.DatasetBuilder;
+import org.apache.ignite.ml.environment.parallelism.Promise;
+import org.apache.ignite.ml.math.functions.IgniteBiFunction;
+import org.apache.ignite.ml.math.functions.IgniteSupplier;
+import org.apache.ignite.ml.math.primitives.vector.Vector;
+import org.apache.ignite.ml.trainers.DatasetTrainer;
+
+/**
+ * This class represents a parallel composition of trainers.
+ * Parallel composition of trainers is a trainer itself which trains a list of trainers with same
+ * input and output. Training is done in following manner:
+ * <ol>
+ * <li>1. Independently train all trainers on the same dataset and get a list of models.</li>
+ * <li>2. Combine models produced in step (1) into a {@link ModelsParallelComposition}.</li>
+ * </ol>
+ * Updating is made in a similar fashion.
+ * Like in other trainers combinators we avoid to include type of contained trainers in type parameters
+ * because otherwise compositions of compositions would have a relatively complex generic type which will
+ * reduce readability.
+ *
+ * @param <I> Type of trainers inputs.
+ * @param <O> Type of trainers outputs.
+ * @param <L> Type of dataset labels.
+ */
+public class TrainersParallelComposition<I, O, L> extends DatasetTrainer<IgniteModel<I, List<O>>, L> {
+    /** List of trainers. */
+    private final List<DatasetTrainer<IgniteModel<I, O>, L>> trainers;
+
+    /**
+     * Construct an instance of this class from a list of trainers.
+     *
+     * @param trainers Trainers.
+     * @param <M> Type of mode
+     * @param <T>
+     */
+    public <M extends IgniteModel<I, O>, T extends DatasetTrainer<IgniteModel<I, O>, L>> TrainersParallelComposition(
+        List<T> trainers) {
+        this.trainers = trainers.stream().map(CompositionUtils::unsafeCoerce).collect(Collectors.toList());
+    }
+
+    public static <I, O, M extends IgniteModel<I, O>, T extends DatasetTrainer<M, L>, L> TrainersParallelComposition<I, O, L> of(List<T> trainers) {
+        List<DatasetTrainer<IgniteModel<I, O>, L>> trs =
+            trainers.stream().map(CompositionUtils::unsafeCoerce).collect(Collectors.toList());
+
+        return new TrainersParallelComposition<>(trs);
+    }
+
+    /** {@inheritDoc} */
+    @Override public <K, V> IgniteModel<I, List<O>> fit(DatasetBuilder<K, V> datasetBuilder,
+        IgniteBiFunction<K, V, Vector> featureExtractor, IgniteBiFunction<K, V, L> lbExtractor) {
+        List<IgniteSupplier<IgniteModel<I, O>>> tasks = trainers.stream()
+            .map(tr -> (IgniteSupplier<IgniteModel<I, O>>)(() -> tr.fit(datasetBuilder, featureExtractor, lbExtractor)))
+            .collect(Collectors.toList());
+
+        List<IgniteModel<I, O>> mdls = environment.parallelismStrategy().submit(tasks).stream()
+            .map(Promise::unsafeGet)
+            .collect(Collectors.toList());
+
+        return new ModelsParallelComposition<>(mdls);
+    }
+
+    /** {@inheritDoc} */
+    @Override public <K, V> IgniteModel<I, List<O>> update(IgniteModel<I, List<O>> mdl, DatasetBuilder<K, V> datasetBuilder,
+        IgniteBiFunction<K, V, Vector> featureExtractor, IgniteBiFunction<K, V, L> lbExtractor) {
+        // Unsafe.
+        ModelsParallelComposition<I, O> typedMdl = (ModelsParallelComposition<I, O>)mdl;
+
+        assert typedMdl.submodels().size() == trainers.size();
+        List<IgniteModel<I, O>> mdls = new ArrayList<>();
+
+        for (int i = 0; i < trainers.size(); i++)
+            mdls.add(trainers.get(i).update(typedMdl.submodels().get(i), datasetBuilder, featureExtractor, lbExtractor));
+
+        return new ModelsParallelComposition<>(mdls);
+    }
+
+    /** {@inheritDoc} */
+    @Override protected boolean checkState(IgniteModel<I, List<O>> mdl) {
 
 Review comment:
   Fixed.
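
A rough JDK-only sketch of the same training scheme: every trainer (here just a function from data to model) is trained on the same data through a thread pool, and the models are collected in submission order. `ParallelCompositionSketch` and `fitAll` are hypothetical stand-ins; the real class uses Ignite's `parallelismStrategy()` and `Promise` as shown in the hunk.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelCompositionSketch {
    /** Train every trainer on the same data in parallel and collect the models in order. */
    static <D, M> List<M> fitAll(List<Function<D, M>> trainers, D data) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, trainers.size()));
        try {
            List<Callable<M>> tasks = new ArrayList<>();
            for (Function<D, M> tr : trainers)
                tasks.add(() -> tr.apply(data));

            List<M> mdls = new ArrayList<>();
            for (Future<M> fut : pool.invokeAll(tasks))
                mdls.add(fut.get()); // blocks until the corresponding model is trained

            return mdls;
        }
        finally {
            pool.shutdown();
        }
    }
}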



[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training

2019-01-15 Thread GitBox
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: 
Consistent API for Ensemble training
URL: https://github.com/apache/ignite/pull/5767#discussion_r247889971
 
 

 ##
 File path: modules/ml/src/main/java/org/apache/ignite/ml/dataset/UpstreamTransformer.java
 ##
 @@ -22,29 +22,15 @@
 
 /**
  * Interface of transformer of upstream.
- *
- * @param <K> Type of keys in the upstream.
- * @param <V> Type of values in the upstream.
  */
 // TODO: IGNITE-10297: Investigate possibility of API change.
 @FunctionalInterface
-public interface UpstreamTransformer<K, V> extends Serializable {
+public interface UpstreamTransformer extends Serializable {
 
 Review comment:
   We want to emphasize that `UpstreamTransformer` is not for changing the contents of the upstream, but only its form.
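
That distinction (form, not contents) can be made concrete with a small sketch: a transformer over a `Stream` that may repeat or drop entries, as bootstrapping does, but never rewrites an entry. The `Transformer` interface and `repeatRandomly` below are hypothetical stand-ins, not the real API:

import java.util.Random;
import java.util.stream.Stream;

public class UpstreamTransformerSketch {
    /** Transformer of upstream: changes which entries appear and how often, never the entries themselves. */
    @FunctionalInterface
    interface Transformer<T> {
        Stream<T> transform(Stream<T> upstream);
    }

    /** Repeat each entry a random number of times (0..2), as in bootstrap subsampling. */
    static <T> Transformer<T> repeatRandomly(long seed) {
        return upstream -> {
            Random rnd = new Random(seed);
            return upstream.flatMap(e -> Stream.generate(() -> e).limit(rnd.nextInt(3)));
        };
    }
}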




[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training

2019-01-15 Thread GitBox
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: 
Consistent API for Ensemble training
URL: https://github.com/apache/ignite/pull/5767#discussion_r247890064
 
 

 ##
 File path: modules/ml/src/main/java/org/apache/ignite/ml/composition/stacking/StackedDatasetTrainer.java
 ##
 @@ -402,11 +346,12 @@ public StackedDatasetTrainer() {
     IgniteBiFunction<K, V, Vector> featureExtractor,
     IgniteBiFunction<K, V, L> lbExtractor) {
 // This method is never called, we override "update" instead.
-return null;
+throw new IllegalStateException();
 }
 
 /** {@inheritDoc} */
 @Override protected boolean checkState(StackedModel<IS, IA, O, AM> mdl) {
-return true;
+// Should be never called.
+throw new IllegalStateException();
 
 Review comment:
   Fixed.




[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training

2019-01-15 Thread GitBox
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: 
Consistent API for Ensemble training
URL: https://github.com/apache/ignite/pull/5767#discussion_r247890183
 
 

 ##
 File path: modules/ml/src/main/java/org/apache/ignite/ml/composition/combinators/parallel/TrainersParallelComposition.java
 ##
 @@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.ml.composition.combinators.parallel;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.stream.Collectors;
+import org.apache.ignite.ml.IgniteModel;
+import org.apache.ignite.ml.composition.CompositionUtils;
+import org.apache.ignite.ml.dataset.DatasetBuilder;
+import org.apache.ignite.ml.environment.parallelism.Promise;
+import org.apache.ignite.ml.math.functions.IgniteBiFunction;
+import org.apache.ignite.ml.math.functions.IgniteSupplier;
+import org.apache.ignite.ml.math.primitives.vector.Vector;
+import org.apache.ignite.ml.trainers.DatasetTrainer;
+
+/**
+ * This class represents a parallel composition of trainers.
+ * Parallel composition of trainers is a trainer itself which trains a list of trainers with same
+ * input and output. Training is done in following manner:
+ * <ol>
+ * <li>1. Independently train all trainers on the same dataset and get a list of models.</li>
+ * <li>2. Combine models produced in step (1) into a {@link ModelsParallelComposition}.</li>
+ * </ol>
+ * Updating is made in a similar fashion.
+ * Like in other trainers combinators we avoid to include type of contained trainers in type parameters
+ * because otherwise compositions of compositions would have a relatively complex generic type which will
+ * reduce readability.
+ *
+ * @param <I> Type of trainers inputs.
+ * @param <O> Type of trainers outputs.
+ * @param <L> Type of dataset labels.
+ */
+public class TrainersParallelComposition<I, O, L> extends DatasetTrainer<IgniteModel<I, List<O>>, L> {
+    /** List of trainers. */
+    private final List<DatasetTrainer<IgniteModel<I, O>, L>> trainers;
+
+    /**
+     * Construct an instance of this class from a list of trainers.
+     *
+     * @param trainers Trainers.
+     * @param <M> Type of mode
+     * @param <T>
+     */
+    public <M extends IgniteModel<I, O>, T extends DatasetTrainer<IgniteModel<I, O>, L>> TrainersParallelComposition(
+        List<T> trainers) {
+        this.trainers = trainers.stream().map(CompositionUtils::unsafeCoerce).collect(Collectors.toList());
+    }
+
+    public static <I, O, M extends IgniteModel<I, O>, T extends DatasetTrainer<M, L>, L> TrainersParallelComposition<I, O, L> of(List<T> trainers) {
+        List<DatasetTrainer<IgniteModel<I, O>, L>> trs =
+            trainers.stream().map(CompositionUtils::unsafeCoerce).collect(Collectors.toList());
+
+        return new TrainersParallelComposition<>(trs);
+    }
+
+    /** {@inheritDoc} */
+    @Override public <K, V> IgniteModel<I, List<O>> fit(DatasetBuilder<K, V> datasetBuilder,
+        IgniteBiFunction<K, V, Vector> featureExtractor, IgniteBiFunction<K, V, L> lbExtractor) {
+        List<IgniteSupplier<IgniteModel<I, O>>> tasks = trainers.stream()
+            .map(tr -> (IgniteSupplier<IgniteModel<I, O>>)(() -> tr.fit(datasetBuilder, featureExtractor, lbExtractor)))
+            .collect(Collectors.toList());
+
+        List<IgniteModel<I, O>> mdls = environment.parallelismStrategy().submit(tasks).stream()
+            .map(Promise::unsafeGet)
+            .collect(Collectors.toList());
+
+        return new ModelsParallelComposition<>(mdls);
+    }
+
+    /** {@inheritDoc} */
+    @Override public <K, V> IgniteModel<I, List<O>> update(IgniteModel<I, List<O>> mdl, DatasetBuilder<K, V> datasetBuilder,
+        IgniteBiFunction<K, V, Vector> featureExtractor, IgniteBiFunction<K, V, L> lbExtractor) {
+        // Unsafe.
+        ModelsParallelComposition<I, O> typedMdl = (ModelsParallelComposition<I, O>)mdl;
+
+        assert typedMdl.submodels().size() == trainers.size();
+        List<IgniteModel<I, O>> mdls = new ArrayList<>();
+
+        for (int i = 0; i < trainers.size(); i++)
+            mdls.add(trainers.get(i).update(typedMdl.submodels().get(i), datasetBuilder, featureExtractor, lbExtractor));
+
+        return new ModelsParallelComposition<>(mdls);
+    }
+
+    /** {@inheritDoc} */
+    @Override protected boolean checkState(IgniteModel<I, List<O>> mdl) {
+        // Never called.
+        throw new IllegalStateException();
+    }
+
+    /** {@inheritDoc} */
 
 Review comment:
   Fixed.



[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training

2019-01-15 Thread GitBox
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: 
Consistent API for Ensemble training
URL: https://github.com/apache/ignite/pull/5767#discussion_r247890213
 
 

 ##
 File path: modules/ml/src/main/java/org/apache/ignite/ml/composition/combinators/parallel/TrainersParallelComposition.java
 ##
 @@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.ml.composition.combinators.parallel;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.stream.Collectors;
+import org.apache.ignite.ml.IgniteModel;
+import org.apache.ignite.ml.composition.CompositionUtils;
+import org.apache.ignite.ml.dataset.DatasetBuilder;
+import org.apache.ignite.ml.environment.parallelism.Promise;
+import org.apache.ignite.ml.math.functions.IgniteBiFunction;
+import org.apache.ignite.ml.math.functions.IgniteSupplier;
+import org.apache.ignite.ml.math.primitives.vector.Vector;
+import org.apache.ignite.ml.trainers.DatasetTrainer;
+
+/**
+ * This class represents a parallel composition of trainers.
+ * Parallel composition of trainers is a trainer itself which trains a list of trainers with same
+ * input and output. Training is done in following manner:
+ * <ol>
+ * <li>1. Independently train all trainers on the same dataset and get a list of models.</li>
+ * <li>2. Combine models produced in step (1) into a {@link ModelsParallelComposition}.</li>
+ * </ol>
+ * Updating is made in a similar fashion.
+ * Like in other trainers combinators we avoid to include type of contained trainers in type parameters
+ * because otherwise compositions of compositions would have a relatively complex generic type which will
+ * reduce readability.
+ *
+ * @param <I> Type of trainers inputs.
+ * @param <O> Type of trainers outputs.
+ * @param <L> Type of dataset labels.
+ */
+public class TrainersParallelComposition<I, O, L> extends DatasetTrainer<IgniteModel<I, List<O>>, L> {
+    /** List of trainers. */
+    private final List<DatasetTrainer<IgniteModel<I, O>, L>> trainers;
+
+    /**
+     * Construct an instance of this class from a list of trainers.
+     *
+     * @param trainers Trainers.
+     * @param <M> Type of mode
+     * @param <T>
+     */
+    public <M extends IgniteModel<I, O>, T extends DatasetTrainer<IgniteModel<I, O>, L>> TrainersParallelComposition(
+        List<T> trainers) {
+        this.trainers = trainers.stream().map(CompositionUtils::unsafeCoerce).collect(Collectors.toList());
+    }
+
+    public static <I, O, M extends IgniteModel<I, O>, T extends DatasetTrainer<M, L>, L> TrainersParallelComposition<I, O, L> of(
+        List<T> trainers) {
 
 Review comment:
   Fixed.




[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training

2019-01-15 Thread GitBox
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: 
Consistent API for Ensemble training
URL: https://github.com/apache/ignite/pull/5767#discussion_r247890133
 
 

 ##
 File path: modules/ml/src/main/java/org/apache/ignite/ml/composition/combinators/sequential/TrainersSequentialComposition.java
 ##
 @@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.ml.composition.combinators.sequential;
+
+import org.apache.ignite.ml.IgniteModel;
+import org.apache.ignite.ml.composition.CompositionUtils;
+import org.apache.ignite.ml.composition.DatasetMapping;
+import org.apache.ignite.ml.dataset.DatasetBuilder;
+import org.apache.ignite.ml.math.functions.IgniteBiFunction;
+import org.apache.ignite.ml.math.functions.IgniteFunction;
+import org.apache.ignite.ml.math.primitives.vector.Vector;
+import org.apache.ignite.ml.math.primitives.vector.VectorUtils;
+import org.apache.ignite.ml.trainers.DatasetTrainer;
+import sun.reflect.generics.reflectiveObjects.NotImplementedException;
+
+/**
+ * Sequential composition of trainers.
+ * Sequential composition of trainers is itself a trainer which produces {@link ModelsSequentialComposition}.
+ * Training is done in following fashion:
+ * <ol>
+ * <li>1. First trainer is trained and `mdl1` is produced.</li>
+ * <li>2. From `mdl1` {@link DatasetMapping} is constructed. This mapping `dsM` encapsulates dependency between first
+ * training result and second trainer.</li>
+ * <li>3. Second trainer is trained using dataset acquired from application of `dsM` to original dataset; `mdl2` is produced.</li>
+ * <li>4. `mdl1` and `mdl2` are composed into {@link ModelsSequentialComposition}.</li>
+ * </ol>
+ *
+ * @param <I> Type of input of model produced by first trainer.
+ * @param <O1> Type of output of model produced by first trainer.
+ * @param <O2> Type of output of model produced by second trainer.
+ * @param <L> Type of labels.
+ */
+public class TrainersSequentialComposition<I, O1, O2, L> extends
+    DatasetTrainer<ModelsSequentialComposition<I, O1, O2>, L> {
+    /** First trainer. */
+    private DatasetTrainer<IgniteModel<I, O1>, L> tr1;
+
+    /** Second trainer. */
+    private DatasetTrainer<IgniteModel<O1, O2>, L> tr2;
+
+    /** Dataset mapping. */
+    private IgniteFunction<IgniteModel<I, O1>, DatasetMapping<L, L>> datasetMapping;
+
+    /**
+     * Construct sequential composition of given two trainers.
+     *
+     * @param tr1 First trainer.
+     * @param tr2 Second trainer.
+     * @param datasetMapping Dataset mapping.
+     */
+    public TrainersSequentialComposition(DatasetTrainer<? extends IgniteModel<I, O1>, L> tr1,
+        DatasetTrainer<? extends IgniteModel<O1, O2>, L> tr2,
+        IgniteFunction<IgniteModel<I, O1>, DatasetMapping<L, L>> datasetMapping) {
+        this.tr1 = CompositionUtils.unsafeCoerce(tr1);
+        this.tr2 = CompositionUtils.unsafeCoerce(tr2);
+        this.datasetMapping = datasetMapping;
+    }
+
+    /** {@inheritDoc} */
+    @Override public <K, V> ModelsSequentialComposition<I, O1, O2> fit(DatasetBuilder<K, V> datasetBuilder,
+        IgniteBiFunction<K, V, Vector> featureExtractor, IgniteBiFunction<K, V, L> lbExtractor) {
+
+        IgniteModel<I, O1> mdl1 = tr1.fit(datasetBuilder, featureExtractor, lbExtractor);
+        DatasetMapping<L, L> mapping = datasetMapping.apply(mdl1);
+
+        IgniteModel<O1, O2> mdl2 = tr2.fit(datasetBuilder,
+            featureExtractor.andThen(mapping::mapFeatures),
+            lbExtractor.andThen(mapping::mapLabels));
+
+        return new ModelsSequentialComposition<>(mdl1, mdl2);
+    }
+
+    /** {@inheritDoc} */
+    @Override public <K, V> ModelsSequentialComposition<I, O1, O2> update(
+        ModelsSequentialComposition<I, O1, O2> mdl, DatasetBuilder<K, V> datasetBuilder,
+        IgniteBiFunction<K, V, Vector> featureExtractor, IgniteBiFunction<K, V, L> lbExtractor) {
+
+        IgniteModel<I, O1> firstUpdated = tr1.update(mdl.firstModel(), datasetBuilder, featureExtractor, lbExtractor);
+        DatasetMapping<L, L> mapping = datasetMapping.apply(firstUpdated);
+
+        IgniteModel<O1, O2> secondUpdated = tr2.update(mdl.secondModel(),
+            datasetBuilder,
+            featureExtractor.andThen(mapping::mapFeatures),
+            lbExtractor.andThen(mapping::mapLabels));
+
+        return new ModelsSequentialComposition<>(firstUpdated, secondUpdated);
+    }
+
+    /** {@inheritDoc} */
+    @Override protected boolean checkState(ModelsSequentialComposition<I, O1, O2> mdl) {
+        // Never called.
+        throw new IllegalStateException();
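
The four numbered steps from the javadoc, reduced to plain functions (`Trainer` and `fit` below are hypothetical stand-ins, and `Map.Entry` stands in for `ModelsSequentialComposition`):

import java.util.AbstractMap.SimpleEntry;
import java.util.Map;
import java.util.function.Function;

public class SequentialCompositionSketch {
    interface Trainer<D, M> {
        M fit(D data);
    }

    static <D, M1, M2> Map.Entry<M1, M2> fit(Trainer<D, M1> tr1,
        Function<M1, Function<D, D>> mappingFromFirstModel,
        Trainer<D, M2> tr2,
        D data) {
        M1 mdl1 = tr1.fit(data);                                // 1. Train first trainer.
        Function<D, D> dsM = mappingFromFirstModel.apply(mdl1); // 2. Build mapping dsM from mdl1.
        M2 mdl2 = tr2.fit(dsM.apply(data));                     // 3. Train second trainer on mapped dataset.
        return new SimpleEntry<>(mdl1, mdl2);                   // 4. Compose mdl1 and mdl2.
    }
}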


[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training

2019-01-15 Thread GitBox
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: 
Consistent API for Ensemble training
URL: https://github.com/apache/ignite/pull/5767#discussion_r247890081
 
 

 ##
 File path: modules/ml/src/main/java/org/apache/ignite/ml/composition/stacking/StackedDatasetTrainer.java
 ##
 @@ -402,11 +346,12 @@ public StackedDatasetTrainer() {
     IgniteBiFunction<K, V, Vector> featureExtractor,
     IgniteBiFunction<K, V, L> lbExtractor) {
 // This method is never called, we override "update" instead.
-return null;
+throw new IllegalStateException();
 
 Review comment:
   Fixed.




[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training

2019-01-15 Thread GitBox
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: 
Consistent API for Ensemble training
URL: https://github.com/apache/ignite/pull/5767#discussion_r247890104
 
 

 ##
 File path: modules/ml/src/main/java/org/apache/ignite/ml/composition/stacking/StackedDatasetTrainer.java
 ##
 @@ -322,59 +262,63 @@ public StackedDatasetTrainer() {
     if (aggregatingInputMerger == null)
         throw new IllegalStateException("Binary operator used to convert outputs of submodels is not specified");
 
-    List<IgniteSupplier<IgniteModel<IS, IA>>> mdlSuppliers = taskSupplier.apply(submodelsTrainers);
+    List<DatasetTrainer<IgniteModel<IS, IA>, L>> subs = new ArrayList<>();
+    if (submodelInput2AggregatingInputConverter != null) {
+        DatasetTrainer<IgniteModel<IS, IS>, L> id = DatasetTrainer.identityTrainer();
+        DatasetTrainer<IgniteModel<IS, IA>, L> mappedId = CompositionUtils.unsafeCoerce(
+            AdaptableDatasetTrainer.of(id).afterTrainedModel(submodelInput2AggregatingInputConverter));
+        subs.add(mappedId);
+    }
 
-    List<IgniteModel<IS, IA>> subMdls = environment.parallelismStrategy().submit(mdlSuppliers).stream()
-        .map(Promise::unsafeGet)
-        .collect(Collectors.toList());
+    subs.addAll(submodelsTrainers);
 
-    // Add new columns consisting in submodels output in features.
-    IgniteBiFunction<K, V, Vector> augmentedExtractor = getFeatureExtractorForAggregator(featureExtractor,
-        subMdls,
-        submodelInput2AggregatingInputConverter,
+    TrainersParallelComposition<IS, IA, L> composition = new TrainersParallelComposition<>(subs);
+
+    IgniteBiFunction<List<IgniteModel<IS, IA>>, Vector, Vector> featureMapper = getFeatureExtractorForAggregator(
         submodelOutput2VectorConverter,
         vector2SubmodelInputConverter);
 
-    AM aggregator = aggregatorProcessor.apply(aggregatorTrainer, augmentedExtractor);
+    return AdaptableDatasetTrainer
+        .of(composition)
+        .afterTrainedModel(lst -> lst.stream().reduce(aggregatingInputMerger).get())
+        .andThen(aggregatorTrainer, model -> new DatasetMapping<L, L>() {
+            @Override public Vector mapFeatures(Vector v) {
+                List<IgniteModel<IS, IA>> models = ((ModelsParallelComposition<IS, IA>)model.innerModel()).submodels();
+                return featureMapper.apply(models, v);
+            }
 
-    StackedModel<IS, IA, O, AM> res = new StackedModel<>(
-        aggregator,
-        aggregatingInputMerger,
-        submodelInput2AggregatingInputConverter);
+            @Override public L mapLabels(L lbl) {
+                return lbl;
+            }
+        }).unsafeSimplyTyped();
+}
 
-    for (IgniteModel<IS, IA> subMdl : subMdls)
-        res.addSubmodel(subMdl);
+/** {@inheritDoc} */
+@Override public StackedDatasetTrainer<IS, IA, O, AM, L> withEnvironmentBuilder(
+    LearningEnvironmentBuilder envBuilder) {
+    submodelsTrainers =
+        submodelsTrainers.stream().map(x -> x.withEnvironmentBuilder(envBuilder)).collect(Collectors.toList());
+    aggregatorTrainer = aggregatorTrainer.withEnvironmentBuilder(envBuilder);
 
-    return res;
+    return this;
 }
 
 /**
  * Get feature extractor which will be used for aggregator trainer from original feature extractor.
  * This method is static to make sure that we will not grab context of instance in serialization.
  *
- * @param featureExtractor Original feature extractor.
- * @param subMdls Submodels.
  * @param <K> Type of upstream keys.
  * @param <V> Type of upstream values.
  * @return Feature extractor which will be used for aggregator trainer from original feature extractor.
  */
-private static <IS, IA, K, V> IgniteBiFunction<K, V, Vector> getFeatureExtractorForAggregator(
-    IgniteBiFunction<K, V, Vector> featureExtractor, List<IgniteModel<IS, IA>> subMdls,
-    IgniteFunction<IS, IA> submodelInput2AggregatingInputConverter,
+private static <IS, IA> IgniteBiFunction<List<IgniteModel<IS, IA>>, Vector, Vector> getFeatureExtractorForAggregator(
 
 Review comment:
   Fixed.




[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training

2019-01-15 Thread GitBox
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: 
Consistent API for Ensemble training
URL: https://github.com/apache/ignite/pull/5767#discussion_r247885943
 
 

 ##
 File path: modules/ml/src/main/java/org/apache/ignite/ml/composition/combinators/parallel/TrainersParallelComposition.java
 ##
 @@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.ml.composition.combinators.parallel;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.stream.Collectors;
+import org.apache.ignite.ml.IgniteModel;
+import org.apache.ignite.ml.composition.CompositionUtils;
+import org.apache.ignite.ml.dataset.DatasetBuilder;
+import org.apache.ignite.ml.environment.parallelism.Promise;
+import org.apache.ignite.ml.math.functions.IgniteBiFunction;
+import org.apache.ignite.ml.math.functions.IgniteSupplier;
+import org.apache.ignite.ml.math.primitives.vector.Vector;
+import org.apache.ignite.ml.trainers.DatasetTrainer;
+
+/**
+ * This class represents a parallel composition of trainers.
+ * Parallel composition of trainers is a trainer itself which trains a list of trainers with same
+ * input and output. Training is done in following manner:
+ * <ol>
+ * <li>1. Independently train all trainers on the same dataset and get a list of models.</li>
+ * <li>2. Combine models produced in step (1) into a {@link ModelsParallelComposition}.</li>
+ * </ol>
+ * Updating is made in a similar fashion.
+ * Like in other trainers combinators we avoid to include type of contained trainers in type parameters
+ * because otherwise compositions of compositions would have a relatively complex generic type which will
+ * reduce readability.
+ *
+ * @param <I> Type of trainers inputs.
+ * @param <O> Type of trainers outputs.
+ * @param <L> Type of dataset labels.
+ */
+public class TrainersParallelComposition<I, O, L> extends DatasetTrainer<IgniteModel<I, List<O>>, L> {
+    /** List of trainers. */
+    private final List<DatasetTrainer<IgniteModel<I, O>, L>> trainers;
+
+    /**
+     * Construct an instance of this class from a list of trainers.
+     *
+     * @param trainers Trainers.
+     * @param <M> Type of mode
+     * @param <T>
+     */
+    public <M extends IgniteModel<I, O>, T extends DatasetTrainer<IgniteModel<I, O>, L>> TrainersParallelComposition(
+        List<T> trainers) {
+        this.trainers = trainers.stream().map(CompositionUtils::unsafeCoerce).collect(Collectors.toList());
+    }
+
+    public static <I, O, M extends IgniteModel<I, O>, T extends DatasetTrainer<M, L>, L> TrainersParallelComposition<I, O, L> of(List<T> trainers) {
+        List<DatasetTrainer<IgniteModel<I, O>, L>> trs =
+            trainers.stream().map(CompositionUtils::unsafeCoerce).collect(Collectors.toList());
+
+        return new TrainersParallelComposition<>(trs);
+    }
+
+    /** {@inheritDoc} */
+    @Override public <K, V> IgniteModel<I, List<O>> fit(DatasetBuilder<K, V> datasetBuilder,
+        IgniteBiFunction<K, V, Vector> featureExtractor, IgniteBiFunction<K, V, L> lbExtractor) {
+        List<IgniteSupplier<IgniteModel<I, O>>> tasks = trainers.stream()
+            .map(tr -> (IgniteSupplier<IgniteModel<I, O>>)(() -> tr.fit(datasetBuilder, featureExtractor, lbExtractor)))
+            .collect(Collectors.toList());
+
+        List<IgniteModel<I, O>> mdls = environment.parallelismStrategy().submit(tasks).stream()
+            .map(Promise::unsafeGet)
+            .collect(Collectors.toList());
+
+        return new ModelsParallelComposition<>(mdls);
+    }
+
+    /** {@inheritDoc} */
+    @Override public <K, V> IgniteModel<I, List<O>> update(IgniteModel<I, List<O>> mdl, DatasetBuilder<K, V> datasetBuilder,
+        IgniteBiFunction<K, V, Vector> featureExtractor, IgniteBiFunction<K, V, L> lbExtractor) {
+        // Unsafe.
+        ModelsParallelComposition<I, O> typedMdl = (ModelsParallelComposition<I, O>)mdl;
+
+        assert typedMdl.submodels().size() == trainers.size();
+        List<IgniteModel<I, O>> mdls = new ArrayList<>();
+
+        for (int i = 0; i < trainers.size(); i++)
+            mdls.add(trainers.get(i).update(typedMdl.submodels().get(i), datasetBuilder, featureExtractor, lbExtractor));
 
 Review comment:
   Thanks, nice catch, done.



[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training

2019-01-15 Thread GitBox
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: 
Consistent API for Ensemble training
URL: https://github.com/apache/ignite/pull/5767#discussion_r247885961
 
 

 ##
 File path: modules/ml/src/main/java/org/apache/ignite/ml/util/Utils.java
 ##
 @@ -130,4 +132,50 @@
 Spliterators.spliteratorUnknownSize(iter, Spliterator.ORDERED),
 false);
 }
+
+/**
+ * Zips two streams (in functional sense of zipping) i.e. returns stream consisting
+ * of results of applying zipper to corresponding entries of two streams.
+ *
+ * @param a First stream.
+ * @param b Second stream.
+ * @param zipper Bi-function combining two streams.
+ * @param <A> Type of first stream entries.
+ * @param <B> Type of second stream entries.
+ * @param <C> Type of zipper output.
+ * @return Two streams zipped together.
+ */
+public static <A, B, C> Stream<C> zip(Stream<A> a,
 
 Review comment:
   Removed.
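
For reference, the semantics of the removed method can be sketched with plain JDK streams (`ZipSketch` is a hypothetical name); zipping stops at the shorter of the two streams:

import java.util.Iterator;
import java.util.List;
import java.util.function.BiFunction;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ZipSketch {
    static <A, B, C> Stream<C> zip(Stream<A> a, Stream<B> b, BiFunction<A, B, C> zipper) {
        Iterator<A> itA = a.iterator();
        Iterator<B> itB = b.iterator();
        Stream.Builder<C> builder = Stream.builder();
        while (itA.hasNext() && itB.hasNext()) // stop at the shorter stream
            builder.accept(zipper.apply(itA.next(), itB.next()));
        return builder.build();
    }

    public static void main(String[] args) {
        List<String> zipped = zip(Stream.of(1, 2, 3), Stream.of("a", "b"), (i, s) -> i + s)
            .collect(Collectors.toList());
        System.out.println(zipped); // [1a, 2b]
    }
}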




[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training

2019-01-15 Thread GitBox
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: 
Consistent API for Ensemble training
URL: https://github.com/apache/ignite/pull/5767#discussion_r247885916
 
 

 ##
 File path: modules/ml/src/main/java/org/apache/ignite/ml/composition/CompositionUtils.java
 ##
 @@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.ml.composition;
+
+import org.apache.ignite.ml.IgniteModel;
+import org.apache.ignite.ml.dataset.DatasetBuilder;
+import org.apache.ignite.ml.math.functions.IgniteBiFunction;
+import org.apache.ignite.ml.math.primitives.vector.Vector;
+import org.apache.ignite.ml.trainers.DatasetTrainer;
+
+/**
+ * Various utility functions for trainers composition.
+ */
+public class CompositionUtils {
+    /**
+     * Perform blurring of model type of given trainer to {@code IgniteModel<I, O>}, where I, O are input and output
+     * types of original model.
+     *
+     * @param trainer Trainer to coerce.
+     * @param <I> Type of input of model produced by coerced trainer.
+     * @param <O> Type of output of model produced by coerced trainer.
+     * @param <M> Type of model produced by coerced trainer.
+     * @param <L> Type of labels.
+     * @return Trainer coerced to {@code DatasetTrainer<IgniteModel<I, O>, L>}.
+     */
+    public static <I, O, M extends IgniteModel<I, O>, L> DatasetTrainer<IgniteModel<I, O>, L> unsafeCoerce(
+        DatasetTrainer<M, L> trainer) {
+        return new DatasetTrainer<IgniteModel<I, O>, L>() {
+            /** {@inheritDoc} */
+            @Override public <K, V> IgniteModel<I, O> fit(DatasetBuilder<K, V> datasetBuilder,
+                IgniteBiFunction<K, V, Vector> featureExtractor, IgniteBiFunction<K, V, L> lbExtractor) {
+                return trainer.fit(datasetBuilder, featureExtractor, lbExtractor);
+            }
+
+            /** {@inheritDoc} */
+            @Override public <K, V> IgniteModel<I, O> update(IgniteModel<I, O> mdl, DatasetBuilder<K, V> datasetBuilder,
+                IgniteBiFunction<K, V, Vector> featureExtractor, IgniteBiFunction<K, V, L> lbExtractor) {
+                DatasetTrainer<IgniteModel<I, O>, L> trainer1 = (DatasetTrainer<IgniteModel<I, O>, L>)trainer;
+                return trainer1.update(mdl, datasetBuilder, featureExtractor, lbExtractor);
+            }
+
+            /** {@inheritDoc} */
+            @Override protected boolean checkState(IgniteModel<I, O> mdl) {
+                return true;
 
 Review comment:
   Agree, done.
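
The essence of `unsafeCoerce` is covariant widening of the produced model type: `fit` is trivially safe because a model of type `M` is a model of any supertype of `M`, while `update` needs the unchecked cast visible in the hunk. A toy sketch with hypothetical `Model` and `Trainer` interfaces:

import java.util.function.Function;

public class CoerceSketch {
    interface Model<I, O> extends Function<I, O> {
    }

    interface Trainer<M> {
        M fit();
    }

    /** Widen Trainer<M> to Trainer<Model<I, O>>; safe for fit thanks to covariant returns. */
    static <I, O, M extends Model<I, O>> Trainer<Model<I, O>> unsafeCoerce(Trainer<M> trainer) {
        return trainer::fit;
    }
}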




[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training

2019-01-15 Thread GitBox
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: 
Consistent API for Ensemble training
URL: https://github.com/apache/ignite/pull/5767#discussion_r247885899
 
 

 ##
 File path: modules/ml/src/main/java/org/apache/ignite/ml/trainers/DatasetTrainer.java
 ##
 @@ -362,4 +362,29 @@ public EmptyDatasetException() {
 }
 }
 
+/**
+ * Returns the trainer which returns identity model.
+ *
+ * @param <I> Type of model input.
+ * @param <L> Type of labels in dataset.
+ * @return Trainer which returns identity model.
+ */
+public static <I, L> DatasetTrainer<IgniteModel<I, I>, L> identityTrainer() {
+    return new DatasetTrainer<IgniteModel<I, I>, L>() {
+        @Override public <K, V> IgniteModel<I, I> fit(DatasetBuilder<K, V> datasetBuilder,
+            IgniteBiFunction<K, V, Vector> featureExtractor,
+            IgniteBiFunction<K, V, L> lbExtractor) {
+            return x -> x;
+        }
+
+        @Override protected boolean checkState(IgniteModel<I, I> mdl) {
+            return true;
+        }
+
+        @Override protected <K, V> IgniteModel<I, I> updateModel(IgniteModel<I, I> mdl, DatasetBuilder<K, V> datasetBuilder,
 Review comment:
   Fixed.
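
The identity model in isolation: training ignores the dataset entirely and returns `x -> x`, which acts as a neutral element when composing trainers (as in the stacking hunk earlier, where an identity trainer carries the raw input through to the aggregator). A self-contained sketch:

import java.util.function.UnaryOperator;

public class IdentityTrainerSketch {
    /** "Training" that ignores the data and returns the identity model. */
    static <I> UnaryOperator<I> fitIdentity() {
        return x -> x;
    }

    public static void main(String[] args) {
        UnaryOperator<String> mdl = fitIdentity();
        System.out.println(mdl.apply("unchanged")); // prints "unchanged"
    }
}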




[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training

2019-01-15 Thread GitBox
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: 
Consistent API for Ensemble training
URL: https://github.com/apache/ignite/pull/5767#discussion_r247885884
 
 

 ##
 File path: modules/ml/src/main/java/org/apache/ignite/ml/trainers/DatasetTrainer.java
 ##
 @@ -362,4 +362,29 @@ public EmptyDatasetException() {
 }
 }
 
+/**
+ * Returns the trainer which returns identity model.
+ *
+ * @param <I> Type of model input.
+ * @param <L> Type of labels in dataset.
+ * @return Trainer which returns identity model.
+ */
+public static <I, L> DatasetTrainer<IgniteModel<I, I>, L> identityTrainer() {
+    return new DatasetTrainer<IgniteModel<I, I>, L>() {
+        @Override public <K, V> IgniteModel<I, I> fit(DatasetBuilder<K, V> datasetBuilder,
+            IgniteBiFunction<K, V, Vector> featureExtractor,
+            IgniteBiFunction<K, V, L> lbExtractor) {
+            return x -> x;
+        }
+
+        @Override protected boolean checkState(IgniteModel<I, I> mdl) {
 
 Review comment:
   Fixed.




[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training

2019-01-15 Thread GitBox
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: 
Consistent API for Ensemble training
URL: https://github.com/apache/ignite/pull/5767#discussion_r247885840
 
 

 ##
 File path: modules/ml/src/main/java/org/apache/ignite/ml/math/functions/IgniteFunction.java
 ##
 @@ -18,6 +18,7 @@
 package org.apache.ignite.ml.math.functions;
 
 import java.io.Serializable;
+import java.util.Objects;
 
 Review comment:
   Fixed.




[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training

2019-01-15 Thread GitBox
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: 
Consistent API for Ensemble training
URL: https://github.com/apache/ignite/pull/5767#discussion_r247885871
 
 

 ##
 File path: modules/ml/src/main/java/org/apache/ignite/ml/trainers/AdaptableDatasetTrainer.java
 ##
 @@ -56,27 +68,46 @@
  * @param <L> Type of labels.
  * @return Instance of this class.
  */
-public static <I, O, M extends IgniteModel<I, O>, L> AdaptableDatasetTrainer<I, O, I, O, M, L> of(DatasetTrainer<M, L> wrapped) {
-    return new AdaptableDatasetTrainer<>(IgniteFunction.identity(), wrapped, IgniteFunction.identity());
+public static <I, O, M extends IgniteModel<I, O>, L> AdaptableDatasetTrainer<I, O, I, O, M, L> of(
+    DatasetTrainer<M, L> wrapped) {
+    return new AdaptableDatasetTrainer<>(IgniteFunction.identity(),
+        wrapped,
+        IgniteFunction.identity(),
+        IgniteFunction.identity(),
+        IgniteFunction.identity(),
+        UpstreamTransformerBuilder.identity());
 }
 
 /**
  * Construct instance of this class with specified wrapped trainer and converter functions.
  *
  * @param before Function used to convert input type of wrapped trainer.
- * @param wrapped  Wrapped trainer.
+ * @param wrapped Wrapped trainer.
  * @param after Function used to convert output type of wrapped trainer.
+ * @param extractor
 
 Review comment:
   Fixed.
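
The before/after converters of `AdaptableDatasetTrainer` amount to function composition around the wrapped model. A minimal sketch of that idea (the `adapt` helper is hypothetical; the real class also carries extractors and an upstream transformer builder, as the hunk shows):

import java.util.function.Function;

public class AdaptableSketch {
    /** Wrap a model of type IW -> OW into a model of type I -> O. */
    static <I, IW, OW, O> Function<I, O> adapt(Function<I, IW> before,
        Function<IW, OW> wrapped,
        Function<OW, O> after) {
        return before.andThen(wrapped).andThen(after);
    }
}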




[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training

2019-01-15 Thread GitBox
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: 
Consistent API for Ensemble training
URL: https://github.com/apache/ignite/pull/5767#discussion_r247885725
 
 

 ##
 File path: modules/ml/src/main/java/org/apache/ignite/ml/composition/DatasetMapping.java
 ##
 @@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.ml.composition;
+
+import org.apache.ignite.ml.math.functions.IgniteFunction;
+import org.apache.ignite.ml.math.primitives.vector.Vector;
+
+/**
+ * This class represents dataset mapping. This is just a tuple of two mappings: one for features and one for labels.
+ *
+ * @param <L1> Type of labels before mapping.
+ * @param <L2> Type of labels after mapping.
+ */
+public interface DatasetMapping<L1, L2> {
 
 Review comment:
   For the moment, no.
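
Since the interface is just a pair of mappings, a small factory illustrates it well (`DatasetMappingSketch` is a hypothetical name and `double[]` stands in for `Vector`):

import java.util.function.Function;

public interface DatasetMappingSketch<L1, L2> {
    /** Map features. */
    double[] mapFeatures(double[] features);

    /** Map labels. */
    L2 mapLabels(L1 lbl);

    /** Build a mapping from its two halves. */
    static <L1, L2> DatasetMappingSketch<L1, L2> of(Function<double[], double[]> fMap, Function<L1, L2> lMap) {
        return new DatasetMappingSketch<L1, L2>() {
            @Override public double[] mapFeatures(double[] features) {
                return fMap.apply(features);
            }

            @Override public L2 mapLabels(L1 lbl) {
                return lMap.apply(lbl);
            }
        };
    }
}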




[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training

2019-01-15 Thread GitBox
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: 
Consistent API for Ensemble training
URL: https://github.com/apache/ignite/pull/5767#discussion_r247885530
 
 

 ##
 File path: modules/ml/src/main/java/org/apache/ignite/ml/composition/bagging/BaggedTrainer.java
 ##
 @@ -0,0 +1,200 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.ml.composition.bagging;
+
+import java.util.Collections;
+import java.util.List;
+import java.util.Random;
+import java.util.stream.Collectors;
+import java.util.stream.IntStream;
+import java.util.stream.Stream;
+import org.apache.ignite.ml.IgniteModel;
+import org.apache.ignite.ml.composition.CompositionUtils;
+import 
org.apache.ignite.ml.composition.combinators.parallel.TrainersParallelComposition;
+import 
org.apache.ignite.ml.composition.predictionsaggregator.PredictionsAggregator;
+import org.apache.ignite.ml.dataset.DatasetBuilder;
+import org.apache.ignite.ml.environment.LearningEnvironmentBuilder;
+import org.apache.ignite.ml.math.functions.IgniteBiFunction;
+import org.apache.ignite.ml.math.functions.IgniteFunction;
+import org.apache.ignite.ml.math.primitives.vector.Vector;
+import org.apache.ignite.ml.math.primitives.vector.VectorUtils;
+import org.apache.ignite.ml.trainers.AdaptableDatasetTrainer;
+import org.apache.ignite.ml.trainers.DatasetTrainer;
+import org.apache.ignite.ml.trainers.transformers.BaggingUpstreamTransformer;
+import org.apache.ignite.ml.util.Utils;
+
+/**
+ * Trainer encapsulating logic of bootstrap aggregating (bagging).
+ * This trainer accepts some other trainer and returns a bagged version of it.
+ * The resulting model consists of submodels whose results are aggregated by a specified aggregator.
+ * Bagging is done on both samples and features
+ * (<a href="https://en.wikipedia.org/wiki/Bootstrap_aggregating">samples bagging</a>,
+ * <a href="https://en.wikipedia.org/wiki/Random_subspace_method">features bagging</a>).
+ *
+ * @param <M> Type of model produced by the trainer for which the bagged version is created.
+ * @param <L> Type of labels.
+ * @param <T> Type of trainer for which the bagged version is created.
+ */
+public class BaggedTrainer<M extends IgniteModel<Vector, Double>, L, T extends DatasetTrainer<M, L>> extends
+DatasetTrainer<BaggedModel, L> {
+/** Trainer for which bagged version is created. */
+private final DatasetTrainer<M, L> tr;
+
+/** Aggregator of submodels results. */
+private final PredictionsAggregator aggregator;
+
+/** Count of submodels in the ensemble. */
+private final int ensembleSize;
+
+/** Ratio determining which part of dataset will be taken as subsample for 
each submodel training. */
+private final double subsampleRatio;
+
+/** Dimensionality of feature vectors. */
+private final int featuresVectorSize;
+
+/** Dimension of subspace on which all samples from subsample are 
projected. */
+private final int featureSubspaceDim;
+
+/**
+ * Construct instance of this class with given parameters.
+ *
+ * @param tr Trainer to be bagged.
+ * @param aggregator Aggregator of models.
+ * @param ensembleSize Size of ensemble.
+ * @param subsampleRatio Ratio (subsample size) / (initial dataset size).
+ * @param featuresVectorSize Dimensionality of feature vector.
+ * @param featureSubspaceDim Dimensionality of feature subspace.
+ */
+public BaggedTrainer(DatasetTrainer<M, L> tr,
+PredictionsAggregator aggregator, int ensembleSize, double subsampleRatio, int featuresVectorSize,
+int featureSubspaceDim) {
+this.tr = tr;
+this.aggregator = aggregator;
+this.ensembleSize = ensembleSize;
+this.subsampleRatio = subsampleRatio;
+this.featuresVectorSize = featuresVectorSize;
+this.featureSubspaceDim = featureSubspaceDim;
+}
+
+/**
+ * Create bagged trainer.
+ *
+ * @return Bagged trainer.
+ */
+private DatasetTrainer<IgniteModel<Vector, Double>, L> getTrainer() {
+List<int[]> mappings = (featuresVectorSize > 0 && featureSubspaceDim != featuresVectorSize) ?
+IntStream.range(0, ensembleSize).mapToObj(
+modelIdx -> getMapping(
+featuresVectorSize,
+featureSubspaceDim,
+ 
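For context, a minimal sketch of how a trainer of this shape is meant to be used. The base trainer and aggregator below are illustrative assumptions (Ignite ML's DecisionTreeClassificationTrainer and OnMajorityPredictionsAggregator), not code from this diff; datasetBuilder, featureExtractor and lbExtractor are placeholders.

    import org.apache.ignite.ml.composition.bagging.BaggedModel;
    import org.apache.ignite.ml.composition.bagging.BaggedTrainer;
    import org.apache.ignite.ml.composition.predictionsaggregator.OnMajorityPredictionsAggregator;
    import org.apache.ignite.ml.math.primitives.vector.VectorUtils;
    import org.apache.ignite.ml.trainers.DatasetTrainer;
    import org.apache.ignite.ml.tree.DecisionTreeClassificationTrainer;
    import org.apache.ignite.ml.tree.DecisionTreeNode;

    // Sketch only: any trainer producing IgniteModel<Vector, Double> models would do.
    DecisionTreeClassificationTrainer baseTrainer = new DecisionTreeClassificationTrainer(5, 0.0);

    BaggedTrainer<DecisionTreeNode, Double, DecisionTreeClassificationTrainer> baggedTrainer =
        new BaggedTrainer<>(
            baseTrainer,
            new OnMajorityPredictionsAggregator(),
            10,   // ensembleSize: number of submodels in the ensemble
            0.6,  // subsampleRatio: each submodel trains on ~60% of the upstream data
            4,    // featuresVectorSize: dimensionality of the original feature vectors
            2);   // featureSubspaceDim: each submodel sees a random 2-feature subspace

    // Placeholders: datasetBuilder, featureExtractor, lbExtractor come from the caller.
    BaggedModel mdl = baggedTrainer.fit(datasetBuilder, featureExtractor, lbExtractor);
    Double prediction = mdl.predict(VectorUtils.of(1.0, 2.0, 3.0, 4.0));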

[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training

2019-01-15 Thread GitBox
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: 
Consistent API for Ensemble training
URL: https://github.com/apache/ignite/pull/5767#discussion_r247874720
 
 

 ##
 File path: 
modules/ml/src/main/java/org/apache/ignite/ml/composition/combinators/sequential/TrainersSequentialComposition.java
 ##
 @@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.ml.composition.combinators.sequential;
+
+import org.apache.ignite.ml.IgniteModel;
+import org.apache.ignite.ml.composition.CompositionUtils;
+import org.apache.ignite.ml.composition.DatasetMapping;
+import org.apache.ignite.ml.dataset.DatasetBuilder;
+import org.apache.ignite.ml.math.functions.IgniteBiFunction;
+import org.apache.ignite.ml.math.functions.IgniteFunction;
+import org.apache.ignite.ml.math.primitives.vector.Vector;
+import org.apache.ignite.ml.math.primitives.vector.VectorUtils;
+import org.apache.ignite.ml.trainers.DatasetTrainer;
+import sun.reflect.generics.reflectiveObjects.NotImplementedException;
+
+/**
+ * Sequential composition of trainers.
+ * Sequential composition of trainers is itself a trainer which produces {@link ModelsSequentialComposition}.
+ * Training is done in the following fashion:
+ *
+ * 1. First trainer is trained and `mdl1` is produced.
+ * 2. From `mdl1` a {@link DatasetMapping} is constructed. This mapping `dsM` encapsulates the dependency between the
+ * first training result and the second trainer.
+ * 3. Second trainer is trained using the dataset acquired by applying `dsM` to the original dataset; `mdl2` is produced.
+ * 4. `mdl1` and `mdl2` are composed into {@link ModelsSequentialComposition}.
+ *
+ * @param <I> Type of input of model produced by first trainer.
+ * @param <O1> Type of output of model produced by first trainer.
+ * @param <O2> Type of output of model produced by second trainer.
+ * @param <L> Type of labels.
+ */
+public class TrainersSequentialComposition<I, O1, O2, L> extends
+DatasetTrainer<ModelsSequentialComposition<I, O1, O2>, L> {
+/** First trainer. */
+private DatasetTrainer<IgniteModel<I, O1>, L> tr1;
+
+/** Second trainer. */
+private DatasetTrainer<IgniteModel<O1, O2>, L> tr2;
+
+/** Dataset mapping. */
+private IgniteFunction<IgniteModel<I, O1>, DatasetMapping<L, L>> datasetMapping;
+
+/**
+ * Construct sequential composition of given two trainers.
+ *
+ * @param tr1 First trainer.
+ * @param tr2 Second trainer.
+ * @param datasetMapping Dataset mapping.
+ */
+public TrainersSequentialComposition(DatasetTrainer<? extends IgniteModel<I, O1>, L> tr1,
+DatasetTrainer<? extends IgniteModel<O1, O2>, L> tr2,
+IgniteFunction<IgniteModel<I, O1>, DatasetMapping<L, L>> datasetMapping) {
+this.tr1 = CompositionUtils.unsafeCoerce(tr1);
+this.tr2 = CompositionUtils.unsafeCoerce(tr2);
+this.datasetMapping = datasetMapping;
+}
+
+/** {@inheritDoc} */
+@Override public <K, V> ModelsSequentialComposition<I, O1, O2> fit(DatasetBuilder<K, V> datasetBuilder,
+IgniteBiFunction<K, V, Vector> featureExtractor, IgniteBiFunction<K, V, L> lbExtractor) {
+
+IgniteModel<I, O1> mdl1 = tr1.fit(datasetBuilder, featureExtractor, lbExtractor);
+DatasetMapping<L, L> mapping = datasetMapping.apply(mdl1);
+
+IgniteModel<O1, O2> mdl2 = tr2.fit(datasetBuilder,
+featureExtractor.andThen(mapping::mapFeatures),
+lbExtractor.andThen(mapping::mapLabels));
+
+return new ModelsSequentialComposition<>(mdl1, mdl2);
+}
+
+/** {@inheritDoc} */
+@Override public <K, V> ModelsSequentialComposition<I, O1, O2> update(
+ModelsSequentialComposition<I, O1, O2> mdl, DatasetBuilder<K, V> datasetBuilder,
+IgniteBiFunction<K, V, Vector> featureExtractor, IgniteBiFunction<K, V, L> lbExtractor) {
+
+IgniteModel<I, O1> firstUpdated = tr1.update(mdl.firstModel(), datasetBuilder, featureExtractor, lbExtractor);
+DatasetMapping<L, L> mapping = datasetMapping.apply(firstUpdated);
+
+IgniteModel<O1, O2> secondUpdated = tr2.update(mdl.secondModel(),
+datasetBuilder,
+featureExtractor.andThen(mapping::mapFeatures),
+lbExtractor.andThen(mapping::mapLabels));
+
+return new ModelsSequentialComposition<>(firstUpdated, secondUpdated);
+}
+
+/** {@inheritDoc} */
+@Override protected boolean checkState(ModelsSequentialComposition<I, O1, O2> mdl) {
+// Never called.
+throw new IllegalStateException();
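To illustrate the flow described in the quoted javadoc, here is a hedged sketch of composing two Vector -> Double trainers. The trainer variables and the pass-through mapping are illustrative assumptions, and it is assumed that mapLabels is the interface's sole abstract method (the quoted DatasetMapping only shows the default mapFeatures):

    // Sketch only: firstTrainer and secondTrainer stand for any DatasetTrainer
    // producing IgniteModel<Vector, Double> models.
    TrainersSequentialComposition<Vector, Double, Double, Double> seq =
        new TrainersSequentialComposition<>(
            firstTrainer,
            secondTrainer,
            mdl1 -> new DatasetMapping<Double, Double>() {
                // Features go through the default identity mapFeatures; labels could be
                // transformed based on mdl1, here they pass through unchanged.
                @Override public Double mapLabels(Double lb) {
                    return lb;
                }
            });

    ModelsSequentialComposition<Vector, Double, Double> mdl =
        seq.fit(datasetBuilder, featureExtractor, lbExtractor);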

[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training

2019-01-15 Thread GitBox
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: 
Consistent API for Ensemble training
URL: https://github.com/apache/ignite/pull/5767#discussion_r247885547
 
 

 ##
 File path: 
modules/ml/src/main/java/org/apache/ignite/ml/composition/bagging/BaggedTrainer.java
 ##
 @@ -0,0 +1,200 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.ml.composition.bagging;
+
+import java.util.Collections;
+import java.util.List;
+import java.util.Random;
+import java.util.stream.Collectors;
+import java.util.stream.IntStream;
+import java.util.stream.Stream;
+import org.apache.ignite.ml.IgniteModel;
+import org.apache.ignite.ml.composition.CompositionUtils;
+import 
org.apache.ignite.ml.composition.combinators.parallel.TrainersParallelComposition;
+import 
org.apache.ignite.ml.composition.predictionsaggregator.PredictionsAggregator;
+import org.apache.ignite.ml.dataset.DatasetBuilder;
+import org.apache.ignite.ml.environment.LearningEnvironmentBuilder;
+import org.apache.ignite.ml.math.functions.IgniteBiFunction;
+import org.apache.ignite.ml.math.functions.IgniteFunction;
+import org.apache.ignite.ml.math.primitives.vector.Vector;
+import org.apache.ignite.ml.math.primitives.vector.VectorUtils;
+import org.apache.ignite.ml.trainers.AdaptableDatasetTrainer;
+import org.apache.ignite.ml.trainers.DatasetTrainer;
+import org.apache.ignite.ml.trainers.transformers.BaggingUpstreamTransformer;
+import org.apache.ignite.ml.util.Utils;
+
+/**
+ * Trainer encapsulating logic of bootstrap aggregating (bagging).
+ * This trainer accepts some other trainer and returns a bagged version of it.
+ * The resulting model consists of submodels whose results are aggregated by a specified aggregator.
+ * Bagging is done on both samples and features
+ * (<a href="https://en.wikipedia.org/wiki/Bootstrap_aggregating">samples bagging</a>,
+ * <a href="https://en.wikipedia.org/wiki/Random_subspace_method">features bagging</a>).
+ *
+ * @param <M> Type of model produced by the trainer for which the bagged version is created.
+ * @param <L> Type of labels.
+ * @param <T> Type of trainer for which the bagged version is created.
+ */
+public class BaggedTrainer<M extends IgniteModel<Vector, Double>, L, T extends DatasetTrainer<M, L>> extends
+DatasetTrainer<BaggedModel, L> {
+/** Trainer for which bagged version is created. */
+private final DatasetTrainer<M, L> tr;
+
+/** Aggregator of submodels results. */
+private final PredictionsAggregator aggregator;
+
+/** Count of submodels in the ensemble. */
+private final int ensembleSize;
+
+/** Ratio determining which part of dataset will be taken as subsample for 
each submodel training. */
+private final double subsampleRatio;
+
+/** Dimensionality of feature vectors. */
+private final int featuresVectorSize;
+
+/** Dimension of subspace on which all samples from subsample are 
projected. */
+private final int featureSubspaceDim;
+
+/**
+ * Construct instance of this class with given parameters.
+ *
+ * @param tr Trainer to be bagged.
+ * @param aggregator Aggregator of models.
+ * @param ensembleSize Size of ensemble.
+ * @param subsampleRatio Ratio (subsample size) / (initial dataset size).
+ * @param featuresVectorSize Dimensionality of feature vector.
+ * @param featureSubspaceDim Dimensionality of feature subspace.
+ */
+public BaggedTrainer(DatasetTrainer<M, L> tr,
+PredictionsAggregator aggregator, int ensembleSize, double subsampleRatio, int featuresVectorSize,
+int featureSubspaceDim) {
+this.tr = tr;
+this.aggregator = aggregator;
+this.ensembleSize = ensembleSize;
+this.subsampleRatio = subsampleRatio;
+this.featuresVectorSize = featuresVectorSize;
+this.featureSubspaceDim = featureSubspaceDim;
+}
+
+/**
+ * Create bagged trainer.
+ *
+ * @return Bagged trainer.
+ */
+private DatasetTrainer<IgniteModel<Vector, Double>, L> getTrainer() {
+List<int[]> mappings = (featuresVectorSize > 0 && featureSubspaceDim != featuresVectorSize) ?
+IntStream.range(0, ensembleSize).mapToObj(
+modelIdx -> getMapping(
+featuresVectorSize,
+featureSubspaceDim,
+ 

[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training

2019-01-15 Thread GitBox
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: 
Consistent API for Ensemble training
URL: https://github.com/apache/ignite/pull/5767#discussion_r247873164
 
 

 ##
 File path: 
modules/ml/src/main/java/org/apache/ignite/ml/composition/CompositionUtils.java
 ##
 @@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.ml.composition;
+
+import org.apache.ignite.ml.IgniteModel;
+import org.apache.ignite.ml.dataset.DatasetBuilder;
+import org.apache.ignite.ml.math.functions.IgniteBiFunction;
+import org.apache.ignite.ml.math.primitives.vector.Vector;
+import org.apache.ignite.ml.trainers.DatasetTrainer;
+
+/**
+ * Various utility functions for trainers composition.
+ */
+public class CompositionUtils {
+/**
+ * Perform blurring of the model type of a given trainer to {@code IgniteModel<I, O>}, where I, O are input and output
+ * types of the original model.
+ *
+ * @param trainer Trainer to coerce.
+ * @param <I> Type of input of model produced by coerced trainer.
+ * @param <O> Type of output of model produced by coerced trainer.
+ * @param <M> Type of model produced by coerced trainer.
+ * @param <L> Type of labels.
+ * @return Trainer coerced to {@code DatasetTrainer<IgniteModel<I, O>, L>}.
+ */
+public static <I, O, M extends IgniteModel<I, O>, L> DatasetTrainer<IgniteModel<I, O>, L> unsafeCoerce(
+DatasetTrainer<M, L> trainer) {
+return new DatasetTrainer<IgniteModel<I, O>, L>() {
+/** {@inheritDoc} */
+@Override public <K, V> IgniteModel<I, O> fit(DatasetBuilder<K, V> datasetBuilder,
+IgniteBiFunction<K, V, Vector> featureExtractor, IgniteBiFunction<K, V, L> lbExtractor) {
+return trainer.fit(datasetBuilder, featureExtractor, lbExtractor);
+}
+
+/** {@inheritDoc} */
+@Override public <K, V> IgniteModel<I, O> update(IgniteModel<I, O> mdl, DatasetBuilder<K, V> datasetBuilder,
+IgniteBiFunction<K, V, Vector> featureExtractor, IgniteBiFunction<K, V, L> lbExtractor) {
+DatasetTrainer<IgniteModel<I, O>, L> trainer1 = (DatasetTrainer<IgniteModel<I, O>, L>)trainer;
+return trainer1.update(mdl, datasetBuilder, featureExtractor, lbExtractor);
+}
+
+/** {@inheritDoc} */
+@Override protected boolean checkState(IgniteModel<I, O> mdl) {
+return true;
+}
+
+/** {@inheritDoc} */
+@Override protected <K, V> IgniteModel<I, O> updateModel(IgniteModel<I, O> mdl, DatasetBuilder<K, V> datasetBuilder,
+IgniteBiFunction<K, V, Vector> featureExtractor, IgniteBiFunction<K, V, L> lbExtractor) {
+return null;
 
 Review comment:
   Done, see https://github.com/apache/ignite/pull/5767#discussion_r247873122


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
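A short sketch of what this coercion enables; the variables treeTrainer and svmTrainer are assumptions standing in for any concrete trainers whose models share the Vector -> Double signature:

    import java.util.Arrays;
    import java.util.List;
    import org.apache.ignite.ml.IgniteModel;
    import org.apache.ignite.ml.composition.CompositionUtils;
    import org.apache.ignite.ml.math.primitives.vector.Vector;
    import org.apache.ignite.ml.trainers.DatasetTrainer;

    // Sketch only: unrelated concrete model types are blurred to a common supertype,
    // so heterogeneous trainers can be stored in one collection and composed uniformly.
    DatasetTrainer<IgniteModel<Vector, Double>, Double> t1 = CompositionUtils.unsafeCoerce(treeTrainer);
    DatasetTrainer<IgniteModel<Vector, Double>, Double> t2 = CompositionUtils.unsafeCoerce(svmTrainer);
    List<DatasetTrainer<IgniteModel<Vector, Double>, Double>> trainers = Arrays.asList(t1, t2);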


[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training

2019-01-15 Thread GitBox
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: 
Consistent API for Ensemble training
URL: https://github.com/apache/ignite/pull/5767#discussion_r247873122
 
 

 ##
 File path: 
modules/ml/src/main/java/org/apache/ignite/ml/composition/CompositionUtils.java
 ##
 @@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.ml.composition;
+
+import org.apache.ignite.ml.IgniteModel;
+import org.apache.ignite.ml.dataset.DatasetBuilder;
+import org.apache.ignite.ml.math.functions.IgniteBiFunction;
+import org.apache.ignite.ml.math.primitives.vector.Vector;
+import org.apache.ignite.ml.trainers.DatasetTrainer;
+
+/**
+ * Various utility functions for trainers composition.
+ */
+public class CompositionUtils {
+/**
+ * Perform blurring of the model type of a given trainer to {@code IgniteModel<I, O>}, where I, O are input and output
+ * types of the original model.
+ *
+ * @param trainer Trainer to coerce.
+ * @param <I> Type of input of model produced by coerced trainer.
+ * @param <O> Type of output of model produced by coerced trainer.
+ * @param <M> Type of model produced by coerced trainer.
+ * @param <L> Type of labels.
+ * @return Trainer coerced to {@code DatasetTrainer<IgniteModel<I, O>, L>}.
+ */
+public static <I, O, M extends IgniteModel<I, O>, L> DatasetTrainer<IgniteModel<I, O>, L> unsafeCoerce(
+DatasetTrainer<M, L> trainer) {
+return new DatasetTrainer<IgniteModel<I, O>, L>() {
+/** {@inheritDoc} */
+@Override public <K, V> IgniteModel<I, O> fit(DatasetBuilder<K, V> datasetBuilder,
+IgniteBiFunction<K, V, Vector> featureExtractor, IgniteBiFunction<K, V, L> lbExtractor) {
+return trainer.fit(datasetBuilder, featureExtractor, lbExtractor);
+}
+
+/** {@inheritDoc} */
+@Override public <K, V> IgniteModel<I, O> update(IgniteModel<I, O> mdl, DatasetBuilder<K, V> datasetBuilder,
+IgniteBiFunction<K, V, Vector> featureExtractor, IgniteBiFunction<K, V, L> lbExtractor) {
+DatasetTrainer<IgniteModel<I, O>, L> trainer1 = (DatasetTrainer<IgniteModel<I, O>, L>)trainer;
+return trainer1.update(mdl, datasetBuilder, featureExtractor, lbExtractor);
+}
+
+/** {@inheritDoc} */
+@Override protected boolean checkState(IgniteModel<I, O> mdl) {
+return true;
 
 Review comment:
  This method is never called. Now throwing an exception to make that clearer.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training

2019-01-15 Thread GitBox
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: 
Consistent API for Ensemble training
URL: https://github.com/apache/ignite/pull/5767#discussion_r247870918
 
 

 ##
 File path: 
modules/ml/src/main/java/org/apache/ignite/ml/composition/combinators/parallel/ModelsParallelComposition.java
 ##
 @@ -0,0 +1,67 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.ml.composition.combinators.parallel;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.stream.Collectors;
+import org.apache.ignite.ml.IgniteModel;
+
+/**
+ * Parallel composition of models.
+ * Parallel composition of models is a model which contains a list of submodels with the same input and output types.
+ * The result of prediction of such a model is a list of predictions of each of the submodels.
+ *
+ * @param <I> Type of submodel input.
+ * @param <O> Type of submodel output.
+ */
+public class ModelsParallelComposition<I, O> implements IgniteModel<I, List<O>> {
+/** List of submodels. */
+private final List<IgniteModel<I, O>> submodels;
+
+/**
+ * Construct an instance of this class from a list of submodels.
+ *
+ * @param submodels List of submodels constituting this model.
+ */
+public ModelsParallelComposition(List<IgniteModel<I, O>> submodels) {
+this.submodels = submodels;
+}
+
+/** {@inheritDoc} */
+@Override public List<O> predict(I i) {
+return submodels
+.stream()
+.map(m -> m.predict(i))
+.collect(Collectors.toList());
+}
+
+/**
+ * List of submodels constituting this model.
+ *
+ * @return List of submodels constituting this model.
+ */
+public List<IgniteModel<I, O>> submodels() {
+return new ArrayList<>(submodels);
 
 Review comment:
   Yeah, agree.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
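For illustration, a small sketch of the quoted class in use; mdlA and mdlB are assumed pre-trained Vector -> Double models:

    import java.util.Arrays;
    import java.util.List;
    import org.apache.ignite.ml.composition.combinators.parallel.ModelsParallelComposition;
    import org.apache.ignite.ml.math.primitives.vector.Vector;
    import org.apache.ignite.ml.math.primitives.vector.VectorUtils;

    // Sketch only: both submodels receive the same input; predict() returns
    // their answers in list order.
    ModelsParallelComposition<Vector, Double> parallel =
        new ModelsParallelComposition<>(Arrays.asList(mdlA, mdlB));
    List<Double> predictions = parallel.predict(VectorUtils.of(1.0, 2.0));
    // predictions.get(0) is mdlA's prediction, predictions.get(1) is mdlB's.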


[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training

2019-01-15 Thread GitBox
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: 
Consistent API for Ensemble training
URL: https://github.com/apache/ignite/pull/5767#discussion_r247869073
 
 

 ##
 File path: 
modules/ml/src/main/java/org/apache/ignite/ml/composition/bagging/BaggedTrainer.java
 ##
 @@ -0,0 +1,200 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.ml.composition.bagging;
+
+import java.util.Collections;
+import java.util.List;
+import java.util.Random;
+import java.util.stream.Collectors;
+import java.util.stream.IntStream;
+import java.util.stream.Stream;
+import org.apache.ignite.ml.IgniteModel;
+import org.apache.ignite.ml.composition.CompositionUtils;
+import 
org.apache.ignite.ml.composition.combinators.parallel.TrainersParallelComposition;
+import 
org.apache.ignite.ml.composition.predictionsaggregator.PredictionsAggregator;
+import org.apache.ignite.ml.dataset.DatasetBuilder;
+import org.apache.ignite.ml.environment.LearningEnvironmentBuilder;
+import org.apache.ignite.ml.math.functions.IgniteBiFunction;
+import org.apache.ignite.ml.math.functions.IgniteFunction;
+import org.apache.ignite.ml.math.primitives.vector.Vector;
+import org.apache.ignite.ml.math.primitives.vector.VectorUtils;
+import org.apache.ignite.ml.trainers.AdaptableDatasetTrainer;
+import org.apache.ignite.ml.trainers.DatasetTrainer;
+import org.apache.ignite.ml.trainers.transformers.BaggingUpstreamTransformer;
+import org.apache.ignite.ml.util.Utils;
+
+/**
+ * Trainer encapsulating logic of bootstrap aggregating (bagging).
+ * This trainer accepts some other trainer and returns a bagged version of it.
+ * The resulting model consists of submodels whose results are aggregated by a specified aggregator.
+ * Bagging is done on both samples and features
+ * (<a href="https://en.wikipedia.org/wiki/Bootstrap_aggregating">samples bagging</a>,
+ * <a href="https://en.wikipedia.org/wiki/Random_subspace_method">features bagging</a>).
+ *
+ * @param <M> Type of model produced by the trainer for which the bagged version is created.
+ * @param <L> Type of labels.
+ * @param <T> Type of trainer for which the bagged version is created.
+ */
+public class BaggedTrainer<M extends IgniteModel<Vector, Double>, L, T extends DatasetTrainer<M, L>> extends
+DatasetTrainer<BaggedModel, L> {
+/** Trainer for which bagged version is created. */
+private final DatasetTrainer<M, L> tr;
+
+/** Aggregator of submodels results. */
+private final PredictionsAggregator aggregator;
+
+/** Count of submodels in the ensemble. */
+private final int ensembleSize;
+
+/** Ratio determining which part of dataset will be taken as subsample for 
each submodel training. */
+private final double subsampleRatio;
+
+/** Dimensionality of feature vectors. */
+private final int featuresVectorSize;
+
+/** Dimension of subspace on which all samples from subsample are 
projected. */
+private final int featureSubspaceDim;
+
+/**
+ * Construct instance of this class with given parameters.
+ *
+ * @param tr Trainer to be bagged.
+ * @param aggregator Aggregator of models.
+ * @param ensembleSize Size of ensemble.
+ * @param subsampleRatio Ratio (subsample size) / (initial dataset size).
+ * @param featuresVectorSize Dimensionality of feature vector.
+ * @param featureSubspaceDim Dimensionality of feature subspace.
+ */
+public BaggedTrainer(DatasetTrainer<M, L> tr,
+PredictionsAggregator aggregator, int ensembleSize, double subsampleRatio, int featuresVectorSize,
+int featureSubspaceDim) {
+this.tr = tr;
+this.aggregator = aggregator;
+this.ensembleSize = ensembleSize;
+this.subsampleRatio = subsampleRatio;
+this.featuresVectorSize = featuresVectorSize;
+this.featureSubspaceDim = featureSubspaceDim;
+}
+
+/**
+ * Create bagged trainer.
+ *
+ * @return Bagged trainer.
+ */
+private DatasetTrainer<IgniteModel<Vector, Double>, L> getTrainer() {
+List<int[]> mappings = (featuresVectorSize > 0 && featureSubspaceDim != featuresVectorSize) ?
+IntStream.range(0, ensembleSize).mapToObj(
+modelIdx -> getMapping(
+featuresVectorSize,
+featureSubspaceDim,
+ 

[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training

2019-01-15 Thread GitBox
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: 
Consistent API for Ensemble training
URL: https://github.com/apache/ignite/pull/5767#discussion_r247868951
 
 

 ##
 File path: 
modules/ml/src/main/java/org/apache/ignite/ml/composition/bagging/BaggedModel.java
 ##
 @@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.ml.composition.bagging;
+
+import org.apache.ignite.ml.IgniteModel;
+import org.apache.ignite.ml.math.primitives.vector.Vector;
+
+/**
+ * This class represents a model produced by {@link BaggedTrainer}.
+ * It is a wrapper around the inner representation of the model produced by {@link BaggedTrainer}.
+ */
+public class BaggedModel implements IgniteModel<Vector, Double> {
 
 Review comment:
  Yes, we could do that, but after deciding to drop fully type-safe bagged models because of the heavy-looking generics, I decided to keep at least some type safety by making this wrapper.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
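The trade-off described in this comment boils down to a thin wrapper, roughly like the sketch below; the field and constructor are assumptions, since the diff above only shows the class declaration:

    import org.apache.ignite.ml.IgniteModel;
    import org.apache.ignite.ml.math.primitives.vector.Vector;

    // Sketch of the wrapper idea: a fixed Vector -> Double surface over the inner
    // model, trading fully generic signatures for readable ones.
    public class BaggedModel implements IgniteModel<Vector, Double> {
        private final IgniteModel<Vector, Double> mdl; // assumed inner representation

        BaggedModel(IgniteModel<Vector, Double> mdl) {
            this.mdl = mdl;
        }

        @Override public Double predict(Vector i) {
            return mdl.predict(i);
        }
    }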


[GitHub] artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: Consistent API for Ensemble training

2019-01-15 Thread GitBox
artemmalykh commented on a change in pull request #5767: [ML] IGNITE-10573: 
Consistent API for Ensemble training
URL: https://github.com/apache/ignite/pull/5767#discussion_r247868450
 
 

 ##
 File path: 
modules/ml/src/main/java/org/apache/ignite/ml/composition/DatasetMapping.java
 ##
 @@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.ml.composition;
+
+import org.apache.ignite.ml.math.functions.IgniteFunction;
+import org.apache.ignite.ml.math.primitives.vector.Vector;
+
+/**
+ * This interface represents a dataset mapping. It is just a tuple of two mappings: one for features and one for labels.
+ *
+ * @param <L1> Type of labels before mapping.
+ * @param <L2> Type of labels after mapping.
+ */
+public interface DatasetMapping<L1, L2> {
+/**
+ * Method used to map feature vectors.
+ *
+ * @param v Feature vector.
+ * @return Mapped feature vector.
+ */
+public default Vector mapFeatures(Vector v) {
 
 Review comment:
   Because there is no sensible default mapping `L1 -> L2`, but for `Vector -> 
Vector` there is `id`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
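To make the answer above concrete, a sketch of a mapping that overrides only the label side and inherits the identity mapFeatures default. The String -> Double transform is purely illustrative, and it assumes mapLabels is the interface's abstract method:

    import org.apache.ignite.ml.composition.DatasetMapping;

    // Sketch only: Vector -> Vector defaults to id, so only labels need mapping here.
    DatasetMapping<String, Double> mapping = new DatasetMapping<String, Double>() {
        @Override public Double mapLabels(String lb) {
            return "positive".equals(lb) ? 1.0 : 0.0;
        }
    };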