[jira] [Created] (IGNITE-12218) [ML] Add support for Strings in Vectorizer
Aleksey Zinoviev created IGNITE-12218: - Summary: [ML] Add support for Strings in Vectorizer Key: IGNITE-12218 URL: https://issues.apache.org/jira/browse/IGNITE-12218 Project: Ignite Issue Type: Sub-task Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 Currently the signatures of vectorizers are limited; they should be extended to support Strings. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-12216) [ML][Umbrella]
Aleksey Zinoviev created IGNITE-12216: - Summary: [ML][Umbrella] Key: IGNITE-12216 URL: https://issues.apache.org/jira/browse/IGNITE-12216 Project: Ignite Issue Type: New Feature Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 Discussion here [http://apache-ignite-developers.2346864.n4.nabble.com/ML-DISCUSSION-Big-Double-problem-td42262.html#a42267] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-12180) [ML] Add support of the next Imputing Strategies: MIN, MAX
Aleksey Zinoviev created IGNITE-12180: - Summary: [ML] Add support of the next Imputing Strategies: MIN, MAX Key: IGNITE-12180 URL: https://issues.apache.org/jira/browse/IGNITE-12180 Project: Ignite Issue Type: Sub-task Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 Add support of the next Imputing Strategies: MIN, MAX -- This message was sent by Atlassian Jira (v8.3.2#803003)
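A minimal sketch of what MIN and MAX imputing could look like, assuming missing values are encoded as NaN and a single feature column is processed at a time. The class and method names below are hypothetical and are not the Ignite ML imputer API.

```java
// Hypothetical illustration only; not the Ignite ML preprocessing API.
final class MinMaxImputer {
    enum Strategy { MIN, MAX }

    /** Returns a copy of the column with NaN entries replaced by the column's min or max. */
    static double[] impute(double[] column, Strategy strategy) {
        double fill = Double.NaN;

        // First pass: find the min or max over the values that are present.
        for (double v : column) {
            if (Double.isNaN(v))
                continue;
            if (Double.isNaN(fill))
                fill = v;
            else
                fill = strategy == Strategy.MIN ? Math.min(fill, v) : Math.max(fill, v);
        }

        // Second pass: substitute the fill value for every missing entry.
        // If the whole column is missing, fill stays NaN and nothing changes.
        double[] out = column.clone();
        for (int i = 0; i < out.length; i++) {
            if (Double.isNaN(out[i]))
                out[i] = fill;
        }
        return out;
    }
}
```

For example, `MinMaxImputer.impute(new double[] {3.0, Double.NaN, 1.0}, MinMaxImputer.Strategy.MIN)` fills the gap with 1.0.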
[jira] [Created] (IGNITE-12168) [ML] Flaky ML example tests
Aleksey Zinoviev created IGNITE-12168: - Summary: [ML] Flaky ML example tests Key: IGNITE-12168 URL: https://issues.apache.org/jira/browse/IGNITE-12168 Project: Ignite Issue Type: Bug Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Discussed here [http://apache-ignite-developers.2346864.n4.nabble.com/After-IGNITE-12148-the-Examples-suite-has-unstable-tests-td43469.html] -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (IGNITE-12148) [ML] Recommendation Engine
Aleksey Zinoviev created IGNITE-12148: - Summary: [ML] Recommendation Engine Key: IGNITE-12148 URL: https://issues.apache.org/jira/browse/IGNITE-12148 Project: Ignite Issue Type: New Feature Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 The main idea is to provide a recommendation engine for building a recommendation system over the Ignite cache and via SQL operators -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (IGNITE-12079) [ML][Umbrella] Add advanced preprocessing techniques
Aleksey Zinoviev created IGNITE-12079: - Summary: [ML][Umbrella] Add advanced preprocessing techniques Key: IGNITE-12079 URL: https://issues.apache.org/jira/browse/IGNITE-12079 Project: Ignite Issue Type: New Feature Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 *Main goal:* To reduce the gap between Apache Spark and Apache Ignite in preprocessing operations. Reducing the gap could help with loading Spark ML Pipelines into Ignite ML. Next steps: # Add Frequency Encoder # Add Imputing Strategies (MIN, MAX, COUNT, MOST_FREQUENT, LEAST_FREQUENT) # Add RobustScaler (will be added in Spark 3.0) # Add CountVectorizer # Add FeatureHasher # Add QuantileDiscretizer # Add Locality Sensitive Hashing (LSH) # Add LabelEncoder # Add RevertStringIndexing # Add multi-column preprocessor -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (IGNITE-11680) [ML] Improve ROC AUC to work with ProbableLabel
Aleksey Zinoviev created IGNITE-11680: - Summary: [ML] Improve ROC AUC to work with ProbableLabel Key: IGNITE-11680 URL: https://issues.apache.org/jira/browse/IGNITE-11680 Project: Ignite Issue Type: Sub-task Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev The ROC AUC implementation should work with a probable label instead of a binary label (0.0/1.0). In the future it should also work for multi-class classification tasks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11295) [ML] Add readme file to SparkModelParser module
Aleksey Zinoviev created IGNITE-11295: - Summary: [ML] Add readme file to SparkModelParser module Key: IGNITE-11295 URL: https://issues.apache.org/jira/browse/IGNITE-11295 Project: Ignite Issue Type: Sub-task Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 This file should contain usage examples and instructions on how to use this module -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11294) [ML] Use ML logger and env variables in Spark ML Parser
Aleksey Zinoviev created IGNITE-11294: - Summary: [ML] Use ML logger and env variables in Spark ML Parser Key: IGNITE-11294 URL: https://issues.apache.org/jira/browse/IGNITE-11294 Project: Ignite Issue Type: Sub-task Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 Add logger to SparkModelParser class and environment usage -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11244) [ML] Improve model loading from directory instead full path to file with model
Aleksey Zinoviev created IGNITE-11244: - Summary: [ML] Improve model loading from directory instead full path to file with model Key: IGNITE-11244 URL: https://issues.apache.org/jira/browse/IGNITE-11244 Project: Ignite Issue Type: Sub-task Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev The proposed feature should support auto-discovering of Spark models in the suggested directories -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11041) [ML] Add parser for Spark Gradient-boosted tree regressor
Aleksey Zinoviev created IGNITE-11041: - Summary: [ML] Add parser for Spark Gradient-boosted tree regressor Key: IGNITE-11041 URL: https://issues.apache.org/jira/browse/IGNITE-11041 Project: Ignite Issue Type: Sub-task Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 # Write Spark example producing Gradient-boosted tree regressor # Save model to parquet file # Parse parquet file # Add an example -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11040) [ML] Add parser for Spark Random forest regressor
Aleksey Zinoviev created IGNITE-11040: - Summary: [ML] Add parser for Spark Random forest regressor Key: IGNITE-11040 URL: https://issues.apache.org/jira/browse/IGNITE-11040 Project: Ignite Issue Type: Sub-task Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 # Write Spark example producing Random Forest regressor # Save model to parquet file # Parse parquet file # Add an example -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11039) [ML] Add parser for Spark Decision tree regression
Aleksey Zinoviev created IGNITE-11039: - Summary: [ML] Add parser for Spark Decision tree regression Key: IGNITE-11039 URL: https://issues.apache.org/jira/browse/IGNITE-11039 Project: Ignite Issue Type: Sub-task Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 # Write Spark example producing Decision Tree Regressor # Save model to parquet file # Parse parquet file # Add an example -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11037) [ML] Add parser for Spark KMeans clustering model
Aleksey Zinoviev created IGNITE-11037: - Summary: [ML] Add parser for Spark KMeans clustering model Key: IGNITE-11037 URL: https://issues.apache.org/jira/browse/IGNITE-11037 Project: Ignite Issue Type: Sub-task Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11012) [ML] Add model type validation during parsing parquet file
Aleksey Zinoviev created IGNITE-11012: - Summary: [ML] Add model type validation during parsing parquet file Key: IGNITE-11012 URL: https://issues.apache.org/jira/browse/IGNITE-11012 Project: Ignite Issue Type: Sub-task Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 After resolving the Ignite path, check the special field in the parquet file to validate that the appropriate model is being loaded. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11005) [ML] Add parser for Spark Gradient-boosted tree classifier
Aleksey Zinoviev created IGNITE-11005: - Summary: [ML] Add parser for Spark Gradient-boosted tree classifier Key: IGNITE-11005 URL: https://issues.apache.org/jira/browse/IGNITE-11005 Project: Ignite Issue Type: Sub-task Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev # Write Spark example producing Gradient-boosted tree classifier model # Save model to parquet file # Parse parquet file # Add an example -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11003) [ML] Add parser for Spark Random forest classifier
Aleksey Zinoviev created IGNITE-11003: - Summary: [ML] Add parser for Spark Random forest classifier Key: IGNITE-11003 URL: https://issues.apache.org/jira/browse/IGNITE-11003 Project: Ignite Issue Type: Sub-task Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11002) [ML] Add parser for Spark Decision tree classifier model
Aleksey Zinoviev created IGNITE-11002: - Summary: [ML] Add parser for Spark Decision tree classifier model Key: IGNITE-11002 URL: https://issues.apache.org/jira/browse/IGNITE-11002 Project: Ignite Issue Type: Sub-task Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 # Write Spark example producing Decision tree classifier model # Save model to parquet file # Parse parquet file # Add an example -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11001) [ML] Add parser for Spark Linear SVM model
Aleksey Zinoviev created IGNITE-11001: - Summary: [ML] Add parser for Spark Linear SVM model Key: IGNITE-11001 URL: https://issues.apache.org/jira/browse/IGNITE-11001 Project: Ignite Issue Type: Sub-task Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 # Write Spark example producing Linear SVM model # Save model to parquet file # Parse parquet file # Add an example -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11000) [ML] Add parser for Spark LinearRegression
Aleksey Zinoviev created IGNITE-11000: - Summary: [ML] Add parser for Spark LinearRegression Key: IGNITE-11000 URL: https://issues.apache.org/jira/browse/IGNITE-11000 Project: Ignite Issue Type: Sub-task Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 # Write Spark example producing LinearRegression model # Save model to parquet file # Parse parquet file -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10968) [ML] Create new ignite module SparkMLModelImport and add LogRegression converter
Aleksey Zinoviev created IGNITE-10968: - Summary: [ML] Create new ignite module SparkMLModelImport and add LogRegression converter Key: IGNITE-10968 URL: https://issues.apache.org/jira/browse/IGNITE-10968 Project: Ignite Issue Type: Sub-task Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev * Create new module * Add specific dependencies (ml/hadoop/spark/parquet) * Move LogRegression example to this module -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10904) [ML] Refactor all examples with regression to use RegressionMetrics
Aleksey Zinoviev created IGNITE-10904: - Summary: [ML] Refactor all examples with regression to use RegressionMetrics Key: IGNITE-10904 URL: https://issues.apache.org/jira/browse/IGNITE-10904 Project: Ignite Issue Type: Sub-task Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 Look for all regression examples and add as a final step the RegressionMetrics usage -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10902) [ML] Implement a few regression metrics in one RegressionMetrics class
Aleksey Zinoviev created IGNITE-10902: - Summary: [ML] Implement a few regression metrics in one RegressionMetrics class Key: IGNITE-10902 URL: https://issues.apache.org/jira/browse/IGNITE-10902 Project: Ignite Issue Type: Sub-task Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Look for possible metrics in Spark, Smile, Scikit-learn -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10901) [ML][Umbrella] Add support of regression metrics to evaluate regression
Aleksey Zinoviev created IGNITE-10901: - Summary: [ML][Umbrella] Add support of regression metrics to evaluate regression Key: IGNITE-10901 URL: https://issues.apache.org/jira/browse/IGNITE-10901 Project: Ignite Issue Type: Improvement Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Look at scikit-learn metrics like
Regression
‘explained_variance’ | metrics.explained_variance_score | https://scikit-learn.org/stable/modules/generated/sklearn.metrics.explained_variance_score.html#sklearn.metrics.explained_variance_score
‘neg_mean_absolute_error’ | metrics.mean_absolute_error | https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_absolute_error.html#sklearn.metrics.mean_absolute_error
‘neg_mean_squared_error’ | metrics.mean_squared_error | https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_squared_error.html#sklearn.metrics.mean_squared_error
‘neg_mean_squared_log_error’ | metrics.mean_squared_log_error | https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_squared_log_error.html#sklearn.metrics.mean_squared_log_error
‘neg_median_absolute_error’ | metrics.median_absolute_error | https://scikit-learn.org/stable/modules/generated/sklearn.metrics.median_absolute_error.html#sklearn.metrics.median_absolute_error
‘r2’ | metrics.r2_score | https://scikit-learn.org/stable/modules/generated/sklearn.metrics.r2_score.html#sklearn.metrics.r2_score
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
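To make three of the listed metrics concrete, here is a rough sketch of mean absolute error, mean squared error and R2 over plain arrays. The class name and method signatures are assumptions made for illustration; they do not describe the RegressionMetrics class proposed in the related tickets.

```java
// Hypothetical illustration only; not the Ignite ML RegressionMetrics class.
final class RegressionMetricsSketch {
    /** Mean of |truth - prediction|. */
    static double meanAbsoluteError(double[] truth, double[] pred) {
        checkSameLength(truth, pred);
        double sum = 0.0;
        for (int i = 0; i < truth.length; i++)
            sum += Math.abs(truth[i] - pred[i]);
        return sum / truth.length;
    }

    /** Mean of (truth - prediction)^2. */
    static double meanSquaredError(double[] truth, double[] pred) {
        checkSameLength(truth, pred);
        double sum = 0.0;
        for (int i = 0; i < truth.length; i++) {
            double d = truth[i] - pred[i];
            sum += d * d;
        }
        return sum / truth.length;
    }

    /** R2 = 1 - SS_res / SS_tot; returns NaN when the ground truth has zero variance. */
    static double r2(double[] truth, double[] pred) {
        checkSameLength(truth, pred);
        double mean = 0.0;
        for (double t : truth)
            mean += t;
        mean /= truth.length;

        double ssRes = 0.0;
        double ssTot = 0.0;
        for (int i = 0; i < truth.length; i++) {
            ssRes += (truth[i] - pred[i]) * (truth[i] - pred[i]);
            ssTot += (truth[i] - mean) * (truth[i] - mean);
        }
        return ssTot == 0.0 ? Double.NaN : 1.0 - ssRes / ssTot;
    }

    private static void checkSameLength(double[] a, double[] b) {
        if (a.length != b.length || a.length == 0)
            throw new IllegalArgumentException("Arrays must be non-empty and of equal length");
    }
}
```

With ground truth {1, 2, 3} and predictions {1, 2, 4}, the sums are SS_res = 1 and SS_tot = 2, so R2 comes out at 0.5.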
[jira] [Created] (IGNITE-10870) [ML] Add an example for KNN/LogReg and multi-class task full Iris dataset
Aleksey Zinoviev created IGNITE-10870: - Summary: [ML] Add an example for KNN/LogReg and multi-class task full Iris dataset Key: IGNITE-10870 URL: https://issues.apache.org/jira/browse/IGNITE-10870 Project: Ignite Issue Type: Sub-task Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 Add one or two examples for KNN/LogReg on the Iris dataset with 3 classes -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10869) [ML] Add MultiClass classification metrics
Aleksey Zinoviev created IGNITE-10869: - Summary: [ML] Add MultiClass classification metrics Key: IGNITE-10869 URL: https://issues.apache.org/jira/browse/IGNITE-10869 Project: Ignite Issue Type: Sub-task Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 Add the ability to calculate multiple metrics (as binary metrics) for multiclass classification. It can be merged with the OneVsRest approach. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10866) [ML] Add an example of LogRegression model loading
Aleksey Zinoviev created IGNITE-10866: - Summary: [ML] Add an example of LogRegression model loading Key: IGNITE-10866 URL: https://issues.apache.org/jira/browse/IGNITE-10866 Project: Ignite Issue Type: Sub-task Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Load the LogReg model from Spark via Spark ML Writable to parquet file -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10865) [ML] [Umbrella] Integration with Spark ML
Aleksey Zinoviev created IGNITE-10865: - Summary: [ML] [Umbrella] Integration with Spark ML Key: IGNITE-10865 URL: https://issues.apache.org/jira/browse/IGNITE-10865 Project: Ignite Issue Type: New Feature Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 Investigate how to load ML models from Spark -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10804) [ML] Add ability to load LinReg model from Spark to Ignite via PMML
Aleksey Zinoviev created IGNITE-10804: - Summary: [ML] Add ability to load LinReg model from Spark to Ignite via PMML Key: IGNITE-10804 URL: https://issues.apache.org/jira/browse/IGNITE-10804 Project: Ignite Issue Type: Sub-task Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev 1) Write simple ML pipeline for Spark 2) Convert to PMML model 3) Load to Ignite 4) Predict on Ignite -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10803) [ML] Add prototype LinearRegression loading from PMML format
Aleksey Zinoviev created IGNITE-10803: - Summary: [ML] Add prototype LinearRegression loading from PMML format Key: IGNITE-10803 URL: https://issues.apache.org/jira/browse/IGNITE-10803 Project: Ignite Issue Type: Sub-task Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Generate or get existing PMML model for known dataset to load and predict new data in Ignite -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10792) [ML] Add seed to test-train filter
Aleksey Zinoviev created IGNITE-10792: - Summary: [ML] Add seed to test-train filter Key: IGNITE-10792 URL: https://issues.apache.org/jira/browse/IGNITE-10792 Project: Ignite Issue Type: Task Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 Results need to be reproducible from run to run in the second Evaluator test -- This message was sent by Atlassian JIRA (v7.6.3#76005)
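The reason a seed helps can be shown with a small sketch: when the random generator behind the train/test decision is created from a fixed seed, every run assigns the same rows to the training set. The class below is hypothetical and only mimics the idea; it is not the Ignite ML split or filter API.

```java
import java.util.Random;

// Hypothetical illustration only; not the Ignite ML train/test split API.
final class SeededSplit {
    /**
     * Marks each of n rows as train (true) or test (false).
     * The same n, ratio and seed always produce the same mask.
     */
    static boolean[] trainMask(int n, double trainRatio, long seed) {
        Random rnd = new Random(seed);
        boolean[] mask = new boolean[n];
        for (int i = 0; i < n; i++)
            mask[i] = rnd.nextDouble() < trainRatio;
        return mask;
    }
}
```

A test can then call `SeededSplit.trainMask(100, 0.75, 42L)` twice and rely on both calls returning identical masks.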
[jira] [Created] (IGNITE-10713) [ML] Refactor examples with accuracy calculation and another metrics usage
Aleksey Zinoviev created IGNITE-10713: - Summary: [ML] Refactor examples with accuracy calculation and another metrics usage Key: IGNITE-10713 URL: https://issues.apache.org/jira/browse/IGNITE-10713 Project: Ignite Issue Type: Sub-task Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Avoid manual calculation of accuracy, use evaluator instead of counters in examples -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10711) [ML] [Umbrella] Provide metrics to evaluate the quality of model
Aleksey Zinoviev created IGNITE-10711: - Summary: [ML] [Umbrella] Provide metrics to evaluate the quality of model Key: IGNITE-10711 URL: https://issues.apache.org/jira/browse/IGNITE-10711 Project: Ignite Issue Type: New Feature Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 This is an umbrella ticket for all metric-related tickets -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10697) [ML] Add Frequency Encoding
Aleksey Zinoviev created IGNITE-10697: - Summary: [ML] Add Frequency Encoding Key: IGNITE-10697 URL: https://issues.apache.org/jira/browse/IGNITE-10697 Project: Ignite Issue Type: New Feature Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 Encode each categorical value as the fraction of all rows in which it occurs. This can work with linear models if the frequency is correlated with the target value. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
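As a rough illustration of frequency encoding, the sketch below counts how often each category occurs and divides by the number of rows, so every category maps to a value in (0, 1]. The class name and the use of String categories are assumptions; this is not the encoder API that the ticket would add to Ignite ML.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not the Ignite ML encoder API.
final class FrequencyEncoderSketch {
    /** Maps each distinct category to its share of all rows. */
    static Map<String, Double> fit(String[] values) {
        Map<String, Double> freq = new HashMap<>();

        // Count occurrences of each category.
        for (String v : values)
            freq.merge(v, 1.0, Double::sum);

        // Turn counts into fractions of the total number of rows.
        freq.replaceAll((category, cnt) -> cnt / values.length);
        return freq;
    }
}
```

For the column {"a", "b", "a", "a"} this gives 0.75 for "a" and 0.25 for "b"; each row is then replaced by the value for its category.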
[jira] [Created] (IGNITE-10606) [ML] Add tests for Evaluator
Aleksey Zinoviev created IGNITE-10606: - Summary: [ML] Add tests for Evaluator Key: IGNITE-10606 URL: https://issues.apache.org/jira/browse/IGNITE-10606 Project: Ignite Issue Type: Task Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 Cover the Evaluator static methods with tests. These should be simple tests, smaller than the Evaluator example -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10605) [ML] Add multiple metrics calculations to Cross-Validation
Aleksey Zinoviev created IGNITE-10605: - Summary: [ML] Add multiple metrics calculations to Cross-Validation Key: IGNITE-10605 URL: https://issues.apache.org/jira/browse/IGNITE-10605 Project: Ignite Issue Type: Improvement Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 Extend and refactor CrossValidation class methods with scoreCalculator parameter. Refactor the tests, examples and tutorial according to the new changes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10532) [ML] Add Confusion Matrix for multi-class classification
Aleksey Zinoviev created IGNITE-10532: - Summary: [ML] Add Confusion Matrix for multi-class classification Key: IGNITE-10532 URL: https://issues.apache.org/jira/browse/IGNITE-10532 Project: Ignite Issue Type: Sub-task Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 Explore the ability to integrate OneVsRest with the ConfusionMatrix calculation. It can be implemented only after MultiClassEvaluator (no ticket yet) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10531) [ML] Refactor all examples to use Binary Confusion Matrix instead of calculations by hand
Aleksey Zinoviev created IGNITE-10531: - Summary: [ML] Refactor all examples to use Binary Confusion Matrix instead of calculations by hand Key: IGNITE-10531 URL: https://issues.apache.org/jira/browse/IGNITE-10531 Project: Ignite Issue Type: Sub-task Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 Change // Build confusion matrix. See https://en.wikipedia.org/wiki/Confusion_matrix int[][] confusionMtx = {{0, 0}, {0, 0}}; to usage of ConfusionMatrix -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10530) [ML] Add Confusion Matrix for Binary Classification
Aleksey Zinoviev created IGNITE-10530: - Summary: [ML] Add Confusion Matrix for Binary Classification Key: IGNITE-10530 URL: https://issues.apache.org/jira/browse/IGNITE-10530 Project: Ignite Issue Type: Sub-task Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 Add special class to build confusion matrix as a product of evaluation process -- This message was sent by Atlassian JIRA (v7.6.3#76005)
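A dedicated class for the binary confusion matrix could look roughly like the sketch below. It assumes labels are encoded as 0.0 and 1.0 and treats any value above 0.5 as the positive class; the class and its method names are hypothetical and do not describe the class that was eventually added to Ignite ML.

```java
// Hypothetical illustration only; not the Ignite ML confusion matrix class.
final class BinaryConfusionMatrixSketch {
    private long tp;
    private long tn;
    private long fp;
    private long fn;

    /** Records one (ground truth, prediction) pair; values above 0.5 count as positive. */
    void add(double truth, double prediction) {
        boolean actual = truth > 0.5;
        boolean predicted = prediction > 0.5;

        if (actual && predicted)
            tp++;
        else if (!actual && !predicted)
            tn++;
        else if (predicted)
            fp++;
        else
            fn++;
    }

    long truePositives() { return tp; }
    long trueNegatives() { return tn; }
    long falsePositives() { return fp; }
    long falseNegatives() { return fn; }

    /** Share of correct predictions; NaN if nothing has been recorded yet. */
    double accuracy() {
        long total = tp + tn + fp + fn;
        return total == 0 ? Double.NaN : (double) (tp + tn) / total;
    }
}
```

An evaluation loop would call `add(groundTruth, prediction)` for every row and read the four counters afterwards, instead of maintaining an `int[][]` by hand.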
[jira] [Created] (IGNITE-10529) [ML] Add Confusion Matrix support for classification algorithms
Aleksey Zinoviev created IGNITE-10529: - Summary: [ML] Add Confusion Matrix support for classification algorithms Key: IGNITE-10529 URL: https://issues.apache.org/jira/browse/IGNITE-10529 Project: Ignite Issue Type: New Feature Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 This is an umbrella ticket for Confusion Matrix Support -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10528) [ML] Fix incorrect comparing of double values in ML examples
Aleksey Zinoviev created IGNITE-10528: - Summary: [ML] Fix incorrect comparing of double values in ML examples Key: IGNITE-10528 URL: https://issues.apache.org/jira/browse/IGNITE-10528 Project: Ignite Issue Type: Bug Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 Look at the code line if (groundTruth != prediction) in each example. Fix it with Math.abs or the Double.compare method (don't forget precision) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
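The Math.abs variant of the fix can be sketched as follows. The helper name and the choice of tolerance are assumptions for illustration; each example would need a precision suited to its own data.

```java
// Hypothetical helper illustrating tolerance-based comparison of doubles.
final class DoubleCompare {
    /** True when a and b differ by at most eps; always false if either value is NaN. */
    static boolean nearlyEqual(double a, double b, double eps) {
        return Math.abs(a - b) <= eps;
    }
}
```

For instance, `0.1 + 0.2 != 0.3` is true in Java because of rounding, while `DoubleCompare.nearlyEqual(0.1 + 0.2, 0.3, 1e-9)` reports the two values as equal. In the examples, the check would become `if (!DoubleCompare.nearlyEqual(groundTruth, prediction, eps))`.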
[jira] [Created] (IGNITE-10428) [ML] Add example for OneVsRest trainer/model usage
Aleksey Zinoviev created IGNITE-10428: - Summary: [ML] Add example for OneVsRest trainer/model usage Key: IGNITE-10428 URL: https://issues.apache.org/jira/browse/IGNITE-10428 Project: Ignite Issue Type: Wish Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 This example should use LogReg, SVM or DT to train a multiclass model that distinguishes classes on a prepared dataset (generate one or use a well-known dataset) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10426) [ML] Spread parameter isKeepRawLabels across all models
Aleksey Zinoviev created IGNITE-10426: - Summary: [ML] Spread parameter isKeepRawLabels across all models Key: IGNITE-10426 URL: https://issues.apache.org/jira/browse/IGNITE-10426 Project: Ignite Issue Type: Improvement Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 Currently, a few models have the parameter isKeepRawLabels and a threshold to change the predicted value to one of the class labels 1 or 0. Discuss this on the dev-list and think about how to solve this task to optimize MultiClassModel. Possible solutions: * add these methods to the common model * add this method to MultiClassModel and use reflection to check this parameter, for example in the apply method -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10407) [ML] Add Multi-label multi-class classification trainer and model
Aleksey Zinoviev created IGNITE-10407: - Summary: [ML] Add Multi-label multi-class classification trainer and model Key: IGNITE-10407 URL: https://issues.apache.org/jira/browse/IGNITE-10407 Project: Ignite Issue Type: New Feature Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Improve Ignite ML's ability to work with multi-label multi-class classification tasks. It requires * extension of the current API with models for Double prediction only * addition of a common OneVsRest Multi-labeled Multi-classification Model and Trainer * preparing appropriate datasets for examples and testing -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10405) [ML] Refactor GaussianNaiveBayesTrainerExample to read data sample from file
Aleksey Zinoviev created IGNITE-10405: - Summary: [ML] Refactor GaussianNaiveBayesTrainerExample to read data sample from file Key: IGNITE-10405 URL: https://issues.apache.org/jira/browse/IGNITE-10405 Project: Ignite Issue Type: Improvement Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 Remove IrisDataset class in Utils and use two_classed_iris.csv to load dataset from csv file. Also, delete method of filling SandboxMLDatasets with double[][] array. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10380) [ML] Drop Multi-label Classification for Logistic Regression and SVM
Aleksey Zinoviev created IGNITE-10380: - Summary: [ML] Drop Multi-label Classification for Logistic Regression and SVM Key: IGNITE-10380 URL: https://issues.apache.org/jira/browse/IGNITE-10380 Project: Ignite Issue Type: Improvement Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 After the One-vs-Rest implementation, both of these separate algorithms could be dropped. Also, rename BinaryClassification LogReg -> LogReg and BinarySVM -> SVM. NOTE: The corresponding docs should be dropped in release 2.8 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10371) [ML] Add multiple metrics calculation fo Binary Classification Evaluation process
Aleksey Zinoviev created IGNITE-10371: - Summary: [ML] Add multiple metrics calculation fo Binary Classification Evaluation process Key: IGNITE-10371 URL: https://issues.apache.org/jira/browse/IGNITE-10371 Project: Ignite Issue Type: New Feature Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 Add the ability to get a map of metrics to evaluate binary classification. Try to implement: all implemented metrics should be calculated in one iteration cycle over the data. Naive implementation: compose all passed metrics and calculate them separately -- This message was sent by Atlassian JIRA (v7.6.3#76005)
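The single-pass idea can be sketched like this: one loop over the data accumulates the four outcome counts, and every metric in the returned map is derived from those counts afterwards. The class name, the metric keys and the 0.5 cut-off for the positive class are assumptions for illustration and do not reflect the Evaluator API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not the Ignite ML Evaluator API.
final class BinaryMetricsSketch {
    /** Computes accuracy, precision and recall with a single pass over the data. */
    static Map<String, Double> evaluate(double[] truth, double[] pred) {
        if (truth.length != pred.length)
            throw new IllegalArgumentException("Arrays must have equal length");

        long tp = 0, tn = 0, fp = 0, fn = 0;

        // Single iteration cycle: only the outcome counts are collected here.
        for (int i = 0; i < truth.length; i++) {
            boolean actual = truth[i] > 0.5;
            boolean predicted = pred[i] > 0.5;

            if (actual && predicted) tp++;
            else if (!actual && !predicted) tn++;
            else if (predicted) fp++;
            else fn++;
        }

        // All metrics are derived from the counts without touching the data again.
        Map<String, Double> metrics = new LinkedHashMap<>();
        metrics.put("accuracy", ratio(tp + tn, tp + tn + fp + fn));
        metrics.put("precision", ratio(tp, tp + fp));
        metrics.put("recall", ratio(tp, tp + fn));
        return metrics;
    }

    /** Returns NaN instead of dividing by zero. */
    private static double ratio(long num, long den) {
        return den == 0 ? Double.NaN : (double) num / den;
    }
}
```

The naive alternative mentioned in the ticket would run one loop per metric; deriving everything from shared counts keeps the cost at a single scan regardless of how many metrics are requested.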
[jira] [Created] (IGNITE-9936) [ML] Make readable the models ouput in RandomForestClassificationExample
Aleksey Zinoviev created IGNITE-9936: Summary: [ML] Make readable the models ouput in RandomForestClassificationExample Key: IGNITE-9936 URL: https://issues.apache.org/jira/browse/IGNITE-9936 Project: Ignite Issue Type: Improvement Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 The output is >>> Trained model: Models composition [ aggregator = [OnMajorityPredictionsAggregator], models = [ org.apache.ignite.ml.tree.randomforest.data.TreeRoot@7d3d101b, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@30c8681, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@5cdec700, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@6d026701, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@78aa1f72, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@1f75a668, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@35399441, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@4b7dc788, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@6304101a, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@5170bcf4, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@2812b107, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@df6620a, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@4e31276e, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@1a72a540, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@27d5a580, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@198d6542, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@5e403b4a, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@5117dd67, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@5be49b60, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@2931522b, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@7674b62c, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@19e7a160, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@662706a7, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@45a4b042, 
org.apache.ignite.ml.tree.randomforest.data.TreeRoot@16b2bb0c, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@327af41b, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@6cb6decd, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@c7045b9, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@f99f5e0, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@6aa61224, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@30bce90b, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@3e6f3f28, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@7e19ebf0, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@2474f125, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@7357a011, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@3406472c, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@5717c37, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@68f4865, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@4816278d, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@4eaf3684, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@40317ba2, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@3c01cfa1, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@45d2ade3, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@727eb8cb, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@39d9314d, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@b978d10, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@5b7a8434, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@5c45d770, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@2ce6c6ec, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@1bae316d, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@147a5d08, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@6676f6a0, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@7cbd9d24, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@1672fe87, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@5026735c, 
org.apache.ignite.ml.tree.randomforest.data.TreeRoot@1b45c0e, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@11f0a5a1, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@10f7f7de, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@73a8da0f, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@50dfbc58, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@4416d64f, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@6bf08014, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@5e3d57c7, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@732d0d24, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@1fb19a0, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@6ee4d9ab, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@5a5338df, org.apache.ignite.ml.tree.randomforest.data.TreeRoot@418c5a9c,
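The dump above is the default `Object.toString()` form (`ClassName@hexHash`), which Java prints for any class that does not override `toString()`. The following is a minimal sketch of how an override produces readable output. `TreeNodeSketch` and its two fields are hypothetical stand-ins for illustration and are not the actual `TreeRoot` class.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical stand-in for a tree model; not Ignite's TreeRoot. */
final class TreeNodeSketch {
    private final int featureId;
    private final double threshold;

    TreeNodeSketch(int featureId, double threshold) {
        this.featureId = featureId;
        this.threshold = threshold;
    }

    /** Without this override, printing the object yields "TreeNodeSketch@<hash>". */
    @Override public String toString() {
        return "Tree[featureId=" + featureId + ", threshold=" + threshold + "]";
    }

    /** Joins the per-tree descriptions, one tree per line. */
    static String describe(List<TreeNodeSketch> trees) {
        return trees.stream()
            .map(TreeNodeSketch::toString)
            .collect(Collectors.joining("\n"));
    }
}
```

A composition's own `toString()` could then delegate to something like `describe(...)` instead of printing each element's identity hash.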
[jira] [Created] (IGNITE-9910) [ML] Move the static copy-pasted datasets from examples to special Util class
Aleksey Zinoviev created IGNITE-9910: Summary: [ML] Move the static copy-pasted datasets from examples to special Util class Key: IGNITE-9910 URL: https://issues.apache.org/jira/browse/IGNITE-9910 Project: Ignite Issue Type: Improvement Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 There are a few copy-pasted datasets, such as Iris and Titanic. They should be refactored into one dataset class with constants -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9718) [ML] Incorrect JavaDoc in RandomForest
Aleksey Zinoviev created IGNITE-9718: Summary: [ML] Incorrect JavaDoc in RandomForest Key: IGNITE-9718 URL: https://issues.apache.org/jira/browse/IGNITE-9718 Project: Ignite Issue Type: Bug Components: ml Affects Versions: 2.7 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.7 ignite/modules/ml/src/main/java/org/apache/ignite/ml/tree/randomforest/RandomForestTrainer.java:141: warning - @param argument "cntOfTrees" is not a parameter name. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9717) [ML] Add setter methods to Logistic Regression and fix examples/tests
Aleksey Zinoviev created IGNITE-9717: Summary: [ML] Add setter methods to Logistic Regression and fix examples/tests Key: IGNITE-9717 URL: https://issues.apache.org/jira/browse/IGNITE-9717 Project: Ignite Issue Type: Bug Components: ml Affects Versions: 2.7 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.7 Logistic Regression and Multilayer Perceptron cannot be used in Pipeline because the .withFieldName setter methods are missing. The examples and tests should also be fixed -- This message was sent by Atlassian JIRA (v7.6.3#76005)
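A minimal sketch of the `.withFieldName` setter convention the ticket refers to: each setter stores the value and returns the trainer itself, so calls can be chained and generic code can configure any trainer the same way. `TrainerSketch` and its two fields are hypothetical and do not reproduce any Ignite trainer.

```java
/** Hypothetical trainer illustrating the with<FieldName> setter convention. */
final class TrainerSketch {
    private int maxIterations = 100;
    private int batchSize = 10;

    /** Sets the iteration limit and returns this trainer so calls can be chained. */
    TrainerSketch withMaxIterations(int maxIterations) {
        this.maxIterations = maxIterations;
        return this;
    }

    /** Sets the batch size and returns this trainer so calls can be chained. */
    TrainerSketch withBatchSize(int batchSize) {
        this.batchSize = batchSize;
        return this;
    }

    int maxIterations() {
        return maxIterations;
    }

    int batchSize() {
        return batchSize;
    }
}
```

Usage would look like `new TrainerSketch().withMaxIterations(500).withBatchSize(32)`.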
[jira] [Created] (IGNITE-9713) [ML] Fix JavaDocs in ML Preprocessing package
Aleksey Zinoviev created IGNITE-9713: Summary: [ML] Fix JavaDocs in ML Preprocessing package Key: IGNITE-9713 URL: https://issues.apache.org/jira/browse/IGNITE-9713 Project: Ignite Issue Type: Bug Components: ml Affects Versions: 2.7 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.7 JavaDocs are incorrect in the StringEncoder Preprocessor, Encoder Trainer, and Binarization. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9634) [ML] Trainers as pipeline parameters that can be varied
Aleksey Zinoviev created IGNITE-9634: Summary: [ML] Trainers as pipeline parameters that can be varied Key: IGNITE-9634 URL: https://issues.apache.org/jira/browse/IGNITE-9634 Project: Ignite Issue Type: Sub-task Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 Based on the discussion at http://apache-ignite-developers.2346864.n4.nabble.com/ML-New-Feature-Trainers-as-pipeline-parameters-that-can-be-varied-td35132.html -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9633) [ML] Hyperparameter tuning improvements umbrella ticket
Aleksey Zinoviev created IGNITE-9633: Summary: [ML] Hyperparameter tuning improvements umbrella ticket Key: IGNITE-9633 URL: https://issues.apache.org/jira/browse/IGNITE-9633 Project: Ignite Issue Type: New Feature Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 Umbrella ticket for all hyperparameter tuning improvements -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9587) [ML] Umbrella ticket: Handle different labels in training data and handle unknown labels in test or updated training data correctly
Aleksey Zinoviev created IGNITE-9587: Summary: [ML] Umbrella ticket: Handle different labels in training data and handle unknown labels in test or updated training data correctly Key: IGNITE-9587 URL: https://issues.apache.org/jira/browse/IGNITE-9587 Project: Ignite Issue Type: New Feature Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev The problem is that all binary classification algorithms expect datasets marked with 0/1 labels and predict 0/1 labels without any special mapping. The algorithms also do not handle unknown labels during the updating and testing phases. Possible solution: the mapping could be stored in the context of ML training -- This message was sent by Atlassian JIRA (v7.6.3#76005)
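A minimal sketch of the kind of label mapping the ticket describes: raw labels are mapped to internal indices 0, 1, ... at training time, and a label not seen during training is reported explicitly rather than silently mispredicted. `LabelMappingSketch` is hypothetical; where such a mapping would live in Ignite (for example, a training context) is left open, as in the ticket.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.OptionalInt;

/** Hypothetical label mapping; not part of the Ignite API. */
final class LabelMappingSketch {
    private final Map<Double, Integer> toIndex = new LinkedHashMap<>();

    /** Learns a mapping from raw labels to 0, 1, ... in order of first appearance. */
    static LabelMappingSketch fit(double[] labels) {
        LabelMappingSketch mapping = new LabelMappingSketch();
        for (double lb : labels)
            mapping.toIndex.putIfAbsent(lb, mapping.toIndex.size());
        return mapping;
    }

    /** Returns the internal index, or empty for a label not seen during fit. */
    OptionalInt encode(double label) {
        Integer idx = toIndex.get(label);
        return idx == null ? OptionalInt.empty() : OptionalInt.of(idx);
    }
}
```

With training labels -1/+1, `encode(-1)` gives index 0 and `encode(1)` gives index 1, while an unseen label such as 5 yields an empty result that the caller has to handle.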
[jira] [Created] (IGNITE-9582) Document Model Updating
Aleksey Zinoviev created IGNITE-9582: Summary: Document Model Updating Key: IGNITE-9582 URL: https://issues.apache.org/jira/browse/IGNITE-9582 Project: Ignite Issue Type: Task Components: documentation, ml Reporter: Aleksey Zinoviev Assignee: Alexey Platonov -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9581) Document ANN algorithm based on ACD concept
Aleksey Zinoviev created IGNITE-9581: Summary: Document ANN algorithm based on ACD concept Key: IGNITE-9581 URL: https://issues.apache.org/jira/browse/IGNITE-9581 Project: Ignite Issue Type: Task Components: documentation, ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.7 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9579) Document Random Forest
Aleksey Zinoviev created IGNITE-9579: Summary: Document Random Forest Key: IGNITE-9579 URL: https://issues.apache.org/jira/browse/IGNITE-9579 Project: Ignite Issue Type: Task Components: documentation, ml Reporter: Aleksey Zinoviev Assignee: Alexey Platonov Fix For: 2.7 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9578) Document K-fold cross validation of models
Aleksey Zinoviev created IGNITE-9578: Summary: Document K-fold cross validation of models Key: IGNITE-9578 URL: https://issues.apache.org/jira/browse/IGNITE-9578 Project: Ignite Issue Type: Task Components: documentation, ml Reporter: Aleksey Zinoviev Assignee: Anton Dmitriev -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9577) Document Preprocessing
Aleksey Zinoviev created IGNITE-9577: Summary: Document Preprocessing Key: IGNITE-9577 URL: https://issues.apache.org/jira/browse/IGNITE-9577 Project: Ignite Issue Type: Task Components: documentation, ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.7 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9576) Document Multi-Class Logistic Regression
Aleksey Zinoviev created IGNITE-9576: Summary: Document Multi-Class Logistic Regression Key: IGNITE-9576 URL: https://issues.apache.org/jira/browse/IGNITE-9576 Project: Ignite Issue Type: Task Components: documentation, ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.7 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9575) Document Binary Logistic Regression
Aleksey Zinoviev created IGNITE-9575: Summary: Document Binary Logistic Regression Key: IGNITE-9575 URL: https://issues.apache.org/jira/browse/IGNITE-9575 Project: Ignite Issue Type: Task Components: documentation, ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.7 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9574) Document Gradient boosting
Aleksey Zinoviev created IGNITE-9574: Summary: Document Gradient boosting Key: IGNITE-9574 URL: https://issues.apache.org/jira/browse/IGNITE-9574 Project: Ignite Issue Type: Task Components: documentation, ml Reporter: Aleksey Zinoviev Assignee: Alexey Platonov Fix For: 2.7 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9514) [ML] Reduce time for the updating models on many partitions
Aleksey Zinoviev created IGNITE-9514: Summary: [ML] Reduce time for the updating models on many partitions Key: IGNITE-9514 URL: https://issues.apache.org/jira/browse/IGNITE-9514 Project: Ignite Issue Type: Task Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9513) [ML] Unify all preprocessors trainers' generics
Aleksey Zinoviev created IGNITE-9513: Summary: [ML] Unify all preprocessors trainers' generics Key: IGNITE-9513 URL: https://issues.apache.org/jira/browse/IGNITE-9513 Project: Ignite Issue Type: Improvement Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Currently we have EncoderTrainer implements PreprocessingTrainer and BinarizationTrainer implements PreprocessingTrainer. Unifying the generics will help with raw types in OneVsRest or in Pipeline and CV processes -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9497) [ML] Add Pipeline support to Cross-Validation process
Aleksey Zinoviev created IGNITE-9497: Summary: [ML] Add Pipeline support to Cross-Validation process Key: IGNITE-9497 URL: https://issues.apache.org/jira/browse/IGNITE-9497 Project: Ignite Issue Type: New Feature Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 Change the API of ParamGrid.addHyperParam to support meta-information about the Pipeline stage. Add a Cross-Validation method that evaluates the whole Pipeline process and injects hyper-parameters from the ParamGrid -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9482) [ML] Refactor all trainers' setters to withFieldName format for meta-algorithms
Aleksey Zinoviev created IGNITE-9482: Summary: [ML] Refactor all trainers' setters to withFieldName format for meta-algorithms Key: IGNITE-9482 URL: https://issues.apache.org/jira/browse/IGNITE-9482 Project: Ignite Issue Type: Sub-task Components: ml Affects Versions: 2.7 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.7 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9463) [ML] Update ML tutorial with new model composition/update features
Aleksey Zinoviev created IGNITE-9463: Summary: [ML] Update ML tutorial with new model composition/update features Key: IGNITE-9463 URL: https://issues.apache.org/jira/browse/IGNITE-9463 Project: Ignite Issue Type: New Feature Components: ml Affects Versions: 2.7 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.7 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9393) [ML] KMeans fails on complex data in cache
Aleksey Zinoviev created IGNITE-9393: Summary: [ML] KMeans fails on complex data in cache Key: IGNITE-9393 URL: https://issues.apache.org/jira/browse/IGNITE-9393 Project: Ignite Issue Type: Bug Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Described here http://apache-ignite-users.70518.x6.nabble.com/NPE-exception-in-KMeansTrainer-td23504.html#a23512 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9336) [ML] ANN/SVM Trainer tests produce unpredictable results due to random data generation
Aleksey Zinoviev created IGNITE-9336: Summary: [ML] ANN/SVM Trainer tests produce unpredictable results due to random data generation Key: IGNITE-9336 URL: https://issues.apache.org/jira/browse/IGNITE-9336 Project: Ignite Issue Type: Bug Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Remove random data generation and add a static dataset to the tests. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9285) [ML] Add MaxAbsScaler as a preprocessing stage
Aleksey Zinoviev created IGNITE-9285: Summary: [ML] Add MaxAbsScaler as a preprocessing stage Key: IGNITE-9285 URL: https://issues.apache.org/jira/browse/IGNITE-9285 Project: Ignite Issue Type: Sub-task Components: ml Reporter: Aleksey Zinoviev Add an analogue of [http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MaxAbsScaler.html#sklearn.preprocessing.MaxAbsScaler]. Please look at the MinMaxScaler or Normalization packages in the preprocessing package. Add classes if required: 1) Preprocessor 2) Trainer 3) custom PartitionData if shuffling is a step of the algorithm -- This message was sent by Atlassian JIRA (v7.6.3#76005)
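A minimal single-node sketch of max-abs scaling, assuming dense rows of equal length: the fit step finds the largest absolute value of each feature, and the transform step divides each feature by it, so values land in [-1, 1]. `MaxAbsScalerSketch` is hypothetical and ignores the Preprocessor/Trainer split and partitioned data mentioned in the ticket. Leaving an all-zero feature unchanged is an assumption made here to avoid division by zero.

```java
/** Hypothetical max-abs scaler over dense rows; not the Ignite preprocessor. */
final class MaxAbsScalerSketch {
    /** Computes the maximum absolute value of every feature (column). */
    static double[] fit(double[][] rows) {
        double[] maxAbs = new double[rows[0].length];
        for (double[] row : rows)
            for (int j = 0; j < row.length; j++)
                maxAbs[j] = Math.max(maxAbs[j], Math.abs(row[j]));
        return maxAbs;
    }

    /** Divides each feature by its fitted maximum; all-zero features stay as they are. */
    static double[] transform(double[] row, double[] maxAbs) {
        double[] res = new double[row.length];
        for (int j = 0; j < row.length; j++)
            res[j] = maxAbs[j] == 0.0 ? row[j] : row[j] / maxAbs[j];
        return res;
    }
}
```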
[jira] [Created] (IGNITE-9284) [ML] Add a Standard Scaler
Aleksey Zinoviev created IGNITE-9284: Summary: [ML] Add a Standard Scaler Key: IGNITE-9284 URL: https://issues.apache.org/jira/browse/IGNITE-9284 Project: Ignite Issue Type: Sub-task Reporter: Aleksey Zinoviev Add an analogue of [http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html]. Please look at the MinMaxScaler or Normalization packages in the preprocessing package. Add classes if required: 1) Preprocessor 2) Trainer 3) custom PartitionData if shuffling is a step of the algorithm -- This message was sent by Atlassian JIRA (v7.6.3#76005)
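A minimal single-node sketch of standard scaling, assuming dense rows of equal length: the fit step computes the per-feature mean and population standard deviation, and the transform step returns `(x - mean) / std`. `StandardScalerSketch` is hypothetical and does not follow the Preprocessor/Trainer layout of the Ignite preprocessing package. Only centering a constant feature (std of zero) is an assumption made here.

```java
/** Hypothetical standard scaler over dense rows; not the Ignite preprocessor. */
final class StandardScalerSketch {
    final double[] mean;
    final double[] std;

    private StandardScalerSketch(double[] mean, double[] std) {
        this.mean = mean;
        this.std = std;
    }

    /** Computes the per-feature mean and population standard deviation. */
    static StandardScalerSketch fit(double[][] rows) {
        int n = rows.length;
        int d = rows[0].length;
        double[] mean = new double[d];
        double[] std = new double[d];
        for (double[] row : rows)
            for (int j = 0; j < d; j++)
                mean[j] += row[j];
        for (int j = 0; j < d; j++)
            mean[j] /= n;
        for (double[] row : rows)
            for (int j = 0; j < d; j++)
                std[j] += (row[j] - mean[j]) * (row[j] - mean[j]);
        for (int j = 0; j < d; j++)
            std[j] = Math.sqrt(std[j] / n);
        return new StandardScalerSketch(mean, std);
    }

    /** Centers each feature and divides by its deviation; constant features are only centered. */
    double[] transform(double[] row) {
        double[] res = new double[row.length];
        for (int j = 0; j < row.length; j++)
            res[j] = std[j] == 0.0 ? row[j] - mean[j] : (row[j] - mean[j]) / std[j];
        return res;
    }
}
```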
[jira] [Created] (IGNITE-9283) [ML] Add Discrete Cosine preprocessor
Aleksey Zinoviev created IGNITE-9283: Summary: [ML] Add Discrete Cosine preprocessor Key: IGNITE-9283 URL: https://issues.apache.org/jira/browse/IGNITE-9283 Project: Ignite Issue Type: Sub-task Reporter: Aleksey Zinoviev Add a preprocessor based on [https://en.wikipedia.org/wiki/Discrete_cosine_transform]. Please look at the MinMaxScaler or Normalization packages in the preprocessing package. Add classes if required: 1) Preprocessor 2) Trainer 3) custom PartitionData if shuffling is a step of the algorithm -- This message was sent by Atlassian JIRA (v7.6.3#76005)
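A minimal sketch of the unnormalized DCT-II, the most common variant of the discrete cosine transform, applied to one feature vector: X_k = sum over i of x_i * cos(pi / N * (i + 1/2) * k). The ticket does not say which DCT variant or normalization is intended, so both are assumptions, and `DctSketch` is a hypothetical name. This direct form runs in O(N^2).

```java
/** Hypothetical direct DCT-II; the ticket does not fix the variant or normalization. */
final class DctSketch {
    /** Returns the unnormalized DCT-II coefficients of x. */
    static double[] dct2(double[] x) {
        int n = x.length;
        double[] res = new double[n];
        for (int k = 0; k < n; k++) {
            double sum = 0.0;
            for (int i = 0; i < n; i++)
                sum += x[i] * Math.cos(Math.PI / n * (i + 0.5) * k);
            res[k] = sum;
        }
        return res;
    }
}
```

For a constant input, only the zeroth coefficient is non-zero (up to floating-point rounding). Because the transform depends only on the input vector, it needs no fitted state.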
[jira] [Created] (IGNITE-9282) [ML] Add Naive Bayes classifier
Aleksey Zinoviev created IGNITE-9282: Summary: [ML] Add Naive Bayes classifier Key: IGNITE-9282 URL: https://issues.apache.org/jira/browse/IGNITE-9282 Project: Ignite Issue Type: Sub-task Components: ml Reporter: Aleksey Zinoviev Naive Bayes classifiers are a family of simple probabilistic classifiers based on applying Bayes' theorem with strong (naive) independence assumptions between the features. We want to add this algorithm to the Apache Ignite ML module. Ideally, the implementation should support both multinomial naive Bayes and Bernoulli naive Bayes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
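A minimal single-node sketch of the multinomial variant, assuming integer count features and class labels in `[0, classes)`. Training estimates log priors and add-one (Laplace) smoothed log likelihoods; prediction picks the class with the highest log posterior, using the independence assumption to sum per-feature terms. `MultinomialNaiveBayesSketch` is hypothetical; the smoothing choice is an assumption and the Bernoulli variant is not covered.

```java
/** Hypothetical multinomial naive Bayes; not the Ignite implementation. */
final class MultinomialNaiveBayesSketch {
    private final double[] logPrior;        // log P(class)
    private final double[][] logLikelihood; // log P(feature | class)

    private MultinomialNaiveBayesSketch(double[] logPrior, double[][] logLikelihood) {
        this.logPrior = logPrior;
        this.logLikelihood = logLikelihood;
    }

    /** counts[i][j] is the count of feature j in sample i; labels[i] is in [0, classes). */
    static MultinomialNaiveBayesSketch fit(int[][] counts, int[] labels, int classes) {
        int features = counts[0].length;
        double[] classCnt = new double[classes];
        double[][] featCnt = new double[classes][features];
        for (int i = 0; i < counts.length; i++) {
            classCnt[labels[i]]++;
            for (int j = 0; j < features; j++)
                featCnt[labels[i]][j] += counts[i][j];
        }
        double[] logPrior = new double[classes];
        double[][] logLik = new double[classes][features];
        for (int c = 0; c < classes; c++) {
            logPrior[c] = Math.log(classCnt[c] / counts.length);
            double total = 0.0;
            for (double v : featCnt[c])
                total += v;
            for (int j = 0; j < features; j++) // add-one smoothing avoids log(0)
                logLik[c][j] = Math.log((featCnt[c][j] + 1.0) / (total + features));
        }
        return new MultinomialNaiveBayesSketch(logPrior, logLik);
    }

    /** Returns the class with the largest log posterior for the count vector x. */
    int predict(int[] x) {
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < logPrior.length; c++) {
            double score = logPrior[c];
            for (int j = 0; j < x.length; j++)
                score += x[j] * logLikelihood[c][j];
            if (score > bestScore) {
                bestScore = score;
                best = c;
            }
        }
        return best;
    }
}
```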
[jira] [Created] (IGNITE-9281) [ML] Starter ML tasks
Aleksey Zinoviev created IGNITE-9281: Summary: [ML] Starter ML tasks Key: IGNITE-9281 URL: https://issues.apache.org/jira/browse/IGNITE-9281 Project: Ignite Issue Type: Wish Components: ml Reporter: Aleksey Zinoviev Fix For: None This ticket is an umbrella ticket for ML starter tasks. Please contact [~zaleslaw] to be assigned to one of these tasks and get help with it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9261) [ML] Add ANN algorithm based on ACD concept
Aleksey Zinoviev created IGNITE-9261: Summary: [ML] Add ANN algorithm based on ACD concept Key: IGNITE-9261 URL: https://issues.apache.org/jira/browse/IGNITE-9261 Project: Ignite Issue Type: New Feature Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev The ACD concept is implemented by searching for centroids with the help of KMeans. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9239) [ML] KMeansTrainer crashes if the number of possible clusters is greater than the number of partitions in the dataset
Aleksey Zinoviev created IGNITE-9239: Summary: [ML] KMeansTrainer crashes if the number of possible clusters is greater than the number of partitions in the dataset Key: IGNITE-9239 URL: https://issues.apache.org/jira/browse/IGNITE-9239 Project: Ignite Issue Type: Bug Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev How to reproduce: set the K parameter in KMeansTrainer to 100 and run KMeansClusterizationExample. The stack trace is: Exception in thread "KMeansClusterizationExample-#44" java.lang.RuntimeException: java.lang.IllegalArgumentException: bound must be positive at org.apache.ignite.ml.clustering.kmeans.KMeansTrainer.fit(KMeansTrainer.java:112) at org.apache.ignite.ml.clustering.kmeans.KMeansTrainer.fit(KMeansTrainer.java:46) at org.apache.ignite.ml.trainers.DatasetTrainer.fit(DatasetTrainer.java:68) at org.apache.ignite.examples.ml.clustering.KMeansClusterizationExample.lambda$main$0(KMeansClusterizationExample.java:60) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.IllegalArgumentException: bound must be positive at java.util.Random.nextInt(Random.java:388) at org.apache.ignite.ml.clustering.kmeans.KMeansTrainer.initClusterCentersRandomly(KMeansTrainer.java:193) at org.apache.ignite.ml.clustering.kmeans.KMeansTrainer.fit(KMeansTrainer.java:86) ... 4 more A possible solution: correct the rndPnts computation in lines 180-190 of KMeansTrainer -- This message was sent by Atlassian JIRA (v7.6.3#76005)
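The `Caused by` frame shows `java.util.Random.nextInt(int)` rejecting a non-positive bound. The following minimal sketch, which is not the actual `KMeansTrainer` code, shows one way to guard a random pick so that the bound passed to `nextInt` is always positive. `RandomPickSketch`, `pickRandomRows` and the clamping rule are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical guarded random pick; not the code in KMeansTrainer. */
final class RandomPickSketch {
    /** Picks up to k rows at random (with repetition); an empty input yields an empty result. */
    static List<double[]> pickRandomRows(double[][] rows, int k, Random rnd) {
        List<double[]> res = new ArrayList<>();
        if (rows.length == 0)
            return res; // rnd.nextInt(0) would throw IllegalArgumentException
        int cnt = Math.min(k, rows.length); // never ask for more rows than exist
        for (int i = 0; i < cnt; i++)
            res.add(rows[rnd.nextInt(rows.length)]);
        return res;
    }
}
```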
[jira] [Created] (IGNITE-9145) [ML] Add different strategies to index labels in StringEncoderTrainer
Aleksey Zinoviev created IGNITE-9145: Summary: [ML] Add different strategies to index labels in StringEncoderTrainer Key: IGNITE-9145 URL: https://issues.apache.org/jira/browse/IGNITE-9145 Project: Ignite Issue Type: Improvement Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.7 The main idea is to add a few indexing strategies (sorting and so on). Currently only one strategy is supported (the most popular label gets index zero and the least popular gets the maximum index). There can be a few options: * 'frequencyDesc': descending order by label frequency (most frequent label assigned 0) * 'frequencyAsc': ascending order by label frequency (least frequent label assigned 0) * 'alphabetDesc': descending alphabetical order * 'alphabetAsc': ascending alphabetical order Please update the method transformFrequenciesToEncodingValues and add the strategy as a parameter of the trainer. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
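The four options listed above can be sketched as comparators over a label-frequency map. This is a minimal illustration, not the `StringEncoderTrainer` code: `LabelIndexingSketch`, the `Strategy` enum and the `index` method are hypothetical names, and breaking frequency ties alphabetically is an assumption the ticket does not specify.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical label indexer; not the StringEncoderTrainer implementation. */
final class LabelIndexingSketch {
    enum Strategy { FREQUENCY_DESC, FREQUENCY_ASC, ALPHABET_DESC, ALPHABET_ASC }

    /** Assigns indices 0, 1, ... to labels in the order defined by the strategy. */
    static Map<String, Integer> index(Map<String, Integer> frequencies, Strategy strategy) {
        Comparator<Map.Entry<String, Integer>> byFreq = Map.Entry.comparingByValue();
        Comparator<Map.Entry<String, Integer>> byName = Map.Entry.comparingByKey();
        Comparator<Map.Entry<String, Integer>> cmp;
        switch (strategy) {
            case FREQUENCY_DESC:
                cmp = byFreq.reversed().thenComparing(byName); // ties broken by name (assumption)
                break;
            case FREQUENCY_ASC:
                cmp = byFreq.thenComparing(byName);
                break;
            case ALPHABET_DESC:
                cmp = byName.reversed();
                break;
            default:
                cmp = byName;
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(frequencies.entrySet());
        entries.sort(cmp);
        Map<String, Integer> res = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : entries)
            res.put(e.getKey(), res.size()); // next free index
        return res;
    }
}
```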
[jira] [Created] (IGNITE-8542) [ML] Add OneVsRest Trainer to handle cases with multiple class labels in dataset
Aleksey Zinoviev created IGNITE-8542: Summary: [ML] Add OneVsRest Trainer to handle cases with multiple class labels in dataset Key: IGNITE-8542 URL: https://issues.apache.org/jira/browse/IGNITE-8542 Project: Ignite Issue Type: Improvement Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8511) [ML] Add support for Multi-Class Logistic Regression
Aleksey Zinoviev created IGNITE-8511: Summary: [ML] Add support for Multi-Class Logistic Regression Key: IGNITE-8511 URL: https://issues.apache.org/jira/browse/IGNITE-8511 Project: Ignite Issue Type: New Feature Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8451) [ML] Refactor Labeled Dataset: remove unused methods and fields
Aleksey Zinoviev created IGNITE-8451: Summary: [ML] Refactor Labeled Dataset: remove unused methods and fields Key: IGNITE-8451 URL: https://issues.apache.org/jira/browse/IGNITE-8451 Project: Ignite Issue Type: Improvement Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8450) [ML] Cleanup the ML package: remove unused vector/matrix classes
Aleksey Zinoviev created IGNITE-8450: Summary: [ML] Cleanup the ML package: remove unused vector/matrix classes Key: IGNITE-8450 URL: https://issues.apache.org/jira/browse/IGNITE-8450 Project: Ignite Issue Type: Improvement Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8410) [ML] Unify KNNClassification/KNNRegression Model Trainer .fit() signatures
Aleksey Zinoviev created IGNITE-8410: Summary: [ML] Unify KNNClassification/KNNRegression Model Trainer .fit() signatures Key: IGNITE-8410 URL: https://issues.apache.org/jira/browse/IGNITE-8410 Project: Ignite Issue Type: Improvement Components: ml Affects Versions: 2.6 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Make the fit calls similar: refactor one of the trainers and remove one signature. A possible solution is to pass dataCache and ignite separately. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8403) [ML] Add Binary Logistic Regression based on partitioned datasets and MLP
Aleksey Zinoviev created IGNITE-8403: Summary: [ML] Add Binary Logistic Regression based on partitioned datasets and MLP Key: IGNITE-8403 URL: https://issues.apache.org/jira/browse/IGNITE-8403 Project: Ignite Issue Type: New Feature Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8399) Add documentation for kNN classification (release 2.5)
Aleksey Zinoviev created IGNITE-8399: Summary: Add documentation for kNN classification (release 2.5) Key: IGNITE-8399 URL: https://issues.apache.org/jira/browse/IGNITE-8399 Project: Ignite Issue Type: Improvement Components: documentation, ml Affects Versions: 2.5 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev In Apache Ignite 2.5 we added SVM binary and multi-class classification working on top of the partition-based dataset, and now we need to update the documentation for this feature. Add page [https://dash.readme.io/project/apacheignite/v2.4/docs/svm-25] Add page [https://dash.readme.io/project/apacheignite/v2.4/docs/svm-multi-class-classification-25] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8398) Update documentation for KMeans clustering (release 2.5)
Aleksey Zinoviev created IGNITE-8398: Summary: Update documentation for KMeans clustering (release 2.5) Key: IGNITE-8398 URL: https://issues.apache.org/jira/browse/IGNITE-8398 Project: Ignite Issue Type: Improvement Components: documentation, ml Affects Versions: 2.5 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev In Apache Ignite 2.5 we changed kMeans clustering to work on top of the partition-based dataset and removed FuzzyCMeans, and now we need to update the documentation for this feature. Previous version: [https://dash.readme.io/project/apacheignite/v2.4/docs/k-means-clustering] Update it with the new version: [https://dash.readme.io/project/apacheignite/v2.4/docs/k-means-clustering-25] IMPORTANT: Remove page [https://dash.readme.io/project/apacheignite/v2.4/docs/fuzzy-c-means-clustering] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8397) Update documentation for kNN regression (release 2.5)
Aleksey Zinoviev created IGNITE-8397: Summary: Update documentation for kNN regression (release 2.5) Key: IGNITE-8397 URL: https://issues.apache.org/jira/browse/IGNITE-8397 Project: Ignite Issue Type: Improvement Components: documentation, ml Affects Versions: 2.5 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev In Apache Ignite 2.5 we changed kNN regression to work on top of the partition-based dataset, and now we need to update the documentation for this feature. Previous version: [https://dash.readme.io/project/apacheignite/v2.4/docs/knn-regression] Update it with the new version: [https://dash.readme.io/project/apacheignite/v2.4/docs/k-nn-regression-25] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8396) Add documentation for kNN classification (release 2.5)
Aleksey Zinoviev created IGNITE-8396: Summary: Add documentation for kNN classification (release 2.5) Key: IGNITE-8396 URL: https://issues.apache.org/jira/browse/IGNITE-8396 Project: Ignite Issue Type: Improvement Components: documentation, ml Affects Versions: 2.5 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev In Apache Ignite 2.5 we added a normalization preprocessor working on top of the partition-based dataset, and now we need to add the documentation for this feature. Previous version: https://dash.readme.io/project/apacheignite/v2.4/docs/knn-classification Update it with the new version: https://dash.readme.io/project/apacheignite/v2.4/docs/k-nn-classification-25 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8250) Adapt Fuzzy CMeans to PartitionedDatasets
Aleksey Zinoviev created IGNITE-8250: Summary: Adapt Fuzzy CMeans to PartitionedDatasets Key: IGNITE-8250 URL: https://issues.apache.org/jira/browse/IGNITE-8250 Project: Ignite Issue Type: Improvement Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8170) [ML] Adapt KMeans example to the Partitioned Dataset
Aleksey Zinoviev created IGNITE-8170: Summary: [ML] Adapt KMeans example to the Partitioned Dataset Key: IGNITE-8170 URL: https://issues.apache.org/jira/browse/IGNITE-8170 Project: Ignite Issue Type: Sub-task Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8169) [ML] Implement Model-Trainer pair for KMeans based on Partitioned Dataset
Aleksey Zinoviev created IGNITE-8169: Summary: [ML] Implement Model-Trainer pair for KMeans based on Partitioned Dataset Key: IGNITE-8169 URL: https://issues.apache.org/jira/browse/IGNITE-8169 Project: Ignite Issue Type: Sub-task Components: ml Reporter: Aleksey Zinoviev -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8168) [ML] Add KMeans version for Partitioned Datasets
Aleksey Zinoviev created IGNITE-8168: Summary: [ML] Add KMeans version for Partitioned Datasets Key: IGNITE-8168 URL: https://issues.apache.org/jira/browse/IGNITE-8168 Project: Ignite Issue Type: Improvement Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8005) [ML] Adapt SVM Linear MultiClass Classification Example to the new Partitioned Dataset
Aleksey Zinoviev created IGNITE-8005: Summary: [ML] Adapt SVM Linear MultiClass Classification Example to the new Partitioned Dataset Key: IGNITE-8005 URL: https://issues.apache.org/jira/browse/IGNITE-8005 Project: Ignite Issue Type: Sub-task Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-7938) [ML] Adapt KMeans to the new Partitioned Dataset
Aleksey Zinoviev created IGNITE-7938: Summary: [ML] Adapt KMeans to the new Partitioned Dataset Key: IGNITE-7938 URL: https://issues.apache.org/jira/browse/IGNITE-7938 Project: Ignite Issue Type: Improvement Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-7932) [ML] Adapt SVM Linear Binary Classification Example to the new Partitioned Dataset
Aleksey Zinoviev created IGNITE-7932: Summary: [ML] Adapt SVM Linear Binary Classification Example to the new Partitioned Dataset Key: IGNITE-7932 URL: https://issues.apache.org/jira/browse/IGNITE-7932 Project: Ignite Issue Type: Sub-task Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-7887) [ML] Adapt SVM Linear Multi-Class Classification Model and Trainer to the new Partitioned Dataset
Aleksey Zinoviev created IGNITE-7887: Summary: [ML] Adapt SVM Linear Multi-Class Classification Model and Trainer to the new Partitioned Dataset Key: IGNITE-7887 URL: https://issues.apache.org/jira/browse/IGNITE-7887 Project: Ignite Issue Type: Sub-task Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-7876) [ML] Adapt SVM Linear Binary Classification Model and Trainer to the new Partitioned Dataset
Aleksey Zinoviev created IGNITE-7876: Summary: [ML] Adapt SVM Linear Binary Classification Model and Trainer to the new Partitioned Dataset Key: IGNITE-7876 URL: https://issues.apache.org/jira/browse/IGNITE-7876 Project: Ignite Issue Type: Sub-task Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-7830) Adapt kNN model to the new Partitioned Dataset
Aleksey Zinoviev created IGNITE-7830: Summary: Adapt kNN model to the new Partitioned Dataset Key: IGNITE-7830 URL: https://issues.apache.org/jira/browse/IGNITE-7830 Project: Ignite Issue Type: Sub-task Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-7828) Adapt yardstick tests for the new version of kNN regression algorithm
Aleksey Zinoviev created IGNITE-7828: Summary: Adapt yardstick tests for the new version of kNN regression algorithm Key: IGNITE-7828 URL: https://issues.apache.org/jira/browse/IGNITE-7828 Project: Ignite Issue Type: Sub-task Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-7829) Adapt kNN regression example to the new Partitioned Dataset
Aleksey Zinoviev created IGNITE-7829: Summary: Adapt kNN regression example to the new Partitioned Dataset Key: IGNITE-7829 URL: https://issues.apache.org/jira/browse/IGNITE-7829 Project: Ignite Issue Type: Sub-task Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev -- This message was sent by Atlassian JIRA (v7.6.3#76005)