[jira] [Commented] (IGNITE-11942) IGFS and Hadoop Accelerator Discontinuation

2020-04-13 Thread Aleksey Zinoviev (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-11942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082827#comment-17082827
 ] 

Aleksey Zinoviev commented on IGNITE-11942:
---

Yes, we can remove it; there are no blockers now!




> IGFS and Hadoop Accelerator Discontinuation
> ---
>
> Key: IGNITE-11942
> URL: https://issues.apache.org/jira/browse/IGNITE-11942
> Project: Ignite
>  Issue Type: Task
>Reporter: Denis A. Magda
>Assignee: Anton Kalashnikov
>Priority: Blocker
> Fix For: 2.9
>
>
> The community has voted for the following decision:
> * IGFS and In-Memory Hadoop Accelerator components are to be discontinued and 
> no longer supported by the community 
> * The existing source code of IGFS and In-Memory Hadoop Accelerator is to be 
> removed from Ignite master. Before that, a special branch like 
> "ignite-igfs-and-hadoop-accelerator" is to be forked off the master in order to 
> preserve the sources in Git history for those who might need them. 
> The voting thread:
> http://apache-ignite-developers.2346864.n4.nabble.com/VOTE-Complete-Discontinuation-of-IGFS-and-Hadoop-Accelerator-td42405.html
> Once the changes are made for Ignite 2.8, please contact Denis Magda to 
> update the public documentation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-12849) Add New BinaryObject Vectorizer for SparseVectors and Integer Coordinates

2020-04-01 Thread Aleksey Zinoviev (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17073048#comment-17073048
 ] 

Aleksey Zinoviev commented on IGNITE-12849:
---

This is a great opportunity for collaboration. If possible, please share a test
project here, in this thread; I need some time to dig into it.






> Add New BinaryObject Vectorizer for SparseVectors and Integer Coordinates
> -
>
> Key: IGNITE-12849
> URL: https://issues.apache.org/jira/browse/IGNITE-12849
> Project: Ignite
>  Issue Type: New Feature
>  Components: ml
>Affects Versions: 2.8
>Reporter: Glenn Wiebe
>Assignee: Alexey Zinoviev
>Priority: Minor
> Fix For: 2.9
>
>
> A. DenseVector-based BinaryObjectVectorizer
> When using existing caches as a source of Datasets, the 
> BinaryObjectVectorizer is used.
> The existing BinaryObjectVectorizer only supports the creation of a 
> SparseVector.
> The LUDecomposition utility that supports Gaussian factorization for models 
> like GMM has a "Singularity indicator". A SparseVector and its null 
> handling will set a matrix column calculation to zero (0.0), which is below 
> the minimum check value (1e-11) and thus indicates that a matrix is not square. 
> This null handling of the SparseMatrix restricts the use of some 
> algorithms, like Gaussian Mixture Models, where any Vector dimension that is 
> null will incorrectly signal that a matrix is not square.
> It would be great if we could:
> - Have a BinaryObjectVectorizer that uses a DenseMatrix to eliminate this 
> singularity trigger and enable use of GMM Trainer.
> B. CacheBasedDatasets not treated as Temporary Cache
> When using a cache-based dataset, the close() method destroys the Ignite 
> cache. This means that there is no ability to re-use the data loaded into 
> this dataset.
> It would be great if we could:
> - Not destroy the Ignite Cache holding the dataset on close (of one step in 
> an ML processing flow)
> - Allow for "attaching" to this prior, pre-calculated dataset in subsequent 
> use.
> C. Vector Visibility
> Vectors (unlike other value types, e.g. BinaryObjects) are not visible in 
> standard mechanisms, like the Ignite Web Console, where the toString() method 
> does not present any information about the embedded vector values.
> It would be great if we could:
> - have a Vector.toString() method implementation that presented some 
> information about what is actually in the Vector.
> I have implemented the above items and have used them at a customer where I 
> needed these capabilities (or at least it dramatically reduced the cost and 
> increased the value of the solution).
> It would be great if the community was supportive of this 
> expansion/improvement of the Ignite ML library.
> Thanks,
>   Glenn
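
The singularity trigger described in item A can be illustrated with a minimal sketch. It does not use Ignite's LUDecomposition: the class name, the method, and the simplified pivot check are assumptions made for illustration, and only the 1e-11 threshold comes from the description above. It shows how a dimension that is always missing (and therefore stored as 0.0) yields a pivot below the threshold.

```java
public class SingularityCheckSketch {
    /** Threshold quoted in the ticket; everything else in this class is hypothetical. */
    static final double SINGULARITY_THRESHOLD = 1e-11;

    /** Returns true if elimination with partial pivoting meets a pivot below the threshold. */
    static boolean isSingular(double[][] input) {
        int n = input.length;

        // Work on a copy so the caller's square matrix is left untouched.
        double[][] a = new double[n][];
        for (int i = 0; i < n; i++)
            a[i] = input[i].clone();

        for (int col = 0; col < n; col++) {
            // Pick the row with the largest absolute value in this column.
            int pivotRow = col;
            for (int row = col + 1; row < n; row++)
                if (Math.abs(a[row][col]) > Math.abs(a[pivotRow][col]))
                    pivotRow = row;

            // A column of zeros (e.g. a dimension that was always null) ends up here.
            if (Math.abs(a[pivotRow][col]) < SINGULARITY_THRESHOLD)
                return true;

            double[] tmp = a[col];
            a[col] = a[pivotRow];
            a[pivotRow] = tmp;

            // Eliminate the entries below the pivot.
            for (int row = col + 1; row < n; row++) {
                double factor = a[row][col] / a[col][col];
                for (int k = col; k < n; k++)
                    a[row][k] -= factor * a[col][k];
            }
        }

        return false;
    }
}
```

In this sketch a matrix whose second column is entirely zero is reported as singular, while a matrix with the same shape and non-zero entries is not.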





[jira] [Updated] (IGNITE-10539) [ML] Make 'with' methods consistent

2019-10-01 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-10539:
--
Affects Version/s: 2.9

> [ML] Make 'with' methods consistent
> ---
>
> Key: IGNITE-10539
> URL: https://issues.apache.org/jira/browse/IGNITE-10539
> Project: Ignite
>  Issue Type: Improvement
>  Components: ml
>Affects Versions: 2.9
>Reporter: Artem Malykh
>Assignee: Aleksey Zinoviev
>Priority: Major
>
> In some places we have 'with*' methods that make in-place changes and return 
> the object itself (for example MLPTrainer::withLoss), while in other places 
> they create new instances with the corresponding parameter changed (for 
> example DatasetBuilder::withFilter, 
> DatasetBuilder::withUpstreamTransformer). This inconsistency forces users to 
> check the Javadoc each time and lowers the overall consistency of the API. 
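
The two conventions contrasted in this description can be sketched side by side. The classes and fields below are hypothetical stand-ins, not the real MLPTrainer or DatasetBuilder; they only show why a caller cannot tell from the method name alone whether the receiver was modified.

```java
public class WithMethodStyles {
    /** Style 1: the 'with' method mutates the receiver and returns it. */
    static final class MutableTrainer {
        private double learningRate = 0.1;

        MutableTrainer withLearningRate(double learningRate) {
            this.learningRate = learningRate;
            return this; // same instance, now changed
        }

        double learningRate() {
            return learningRate;
        }
    }

    /** Style 2: the 'with' method leaves the receiver alone and returns a modified copy. */
    static final class ImmutableBuilder {
        private final int minKey;

        ImmutableBuilder(int minKey) {
            this.minKey = minKey;
        }

        ImmutableBuilder withMinKey(int minKey) {
            return new ImmutableBuilder(minKey); // new instance, original untouched
        }

        int minKey() {
            return minKey;
        }
    }
}
```

With the first style, ignoring the return value still changes the trainer; with the second, ignoring it silently discards the change, which is the kind of surprise the ticket wants to rule out.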





[jira] [Assigned] (IGNITE-10292) ML: Replace IGFS by model storage for TensorFlow

2019-09-25 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev reassigned IGNITE-10292:
-

Assignee: Aleksey Zinoviev

> ML: Replace IGFS by model storage for TensorFlow
> 
>
> Key: IGNITE-10292
> URL: https://issues.apache.org/jira/browse/IGNITE-10292
> Project: Ignite
>  Issue Type: Improvement
>  Components: ml
>Affects Versions: 2.8
>Reporter: Anton Dmitriev
>Assignee: Aleksey Zinoviev
>Priority: Major
> Fix For: 2.8
>
>
> Currently we have a TensorFlow IGFS plugin that provides file system 
> functionality (see 
> https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/ignite).
> At the same time, IGFS is deprecated, and it would be great to replace it with a 
> simple cache-based model storage.
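
A cache-backed model storage of the kind proposed here could be as small as a path-to-bytes map. The sketch below is an assumption about the shape of such a storage, not the API that was eventually implemented: the class and method names are hypothetical, and a ConcurrentHashMap stands in for an Ignite cache so that the example stays self-contained.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ModelStorageSketch {
    /** Stand-in for a key-value cache; a real implementation would use an Ignite cache. */
    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();

    /** Stores a copy of the file content under the given path. */
    void putFile(String path, byte[] data) {
        cache.put(path, data.clone());
    }

    /** Returns a copy of the stored content, or null if the path is unknown. */
    byte[] getFile(String path) {
        byte[] data = cache.get(path);
        return data == null ? null : data.clone();
    }

    /** Tells whether something is stored under the given path. */
    boolean exists(String path) {
        return cache.containsKey(path);
    }

    /** Removes the entry and reports whether it existed. */
    boolean remove(String path) {
        return cache.remove(path) != null;
    }
}
```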





[jira] [Updated] (IGNITE-12079) [ML][Umbrella] Add advanced preprocessing techniques

2019-09-25 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-12079:
--
Priority: Major  (was: Blocker)

> [ML][Umbrella] Add advanced preprocessing techniques
> 
>
> Key: IGNITE-12079
> URL: https://issues.apache.org/jira/browse/IGNITE-12079
> Project: Ignite
>  Issue Type: New Feature
>  Components: ml
>Affects Versions: 2.8
>Reporter: Aleksey Zinoviev
>Assignee: Aleksey Zinoviev
>Priority: Major
> Fix For: 2.8
>
>
> *Main goal:*
> To reduce the gap between Apache Spark and Apache Ignite in preprocessing 
> operations. Closing this gap could help with loading Spark ML 
> Pipelines into Ignite ML.
>  
> Next steps:
>  # Add Frequency Encoder
>  # Add two Imputing Strategies (MIN, MAX, COUNT, MOST_FREQUENT, 
> LEAST_FREQUENT)
>  # Add RobustScaler (will be added in Spark 3.0)
>  # Add CountVectorizer
>  # Add FeatureHasher
>  # Add QuantileDiscretizer
>  # Add Locality Sensitive Hashing (LSH)
>  # Add LabelEncoder
>  # Add RevertStringIndexing
>  # Add multi-column preprocessor
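
As an illustration of the first item in the list above, here is a minimal sketch of one common reading of frequency encoding: each category is replaced by its relative frequency in the training column. The class, the method names, and the choice of 0.0 for unseen categories are assumptions for this example and do not describe the Ignite ML preprocessing API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FrequencyEncoderSketch {
    /** Learns the relative frequency of every category in the column. */
    static Map<String, Double> fit(List<String> column) {
        Map<String, Double> freq = new HashMap<>();

        // First pass: raw counts per category.
        for (String val : column)
            freq.merge(val, 1.0, Double::sum);

        // Second pass: turn counts into relative frequencies.
        freq.replaceAll((category, cnt) -> cnt / column.size());

        return freq;
    }

    /** Replaces each category with its learned frequency; unseen categories map to 0.0. */
    static double[] transform(List<String> column, Map<String, Double> freq) {
        double[] res = new double[column.size()];

        for (int i = 0; i < res.length; i++)
            res[i] = freq.getOrDefault(column.get(i), 0.0);

        return res;
    }
}
```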





[jira] [Comment Edited] (IGNITE-12054) Upgrade Spark module to 2.4

2019-09-25 Thread Aleksey Zinoviev (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937715#comment-16937715
 ] 

Aleksey Zinoviev edited comment on IGNITE-12054 at 9/25/19 1:37 PM:


I've started the R about the 2.4 version, made a few minor fixes, and posted 
a few thoughts about the current problems here, in the Jira comments.

ExternalCatalog was refactored here 
[https://github.com/apache/spark/commit/f38ea00e83099a5ae8d3afdec2e896e43c2db612]
and all listener properties were inherited in ExternalCatalogWithListener.

Nobody on GitHub has inherited from this class yet; the known implementations are 
HiveExternalCatalog and MemoryExternalCatalog (neither of them supports 
listeners and events).

Also, people in Spark ML couldn't solve the same problem:

[http://mail-archives.apache.org/mod_mbox/spark-issues/201812.mbox/%3cjira.13144856.1520975543000.147283.1544598241...@atlassian.jira%3E]

This question is also discussed in the article 
[https://www.waitingforcode.com/apache-spark-sql/writing-custom-external-catalog-listeners-apache-spark-sql/read]



> Upgrade Spark module to 2.4
> ---
>
> Key: IGNITE-12054
> URL: https://issues.apache.org/jira/browse/IGNITE-12054
> Project: Ignite
>  Issue Type: Task
>  Components: spark
>Affects Versions: 2.7.5
>Reporter: Denis Magda
>Assignee: Aleksey Zinoviev
>Priority: Blocker
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Users can't use APIs that are already available in Spark 2.4:
> https://stackoverflow.com/questions/57392143/persisting-spark-dataframe-to-ignite
> Let's upgrade Spark from 2.3 to 2.4 until we extract the Spark Integration as 
> a separate module that can support multiple Spark versions.





[jira] [Commented] (IGNITE-12054) Upgrade Spark module to 2.4

2019-09-25 Thread Aleksey Zinoviev (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937737#comment-16937737
 ] 

Aleksey Zinoviev commented on IGNITE-12054:
---

I've added a PR with a version that compiles (the previous issue was resolved):

[https://github.com/apache/ignite/pull/6909]

However, a few examples and tests are broken.

> Upgrade Spark module to 2.4
> ---
>
> Key: IGNITE-12054
> URL: https://issues.apache.org/jira/browse/IGNITE-12054
> Project: Ignite
>  Issue Type: Task
>  Components: spark
>Affects Versions: 2.7.5
>Reporter: Denis Magda
>Assignee: Aleksey Zinoviev
>Priority: Blocker
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Users can't use APIs that are already available in Spark 2.4:
> https://stackoverflow.com/questions/57392143/persisting-spark-dataframe-to-ignite
> Let's upgrade Spark from 2.3 to 2.4 until we extract the Spark Integration as 
> a separate module that can support multiple Spark versions.





[jira] [Assigned] (IGNITE-12054) Upgrade Spark module to 2.4

2019-09-25 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev reassigned IGNITE-12054:
-

Assignee: Aleksey Zinoviev  (was: Nikolay Izhikov)

> Upgrade Spark module to 2.4
> ---
>
> Key: IGNITE-12054
> URL: https://issues.apache.org/jira/browse/IGNITE-12054
> Project: Ignite
>  Issue Type: Task
>  Components: spark
>Affects Versions: 2.7.5
>Reporter: Denis Magda
>Assignee: Aleksey Zinoviev
>Priority: Blocker
> Fix For: 2.8
>
>
> Users can't use APIs that are already available in Spark 2.4:
> https://stackoverflow.com/questions/57392143/persisting-spark-dataframe-to-ignite
> Let's upgrade Spark from 2.3 to 2.4 until we extract the Spark Integration as 
> a separate module that can support multiple Spark versions.





[jira] [Comment Edited] (IGNITE-12054) Upgrade Spark module to 2.4

2019-09-25 Thread Aleksey Zinoviev (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937715#comment-16937715
 ] 

Aleksey Zinoviev edited comment on IGNITE-12054 at 9/25/19 1:19 PM:


I've started the R about the 2.4 version, made a few minor fixes, and posted 
a few thoughts about the current problems here, in the Jira comments.

ExternalCatalog was refactored here 
[https://github.com/apache/spark/commit/f38ea00e83099a5ae8d3afdec2e896e43c2db612]
and all listener properties were inherited in ExternalCatalogWithListener.

Nobody on GitHub has inherited from this class yet; the known implementations are 
HiveExternalCatalog and MemoryExternalCatalog (neither of them supports 
listeners and events).

Also, people in Spark ML couldn't solve the same problem:

[http://mail-archives.apache.org/mod_mbox/spark-issues/201812.mbox/%3cjira.13144856.1520975543000.147283.1544598241...@atlassian.jira%3E]



> Upgrade Spark module to 2.4
> ---
>
> Key: IGNITE-12054
> URL: https://issues.apache.org/jira/browse/IGNITE-12054
> Project: Ignite
>  Issue Type: Task
>  Components: spark
>Affects Versions: 2.7.5
>Reporter: Denis Magda
>Assignee: Nikolay Izhikov
>Priority: Blocker
> Fix For: 2.8
>
>
> Users can't use APIs that are already available in Spark 2.4:
> https://stackoverflow.com/questions/57392143/persisting-spark-dataframe-to-ignite
> Let's upgrade Spark from 2.3 to 2.4 until we extract the Spark Integration as 
> a separate module that can support multiple Spark versions.





[jira] [Comment Edited] (IGNITE-12054) Upgrade Spark module to 2.4

2019-09-25 Thread Aleksey Zinoviev (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937715#comment-16937715
 ] 

Aleksey Zinoviev edited comment on IGNITE-12054 at 9/25/19 1:18 PM:


I've started the R about the 2.4 version, made a few minor fixes, and posted 
a few thoughts about the current problems here, in the Jira comments.

ExternalCatalog was refactored here 
[https://github.com/apache/spark/commit/f38ea00e83099a5ae8d3afdec2e896e43c2db612]
and all listener properties were inherited in ExternalCatalogWithListener.

Nobody on GitHub has inherited from this class yet; the known implementations are 
HiveExternalCatalog and MemoryExternalCatalog (neither of them supports 
listeners and events).

Also, people in Spark ML couldn't solve the same problem:

[http://mail-archives.apache.org/mod_mbox/spark-issues/201812.mbox/%3cjira.13144856.1520975543000.147283.1544598241...@atlassian.jira%3E]



> Upgrade Spark module to 2.4
> ---
>
> Key: IGNITE-12054
> URL: https://issues.apache.org/jira/browse/IGNITE-12054
> Project: Ignite
>  Issue Type: Task
>  Components: spark
>Affects Versions: 2.7.5
>Reporter: Denis Magda
>Assignee: Nikolay Izhikov
>Priority: Blocker
> Fix For: 2.8
>
>
> Users can't use APIs that are already available in Spark 2.4:
> https://stackoverflow.com/questions/57392143/persisting-spark-dataframe-to-ignite
> Let's upgrade Spark from 2.3 to 2.4 until we extract the Spark Integration as 
> a separate module that can support multiple Spark versions.





[jira] [Comment Edited] (IGNITE-12054) Upgrade Spark module to 2.4

2019-09-25 Thread Aleksey Zinoviev (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937715#comment-16937715
 ] 

Aleksey Zinoviev edited comment on IGNITE-12054 at 9/25/19 1:18 PM:


I've started the R about the 2.4 version, made a few minor fixes, and posted 
a few thoughts about the current problems here, in the Jira comments.

ExternalCatalog was refactored here 
[https://github.com/apache/spark/commit/f38ea00e83099a5ae8d3afdec2e896e43c2db612]
and all listener properties were inherited in ExternalCatalogWithListener.

Nobody on GitHub has inherited from this class yet; the known implementations are 
HiveExternalCatalog and MemoryExternalCatalog (neither of them supports 
listeners and events).

Also, people in Spark ML couldn't solve the same problem:

[http://mail-archives.apache.org/mod_mbox/spark-issues/201812.mbox/%3cjira.13144856.1520975543000.147283.1544598241...@atlassian.jira%3E]



> Upgrade Spark module to 2.4
> ---
>
> Key: IGNITE-12054
> URL: https://issues.apache.org/jira/browse/IGNITE-12054
> Project: Ignite
>  Issue Type: Task
>  Components: spark
>Affects Versions: 2.7.5
>Reporter: Denis Magda
>Assignee: Nikolay Izhikov
>Priority: Blocker
> Fix For: 2.8
>
>
> Users can't use APIs that are already available in Spark 2.4:
> https://stackoverflow.com/questions/57392143/persisting-spark-dataframe-to-ignite
> Let's upgrade Spark from 2.3 to 2.4 until we extract the Spark Integration as 
> a separate module that can support multiple Spark versions.





[jira] [Comment Edited] (IGNITE-12054) Upgrade Spark module to 2.4

2019-09-25 Thread Aleksey Zinoviev (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937715#comment-16937715
 ] 

Aleksey Zinoviev edited comment on IGNITE-12054 at 9/25/19 1:16 PM:


I've started the R about the 2.4 version, made a few minor fixes, and posted 
a few thoughts about the current problems here, in the Jira comments.

ExternalCatalog was refactored, and all listener properties were inherited in 
ExternalCatalogWithListener.

Nobody on GitHub has inherited from this class yet; the known implementations are 
HiveExternalCatalog and MemoryExternalCatalog (neither of them supports 
listeners and events).

Also, people in Spark ML couldn't solve the same problem:

[http://mail-archives.apache.org/mod_mbox/spark-issues/201812.mbox/%3cjira.13144856.1520975543000.147283.1544598241...@atlassian.jira%3E]



> Upgrade Spark module to 2.4
> ---
>
> Key: IGNITE-12054
> URL: https://issues.apache.org/jira/browse/IGNITE-12054
> Project: Ignite
>  Issue Type: Task
>  Components: spark
>Affects Versions: 2.7.5
>Reporter: Denis Magda
>Assignee: Nikolay Izhikov
>Priority: Blocker
> Fix For: 2.8
>
>
> Users can't use APIs that are already available in Spark 2.4:
> https://stackoverflow.com/questions/57392143/persisting-spark-dataframe-to-ignite
> Let's upgrade Spark from 2.3 to 2.4 until we extract the Spark Integration as 
> a separate module that can support multiple Spark versions.





[jira] [Commented] (IGNITE-12054) Upgrade Spark module to 2.4

2019-09-25 Thread Aleksey Zinoviev (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937715#comment-16937715
 ] 

Aleksey Zinoviev commented on IGNITE-12054:
---

ExternalCatalog was refactored, and all listener properties were inherited in 
ExternalCatalogWithListener.

Nobody on GitHub has inherited from this class yet; the known implementations are 
HiveExternalCatalog and MemoryExternalCatalog (neither of them supports 
listeners and events).

Also, people in Spark ML couldn't solve the same problem:

[http://mail-archives.apache.org/mod_mbox/spark-issues/201812.mbox/%3cjira.13144856.1520975543000.147283.1544598241...@atlassian.jira%3E]

> Upgrade Spark module to 2.4
> ---
>
> Key: IGNITE-12054
> URL: https://issues.apache.org/jira/browse/IGNITE-12054
> Project: Ignite
>  Issue Type: Task
>  Components: spark
>Affects Versions: 2.7.5
>Reporter: Denis Magda
>Assignee: Nikolay Izhikov
>Priority: Blocker
> Fix For: 2.8
>
>
> Users can't use APIs that are already available in Spark 2.4:
> https://stackoverflow.com/questions/57392143/persisting-spark-dataframe-to-ignite
> Let's upgrade Spark from 2.3 to 2.4 until we extract the Spark Integration as 
> a separate module that can support multiple Spark versions.





[jira] [Assigned] (IGNITE-10574) [ML] Design API for Ensemble Training

2019-09-25 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev reassigned IGNITE-10574:
-

Assignee: Aleksey Zinoviev

> [ML] Design API for Ensemble Training
> -
>
> Key: IGNITE-10574
> URL: https://issues.apache.org/jira/browse/IGNITE-10574
> Project: Ignite
>  Issue Type: Sub-task
>  Components: ml
>Reporter: Yury Babak
>Assignee: Aleksey Zinoviev
>Priority: Major
>
> Currently, we have bagging and boosting. For boosting we have a 
> separate trainer (GDBTrainer), but for bagging we have a static method 
> inside the TrainerTransformers class. We should choose which approach is better 
> for us.





[jira] [Assigned] (IGNITE-10843) [ML] In stacking add filter on features kept.

2019-09-25 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev reassigned IGNITE-10843:
-

Assignee: Aleksey Zinoviev

> [ML] In stacking add filter on features kept.
> -
>
> Key: IGNITE-10843
> URL: https://issues.apache.org/jira/browse/IGNITE-10843
> Project: Ignite
>  Issue Type: Improvement
>  Components: ml
>Reporter: Artem Malykh
>Assignee: Aleksey Zinoviev
>Priority: Major
>






[jira] [Assigned] (IGNITE-10419) [ML] Move person dataset to SandboxMLCache class

2019-09-25 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev reassigned IGNITE-10419:
-

Assignee: Aleksey Zinoviev

> [ML] Move person dataset to SandboxMLCache class
> 
>
> Key: IGNITE-10419
> URL: https://issues.apache.org/jira/browse/IGNITE-10419
> Project: Ignite
>  Issue Type: Improvement
>  Components: ml
>Reporter: Yury Babak
>Assignee: Aleksey Zinoviev
>Priority: Major
>  Labels: examples
> Fix For: 2.8
>
>
> Now we have duplicated code in the examples: a simple cache with several Person 
> records. We should move this cache creation code into the SandboxMLCache class.





[jira] [Assigned] (IGNITE-10539) [ML] Make 'with' methods consistent

2019-09-25 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev reassigned IGNITE-10539:
-

Assignee: Aleksey Zinoviev

> [ML] Make 'with' methods consistent
> ---
>
> Key: IGNITE-10539
> URL: https://issues.apache.org/jira/browse/IGNITE-10539
> Project: Ignite
>  Issue Type: Improvement
>  Components: ml
>Reporter: Artem Malykh
>Assignee: Aleksey Zinoviev
>Priority: Major
>
> In some places we have 'with*' methods that make in-place changes and return 
> the object itself (for example MLPTrainer::withLoss), while in other places 
> they create new instances with the corresponding parameter changed (for 
> example DatasetBuilder::withFilter, 
> DatasetBuilder::withUpstreamTransformer). This inconsistency forces users to 
> check the Javadoc each time and lowers the overall consistency of the API. 





[jira] [Assigned] (IGNITE-10527) [ML] DenseMatrix(double[] mtx, int rows) mixes args

2019-09-25 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev reassigned IGNITE-10527:
-

Assignee: Aleksey Zinoviev

> [ML] DenseMatrix(double[] mtx, int rows) mixes args
> ---
>
> Key: IGNITE-10527
> URL: https://issues.apache.org/jira/browse/IGNITE-10527
> Project: Ignite
>  Issue Type: Bug
>  Components: ml
>Reporter: Artem Malykh
>Assignee: Aleksey Zinoviev
>Priority: Major
>
> this(mtx, StorageConstants.ROW_STORAGE_MODE, rows) -> 
> this(mtx, rows, StorageConstants.ROW_STORAGE_MODE);
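
Reading the arrow above as "replace the first call with the second", the bug is a pair of swapped int arguments, which the compiler cannot catch. The sketch below reproduces that situation with a hypothetical class; it is not the real DenseMatrix, and the constant value is a placeholder rather than the actual StorageConstants.ROW_STORAGE_MODE.

```java
public class ArgOrderSketch {
    /** Placeholder value; the real constant lives in StorageConstants and may differ. */
    static final int ROW_STORAGE_MODE = 2001;

    final int rows;
    final int cols;
    final int storageMode;

    /** Full constructor: data, then row count, then storage mode. */
    ArgOrderSketch(double[] data, int rows, int storageMode) {
        if (rows <= 0 || data.length % rows != 0)
            throw new IllegalArgumentException("Bad row count: " + rows);

        this.rows = rows;
        this.cols = data.length / rows;
        this.storageMode = storageMode;
    }

    /** Swapped delegation: both arguments are ints, so this compiles without a warning. */
    static ArgOrderSketch buggy(double[] data, int rows) {
        return new ArgOrderSketch(data, ROW_STORAGE_MODE, rows);
    }

    /** Delegation with the arguments in the declared order. */
    static ArgOrderSketch fixed(double[] data, int rows) {
        return new ArgOrderSketch(data, rows, ROW_STORAGE_MODE);
    }
}
```

In this sketch the swapped call fails only at run time, because the storage mode constant is treated as a row count; with less fortunate values it could instead produce a matrix of the wrong shape without any error.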





[jira] [Assigned] (IGNITE-10481) [ML] Examples of stacking usage

2019-09-25 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev reassigned IGNITE-10481:
-

Assignee: Aleksey Zinoviev

> [ML] Examples of stacking usage
> ---
>
> Key: IGNITE-10481
> URL: https://issues.apache.org/jira/browse/IGNITE-10481
> Project: Ignite
>  Issue Type: Sub-task
>  Components: ml
>Reporter: Yury Babak
>Assignee: Aleksey Zinoviev
>Priority: Major
>






[jira] [Updated] (IGNITE-10532) [ML] Add Confusion Matrix for multi-class classification

2019-09-25 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-10532:
--
Affects Version/s: (was: 2.8)

> [ML] Add Confusion Matrix for multi-class classification
> 
>
> Key: IGNITE-10532
> URL: https://issues.apache.org/jira/browse/IGNITE-10532
> Project: Ignite
>  Issue Type: Sub-task
>  Components: ml
>Affects Versions: 3.0
>Reporter: Aleksey Zinoviev
>Assignee: Aleksey Zinoviev
>Priority: Minor
> Fix For: 2.8
>
>
> Explore the ability to integrate OneVsRest with the ConfusionMatrix calculation.
> Note that it can be implemented only after MultiClassEvaluator (no ticket yet).





[jira] [Updated] (IGNITE-8296) Move Spark Scala DataFrames code examples to correct directory and prefix with "Scalar" to follow convention used with other Scala examples

2019-09-25 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-8296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-8296:
-
Affects Version/s: (was: 2.4)
   3.0

> Move Spark Scala DataFrames code examples to correct directory and prefix 
> with "Scalar" to follow convention used with other Scala examples 
> 
>
> Key: IGNITE-8296
> URL: https://issues.apache.org/jira/browse/IGNITE-8296
> Project: Ignite
>  Issue Type: Improvement
>  Components: spark
>Affects Versions: 3.0
>Reporter: Akmal Chaudhri
>Assignee: Aleksey Zinoviev
>Priority: Minor
>
> # The Spark Scala DataFrames code examples are in the wrong directory. They 
> should be moved to the correct directory structure.
>  # The Spark Scala DataFrames code examples should follow the naming 
> convention used for other Scala code examples and be prefixed with "Scalar".
> Alternatively, move the SparkScalar example and its logic to the appropriate 
> folder inside the spark folder.





[jira] [Updated] (IGNITE-10575) [ML] Add examples for ensemble training

2019-09-25 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-10575:
--
Affects Version/s: 3.0

> [ML] Add examples for ensemble training
> ---
>
> Key: IGNITE-10575
> URL: https://issues.apache.org/jira/browse/IGNITE-10575
> Project: Ignite
>  Issue Type: Improvement
>  Components: ml
>Affects Versions: 3.0
>Reporter: Yury Babak
>Assignee: Aleksey Zinoviev
>Priority: Minor
>






[jira] [Updated] (IGNITE-10529) [ML][Umbrella] Add Confusion Matrix support for classification algorithms

2019-09-25 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-10529:
--
Affects Version/s: (was: 2.8)
   3.0

> [ML][Umbrella] Add Confusion Matrix support for classification algorithms
> -
>
> Key: IGNITE-10529
> URL: https://issues.apache.org/jira/browse/IGNITE-10529
> Project: Ignite
>  Issue Type: New Feature
>  Components: ml
>Affects Versions: 3.0
>Reporter: Aleksey Zinoviev
>Assignee: Aleksey Zinoviev
>Priority: Minor
> Fix For: 2.8
>
>
> This is an umbrella ticket for Confusion Matrix Support





[jira] [Updated] (IGNITE-10531) [ML] Refactor all examples to use Binary Confusion Matrix instead of calculations by hand

2019-09-25 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-10531:
--
Affects Version/s: (was: 2.8)
   3.0

> [ML] Refactor all examples to use Binary Confusion Matrix instead of 
> calculations by hand
> -
>
> Key: IGNITE-10531
> URL: https://issues.apache.org/jira/browse/IGNITE-10531
> Project: Ignite
>  Issue Type: Sub-task
>  Components: ml
>Affects Versions: 3.0
>Reporter: Aleksey Zinoviev
>Assignee: Aleksey Zinoviev
>Priority: Minor
> Fix For: 2.8
>
>
> Replace the hand-built matrix
> // Build confusion matrix. See https://en.wikipedia.org/wiki/Confusion_matrix
> int[][] confusionMtx = {{0, 0}, {0, 0}};
> with usage of ConfusionMatrix.
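
To illustrate the kind of refactoring this ticket asks for, here is a minimal sketch of a class that replaces the hand-maintained int[][] counters. The class name BinaryConfusionMatrixSketch and all of its methods are hypothetical; this is not the Ignite ML ConfusionMatrix API, only an assumption about what such a helper could look like.

```java
/** Hypothetical helper; this is not the Ignite ML ConfusionMatrix API. */
final class BinaryConfusionMatrixSketch {
    private long tp, fp, tn, fn;

    /** Records one observation; {@code true} is treated as the positive class. */
    void add(boolean truth, boolean prediction) {
        if (truth && prediction) tp++;
        else if (prediction) fp++;
        else if (truth) fn++;
        else tn++;
    }

    long truePositives() { return tp; }
    long falsePositives() { return fp; }
    long trueNegatives() { return tn; }
    long falseNegatives() { return fn; }

    /** Share of correct predictions, or NaN when nothing was recorded. */
    double accuracy() {
        long total = tp + fp + tn + fn;
        return total == 0 ? Double.NaN : (double) (tp + tn) / total;
    }

    /** Builds the matrix from parallel arrays of ground truth and predictions. */
    static BinaryConfusionMatrixSketch of(boolean[] truth, boolean[] predicted) {
        if (truth.length != predicted.length)
            throw new IllegalArgumentException("Arrays must have the same length");
        BinaryConfusionMatrixSketch m = new BinaryConfusionMatrixSketch();
        for (int i = 0; i < truth.length; i++)
            m.add(truth[i], predicted[i]);
        return m;
    }
}
```

An example would then call add(...) once per prediction instead of indexing into the array, which keeps the row/column convention in one place.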





[jira] [Updated] (IGNITE-10530) [ML] Add Confusion Matrix for Binary Classification

2019-09-25 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-10530:
--
Affects Version/s: (was: 2.8)
   3.0

> [ML] Add Confusion Matrix for Binary Classification
> ---
>
> Key: IGNITE-10530
> URL: https://issues.apache.org/jira/browse/IGNITE-10530
> Project: Ignite
>  Issue Type: Sub-task
>  Components: ml
>Affects Versions: 3.0
>Reporter: Aleksey Zinoviev
>Assignee: Aleksey Zinoviev
>Priority: Minor
> Fix For: 2.8
>
>
> Add a special class that builds the confusion matrix as a product of the evaluation process





[jira] [Updated] (IGNITE-9463) [ML] Update ML tutorial with new model composition/update features

2019-09-25 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-9463:
-
Affects Version/s: 2.8

> [ML] Update ML tutorial with new model composition/update features
> --
>
> Key: IGNITE-9463
> URL: https://issues.apache.org/jira/browse/IGNITE-9463
> Project: Ignite
>  Issue Type: New Feature
>  Components: ml
>Affects Versions: 2.8
>Reporter: Aleksey Zinoviev
>Assignee: Aleksey Zinoviev
>Priority: Major
> Fix For: 2.8
>
>
> # Add example #10 with model composition (DT and Logit)
>  # Add example #11 with online learning for DT
>  # Add example #12 for bagging
>  # Add example #13 for boosting





[jira] [Updated] (IGNITE-10532) [ML] Add Confusion Matrix for multi-class classification

2019-09-25 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-10532:
--
Affects Version/s: 3.0

> [ML] Add Confusion Matrix for multi-class classification
> 
>
> Key: IGNITE-10532
> URL: https://issues.apache.org/jira/browse/IGNITE-10532
> Project: Ignite
>  Issue Type: Sub-task
>  Components: ml
>Affects Versions: 3.0, 2.8
>Reporter: Aleksey Zinoviev
>Assignee: Aleksey Zinoviev
>Priority: Minor
> Fix For: 2.8
>
>
> Explore the ability to integrate OneVsRest with the ConfusionMatrix calculation.
> It can be implemented only after MultiClassEvaluator (no ticket yet).





[jira] [Updated] (IGNITE-9732) Add joins to Spark Dataframe examples

2019-09-25 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-9732:
-
Affects Version/s: (was: 2.6)

> Add joins to Spark Dataframe examples
> -
>
> Key: IGNITE-9732
> URL: https://issues.apache.org/jira/browse/IGNITE-9732
> Project: Ignite
>  Issue Type: Improvement
>  Components: examples, spark
>Affects Versions: 2.8
>Reporter: Valentin Kulichenko
>Assignee: Aleksey Zinoviev
>Priority: Major
> Fix For: 2.8
>
>
> {{IgniteDataFrameExample}} creates two tables - {{city}} and {{person}}, but 
> only {{person}} is actually used. Need to add join examples.
> Would also be great to demonstrate the fact that optimization is working and 
> joins are executed in Ignite, not Spark (using {{explain()}}, maybe?).





[jira] [Updated] (IGNITE-11723) IgniteSpark integration should support skipStore option for internal dataStreamer (IgniteRdd and Ignite DataFrame)

2019-09-25 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-11723:
--
Affects Version/s: (was: 2.7)

> IgniteSpark integration should support skipStore option for internal 
> dataStreamer (IgniteRdd and Ignite DataFrame)
> --
>
> Key: IGNITE-11723
> URL: https://issues.apache.org/jira/browse/IGNITE-11723
> Project: Ignite
>  Issue Type: Improvement
>  Components: spark
>Affects Versions: 2.8
>Reporter: Andrey Aleksandrov
>Assignee: Aleksey Zinoviev
>Priority: Critical
> Fix For: 2.8
>
>
> At the moment this option can't be set. However, this integration could also 
> be used for initial data loading into caches that have a cache store 
> implementation. 
> With the skipStore option, we could avoid write-through behavior during this 
> initial data loading.





[jira] [Updated] (IGNITE-11724) IgniteSpark integration forget to close the IgniteContext and stops the client node in case if error during PairFunction logic

2019-09-25 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-11724:
--
Affects Version/s: (was: 2.7)
   2.8

> IgniteSpark integration forget to close the IgniteContext and stops the 
> client node in case if error during PairFunction logic 
> ---
>
> Key: IGNITE-11724
> URL: https://issues.apache.org/jira/browse/IGNITE-11724
> Project: Ignite
>  Issue Type: Bug
>  Components: spark
>Affects Versions: 2.8
>Reporter: Andrey Aleksandrov
>Assignee: Aleksey Zinoviev
>Priority: Major
> Fix For: 2.8
>
>
> The following code can hang if the PairFunction logic throws an exception:
> JavaPairRDD rdd_records = records.mapToPair(new MapFunction());
> JavaIgniteContext igniteContext = new JavaIgniteContext<>(sparkCtx, configUrl);
> JavaIgniteRDD igniteRdd = igniteContext. Value>fromCache(cacheName);
> igniteRdd.savePairs(rdd_records);
> It looks like the following internal code (the saveValues method) should also 
> close the IgniteContext on an unexpected exception, not only the data streamer:
> try {
>     it.foreach(value ⇒ {
>         val key = affinityKeyFunc(value, node.orNull)
>         streamer.addData(key, value)
>     })
> }
> finally {
>     streamer.close()
> }
> })
> }
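
The cleanup pattern the reporter describes can be sketched in plain Java with try-with-resources, which closes every declared resource in reverse order even when the body throws. The Resource class below is a hypothetical stand-in for the data streamer and the context; it does not use the Ignite or Spark APIs, and whether the integration should own the context's lifecycle is left open by the ticket.

```java
import java.util.ArrayList;
import java.util.List;

final class CloseOnFailureSketch {
    /** Hypothetical stand-in for a closable resource (streamer or context). */
    static final class Resource implements AutoCloseable {
        private final String name;
        private final List<String> log;

        Resource(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        @Override public void close() {
            log.add("closed " + name);
        }
    }

    /** Saves the values and returns a log of what happened, including closes. */
    static List<String> save(List<Integer> values) {
        List<String> log = new ArrayList<>();
        // Resources are closed in reverse declaration order before the catch block runs.
        try (Resource context = new Resource("context", log);
             Resource streamer = new Resource("streamer", log)) {
            for (Integer value : values) {
                if (value == null)
                    throw new IllegalStateException("bad value");
                log.add("added " + value);
            }
        } catch (IllegalStateException e) {
            log.add("failed: " + e.getMessage());
        }
        return log;
    }
}
```

With this shape a failure inside the loop cannot leave the context open, which is the hang scenario described above.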





[jira] [Updated] (IGNITE-9732) Add joins to Spark Dataframe examples

2019-09-25 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-9732:
-
Affects Version/s: 2.8

> Add joins to Spark Dataframe examples
> -
>
> Key: IGNITE-9732
> URL: https://issues.apache.org/jira/browse/IGNITE-9732
> Project: Ignite
>  Issue Type: Improvement
>  Components: examples, spark
>Affects Versions: 2.6, 2.8
>Reporter: Valentin Kulichenko
>Assignee: Aleksey Zinoviev
>Priority: Major
> Fix For: 2.8
>
>
> {{IgniteDataFrameExample}} creates two tables - {{city}} and {{person}}, but 
> only {{person}} is actually used. Need to add join examples.
> Would also be great to demonstrate the fact that optimization is working and 
> joins are executed in Ignite, not Spark (using {{explain()}}, maybe?).





[jira] [Updated] (IGNITE-11723) IgniteSpark integration should support skipStore option for internal dataStreamer (IgniteRdd and Ignite DataFrame)

2019-09-25 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-11723:
--
Affects Version/s: 2.8

> IgniteSpark integration should support skipStore option for internal 
> dataStreamer (IgniteRdd and Ignite DataFrame)
> --
>
> Key: IGNITE-11723
> URL: https://issues.apache.org/jira/browse/IGNITE-11723
> Project: Ignite
>  Issue Type: Improvement
>  Components: spark
>Affects Versions: 2.7, 2.8
>Reporter: Andrey Aleksandrov
>Assignee: Aleksey Zinoviev
>Priority: Critical
> Fix For: 2.8
>
>
> At the moment this option can't be set. However, this integration could also 
> be used for initial data loading into caches that have a cache store 
> implementation. 
> With the skipStore option, we could avoid write-through behavior during this 
> initial data loading.





[jira] [Updated] (IGNITE-10869) [ML] Add MultiClass classification metrics

2019-09-25 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-10869:
--
Affects Version/s: (was: 2.8)

> [ML] Add MultiClass classification metrics
> --
>
> Key: IGNITE-10869
> URL: https://issues.apache.org/jira/browse/IGNITE-10869
> Project: Ignite
>  Issue Type: Sub-task
>  Components: ml
>Affects Versions: 3.0
>Reporter: Aleksey Zinoviev
>Assignee: Aleksey Zinoviev
>Priority: Minor
> Fix For: 2.8
>
>
> Add the ability to calculate multiple metrics (as binary metrics) for multiclass 
> classification.
> It can be merged with the OneVsRest approach.
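
One way to read "as binary metrics" is one-vs-rest: pick a class, treat it as positive and everything else as negative, then compute the usual binary metrics. The sketch below follows that reading; the class and method names are hypothetical and are not part of Ignite ML.

```java
final class OneVsRestMetricsSketch {
    /**
     * Returns {precision, recall} for one class treated as positive
     * against all other classes. A metric is NaN when its denominator is zero.
     */
    static double[] precisionRecall(int[] truth, int[] predicted, int positiveClass) {
        if (truth.length != predicted.length)
            throw new IllegalArgumentException("Arrays must have the same length");
        int tp = 0, fp = 0, fn = 0;
        for (int i = 0; i < truth.length; i++) {
            boolean t = truth[i] == positiveClass;
            boolean p = predicted[i] == positiveClass;
            if (t && p) tp++;
            else if (p) fp++;
            else if (t) fn++;
        }
        double precision = tp + fp == 0 ? Double.NaN : (double) tp / (tp + fp);
        double recall = tp + fn == 0 ? Double.NaN : (double) tp / (tp + fn);
        return new double[] {precision, recall};
    }
}
```

Calling precisionRecall once per class label gives a per-class table that can then be averaged if a single number is needed.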





[jira] [Updated] (IGNITE-10869) [ML] Add MultiClass classification metrics

2019-09-25 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-10869:
--
Affects Version/s: 3.0

> [ML] Add MultiClass classification metrics
> --
>
> Key: IGNITE-10869
> URL: https://issues.apache.org/jira/browse/IGNITE-10869
> Project: Ignite
>  Issue Type: Sub-task
>  Components: ml
>Affects Versions: 3.0, 2.8
>Reporter: Aleksey Zinoviev
>Assignee: Aleksey Zinoviev
>Priority: Minor
> Fix For: 2.8
>
>
> Add the ability to calculate multiple metrics (as binary metrics) for multiclass 
> classification.
> It can be merged with the OneVsRest approach.





[jira] [Updated] (IGNITE-10870) [ML] Add an example for KNN/LogReg and multi-class task full Iris dataset

2019-09-25 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-10870:
--
Affects Version/s: (was: 2.8)
   3.0

> [ML] Add an example for KNN/LogReg and multi-class task full Iris dataset
> -
>
> Key: IGNITE-10870
> URL: https://issues.apache.org/jira/browse/IGNITE-10870
> Project: Ignite
>  Issue Type: Sub-task
>  Components: ml
>Affects Versions: 3.0
>Reporter: Aleksey Zinoviev
>Assignee: Aleksey Zinoviev
>Priority: Minor
> Fix For: 2.8
>
>
> Add one or two examples for KNN/LogReg on the Iris dataset with 3 classes





[jira] [Updated] (IGNITE-6642) [Umbrella] Integration with PMML

2019-09-25 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-6642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-6642:
-
Affects Version/s: 3.0

> [Umbrella] Integration with PMML
> 
>
> Key: IGNITE-6642
> URL: https://issues.apache.org/jira/browse/IGNITE-6642
> Project: Ignite
>  Issue Type: New Feature
>  Components: ml
>Affects Versions: 3.0
>Reporter: Yury Babak
>Assignee: Aleksey Zinoviev
>Priority: Minor
>
> PMML (Predictive Model Markup Language) is an XML-based language that is used in 
> Spark MLlib and other platforms.
> Here is some additional info about PMML:
> (i) http://dmg.org/pmml/v4-3/GeneralStructure.html
> (i) https://github.com/jpmml/jpmml-model





[jira] [Updated] (IGNITE-9283) [ML] Add Discrete Cosine preprocessor

2019-09-25 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-9283:
-
Affects Version/s: 3.0

> [ML] Add Discrete Cosine preprocessor
> -
>
> Key: IGNITE-9283
> URL: https://issues.apache.org/jira/browse/IGNITE-9283
> Project: Ignite
>  Issue Type: Sub-task
>  Components: ml
>Affects Versions: 3.0
>Reporter: Aleksey Zinoviev
>Assignee: Ilya Lantukh
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Add [https://en.wikipedia.org/wiki/Discrete_cosine_transform]
> Please look at the MinMaxScaler or Normalization packages in the preprocessing 
> package.
> Add the following classes if required:
> 1) Preprocessor
> 2) Trainer
> 3) custom PartitionData if shuffling is a step of the algorithm
>  
> Requirements for successful PR:
>  # PartitionedDataset usage
>  # Trainer-Model paradigm support
>  # Tests for Model and for Trainer (and other stuff)
>  # Example of usage with a small but well-known dataset such as IRIS, Titanic or 
> House Prices
>  # Javadocs/code style according to the guidelines
>  
>  
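
For orientation, the transform itself is small. The sketch below computes the unnormalized DCT-II of a feature vector, X_k = sum over n of x_n * cos(pi / N * (n + 1/2) * k), in O(N^2) time. It is only a reference computation under that assumption about which DCT variant is wanted; the class name is hypothetical and it is not wired into the Preprocessor or Trainer abstractions mentioned above.

```java
final class DctSketch {
    /** Unnormalized DCT-II of the input vector, computed directly in O(N^2). */
    static double[] dct2(double[] x) {
        int n = x.length;
        double[] out = new double[n];
        for (int k = 0; k < n; k++) {
            double sum = 0.0;
            for (int i = 0; i < n; i++)
                sum += x[i] * Math.cos(Math.PI / n * (i + 0.5) * k);
            out[k] = sum;
        }
        return out;
    }
}
```

For a constant input all the energy lands in the first coefficient, which makes a convenient sanity test for any real implementation.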





[jira] [Updated] (IGNITE-9746) [ML] Add Complement Naive Bayes

2019-09-25 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-9746:
-
Affects Version/s: 3.0

> [ML] Add Complement Naive Bayes
> ---
>
> Key: IGNITE-9746
> URL: https://issues.apache.org/jira/browse/IGNITE-9746
> Project: Ignite
>  Issue Type: Sub-task
>  Components: ml
>Affects Versions: 3.0
>Reporter: Ravil Galeyev
>Priority: Major
>
> Naive Bayes classifiers are a family of simple probabilistic classifiers 
> based on applying Bayes' theorem with strong (naive) independence assumptions 
> between the features.
> So we want to add this algorithm to the Apache Ignite ML module.
> [Complement Naive 
> Bayes|http://scikit-learn.org/stable/modules/naive_bayes.html#complement-naive-bayes]
>  is an adaptation of the standard multinomial naive Bayes (MNB) algorithm 
> that is particularly suited for imbalanced data sets.
> Requirements for successful PR:
>  # PartitionedDataset usage
>  # Trainer-Model paradigm support
>  # Tests for Model and for Trainer (and other stuff)
>  # Example of usage with a small but well-known dataset such as IRIS, Titanic or 
> House Prices
>  # Javadocs/code style according to the guidelines





[jira] [Updated] (IGNITE-9281) [ML] Starter ML tasks

2019-09-25 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-9281:
-
Affects Version/s: 3.0

> [ML] Starter ML tasks
> -
>
> Key: IGNITE-9281
> URL: https://issues.apache.org/jira/browse/IGNITE-9281
> Project: Ignite
>  Issue Type: Wish
>  Components: ml
>Affects Versions: 3.0
>Reporter: Aleksey Zinoviev
>Assignee: Aleksey Zinoviev
>Priority: Minor
> Fix For: None
>
>
> This ticket is an umbrella ticket for ML starter tasks.
> Please contact [~zaleslaw] to get assigned to, and get help with, one of these tasks.





[jira] [Updated] (IGNITE-10438) [ML] DBSCAN

2019-09-25 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-10438:
--
Affects Version/s: 3.0

> [ML] DBSCAN
> ---
>
> Key: IGNITE-10438
> URL: https://issues.apache.org/jira/browse/IGNITE-10438
> Project: Ignite
>  Issue Type: New Feature
>  Components: ml
>Affects Versions: 3.0
>Reporter: Yury Babak
>Assignee: Aleksey Zinoviev
>Priority: Trivial
>
> Density-based spatial clustering of applications with noise (DBSCAN)
> [wiki description|https://en.wikipedia.org/wiki/DBSCAN]
> We could test this algorithm on TWO_CLASSED_IRIS and IRIS (see 
> MLSandboxDatasets enum)





[jira] [Updated] (IGNITE-10804) [ML] Add ability to load LinReg model from Spark to Ignite via PMML

2019-09-25 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-10804:
--
Affects Version/s: 3.0

> [ML] Add ability to load LinReg model from Spark to Ignite via PMML
> ---
>
> Key: IGNITE-10804
> URL: https://issues.apache.org/jira/browse/IGNITE-10804
> Project: Ignite
>  Issue Type: Sub-task
>  Components: ml
>Affects Versions: 3.0, 2.8
>Reporter: Aleksey Zinoviev
>Assignee: Aleksey Zinoviev
>Priority: Minor
>
> 1) Write simple ML pipeline for Spark
> 2) Convert to PMML model
> 3) Load to Ignite
> 4) Predict on Ignite





[jira] [Updated] (IGNITE-10804) [ML] Add ability to load LinReg model from Spark to Ignite via PMML

2019-09-25 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-10804:
--
Affects Version/s: (was: 2.8)

> [ML] Add ability to load LinReg model from Spark to Ignite via PMML
> ---
>
> Key: IGNITE-10804
> URL: https://issues.apache.org/jira/browse/IGNITE-10804
> Project: Ignite
>  Issue Type: Sub-task
>  Components: ml
>Affects Versions: 3.0
>Reporter: Aleksey Zinoviev
>Assignee: Aleksey Zinoviev
>Priority: Minor
>
> 1) Write simple ML pipeline for Spark
> 2) Convert to PMML model
> 3) Load to Ignite
> 4) Predict on Ignite





[jira] [Updated] (IGNITE-10407) [ML][Umbrella] Add Multi-label multi-class classification trainer and model

2019-09-25 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-10407:
--
Affects Version/s: 3.0

> [ML][Umbrella] Add Multi-label multi-class classification trainer and model
> ---
>
> Key: IGNITE-10407
> URL: https://issues.apache.org/jira/browse/IGNITE-10407
> Project: Ignite
>  Issue Type: New Feature
>  Components: ml
>Affects Versions: 3.0
>Reporter: Aleksey Zinoviev
>Assignee: Aleksey Zinoviev
>Priority: Minor
>
> Improve Ignite ML's ability to work with multi-label 
> multi-class classification tasks.
> It requires:
>  * extension of the current API with models for Double prediction only
>  * addition of a common OneVsRest multi-label multi-class classification Model and 
> Trainer
>  * preparation of appropriate datasets for examples and testing





[jira] [Updated] (IGNITE-10804) [ML] Add ability to load LinReg model from Spark to Ignite via PMML

2019-09-25 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-10804:
--
Affects Version/s: 2.8

> [ML] Add ability to load LinReg model from Spark to Ignite via PMML
> ---
>
> Key: IGNITE-10804
> URL: https://issues.apache.org/jira/browse/IGNITE-10804
> Project: Ignite
>  Issue Type: Sub-task
>  Components: ml
>Affects Versions: 2.8
>Reporter: Aleksey Zinoviev
>Assignee: Aleksey Zinoviev
>Priority: Minor
>
> 1) Write simple ML pipeline for Spark
> 2) Convert to PMML model
> 3) Load to Ignite
> 4) Predict on Ignite





[jira] [Updated] (IGNITE-7327) Add CSV loading to Labeled Dataset with Loader

2019-09-25 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-7327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-7327:
-
Affects Version/s: 3.0

> Add CSV loading to Labeled Dataset with Loader 
> ---
>
> Key: IGNITE-7327
> URL: https://issues.apache.org/jira/browse/IGNITE-7327
> Project: Ignite
>  Issue Type: New Feature
>  Components: ml
>Affects Versions: 3.0
>Reporter: Aleksey Zinoviev
>Assignee: Aleksey Zinoviev
>Priority: Trivial
>
> Comment from [~dmitrievanthony]
> Lots of datasets (from Kaggle, for example) are supplied in CSV format with a 
> header line. Given that, does it make sense to:
> * use a real CSV parser (parsing is a bit more complicated than just splitting 
> by comma)?
> * add the ability to use the first (header) line as the source of feature names?
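
To show why splitting by comma is not enough, here is a minimal sketch of a record splitter that honors double-quoted fields and doubled-quote escapes, applied to a header line to obtain feature names. The class name and the sample header are hypothetical, and the sketch does not handle fields that span several lines; a production loader would more likely rely on an existing CSV library.

```java
import java.util.ArrayList;
import java.util.List;

final class CsvHeaderSketch {
    /** Splits one CSV record, honoring double-quoted fields and "" escapes. */
    static List<String> splitLine(String line) {
        List<String> cells = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean quoted = false;
        for (int i = 0; i < line.length(); i++) {
            char ch = line.charAt(i);
            if (quoted) {
                if (ch != '"') {
                    cur.append(ch);
                } else if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    cur.append('"'); // doubled quote inside a quoted field
                    i++;
                } else {
                    quoted = false;  // closing quote
                }
            } else if (ch == '"') {
                quoted = true;
            } else if (ch == ',') {
                cells.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(ch);
            }
        }
        cells.add(cur.toString());
        return cells;
    }
}
```

The list returned for the first line of the file can serve directly as the feature names, while later lines go through the same splitter before numeric conversion.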



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-8250) Adopt Fuzzy CMeans to PartitionedDatasets

2019-09-25 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-8250:
-
Affects Version/s: 3.0

> Adopt Fuzzy CMeans to PartitionedDatasets
> -
>
> Key: IGNITE-8250
> URL: https://issues.apache.org/jira/browse/IGNITE-8250
> Project: Ignite
>  Issue Type: Improvement
>  Components: ml
>Affects Versions: 3.0
>Reporter: Aleksey Zinoviev
>Assignee: Aleksey Zinoviev
>Priority: Trivial
>
> Add Model/Trainer, tests, example





[jira] [Updated] (IGNITE-7025) Implement different strategies to fill missed data in LabeledDataset during loading from file

2019-09-25 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-7025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-7025:
-
Affects Version/s: 3.0

> Implement different strategies to fill missed data in LabeledDataset during 
> loading from file
> -
>
> Key: IGNITE-7025
> URL: https://issues.apache.org/jira/browse/IGNITE-7025
> Project: Ignite
>  Issue Type: Task
>  Components: ml
>Affects Versions: 3.0
>Reporter: Aleksey Zinoviev
>Assignee: Aleksey Zinoviev
>Priority: Trivial
>
> For example, there can be four strategies: 
> * Fill the missing value with zero, an empty string, or the default value for 
> categorical features = ZERO
> * Fill the missing value with the column mean. Requires additional time to 
> calculate = MEAN
> * Fill the missing value with the column mode. Requires additional time to 
> calculate = MODE
> * Delete observations with missing values. Transforms the dataset and changes 
> indexing = DELETE
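
The four strategies can be sketched over a plain numeric matrix. The code below assumes that a missing value is encoded as NaN, that all rows have the same length, and that a column with no observed values falls back to zero; the class, the enum and the method names are hypothetical and do not describe LabeledDataset.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MissingValueSketch {
    enum Strategy { ZERO, MEAN, MODE, DELETE }

    /** Returns a cleaned copy of the rows; NaN marks a missing value. */
    static double[][] fill(double[][] rows, Strategy strategy) {
        if (strategy == Strategy.DELETE) {
            List<double[]> kept = new ArrayList<>();
            for (double[] row : rows) {
                boolean complete = true;
                for (double v : row)
                    if (Double.isNaN(v)) { complete = false; break; }
                if (complete) kept.add(row.clone());
            }
            return kept.toArray(new double[0][]);
        }
        double[][] out = new double[rows.length][];
        for (int i = 0; i < rows.length; i++)
            out[i] = rows[i].clone();
        int cols = rows.length == 0 ? 0 : rows[0].length;
        for (int c = 0; c < cols; c++) {
            double replacement = replacementFor(rows, c, strategy);
            for (double[] row : out)
                if (Double.isNaN(row[c])) row[c] = replacement;
        }
        return out;
    }

    /** Computes the fill value for one column from its observed values. */
    private static double replacementFor(double[][] rows, int col, Strategy strategy) {
        if (strategy == Strategy.ZERO) return 0.0;
        double sum = 0.0, mode = 0.0;
        int n = 0, best = 0;
        Map<Double, Integer> counts = new HashMap<>();
        for (double[] row : rows) {
            double v = row[col];
            if (Double.isNaN(v)) continue;
            sum += v;
            n++;
            int cnt = counts.merge(v, 1, Integer::sum);
            if (cnt > best) { best = cnt; mode = v; } // first value wins ties
        }
        if (n == 0) return 0.0; // assumption: nothing observed, fall back to zero
        return strategy == Strategy.MEAN ? sum / n : mode;
    }
}
```

MEAN and MODE need a pass over each column before filling, which is the extra calculation time the description mentions, and DELETE is the only strategy that changes the number of rows.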





[jira] [Updated] (IGNITE-7328) Improve Labeled Dataset loading from txt file

2019-09-25 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-7328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-7328:
-
Affects Version/s: 3.0

> Improve Labeled Dataset loading from txt file
> -
>
> Key: IGNITE-7328
> URL: https://issues.apache.org/jira/browse/IGNITE-7328
> Project: Ignite
>  Issue Type: New Feature
>  Components: ml
>Affects Versions: 3.0
>Reporter: Aleksey Zinoviev
>Assignee: Aleksey Zinoviev
>Priority: Trivial
>
> 1. Wouldn't it be better to parse rows in place (rather than saving them as 
> strings first)? In the current implementation we need to keep the dataset in 
> memory twice, which might be a problem for big datasets.
> 2. What about the case when a dataset contains not only numerical data? Do 
> we consider this case, or will some other "DatasetLoader" be used for such 
> purposes?
> 3. Just an idea: in case we don't want to fail on bad data (99% of cases), it 
> would be great to understand the quality of the loaded dataset, such as the 
> number of missing rows/values.
> 4. Should a row that doesn't contain the required number of columns be 
> treated as "bad data" rather than breaking parsing with an 
> IndexOutOfBoundsException?
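
Points 1, 3 and 4 can be sketched together: parse each line as soon as it is read, so that the raw strings are never retained, and count rows that have the wrong number of columns or non-numeric cells instead of failing on them. The class and field names below are hypothetical and are not the existing loader's API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class RowParserSketch {
    /** Successfully parsed rows. */
    final List<double[]> rows = new ArrayList<>();
    /** Number of rows skipped as bad data. */
    int skippedRows;

    /** Parses lines one at a time; the separator is taken literally, not as a regex. */
    static RowParserSketch parse(BufferedReader in, int columns, String separator) {
        RowParserSketch result = new RowParserSketch();
        String regex = Pattern.quote(separator);
        try {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) continue;
                double[] row = toRow(line.split(regex, -1), columns);
                if (row == null) result.skippedRows++;
                else result.rows.add(row);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    /** Returns the parsed row, or null when the row is bad data. */
    private static double[] toRow(String[] cells, int columns) {
        if (cells.length != columns) return null;
        double[] row = new double[columns];
        try {
            for (int i = 0; i < columns; i++)
                row[i] = Double.parseDouble(cells[i].trim());
        } catch (NumberFormatException e) {
            return null;
        }
        return row;
    }
}
```

The skippedRows counter is the simplest possible quality report; per-column counts of missing values could be added in the same loop.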





[jira] [Updated] (IGNITE-8296) Move Spark Scala DataFrames code examples to correct directory and prefix with "Scalar" to follow convention used with other Scala examples

2019-09-23 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-8296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-8296:
-
Description: 
# The Spark Scala DataFrames code examples are in the wrong directory. They 
should be moved to the correct directory structure.
 # The Spark Scala DataFrames code examples should follow the naming convention 
used for other Scala code examples and be prefixed with "Scalar".

Alternatively, move the SparkScalar example and its logic to the appropriate 
folder under the spark folder.

  was:
# The Spark Scala DataFrames code examples are in the wrong directory. They 
should be moved to the correct directory structure.
 # The Spark Scala DataFrames code examples should follow the naming convention 
used for other Scala code examples and be prefixed with "Scalar".


> Move Spark Scala DataFrames code examples to correct directory and prefix 
> with "Scalar" to follow convention used with other Scala examples 
> 
>
> Key: IGNITE-8296
> URL: https://issues.apache.org/jira/browse/IGNITE-8296
> Project: Ignite
>  Issue Type: Improvement
>  Components: spark
>Affects Versions: 2.4
>Reporter: Akmal Chaudhri
>Assignee: Aleksey Zinoviev
>Priority: Minor
>
> # The Spark Scala DataFrames code examples are in the wrong directory. They 
> should be moved to the correct directory structure.
>  # The Spark Scala DataFrames code examples should follow the naming 
> convention used for other Scala code examples and be prefixed with "Scalar".
> Alternatively, move the SparkScalar example and its logic to the appropriate 
> folder under the spark folder.





[jira] [Updated] (IGNITE-11723) IgniteSpark integration should support skipStore option for internal dataStreamer (IgniteRdd and Ignite DataFrame)

2019-09-23 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-11723:
--
Priority: Critical  (was: Major)

> IgniteSpark integration should support skipStore option for internal 
> dataStreamer (IgniteRdd and Ignite DataFrame)
> --
>
> Key: IGNITE-11723
> URL: https://issues.apache.org/jira/browse/IGNITE-11723
> Project: Ignite
>  Issue Type: Improvement
>  Components: spark
>Affects Versions: 2.7
>Reporter: Andrey Aleksandrov
>Assignee: Aleksey Zinoviev
>Priority: Critical
> Fix For: 2.8
>
>
> At the moment this option can't be set. However, this integration could also 
> be used for initial data loading into caches that have a cache store 
> implementation. 
> With the skipStore option, we could avoid write-through behavior during this 
> initial data loading.





[jira] [Created] (IGNITE-12218) [ML] Add support for Strings in Vectorizer

2019-09-23 Thread Aleksey Zinoviev (Jira)
Aleksey Zinoviev created IGNITE-12218:
-

 Summary: [ML] Add support  for Strings in Vectorizer
 Key: IGNITE-12218
 URL: https://issues.apache.org/jira/browse/IGNITE-12218
 Project: Ignite
  Issue Type: Sub-task
Affects Versions: 2.8
Reporter: Aleksey Zinoviev
Assignee: Aleksey Zinoviev
 Fix For: 2.8


Currently the signatures of vectorizers are limited; they should be extended to 
support Strings.





[jira] [Created] (IGNITE-12217) [ML] Add support for label encoding

2019-09-23 Thread Aleksey Zinoviev (Jira)
Aleksey Zinoviev created IGNITE-12217:
-

 Summary: [ML] Add support for label encoding
 Key: IGNITE-12217
 URL: https://issues.apache.org/jira/browse/IGNITE-12217
 Project: Ignite
  Issue Type: Sub-task
Affects Versions: 2.8
Reporter: Aleksey Zinoviev
Assignee: Aleksey Zinoviev
 Fix For: 2.8


Support training on the Mushroom dataset.

See part of the discussion: "My dataset is Mushrooms
<[https://www.kaggle.com/uciml/mushroom-classification]> dataset from Kaggle.
There are only categorical features and categorical labels."
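
Label encoding in its simplest form maps each distinct category to the next free integer, so that String labels can be fed to trainers that expect numeric ones. The sketch below shows that idea only; the class name is hypothetical, the sample labels in the usage note are illustrative, and it says nothing about how the Ignite ML preprocessing API would expose this.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LabelEncoderSketch {
    /** Category to code, in order of first appearance. */
    private final Map<String, Integer> index = new LinkedHashMap<>();

    /** Assigns the next free code to each unseen category and returns the codes. */
    double[] fitTransform(String[] labels) {
        double[] out = new double[labels.length];
        for (int i = 0; i < labels.length; i++) {
            Integer code = index.get(labels[i]);
            if (code == null) {
                code = index.size();
                index.put(labels[i], code);
            }
            out[i] = code;
        }
        return out;
    }

    /** Number of distinct categories seen so far. */
    int classes() {
        return index.size();
    }
}
```

For example, encoding the labels "e", "p", "e" would give 0, 1, 0. Codes assigned this way carry no ordering, so for features (as opposed to labels) one-hot encoding is often the safer choice.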



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-12216) [ML][Umbrella]Advanced support of categorical features

2019-09-23 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-12216:
--
Summary: [ML][Umbrella]Advanced support of categorical features  (was: 
[ML][Umbrella])

> [ML][Umbrella]Advanced support of categorical features
> --
>
> Key: IGNITE-12216
> URL: https://issues.apache.org/jira/browse/IGNITE-12216
> Project: Ignite
>  Issue Type: New Feature
>  Components: ml
>Affects Versions: 2.8
>Reporter: Aleksey Zinoviev
>Assignee: Aleksey Zinoviev
>Priority: Minor
> Fix For: 2.8
>
>
> Discussion here
> [http://apache-ignite-developers.2346864.n4.nabble.com/ML-DISCUSSION-Big-Double-problem-td42262.html#a42267]





[jira] [Created] (IGNITE-12216) [ML][Umbrella]

2019-09-23 Thread Aleksey Zinoviev (Jira)
Aleksey Zinoviev created IGNITE-12216:
-

 Summary: [ML][Umbrella]
 Key: IGNITE-12216
 URL: https://issues.apache.org/jira/browse/IGNITE-12216
 Project: Ignite
  Issue Type: New Feature
  Components: ml
Affects Versions: 2.8
Reporter: Aleksey Zinoviev
Assignee: Aleksey Zinoviev
 Fix For: 2.8


Discussion here

[http://apache-ignite-developers.2346864.n4.nabble.com/ML-DISCUSSION-Big-Double-problem-td42262.html#a42267]





[jira] [Resolved] (IGNITE-10865) [ML] [Umbrella] Integration with Spark ML

2019-09-20 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev resolved IGNITE-10865.
---
Resolution: Fixed

> [ML] [Umbrella] Integration with Spark ML
> -
>
> Key: IGNITE-10865
> URL: https://issues.apache.org/jira/browse/IGNITE-10865
> Project: Ignite
>  Issue Type: New Feature
>  Components: ml
>Affects Versions: 2.8
>Reporter: Aleksey Zinoviev
>Assignee: Aleksey Zinoviev
>Priority: Critical
> Fix For: 2.8
>
>
> Investigate how to load ML models from Spark





[jira] [Assigned] (IGNITE-11723) IgniteSpark integration should support skipStore option for internal dataStreamer (IgniteRdd and Ignite DataFrame)

2019-09-19 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev reassigned IGNITE-11723:
-

Assignee: Aleksey Zinoviev  (was: Andrey Aleksandrov)

> IgniteSpark integration should support skipStore option for internal 
> dataStreamer (IgniteRdd and Ignite DataFrame)
> --
>
> Key: IGNITE-11723
> URL: https://issues.apache.org/jira/browse/IGNITE-11723
> Project: Ignite
>  Issue Type: Improvement
>  Components: spark
>Affects Versions: 2.7
>Reporter: Andrey Aleksandrov
>Assignee: Aleksey Zinoviev
>Priority: Major
> Fix For: 2.8
>
>
> At the moment this option can't be set. However, this integration could also 
> be used for initial data loading into caches that have a cache store 
> implementation. 
> With the skipStore option, we could avoid write-through behavior during this 
> initial data loading.





[jira] [Assigned] (IGNITE-11724) IgniteSpark integration forget to close the IgniteContext and stops the client node in case if error during PairFunction logic

2019-09-19 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev reassigned IGNITE-11724:
-

Assignee: Aleksey Zinoviev

> IgniteSpark integration forget to close the IgniteContext and stops the 
> client node in case if error during PairFunction logic 
> ---
>
> Key: IGNITE-11724
> URL: https://issues.apache.org/jira/browse/IGNITE-11724
> Project: Ignite
>  Issue Type: Bug
>  Components: spark
>Affects Versions: 2.7
>Reporter: Andrey Aleksandrov
>Assignee: Aleksey Zinoviev
>Priority: Major
> Fix For: 2.8
>
>
> The following code could hang if the PairFunction logic throws an exception:
> JavaPairRDD rdd_records = records.mapToPair(new MapFunction());
> JavaIgniteContext igniteContext = new 
> JavaIgniteContext<>(sparkCtx, configUrl);
> JavaIgniteRDD igniteRdd = igniteContext. Value>fromCache(cacheName);
> igniteRdd.savePairs(rdd_records);
> It looks like the following internal code (saveValues method) should also 
> close the IgniteContext in case of an unexpected exception, not only the 
> data streamer:
>  try {
>      it.foreach(value ⇒ {
>          val key = affinityKeyFunc(value, node.orNull)
>          streamer.addData(key, value)
>      })
>  }
>  finally {
>      streamer.close()
>  }
>  })
> }
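
The cleanup order proposed above can be sketched in plain Java. This is only an illustration under assumptions: Resource and saveAll are hypothetical stand-ins for the context and the data streamer, not Ignite classes, and a null value simulates an exception thrown by user logic. The streamer is closed on every path, and the context is additionally closed only when the loop fails.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: Resource is a hypothetical stand-in, not an Ignite class. */
final class CloseOnFailureSketch {
    /** Records its own close() call in a shared log. */
    static final class Resource implements AutoCloseable {
        private final String name;
        private final List<String> log;

        Resource(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        @Override public void close() {
            log.add("closed " + name);
        }
    }

    /** Streams values; a null value simulates a failure in user logic. */
    static List<String> saveAll(List<String> values) {
        List<String> log = new ArrayList<>();
        Resource ctx = new Resource("context", log);
        Resource streamer = new Resource("streamer", log);

        try {
            try {
                for (String v : values) {
                    if (v == null)
                        throw new IllegalStateException("user function failed");
                    log.add("added " + v);
                }
            }
            catch (RuntimeException e) {
                ctx.close(); // Extra cleanup, only on the failure path.
                throw e;
            }
            finally {
                streamer.close(); // Always runs, as in the original snippet.
            }
        }
        catch (RuntimeException e) {
            log.add("failed");
        }
        return log;
    }
}
```

Whether the real integration should close a shared context at this point is a design decision for the ticket; the sketch only shows the control flow.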





[jira] [Assigned] (IGNITE-12141) Ignite Spark Integration Support Schema on Table Write

2019-09-19 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev reassigned IGNITE-12141:
-

Assignee: Aleksey Zinoviev

> Ignite Spark Integration Support Schema on Table Write
> --
>
> Key: IGNITE-12141
> URL: https://issues.apache.org/jira/browse/IGNITE-12141
> Project: Ignite
>  Issue Type: Improvement
>  Components: spark
>Affects Versions: 2.7.5
>Reporter: Manoj G T
>Assignee: Aleksey Zinoviev
>Priority: Critical
> Fix For: 2.8
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Ignite 2.6 doesn't allow creating a table in any schema other than the Public 
> schema, which is why "OPTION_SCHEMA" is not supported in Overwrite mode. Now 
> that Ignite supports creating a table in any given schema, it would be great 
> to incorporate the changes needed to support "OPTION_SCHEMA" in Overwrite 
> mode and make it available as part of the next Ignite release.
>  
> +Related Issue:+
> [https://stackoverflow.com/questions/57782033/apache-ignite-spark-integration-not-working-with-schema-name]





[jira] [Assigned] (IGNITE-12159) Ignite spark doesn't support Alter Column syntax

2019-09-19 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev reassigned IGNITE-12159:
-

Assignee: Aleksey Zinoviev

> Ignite spark doesn't support Alter Column syntax
> 
>
> Key: IGNITE-12159
> URL: https://issues.apache.org/jira/browse/IGNITE-12159
> Project: Ignite
>  Issue Type: Bug
>  Components: spark
>Affects Versions: 2.7.5
>Reporter: Andrey Aleksandrov
>Assignee: Aleksey Zinoviev
>Priority: Critical
> Fix For: 2.8
>
>
> Steps:
> 1) Start the server.
> 2) Run the following SQL commands:
> CREATE TABLE person (id LONG, name VARCHAR(64), age LONG, city_id DOUBLE, 
> zip_code LONG, PRIMARY KEY (name)) WITH "backups=1"
> ALTER TABLE person ADD COLUMN (first_name VARCHAR(64), last_name VARCHAR(64))
> 3) After that, run the following Spark code:
>     String configPath = "client.xml";
>
>     SparkConf sparkConf = new SparkConf()
>         .setMaster("local")
>         .setAppName("Example");
>
>     IgniteSparkSession igniteSession = IgniteSparkSession.builder()
>         .appName("Spark Ignite catalog example")
>         .master("local")
>         .config("ignite.disableSparkSQLOptimization", true)
>         .igniteConfig(configPath)
>         .getOrCreate();
>
>     Dataset<Row> df2 = igniteSession.sql("select * from person");
>     df2.show();
> The result will contain only the 5 columns from the CREATE TABLE call.
> [http://apache-ignite-users.70518.x6.nabble.com/Altered-sql-table-adding-new-columns-does-not-reflect-in-Spark-shell-td29265.html]





[jira] [Updated] (IGNITE-10792) [ML] Add seed to test-train filter

2019-09-18 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-10792:
--
Issue Type: Bug  (was: Task)

> [ML] Add seed to test-train filter
> --
>
> Key: IGNITE-10792
> URL: https://issues.apache.org/jira/browse/IGNITE-10792
> Project: Ignite
>  Issue Type: Bug
>  Components: ml
>Affects Versions: 2.8
>Reporter: Aleksey Zinoviev
>Assignee: Aleksey Zinoviev
>Priority: Major
> Fix For: 2.8
>
>
> Results need to be reproducible from run to run in the second Evaluator test
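
A minimal sketch of why a seed helps here, assuming a random train/test filter driven by java.util.Random. The class and method names are hypothetical and do not reflect the Ignite ML API; the point is only that a fixed seed yields the same split on every run.

```java
import java.util.Random;

/** Illustrative only: a seeded random train/test mask, not the Ignite ML filter. */
final class SeededSplitSketch {
    /** Returns a mask where true means "row goes to the training set". */
    static boolean[] trainMask(int rows, double trainFraction, long seed) {
        Random rnd = new Random(seed); // Same seed, same sequence of draws.
        boolean[] mask = new boolean[rows];

        for (int i = 0; i < rows; i++)
            mask[i] = rnd.nextDouble() < trainFraction;

        return mask;
    }
}
```

Calling trainMask(100, 0.75, 42L) twice returns identical masks, so a test that evaluates a model on the held-out rows sees the same rows each time.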





[jira] [Updated] (IGNITE-10792) [ML] Add seed to test-train filter

2019-09-18 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-10792:
--
Priority: Critical  (was: Major)

> [ML] Add seed to test-train filter
> --
>
> Key: IGNITE-10792
> URL: https://issues.apache.org/jira/browse/IGNITE-10792
> Project: Ignite
>  Issue Type: Bug
>  Components: ml
>Affects Versions: 2.8
>Reporter: Aleksey Zinoviev
>Assignee: Aleksey Zinoviev
>Priority: Critical
> Fix For: 2.8
>
>
> Results need to be reproducible from run to run in the second Evaluator test





[jira] [Updated] (IGNITE-12157) [ML] Implement distributed ROC AUC computation

2019-09-18 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-12157:
--
Affects Version/s: 2.8

> [ML] Implement distributed ROC AUC computation
> --
>
> Key: IGNITE-12157
> URL: https://issues.apache.org/jira/browse/IGNITE-12157
> Project: Ignite
>  Issue Type: Improvement
>  Components: ml
>Affects Versions: 2.8
>Reporter: Alexey Platonov
>Assignee: Alexey Platonov
>Priority: Major
>
> Currently, we don't have a valid ROC AUC computation at all. We should 
> implement it using a predict-proba interface in models. It is desirable for 
> the ROC AUC computation to be distributed.
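
For reference, ROC AUC can be defined as the share of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The sketch below computes exactly that on local arrays. It is a single-node, quadratic-time illustration with hypothetical names, not the distributed implementation the ticket asks for.

```java
/** Illustrative only: local pairwise ROC AUC, not a distributed Ignite ML metric. */
final class RocAucSketch {
    /**
     * @param labels true for the positive class.
     * @param scores predicted probability (or any score) of the positive class.
     */
    static double rocAuc(boolean[] labels, double[] scores) {
        long pos = 0;
        long neg = 0;

        for (boolean lb : labels) {
            if (lb) pos++;
            else neg++;
        }

        if (pos == 0 || neg == 0)
            throw new IllegalArgumentException("Both classes must be present");

        double wins = 0.0;

        for (int i = 0; i < labels.length; i++) {
            if (!labels[i]) continue; // i runs over positives.

            for (int j = 0; j < labels.length; j++) {
                if (labels[j]) continue; // j runs over negatives.

                if (scores[i] > scores[j]) wins += 1.0;
                else if (scores[i] == scores[j]) wins += 0.5;
            }
        }

        return wins / ((double)pos * neg);
    }
}
```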





[jira] [Updated] (IGNITE-12156) [ML] Add meta information to models

2019-09-18 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-12156:
--
Affects Version/s: 3.0

> [ML] Add meta information to models
> ---
>
> Key: IGNITE-12156
> URL: https://issues.apache.org/jira/browse/IGNITE-12156
> Project: Ignite
>  Issue Type: Improvement
>  Components: ml
>Affects Versions: 3.0
>Reporter: Alexey Platonov
>Assignee: Alexey Platonov
>Priority: Major
>
> Current models don't contain any information about their learning process, 
> feature meta-information, or the type of model (for example, RegressionTree 
> doesn't differ from ClassificationTree in the interface). This leads to 
> extra work in other APIs. For example, we cannot automatically define a set 
> of metrics for a user-passed model in Evaluator.





[jira] [Updated] (IGNITE-12155) [ML] Implementation of distributed estimator

2019-09-18 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-12155:
--
Affects Version/s: 2.8

> [ML] Implementation of distributed estimator
> 
>
> Key: IGNITE-12155
> URL: https://issues.apache.org/jira/browse/IGNITE-12155
> Project: Ignite
>  Issue Type: Improvement
>  Components: ml
>Affects Versions: 2.8
>Reporter: Alexey Platonov
>Assignee: Alexey Platonov
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently, we don't have the ability to compute metrics in a distributed 
> manner, which leads to reading all data from partitions to a client using 
> a cursor. We should develop a framework for distributed computation of metrics.





[jira] [Updated] (IGNITE-12180) [ML] Add support of the additional Imputing Strategies

2019-09-17 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-12180:
--
Description: Add support of the next Imputing Strategies: MIN, MAX, COUNT, 
LEAST_FREQUENT  (was: Add support of the next Imputing Strategies: MIN, MAX)
Summary: [ML] Add support of the additional Imputing Strategies  (was: 
[ML] Add support of the next Imputing Strategies: MIN, MAX)

> [ML] Add support of the additional Imputing Strategies
> --
>
> Key: IGNITE-12180
> URL: https://issues.apache.org/jira/browse/IGNITE-12180
> Project: Ignite
>  Issue Type: Sub-task
>  Components: ml
>Affects Versions: 2.8
>Reporter: Aleksey Zinoviev
>Assignee: Aleksey Zinoviev
>Priority: Major
> Fix For: 2.8
>
>
> Add support of the next Imputing Strategies: MIN, MAX, COUNT, LEAST_FREQUENT





[jira] [Created] (IGNITE-12180) [ML] Add support of the next Imputing Strategies: MIN, MAX

2019-09-17 Thread Aleksey Zinoviev (Jira)
Aleksey Zinoviev created IGNITE-12180:
-

 Summary: [ML] Add support of the next Imputing Strategies: MIN, MAX
 Key: IGNITE-12180
 URL: https://issues.apache.org/jira/browse/IGNITE-12180
 Project: Ignite
  Issue Type: Sub-task
  Components: ml
Affects Versions: 2.8
Reporter: Aleksey Zinoviev
Assignee: Aleksey Zinoviev
 Fix For: 2.8


Add support of the next Imputing Strategies: MIN, MAX





[jira] [Commented] (IGNITE-12168) [ML] Flaky ML example tests

2019-09-16 Thread Aleksey Zinoviev (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930475#comment-16930475
 ] 

Aleksey Zinoviev commented on IGNITE-12168:
---

[~Pavlukhin] Many thanks, I was waiting for feedback from a user.

> [ML] Flaky ML example tests
> ---
>
> Key: IGNITE-12168
> URL: https://issues.apache.org/jira/browse/IGNITE-12168
> Project: Ignite
>  Issue Type: Bug
>  Components: ml
>Reporter: Aleksey Zinoviev
>Assignee: Aleksey Zinoviev
>Priority: Critical
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Discussed here 
> [http://apache-ignite-developers.2346864.n4.nabble.com/After-IGNITE-12148-the-Examples-suite-has-unstable-tests-td43469.html]





[jira] [Created] (IGNITE-12168) [ML] Flaky ML example tests

2019-09-13 Thread Aleksey Zinoviev (Jira)
Aleksey Zinoviev created IGNITE-12168:
-

 Summary: [ML] Flaky ML example tests
 Key: IGNITE-12168
 URL: https://issues.apache.org/jira/browse/IGNITE-12168
 Project: Ignite
  Issue Type: Bug
  Components: ml
Reporter: Aleksey Zinoviev
Assignee: Aleksey Zinoviev


Discussed here 
[http://apache-ignite-developers.2346864.n4.nabble.com/After-IGNITE-12148-the-Examples-suite-has-unstable-tests-td43469.html]





[jira] [Commented] (IGNITE-12054) Upgrade Spark module to 2.4

2019-09-10 Thread Aleksey Zinoviev (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926751#comment-16926751
 ] 

Aleksey Zinoviev commented on IGNITE-12054:
---

[~dmagda] [~NIzhikov] Let's discuss how to support both versions, 2.3 and 
2.4 (because a lot of people use both now). Maybe we could support both 
versions internally, selected by an external parameter (SPARK_VERSION). 
I could try to investigate this feature if nobody starts the work in the 
near future.

> Upgrade Spark module to 2.4
> ---
>
> Key: IGNITE-12054
> URL: https://issues.apache.org/jira/browse/IGNITE-12054
> Project: Ignite
>  Issue Type: Task
>  Components: spark
>Affects Versions: 2.7.5
>Reporter: Denis Magda
>Assignee: Nikolay Izhikov
>Priority: Blocker
> Fix For: 2.8
>
>
> Users can't use APIs that are already available in Spark 2.4:
> https://stackoverflow.com/questions/57392143/persisting-spark-dataframe-to-ignite
> Let's upgrade Spark from 2.3 to 2.4 until we extract the Spark Integration as 
> a separate module that can support multiple Spark versions.





[jira] [Assigned] (IGNITE-9732) Add joins to Spark Dataframe examples

2019-09-10 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev reassigned IGNITE-9732:


Assignee: Aleksey Zinoviev

> Add joins to Spark Dataframe examples
> -
>
> Key: IGNITE-9732
> URL: https://issues.apache.org/jira/browse/IGNITE-9732
> Project: Ignite
>  Issue Type: Improvement
>  Components: examples, spark
>Affects Versions: 2.6
>Reporter: Valentin Kulichenko
>Assignee: Aleksey Zinoviev
>Priority: Major
> Fix For: 2.8
>
>
> {{IgniteDataFrameExample}} creates two tables - {{city}} and {{person}}, but 
> only {{person}} is actually used. Need to add join examples.
> Would also be great to demonstrate the fact that optimization is working and 
> joins are executed in Ignite, not Spark (using {{explain()}}, maybe?).





[jira] [Assigned] (IGNITE-8296) Move Spark Scala DataFrames code examples to correct directory and prefix with "Scalar" to follow convention used with other Scala examples

2019-09-10 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-8296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev reassigned IGNITE-8296:


Assignee: Aleksey Zinoviev

> Move Spark Scala DataFrames code examples to correct directory and prefix 
> with "Scalar" to follow convention used with other Scala examples 
> 
>
> Key: IGNITE-8296
> URL: https://issues.apache.org/jira/browse/IGNITE-8296
> Project: Ignite
>  Issue Type: Improvement
>  Components: spark
>Affects Versions: 2.4
>Reporter: Akmal Chaudhri
>Assignee: Aleksey Zinoviev
>Priority: Minor
>
> # The Spark Scala DataFrames code examples are in the wrong directory. They 
> should be moved to the correct directory structure.
>  # The Spark Scala DataFrames code examples should follow the naming 
> convention used for other Scala code examples and be prefixed with "Scalar".





[jira] [Updated] (IGNITE-9463) [ML] Update ML tutorial with new model composition/update features

2019-09-08 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-9463:
-
Priority: Major  (was: Trivial)

> [ML] Update ML tutorial with new model composition/update features
> --
>
> Key: IGNITE-9463
> URL: https://issues.apache.org/jira/browse/IGNITE-9463
> Project: Ignite
>  Issue Type: New Feature
>  Components: ml
>Reporter: Aleksey Zinoviev
>Assignee: Aleksey Zinoviev
>Priority: Major
> Fix For: 2.8
>
>
> # Add example #10 with model composition (DT and Logit)
>  # Add example #11 with online learning for DT
>  # Add example #12 for bagging
>  # Add example #13 for boosting





[jira] [Updated] (IGNITE-7025) Implement different strategies to fill missed data in LabeledDataset during loading from file

2019-09-08 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-7025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-7025:
-
Priority: Trivial  (was: Minor)

> Implement different strategies to fill missed data in LabeledDataset during 
> loading from file
> -
>
> Key: IGNITE-7025
> URL: https://issues.apache.org/jira/browse/IGNITE-7025
> Project: Ignite
>  Issue Type: Task
>  Components: ml
>Reporter: Aleksey Zinoviev
>Assignee: Aleksey Zinoviev
>Priority: Trivial
>
> For example, there could be four strategies:
> * Fill a missing value with zero, an empty string, or the default value for 
> categorical features = ZERO
> * Fill a missing value with the column mean. Requires additional time to 
> calculate = MEAN
> * Fill a missing value with the column mode. Requires additional time to 
> calculate = MODE
> * Delete observations with missing values. Transforms the dataset and changes 
> indexing = DELETE
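
As an illustration of the MEAN strategy listed above, here is a minimal sketch over a dense matrix in which NaN marks a missing value. The class name, the NaN convention, and the fallback to zero for a column with no observed values are assumptions made for this example; they are not taken from LabeledDataset.

```java
import java.util.Arrays;

/** Illustrative only: MEAN imputation on a dense matrix, NaN = missing value. */
final class MeanImputerSketch {
    /** Returns a copy of the data where each NaN is replaced by its column mean. */
    static double[][] fillWithColumnMean(double[][] data) {
        if (data.length == 0)
            return new double[0][];

        int cols = data[0].length;
        double[] sum = new double[cols];
        int[] cnt = new int[cols];

        // First pass: the extra calculation time mentioned for MEAN.
        for (double[] row : data) {
            for (int j = 0; j < cols; j++) {
                if (!Double.isNaN(row[j])) {
                    sum[j] += row[j];
                    cnt[j]++;
                }
            }
        }

        // Second pass: fill the gaps without touching the input.
        double[][] res = new double[data.length][];

        for (int i = 0; i < data.length; i++) {
            res[i] = Arrays.copyOf(data[i], cols);

            for (int j = 0; j < cols; j++) {
                if (Double.isNaN(res[i][j]))
                    res[i][j] = cnt[j] == 0 ? 0.0 : sum[j] / cnt[j]; // Assumed fallback: 0.
            }
        }

        return res;
    }
}
```

MODE would follow the same two-pass shape with a per-column frequency map instead of a running sum, while DELETE changes the number of rows and therefore the indexing.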





[jira] [Updated] (IGNITE-9936) [ML] Make readable the models output in RandomForestClassificationExample

2019-09-08 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-9936:
-
Priority: Major  (was: Minor)

> [ML] Make readable the models output in RandomForestClassificationExample
> 
>
> Key: IGNITE-9936
> URL: https://issues.apache.org/jira/browse/IGNITE-9936
> Project: Ignite
>  Issue Type: Improvement
>  Components: ml
>Affects Versions: 2.8
>Reporter: Aleksey Zinoviev
>Assignee: Aleksey Zinoviev
>Priority: Major
> Fix For: 2.8
>
>
> The output is 
> >>> Trained model: Models composition [
>  aggregator = [OnMajorityPredictionsAggregator],
>  models = [
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@7d3d101b,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@30c8681,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@5cdec700,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@6d026701,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@78aa1f72,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@1f75a668,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@35399441,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@4b7dc788,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@6304101a,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@5170bcf4,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@2812b107,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@df6620a,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@4e31276e,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@1a72a540,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@27d5a580,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@198d6542,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@5e403b4a,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@5117dd67,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@5be49b60,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@2931522b,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@7674b62c,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@19e7a160,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@662706a7,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@45a4b042,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@16b2bb0c,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@327af41b,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@6cb6decd,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@c7045b9,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@f99f5e0,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@6aa61224,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@30bce90b,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@3e6f3f28,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@7e19ebf0,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@2474f125,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@7357a011,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@3406472c,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@5717c37,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@68f4865,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@4816278d,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@4eaf3684,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@40317ba2,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@3c01cfa1,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@45d2ade3,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@727eb8cb,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@39d9314d,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@b978d10,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@5b7a8434,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@5c45d770,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@2ce6c6ec,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@1bae316d,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@147a5d08,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@6676f6a0,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@7cbd9d24,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@1672fe87,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@5026735c,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@1b45c0e,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@11f0a5a1,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@10f7f7de,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@73a8da0f,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@50dfbc58,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@4416d64f,
>  org.apache.ignite.ml.tree.randomforest.data.TreeRoot@6bf08014,
>  

[jira] [Updated] (IGNITE-10792) [ML] Add seed to test-train filter

2019-09-08 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-10792:
--
Priority: Major  (was: Minor)

> [ML] Add seed to test-train filter
> --
>
> Key: IGNITE-10792
> URL: https://issues.apache.org/jira/browse/IGNITE-10792
> Project: Ignite
>  Issue Type: Task
>  Components: ml
>Affects Versions: 2.8
>Reporter: Aleksey Zinoviev
>Assignee: Aleksey Zinoviev
>Priority: Major
> Fix For: 2.8
>
>
> Results need to be reproducible from run to run in the second Evaluator test





[jira] [Updated] (IGNITE-10865) [ML] [Umbrella] Integration with Spark ML

2019-09-08 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-10865:
--
Priority: Critical  (was: Blocker)

> [ML] [Umbrella] Integration with Spark ML
> -
>
> Key: IGNITE-10865
> URL: https://issues.apache.org/jira/browse/IGNITE-10865
> Project: Ignite
>  Issue Type: New Feature
>  Components: ml
>Affects Versions: 2.8
>Reporter: Aleksey Zinoviev
>Assignee: Aleksey Zinoviev
>Priority: Critical
> Fix For: 2.8
>
>
> Investigate how to load ML models from Spark





[jira] [Updated] (IGNITE-12079) [ML][Umbrella] Add advanced preprocessing techniques

2019-09-08 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-12079:
--
Priority: Blocker  (was: Major)

> [ML][Umbrella] Add advanced preprocessing techniques
> 
>
> Key: IGNITE-12079
> URL: https://issues.apache.org/jira/browse/IGNITE-12079
> Project: Ignite
>  Issue Type: New Feature
>  Components: ml
>Affects Versions: 2.8
>Reporter: Aleksey Zinoviev
>Assignee: Aleksey Zinoviev
>Priority: Blocker
> Fix For: 2.8
>
>
> *Main goal:*
> To reduce the gap between Apache Spark and Apache Ignite in preprocessing 
> operations. Reducing the gap could help with loading Spark ML Pipelines 
> into Ignite ML.
>  
> Next steps:
>  # Add Frequency Encoder
>  # Add additional Imputing Strategies (MIN, MAX, COUNT, MOST_FREQUENT, 
> LEAST_FREQUENT)
>  # Add RobustScaler (will be added in Spark 3.0)
>  # Add CountVectorizer
>  # Add FeatureHasher
>  # Add QuantileDiscretizer
>  # Add Locality Sensitive Hashing (LSH)
>  # Add LabelEncoder
>  # Add RevertStringIndexing
>  # Add multi-column preprocessor





[jira] [Updated] (IGNITE-11295) [ML] Add readme file to SparkModelParser module

2019-09-08 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-11295:
--
Priority: Critical  (was: Major)

> [ML] Add readme file to SparkModelParser module
> ---
>
> Key: IGNITE-11295
> URL: https://issues.apache.org/jira/browse/IGNITE-11295
> Project: Ignite
>  Issue Type: Sub-task
>  Components: ml
>Affects Versions: 2.8
>Reporter: Aleksey Zinoviev
>Assignee: Aleksey Zinoviev
>Priority: Critical
> Fix For: 2.8
>
>
> This file should contain usage examples and instructions on how to use this 
> module





[jira] [Updated] (IGNITE-12148) [ML] Recommendation Engine

2019-09-08 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-12148:
--
Priority: Blocker  (was: Critical)

> [ML] Recommendation Engine
> --
>
> Key: IGNITE-12148
> URL: https://issues.apache.org/jira/browse/IGNITE-12148
> Project: Ignite
>  Issue Type: New Feature
>  Components: ml
>Affects Versions: 2.8
>Reporter: Aleksey Zinoviev
>Assignee: Aleksey Zinoviev
>Priority: Blocker
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The main idea is to provide a recommendation engine for building a 
> recommendation system over the Ignite cache and via SQL operators





[jira] [Updated] (IGNITE-12148) [ML] Recommendation Engine

2019-09-08 Thread Aleksey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-12148:
--
Priority: Critical  (was: Major)

> [ML] Recommendation Engine
> --
>
> Key: IGNITE-12148
> URL: https://issues.apache.org/jira/browse/IGNITE-12148
> Project: Ignite
>  Issue Type: New Feature
>  Components: ml
>Affects Versions: 2.8
>Reporter: Aleksey Zinoviev
>Assignee: Aleksey Zinoviev
>Priority: Critical
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The main idea is to provide a recommendation engine for building a 
> recommendation system over the Ignite cache and via SQL operators





[jira] [Commented] (IGNITE-9746) [ML] Add Complement Naive Bayes

2019-09-08 Thread Aleksey Zinoviev (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-9746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925138#comment-16925138
 ] 

Aleksey Zinoviev commented on IGNITE-9746:
--

[~rgaleyev] Should we close this ticket as fixed? Correct me if I'm wrong

> [ML] Add Complement Naive Bayes
> ---
>
> Key: IGNITE-9746
> URL: https://issues.apache.org/jira/browse/IGNITE-9746
> Project: Ignite
>  Issue Type: Sub-task
>  Components: ml
>Reporter: Ravil Galeyev
>Priority: Major
>
> Naive Bayes classifiers are a family of simple probabilistic classifiers 
> based on applying Bayes' theorem with strong (naive) independence assumptions 
> between the features.
> So we want to add this algorithm to Apache Ignite ML module.
> [Complement Naive 
> Bayes|http://scikit-learn.org/stable/modules/naive_bayes.html#complement-naive-bayes]
>  is an adaptation of the standard multinomial naive Bayes (MNB) algorithm 
> that is particularly suited for imbalanced data sets.
> Requirements for successful PR:
>  # PartitionedDataset usage
>  # Trainer-Model paradigm support
>  # Tests for Model and for Trainer (and other stuff)
>  # Example of usage with small, but famous dataset like IRIS, Titanic or 
> House Prices
>  # Javadocs/codestyle according to the guidelines





[jira] [Created] (IGNITE-12148) [ML] Recommendation Engine

2019-09-06 Thread Aleksey Zinoviev (Jira)
Aleksey Zinoviev created IGNITE-12148:
-

 Summary: [ML] Recommendation Engine
 Key: IGNITE-12148
 URL: https://issues.apache.org/jira/browse/IGNITE-12148
 Project: Ignite
  Issue Type: New Feature
  Components: ml
Affects Versions: 2.8
Reporter: Aleksey Zinoviev
Assignee: Aleksey Zinoviev
 Fix For: 2.8


The main idea is to provide a recommendation engine for building a 
recommendation system over the Ignite cache and via SQL operators





[jira] [Updated] (IGNITE-10697) [ML] Add Frequency Encoding

2019-08-16 Thread Aleksey Zinoviev (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-10697:
--
Issue Type: Sub-task  (was: New Feature)
Parent: IGNITE-12079

> [ML] Add Frequency Encoding
> ---
>
> Key: IGNITE-10697
> URL: https://issues.apache.org/jira/browse/IGNITE-10697
> Project: Ignite
>  Issue Type: Sub-task
>  Components: ml
>Affects Versions: 2.8
>Reporter: Aleksey Zinoviev
>Assignee: Aleksey Zinoviev
>Priority: Trivial
> Fix For: 2.8
>
>
> Encode the values to a fraction of all the labels. Can work with linear 
> models if the frequency is correlated with the target value.
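
A minimal sketch of the encoding described above, assuming categorical values arrive as strings. Each category is mapped to its share of all observed values, and categories not seen during fitting are mapped to 0.0. The names and the unseen-category policy are assumptions for illustration, not the Ignite ML preprocessor API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: frequency encoding of a single categorical column. */
final class FrequencyEncoderSketch {
    /** Learns category -> fraction of all values. */
    static Map<String, Double> fit(List<String> values) {
        Map<String, Double> freq = new HashMap<>();

        for (String v : values)
            freq.merge(v, 1.0, Double::sum); // Count occurrences.

        double total = values.size();
        freq.replaceAll((category, count) -> count / total); // Counts to fractions.

        return freq;
    }

    /** Replaces each category by its learned fraction; unseen categories become 0.0 (assumed policy). */
    static double[] transform(List<String> values, Map<String, Double> freq) {
        double[] out = new double[values.size()];

        for (int i = 0; i < out.length; i++)
            out[i] = freq.getOrDefault(values.get(i), 0.0);

        return out;
    }
}
```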



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (IGNITE-12079) [ML][Umbrella] Add advanced preprocessing techniques

2019-08-16 Thread Aleksey Zinoviev (JIRA)
Aleksey Zinoviev created IGNITE-12079:
-

 Summary: [ML][Umbrella] Add advanced preprocessing techniques
 Key: IGNITE-12079
 URL: https://issues.apache.org/jira/browse/IGNITE-12079
 Project: Ignite
  Issue Type: New Feature
  Components: ml
Affects Versions: 2.8
Reporter: Aleksey Zinoviev
Assignee: Aleksey Zinoviev
 Fix For: 2.8


*Main goal:*

To reduce the gap between Apache Spark and Apache Ignite in preprocessing 
operations. Reducing the gap could help with loading Spark ML Pipelines 
into Ignite ML.

 

Next steps:
 # Add Frequency Encoder
 # Add additional Imputing Strategies (MIN, MAX, COUNT, MOST_FREQUENT, LEAST_FREQUENT)
 # Add RobustScaler (will be added in Spark 3.0)
 # Add CountVectorizer
 # Add FeatureHasher
 # Add QuantileDiscretizer
 # Add Locality Sensitive Hashing (LSH)
 # Add LabelEncoder
 # Add RevertStringIndexing
 # Add multi-column preprocessor





[jira] [Resolved] (IGNITE-7022) Use QuadTree for kNN performance

2019-08-16 Thread Aleksey Zinoviev (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-7022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev resolved IGNITE-7022.
--
Resolution: Won't Fix

> Use QuadTree for kNN performance
> 
>
> Key: IGNITE-7022
> URL: https://issues.apache.org/jira/browse/IGNITE-7022
> Project: Ignite
>  Issue Type: Improvement
>  Components: ml
>Reporter: Aleksey Zinoviev
>Assignee: Aleksey Zinoviev
>Priority: Minor
>
> Currently, the kNN implementation is not very fast. Its performance could be improved 
> with a quadtree: [https://en.wikipedia.org/wiki/Quadtree]
> Benchmarks should be provided as well.





[jira] [Commented] (IGNITE-10441) Fluent API refactoring.

2019-07-31 Thread Aleksey Zinoviev (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-10441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16897038#comment-16897038
 ] 

Aleksey Zinoviev commented on IGNITE-10441:
---

This still looks great after a year of use. [~amalykh], could you share more 
background on this approach (a paper, book, or link)?

> Fluent API refactoring.
> ---
>
> Key: IGNITE-10441
> URL: https://issues.apache.org/jira/browse/IGNITE-10441
> Project: Ignite
>  Issue Type: Improvement
>  Components: ml
>Reporter: Artem Malykh
>Assignee: Artem Malykh
>Priority: Major
>
> In many classes we have a fluent API ("with*" methods). We have the following 
> problem: these methods should return an instance of exactly their own class 
> (otherwise we'll have problems with subclasses: if a "with" method is 
> declared in class A and class B extends A, the "with" method will return A 
> unless B overrides it). Currently we opted to override 
> "with" methods in subclasses. There is one solution which is probably more 
> elegant, but it involves a relatively complex generics construction which 
> reduces readability:
>  
> {code:java}
> class A<Self extends A<Self>> {
>   Self withX(X x) {
>     this.x = x;
>  
>     return (Self)this;
>   }
> }
>
> class B<Self extends B<Self>> extends A<Self> {
>   // No need to override "withX" here
>   Self withY(Y y) {
>     this.y = y;
>  
>     return (Self)this;
>   }
> }
>
> class C<Self extends C<Self>> extends B<Self> {
>   // No need to override "withX" and "withY" methods here.
> }
> //... etc
> {code}
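For reference, a self-contained sketch of the self-bounded generics approach described in the ticket might look like the following. The wrapper class `FluentSketch`, the `int` fields, and the non-generic leaf class `C` that binds `Self` to itself are assumptions made to keep the example compilable; they are not taken from the Ignite code base.

```java
/** Hypothetical classes illustrating the self-bounded generic pattern; not Ignite code. */
class FluentSketch {
    static class A<Self extends A<Self>> {
        int x;

        @SuppressWarnings("unchecked")
        Self withX(int x) {
            this.x = x;

            // Unchecked cast: safe as long as every subclass binds Self to itself.
            return (Self)this;
        }
    }

    static class B<Self extends B<Self>> extends A<Self> {
        int y;

        // No need to override "withX" here.
        @SuppressWarnings("unchecked")
        Self withY(int y) {
            this.y = y;

            return (Self)this;
        }
    }

    /** Leaf class closes the recursion by binding Self to its own type. */
    static final class C extends B<C> {
        int z;

        C withZ(int z) {
            this.z = z;

            return this;
        }
    }

    public static void main(String[] args) {
        // withX is declared in A but returns C, so the chain keeps the subclass type.
        C c = new C().withX(1).withY(2).withZ(3);

        System.out.println(c.x + " " + c.y + " " + c.z);
    }
}
```

The trade-off matches the one named in the ticket: no overrides are needed in subclasses, at the cost of an unchecked cast and harder-to-read type parameters.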





[jira] [Commented] (IGNITE-9283) [ML] Add Discrete Cosine preprocessor

2019-07-31 Thread Aleksey Zinoviev (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16897035#comment-16897035
 ] 

Aleksey Zinoviev commented on IGNITE-9283:
--

[~ilantukh] Great, I will review it next week and leave comments 
on GitHub. Thank you.

> [ML] Add Discrete Cosine preprocessor
> -
>
> Key: IGNITE-9283
> URL: https://issues.apache.org/jira/browse/IGNITE-9283
> Project: Ignite
>  Issue Type: Sub-task
>  Components: ml
>Reporter: Aleksey Zinoviev
>Assignee: Ilya Lantukh
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Add [https://en.wikipedia.org/wiki/Discrete_cosine_transform]
> Please look at the MinMaxScaler or Normalization packages in preprocessing 
> package.
> Add classes if required
> 1) Preprocessor
> 2) Trainer
> 3) custom PartitionData if shuffling is a step of algorithm
>  
> Requirements for successful PR:
>  # PartitionedDataset usage
>  # Trainer-Model paradigm support
>  # Tests for Model and for Trainer (and other stuff)
>  # Example of usage with a small but well-known dataset like IRIS, Titanic or 
> House Prices
>  # Javadocs/codestyle according to the guidelines
>  
>  
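For orientation, the transform itself can be sketched as a direct O(N^2) computation of the unnormalized DCT-II, X[k] = sum over n of x[n] * cos(pi / N * (n + 0.5) * k). The class `DctSketch` and method `dct2` are hypothetical names; this is only the per-vector math and does not show the Preprocessor, Trainer, or PartitionedDataset integration the ticket asks for.

```java
/** Hypothetical helper for illustration only; not part of the Ignite ML API. */
class DctSketch {
    /** Unnormalized DCT-II of the input vector, computed directly in O(N^2). */
    static double[] dct2(double[] x) {
        int n = x.length;
        double[] res = new double[n];

        for (int k = 0; k < n; k++) {
            double sum = 0.0;

            // Project the input onto the k-th cosine basis function.
            for (int i = 0; i < n; i++)
                sum += x[i] * Math.cos(Math.PI / n * (i + 0.5) * k);

            res[k] = sum;
        }

        return res;
    }

    public static void main(String[] args) {
        // A constant signal has all of its energy in the first (k = 0) coefficient.
        double[] coeffs = dct2(new double[] {1.0, 1.0, 1.0, 1.0});

        for (double c : coeffs)
            System.out.println(c);
    }
}
```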





[jira] [Issue Comment Deleted] (IGNITE-9633) [ML] Hyper-parameter tuning via Genetic Algorithm

2019-07-30 Thread Aleksey Zinoviev (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-9633:
-
Comment: was deleted

(was: 
https://ci.ignite.apache.org/viewLog.html?buildId=4432443=IgniteTests24Java8_RunMl)

> [ML] Hyper-parameter tuning via Genetic Algorithm
> -
>
> Key: IGNITE-9633
> URL: https://issues.apache.org/jira/browse/IGNITE-9633
> Project: Ignite
>  Issue Type: New Feature
>  Components: ml
>Affects Versions: 2.8
>Reporter: Aleksey Zinoviev
>Assignee: Aleksey Zinoviev
>Priority: Minor
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Umbrella ticket for all hyperparameter tuning improvements





[jira] [Commented] (IGNITE-9633) [ML] Hyper-parameter tuning via Genetic Algorithm

2019-07-30 Thread Aleksey Zinoviev (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16896344#comment-16896344
 ] 

Aleksey Zinoviev commented on IGNITE-9633:
--

https://ci.ignite.apache.org/viewLog.html?buildId=4432443=IgniteTests24Java8_RunMl

> [ML] Hyper-parameter tuning via Genetic Algorithm
> -
>
> Key: IGNITE-9633
> URL: https://issues.apache.org/jira/browse/IGNITE-9633
> Project: Ignite
>  Issue Type: New Feature
>  Components: ml
>Affects Versions: 2.8
>Reporter: Aleksey Zinoviev
>Assignee: Aleksey Zinoviev
>Priority: Minor
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Umbrella ticket for all hyperparameter tuning improvements





[jira] [Resolved] (IGNITE-9634) [ML] Trainers as pipeline parameters that can be varied

2019-07-23 Thread Aleksey Zinoviev (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev resolved IGNITE-9634.
--
Resolution: Won't Fix

> [ML] Trainers as pipeline parameters that can be varied
> ---
>
> Key: IGNITE-9634
> URL: https://issues.apache.org/jira/browse/IGNITE-9634
> Project: Ignite
>  Issue Type: Sub-task
>  Components: ml
>Affects Versions: 2.8
>Reporter: Aleksey Zinoviev
>Assignee: Aleksey Zinoviev
>Priority: Major
> Fix For: 2.8
>
>
> Based on 
> http://apache-ignite-developers.2346864.n4.nabble.com/ML-New-Feature-Trainers-as-pipeline-parameters-that-can-be-varied-td35132.html





[jira] [Updated] (IGNITE-9633) [ML] Hyper-parameter tuning via Genetic Algorithm

2019-07-23 Thread Aleksey Zinoviev (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-9633:
-
Summary: [ML] Hyper-parameter tuning via Genetic Algorithm  (was: [ML] 
Hyperparameter tuning improvements umbrella ticket)

> [ML] Hyper-parameter tuning via Genetic Algorithm
> -
>
> Key: IGNITE-9633
> URL: https://issues.apache.org/jira/browse/IGNITE-9633
> Project: Ignite
>  Issue Type: New Feature
>  Components: ml
>Affects Versions: 2.8
>Reporter: Aleksey Zinoviev
>Assignee: Aleksey Zinoviev
>Priority: Minor
> Fix For: 2.8
>
>
> Umbrella ticket for all hyperparameter tuning improvements





[jira] [Commented] (IGNITE-9283) [ML] Add Discrete Cosine preprocessor

2019-07-18 Thread Aleksey Zinoviev (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887951#comment-16887951
 ] 

Aleksey Zinoviev commented on IGNITE-9283:
--

[~ilantukh] Happy to hear that. Mention me as a reviewer when the PR is ready.

> [ML] Add Discrete Cosine preprocessor
> -
>
> Key: IGNITE-9283
> URL: https://issues.apache.org/jira/browse/IGNITE-9283
> Project: Ignite
>  Issue Type: Sub-task
>  Components: ml
>Reporter: Aleksey Zinoviev
>Assignee: Ilya Lantukh
>Priority: Major
>
> Add [https://en.wikipedia.org/wiki/Discrete_cosine_transform]
> Please look at the MinMaxScaler or Normalization packages in preprocessing 
> package.
> Add classes if required
> 1) Preprocessor
> 2) Trainer
> 3) custom PartitionData if shuffling is a step of algorithm
>  
> Requirements for successful PR:
>  # PartitionedDataset usage
>  # Trainer-Model paradigm support
>  # Tests for Model and for Trainer (and other stuff)
>  # Example of usage with a small but well-known dataset like IRIS, Titanic or 
> House Prices
>  # Javadocs/codestyle according to the guidelines
>  
>  





[jira] [Commented] (IGNITE-9283) [ML] Add Discrete Cosine preprocessor

2019-07-16 Thread Aleksey Zinoviev (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16886097#comment-16886097
 ] 

Aleksey Zinoviev commented on IGNITE-9283:
--

[~ilantukh] Are you going to work on this task?

> [ML] Add Discrete Cosine preprocessor
> -
>
> Key: IGNITE-9283
> URL: https://issues.apache.org/jira/browse/IGNITE-9283
> Project: Ignite
>  Issue Type: Sub-task
>  Components: ml
>Reporter: Aleksey Zinoviev
>Assignee: Ilya Lantukh
>Priority: Major
>
> Add [https://en.wikipedia.org/wiki/Discrete_cosine_transform]
> Please look at the MinMaxScaler or Normalization packages in preprocessing 
> package.
> Add classes if required
> 1) Preprocessor
> 2) Trainer
> 3) custom PartitionData if shuffling is a step of algorithm
>  
> Requirements for successful PR:
>  # PartitionedDataset usage
>  # Trainer-Model paradigm support
>  # Tests for Model and for Trainer (and other stuff)
>  # Example of usage with a small but well-known dataset like IRIS, Titanic or 
> House Prices
>  # Javadocs/codestyle according to the guidelines
>  
>  





[jira] [Commented] (IGNITE-10697) [ML] Add Frequency Encoding

2019-07-16 Thread Aleksey Zinoviev (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-10697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16886093#comment-16886093
 ] 

Aleksey Zinoviev commented on IGNITE-10697:
---

Draft solution: [https://github.com/gridgain/apache-ignite/tree/ignite-10697]

> [ML] Add Frequency Encoding
> ---
>
> Key: IGNITE-10697
> URL: https://issues.apache.org/jira/browse/IGNITE-10697
> Project: Ignite
>  Issue Type: New Feature
>  Components: ml
>Affects Versions: 2.8
>Reporter: Aleksey Zinoviev
>Assignee: Aleksey Zinoviev
>Priority: Trivial
> Fix For: 2.8
>
>
> Encode the values to a fraction of all the labels. Can work with linear 
> models if the frequency is correlated with the target value.





[jira] [Resolved] (IGNITE-9513) [ML] Unify all preprocessors trainers' generics

2019-04-18 Thread Aleksey Zinoviev (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev resolved IGNITE-9513.
--
   Resolution: Resolved
Fix Version/s: 2.8

> [ML] Unify all preprocessors trainers' generics
> ---
>
> Key: IGNITE-9513
> URL: https://issues.apache.org/jira/browse/IGNITE-9513
> Project: Ignite
>  Issue Type: Improvement
>  Components: ml
>Reporter: Aleksey Zinoviev
>Assignee: Aleksey Zinoviev
>Priority: Trivial
> Fix For: 2.8
>
>
> Currently we have
> EncoderTrainer implements PreprocessingTrainer
> and
> BinarizationTrainer implements PreprocessingTrainer Vector>
> Unifying them will help with raw types in OneVsRest or in Pipeline and CV processes



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IGNITE-11642) [ML] Umbrella: API for Feature/Label extracting (part 2)

2019-04-18 Thread Aleksey Zinoviev (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev resolved IGNITE-11642.
---
Resolution: Resolved

> [ML] Umbrella: API for Feature/Label extracting (part 2)
> 
>
> Key: IGNITE-11642
> URL: https://issues.apache.org/jira/browse/IGNITE-11642
> Project: Ignite
>  Issue Type: Improvement
>  Components: ml
>Reporter: Alexey Platonov
>Assignee: Aleksey Zinoviev
>Priority: Critical
>  Labels: stability
> Fix For: 2.8
>
>
> Replace current lambdas with fixed API





[jira] [Resolved] (IGNITE-11582) [ML] Pipelines should work with Vectorizers

2019-04-18 Thread Aleksey Zinoviev (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev resolved IGNITE-11582.
---
Resolution: Fixed

> [ML] Pipelines should work with Vectorizers
> ---
>
> Key: IGNITE-11582
> URL: https://issues.apache.org/jira/browse/IGNITE-11582
> Project: Ignite
>  Issue Type: Improvement
>  Components: ml
>Reporter: Alexey Platonov
>Assignee: Aleksey Zinoviev
>Priority: Major
>  Labels: stability
> Fix For: 2.8
>
>
> Currently Pipelines are implemented using feature/label extraction functions 
> with generic label value. We should adapt pipelines for vectorizers (maybe 
> after this ticket - IGNITE-11481).





[jira] [Updated] (IGNITE-11582) [ML] Pipelines should work with Vectorizers

2019-04-18 Thread Aleksey Zinoviev (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-11582:
--
Fix Version/s: 2.8

> [ML] Pipelines should work with Vectorizers
> ---
>
> Key: IGNITE-11582
> URL: https://issues.apache.org/jira/browse/IGNITE-11582
> Project: Ignite
>  Issue Type: Improvement
>  Components: ml
>Reporter: Alexey Platonov
>Assignee: Aleksey Zinoviev
>Priority: Major
>  Labels: stability
> Fix For: 2.8
>
>
> Currently Pipelines are implemented using feature/label extraction functions 
> with generic label value. We should adapt pipelines for vectorizers (maybe 
> after this ticket - IGNITE-11481).





[jira] [Updated] (IGNITE-11581) [ML] Adapt tutorial to new vectorizer API

2019-04-18 Thread Aleksey Zinoviev (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev updated IGNITE-11581:
--
Fix Version/s: 2.8

> [ML] Adapt tutorial to new vectorizer API
> -
>
> Key: IGNITE-11581
> URL: https://issues.apache.org/jira/browse/IGNITE-11581
> Project: Ignite
>  Issue Type: Improvement
>  Components: ml
>Reporter: Alexey Platonov
>Assignee: Aleksey Zinoviev
>Priority: Major
>  Labels: stability
> Fix For: 2.8
>
>
> Currently the tutorial uses the old feature/label extraction API



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IGNITE-11580) [ML] Evaluators should accept Vectorizers

2019-04-18 Thread Aleksey Zinoviev (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Zinoviev resolved IGNITE-11580.
---
Resolution: Resolved

> [ML] Evaluators should accept Vectorizers
> -
>
> Key: IGNITE-11580
> URL: https://issues.apache.org/jira/browse/IGNITE-11580
> Project: Ignite
>  Issue Type: Improvement
>  Components: ml
>Reporter: Alexey Platonov
>Assignee: Aleksey Zinoviev
>Priority: Major
>  Labels: stability
> Fix For: 2.8
>
>
> Currently the evaluation API uses the old interface with separate feature and label 
> extractors. In the context of IGNITE-11449 we should change this API.




