[IGNITE-12633] [Question] Purpose of creating a new instance of QueryCancelledException every time?

2020-05-31 Thread Ravil Galeyev
Hello Igniters,

I've started working on IGNITE-12633


If I understand the goal of the ticket correctly, QueryCancelledException
needs a constructor that accepts a cause exception.
@iseliverstov, please correct me if I'm wrong.

As the first step, I looked at `QueryCancelledException` usages and
found an interesting thing.

When an exception is caught, we check the cause:
`if (X.cause(e, QueryCancelledException.class) != null)`

If the cause is a QueryCancelledException, we throw or return a new
instance of QueryCancelledException: `return exceptionToResult(new
QueryCancelledException());`.
This happens, e.g., in JdbcRequestHandler.java#L504


What is the purpose of creating a new instance instead
of reusing the existing one?
`X.cause(e, T.class)` returns either null or an instance of T.

Therefore, we can write
```
QueryCancelledException cause = X.cause(e, QueryCancelledException.class);
if (cause != null) return exceptionToResult(cause);
```

I did it in my branch here: JdbcRequestHandler.java#L504

In this way, we won't lose the cause of QueryCancelledException.
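
To illustrate the point, here is a minimal, self-contained sketch. It does not use Ignite: `CancelledException` and `findCause` are hypothetical stand-ins for `QueryCancelledException` and `X.cause`, assumed only for this example.

```java
// Hypothetical stand-ins; this is NOT Ignite's QueryCancelledException or X.cause.
final class CauseDemo {
    /** Stand-in for a cancellation exception that can carry a cause. */
    static class CancelledException extends Exception {
        CancelledException(String msg, Throwable cause) {
            super(msg, cause);
        }
    }

    /** Walks the cause chain and returns the first throwable of the given type, or null. */
    static <T extends Throwable> T findCause(Throwable t, Class<T> cls) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cls.isInstance(cur))
                return cls.cast(cur);
        }

        return null;
    }

    public static void main(String[] args) {
        Throwable root = new IllegalStateException("timeout");
        Exception wrapped = new RuntimeException("query failed", new CancelledException("cancelled", root));

        CancelledException found = findCause(wrapped, CancelledException.class);

        // Reusing the found instance keeps its message, stack trace and cause.
        System.out.println(found.getCause() == root); // prints "true"

        // A freshly created instance would start with no link to the root cause.
        System.out.println(new CancelledException("cancelled", null).getCause()); // prints "null"
    }
}
```

Returning `found` rather than a new object is what keeps the original cause chain visible to the caller.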

What am I misunderstanding?

Best regards,
Ravil


Re: Is ML module @IgniteExperimental?

2020-02-17 Thread Ravil Galeyev
Hi Team,

First of all, let me introduce myself. I’m Ravil. I have been contributing
to the ML module since 2018, and from time to time I give talks about it
(e.g., at the data science summit in Warsaw [1]).

So, Alexey made a huge effort to develop the ML module, but he is not alone.
If you check the repo, you will find other contributors.

Therefore, the ML module is alive, it runs, and it has a roadmap.
For me, that means it’s not a raw project.

Regarding documentation, I’d like to mention that the code is the best
documentation :)

We have examples for most algorithms [2]. But if needed, I’m ready to
help the community with documentation in English, German, Polish, or Russian.


[1] https://dssconf.pl/

[2]
https://github.com/apache/ignite/tree/master/examples/src/main/java/org/apache/ignite/examples/ml

Best regards,

Ravil


On Mon, 17 Feb 2020 at 11:49, Alexey Zinoviev wrote:

> Hello, Igniters, and you, Nikolay.
>
> First of all, if you have a real interest in the ML module and its state, I
> could have a call with you and explain it.
>
>
> *As far as I know, for now, we have only 1 active contributor to this area
> - Alexey Zinoviev.*
> Currently, we have 2 active contributors, me and Ravil Galeyev, plus a few
> newbies. The other guys, who started the TensorFlow and other modules and
> submodules, haven't visited the community for many months.
>
> *Is ML module production ready?*
> This release will be the first one in which ML is production-ready, and it
> is entirely my work.
>
>
> *Can someone related to the ML, please, give some examples of the CVE
> or issues that can be fixed only with removing a bunch of modules?*
> CVEs are not the main reason to remove the "bunch of modules", only part of
> the story.
> The main reason is that the modules do not work properly. They were
> experimental, were never released as production-ready, and support old,
> outdated versions. External frameworks, like TensorFlow, have moved their
> integration with Ignite to separate repos. Those integrations are not
> finished, and the code there is broken and cannot be fixed, because I have
> no power, C++ skills, or permission to commit to them, and no time to
> support these broken modules.
>
> Also, the broken TF module blocks the removal of IGFS.
>
> The CVEs found were related to the dependencies of hadoop/tf/parquet and
> so on.
>
> *Should we mark it with the @IgniteExperimental? *
> I don't know. We did not have this RAW annotation a few weeks ago, and I
> don't know how we should use it.
> It could be, once you finish the discussion about this annotation, write
> docs about it, and share them with me.
>
>
> *As far as I know, the ML module has no documentation. Is it correct? Do we
> have plans to fix it?*
> The ML docs are here, in our Ignite documentation:
> https://apacheignite.readme.io/docs/machine-learning
> Of course, something could be wrong; we have not released Ignite for 1.5 years.
> Yes, I plan to fix it, after fixing all bugs in the release branch.
>
> *Should we move it to the ignite-extensions?*
> No, we shouldn't. I don't want this, I have a lot of arguments against it,
> and now is not the time to discuss it (the extensions are too young and
> have no real infrastructure or release cycle).
>
> P.S. Community, I understand that the removal of modules looks strange, but
> we should understand that ML was a strange experiment without a roadmap, and
> that situation is over.
> Now I have a roadmap (to be published later), newbie tickets, the ability
> to prepare correct docs, an understanding of what clients could use,
> and, first of all, production-ready ML (it can be run on an Ignite cluster;
> really, it works).
>
> If you, Igniters, believe that I could be a good maintainer for the ML module,
> please support me here in this thread.
> If you think that I am doing something wrong, OK, please write that too. I'll
> read it carefully.
>
> I spent a few months fixing bugs in components that were abandoned by
> their creators.
>
> My goal: Ignite should have lightweight, easily integrated ML without
> strange and unfinished experiments that might not be maintained. It's
> part of a common movement in Ignite (removing modules or moving them to
> separate repos).
>
>
> Mon, 17 Feb 2020 at 12:10, Nikolay Izhikov:
>
> > Hello, Igniters.
> >
> > Can someone bring some light on the state of the ML module in Ignite?
> > As far as I know, for now, we have only 1 active contributor to this
> > area - Alexey Zinoviev.
> > I see how whole modules come and go from the module - [1]
> >
> > Please, also note this quote:
> >
> > > Also as a result of good testing from both side (from me and Stepan) we
> > > found a lot of bugs and CVEs in hadoop related components that should
> > > be removed in release branch too.
> >
> > 0. As far as I know, the ML module has no documentation. Is it correct?
> > Do we have plans to fix it?
> >
> > 1. Can someone related to the ML, please, give some examples of the CVE
> > or issues that can be fixed only with removing a bunch of modules?
> >
> > 

[ML][IGNITE-12383][PR] Distances measures pull request

2020-02-09 Thread Ravil Galeyev
Hi team,

A week ago I submitted a pull request for IGNITE-12383, but it is still
unreviewed.
I know that you are all busy with the release right now, but can somebody
take a look at it?

Dear @zaleslaw and @avplatonov,
I mentioned you in the PR, but it looks like I did something wrong
if you didn't receive notifications.

If you are busy now, just let me know and I'll proceed with another task.

Best regards,
Ravil


[jira] [Created] (IGNITE-12105) [ML] Implement projection vector.

2019-08-26 Thread Ravil Galeyev (Jira)
Ravil Galeyev created IGNITE-12105:
--

 Summary: [ML] Implement projection vector.
 Key: IGNITE-12105
 URL: https://issues.apache.org/jira/browse/IGNITE-12105
 Project: Ignite
  Issue Type: Task
  Components: ml
Reporter: Ravil Galeyev
Assignee: Ravil Galeyev


According to the discussion in the PR
(https://github.com/apache/ignite/pull/6567), it would be nice to have a
vector projection by some filter.

A preprocessor of this kind should be implemented.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[ML][DISCUSSION] Big Double problem

2019-06-10 Thread Ravil Galeyev
Hi Team,

I tried to run Ignite ML on a dataset with categorical features and
came across some problems.

My dataset is the Mushrooms dataset from Kaggle.
It has only categorical features and categorical labels
(a so-called classification problem). You can find my attempt in my repo.

My goal is to build a pipeline that takes raw string values, encodes them
as numbers, and then trains a model.

The first problem is the Vectorizer.

I started with DummyVectorizer, but it supports only Double labels.
All other vectorizers have the same issue because all of them inherit
from DefaultLabelVectorizer, where Double labels are hardcoded at the
generic level.

I didn’t find a way to work with purely categorical data using the standard
Ignite vectorizers, so I wrote my own.

The second problem is EncoderTrainer (in my case, STRING_ENCODER).

It doesn’t encode labels. The trainer just ignores them. See
EncoderTrainer.

Ignoring labels probably makes sense, but…

The third problem is ClassCastException.

There are casts of labels to Double in model trainers that are “hidden” from
the user, e.g. in SVMLinearClassificationTrainer,
DiscreteNaiveBayesTrainer, etc.

Feel free to use my regex \(Double\).*\.label\(\) to search for other casts.
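
For anyone who wants to run that search from code, here is a small sketch that applies the same regex with `java.util.regex`. The sample source lines and the `LabelCastFinder` class are made up for illustration; they are not taken from the Ignite code base.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class LabelCastFinder {
    /** The regex from the message above, escaped for a Java string literal. */
    static final Pattern DOUBLE_LABEL_CAST = Pattern.compile("\\(Double\\).*\\.label\\(\\)");

    /** Returns the lines that contain a (Double) cast followed by a .label() call. */
    static List<String> findCasts(List<String> lines) {
        return lines.stream()
            .filter(line -> DOUBLE_LABEL_CAST.matcher(line).find())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up source lines, for illustration only.
        List<String> lines = Arrays.asList(
            "double y = (Double)row.label();",
            "Object y = row.label();",
            "Double w = (Double)weights.get(i);");

        // Only the first line matches: it has both the cast and the label() call.
        findCasts(lines).forEach(System.out::println);
    }
}
```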

To sum up, there is an assumption that labels are numeric values,
but in a classification problem labels can be anything,
and I didn’t find an easy way to preprocess them.
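
To make the problem concrete, here is a minimal sketch in plain Java (no Ignite classes). `LabelIndexer` is a hypothetical helper, and the label strings are assumed example values. It shows why a hidden `(Double)` cast fails on a string label and how labels could be mapped to doubles before training.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not an Ignite API.
final class LabelIndexer {
    private final Map<String, Double> codes = new LinkedHashMap<>();

    /** Returns the numeric code of a label, assigning the next free code on first sight. */
    double encode(String label) {
        Double code = codes.get(label);

        if (code == null) {
            code = (double) codes.size();
            codes.put(label, code);
        }

        return code;
    }

    public static void main(String[] args) {
        Object rawLabel = "poisonous"; // a categorical label as it arrives from the dataset

        try {
            Double asDouble = (Double) rawLabel; // the kind of cast hidden in a trainer
            System.out.println(asDouble);
        }
        catch (ClassCastException e) {
            System.out.println("cast failed: String is not a Double");
        }

        LabelIndexer indexer = new LabelIndexer();
        System.out.println(indexer.encode("edible"));    // 0.0
        System.out.println(indexer.encode("poisonous")); // 1.0
        System.out.println(indexer.encode("edible"));    // 0.0 again, codes are stable
    }
}
```

With such a mapping applied first, a trainer that expects Double labels receives only numeric values.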



If you have any questions or need details, feel free to write to me.

Best regards,

Ravil


[ML] IGNITE-9978 Compound Naive Bayes Pull Request

2019-05-31 Thread Ravil Galeyev
Hi Team,

A week ago I submitted a pull request for IGNITE-9978, but it is still unreviewed.
Can somebody take a look at it?

Dear @zaleslaw, @ybabak, @avplatonov, and @dmitrievanthony,
I mentioned you in the PR, but it looks like I did something wrong
if you didn't receive notifications.

If you are busy now, just let me know and I'll proceed with another task.

Best regards,
Ravil


Re: [ML] IGNITE-9282 Naive Bayes task split

2018-10-05 Thread Ravil Galeyev
Hi Team,

So, can somebody look at PR https://github.com/apache/ignite/pull/4869 with
naive Bayes?

@zaleslaw, @ybabak, do you have time?
If reviewing takes time, can I take some other task?
E.g., I could implement IGNITE-9284 (Standard scaler) in the meantime,
because I've already implemented one preprocessor and know what to do.

On Mon, 1 Oct 2018 at 09:44 Alexey Zinoviev  wrote:

> Great, I support this idea, will help with review too


Re: [ML] IGNITE-9282 Naive Bayes task split

2018-09-30 Thread Ravil Galeyev
Hi Yuriy,

I created new tickets for the other Bayes classifiers:
IGNITE-9745 <https://issues.apache.org/jira/browse/IGNITE-9745> Multinomial Naive Bayes
IGNITE-9746 <https://issues.apache.org/jira/browse/IGNITE-9746> Complement Naive Bayes
IGNITE-9747 <https://issues.apache.org/jira/browse/IGNITE-9747> Bernoulli Naive Bayes

They are isolated and independent, so anybody can work on them.

Regards,
Ravil

On Sun, 30 Sep 2018 at 19:53 Yuriy Babak  wrote:

> Hi Ravil,
>
> I think this is a good idea. I prefer to have several small single-feature
> tickets instead of a big one with several features.
>
> I will start reviewing 9282 this week. Also, I am looking forward to seeing
> those new tickets.
>
> Regards,
> Yuriy
>
> Sun, 30 Sep 2018 at 3:58, Ravil Galeyev:
>
> > Hi Team,
> > I am working on implementing Naive Bayes classifiers.
> >
> > Within IGNITE-9282 <https://issues.apache.org/jira/browse/IGNITE-9282>, I
> > implemented a Gaussian Bayes classifier and created a PR:
> > https://github.com/apache/ignite/pull/4869
> >
> > But there are already a lot of changes.
> > That's why I'd like to create separate tasks for the
> > multinomial and Bernoulli Bayes classifiers and continue the work.
> >
> > Any objections?
> >
> > Best regards,
> > Ravil
> >
>


[jira] [Created] (IGNITE-9747) [ML] Add Bernoulli Naive Bayes classifier

2018-09-30 Thread Ravil Galeyev (JIRA)
Ravil Galeyev created IGNITE-9747:
-

 Summary: [ML] Add Bernoulli Naive Bayes classifier
 Key: IGNITE-9747
 URL: https://issues.apache.org/jira/browse/IGNITE-9747
 Project: Ignite
  Issue Type: Sub-task
  Components: ml
Reporter: Ravil Galeyev


Naive Bayes classifiers are a family of simple probabilistic classifiers based 
on applying Bayes' theorem with strong (naive) independence assumptions between 
the features.

So we want to add this algorithm to the Apache Ignite ML module.

[Bernoulli Naive 
Bayes|http://scikit-learn.org/stable/modules/naive_bayes.html#bernoulli-naive-bayes]
 implements the naive Bayes training and classification algorithms for data 
that is distributed according to multivariate Bernoulli distributions; i.e., 
there may be multiple features but each one is assumed to be a binary-valued 
(Bernoulli, boolean) variable.
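
As a rough illustration of the algorithm described above, here is a minimal single-node sketch in plain Java. It is not the Ignite implementation and does not use PartitionedDataset; the class name, the Laplace smoothing constant of 1, and the toy data are assumptions made for this example.

```java
// Hypothetical sketch; not the Ignite ML API.
final class BernoulliNbSketch {
    final double[] logPrior;      // log P(class)
    final double[][] featureProb; // P(feature j = 1 | class), Laplace-smoothed

    BernoulliNbSketch(int[][] x, int[] y, int classes) {
        int features = x[0].length;
        double[] classCount = new double[classes];
        double[][] onesCount = new double[classes][features];

        for (int i = 0; i < x.length; i++) {
            classCount[y[i]]++;
            for (int j = 0; j < features; j++)
                onesCount[y[i]][j] += x[i][j];
        }

        logPrior = new double[classes];
        featureProb = new double[classes][features];

        for (int c = 0; c < classes; c++) {
            logPrior[c] = Math.log(classCount[c] / x.length);
            for (int j = 0; j < features; j++)
                featureProb[c][j] = (onesCount[c][j] + 1.0) / (classCount[c] + 2.0);
        }
    }

    /** Returns the class with the highest log-posterior for a binary feature vector. */
    int predict(int[] v) {
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;

        for (int c = 0; c < logPrior.length; c++) {
            double score = logPrior[c];
            for (int j = 0; j < v.length; j++) {
                double p = featureProb[c][j];
                // Unlike multinomial NB, absent features (0) also contribute.
                score += Math.log(v[j] == 1 ? p : 1.0 - p);
            }
            if (score > bestScore) {
                bestScore = score;
                best = c;
            }
        }

        return best;
    }

    public static void main(String[] args) {
        int[][] x = {{1, 0}, {1, 1}, {0, 1}, {0, 0}}; // toy binary features
        int[] y = {0, 0, 1, 1};

        BernoulliNbSketch model = new BernoulliNbSketch(x, y, 2);

        System.out.println(model.predict(new int[] {1, 0})); // prints 0
        System.out.println(model.predict(new int[] {0, 1})); // prints 1
    }
}
```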

Requirements for successful PR:
 # PartitionedDataset usage
 # Trainer-Model paradigm support
 # Tests for Model and for Trainer (and other stuff)
 # Example of usage with a small but famous dataset like IRIS, Titanic or House 
Prices
 # Javadocs/codestyle according to the guidelines



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-9746) [ML] Add Complement Naive Bayes

2018-09-30 Thread Ravil Galeyev (JIRA)
Ravil Galeyev created IGNITE-9746:
-

 Summary: [ML] Add Complement Naive Bayes
 Key: IGNITE-9746
 URL: https://issues.apache.org/jira/browse/IGNITE-9746
 Project: Ignite
  Issue Type: Sub-task
  Components: ml
Reporter: Ravil Galeyev


Naive Bayes classifiers are a family of simple probabilistic classifiers based 
on applying Bayes' theorem with strong (naive) independence assumptions between 
the features.

So we want to add this algorithm to the Apache Ignite ML module.

[Complement Naive 
Bayes|http://scikit-learn.org/stable/modules/naive_bayes.html#complement-naive-bayes]
 is an adaptation of the standard multinomial naive Bayes (MNB) algorithm that 
is particularly suited for imbalanced data sets.
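
A minimal single-node sketch of the idea follows, in plain Java. It is not the Ignite implementation: the class name, smoothing constant of 1, and toy data are assumptions, and the optional weight normalization step described in the scikit-learn documentation is left out for brevity. Each class is scored by feature statistics gathered from all *other* classes, and the class with the lowest score wins.

```java
// Hypothetical sketch; not the Ignite ML API. Weight normalization is omitted.
final class ComplementNbSketch {
    /** weight[c][j] = log of the smoothed frequency of feature j in samples outside class c. */
    final double[][] weight;

    ComplementNbSketch(int[][] counts, int[] y, int classes) {
        int features = counts[0].length;
        double[][] perClass = new double[classes][features];
        double[] overall = new double[features];

        for (int i = 0; i < counts.length; i++) {
            for (int j = 0; j < features; j++) {
                perClass[y[i]][j] += counts[i][j];
                overall[j] += counts[i][j];
            }
        }

        weight = new double[classes][features];

        for (int c = 0; c < classes; c++) {
            double complementTotal = 0;
            for (int j = 0; j < features; j++)
                complementTotal += overall[j] - perClass[c][j];

            for (int j = 0; j < features; j++) {
                double complementCount = overall[j] - perClass[c][j];
                weight[c][j] = Math.log((complementCount + 1.0) / (complementTotal + features));
            }
        }
    }

    /** Returns the class whose complement explains the sample worst (lowest score). */
    int predict(int[] v) {
        int best = 0;
        double bestScore = Double.POSITIVE_INFINITY;

        for (int c = 0; c < weight.length; c++) {
            double score = 0;
            for (int j = 0; j < v.length; j++)
                score += v[j] * weight[c][j];

            if (score < bestScore) {
                bestScore = score;
                best = c;
            }
        }

        return best;
    }

    public static void main(String[] args) {
        int[][] counts = {{2, 0}, {3, 1}, {4, 0}, {0, 2}}; // imbalanced toy data: 3 vs 1 samples
        int[] y = {0, 0, 0, 1};

        ComplementNbSketch model = new ComplementNbSketch(counts, y, 2);

        System.out.println(model.predict(new int[] {3, 0})); // prints 0
        System.out.println(model.predict(new int[] {0, 3})); // prints 1
    }
}
```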

Requirements for successful PR:
 # PartitionedDataset usage
 # Trainer-Model paradigm support
 # Tests for Model and for Trainer (and other stuff)
 # Example of usage with a small but famous dataset like IRIS, Titanic or House 
Prices
 # Javadocs/codestyle according to the guidelines





[jira] [Created] (IGNITE-9745) [ML] Add Multinomial Naive Bayes

2018-09-30 Thread Ravil Galeyev (JIRA)
Ravil Galeyev created IGNITE-9745:
-

 Summary: [ML] Add Multinomial Naive Bayes
 Key: IGNITE-9745
 URL: https://issues.apache.org/jira/browse/IGNITE-9745
 Project: Ignite
  Issue Type: Sub-task
  Components: ml
Reporter: Ravil Galeyev


Naive Bayes classifiers are a family of simple probabilistic classifiers based 
on applying Bayes' theorem with strong (naive) independence assumptions between 
the features.

So we want to add this algorithm to the Apache Ignite ML module.

[Multinomial Naive 
Bayes|http://scikit-learn.org/stable/modules/naive_bayes.html#multinomial-naive-bayes]
  implements the naive Bayes algorithm for multinomially distributed data.
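
A minimal single-node sketch of multinomial naive Bayes on count features follows, in plain Java. It is not the Ignite implementation; the class name, the Laplace smoothing constant of 1, and the toy data are assumptions made for this example.

```java
// Hypothetical sketch; not the Ignite ML API.
final class MultinomialNbSketch {
    final double[] logPrior;   // log P(class)
    final double[][] logTheta; // log P(feature j | class), Laplace-smoothed

    MultinomialNbSketch(int[][] counts, int[] y, int classes) {
        int features = counts[0].length;
        double[] samples = new double[classes];
        double[] total = new double[classes];
        double[][] featureTotal = new double[classes][features];

        for (int i = 0; i < counts.length; i++) {
            samples[y[i]]++;
            for (int j = 0; j < features; j++) {
                featureTotal[y[i]][j] += counts[i][j];
                total[y[i]] += counts[i][j];
            }
        }

        logPrior = new double[classes];
        logTheta = new double[classes][features];

        for (int c = 0; c < classes; c++) {
            logPrior[c] = Math.log(samples[c] / counts.length);
            for (int j = 0; j < features; j++)
                logTheta[c][j] = Math.log((featureTotal[c][j] + 1.0) / (total[c] + features));
        }
    }

    /** Returns the class with the highest log-posterior for a vector of counts. */
    int predict(int[] v) {
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;

        for (int c = 0; c < logPrior.length; c++) {
            double score = logPrior[c];
            for (int j = 0; j < v.length; j++)
                score += v[j] * logTheta[c][j];

            if (score > bestScore) {
                bestScore = score;
                best = c;
            }
        }

        return best;
    }

    public static void main(String[] args) {
        int[][] counts = {{2, 0}, {3, 1}, {0, 2}, {1, 3}}; // toy count features
        int[] y = {0, 0, 1, 1};

        MultinomialNbSketch model = new MultinomialNbSketch(counts, y, 2);

        System.out.println(model.predict(new int[] {3, 0})); // prints 0
        System.out.println(model.predict(new int[] {0, 4})); // prints 1
    }
}
```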

Requirements for successful PR:
 # PartitionedDataset usage
 # Trainer-Model paradigm support
 # Tests for Model and for Trainer (and other stuff)
 # Example of usage with a small but famous dataset like IRIS, Titanic or 
House Prices
 # Javadocs/codestyle according to the guidelines





[ML] IGNITE-9282 Naive Bayes task split

2018-09-29 Thread Ravil Galeyev
Hi Team,
I am working on implementing Naive Bayes classifiers.

Within IGNITE-9282, I
implemented a Gaussian Bayes classifier and created a PR:
https://github.com/apache/ignite/pull/4869

But there are already a lot of changes.
That's why I'd like to create separate tasks for the
multinomial and Bernoulli Bayes classifiers and continue the work.

Any objections?

Best regards,
Ravil


[ML] IGNITE-9282 task

2018-09-06 Thread Ravil Galeyev
Hi Team,

I've taken IGNITE-9282.
I'm going to implement a Naive Bayes classifier.

Best regards,
Ravil


Ignite new contributor

2018-08-28 Thread Ravil Galeyev
Hi Team,

I'd like to join Apache Ignite development,
especially the ML part.
Currently, I am working on IGNITE-9285,
and I'm going to continue with subtasks from IGNITE-9281
(ML starter tasks).

My Jira login is rgaleyev

Best regards,
Ravil