Re: [shogun] Undefined reference linking to libshogun 5.0

2017-03-27 Thread Fernando J . Iglesias García
Hello Jose,

Could you isolate the error in a minimal example and share the code? Thank
you!

Cheers,
Fernando.

On 27 March 2017 at 07:20, Jose Gomez  wrote:

> Hi all,
>
> I wrote a test C++ program that uses a MySQL database.
> The compilation line I use:
>
> g++ -std=c++11 database.cpp whole_dataset.cpp -o whole_dataset -lshogun -lmysqlclient
>
> I get
>
> /tmp/ccCFhRAD.o: In function `shogun::CSGObject::operator=(shogun::CSGObject
> const&)':
> whole_dataset.cpp:(.text._ZN6shogun9CSGObjectaSERKS0_[_ZN6shogun9CSGObjectaSERKS0_]+0x27):
> undefined reference to `shogun::Unique ect::Self>::operator=(shogun::Unique const&)'
> collect2: error: ld returned 1 exit status
>
> I'm using the libshogun17 and libshogun-dev packages from the stable PPA
> repository on Ubuntu 14.04.
>
> Am I missing some extra dependencies?
>
> Thanks in advance for your help.
>
> Greetings
>
> Jose Gomez
>
>


Re: [shogun] Undefined reference linking to libshogun 5.0

2017-03-27 Thread Fernando J . Iglesias García
Hello Jose,

I cannot reproduce the linker error on Ubuntu 16.04 using libshogun16 (the
package in the default Ubuntu repositories, without adding the PPA). If it
is possible for your application and these packages are also available for
Ubuntu 14.04, I suggest you try those instead (i.e. without the PPA).

Still, your sample program is missing, at least, the calls to init_shogun()
and exit_shogun(); without them you will run into runtime errors.
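A minimal sketch of one way to guarantee that the two calls are always paired, assuming an RAII guard fits your program. `library_init()` and `library_exit()` are hypothetical stand-ins so the snippet compiles without libshogun; in the libshogun examples of that era the real calls are typically `init_shogun_with_defaults()` (or `init_shogun()`) and `exit_shogun()` from `shogun/base/init.h`, but please check the headers of your installed version.

```cpp
#include <cassert>

// Hypothetical stand-ins for a library's global setup/teardown.
// The counters only exist so we can check that the calls are paired.
int g_init_calls = 0;
int g_exit_calls = 0;
void library_init() { ++g_init_calls; }
void library_exit() { ++g_exit_calls; }

// RAII guard: the constructor initializes and the destructor tears down,
// so teardown also happens on early returns and exceptions.
class LibraryGuard {
public:
    LibraryGuard() { library_init(); }
    ~LibraryGuard() { library_exit(); }
    LibraryGuard(const LibraryGuard&) = delete;
    LibraryGuard& operator=(const LibraryGuard&) = delete;
};

// Returns the number of active initializations seen inside the scope,
// or -1 when asked to bail out early.
int run_experiment(bool fail_early) {
    LibraryGuard guard;  // declared first, so it is destroyed last
    int active = g_init_calls - g_exit_calls;
    assert(active == 1);
    if (fail_early)
        return -1;       // the guard still calls library_exit()
    // ... create features, labels and machines here ...
    return active;
}
```

Declaring the guard as the first local variable means it is destroyed after every object created later in the same scope.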

Cheers,
Fernando.

On 27 March 2017 at 11:38, Jose Gomez <jose.gomez.lo...@gmail.com> wrote:

> Hello Fernando,
>
> Below you can find the code I use, without the database access. I compiled
> it using
>
>  g++ -std=c++11 sample.cpp -o sample -lshogun
>
> I get the same error as I mentioned.
>
> Thanks for your interest.
>
> Greetings,
>
> Jose
>
> #include 
> #include 
> #include 
> #include 
> #include 
>
> using namespace shogun;
>
> int main(int argc, char **argv)
> {
> int iFolders=10;
> int i,k;
> int iRows=100;
> SGMatrix *psFeatures=new SGMatrix[iFolders];
> SGVector *psLabels=new SGVector[iFolders];
> CDenseFeatures *features=new CDenseFeatures[iFolders];
> CMulticlassLabels *labels=new CMulticlassLabels[iFolders];
> double **ppdFeatures= new double*[iFolders];
> double **ppdLabels= new double*[iFolders];
> CGaussianNaiveBayes *gnb=new CGaussianNaiveBayes[iFolders];
> double dAc=0;
> for (i=0; i<iFolders; i++)
> {
> ppdFeatures[i]=(double *)calloc(3*iRows, sizeof(double));
> ppdLabels[i]=(double *)calloc(iRows, sizeof(double));
> psFeatures[i]=SGMatrix(ppdFeatures[i], iRows, 3);
> psLabels[i]=SGVector(ppdLabels[i], iRows);
> features[i]=CDenseFeatures(psFeatures[i]);
> labels[i]=CMulticlassLabels(psLabels[i]);
> gnb[i] = CGaussianNaiveBayes(features+i, labels+i);
> gnb[i].train();
> }
>
> CMulticlassAccuracy eval = CMulticlassAccuracy();
> for (i=0, k=0; i<iFolders;i++)
> {
> for (int j=0; j<iFolders; j++)
> {
> if (i!=j)
> {
> auto labels_predict = gnb[j].apply_multiclass(features+i);
> dAc += eval.evaluate(labels_predict, labels+i);
> k++;
>
> }
> }
> }
>
> }
>


Re: [shogun] Automatize Code Style Checks

2017-04-05 Thread Fernando J . Iglesias García
Automating style checks would indeed be very nice to have.

I think it is worth taking into account that marking a PR as failed
because of style issues might make getting started with contributing to
Shogun less friendly for newcomers.

On 5 April 2017 at 10:10, Viktor Gal  wrote:

> Hi Giovanni
>
> great idea! there’s already been a clang-format file in my shogun folder
> for a while… but it was never finished, as there was always something else
> to work on. let me try to dig up the formatting file for you (as far as
> i’ve got with it), and then we could finish it together? + add the format
> checking to travis...
>
> cheers,
> viktor
>
>
> > On 5 Apr 2017, at 3:47 PM, Giovanni De Toni wrote:
> >
> > Hello everybody,
> >
> > Since I noticed that many of the developers' comments on PRs are aimed
> > at fixing code formatting and style, I thought it would be very useful to
> > have a way to reduce the time spent on these small issues (and therefore
> > gain more time to focus on the "working" code).
> >
> > There is a tool called `clang-format` that can be used to format a piece
> > of code so that it follows predefined style guidelines. There is also a
> > Python script called `git-clang-format` that can be used to check whether
> > commits respect these guidelines and to produce diffs (with newly
> > formatted code) that can later be applied with the `git apply` command.
> >
> > It is also possible to configure Travis to make a build fail if the code
> > is not consistent with the coding conventions. For example, ROOT is a big
> > project that uses this approach to validate its PRs.
> >
> > We could write our own clang-format configuration (which is almost
> > trivial) and a custom script, executed by Travis before the build, to
> > make sure that all submitted code is well formatted.
> >
> > I think this could be a great addition to the project, since it would
> > produce a more readable and consistent Shogun codebase over time.
> >
> > Here are some further references:
> >   • https://clang.llvm.org/docs/ClangFormat.html
> >   • https://clang.llvm.org/docs/ClangFormatStyleOptions.html
> >   • https://github.com/llvm-mirror/clang/blob/master/tools/clang-format/git-clang-format
> >
> > What do you think? Could it be worth the effort?
> >
> > Cheers,
> > Giovanni De Toni
>
>
>


Re: [shogun] ISSUE #3847

2017-08-07 Thread Fernando J . Iglesias García
Welcome Sahil!

Great that you have already successfully set up your dev environment.

For this particular task, I think it will be useful to get familiar with
Shogun's cross-validation first. You could start by checking the related
examples (like this one). Then you can dig into how the splitting strategy
is implemented internally (you can find the implementation by following the
appropriate include file from the example). You will also need to
understand the details of the time-series splitting strategy; the links in
the GitHub issue will be useful for this.
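Before reading Shogun's implementation, it may help to have the basic idea in front of you. Below is a minimal, library-independent sketch of one common time-series splitting scheme (an expanding training window): every fold tests on a contiguous block and trains only on samples that come before it, so no future data leaks into training. `time_series_split` and the exact fold layout are assumptions for illustration, not the design required by the issue.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Indices = std::vector<std::size_t>;

// Hypothetical helper: split n time-ordered samples into num_folds folds.
// The test blocks are the last num_folds contiguous chunks of size
// n / (num_folds + 1); each training set is every sample before its block.
std::vector<std::pair<Indices, Indices>>
time_series_split(std::size_t n, std::size_t num_folds) {
    assert(num_folds > 0 && n >= num_folds + 1);
    const std::size_t test_size = n / (num_folds + 1);
    const std::size_t first_test = n - num_folds * test_size;
    std::vector<std::pair<Indices, Indices>> folds;
    for (std::size_t k = 0; k < num_folds; ++k) {
        const std::size_t test_begin = first_test + k * test_size;
        Indices train, test;
        for (std::size_t i = 0; i < test_begin; ++i)
            train.push_back(i);  // only samples from the past
        for (std::size_t i = test_begin; i < test_begin + test_size; ++i)
            test.push_back(i);   // the next contiguous block
        folds.emplace_back(std::move(train), std::move(test));
    }
    return folds;
}
```

For example, `time_series_split(10, 4)` produces test blocks {2,3}, {4,5}, {6,7} and {8,9}, with training sets {0,1}, {0..3}, {0..5} and {0..7}.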

After that, you should be ready to start implementing the time-series splitting.
Let us know how it goes.

Hope that helps!

Cheers,
Fernando.

On 5 August 2017 at 20:29, sahil chaddha  wrote:

> Ma'am/Sir,
>
> I want to work on https://github.com/shogun-toolbox/shogun/issues/3847,
> but I have no idea where to start. I am new to such big projects. Can
> anyone guide me through it? I have already set up the environment and run
> the tests and examples successfully.
>
> *Sahil Chaddha*
> Fourth Year Undergraduate Student
> Department of Metallurgy and Materials Engineering
> IIT Kharagpur, West Bengal - 721302
> +91-7872705997 | LinkedIn | Github
>


Re: [shogun] I had a problem with Shogun

2017-05-10 Thread Fernando J . Iglesias García
Hey Zhao,

Can you provide the exact error message you are getting? Also, if possible,
please provide a minimal code sample reproducing your problem.

Cheers,
Fernando.

P.S. I suggest you subscribe to the mailing list so that your e-mails are
not held back until they are manually approved.

On 9 May 2017 at 02:54, 赵乐 <1414119...@qq.com> wrote:

> Hello!
>   I am a student from China who recently tried to use Shogun for my
> homework. However, I ran into some problems during installation and use,
> which is why I am sending this message. I hope you can help me.
>   I tried to install Shogun on Ubuntu 14.04 with Python 2.7, and these
> are the steps I took:
>
> sudo add-apt-repository ppa:shogun-toolbox/stable
> sudo apt-get update
>
> sudo apt-get install libshogun17
>
> sudo apt-get install python-shogun
>
>   Then I used IPython notebook in the browser to run the Multiple Kernel
> Learning example provided on the Shogun website.
>   The problems I encountered during these two steps were as follows:
> 1. I don't know whether Shogun should be installed this way, because the
> MKL example does not run on my computer; it outputs "only supports
> SVMlight".
> 2. Then I tried to run the examples in the cloud, and the same problem
> arose again.
>   I am eager to receive your help, and I would appreciate it very much.
> Le Zhao
> 2017.05.09
>


Re: [shogun] About CBinaryLabels and CMultilabelLabels

2017-05-29 Thread Fernando J . Iglesias García
@Heiko, I think this is a good idea. If I understand it correctly, you
suggest that the API exposes only one class for labels, and Shogun then
figures out internally which label type (private to the API) should be
used, based on the input data or in the learning algorithm once the labels
are needed.

Is this documented somewhere on GitHub?
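A minimal sketch of the single, non-abstract label class discussed in this thread, assuming the label kind can be inferred from the values passed in. `Labels`, `LabelKind` and the inference rule are hypothetical, not Shogun API: {-1, +1} and {0, 1} are both accepted as binary, other non-negative integers are treated as multiclass, and each view performs the conversion transparently.

```cpp
#include <cassert>
#include <cmath>
#include <set>
#include <stdexcept>
#include <utility>
#include <vector>

enum class LabelKind { Binary, Multiclass };

// Hypothetical single label class: callers always construct Labels,
// and the kind is inferred (and validated) from the values passed in.
class Labels {
public:
    explicit Labels(std::vector<double> values) : values_(std::move(values)) {
        const std::set<double> distinct(values_.begin(), values_.end());
        for (double v : distinct)
            if (v != std::floor(v) || v < -1.0)
                throw std::invalid_argument("labels must be integers >= -1");
        if (distinct == std::set<double>{-1.0, 1.0} ||
            distinct == std::set<double>{0.0, 1.0})
            kind_ = LabelKind::Binary;
        else if (distinct.count(-1.0) == 0)
            kind_ = LabelKind::Multiclass;
        else
            throw std::invalid_argument("-1 is only valid in binary labels");
    }

    LabelKind kind() const { return kind_; }

    // Binary view in {-1, +1}, whichever binary encoding was passed in.
    std::vector<double> binary() const {
        if (kind_ != LabelKind::Binary)
            throw std::logic_error("labels are not binary");
        std::vector<double> out;
        for (double v : values_) out.push_back(v > 0 ? 1.0 : -1.0);
        return out;
    }

    // Multiclass view in {0, ..., num_classes - 1}; a binary -1 maps to 0.
    std::vector<double> multiclass() const {
        std::vector<double> out;
        for (double v : values_) out.push_back(v < 0 ? 0.0 : v);
        return out;
    }

private:
    std::vector<double> values_;
    LabelKind kind_;
};
```

With this shape, a test fixture could generate one vector of 0s and 1s and ask for whichever view an algorithm needs.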

On 29 May 2017 at 11:18, Heiko Strathmann 
wrote:

> I suggest:
> - have only one class for labels, CLabels, which is not abstract;
> - remove all other label classes.
>
> Internally there might be more distinction, for example an impl class
> that is instantiated based on the passed input.
>
>
>
> On Mon, 29 May 2017 at 08:51, Tiramisu Ling  wrote:
>
>> Thank you for your reply!
>>
>>>
>>> Rather get rid of all classes for labels, offer a single one that
>>> transparently does all the conversions, and checks validity/range.
>>
>>
>> Does that mean we should do the conversions inside CMultilabelLabels? Or
>> could we make CBinaryLabels accept zero as a label value?
>>
>> 2017-05-29 14:33 GMT+08:00 Heiko Strathmann :
>>
>>> Please no new class!
>>> Rather get rid of all classes for labels, offer a single one that
>>> transparently does all the conversions, and checks validity/range.
>>>
>>> There is zero sense, in my eyes, in having multiple classes for labels
>>> when we do runtime checks in the algorithms anyway.
>>>
>>> Also no templates, please.
>>>
>>> H
>>>
>>> On Mon, 29 May 2017 at 08:28, Tiramisu Ling 
>>> wrote:
>>>
 Hi, I have a question about the difference between CBinaryLabels
 and CMultilabelLabels. Why do we need CBinaryLabels to be {-1, 1} while
 CMultilabelLabels is defined as {0, 1, ..., num_classes-1}? What about
 defining something like 'DigitalLabels' which can accept {-1, 0, 1, ...,
 num_classes}, or just using CBinaryLabels as {0, 1}?

 Because we are going to add a global fixture with binary label data (issue
 3812), I added some code (the for loop part) to make it work with
 multiclass data (comment).
 But the problem is that CBinaryLabels can only accept -1 or 1, while
 CMultilabelLabels has 0, 1, so I don't know how to generate the two
 different labels through *one* uniform process (without needing to
 distinguish or specify which kind of label we want). And it seems like I
 can't cast from one label to another directly.

 Please give me some help about that. Thank you very much!

 Best Regards,
 MikeLing


 --
>>> Sent from my phone
>>>
>>
>> --
> Sent from my phone
>


Re: [shogun] About CBinaryLabels and CMultilabelLabels

2017-05-29 Thread Fernando J . Iglesias García
On 29 May 2017 at 08:27, Tiramisu Ling  wrote:

> Hi, I have a question about the difference between CBinaryLabels
> and CMultilabelLabels. Why do we need CBinaryLabels to be {-1, 1} while
> CMultilabelLabels is defined as {0, 1, ..., num_classes-1}? What about
> defining something like 'DigitalLabels' which can accept {-1, 0, 1, ...,
> num_classes}, or just using CBinaryLabels as {0, 1}?
>

> Because we are going to add a global fixture with binary label data (issue
> 3812), I added some code (the for loop part) to make it work with
> multiclass data (comment).
> But the problem is that CBinaryLabels can only accept -1 or 1, while
> CMultilabelLabels has 0, 1, so I don't know how to generate the two
> different labels through *one* uniform process (without needing to
> distinguish or specify which kind of label we want). And it seems like I
> can't cast from one label to another directly.
>

I think the CBinaryLabels constructor with a threshold can help you create
them from a vector of 0s and 1s:
https://github.com/shogun-toolbox/shogun/blob/develop/src/shogun/labels/BinaryLabels.cpp#L52
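To illustrate what a thresholding conversion does, here is a minimal, library-independent sketch. `to_binary_labels` is hypothetical, and the rule "strictly above the threshold maps to +1, everything else to -1" is an assumption; check the linked BinaryLabels.cpp for the exact comparison and threshold sign Shogun uses.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper mirroring the idea of a thresholding constructor:
// values strictly above `threshold` become +1, all others become -1.
std::vector<double> to_binary_labels(const std::vector<double>& values,
                                     double threshold) {
    std::vector<double> labels;
    labels.reserve(values.size());
    for (double v : values)
        labels.push_back(v > threshold ? 1.0 : -1.0);
    return labels;
}
```

With 0/1 data, any threshold between 0 and 1 (e.g. 0.5) separates the two classes under this rule.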

For your understanding, the reason why binary labels take values in {-1, 1}
probably has to do with the fact that some binary classification algorithms
can be formulated succinctly by exploiting these values. For example,
consider a linear model with weight vector w and bias b, where the feature
vector of data sample i is x_i and its label is l_i. Note that l_i can be
either -1 or 1. Learning could be expressed as

   find awesome w and b
   subject to
     for all i, (w*x_i + b) l_i >= 0
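The constraint can be checked directly in code. A minimal sketch, with `satisfies_constraints` as a hypothetical helper: with l_i in {-1, 1}, the single inequality (w*x_i + b) l_i >= 0 asks positive samples to score non-negative and negative samples to score non-positive; with {0, 1} labels, every sample labelled 0 drops out of the constraint entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Checks (w . x_i + b) * l_i >= 0 for every sample i.
bool satisfies_constraints(const std::vector<std::vector<double>>& X,
                           const std::vector<double>& l,
                           const std::vector<double>& w, double b) {
    assert(X.size() == l.size());
    for (std::size_t i = 0; i < X.size(); ++i) {
        assert(X[i].size() == w.size());
        double score = b;  // w . x_i + b
        for (std::size_t j = 0; j < w.size(); ++j)
            score += w[j] * X[i][j];
        if (score * l[i] < 0)
            return false;
    }
    return true;
}
```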


>
> Please give me some help about that. Thank you very much!
>
> Best Regards,
> MikeLing
>
>
>


Re: [shogun] [GSoC] Weekly Report for Apply Shogun Framework to Basketball Data Analyzation

2017-06-10 Thread Fernando J . Iglesias García
On 10 June 2017 at 02:51, Ting Pan  wrote:

> Hi all,
>
> Sorry for the late weekly report for the first week; I had exams :(
> I will send it on Monday next time.
>
> During the first week of GSoC, I mainly worked on building IPython
> notebooks for player clustering and game result prediction. I finished the
> data collection and did some data cleaning as well. This week I will finish
> all Python notebooks for both clustering and prediction and work on
> training a model to classify players into different types. My weekly blog
> post about clustering basketball players with Shogun is here: link
> 
>
> *Achievements:*
>
> • I wrote data collection script and collected necessary data from
> stats.nba.com and finished data cleaning and formatting for this project.
>

Hey Ting,

Do you also have this code somewhere on GitHub?


> • I have documented the format of the dataset and explained each
> attribute. In particular, I wrote the description of the play type data,
> which is the main data type for this project.
> • I finished the K-means clustering notebook and refactored the loss
> function following the mentors' advice. With this notebook, I clustered
> the players into different types with K-means, and this week I will finish
> labeling the players based on the clustering result.
>
> *Plan for this week:*
>
> • I have listed all the ideas I collected during the last week; I will
> pick the most relevant ones and think about how they can be applied to our
> project.
> • For the prediction, I will represent each team by the average score of
> each player type and the playing time of each type. Then I will finish
> creating proofs of concept that we can predict the game result based on
> player types.
> • I will also train a model to classify players into different types
> based on the K-means clustering results.
>
> Best regards,
> Ting Pan
>


[shogun] Kick-off chat GSoC NN

2018-05-07 Thread Fernando J . Iglesias García
Hello,

Yesterday Elfarouk and I had the kick-off call for his GSoC project. A
brief summary of the discussions:

The PR integrating Stan in the build system has been merged.

Next is:
- to finish up the cookbook PR (some small language fixes and adapting to
new API);
- as well as starting off the abstract loss class that hands the work over
to Stan. Further, there will be two subclasses implementing popular loss
functions (squared and cross-entropy). These will also serve as an
illustrative example for users of how to implement their own losses.
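A minimal sketch of the shape such a hierarchy could take, assuming plain virtual functions. `Loss`, `SquaredLoss` and `CrossEntropyLoss` are hypothetical names, and the derivatives are written by hand here; presumably the differentiation is the part that would be handed over to Stan in the planned design.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical abstract loss: subclasses only provide the per-sample terms.
class Loss {
public:
    virtual ~Loss() = default;
    virtual double value(double prediction, double target) const = 0;
    virtual double derivative(double prediction, double target) const = 0;

    // Mean loss over a batch, shared by all subclasses.
    double mean(const std::vector<double>& predictions,
                const std::vector<double>& targets) const {
        assert(!predictions.empty() && predictions.size() == targets.size());
        double sum = 0.0;
        for (std::size_t i = 0; i < predictions.size(); ++i)
            sum += value(predictions[i], targets[i]);
        return sum / predictions.size();
    }
};

// 0.5 * (p - t)^2, with derivative p - t.
class SquaredLoss : public Loss {
public:
    double value(double p, double t) const override { return 0.5 * (p - t) * (p - t); }
    double derivative(double p, double t) const override { return p - t; }
};

// Binary cross-entropy for a probability p in (0, 1) and a target t in {0, 1}.
class CrossEntropyLoss : public Loss {
public:
    double value(double p, double t) const override {
        return -(t * std::log(p) + (1.0 - t) * std::log(1.0 - p));
    }
    double derivative(double p, double t) const override {
        return (p - t) / (p * (1.0 - p));
    }
};
```

A user-defined loss would only need to override `value` and `derivative`; the batch logic in the base class is reused.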

Do not hesitate to ask if anything is missing or unclear.

Cheers,
Fernando.


Re: [shogun] Quick Question on SGVector

2018-05-12 Thread Fernando J . Iglesias García
When passing SGVector (or any non-primitive type) as an argument, I think
it is a good idea to use either const reference (for input or read-only
parameters) or pointer (for input and output parameters). In this way it is
clear from the calling site whether a method/function will modify its
arguments.

A bit more on this:
https://google.github.io/styleguide/cppguide.html#Reference_Arguments
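A small illustration of the convention, using std::vector<double> as a stand-in for SGVector so it compiles without Shogun; `sum` and `scale` are hypothetical helpers.

```cpp
#include <cassert>
#include <vector>

// Read-only input: const reference, so callers know it is not modified.
double sum(const std::vector<double>& v) {
    double s = 0.0;
    for (double x : v) s += x;
    return s;
}

// Input/output argument: pointer, so the call site reads scale(&v, 2.0)
// and the possible modification is visible where the function is called.
void scale(std::vector<double>* v, double factor) {
    assert(v != nullptr);
    for (double& x : *v) x *= factor;
}
```

At the call site, `scale(&v, 2.0)` makes the possible modification visible, while `sum(v)` reads like a pure query.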

On 12 May 2018 at 11:47, Heiko Strathmann 
wrote:

> BTW if you see pointers to SGVectors being passed around, that should
> probably be changed. Can you share the locations of it?
> H
>
> 2018-05-12 10:46 GMT+01:00 Heiko Strathmann :
>
>> Hi
>>
>> I cc the list as others might have the same question.
>>
>> SGVector (same for SGMatrix) is a memory wrapper for C/C++ arrays that
>> implements an automatic reference counter and therefore shared ownership.
>> If you assign the vector to another one using operator= (or, similarly,
>> pass it by value), then a new SGVector structure is created, but it
>> points to the same memory. Once all SGVector instances that point to the
>> memory block are destroyed, the memory block is freed.
>> This is why passing it by value is relatively efficient (it doesn't copy
>> the actual memory), and in particular it allows shared ownership, such as
>> SGVector get_vector() { return m_vector; }
>> which returns a copy of the SGVector instance that points to the
>> same memory that m_vector does.
>> This would allow me to modify a member variable as
>> auto vec = obj->get_vector();
>> vec[0] = 5; // now the memory block m_vector points to is changed
>>
>> Now, sometimes helper methods that accept vectors are called many times
>> (in a loop). Copy-assigning the SGVector doesn't copy the memory block,
>> but it does create a new instance every time, which can be slow. This is
>> why we sometimes pass around references; in particular, const references
>> make a speed difference (see linalg).
>> There should never be the need to pass a pointer to an SGVector to a
>> helper method.
>>
>> Summary answer to your questions:
>> Passing SGVector by value creates shared ownership of a fixed memory
>> block.
>> With respect to the memory block, passing SGVector by value allows for
>> what you called in-place updates (of the memory block).
>> Whenever you pass vectors to a helper method, you don't need to share
>> ownership, and therefore you can pass it by reference (if you want to
>> modify it) or even by const reference (if you just want to read it).
>> Obviously, shared ownership comes at a (small) cost, which you might want
>> to avoid in low-level methods (say linalg).
>>
>> Hope that helps!
>> H
>>
>>
>> 2018-05-12 2:54 GMT+01:00 Elfarouk Harb :
>>
>>> Hi Heiko,
>>>
>>> Just a quick question about something that is confusing me a bit.
>>> Sometimes, if there is a need to do an in-place update of a variable, I
>>> see that SGVector is passed by pointer, meaning:
>>> SGVector* ref_vars. However, sometimes they are passed by value to
>>> functions which do in-place updates: SGVector ref_vars (example:
>>> http://shogun-toolbox.org/api/latest/DescendUpdaterWithCorrection_8cpp_source.html#l00054
>>> variable_reference is passed by value, but the function acts on it as if
>>> it were passed by reference). Is SGVector somehow being passed by
>>> reference in both ways, or am I missing something?
>>>
>>> Thanks a lot,
>>> Elfarouk
>>>
>>
>>
>
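The semantics Heiko describes can be mimicked with std::shared_ptr. The sketch below uses hypothetical `SharedVec` and `Object` classes; it is not Shogun's implementation (SGVector has its own reference counter), but it shows the same behaviour: copies of the wrapper share one memory block, writes through any copy are visible through all of them, the block is freed when the last copy is destroyed, and a const reference reads the data without touching the reference count.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for SGVector: copying the wrapper copies a pointer
// and bumps a reference count, never the underlying memory block.
class SharedVec {
public:
    explicit SharedVec(std::size_t n)
        : data_(std::make_shared<std::vector<double>>(n, 0.0)) {}
    double& operator[](std::size_t i) { return (*data_)[i]; }
    double operator[](std::size_t i) const { return (*data_)[i]; }
    long ref_count() const { return data_.use_count(); }

private:
    std::shared_ptr<std::vector<double>> data_;
};

class Object {
public:
    Object() : m_vector(3) {}
    // Returns a copy of the wrapper that points to the same memory block.
    SharedVec get_vector() const { return m_vector; }
    double first() const { return m_vector[0]; }

private:
    SharedVec m_vector;
};

// Read-only helper: a const reference avoids the reference-count update.
double head(const SharedVec& v) { return v[0]; }
```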


[shogun] Fwd: Update 3

2018-05-24 Thread Fernando J . Iglesias García
-- Forwarded message -
From: Elfarouk Harb <eyfmh...@gmail.com>
Date: Wed, May 23, 2018, 22:18
Subject: Update 3
To: Fernando J. Iglesias García <fernando.iglesi...@gmail.com>


Hey Fernando,

Hope you're having a nice day.

For today, I did:

1) Addressed *all* issues in the cookbook and updated my Medium blog post
2) Addressed *some* issues in the new PR (for the cost function)
3) Started planning how I'll approach changing the API of the
minimizers to use Stan
4) Found a bug with the new API and opened a new issue.

Tomorrow I will:
1) Finish addressing the problems in the new PR (after talking with you,
Heiko or Viktor on IRC to verify a few things)
2) Start writing some code for the API of the minimizer that will use Stan.
3) Help port a few examples to the new API. I will check the last few pull
requests by Wuwei Lin and see if I can send in a PR to help with porting
changes.
4) Address new change requests in the cookbook, if any.

Have a nice day,
Elfarouk


[shogun] Fwd: Cookbooks Needed?

2018-06-26 Thread Fernando J . Iglesias García
-- Forwarded message -
From: Fernando J. Iglesias García 
Date: Tue, 26 Jun 2018 at 13:12
Subject: Re: [shogun] Cookbooks Needed?
To: Elfarouk Harb 


Have you tried querying the project's issues on GitHub? This returned
some results:
https://github.com/shogun-toolbox/shogun/issues?utf8=%E2%9C%93=is%3Aissue+is%3Aopen+cookbook

Also, if you have been told in a PR that a certain cookbook is not
necessary, then I am guessing that happened because the new cookbook in the
PR was essentially a copy+paste or very similar to an existing cookbook.
Did you check whether there is anything relevant not yet covered by the
cookbooks? You could use the list of example tests as a reference.

On Tue, 26 Jun 2018 at 01:10, Elfarouk Harb  wrote:

> Hi Everyone,
>
> Is there a list of cookbooks that are actually needed? I want to write a
> new cookbook but I am not sure for which class (since in previous PRs,
> cookbooks for both CMinimumSquareLoss and CDistance weren't needed).
>
> Thanks,
> Elfarouk
>


[shogun] Friendly blog review

2018-08-07 Thread Fernando J . Iglesias García
Hi Shubham,

For the GSoC wrap-up, I have been taking a look at your blog. Very good
work with the writing, it is nice and clear!

In the feature dispatching post, one thing that I would really like to see
is the connection between CRTP and dynamic+static polymorphism (aka
virtual+template). The latter comes into play from having a class
hierarchy (e.g. Features, Machine) and the desire to make some child
classes aware of specialized numerical types with templates.

It would also be nice to explain the difference (and the reason for it)
between the CRTP as presented on Wikipedia (with one template argument)
and Shogun's (with two).
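A minimal sketch of how CRTP can sit on top of a virtual class hierarchy, under the assumption (not checked against Shogun's code) that the second template argument is the base class the mix-in derives from, which would let the CRTP layer sit in the middle of an existing hierarchy instead of at its root as in the one-argument Wikipedia form. The virtual `train(Features*)` gives dynamic polymorphism; the CRTP layer downcasts `this` and calls a member template `train_dense<T>` in the child, which gives static polymorphism over the numeric type. All class names are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical hierarchy: dynamic polymorphism over features and machines.
struct Features {
    virtual ~Features() = default;
};
template <class T>
struct DenseFeatures : Features {
    std::vector<T> data;
};

struct Machine {
    virtual ~Machine() = default;
    virtual std::string train(Features* f) = 0;
};
struct LinearMachine : Machine {};

// CRTP with two template arguments: Derived is the concrete algorithm and
// Base is the existing class the mix-in inserts itself under.
template <class Derived, class Base>
struct DenseDispatch : Base {
    std::string train(Features* f) override {
        auto* self = static_cast<Derived*>(this);
        if (auto* d = dynamic_cast<DenseFeatures<double>*>(f))
            return self->template train_dense<double>(d);
        if (auto* d = dynamic_cast<DenseFeatures<float>*>(f))
            return self->template train_dense<float>(d);
        return "unsupported feature type";
    }
};

// The algorithm only writes a member template (static polymorphism);
// DenseDispatch instantiates it for each supported element type.
struct MyRegression : DenseDispatch<MyRegression, LinearMachine> {
    template <class T>
    std::string train_dense(DenseFeatures<T>*) {
        return std::is_same<T, double>::value ? "double" : "float";
    }
};
```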

Everyone, what do you think?

Cheers,
Fernando.


Re: [shogun] Final post and personal page

2018-08-13 Thread Fernando J . Iglesias García
Hi Wuwei,

Very nice job with the final blog post, it gives a great overview of your
project.

A couple of small suggestions:
- I found the pipeline section the strongest; maybe it is worth moving
it a bit higher up in the post to showcase it even more;
- in the linalg section, I am missing some motivation for why you went
from compile-time to runtime type information.

Congratulations!
Fernando.

On Sat, Aug 4, 2018, 08:36 Wuwei Lin  wrote:

> hi,
>
> here are my GSoC final post
> http://wuwei.io/post/2018/08/gsoc18-final-review/
> and personal page http://wuwei.io/post/2018/08/gsoc18-summary-page/
>
> feel free to review and give feedback.
>
> thanks.
> wuwei
>


Re: [shogun] [GSoC] Refine the TensorBoard integration

2018-04-19 Thread Fernando J . Iglesias García
Hi Albert,

Thanks a lot for your e-mail. Your interest is very much appreciated.

I think your idea is reasonable and makes sense. One of Shogun's child
projects, Tapkee, is in fact built that way: it is a standalone
header-only library, and additionally it is included in Shogun. I am
guessing this could be the reason why tflogger is a separate repo as well.

It would be a good idea to include Giovanni and Viktor in this
conversation. In fact, it is preferable to keep this type of discussion in
the open, on the mailing list. You can of course add anybody to the
recipients list directly when you consider it appropriate. In any case,
the shogun list should at least be in CC.

A couple of follow-up questions.
- What would you like to refactor in tflogger?
- What do you find problematic in the proposal?

Of course, feel free to open an issue and document your initiative.

Cheers,
Fernando.

On 19 April 2018 at 15:45, Jinquan Sun  wrote:

> Hi all:
>  I am Albert Sun (github url: https://github.com/sunalbert). I have
> applied the GSoC2018 project about "Inside the black box" in Shogun.
>  Recently, I investigated feasible ways to refine the TensorBoard
> integration. The following two projects may provide some ideas for us:
>  1. dmlc/tensorboard
>  2. awslabs/mxboard
> Both projects aim to deliver a visualization solution for MXNet users.
> Compared to dmlc/tensorboard, awslabs/mxboard can support text, audio,
> curve and embedding visualization. In my initial proposal, I proposed to
> integrate the TensorBoard APIs at a higher wrapper level. Thanks to the
> above projects, I now think we can rewrite the logging toolkit in C++ so
> that it can handle various data types. Besides, we can make it support
> multiple languages with the help of SWIG. The ideal result is that the
> toolkit does not depend on Shogun, so that we can provide it for various
> machine/deep learning toolkits. This would be very cool. As a start, I'd
> like to refactor shogun-toolbox/tflogger.
>  This proposal is rough and maybe even problematic. I look forward to your
> suggestions. If you think the proposal is feasible, I'd like to open a new
> issue on GitHub listing its details.
>
> Best,
> Albert Sun
>


Re: [shogun] [GSoC][Heiko] Regarding the project "Inside that black box"

2018-03-23 Thread Fernando J . Iglesias García
Hello Albert Sun,

Thank you for reaching out and welcome!

Issues labelled "good first issue", like the one you mentioned, are just
recommendations for getting started hands-on.

We look forward to reading your proposal.

Cheers,
Fernando.

On 22 March 2018 at 15:31, Jinquan Sun  wrote:

> Hey there!
>
> I am Albert Sun (github url: https://github.com/sunalbert). I am really
> interested in Shogun and would like to contribute to it as part of GSoC
> 2018.
>
> I major in computer vision and use several deep learning frameworks
> (e.g. TensorFlow, PyTorch, etc.; BTW, I once contributed to a dynamic
> deep learning framework, DyNet) in my daily work. Some techniques to make
> the training process controllable and visible are widely used in these
> popular deep learning frameworks. I really want to contribute to Shogun
> by introducing the same features into it.
>
> I am writing my proposal as requested in the wiki and will submit it as
> soon as possible.
>
> By the way, is it necessary to fix issue 3889
> (https://github.com/shogun-toolbox/shogun/issues/3889) as an entrance
> task? I have created PRs fixing other issues (I hope they are also
> helpful for applying to the project "Inside that black box").
>
> Thank you for your patience.
> Albert Sun
>


Re: [shogun] [shogun-team] design idea for feature type dispatching

2018-06-29 Thread Fernando J . Iglesias García
Hello,

I think this looks good!

I like that it is flexible in the sense of supporting algorithms with and
without the dispatcher.

Just to be sure: CAlgorithm, CKernelAlgorithm, and CKernelAlgorithm2 are
showcasing three different ways of algorithm implementation, right? Namely,
with only DenseFeatures, with Dense and String Features, and with no
dispatching, respectively.

Cheers,
Fernando.

On Fri, 29 Jun 2018 at 11:02, Heiko Strathmann 
wrote:

> This mix-in idea doesn't work, as it turned out in discussions with
> Fernando and Shubham.
>
> Here is an attempt to do the same thing using a macro. Slightly ugly but
> well within Shogun's general style
> https://gist.github.com/karlnapf/2dd6a23001242cf01a45c99103b736d6
>
> On Tue, 26 Jun 2018 at 18:05, Heiko Strathmann <heiko.strathm...@gmail.com> wrote:
>
>> The main thing is that we still want templated specialisations of the
>> train methods. Not sure that works with this other approach, but I would
>> have to check...
>> I think we could think of the proposed solution as a mix of double
>> dispatching (for the feature type: string, dense, etc.) and mix-ins for
>> the template method overloading... or?
>>
>> On Tue, 26 Jun 2018 at 12:07, Fernando J. Iglesias García <
>> fernando.iglesi...@gmail.com> wrote:
>>
>>> An alternative to using mix-ins for dispatching:
>>> https://en.wikipedia.org/wiki/Double_dispatch#Double_dispatch_in_C++
>>> In a nutshell, the idea is that on calling CMachine::train(CFeatures* f)
>>> we use some method in the hierarchy of CFeatures that works out the
>>> "downcast". It feels a bit like Shogun's obtain_from_generic, though a key
>>> difference is that obtain_from_generic is static.
>>>
>>> What do you think?
>>>
>>> On Mon, 25 Jun 2018 at 17:55, Heiko Strathmann <
>>> heiko.strathm...@gmail.com> wrote:
>>>
>>>> feedback welcome.
>>>>
>>>> https://gist.github.com/karlnapf/95a9c72a642d61ec268a39407f8761b2
>>>>
>>>> Problem:
>>>> currently, dispatching happens inside train_machine of algorithm
>>>> specializations: redundant code, error prone, might be forgotten altogether.
>>>> Actually, only LARS and LDA do this; most other classes do nothing, i.e.
>>>> they crash/error when something other than float64 is passed (bad).
>>>> LARS/LDA solved this by making the train method templated and then
>>>> dispatching inside train_machine.
>>>>
>>>> Solution we propose (result of a discussion with Viktor last week,
>>>> refined in a meeting with Giovanni and Shubham today): use mixins to keep
>>>> dispatching code in algorithm specializations; this allows for templated
>>>> train methods and gives a compile error if they are not implemented by the
>>>> algorithm author. Yet we can centralize the code to make algorithm
>>>> specialisations nicer and less error prone. See gist.
>>>> We will have to think about how all this works with multiple feature types
>>>> (string, etc.), and also how multiple mix-ins can be combined (e.g. LARS is
>>>> a LinearMachine, IterativeMixIn, DenseTrainMixIn, and it would be the
>>>> 'iteration' method that would be templated).
>>>> Shubham will draft a compiling minimal example for this.
>>>>
>>>>
>>>> First attempt (doesn't work):
>>>> Move dispatching into base class CMachine. Call templated train methods
>>>> there which are overloaded in subclasses. BUT we cannot have virtual
>>>> templated methods, so this won't fly.
>>>>
>>>>
>>>> H
>>>>
>>> ___
>>>> shogun-team mailing list
>>>> shogun-t...@shogun-toolbox.org
>>>> https://nn7.de/cgi-bin/mailman/listinfo/shogun-team
>>>>
>>> --
>> Sent from my phone
>>
>
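A minimal sketch of the double-dispatch idea quoted above, with hypothetical class names: the first virtual call (`dispatch_train`) resolves the concrete feature class, which then calls an overloaded `train_on` on the machine, so the second dispatch on the feature type needs no downcast in the machine. Since virtual member templates are not allowed, the overloads for each feature type still have to be listed; an algorithm can forward them to a single member template.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <vector>

template <class T> struct DenseFeatures;
struct StringFeatures;

// Second dispatch target: one virtual overload per concrete feature type.
struct Machine {
    virtual ~Machine() = default;
    virtual std::string train_on(DenseFeatures<double>& f) = 0;
    virtual std::string train_on(DenseFeatures<float>& f) = 0;
    virtual std::string train_on(StringFeatures& f) = 0;
};

// First dispatch: the virtual call resolves the concrete feature class,
// where *this has its static type and selects the matching overload.
struct Features {
    virtual ~Features() = default;
    virtual std::string dispatch_train(Machine& m) = 0;
};

template <class T>
struct DenseFeatures : Features {
    std::vector<T> data;
    std::string dispatch_train(Machine& m) override { return m.train_on(*this); }
};

struct StringFeatures : Features {
    std::vector<std::string> data;
    std::string dispatch_train(Machine& m) override { return m.train_on(*this); }
};

// An algorithm forwards the dense overloads to one member template.
struct MyMachine : Machine {
    std::string train(Features& f) { return f.dispatch_train(*this); }
    std::string train_on(DenseFeatures<double>& f) override { return train_dense(f); }
    std::string train_on(DenseFeatures<float>& f) override { return train_dense(f); }
    std::string train_on(StringFeatures&) override { return "string"; }

private:
    template <class T>
    std::string train_dense(DenseFeatures<T>&) {
        return std::is_same<T, double>::value ? "dense<double>" : "dense<float>";
    }
};
```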


Re: [shogun] Update

2019-08-16 Thread Fernando J . Iglesias García
Hi Ahmed,

Thanks a lot for the update. The findings about enabling the R interface
are valuable.
Hope you had fun and learned something cool about SYCL.

Echoing the GSoC admins' e-mails, feel free to send us the URL of the work
package they mention whenever you like. I think all these investigations
you have been doing (such as this one about R, the function maps to make
the developer's life easier, etc.) should be part of it, on top of all the
other things you have been busy with during the project, of course.
Investigation results are very valuable, as they can become the origin of
entrance issues for new developers, new features, and even upcoming GSoC
projects!

Cheers,
Fernando.

On Fri, 16 Aug 2019 at 08:26, Ahmed Essam  wrote:

> Hi all,
>
> Just a quick update. I tried reactivating the R interface, and almost no
> tests passed. Here are some of the problems that I encountered:
>
> 1. There are no enums in R, so they're mapped to strings. This doesn't
> work when the argument is an int (specifically machine_int_t) instead of
> the enum, as in the factories.
>
> 2. The typemaps are messed up, especially with overloaded functions such as
> "put". For example, calling "put" with any numeric value maps it to
> SGVector instead. I believe this comes from the fact that the type of an R
> vector is the same as the type of its primitive.
>
> 3. After disabling SGVector typemap, the typemaps are still messed up. For
> example, calling "put" with a floating point number sets the value to an
> integer instead.
>
> 4. Just enabling the R interface and testing it on the CI reveals a
> problem with the path to the library's location:
> Error in library(shogun) : there is no package called 'shogun'
> https://dev.azure.com/theartfulae/shogun/_build/results?buildId=324
>
> I believe that all these problems should be easily fixable.
>
> I also looked into SYCL with Eigen (listened to some talks about SYCL and
> played with toy examples). I don't think there is enough time left to
> produce results about its potential for linalg.
>
> Finally, I will send a draft of the final blog post tomorrow.
>
> Thanks,
> Ahmed Essam.
>


Re: [shogun] Final Blogpost

2019-08-19 Thread Fernando J . Iglesias García
Hi Ahmed,

Thanks a lot for yet another round of good posts!

I have a small comment/question about the summary post
https://medium.com/@theartful.ae/gsoc-19-with-shogun-project-review-c3cebddd1c:
in the linalg refactor, expression templates would tackle a different
issue from the one you describe in the previous paragraph (the one with your
design with std::visit and std::variant), right?

Great job!

Cheers,
Fernando.

On Sat, 17 Aug 2019 at 18:43, Heiko Strathmann 
wrote:

> Nice one! I like them both :) Actually, the second one is especially
> interesting because it is non-technical.
>
> I will send a round of feedback tomorrow, have some suggestions.
>
>
>
> On Sat, 17 Aug 2019 at 13:52, Ahmed Essam  wrote:
>
>> Hi all,
>>
>> I wrote two blogposts about the work I've done this summer, and the
>> experience I've had.
>> Here are the links:
>>
>> https://medium.com/@theartful.ae/gsoc-19-with-shogun-project-review-c3cebddd1c
>>
>> https://medium.com/@theartful.ae/my-gsoc-experience-with-shogun-d463670d901b
>>
>> I think, however, that I might need to scrap and rewrite the second one, as it
>> doesn't follow a logical order and doesn't set the right tone.
>>
>> Thanks,
>> Ahmed Essam.
>>
> --
> Sent from my phone
>


Re: [shogun] Third Term of GSoC

2019-07-29 Thread Fernando J . Iglesias García
Hi Ahmed,

Thanks a lot for your feedback and sharing your thoughts.

About your suggestion of learning an algorithm, improving/optimizing it,
etc.: IMHO, it should be possible to learn the math background and the
algorithm internals as you do the job. This of course depends (a lot) on
the algorithm itself, but maybe don't discard this option right away,
especially when it motivates you.
Have you taken a look at the issues on GitHub? Maybe there's something in
there that could get you started with learning an algorithm while we also
get some work done solving the issue.

About the list, let's go for the PR for views first, then.

Another item you did some nice work on was the linalg design; how do you
feel about it?

Cheers,
Fernando.

On Sun, 28 Jul 2019 at 00:46, Ahmed Essam  wrote:

> Hello all,
>
> Thanks for your constructive criticism about last term. It was a little
> messy, and I didn't do my best. I will be more communicative and won't skip
> blog posts :")
>
> Regarding this term, I need ideas for what to work on. What I'm currently
> interested in is learning about the implemented algorithms and maybe
> optimizing them if needed. But this is beyond the scope of my project, and
> I lack the mathematical background, so maybe not this summer :")
> I don't feel particularly passionate about any specific item of my
> original proposal, but here is the list:
> 1. Immutable features:
> - Subsets: I will come up with a PR for views soon (my mental picture of
> the approach I had in mind isn't clean yet).
> - Internal data access: Replacing 'get_feature_vector' and similar methods.
> - Preprocessors: Should we have on the fly methods? Should they be dropped
> from the features interface and depend instead on Transformer interface?
> 2. Stateless Distance API
> 3. Stateless Kernel API: The cached kernel bit is nearly finished. Then
> there is the set of "optimization" methods like "init_optimization" and
> "delete_optimization". This should be solvable using multiple inheritance
> (which won't be a problem with SWIG since it's not exposed).
> 4. StringFeatures cleanup: Replacing SGString (nearly finished). Replacing
> raw pointers. Creating a new class for embedded string features.
> 5. SGObject Mixins: multiple inheritance won't work. We could flatten the
> inheritance, but doing it properly would require variadic templates, and
> that doesn't work with SWIG.
>
> If you have any other ideas, however, I would probably prefer them :")
>
> Thanks,
> Ahmed Essam.
>
>


Re: [shogun] spdlog

2019-07-23 Thread Fernando J . Iglesias García
Hey Ahmed,

On Mon, 22 Jul 2019 at 23:33, Ahmed Essam  wrote:

> Thanks for the feedback!
>
> spdlog works well with multithreaded code. In fact, in the PR I currently
> use an async logger (as Viktor suggested), which does the logging in a
> different thread altogether.
>

I think Heiko's point was regarding the global Shogun object you mentioned
in the first e-mail.


> Regarding the exposure of spdlog, I can add the functionality for
> directing streams to files, and we might discuss what other functionality
> is expected. However, exposing spdlog would give a huge amount of
> flexibility, as you can easily do whatever you want, thanks to their
> amazing interface and built-in sinks.
>
> Note that getting rid of macros means getting rid of source location
> (line, function, and file name).
>

I think source location is actually quite important for logging. There
should be some alternative that keeps it while allowing us to get rid of
macros, if that's worth it.

About using fmtlib for "python-like syntax instead of or in addition to
printf's", what do you think?


>
>
> On Mon, Jul 22, 2019 at 11:24 PM Heiko Strathmann <
> heiko.strathm...@gmail.com> wrote:
>
>> Hi!
>>
>> Great initiative! See below
>>
>> On Mon, 22 Jul 2019 at 19:49, Ahmed Essam <
>> theartful...@gmail.com> wrote:
>>
>>> Regarding using spdlog, here are the assumptions:
>>> 1. There will be only one global SGIO object for logging.
>>>
>> Similar problems as for random: this means that multithreaded code will
>> also use this object for logging, and we will need some sort of critical
>> section around the log output. I wonder how that would affect performance.
>> I also wonder how other (multithreaded) libs do this. Some minimal research
>> might be good here.
>>
>>
>>> 2. Two streams are used: stdout (for all log levels except for
>>> MSG_ERROR) and stderr, and the user can direct any of them.
>>>
>> That sounds very reasonable
>>
>>> 3. spdlog will be exposed. That is, the user will use spdlog sinks to
>>> redirect streams. I don't see a reason to wrap them or add options
>>> to manipulate them further. Exposing spdlog is more flexible, so we can use
>>> "dist_sink", for example, to redirect to multiple sinks.
>>>
>> I am not too sure about this. Backend libraries tend to change over the
>> years, and explicitly adding them to the codebase usually causes
>> headaches later on. So I'd actually prefer some plugin-like thing, but I
>> can probably be convinced ;)
>>
>>
>>>
>>> Some questions:
>>> 1. Are these assumptions reasonable?
>>> 2. Should we use fmtlib? I currently format using sprintf, but fmtlib is
>>> supposedly similar in runtime speed, and adds a little compile-time
>>> overhead (header library with compile-time format checking!). The downside
>>> (for us) is that it uses Python-like formatting.
>>>
>> Seems fine to me
>>
>>
>>> 3. Should we get rid of macros?
>>>
>> I think that would be nice. I guess they were used to switch off things
>> at compile time. We don't really do that anymore.
>>
>>
>


Re: [shogun] Regarding the Time Series GSoC project at Shogun

2020-02-27 Thread Fernando J . Iglesias García via shogun-list
Hi Rijul,

Thanks for reaching out!
I am including our mailing list and Markus in CC.

It is great that you are already familiar with time series data. To start
contributing to Shogun, the very first step is to send at least one
patch. This can be anything, related to your project of interest or on a
completely different topic (like fixing a known issue or adding a new
example showcasing something that appeals to you).
If you need inspiration for an initial task, you can check out our list of
entrance issues on GitHub.

Note that contributing during the GSoC application period is a requirement.
We want you to get familiar with general ways of working in open source and
with Shogun.

For the time series project itself, I suggest that you read the project
description carefully, get familiar with the concepts described, explore the
links, and of course feel free to ask any questions.

Looking forward to hearing from you.

Cheers,
Fernando.

On Tue, 25 Feb 2020 at 20:58, Rijul Ganguly 
wrote:

> Respected Mr. Garcia,
>
> I am a 3rd-year undergraduate student pursuing a Computer Science degree
> at Birla Institute of Technology and Science (BITS) - Pilani, Goa campus,
> India.
>
> I would be interested in contributing to the design of the Time Series API
> as mentioned on the project page. I have a basic idea of how pandas deals
> with time series and of how object-oriented APIs work. I do not have any
> experience with the Shogun API. Can you please tell me the best way to get
> familiar with it, and what other requirements you have for me?
>
> Thanking you
> Rijul Ganguly
>
>


Re: [shogun] General Typed Testing Proposal

2020-01-16 Thread Fernando J . Iglesias García via shogun-list
Hi Ahmed,

Great to hear from you! Apologies for the long-delayed reply; the e-mail
was archived directly and I didn't notice it until now.

I like the idea.

How would you like to proceed?
We are currently in the GSoC 2020 application period, and we will apply as a
mentoring org. Would you like to participate with Shogun in 2020 as a
student or even as a mentor?
We could make your generic testing with introspection idea a project
proposal, or part of one.

Cheers,
Fernando.

On Tue, 10 Dec 2019 at 18:44, Ahmed Essam via shogun-list <
shogun-list@shogun-toolbox.org> wrote:

> Hello all,
>
> Currently, there is no systematic testing of classes that does not require
> the manual enumeration of new subclasses. An attempt to make a testing
> framework that can automatically test class families has been made in PR
> #4712.
>
> I want to make progress towards a systematic framework for class testing
> with the following goals:
> 1. Provide a reflection-like metaprogramming method to enumerate the class
> hierarchy.
> 2. Use this generated information to create static typed tests. This
> differs from the previous PR in that (a) we don't need to update class
> lists manually, and (b) it provides a neat way to specialize tests for the
> type being tested.
>
> We can do this by extending the "class_list.py" script to generate the
> whole class hierarchy, and using a template language to answer type queries.
> Here is an example of what that would look like:
> IterativeMachine_unittest.cc
>
> I have a working example locally that uses the exact same syntax. This
> approach has one shortcoming: the parsing is done by an ad-hoc Python
> script. A possible solution is to automate the parsing with a tool like
> Jinja, which has been used before in "clone_unittest.cc".
>
> Since there are multiple ad-hoc Python scripts that parse specific files,
> I think we can extend this proposal to other parts of the code base to
> unify template file generation.
>
> Thanks,
> Ahmed Essam.
>