Hi there! My name is Paco. I'm a computer science undergraduate who is
starting to learn ML, and I'm interested in getting involved with the shogun
project to improve my knowledge of ML and of open-source project workflows.

I'm writing to ask some questions about how to start using shogun with
real-world examples (I'm interested in using it, for example, in Kaggle
competitions).

It is true that there are lots of examples showing how to use shogun, but
most of them don't explain how to preprocess a dataset before converting it
into a RealFeatures instance, and all the Python notebooks and doc examples
use numerical data from start to end...

So, I would like to know how I should proceed with shogun when I have
non-numerical data, and also when I have labels and features in the same
file.

Should I modify my dataset before I start using shogun, in order to separate
labels from features and to convert categorical data into some kind of dummy
variables, or even encode each possible value as an integer? Or could I use
shogun for this purpose?
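
To make the question concrete, here is a minimal sketch of the kind of
preprocessing I mean, written in plain Python with only the standard
library. The toy data, the column names and the split_and_encode helper are
all made up for illustration; nothing here is shogun API.

```python
import csv
import io

# Hypothetical toy dataset: one categorical feature, one numerical
# feature, and the label stored in the same file.
RAW = """color,size,label
red,1.5,1
blue,2.0,-1
red,0.5,1
green,3.0,-1
"""


def split_and_encode(text, label_col, categorical_cols):
    """Separate the label column and one-hot encode categorical columns.

    Returns (column_names, feature_rows, labels), where feature_rows holds
    one list of floats per example.
    """
    rows = list(csv.DictReader(io.StringIO(text)))
    labels = [float(r[label_col]) for r in rows]

    # Sorted category values give a stable dummy-column order.
    categories = {c: sorted({r[c] for r in rows}) for c in categorical_cols}
    feature_cols = [c for c in rows[0] if c != label_col]

    names = []
    for c in feature_cols:
        if c in categories:
            names.extend(f"{c}={v}" for v in categories[c])
        else:
            names.append(c)

    matrix = []
    for r in rows:
        vec = []
        for c in feature_cols:
            if c in categories:
                # One dummy variable per category value.
                vec.extend(1.0 if r[c] == v else 0.0 for v in categories[c])
            else:
                vec.append(float(r[c]))
        matrix.append(vec)
    return names, matrix, labels


names, matrix, labels = split_and_encode(RAW, "label", ["color"])

# One column per example (features x examples), in case the library
# expects that layout.
transposed = [list(col) for col in zip(*matrix)]
```

As far as I understand, RealFeatures wants one example per column, which is
why the sketch also builds the transposed matrix; please correct me if that
assumption is wrong.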

At university I've used pandas for this kind of task and then scikit-learn
for the algorithms (always in Python), and I think I could use pandas +
shogun in the same way. But since there are lots of IO classes, such as
CCSVFile or CStringFeatures, I think maybe I could use them instead of
pandas (because pandas only works in Python, and it would be nice to use
shogun from the other shogun-supported languages like C++ or R).

That's my question. I would like to know your opinions and, if possible, I
would like to see real-world projects that use shogun to solve problems.

Greetings  --  Francisco (Paco) Navarro Morales.
