Hi there! My name is Paco. I'm an undergraduate computer science student who is starting to learn ML, and I'm interested in getting involved with the Shogun project to improve my knowledge of ML and of open-source project workflows.
I'm writing to ask how to start using Shogun on real-world problems (for example, Kaggle competitions). There are plenty of examples showing how to use Shogun, but most of them don't explain how to preprocess a dataset before converting it into a RealFeatures instance, and all the Python notebooks and documentation examples use purely numerical data from start to finish.

So I would like to know how to proceed with Shogun when my data is non-numerical, and when labels and features are stored in the same file. Should I modify my dataset before using Shogun, separating labels from features and converting categorical data into dummy variables, or even encoding each possible value as an integer? Or can Shogun itself handle this?

At university I've used pandas for this kind of task and then scikit-learn for the algorithms (always in Python), and I think I could combine pandas + Shogun in the same way. However, since Shogun provides many IO and feature classes, such as CCSVFile or CStringFeatures, maybe I could use them instead of pandas. pandas only works in Python, whereas doing the preprocessing in Shogun would also work in the other languages Shogun supports, such as C++ or R.

That's my question. I would like to hear your opinions and, if possible, see real-world projects that use Shogun to solve problems.

Greetings,
Francisco (Paco) Navarro Morales
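For concreteness, here is a minimal sketch of the preprocessing described above (splitting labels from features, then turning a categorical column into dummy variables), written with only the Python standard library so it runs without pandas or Shogun installed. The toy CSV, its column names, and the function `load_and_encode` are hypothetical. The final transpose assumes that Shogun's dense features (RealFeatures) expect a matrix with one column per example; treat that layout as an assumption to check against the Shogun documentation for your version.

```python
import csv
import io

# Hypothetical toy dataset: the label and the features live in the same file,
# with one categorical column ("color") and one numeric column ("size").
RAW = """label,color,size
1,red,3.5
-1,blue,1.0
1,green,2.0
-1,red,0.5
"""


def load_and_encode(text, label_col="label", categorical=("color",)):
    """Hypothetical helper: split labels from features and one-hot encode categoricals."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    feature_cols = [c for c in reader.fieldnames if c != label_col]

    # 1. Separate the label column from the feature columns.
    labels = [float(r[label_col]) for r in rows]

    # 2. Collect the distinct values of each categorical column (sorted for a stable order).
    levels = {c: sorted({r[c] for r in rows}) for c in categorical}
    names = []
    for c in feature_cols:
        if c in categorical:
            names.extend(f"{c}={v}" for v in levels[c])
        else:
            names.append(c)

    # 3. One row per example: dummy (0/1) columns for categoricals, floats otherwise.
    samples = []
    for r in rows:
        vec = []
        for c in feature_cols:
            if c in categorical:
                vec.extend(1.0 if r[c] == v else 0.0 for v in levels[c])
            else:
                vec.append(float(r[c]))
        samples.append(vec)

    # 4. Transpose to (num_features x num_examples), i.e. one column per example,
    #    the layout assumed here for Shogun's dense RealFeatures.
    matrix = [list(col) for col in zip(*samples)]
    return names, matrix, labels


if __name__ == "__main__":
    names, matrix, labels = load_and_encode(RAW)
    print(names)
    print(matrix)
    print(labels)
```

The integer-code alternative would replace the dummy columns in step 3 with a single value such as `levels[c].index(r[c])`, which is more compact but imposes an artificial ordering on the categories. In pandas, `DataFrame.pop` and `pandas.get_dummies` cover steps 1 to 3; the resulting matrix would then still need converting to a float NumPy array in the layout Shogun expects.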