Hi! After UIMA I want to bring up another topic: the question of feedback about suggested enhancements, for machine learning purposes. This would basically mean that the enhancement engine gets notified about what happened to a suggested enhancement.
I see that there are plans to tackle this (https://issues.apache.org/jira/browse/STANBOL-570), so I thought I'd share some findings we had with Sztakipedia on this topic. It might be useful for someone, and I want to write a white paper on this anyway, so I'd be happy to discuss it. Sztakipedia has a feedback mechanism, but I'm not going to describe it now, as I realized it needs adjustments. Instead, I'll describe how I would do it if I were starting now, and I'll try to translate it into Stanbol concepts.

== Enhancement statuses ==

These would be the logical statuses of an enhancement suggestion:

(0) enhancement created
(a) shown, but not acted upon yet (good to have: for how many seconds it was displayed) - the enhancement is displayed to the user
(b) accepted (+ seconds it was displayed until acceptance) - the user explicitly clicked/tapped etc. on the suggestion to accept it
(c) rejected (+ seconds) - the user explicitly clicked/tapped to reject it
(d) not acted upon and no longer shown (+ seconds it was displayed) - the suggestion is not shown anymore: shifted out by other suggestions, window closed, etc.

Life cycle: (0) -> (a) -> (b or c or d)

Most of the time it is safe to assume that an enhancement reaches (a) by default once it has been returned. However, some clients might want to filter out certain enhancements so that they are never displayed. I'm pretty sure you will see this phenomenon with Stanbol: CMS integrators will download and start Stanbol and just customize and filter what comes out of it, because for them it is much easier to make adjustments on the client side than to touch Stanbol internals. But if some enhancements are filtered out by a rule we don't know about, that corrupts the feedback. So yes, I think there should be an (a) callback as well. I think (b), (c) and (d) are straightforward. So there are potentially two status changes on which a callback should be called.
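To make the life cycle above concrete, here is a minimal sketch in Java. All names here (`FeedbackLifecycle`, `nextStates`, the `Status` constants) are hypothetical illustrations of the (0)/(a)/(b)/(c)/(d) states described above, not part of any existing Stanbol API:

```java
import java.util.EnumSet;
import java.util.Set;

public class FeedbackLifecycle {

    /** The logical statuses of an enhancement suggestion: (0), (a), (b), (c), (d). */
    enum Status { CREATED, SHOWN, ACCEPTED, REJECTED, DISMISSED }

    /** Returns the statuses a suggestion may legally move to from the given one. */
    static Set<Status> nextStates(Status current) {
        switch (current) {
            case CREATED:
                // (0) -> (a): a created suggestion can only become "shown"
                return EnumSet.of(Status.SHOWN);
            case SHOWN:
                // (a) -> (b or c or d)
                return EnumSet.of(Status.ACCEPTED, Status.REJECTED, Status.DISMISSED);
            default:
                // (b), (c), (d) are terminal states
                return EnumSet.noneOf(Status.class);
        }
    }

    public static void main(String[] args) {
        System.out.println("from SHOWN: " + nextStates(Status.SHOWN));
    }
}
```

A client would fire a feedback callback on the (0)->(a) transition and again on the (a)->(b/c/d) transition, matching the two callback points argued for above.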
Below are some variables that are good to have when the callback comes in:

== Application Id ==

The application or CMS that consumes the enhancement suggestions should be identified, e.g. with a string. One might want to connect many apps to a single Stanbol endpoint, I think.

== Presentation Id ==

Some web designs are likely to "sell" more enhancements than others. So the CMS that displays the enhancement should provide a simple string name that at least locally identifies the mode of presentation. Different designs should get different names. Maybe most of the time there will be one mode of presentation per application, so this field might seem pointless. But once people start to use skins and the like, that changes.

== Out of how many? ==

I suppose you either train complete Enhancement Chains to score their engines, or Enhancement Engines individually (or both). If you want to train a particular Engine, it is good to know how many enhancements from other engines were displayed when the (a), (b), (c) or (d) callback comes in. If you are the only Engine in town, you get many more clicks than if there are dozens of suggestions. If I am correct, the Engine cannot always get this information from the ContentItem.

== Document Id ==

The document the user works on should be identified, e.g. with a string.

== User pseudonym ==

We should be able to tell which click came from which user. Not only for sophisticated user profiling (which is always useful), but for a simpler reason: users should naturally be able to reject or dismiss enhancement suggestions while they are working on a document, and the system they use must remember these decisions. Otherwise it will recommend the same, already rejected thing for the same document and the same user, which will make it look terribly stupid.

Of course, these values might be put into the ContentItem where possible, or sent as parameters with the callback.
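Gathered together, the variables above could form a single callback payload. The following is only a sketch of such a record; the class and field names (`FeedbackEvent` etc.) are my own invention for illustration, not an existing Stanbol interface:

```java
/**
 * A hypothetical feedback event, carrying the variables discussed above
 * for one status change of one enhancement suggestion.
 */
public class FeedbackEvent {
    String enhancementUri;  // the suggestion this event refers to
    String status;          // "shown", "accepted", "rejected" or "dismissed"
    long displayedMillis;   // how long the suggestion was on screen
    String applicationId;   // which app/CMS consumed the suggestion
    String presentationId;  // which design/skin displayed it
    int shownAlongside;     // "out of how many": other suggestions shown at the same time
    String documentId;      // the document the user was working on
    String userPseudonym;   // pseudonymous user identity, for remembering rejections

    @Override
    public String toString() {
        return applicationId + "/" + presentationId + ": " + enhancementUri
                + " -> " + status + " after " + displayedMillis + " ms"
                + " (shown alongside " + shownAlongside + " others)";
    }

    public static void main(String[] args) {
        FeedbackEvent e = new FeedbackEvent();
        e.enhancementUri = "urn:enhancement:example-1";
        e.status = "accepted";
        e.displayedMillis = 1200;
        e.applicationId = "example-cms";
        e.presentationId = "sidebar-skin";
        e.shownAlongside = 4;
        e.documentId = "doc-42";
        e.userPseudonym = "user-7f3a";
        System.out.println(e);
    }
}
```

Such a record would let the receiving side both update per-engine statistics (using application, presentation and "out of how many") and remember per-user, per-document rejections (using document id and user pseudonym).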
Ok, these were my views on gathering feedback about document enhancement, distilled after three major iterations of the Sztakipedia software. What do you think? Which of the above could be realized with Stanbol? Cheers, Mihály
