Could the 'metadata model' be a separate file?
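To make the separate-file idea concrete, here is a minimal Java sketch under stated assumptions. The class `ModelMetadata` and all of its methods are hypothetical and are not a Mahout API. It keeps the two things discussed below outside the classifier's own model file:
- the target-label dictionary Xiaobo asks about, which maps labels to codes and back;
- per-feature records shaped like Hector's `FeatureInfo`.

It writes through `java.io.DataOutput`, which is the same shape as Hadoop's `Writable.write(DataOutput)`. Each feature record takes 16 bytes here (two ints, two floats).

```java
import java.io.*;
import java.util.*;

/**
 * Hypothetical sidecar "metadata model", stored in its own file next to a
 * classifier. Not a Mahout API: names mirror the FeatureInfo / FeatureSetInfo
 * sketch in this thread.
 */
final class ModelMetadata {
  /** One record per feature: 2 ints + 2 floats = 16 bytes when written. */
  static final class FeatureInfo {
    final int originalIndex;
    final int internalIndex;
    final float minValue;
    final float maxValue;

    FeatureInfo(int originalIndex, int internalIndex, float minValue, float maxValue) {
      this.originalIndex = originalIndex;
      this.internalIndex = internalIndex;
      this.minValue = minValue;
      this.maxValue = maxValue;
    }
  }

  // Target dictionary: a label's position in the list is its encoded code.
  private final List<String> labels = new ArrayList<>();
  private final Map<String, Integer> codes = new HashMap<>();
  // Feature table keyed by original index; insertion order is kept for writing.
  private final Map<Integer, FeatureInfo> features = new LinkedHashMap<>();

  /** Returns the code for a label, assigning the next free code if it is new. */
  int encodeLabel(String label) {
    Integer code = codes.get(label);
    if (code == null) {
      code = labels.size();
      labels.add(label);
      codes.put(label, code);
    }
    return code;
  }

  /** Inverse lookup: the original target label for an encoded code. */
  String decodeLabel(int code) {
    return labels.get(code);
  }

  void addFeature(int originalIndex, int internalIndex, float minValue, float maxValue) {
    features.put(originalIndex, new FeatureInfo(originalIndex, internalIndex, minValue, maxValue));
  }

  /** Internal index for an original feature, or -1 if this model never saw it. */
  int internalIndex(int originalIndex) {
    FeatureInfo f = features.get(originalIndex);
    return f == null ? -1 : f.internalIndex;
  }

  /** Min/max-scales a raw value using the stored statistics. */
  double scale(int originalIndex, double value) {
    FeatureInfo f = features.get(originalIndex);
    if (f == null || f.maxValue == f.minValue) {
      return 0.0; // unknown or constant feature contributes nothing
    }
    return (value - f.minValue) / (f.maxValue - f.minValue);
  }

  /** Same shape as Hadoop's Writable.write(DataOutput), without the dependency. */
  void write(DataOutput out) throws IOException {
    out.writeInt(labels.size());
    for (String label : labels) {
      out.writeUTF(label);
    }
    out.writeInt(features.size());
    for (FeatureInfo f : features.values()) {
      out.writeInt(f.originalIndex);
      out.writeInt(f.internalIndex);
      out.writeFloat(f.minValue);
      out.writeFloat(f.maxValue);
    }
  }

  /** Reads back exactly what write() produced, in the same order. */
  static ModelMetadata read(DataInput in) throws IOException {
    ModelMetadata m = new ModelMetadata();
    int labelCount = in.readInt();
    for (int i = 0; i < labelCount; i++) {
      m.encodeLabel(in.readUTF());
    }
    int featureCount = in.readInt();
    for (int i = 0; i < featureCount; i++) {
      int original = in.readInt();
      int internal = in.readInt();
      float min = in.readFloat();
      float max = in.readFloat();
      m.addFeature(original, internal, min, max);
    }
    return m;
  }
}
```

A feature index that an older model never saw maps to -1, so that model can skip it. Because the metadata lives in its own stream, a sparse or very high-dimensional model can omit it entirely without changing the classifier's own serialization.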

On Tue, Jun 7, 2011 at 12:22 AM, Hector Yee <[email protected]> wrote:
> I've used systems before that kept a mapping from the original feature
> indices to the classifier-specific ones.
> It can be nice because you can add new features and an old model may still
> work, since the new features fall outside the range of the old mapping.
> It can also provide a place to store feature statistics (such as min / max /
> avg / std dev) for classifiers that need to normalize their features, such
> as the linear models.
>
> It could be something like this:
>
> FeatureInfo
>  int32 original_index
>  int32 internal_index
>  float min_value
>  float max_value
>
> FeatureSetInfo
>  repeated FeatureInfo features
>
> The drawback is potentially adding 32 bytes per feature, which could be
> detrimental in terms of size, especially for high-dimensional feature spaces
> (e.g. text).
> If the Writable interface could make this optional, it would work.
> Alternatively, we could give all classifiers a fixed header containing the
> common metadata, followed by the actual model itself.
>
> On Mon, Jun 6, 2011 at 3:17 AM, Ted Dunning <[email protected]> wrote:
>
>> You have to remember that mapping.  You will have created it when you
>> encoded the target variable.
>>
>> This is occasionally a nasty problem.  I have considered adding the ability
>> to record a dictionary in the classification models, but have not done so.
>>
>> What interface would you like to see?
>>
>> Hector, you might like a vote on this.  What do you think?
>>
>> Jeff, what do you think about the impact on the clustering/classification
>> unification?
>>
>> On Sat, Jun 4, 2011 at 10:39 PM, XiaoboGu <[email protected]> wrote:
>>
>> > How can I find the mapping between the original target labels and the
>> > encoded target codes?
>> >
>>
>
>
>
> --
> Yee Yang Li Hector
> http://hectorgon.blogspot.com/ (tech + travel)
> http://hectorgon.com (book reviews)
>



-- 
Lance Norskog
[email protected]
