Vasil-

Would you consider adding your estimation algorithm to this patch?
https://issues.apache.org/jira/browse/MAHOUT-563

The estimator in there now is stupid; a real one would make the Canopy
algorithms orders of magnitude more useful.

Lance

On Fri, Jan 21, 2011 at 7:16 AM, Ted Dunning <[email protected]> wrote:
> On Fri, Jan 21, 2011 at 12:39 AM, Vasil Vasilev <[email protected]> wrote:
>
>>
>> dimension 1: Using linear regression fitted by gradient descent, I find
>> the trend of the line, i.e. whether it is increasing, decreasing or flat.
>> dimension 2: Knowing the approximating line (from the linear regression), I
>> count how many times the original signal crosses this line. This
>> helps separate the cyclic data from all the rest.
>> dimension 3: The biggest increase/decrease of a single signal line.
>> This helps find shifts.
>>
>> In other words, I imposed semantics on the data to be clustered (I don't
>> know whether it is correct to do that, but I couldn't think of how an
>> algorithm could cope with the task without such additional semantics).
>>
>
> It is very common for feature extraction like this to be the key for
> data-mining projects. Such features are absolutely critical for most time
> series mining and are highly application dependent.
>
> One key aspect of your features is that they are shift invariant.
>
>
>> I also developed a small Swing application that visualizes the clustered
>> signals; it helped me experiment with the algorithms.
>>
>
> Great idea.
>
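For anyone following the thread, the three features Vasil describes above
could be sketched roughly as below. This is only an illustration under
stated assumptions, not his code: the function names are hypothetical, the
line is fitted with closed-form least squares rather than gradient descent
(both should converge to the same line), and "biggest increase/decrease" is
read as the largest change between two consecutive samples.

```python
from typing import List, Sequence, Tuple


def fit_line(signal: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept of the signal against its index.

    Assumption: closed-form fit swapped in for gradient descent.
    """
    n = len(signal)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2.0
    mean_y = sum(signal) / n
    sxy = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(signal))
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def count_crossings(signal: Sequence[float], slope: float,
                    intercept: float, tol: float = 1e-9) -> int:
    """Count how often the signal moves from one side of the line to the other.

    Samples lying on the line (within tol) are ignored.
    """
    residuals = [y - (slope * i + intercept) for i, y in enumerate(signal)]
    sides = [r > 0 for r in residuals if abs(r) > tol]
    return sum(1 for a, b in zip(sides, sides[1:]) if a != b)


def largest_step(signal: Sequence[float]) -> float:
    """Signed change between consecutive samples with the largest magnitude."""
    return float(max((b - a for a, b in zip(signal, signal[1:])), key=abs))


def extract_features(signal: Sequence[float]) -> List[float]:
    """Map one signal to the 3-dimensional vector [trend, crossings, shift]."""
    slope, intercept = fit_line(signal)
    crossings = count_crossings(signal, slope, intercept)
    return [slope, float(crossings), largest_step(signal)]
```

None of the three values depends on where in the series a pattern occurs,
only on its shape relative to the fitted line. The resulting vectors could
then be handed to the clustering step in place of the raw series.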



-- 
Lance Norskog
[email protected]
