Hi!

We would like to pass customized distance metrics to Canopy and KMeans. We
would like to define weights, parameters, objects, so that when Canopy or
KMeans computes a distance on two given vectors it can access these extra
parameters. We are using Mahout 0.7.

Basically we are looking for the same features as
WeightedDistanceMeasure, except that we do not know how to
specify the "weightsFile" and, more importantly, we need it to work
with customized Configuration objects: if Mahout internally executes
a simple "new Configuration()" and expects to find the appropriate
values there, this solution will not work.

Our first attempt, passing instantiated DistanceMeasure objects to
CanopyDriver.run() or KMeansDriver.run(), did not work: it seems the
distance-measure objects are re-created with the no-argument constructor
on each node, so it is not possible to customize them, nor to use
getters/setters, nor to use static variables, because their values are
lost when the objects are transmitted to the nodes.
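
To make the problem concrete, here is a minimal sketch of what we believe happens. It is only an illustration under our own assumptions: WeightedMeasure and RecreationDemo are hypothetical stand-ins, not Mahout classes, and the reflective re-creation is our guess at the mechanism, not code taken from Mahout.

```java
// Hypothetical stand-in for a distance measure; NOT a Mahout class.
class WeightedMeasure {
    // Default state set up by the no-argument constructor.
    private double[] weights = {1.0, 1.0};

    WeightedMeasure() {}

    void setWeights(double[] w) { this.weights = w.clone(); }

    double[] getWeights() { return weights.clone(); }
}

class RecreationDemo {
    // Assumption: only the class name reaches the node, where a fresh
    // instance is built through the no-argument constructor.
    static WeightedMeasure recreate(WeightedMeasure original) {
        try {
            String className = original.getClass().getName();
            return (WeightedMeasure) Class.forName(className)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not re-create measure", e);
        }
    }

    public static void main(String[] args) {
        WeightedMeasure onDriver = new WeightedMeasure();
        onDriver.setWeights(new double[] {0.2, 5.0});

        // The customized weights do not survive the re-creation.
        WeightedMeasure onNode = recreate(onDriver);
        System.out.println(java.util.Arrays.toString(onNode.getWeights()));
        // prints [1.0, 1.0]
    }
}
```

Whatever we set through setWeights() on the driver side is gone on the re-created instance, which matches what we observe.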

We later thought about using the Configuration object, setting
properties/values with set/get. For this it seems we need to define our
own abstract "WeightedDistanceMeasure" class, as the Configuration object
is not directly accessible from derived classes (for instance from
EuclideanDistanceMeasure).
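
A minimal sketch of the idea, under stated assumptions: java.util.Properties stands in for the Hadoop Configuration object so that the example runs without any dependency, and the class name, the configure() method and the property key "example.distance.weights" are all made up for illustration. The point is only that the weights travel as a string property and are restored after the object has been built with its no-argument constructor.

```java
import java.util.Properties;

// Hypothetical sketch; Properties stands in for Hadoop's Configuration.
class ConfiguredWeightedMeasure {
    // Made-up property key, used only in this example.
    static final String WEIGHTS_KEY = "example.distance.weights";

    private double[] weights = new double[0];

    ConfiguredWeightedMeasure() {}

    // Driver side: encode the weights as a comma-separated string property.
    static void storeWeights(Properties conf, double[] weights) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < weights.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(weights[i]);
        }
        conf.setProperty(WEIGHTS_KEY, sb.toString());
    }

    // Node side: called after no-argument construction to restore the weights.
    void configure(Properties conf) {
        String encoded = conf.getProperty(WEIGHTS_KEY, "");
        if (encoded.isEmpty()) {
            weights = new double[0];
            return;
        }
        String[] parts = encoded.split(",");
        weights = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            weights[i] = Double.parseDouble(parts[i]);
        }
    }

    // Weighted Euclidean distance; a missing weight counts as 1.0.
    double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double w = i < weights.length ? weights[i] : 1.0;
            double d = a[i] - b[i];
            sum += w * d * d;
        }
        return Math.sqrt(sum);
    }
}
```

This only works if the same configuration object (or a copy of it) actually reaches every place where the measure is re-created, which is exactly what we are unsure about in Mahout.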

More importantly, it seems that for Canopy the configuration is lost
between the mappers and the reducers. Specifically, in the function
clusterData(), line 372 of the
file core/src/main/java/org/apache/mahout/clustering/canopy/CanopyDriver.java,
it seems that the call to run() does not take the "conf"
parameter into account. Should it not be ClusterClassificationDriver.run(conf, ...)?
We tested a modified version of the code with this change and it worked for Canopy.

For KMeans, however, the call stack is harder to follow and we could not
figure out what to change.

Thank you for any help!

Cheers,







