[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2012-02-12 Thread Dan Brickley (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206416#comment-13206416
 ] 

Dan Brickley commented on MAHOUT-524:
-

Shannon informs me I'm getting this error because node IDs must be counted from 
zero. I've updated the wiki to say this more explicitly. So this JIRA can stay 
closed, phew.
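Below is a minimal sketch of renumbering node IDs so they count from zero. It assumes the input is the "i,j,value" text format used in this thread, with one affinity per line. The class ZeroBasedIds and its methods are hypothetical helpers, not part of Mahout. They assign 0, 1, 2, ... to IDs in the order each ID first appears. Keep the resulting mapping if you need to translate cluster results back, as with an int-to-URL dictionary file.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of Mahout): rewrites "i,j,value" affinity lines so that
 * node IDs form the dense, zero-based range 0..n-1.
 */
final class ZeroBasedIds {

    // original ID string -> zero-based index, assigned in order of first appearance
    private final Map<String, Integer> index = new HashMap<>();

    int idFor(String original) {
        Integer id = index.get(original);
        if (id == null) {
            id = index.size(); // next free index: 0, 1, 2, ...
            index.put(original, id);
        }
        return id;
    }

    /** Rewrites one "i,j,value" line using zero-based IDs; the value is copied unchanged. */
    String remap(String line) {
        String[] parts = line.trim().split(",");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected i,j,value but got: " + line);
        }
        int i = idFor(parts[0].trim());
        int j = idFor(parts[1].trim());
        return i + "," + j + "," + parts[2].trim();
    }

    /** Number of distinct nodes seen so far, i.e. a candidate value for -d. */
    int dimension() {
        return index.size();
    }
}
```

For example, remapping "7,12,1.0", "12,7,1.0" and "7,30,0.5" in turn yields "0,1,1.0", "1,0,1.0" and "0,2,0.5". After that, dimension() returns 3.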

> DisplaySpectralKMeans example fails
> ---
>
> Key: MAHOUT-524
> URL: https://issues.apache.org/jira/browse/MAHOUT-524
> Project: Mahout
>  Issue Type: Bug
>  Components: Clustering
>Affects Versions: 0.4, 0.5
>Reporter: Jeff Eastman
>Assignee: Shannon Quinn
>  Labels: clustering, k-means, visualization
> Fix For: 0.6
>
> Attachments: EclipseLog_20110918.txt, MAHOUT-524.patch, 
> MAHOUT-524.patch, MAHOUT-524.patch, SpectralKMeans_fail_20110919.txt, 
> aff.txt, raw.txt, screenshot-1.jpg, spectralkmeans.png
>
>
> I've committed a new display example that attempts to push the standard 
> mixture of models data set through spectral k-means. After some tweaking of 
> configuration arguments and a bug fix in EigenCleanupJob it runs spectral 
> k-means to completion. The display example is expecting 2-d clustered points 
> and the example is producing 5-d points. Additional I/O work is needed before 
> this will play with the rest of the clustering algorithms. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2012-02-11 Thread Dan Brickley (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206256#comment-13206256
 ] 

Dan Brickley commented on MAHOUT-524:
-

I just tried spectral k-means with some Wikipedia/DBpedia data (an affinity of 
1.0 for every page/topic-category URL pair in Wikipedia). The data came from 
http://downloads.dbpedia.org/3.7/en/article_categories_en.nt.bz2 and is available 
at http://danbri.org/2012/spectral/dbpedia/ (I posted the .csv plus an 
int-to-URL dictionary file).

My best guess at the command line (running today's trunk on a fresh Hadoop 
0.20.203.0 pseudo-cluster) was:

mahout spectralkmeans -i wiki/ -o output1 -k 20 -d 4192499 --maxIter 10
(where the HDFS subdirectory wiki/ contains the .csv data file)
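Because IDs must count from zero, -d has to be at least the largest ID plus one. Below is a hypothetical pre-flight check, not a Mahout tool, that scans an "i,j,value" file before the Hadoop jobs are launched. The input format is again an assumption based on this thread.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

/**
 * Hypothetical pre-flight check (not a Mahout tool): reports the smallest and largest
 * node ID in "i,j,value" lines so the -d argument can be sanity-checked.
 */
final class AffinityCheck {

    static final class Result {
        final int minId;
        final int maxId;
        final long entries;

        Result(int minId, int maxId, long entries) {
            this.minId = minId;
            this.maxId = maxId;
            this.entries = entries;
        }

        /** Smallest -d that can hold every ID, assuming zero-based IDs. */
        int requiredDimension() {
            return maxId + 1;
        }

        /** True if the file is non-empty and every ID lies in [0, d). */
        boolean fits(int d) {
            return entries > 0 && minId >= 0 && maxId < d;
        }
    }

    static Result scan(Reader in) throws IOException {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        long entries = 0;
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // tolerate blank lines
                }
                String[] parts = line.split(",");
                if (parts.length < 3) {
                    throw new IOException("expected i,j,value but got: " + line);
                }
                int i = Integer.parseInt(parts[0].trim());
                int j = Integer.parseInt(parts[1].trim());
                min = Math.min(min, Math.min(i, j));
                max = Math.max(max, Math.max(i, j));
                entries++;
            }
        }
        return new Result(min, max, entries);
    }
}
```

Take the zero-based lines "0,1,1.0" and "1,2,1.0": requiredDimension() is 3 and fits(3) is true. The same graph numbered from one has a largest ID of 3, so fits(3) is false.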

Unfortunately I'm hitting one of the various problems discussed above. If 
anyone else can reproduce this, perhaps a fresh JIRA is needed.

It gets stuck after the first job, with an essentially empty seqfile. Full 
transcript here: https://gist.github.com/1804016

(checked with "mahout seqdumper --seqFile 
output1/calculations/diagonal/part-r-0")

This is essentially the same experience I had back in Sept (see above) running 
a similar test. 





[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-12-29 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177385#comment-13177385
 ] 

Hudson commented on MAHOUT-524:
---

Integrated in Mahout-Quality #1279 (See 
[https://builds.apache.org/job/Mahout-Quality/1279/])
MAHOUT-524: committing patch since Shannon has no internet. All tests run

jeastman : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1225596
Files : 
* 
/mahout/trunk/core/src/main/java/org/apache/mahout/clustering/spectral/common/VectorMatrixMultiplicationJob.java
* 
/mahout/trunk/core/src/main/java/org/apache/mahout/clustering/spectral/kmeans/SpectralKMeansDriver.java
* 
/mahout/trunk/examples/src/main/java/org/apache/mahout/clustering/display/DisplayClustering.java
* 
/mahout/trunk/examples/src/main/java/org/apache/mahout/clustering/display/DisplaySpectralKMeans.java
* 
/mahout/trunk/math/src/main/java/org/apache/mahout/math/decomposer/lanczos/LanczosSolver.java
* 
/mahout/trunk/math/src/main/java/org/apache/mahout/math/decomposer/lanczos/LanczosState.java






Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-12-29 Thread Jeff Eastman

I just committed the patch with numClusters = 3 in DisplaySpectralKMeans.
Jeff

On 12/28/11 9:25 PM, Shannon Quinn wrote:

Sorry for replying via the dev list, but I am without Internet access beyond my 
phone. Yes, unless anyone testing can find issues with this patch (or with the 
one Grant posted earlier, as mine contains his), it is meant to be committed.

Due to the aforementioned lack of Internet, if someone could commit this for me 
that'd be fantastic.



Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-12-28 Thread Shannon Quinn
Sorry for replying via the dev list, but I am without Internet access beyond my 
phone. Yes, unless anyone testing can find issues with this patch (or with the 
one Grant posted earlier, as mine contains his), it is meant to be committed. 

Due to the aforementioned lack of Internet, if someone could commit this for me 
that'd be fantastic.

On Dec 28, 2011, at 17:13, "Jeff Eastman (Commented) (JIRA)"  
wrote:

> 
>[ 
> https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176876#comment-13176876
>  ] 
> 
> Jeff Eastman commented on MAHOUT-524:
> -
> 
> Shannon, is this patch ready to commit? I've installed it and verified that 
> DisplaySpectralKMeans is indeed finding clusters. By increasing the 
> numClusters from 2 to 3 it now does a credible job of finding the 3 clusters 
> present in the generated data.
> 


[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-12-28 Thread Jeff Eastman (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176876#comment-13176876
 ] 

Jeff Eastman commented on MAHOUT-524:
-

Shannon, is this patch ready to commit? I've installed it and verified that 
DisplaySpectralKMeans is indeed finding clusters. By increasing the numClusters 
from 2 to 3 it now does a credible job of finding the 3 clusters present in the 
generated data.





[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-12-27 Thread Dan Brickley (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176203#comment-13176203
 ] 

Dan Brickley commented on MAHOUT-524:
-

Great to see this getting wrapped up. Can you suggest what command line(s) and 
test input others might try to verify this?

I have a Python-generated afftest.txt left over from previous investigations, 
but I've forgotten its exact origins.

I also have some real-world similarity data with labeled items; how would I use 
those?





[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-12-07 Thread Shannon Quinn (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164871#comment-13164871
 ] 

Shannon Quinn commented on MAHOUT-524:
--

I believe you and Rozemary need to apply the patch that is attached to this 
issue to get past the "tmp/data" error. It stems from the Lanczos solver, but 
is likely a symptom of SKM calling the solver incorrectly.

I'm still working on the patch for this; hopefully it will be done soon.





[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-12-07 Thread Kevin Findlay (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164713#comment-13164713
 ] 

Kevin Findlay commented on MAHOUT-524:
--

Slightly confused: I have checked that the MAHOUT-524 patches are included in my 
current build of the trunk.

However, I still get the file-not-found error for "tmp/data" described in the 
subtask.

Have I got the versions right?





[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-12-06 Thread Rozemary Scarlat (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163860#comment-13163860
 ] 

Rozemary Scarlat commented on MAHOUT-524:
-

Hi! I am new to Mahout and have been trying to use spectral k-means 
clustering, but I ran into the problem described in the comments above: the 
Lanczos solver tries to read the output of the VectorMatrixMultiplication job 
from a "calculations/laplacian-166/tmp/data" file instead of 
"calculations/laplacian-166/part-m-0".
I was wondering whether there is currently any way to run spectral clustering.






[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-11-22 Thread Shannon Quinn (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155306#comment-13155306
 ] 

Shannon Quinn commented on MAHOUT-524:
--

Unknown. Still coding up a way of coloring the dots rather than drawing circles.
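Here is one way to pick the dot colors, sketched under the assumption that cluster IDs run from 0 to k-1. ClusterColors is a hypothetical helper; the only real API it relies on is java.awt.Color.getHSBColor.

```java
import java.awt.Color;

/** Hypothetical palette helper: one distinct color per cluster ID 0..k-1. */
final class ClusterColors {

    static Color[] palette(int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        Color[] colors = new Color[k];
        for (int c = 0; c < k; c++) {
            // spread hues evenly around the color wheel so each cluster gets its own hue
            colors[c] = Color.getHSBColor((float) c / k, 0.85f, 0.9f);
        }
        return colors;
    }
}
```

In a Swing display, you could call g2.setColor(colors[label]) and then g2.fill(new Ellipse2D.Double(x, y, 3, 3)) to draw each point in its cluster's color. Evenly spaced hues work well for small k but get harder to tell apart as k grows.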





[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-11-22 Thread Dan Brickley (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155303#comment-13155303
 ] 

Dan Brickley commented on MAHOUT-524:
-

(Hmm, this issue seems to have become something of a proxy for general code rot 
and problems with the spectral part of Mahout.)

Where are we with this? I see "a symptom of us calling the job wrong" and 
"throwing off the final results". Is the problem purely in how spectral k-means 
results are displayed, or is it something deeper? For example, if I want the 
eigenvectors and eigenvalues of the Laplacian re-representation of an affinity 
matrix, is the underlying code in a happy state?
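For reference, here is the usual Ng, Jordan and Weiss formulation of spectral clustering. It is an assumption, not something this thread confirms, that Mahout's spectral k-means follows it. The calculations/diagonal output mentioned in this thread would be consistent with computing D. A is the n x n affinity matrix and k is the number of clusters.

```latex
D_{ii} = \sum_{j} A_{ij}, \qquad L = D^{-1/2} A D^{-1/2}

X = [\, x_1 \; x_2 \; \cdots \; x_k \,] \in \mathbb{R}^{n \times k}
  \quad \text{(eigenvectors of } L \text{ for its } k \text{ largest eigenvalues)}

Y_{ij} = \frac{X_{ij}}{\sqrt{\sum_{l=1}^{k} X_{il}^{2}}}, \qquad
  \text{point } i \text{ gets the cluster that k-means assigns to row } i \text{ of } Y
```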






[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-11-03 Thread Grant Ingersoll (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13143512#comment-13143512
 ] 

Grant Ingersoll commented on MAHOUT-524:


bq. If at all possible, my suggestion would be colored dots to indicate the 
clusters.

There is no requirement that we have to draw circles or leverage the old code, 
we just need something that works.





[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-11-03 Thread Shannon Quinn (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13143510#comment-13143510
 ] 

Shannon Quinn commented on MAHOUT-524:
--

After implementing the same code in Python, my suspicion is actually that the 
K-means step at the conclusion of the spectral algorithm is throwing off the 
results. Regular K-means is run on the spectral data (the top k eigenvectors of 
the affinities) rather than on the original data. I don't know K-means well 
enough to know for sure, but my guess is that all the distance measurements 
that come back in its output format are relative to the spectral data, rather 
than the original data. So what you see in the end-result graph are circles 
around where the spectral data are.

That'd be my first guess, anyway. I'm working on a couple of things to help 
with this: a sequential version of spectral k-means, and a job to read raw data 
(text format: whitespace- or comma-separated n-dimensional points) and convert 
it to affinities (à la issue 518, finally!). Hopefully these will help diagnose 
spectral k-means.
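Here is a minimal sketch of the raw-points-to-affinities step. It assumes a Gaussian (RBF) kernel, A_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)). That kernel is a common choice, but nothing here confirms the planned job uses it. GaussianAffinities and sigma are hypothetical names. The output is zero-based "i,j,value" lines.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical converter (not the Mahout job): raw points -> Gaussian affinities. */
final class GaussianAffinities {

    /** Parses one whitespace- or comma-separated n-dimensional point. */
    static double[] parsePoint(String line) {
        String[] fields = line.trim().split("[\\s,]+");
        double[] p = new double[fields.length];
        for (int d = 0; d < fields.length; d++) {
            p[d] = Double.parseDouble(fields[d]);
        }
        return p;
    }

    static double squaredDistance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    /** Emits "i,j,value" for every ordered pair i != j, using zero-based IDs. */
    static List<String> affinities(List<double[]> points, double sigma) {
        List<String> out = new ArrayList<>();
        double denom = 2.0 * sigma * sigma;
        for (int i = 0; i < points.size(); i++) {
            for (int j = 0; j < points.size(); j++) {
                if (i == j) {
                    continue; // no self-affinity entries
                }
                double a = Math.exp(-squaredDistance(points.get(i), points.get(j)) / denom);
                out.add(i + "," + j + "," + a);
            }
        }
        return out;
    }
}
```

The cost is O(n^2) in the number of points, so this only suits small test sets such as the generated display data. Sigma controls how quickly affinity decays with distance. For example, with points (0,0) and (3,4) and sigma = 5, the affinity is exp(-25/50) = exp(-0.5).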

But if it is a data issue, I'm not sure how we can translate the distance 
measurements on the spectral data back onto the original data for the 
DisplaySKM code. I would argue, though, that since spectral k-means doesn't 
operate on the same GMM-type basis that regular K-means does, overlaying K 
Gaussians isn't really what we want here, anyway. If at all possible, my 
suggestion would be colored dots to indicate the clusters. 
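To illustrate spectral versus original space, here is a hypothetical sketch (SpectralLabels is not Mahout code). It assumes row i of the spectral embedding corresponds to original point i. Only the cluster label transfers back. The centroids and distances stay in the k-dimensional spectral space.

```java
/** Hypothetical sketch: carries spectral-space cluster labels back to original points. */
final class SpectralLabels {

    /** Index of the centroid nearest to the given row (squared Euclidean distance). */
    static int nearest(double[] row, double[][] centroids) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centroids.length; c++) {
            double dist = 0.0;
            for (int d = 0; d < row.length; d++) {
                double diff = row[d] - centroids[c][d];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    /** labels[i] is the cluster of original point i; spectral coordinates are not carried over. */
    static int[] labels(double[][] spectralRows, double[][] centroids) {
        int[] labels = new int[spectralRows.length];
        for (int i = 0; i < spectralRows.length; i++) {
            labels[i] = nearest(spectralRows[i], centroids);
        }
        return labels;
    }
}
```

Take rows (1,0), (0.9,0.1), (0,1) and centroids (1,0), (0,1). labels() returns [0, 0, 1], so the display would color original points 0 and 1 alike.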





[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-11-02 Thread Grant Ingersoll (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142635#comment-13142635
 ] 

Grant Ingersoll commented on MAHOUT-524:


bq. I applied your patch but I'm having trouble following where you fixed the 
Lanczos issue

I put in a sanity check at
{code}
int size = ejCol.size();
for (int j = 0; j < size; j++) {
  // ... loop body as before (elided here); j now stops at ejCol.size()
}
{code}
so that we don't overrun the basis vector size.

However, based on Jake's comments, I'd say that is a symptom of us calling the 
job wrong.





[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-11-02 Thread Grant Ingersoll (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142633#comment-13142633
 ] 

Grant Ingersoll commented on MAHOUT-524:


bq. I applied your patch but I'm having trouble following where you fixed the 
Lanczos issue (though from within Eclipse I'm getting OutOfMemory errors...).

Yeah, up your heap to 1024M





[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-11-02 Thread Jake Mannix (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142629#comment-13142629
 ] 

Jake Mannix commented on MAHOUT-524:


I don't really know anything about the way that SKMD works, so all I can weigh 
in on is what's going on in Lanczos:

You take an input matrix with some number of rows (this number doesn't matter; 
it doesn't show up anywhere) and numCols columns (this number matters a lot). 
You want desiredRank eigenvectors to pop out at the end. So you start with some 
initial basisVector (number 0) and iterate: each step computes 
corpus.timesSquared(basisIminusOne) (the resulting vector has size numCols), 
orthogonalizes it against the previous vectors, and hangs onto it.

Eventually you have desiredRank basisVectors, stored in the LanczosState object 
in a Map (it could be a Matrix, and conceptually it is one, but we're just 
hanging onto the vectors before building a matrix soon enough). Meanwhile, we 
build up a desiredRank x desiredRank tridiagonal (i.e. very sparse) matrix from 
these basis vectors and their inner products.
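To make the data flow above concrete, here is a minimal, self-contained sketch of a Lanczos loop over A^T A using plain Java arrays. It is not Mahout's LanczosSolver: the class and method names are hypothetical, it uses full reorthogonalization, and it keeps the basis in a Map<Integer, double[]> keyed by iteration number only to mirror the LanczosState description. alpha and beta hold the diagonal and off-diagonal of the desiredRank x desiredRank tridiagonal matrix.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a Lanczos iteration on A^T A; not Mahout's LanczosSolver.
public class LanczosSketch {
  // Basis vectors keyed by iteration number, each of length numCols.
  final Map<Integer, double[]> basis = new HashMap<>();
  // Diagonal (alpha) and off-diagonal (beta) of the tridiagonal matrix.
  // beta[i] couples basis vectors i-1 and i; beta[0] is unused.
  final double[] alpha;
  final double[] beta;

  LanczosSketch(int desiredRank) {
    alpha = new double[desiredRank];
    beta = new double[desiredRank];
  }

  static double dot(double[] x, double[] y) {
    double s = 0;
    for (int c = 0; c < x.length; c++) s += x[c] * y[c];
    return s;
  }

  // A^T (A v): the result has length numCols; the row count never shows up in it.
  static double[] timesSquared(double[][] a, double[] v) {
    double[] out = new double[v.length];
    for (double[] row : a) {
      double d = dot(row, v);
      for (int c = 0; c < v.length; c++) out[c] += d * row[c];
    }
    return out;
  }

  void run(double[][] a, double[] initial) {
    int desiredRank = alpha.length;
    double n0 = Math.sqrt(dot(initial, initial));
    double[] v0 = new double[initial.length];
    for (int c = 0; c < v0.length; c++) v0[c] = initial[c] / n0;
    basis.put(0, v0);
    for (int i = 0; i < desiredRank; i++) {
      double[] vi = basis.get(i);
      double[] w = timesSquared(a, vi);
      alpha[i] = dot(w, vi);
      // Orthogonalize against every previous basis vector (full reorthogonalization).
      for (int j = 0; j <= i; j++) {
        double[] vj = basis.get(j);
        double p = dot(w, vj);
        for (int c = 0; c < w.length; c++) w[c] -= p * vj[c];
      }
      if (i + 1 == desiredRank) break;
      double norm = Math.sqrt(dot(w, w));
      if (norm < 1e-12) break; // Krylov space exhausted: fewer than desiredRank vectors stored
      beta[i + 1] = norm;
      for (int c = 0; c < w.length; c++) w[c] /= norm;
      basis.put(i + 1, w);
    }
  }
}
```

In this sketch, if the iteration breaks down early the map ends up holding fewer than desiredRank vectors, which is one way a later loop over desiredRank indices can run into a missing entry.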

Now we ask COLT for the eigenvectors and eigenvalues of the tridiagonal matrix; 
there will be desiredRank eigenvalues and desiredRank eigenvectors (each of 
dimension desiredRank).

This is where you're getting the NPE. We walk along the desiredRank^2 values in 
the eigenvector matrix ("eigenVects"), and for each index 0..desiredRank-1 we 
grab the corresponding basisVector (we have desiredRank of them, each of size 
numCols) and add a multiple of it onto what will be the final eigenvector we 
return at the end of the day.
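The final step described above can be sketched with plain arrays as well. This is a hypothetical helper, not the LanczosSolver code: it assumes row i of eigenVects holds the i-th eigenvector of the tridiagonal matrix (COLT's actual layout may differ) and that the basis vectors live in a Map<Integer, double[]>. Instead of dereferencing a missing basis vector and throwing a bare NullPointerException, it fails with a message that names the dimension mismatch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the last Lanczos step; not Mahout's LanczosSolver code.
public class LanczosReconstruct {

  /**
   * Combines basis vectors into final eigenvectors.
   * Assumes eigenVects[i] is the i-th eigenvector of the tridiagonal matrix and
   * basis maps 0..k-1 to vectors of length numCols.
   */
  static double[][] reconstruct(double[][] eigenVects, Map<Integer, double[]> basis, int numCols) {
    double[][] result = new double[eigenVects.length][numCols];
    for (int i = 0; i < eigenVects.length; i++) {
      double[] ejCol = eigenVects[i];
      for (int j = 0; j < ejCol.length; j++) {
        double[] rowJ = basis.get(j);
        if (rowJ == null || rowJ.length != numCols) {
          // Name the mismatch instead of failing with a bare NullPointerException.
          throw new IllegalStateException("eigenvector length " + ejCol.length
              + " but basis holds " + basis.size() + " vectors; basis vector " + j
              + " is missing or not of size " + numCols);
        }
        // Add ejCol[j] times basis vector j onto the i-th output eigenvector.
        for (int c = 0; c < numCols; c++) {
          result[i][c] += ejCol[j] * rowJ[c];
        }
      }
    }
    return result;
  }
}
```

A check like this does not decide whether the caller or the solver is at fault; it only turns a silent mismatch between the eigenvector length and the number of stored basis vectors into an explicit error.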

What is SKMD doing?  

{code}
LanczosState state = new LanczosState(L, overshoot, numDims,
    solver.getInitialVector(L));
Path lanczosSeqFiles = new Path(outputCalc, "eigenvectors-" +
    (System.nanoTime() & 0xFF));
solver.runJob(conf,
  state,
  overshoot,
  true,
  lanczosSeqFiles.toString());
{code}

We're making a LanczosState specifying numCols = overshoot and desiredRank = 
numDims.

Then we run the solver with desiredRank = overshoot.

This looks inconsistent; shouldn't the desiredRank be the same?





[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-11-02 Thread Jeff Eastman (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142624#comment-13142624
 ] 

Jeff Eastman commented on MAHOUT-524:
-

This result looks like the one I originally got back when it worked for a while. 
I'm treating the SKMD output as though it were clusters, as the other Display 
routines do. I don't think this is correct, but I don't understand what is wrong.





[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-11-02 Thread Shannon Quinn (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142620#comment-13142620
 ] 

Shannon Quinn commented on MAHOUT-524:
--

Similar results were actually what this issue was *originally* created to 
solve, before code rot created the other problems. The fact that I got actual 
clustering results when I was testing this code two summers ago would seem to 
imply that it's an API issue; DisplaySKM vs SKMDriver data format clashes would 
be my first guess.

I applied your patch but I'm having trouble following where you fixed the 
Lanczos issue (though from within Eclipse I'm getting OutOfMemory errors...).





[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-11-02 Thread Grant Ingersoll (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142588#comment-13142588
 ] 

Grant Ingersoll commented on MAHOUT-524:


It seems numDims == 1100 there is supposed to be the size of the affinity 
matrix, which is what we generated from the sample data, so I guess that makes 
sense.





[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-11-02 Thread Grant Ingersoll (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142583#comment-13142583
 ] 

Grant Ingersoll commented on MAHOUT-524:


In this particular case, the state has 4 basis vectors, but the "size" that j 
is iterated over is 1100. Someone isn't going to be happy. I can see the easy 
fix (don't loop past the number of basis vectors), but I don't know enough about 
Lanczos or SKMD to tell whether what we are seeing is an artifact of SKMD or a 
bug in Lanczos.





[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-11-02 Thread Grant Ingersoll (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142586#comment-13142586
 ] 

Grant Ingersoll commented on MAHOUT-524:


I guess the 1100 comes from how we are calling all of this:
{code}
SpectralKMeansDriver.run(new Configuration(), affinities, output, 1100, 2,
    measure, convergenceDelta, maxIter);
{code}





[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-11-02 Thread Grant Ingersoll (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142557#comment-13142557
 ] 

Grant Ingersoll commented on MAHOUT-524:


The NPE is from one of the rowJ values being null (the 4th one), at line 156 in 
Lanczos:
{code}
Vector rowJ = state.getBasisVector(j);
{code}

This looks like an issue in Lanczos. Namely, we are assuming the number of 
basis vectors in the state matches the size of ejCol. Of course, this might 
mean SKMD is doing something wrong. Perhaps Jake can weigh in here.





[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-11-02 Thread Shannon Quinn (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142539#comment-13142539
 ] 

Shannon Quinn commented on MAHOUT-524:
--

I'm just now getting in on this (my environment completely died after a failed 
attempt to upgrade from Ubuntu 10.04 to 10.10...). Could the 
NullPointerException have anything to do with SKMD invoking the runJob() in the 
LanczosSolver that I alluded to in my previous comment, i.e. the one for which 
SKMD is the only caller?





[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-11-02 Thread Grant Ingersoll (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142530#comment-13142530
 ] 

Grant Ingersoll commented on MAHOUT-524:


Making this change does indeed get us well past that problem and leads to:

{quote}
Exception in thread "main" java.lang.NullPointerException
	at org.apache.mahout.math.DenseVector.assign(DenseVector.java:133)
	at org.apache.mahout.math.decomposer.lanczos.LanczosSolver.solve(LanczosSolver.java:160)
	at org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver.runJob(DistributedLanczosSolver.java:72)
	at org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.run(SpectralKMeansDriver.java:155)
	at org.apache.mahout.clustering.display.DisplaySpectralKMeans.main(DisplaySpectralKMeans.java:72)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
{quote}

Not sure whether that is a direct consequence of my change or not, but I'll 
continue to debug.





[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-11-02 Thread Grant Ingersoll (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142510#comment-13142510
 ] 

Grant Ingersoll commented on MAHOUT-524:


Realizing now that Jeff already said that above. Digging deeper, however, it 
seems to me that the issue is that Hadoop is not expecting there to be a 
directory (tmp) inside that directory. From the looks of it, we just want the 
part-m- file in there, but the file status listing also returns the tmp dir that 
gets created when we do:
{code}
DistributedRowMatrix L =
VectorMatrixMultiplicationJob.runJob(affSeqFiles, D,
new Path(outputCalc, "laplacian-" + (System.nanoTime() & 0xFF)));
{code}
on line 142 of SpectralKMeansDriver. I wonder whether all will be well if we 
simply put that tmp directory elsewhere, or make sure it is deleted when that 
job is done?

Perhaps a red herring, testing more.
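As an illustration of the filtering suggested above, here is a minimal sketch that lists only the part- files in a job output directory and skips subdirectories such as a leftover tmp dir. It uses java.nio on the local file system as a stand-in; against HDFS one would instead pass a PathFilter to Hadoop's FileSystem.listStatus(Path, PathFilter). The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: keep only part- files from a job output directory.
public class PartFileLister {
  static List<Path> partFiles(Path outputDir) throws IOException {
    List<Path> parts = new ArrayList<>();
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(outputDir)) {
      for (Path p : entries) {
        String name = p.getFileName().toString();
        // Skips subdirectories (e.g. a leftover "tmp" dir) and other entries such
        // as _SUCCESS markers or hidden .crc checksum files.
        if (Files.isRegularFile(p) && name.startsWith("part-")) {
          parts.add(p);
        }
      }
    }
    Collections.sort(parts);
    return parts;
  }
}
```

Deleting the tmp directory when the job finishes would achieve the same effect for every downstream reader; filtering only protects the code that does the listing.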





[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-11-02 Thread Grant Ingersoll (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142477#comment-13142477
 ] 

Grant Ingersoll commented on MAHOUT-524:


Tracing into the Hadoop code, this "data" dir gets appended via MapFile. For 
some reason it thinks it has a MapFile here, which suggests that something is 
not getting configured correctly.
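A Hadoop MapFile is a directory containing a data file and an index file, so a reader that assumes every subdirectory of its input is a MapFile will append data to the path. Our understanding is that Hadoop's SequenceFileInputFormat does this when listing its input (an assumption to verify against the Hadoop version in use). The hypothetical helper below, again using java.nio on the local file system, reports subdirectories that would be read as MapFiles but have no data file, such as an empty tmp dir.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical debugging helper; not part of Hadoop or Mahout.
public class MapFileCheck {
  /** Returns names of subdirectories that lack the "data" file a MapFile reader expects. */
  static List<String> suspiciousDirs(Path inputDir) throws IOException {
    List<String> bad = new ArrayList<>();
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(inputDir)) {
      for (Path p : entries) {
        // Plain files are read directly; only directories get "data" appended.
        if (Files.isDirectory(p) && !Files.isRegularFile(p.resolve("data"))) {
          bad.add(p.getFileName().toString());
        }
      }
    }
    Collections.sort(bad);
    return bad;
  }
}
```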





[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-11-02 Thread Grant Ingersoll (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142461#comment-13142461
 ] 

Grant Ingersoll commented on MAHOUT-524:


bq.  Is there any way we could simplify TimesSquaredJob

Seems like there is an awful lot of deprecated Hadoop stuff in there.





[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-11-02 Thread Dan Brickley (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141973#comment-13141973
 ] 

Dan Brickley commented on MAHOUT-524:
-

Shannon, "I'll investigate the manipulation of Configuration objects in SKMD" 
... did you get a chance to do that? 





[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-10-22 Thread Shannon Quinn (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133524#comment-13133524
 ] 

Shannon Quinn commented on MAHOUT-524:
--

If there are two DLS.runJob() methods and the spectral code is the only bit of 
code that calls one of them, then in the interest of making the codebase just a 
tiny bit more maintainable I would vote for switching the spectral code over to 
the other runJob() and deleting the one it currently invokes from DLS entirely.

Regarding your tracing of the DRM.times() method, I was having the same 
problem: the fact that there are so many chained job constructors makes it 
difficult to follow. Is there any way we could simplify TimesSquaredJob? Is 
each of those job creation methods called multiple times throughout the 
codebase?

Regarding this issue, it sounds like the problem either resides in TimesSquared 
not correctly setting the path as you mentioned (but this raises the question of 
why no other algorithm that uses DRM.times() runs into the same problem), or the 
Configuration voodoo in SKMD is causing problems.

I'll investigate the manipulation of Configuration objects in SKMD this week. 
If you have any thoughts on the other points, please let me know.





[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-10-21 Thread Jeff Eastman (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132870#comment-13132870
 ] 

Jeff Eastman commented on MAHOUT-524:
-

I'm running in the Eclipse debugger, debugging DisplaySpectralKMeans. This runs 
in local mode, and fails as reported above.

> DisplaySpectralKMeans example fails
> ---
>
> Key: MAHOUT-524
> URL: https://issues.apache.org/jira/browse/MAHOUT-524
> Project: Mahout
>  Issue Type: Bug
>  Components: Clustering
>Affects Versions: 0.4, 0.5
>Reporter: Jeff Eastman
>Assignee: Shannon Quinn
>  Labels: clustering, k-means, visualization
> Fix For: 0.6
>
> Attachments: EclipseLog_20110918.txt, 
> SpectralKMeans_fail_20110919.txt, aff.txt, raw.txt, spectralkmeans.png
>
>
> I've committed a new display example that attempts to push the standard 
> mixture of models data set through spectral k-means. After some tweaking of 
> configuration arguments and a bug fix in EigenCleanupJob it runs spectral 
> k-means to completion. The display example is expecting 2-d clustered points 
> and the example is producing 5-d points. Additional I/O work is needed before 
> this will play with the rest of the clustering algorithms. 





[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-10-21 Thread Dan Brickley (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132867#comment-13132867
 ] 

Dan Brickley commented on MAHOUT-524:
-

re Sean's "I'd restart your cluster."; should it be fine to run the whole thing 
in MAHOUT_LOCAL=true mode, and bypass any complexity / issues from having a 
separate Hadoop cluster / pseudo-cluster?

> DisplaySpectralKMeans example fails
> ---
>
> Key: MAHOUT-524
> URL: https://issues.apache.org/jira/browse/MAHOUT-524
> Project: Mahout
>  Issue Type: Bug
>  Components: Clustering
>Affects Versions: 0.4, 0.5
>Reporter: Jeff Eastman
>Assignee: Shannon Quinn
>  Labels: clustering, k-means, visualization
> Fix For: 0.6
>
> Attachments: EclipseLog_20110918.txt, 
> SpectralKMeans_fail_20110919.txt, aff.txt, raw.txt, spectralkmeans.png
>
>
> I've committed a new display example that attempts to push the standard 
> mixture of models data set through spectral k-means. After some tweaking of 
> configuration arguments and a bug fix in EigenCleanupJob it runs spectral 
> k-means to completion. The display example is expecting 2-d clustered points 
> and the example is producing 5-d points. Additional I/O work is needed before 
> this will play with the rest of the clustering algorithms. 





[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-10-21 Thread Jeff Eastman (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132810#comment-13132810
 ] 

Jeff Eastman commented on MAHOUT-524:
-

All of this is buried inside DistributedLanczosSolver. Either the problem 
resides in there and should affect all users of DLS, or it is in the 
SpectralKMeansDriver setup that invokes the DLS. It turns out that the 
DLS.runJob(...) method used here (line 65) is only called by spectral 
clustering (KMeans and Eigencuts). Its one other caller, DLS.runJob(...) 
(line 80), is itself never called.

Just looking at the invocation site (SpectralKMeansDriver.run(), line 155), I 
see two file paths being passed into DLS.runJob(...). The first is the 
lanczosSeqFiles path, output/calculations/eigenvectors-17, which is the desired 
output path. The second comes from the LanczosState, which is constructed with 
L, a DRM with inputPath examples/output/calculations/laplacian-89. This is the 
input path that fails in getFileStatus and causes the exception. Both of these 
look reasonable to me.

There are, however, several different Configuration objects being manipulated 
by SKMD. I suspect something is horked in one of them, and that this is what 
causes the DLS file-not-found error.
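Here is a minimal, hypothetical sketch of one way this kind of bug can arise. If a job configures itself by mutating a Configuration that it shares with another job, the later job's settings leak into the earlier one. A plain Map stands in for Hadoop's Configuration. Only the key name mapred.input.dir (the Hadoop 0.20 input-path key) comes from Hadoop. This is not SKMD's code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: a plain String map stands in for Hadoop's Configuration.
final class ConfigSharingDemo {
    // Input-path key used by Hadoop 0.20's FileInputFormat.
    static final String INPUT_DIR = "mapred.input.dir";

    // Risky pattern: the job writes its settings into the caller's (shared) config.
    static Map<String, String> configureSharedJob(Map<String, String> conf, String input) {
        conf.put(INPUT_DIR, input);
        return conf;
    }

    // Safer pattern: each job gets its own copy of the base settings.
    static Map<String, String> configureCopiedJob(Map<String, String> base, String input) {
        Map<String, String> jobConf = new HashMap<>(base);
        jobConf.put(INPUT_DIR, input);
        return jobConf;
    }

    public static void main(String[] args) {
        Map<String, String> shared = new HashMap<>();
        Map<String, String> laplacianJob = configureSharedJob(shared, "calculations/laplacian");
        configureSharedJob(shared, "calculations/laplacian/tmp");
        // Both "jobs" are the same map, so the first one now sees the second one's input path.
        System.out.println(laplacianJob.get(INPUT_DIR)); // calculations/laplacian/tmp

        Map<String, String> base = new HashMap<>();
        Map<String, String> a = configureCopiedJob(base, "calculations/laplacian");
        Map<String, String> b = configureCopiedJob(base, "calculations/laplacian/tmp");
        System.out.println(a.get(INPUT_DIR)); // calculations/laplacian
        System.out.println(b.get(INPUT_DIR)); // calculations/laplacian/tmp
    }
}
```

Copying per job (Hadoop's new JobConf(conf) also copies) keeps each job's input paths isolated. The catch is that anything a job needs from an earlier step must then be passed along explicitly.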

> DisplaySpectralKMeans example fails
> ---
>
> Key: MAHOUT-524
> URL: https://issues.apache.org/jira/browse/MAHOUT-524
> Project: Mahout
>  Issue Type: Bug
>  Components: Clustering
>Affects Versions: 0.4, 0.5
>Reporter: Jeff Eastman
>Assignee: Shannon Quinn
>  Labels: clustering, k-means, visualization
> Fix For: 0.6
>
> Attachments: EclipseLog_20110918.txt, 
> SpectralKMeans_fail_20110919.txt, aff.txt, raw.txt, spectralkmeans.png
>
>
> I've committed a new display example that attempts to push the standard 
> mixture of models data set through spectral k-means. After some tweaking of 
> configuration arguments and a bug fix in EigenCleanupJob it runs spectral 
> k-means to completion. The display example is expecting 2-d clustered points 
> and the example is producing 5-d points. Additional I/O work is needed before 
> this will play with the rest of the clustering algorithms. 





[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-10-20 Thread Jeff Eastman (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132104#comment-13132104
 ] 

Jeff Eastman commented on MAHOUT-524:
-

I've found where the /data is being added to the input path: it's in 
SequenceFileInputFormat.listStatus(JobConf). This is where 
MapFile.DATA_FILE_NAME is appended to get the dataFile path. This does not seem 
to be the source of the problem, however; instead, I'm looking at DRM.times(), 
where it calls TimesSquaredJob.createTimesJobConf(...). It looks to me like this 
method sets the conf property "DistributedMatrix.times.inputVector" to the 
correct file path 
(examples/output/calculations/laplacian-25/tmp//DistributedMatrix.times.inputVector/) 
but does not set the job's input paths, since 
FileInputFormat.getInputPaths(new JobConf(conf)) returns only 
"examples/output/calculations/laplacian-25".

By the time the thread gets to listStatus() after kicking off DRM.times(), the 
JobConf input paths contain only 
"examples/output/calculations/laplacian-113/tmp" and /data is appended to that.

The whole handling of Configurations and JobConfs is very twisted and difficult 
to follow.
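For reference, here is roughly how the directory handling in the old-API SequenceFileInputFormat.listStatus(JobConf) works, as we read the Hadoop 0.20 source. Every directory among the listed input files is assumed to be a MapFile and is replaced by its MapFile.DATA_FILE_NAME ("data") file. So a plain scratch directory such as tmp under the input path turns into a lookup of tmp/data, and that lookup fails. The sketch below mimics that logic on the local filesystem with java.nio and does not call Hadoop. The class and method names are hypothetical.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Local-filesystem analogue of the listStatus() directory handling; no Hadoop calls.
final class MapFileListing {
    // Same value as Hadoop's MapFile.DATA_FILE_NAME.
    static final String DATA_FILE_NAME = "data";

    static List<Path> listInputFiles(Path inputDir) throws IOException {
        List<Path> children;
        try (Stream<Path> s = Files.list(inputDir)) {
            children = s.sorted().collect(Collectors.toList());
        }
        List<Path> result = new ArrayList<>();
        for (Path child : children) {
            if (Files.isDirectory(child)) {
                // Every subdirectory is assumed to be a MapFile, so its "data" file is used.
                Path dataFile = child.resolve(DATA_FILE_NAME);
                if (!Files.exists(dataFile)) {
                    // Hadoop's fs.getFileStatus(dataFile) would throw FileNotFoundException here.
                    throw new FileNotFoundException(dataFile + " does not exist");
                }
                result.add(dataFile);
            } else {
                result.add(child);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path input = Files.createTempDirectory("laplacian");
        Files.createFile(input.resolve("part-r-00000"));
        Files.createDirectory(input.resolve("tmp")); // scratch dir with no "data" file inside
        try {
            listInputFiles(input);
        } catch (FileNotFoundException e) {
            System.out.println(e.getMessage()); // names the missing .../tmp/data file
        }
    }
}
```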

> DisplaySpectralKMeans example fails
> ---
>
> Key: MAHOUT-524
> URL: https://issues.apache.org/jira/browse/MAHOUT-524
> Project: Mahout
>  Issue Type: Bug
>  Components: Clustering
>Affects Versions: 0.4, 0.5
>Reporter: Jeff Eastman
>Assignee: Shannon Quinn
>  Labels: clustering, k-means, visualization
> Fix For: 0.6
>
> Attachments: EclipseLog_20110918.txt, 
> SpectralKMeans_fail_20110919.txt, aff.txt, raw.txt, spectralkmeans.png
>
>
> I've committed a new display example that attempts to push the standard 
> mixture of models data set through spectral k-means. After some tweaking of 
> configuration arguments and a bug fix in EigenCleanupJob it runs spectral 
> k-means to completion. The display example is expecting 2-d clustered points 
> and the example is producing 5-d points. Additional I/O work is needed before 
> this will play with the rest of the clustering algorithms. 





[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-09-20 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109296#comment-13109296
 ] 

Sean Owen commented on MAHOUT-524:
--

That again looks like an environment issue; the reducer couldn't get data off 
the mapper. I don't know why in this case; you'd have to dig into the logs. I'd 
restart your cluster.

> DisplaySpectralKMeans example fails
> ---
>
> Key: MAHOUT-524
> URL: https://issues.apache.org/jira/browse/MAHOUT-524
> Project: Mahout
>  Issue Type: Bug
>  Components: Clustering
>Affects Versions: 0.4, 0.5
>Reporter: Jeff Eastman
>Assignee: Shannon Quinn
>  Labels: clustering, k-means, visualization
> Fix For: 0.6
>
> Attachments: EclipseLog_20110918.txt, 
> SpectralKMeans_fail_20110919.txt, aff.txt, raw.txt, spectralkmeans.png
>
>
> I've committed a new display example that attempts to push the standard 
> mixture of models data set through spectral k-means. After some tweaking of 
> configuration arguments and a bug fix in EigenCleanupJob it runs spectral 
> k-means to completion. The display example is expecting 2-d clustered points 
> and the example is producing 5-d points. Additional I/O work is needed before 
> this will play with the rest of the clustering algorithms. 





[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-09-20 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109295#comment-13109295
 ] 

Lance Norskog commented on MAHOUT-524:
--

Yes, it was MAVEN_OPTS; that one helps.

With today's patch (Sep. 20, 2011 setting the job jars), I get (eventually) 
this error:
{code}
11/09/20 23:10:35 INFO mapred.JobClient: Running job: job_201109191821_0016
11/09/20 23:10:36 INFO mapred.JobClient:  map 0% reduce 0%
11/09/20 23:11:01 INFO mapred.JobClient:  map 100% reduce 0%
11/09/20 23:11:15 INFO mapred.JobClient: Task Id : attempt_201109191821_0016_r_00_0, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
11/09/20 23:11:15 WARN mapred.JobClient: Error reading task outputHost is down
11/09/20 23:11:15 WARN mapred.JobClient: Error reading task outputHost is down
11/09/20 23:12:00 INFO mapred.JobClient: Task Id : attempt_201109191821_0016_r_00_1, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
{code}

I stripped aff.txt down to a file with 20 nodes, and get the above error. This 
is on a single-node cluster on my laptop. Is it possible to run this job on 
such a small device? (If not, then DisplaySpectralKMeans as a Swing app might 
not be realistic :).


> DisplaySpectralKMeans example fails
> ---
>
> Key: MAHOUT-524
> URL: https://issues.apache.org/jira/browse/MAHOUT-524
> Project: Mahout
>  Issue Type: Bug
>  Components: Clustering
>Affects Versions: 0.4, 0.5
>Reporter: Jeff Eastman
>Assignee: Shannon Quinn
>  Labels: clustering, k-means, visualization
> Fix For: 0.6
>
> Attachments: EclipseLog_20110918.txt, 
> SpectralKMeans_fail_20110919.txt, aff.txt, raw.txt, spectralkmeans.png
>
>
> I've committed a new display example that attempts to push the standard 
> mixture of models data set through spectral k-means. After some tweaking of 
> configuration arguments and a bug fix in EigenCleanupJob it runs spectral 
> k-means to completion. The display example is expecting 2-d clustered points 
> and the example is producing 5-d points. Additional I/O work is needed before 
> this will play with the rest of the clustering algorithms. 





[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-09-20 Thread Shannon Quinn (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108742#comment-13108742
 ] 

Shannon Quinn commented on MAHOUT-524:
--

The full fix is MAHOUT-518 (in progress), where you no longer have to input 
affinity but instead raw data. I can certainly edit the affinity input for the 
time being, but once 518 is finished this point will be moot.
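Until MAHOUT-518 lands, one common way to turn raw points into affinity lines is a Gaussian (RBF) kernel, w(i,j) = exp(-||x_i - x_j||^2 / (2 sigma^2)). This is a hypothetical sketch, not the MAHOUT-518 implementation. It assumes the comma-separated "row,column,value" line format for the affinity input. The kernel, the sigma value, and skipping self-affinities are also assumptions. Node IDs are emitted starting from 0, which is what the jobs expect.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: builds "i,j,value" affinity lines from raw points with a Gaussian kernel.
final class GaussianAffinity {
    static List<String> affinityLines(double[][] points, double sigma) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < points.length; i++) {
            for (int j = 0; j < points.length; j++) {
                if (i == j) {
                    continue; // no self-affinities on the diagonal
                }
                double sq = 0.0; // squared Euclidean distance between points i and j
                for (int d = 0; d < points[i].length; d++) {
                    double diff = points[i][d] - points[j][d];
                    sq += diff * diff;
                }
                double w = Math.exp(-sq / (2.0 * sigma * sigma));
                // 0-based node IDs; Locale.ROOT keeps '.' as the decimal separator.
                lines.add(String.format(Locale.ROOT, "%d,%d,%.6f", i, j, w));
            }
        }
        return lines;
    }
}
```

The choice of sigma matters a great deal. Too small and nearly every off-diagonal affinity rounds to 0; too large and all points look alike.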

> DisplaySpectralKMeans example fails
> ---
>
> Key: MAHOUT-524
> URL: https://issues.apache.org/jira/browse/MAHOUT-524
> Project: Mahout
>  Issue Type: Bug
>  Components: Clustering
>Affects Versions: 0.4, 0.5
>Reporter: Jeff Eastman
>Assignee: Shannon Quinn
>  Labels: clustering, k-means, visualization
> Fix For: 0.6
>
> Attachments: EclipseLog_20110918.txt, 
> SpectralKMeans_fail_20110919.txt, aff.txt, raw.txt, spectralkmeans.png
>
>
> I've committed a new display example that attempts to push the standard 
> mixture of models data set through spectral k-means. After some tweaking of 
> configuration arguments and a bug fix in EigenCleanupJob it runs spectral 
> k-means to completion. The display example is expecting 2-d clustered points 
> and the example is producing 5-d points. Additional I/O work is needed before 
> this will play with the rest of the clustering algorithms. 





[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-09-20 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108739#comment-13108739
 ] 

Sean Owen commented on MAHOUT-524:
--

OK, is there an easy fix for your first point? It seems like a matter of input 
parsing.

> DisplaySpectralKMeans example fails
> ---
>
> Key: MAHOUT-524
> URL: https://issues.apache.org/jira/browse/MAHOUT-524
> Project: Mahout
>  Issue Type: Bug
>  Components: Clustering
>Affects Versions: 0.4, 0.5
>Reporter: Jeff Eastman
>Assignee: Shannon Quinn
>  Labels: clustering, k-means, visualization
> Fix For: 0.6
>
> Attachments: EclipseLog_20110918.txt, 
> SpectralKMeans_fail_20110919.txt, aff.txt, raw.txt, spectralkmeans.png
>
>
> I've committed a new display example that attempts to push the standard 
> mixture of models data set through spectral k-means. After some tweaking of 
> configuration arguments and a bug fix in EigenCleanupJob it runs spectral 
> k-means to completion. The display example is expecting 2-d clustered points 
> and the example is producing 5-d points. Additional I/O work is needed before 
> this will play with the rest of the clustering algorithms. 





[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-09-20 Thread Shannon Quinn (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108721#comment-13108721
 ] 

Shannon Quinn commented on MAHOUT-524:
--

Sean: #4 is actually an off-by-one error that results from specifying 
"--dimensions 37" when the nodes are indexed 1-37 in the input file, while the 
program expects 0-36. Changing the input parameter to "--dimensions 38" is 
a partial fix, although it leaves the first row and first column of Mahout's 
internal representation of the affinity matrix as all 0s.
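Instead of padding --dimensions, the input can be renumbered before it is loaded. Below is a hypothetical preprocessing helper. It assumes comma-separated "row,column,value" lines with 1-based node IDs. It shifts the IDs to 0-based and rejects any ID outside 1..dimensions.

```java
// Hypothetical helper: converts one 1-based "i,j,value" line to 0-based node IDs.
final class ZeroBasedAffinity {
    static String toZeroBased(String line, int dimensions) {
        String[] parts = line.trim().split(",");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected i,j,value but got: " + line);
        }
        int i = Integer.parseInt(parts[0].trim()) - 1;
        int j = Integer.parseInt(parts[1].trim()) - 1;
        // After shifting, valid IDs are 0..dimensions-1.
        if (i < 0 || j < 0 || i >= dimensions || j >= dimensions) {
            throw new IllegalArgumentException(
                "Node ID out of range [1, " + dimensions + "] in line: " + line);
        }
        return i + "," + j + "," + parts[2].trim();
    }
}
```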

Regarding the jobs, I have no idea how they ran previously; I never ran into 
that problem when first writing the jobs. Apparently there's a widely-employed 
use-case I simply didn't test?

Beyond that, I still can't find the source of the error in the attached 
EclipseLog; wherever that "/tmp" is being appended at the end, it isn't in any 
of the core Mahout code.

> DisplaySpectralKMeans example fails
> ---
>
> Key: MAHOUT-524
> URL: https://issues.apache.org/jira/browse/MAHOUT-524
> Project: Mahout
>  Issue Type: Bug
>  Components: Clustering
>Affects Versions: 0.4, 0.5
>Reporter: Jeff Eastman
>Assignee: Shannon Quinn
>  Labels: clustering, k-means, visualization
> Fix For: 0.6
>
> Attachments: EclipseLog_20110918.txt, 
> SpectralKMeans_fail_20110919.txt, aff.txt, raw.txt, spectralkmeans.png
>
>
> I've committed a new display example that attempts to push the standard 
> mixture of models data set through spectral k-means. After some tweaking of 
> configuration arguments and a bug fix in EigenCleanupJob it runs spectral 
> k-means to completion. The display example is expecting 2-d clustered points 
> and the example is producing 5-d points. Additional I/O work is needed before 
> this will play with the rest of the clustering algorithms. 





[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-09-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108486#comment-13108486
 ] 

Hudson commented on MAHOUT-524:
---

Integrated in Mahout-Quality #1051 (See 
[https://builds.apache.org/job/Mahout-Quality/1051/])
MAHOUT-524 added danbri's setJarByClass() patch and logging

srowen : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1172995
Files : 
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/spectral/common/AffinityMatrixInputJob.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/spectral/common/AffinityMatrixInputMapper.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/spectral/common/MatrixDiagonalizeJob.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/spectral/common/UnitVectorizerJob.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/spectral/common/VectorCache.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/spectral/common/VectorMatrixMultiplicationJob.java


> DisplaySpectralKMeans example fails
> ---
>
> Key: MAHOUT-524
> URL: https://issues.apache.org/jira/browse/MAHOUT-524
> Project: Mahout
>  Issue Type: Bug
>  Components: Clustering
>Affects Versions: 0.4, 0.5
>Reporter: Jeff Eastman
>Assignee: Shannon Quinn
>  Labels: clustering, k-means, visualization
> Fix For: 0.6
>
> Attachments: EclipseLog_20110918.txt, 
> SpectralKMeans_fail_20110919.txt, aff.txt, raw.txt, spectralkmeans.png
>
>
> I've committed a new display example that attempts to push the standard 
> mixture of models data set through spectral k-means. After some tweaking of 
> configuration arguments and a bug fix in EigenCleanupJob it runs spectral 
> k-means to completion. The display example is expecting 2-d clustered points 
> and the example is producing 5-d points. Additional I/O work is needed before 
> this will play with the rest of the clustering algorithms. 





[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-09-19 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108386#comment-13108386
 ] 

Sean Owen commented on MAHOUT-524:
--

Lance, in your command line you use "MAVENOPTS" and not "MAVEN_OPTS". Is that 
the issue?

I think I agree with Dan's patch, but I wonder how these jobs ever worked 
otherwise. But yes, everything needs to call setJar() or setJarByClass(). 
AbstractJob takes care of this for almost all the M/Rs in the project; these 
are not using it.

I think you're welcome to propose patches for your improvements #2 and #3. I 
don't know the answer for #4: if it's OK for there to be nothing in the vector 
cache at this point, the code shouldn't assume there is. And if the cache 
should have something I don't know why there isn't.
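In the VectorCache case, the NoSuchElementException comes from calling next() on an iterator over an empty file. Below is a hypothetical sketch of the two options described above. A plain java.util.Iterator stands in for the SequenceFile record iterator. The first option returns an Optional when an empty file is acceptable. The second fails with a message naming the file when it is not.

```java
import java.util.Iterator;
import java.util.Optional;

// Hypothetical helpers; a plain Iterator stands in for the SequenceFile record iterator.
final class FirstRecord {
    // Use when an empty file is a legitimate state: the caller decides what to do.
    static <T> Optional<T> first(Iterator<T> records) {
        return records.hasNext() ? Optional.ofNullable(records.next()) : Optional.empty();
    }

    // Use when the file must contain a vector: fail with a message that names the file.
    static <T> T requireFirst(Iterator<T> records, String path) {
        return first(records).orElseThrow(() ->
            new IllegalStateException("Expected a vector in " + path + " but the file is empty"));
    }
}
```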

> DisplaySpectralKMeans example fails
> ---
>
> Key: MAHOUT-524
> URL: https://issues.apache.org/jira/browse/MAHOUT-524
> Project: Mahout
>  Issue Type: Bug
>  Components: Clustering
>Affects Versions: 0.4, 0.5
>Reporter: Jeff Eastman
>Assignee: Shannon Quinn
>  Labels: clustering, k-means, visualization
> Fix For: 0.6
>
> Attachments: EclipseLog_20110918.txt, 
> SpectralKMeans_fail_20110919.txt, aff.txt, raw.txt, spectralkmeans.png
>
>
> I've committed a new display example that attempts to push the standard 
> mixture of models data set through spectral k-means. After some tweaking of 
> configuration arguments and a bug fix in EigenCleanupJob it runs spectral 
> k-means to completion. The display example is expecting 2-d clustered points 
> and the example is producing 5-d points. Additional I/O work is needed before 
> this will play with the rest of the clustering algorithms. 





[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-09-19 Thread Dan Brickley (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108353#comment-13108353
 ] 

Dan Brickley commented on MAHOUT-524:
-

re job jar error, see MAHOUT-428 MAHOUT-197.

draft patch: 
https://raw.github.com/gist/1200439/4ad433b51e9d963cff5d500d974fa5cb6904b9c3/gistfile1.txt


I posted a patch that got me past those errors in the recent mailing list 
thread 'Spectral clustering - a bundle of issues'. I'll paste the relevant 
chunk of my email below. see 
http://comments.gmane.org/gmane.comp.apache.mahout.user/9319


-

Trying to run https://cwiki.apache.org/MAHOUT/spectral-clustering.html
... seems perhaps some code rot?

Can anyone else report success with Spectral clustering against recent trunk?

Trying bin/mahout spectralkmeans -k 2 -i speccy -o specout --maxIter
10 --dimensions 37

...with the small example affinity file we discussed yesterday, I hit
a series of problems.

data: http://danbri.org/2011/mahout/afftest.txt

1. As I mentioned in comments in
http://spectrallyclustered.wordpress.com/2010/07/14/sprint-3-quick-update/
(both for local pseudo-cluster, and a real one) I had to patch in
calls to job.setJarByClass before job.waitForCompletion. This problem
occurred for others elsewhere in Mahout, e.g. MAHOUT-428 and
MAHOUT-197, but I presume it can't be hitting everyone. From grepping
around, this might not be the only component missing setJarByClass
calls. Or is this just me, somehow?

2. Newlines in the input data made it fail, but the associated warning
from AffinityMatrixInputMapper was very vague. I'd suggest allowing
those and #-comments, though maybe it's not a good idea for each component
to design its own input syntax? I'd also suggest printing the problem line
(see patch below) when complaining.

3. Failing to load the affinity matrix (surely a requirement for
further progress?) does not seem to halt the job, I see exceptions
mixed in with ongoing processing (until a later problem hits us).
Transcript: https://gist.github.com/1200455 ... actually it wasn't
clear if the newline problem was more of a warning, and other rows
from the input data were accepted. In which case, reporting them as
java.io.IOException seems a bit draconian. So maybe bits of the input
file were in fact loaded. It would be great to clarify what expected
behaviour is.


4. After all that, the job still fails. Full transcript here:
https://gist.github.com/1200428

Excerpt: (I've added a bit more reporting output in a few places)

11/09/07 14:25:06 INFO common.VectorCache: Loading vector from: specout/calculations/diagonal/part-r-0
Exception in thread "main" java.util.NoSuchElementException
   at com.google.common.collect.AbstractIterator.next(AbstractIterator.java:152)
   at org.apache.mahout.clustering.spectral.common.VectorCache.load(VectorCache.java:121)

However that file does exist in hdfs, and seqdumper seems to accept
it; it just seems empty:

Input Path: specout/calculations/diagonal/part-r-0
Key class: class org.apache.hadoop.io.NullWritable Value Class: class org.apache.mahout.math.VectorWritable
Count: 0

I've posted an informal composite patch at
https://raw.github.com/gist/1200439/4ad433b51e9d963cff5d500d974fa5cb6904b9c3/gistfile1.txt
 ... if you can confirm the above issues and a breakdown into JIRAs,
I'll attach cleaner patches where appropriate.
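Below is a hypothetical sketch of the input handling suggested in points 2 and 3. It skips blank lines and #-comments. On the first malformed line it stops and reports the line number and text, rather than logging an IOException and carrying on. The "row,column,value" line format is an assumption, and this is not AffinityMatrixInputMapper's code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical affinity-file reader: tolerant of comments and blank lines, strict about bad rows.
final class AffinityFileParser {
    static final class Entry {
        final int row;
        final int col;
        final double value;

        Entry(int row, int col, double value) {
            this.row = row;
            this.col = col;
            this.value = value;
        }
    }

    static List<Entry> parse(List<String> lines) {
        List<Entry> entries = new ArrayList<>();
        for (int n = 0; n < lines.size(); n++) {
            String line = lines.get(n).trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // tolerate blank lines and #-comments
            }
            String[] parts = line.split(",");
            try {
                if (parts.length != 3) {
                    throw new IllegalArgumentException("expected 3 fields");
                }
                entries.add(new Entry(Integer.parseInt(parts[0].trim()),
                                      Integer.parseInt(parts[1].trim()),
                                      Double.parseDouble(parts[2].trim())));
            } catch (IllegalArgumentException e) {
                // NumberFormatException is a subclass, so bad numbers land here too.
                // Fail fast, quoting the 1-based line number and the offending text.
                throw new IllegalArgumentException(
                    "Bad affinity line " + (n + 1) + ": \"" + lines.get(n) + "\"", e);
            }
        }
        return entries;
    }
}
```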


> DisplaySpectralKMeans example fails
> ---
>
> Key: MAHOUT-524
> URL: https://issues.apache.org/jira/browse/MAHOUT-524
> Project: Mahout
>  Issue Type: Bug
>  Components: Clustering
>Affects Versions: 0.4, 0.5
>Reporter: Jeff Eastman
>Assignee: Shannon Quinn
>  Labels: clustering, k-means, visualization
> Fix For: 0.6
>
> Attachments: EclipseLog_20110918.txt, 
> SpectralKMeans_fail_20110919.txt, aff.txt, raw.txt, spectralkmeans.png
>
>
> I've committed a new display example that attempts to push the standard 
> mixture of models data set through spectral k-means. After some tweaking of 
> configuration arguments and a bug fix in EigenCleanupJob it runs spectral 
> k-means to completion. The display example is expecting 2-d clustered points 
> and the example is producing 5-d points. Additional I/O work is needed before 
> this will play with the rest of the clustering algorithms. 





[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-09-19 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108324#comment-13108324
 ] 

Lance Norskog commented on MAHOUT-524:
--

1) I hiked it up and up. Is it possible the option did not transmit to the JVM 
that runs the job?
2) I did not have this problem under Eclipse.

In a separate investigation, running _spectralkmeans_ gives the command-line 
failure log attached as SpectralKMeans_fail_20110919.txt. Yes, this is the 
'get jars out to the hadoop executor' problem. The 'job' jar does not seem to 
do what it needs to. Again, note that one failure does not cause the whole job 
to exit. I submit that there are multiple problems inside the job, and that 
somehow the main job configuration does not get transmitted to a subsidiary 
executor.





> DisplaySpectralKMeans example fails
> ---
>
> Key: MAHOUT-524
> URL: https://issues.apache.org/jira/browse/MAHOUT-524
> Project: Mahout
>  Issue Type: Bug
>  Components: Clustering
>Affects Versions: 0.4, 0.5
>Reporter: Jeff Eastman
>Assignee: Shannon Quinn
>  Labels: clustering, k-means, visualization
> Fix For: 0.6
>
> Attachments: EclipseLog_20110918.txt, aff.txt, raw.txt, 
> spectralkmeans.png
>
>
> I've committed a new display example that attempts to push the standard 
> mixture of models data set through spectral k-means. After some tweaking of 
> configuration arguments and a bug fix in EigenCleanupJob it runs spectral 
> k-means to completion. The display example is expecting 2-d clustered points 
> and the example is producing 5-d points. Additional I/O work is needed before 
> this will play with the rest of the clustering algorithms. 





[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-09-19 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107668#comment-13107668
 ] 

Sean Owen commented on MAHOUT-524:
--

This is just an OutOfMemoryError. You have to tell Maven to use more memory for 
its JVM or else most M/R jobs will fail like this locally. Use 
MAVEN_OPTS=-Xmx1g . I'm afraid this isn't the issue.

> DisplaySpectralKMeans example fails
> ---
>
> Key: MAHOUT-524
> URL: https://issues.apache.org/jira/browse/MAHOUT-524
> Project: Mahout
>  Issue Type: Bug
>  Components: Clustering
>Affects Versions: 0.4, 0.5
>Reporter: Jeff Eastman
>Assignee: Shannon Quinn
>  Labels: clustering, k-means, visualization
> Fix For: 0.6
>
> Attachments: EclipseLog_20110918.txt, aff.txt, raw.txt, 
> spectralkmeans.png
>
>
> I've committed a new display example that attempts to push the standard 
> mixture of models data set through spectral k-means. After some tweaking of 
> configuration arguments and a bug fix in EigenCleanupJob it runs spectral 
> k-means to completion. The display example is expecting 2-d clustered points 
> and the example is producing 5-d points. Additional I/O work is needed before 
> this will play with the rest of the clustering algorithms. 





[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-09-18 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107640#comment-13107640
 ] 

Lance Norskog commented on MAHOUT-524:
--

As for 5-d points vs. 2-d points, SVD does a great job, followed by random 
projection.
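Below is a minimal sketch of the random-projection step only; the SVD step is omitted. Each 5-d point is multiplied by a 5x2 matrix of Gaussian entries scaled by 1/sqrt(k), which preserves squared distances in expectation. The class is hypothetical and not a Mahout API.

```java
import java.util.Random;

// Hypothetical sketch: Gaussian random projection from inDim down to outDim dimensions.
final class RandomProjection {
    static double[][] randomMatrix(int inDim, int outDim, long seed) {
        Random rng = new Random(seed);
        double[][] r = new double[inDim][outDim];
        double scale = 1.0 / Math.sqrt(outDim); // keeps squared norms unbiased
        for (int i = 0; i < inDim; i++) {
            for (int j = 0; j < outDim; j++) {
                r[i][j] = rng.nextGaussian() * scale;
            }
        }
        return r;
    }

    // Computes point^T * r, i.e. the projected low-dimensional coordinates.
    static double[] project(double[] point, double[][] r) {
        int outDim = r[0].length;
        double[] out = new double[outDim];
        for (int i = 0; i < point.length; i++) {
            for (int j = 0; j < outDim; j++) {
                out[j] += point[i] * r[i][j];
            }
        }
        return out;
    }
}
```

The same matrix (same seed) must be used for every point, so that the 2-d layout is consistent across the whole data set.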

> DisplaySpectralKMeans example fails
> ---
>
> Key: MAHOUT-524
> URL: https://issues.apache.org/jira/browse/MAHOUT-524
> Project: Mahout
>  Issue Type: Bug
>  Components: Clustering
>Affects Versions: 0.4, 0.5
>Reporter: Jeff Eastman
>Assignee: Shannon Quinn
>  Labels: clustering, k-means, visualization
> Fix For: 0.6
>
> Attachments: EclipseLog_20110918.txt, aff.txt, raw.txt, 
> spectralkmeans.png
>
>
> I've committed a new display example that attempts to push the standard 
> mixture of models data set through spectral k-means. After some tweaking of 
> configuration arguments and a bug fix in EigenCleanupJob it runs spectral 
> k-means to completion. The display example is expecting 2-d clustered points 
> and the example is producing 5-d points. Additional I/O work is needed before 
> this will play with the rest of the clustering algorithms. 





[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-09-18 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107635#comment-13107635
 ] 

Lance Norskog commented on MAHOUT-524:
--

For completeness, the log when running under Eclipse is attached as 
EclipseLog_20110918.txt

> DisplaySpectralKMeans example fails
> ---
>
> Key: MAHOUT-524
> URL: https://issues.apache.org/jira/browse/MAHOUT-524
> Project: Mahout
>  Issue Type: Bug
>  Components: Clustering
>Affects Versions: 0.4, 0.5
>Reporter: Jeff Eastman
>Assignee: Shannon Quinn
>  Labels: clustering, k-means, visualization
> Fix For: 0.6
>
> Attachments: aff.txt, raw.txt, spectralkmeans.png
>
>
> I've committed a new display example that attempts to push the standard 
> mixture of models data set through spectral k-means. After some tweaking of 
> configuration arguments and a bug fix in EigenCleanupJob it runs spectral 
> k-means to completion. The display example is expecting 2-d clustered points 
> and the example is producing 5-d points. Additional I/O work is needed before 
> this will play with the rest of the clustering algorithms. 





[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-09-18 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107633#comment-13107633
 ] 

Lance Norskog commented on MAHOUT-524:
--

Possibly a little help. When run from the command line via mvn exec, this is 
the error log. Note that
a) an exception happens in an early m/r pass, and 
b) the exception is ignored by the full job executor.
(MacOS X "Kitty Liver")

_lance$ MAVENOPTS=Xmx1000m mvn -q exec:java 
-Dexec.mainClass="org.apache.mahout.clustering.display.DisplaySpectralKMeans"_
{code}
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/Users/lancenorskog/.m2/repository/org/slf4j/slf4j-log4j12/1.6.1/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/lancenorskog/.m2/repository/org/slf4j/slf4j-jcl/1.6.1/slf4j-jcl-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
11/09/18 22:25:26 INFO common.HadoopUtil: Deleting samples
11/09/18 22:25:26 INFO common.HadoopUtil: Deleting output
11/09/18 22:25:26 INFO display.DisplayClustering: Generating 500 samples m=[1.0, 1.0] sd=3.0
11/09/18 22:25:26 INFO display.DisplayClustering: Generating 300 samples m=[1.0, 0.0] sd=0.5
11/09/18 22:25:26 INFO display.DisplayClustering: Generating 300 samples m=[0.0, 2.0] sd=0.1
11/09/18 22:25:28 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/09/18 22:25:28 WARN mapred.JobClient: No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
11/09/18 22:25:28 INFO input.FileInputFormat: Total input paths to process : 1
11/09/18 22:25:28 INFO mapred.JobClient: Running job: job_local_0001
11/09/18 22:25:28 INFO mapred.MapTask: io.sort.mb = 100
11/09/18 22:25:29 WARN mapred.LocalJobRunner: job_local_0001
java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:949)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:674)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
11/09/18 22:25:29 INFO mapred.JobClient:  map 0% reduce 0%
11/09/18 22:25:29 INFO mapred.JobClient: Job complete: job_local_0001
{code}
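Point (b) above, that the overall driver carries on after a failed map/reduce pass, is the kind of thing a fail-fast check guards against. A minimal, stdlib-only sketch follows; `PipelineRunner` and the step names are hypothetical and not part of Mahout. In Hadoop-based code each step's boolean would typically come from `Job.waitForCompletion(true)`, which reports whether the job succeeded.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Hypothetical helper (not a Mahout or Hadoop class): runs named steps in order
// and stops at the first one that reports failure.
final class PipelineRunner {
    // LinkedHashMap keeps insertion order, so steps run in the order they were added.
    private final Map<String, BooleanSupplier> steps = new LinkedHashMap<>();
    private final List<String> completed = new ArrayList<>();

    PipelineRunner add(String name, BooleanSupplier step) {
        steps.put(name, step);
        return this;
    }

    // Throws on the first failed step instead of continuing with missing or partial output.
    void run() {
        for (Map.Entry<String, BooleanSupplier> e : steps.entrySet()) {
            if (!e.getValue().getAsBoolean()) {
                throw new IllegalStateException(
                        "Step failed: " + e.getKey() + " (completed so far: " + completed + ")");
            }
            completed.add(e.getKey());
        }
    }

    List<String> completed() {
        return Collections.unmodifiableList(completed);
    }

    public static void main(String[] args) {
        PipelineRunner runner = new PipelineRunner()
                .add("affinity", () -> true)
                .add("eigen", () -> false) // stands in for a pass that died, e.g. with an OutOfMemoryError
                .add("kmeans", () -> true);
        try {
            runner.run();
        } catch (IllegalStateException ex) {
            // prints: Step failed: eigen (completed so far: [affinity])
            System.out.println(ex.getMessage());
        }
    }
}
```

The design choice is simply to make a failed step terminal, so a later step never runs against output that was never written.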





[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-09-13 Thread Dan Brickley (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103497#comment-13103497
 ] 

Dan Brickley commented on MAHOUT-524:
-

I had a look around and failed to find a string in the Mahout Java source 
corresponding to that path; I presume it's coming from an included module or 
config file.

Hadoop btw has 

./io/MapFile.java:  public static final String DATA_FILE_NAME = "data";

though I don't see any direct use of MapFile or DATA_FILE_NAME, I'm only 
grepping around textually; Eclipse might have smarter tooling. 

http://lucene.472066.n3.nabble.com/Overhauled-org-apache-mahout-cf-taste-hadoop-item-td745286.html
 suggests mapfile isn't so much used any more, so this might be a false lead.





[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-09-11 Thread Shannon Quinn (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102404#comment-13102404
 ] 

Shannon Quinn commented on MAHOUT-524:
--

Just for grins, I tried:

FileInputFormat.getInputPaths(conf).length

right before the TimesSquaredJob started, and it was 1, not 2. Ever more 
confused.





[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-09-11 Thread Shannon Quinn (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102380#comment-13102380
 ] 

Shannon Quinn commented on MAHOUT-524:
--

I've been tooling around with this code for a few hours now and cannot figure 
out where the pesky "/data" is being appended to the overall path...or why the 
second Path that Lance mentioned isn't what is actually being used. It has to 
be somewhere in the Lanczos solver code (filtering into the 
DistributedRowMatrix and its TimesSquaredJob, as the latter is what is actually 
causing the exception), but in all my searching and println()-ing of paths I 
can't seem to find it.

I'm going to keep looking, but any help in finding this bug would be greatly 
appreciated.





[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-09-10 Thread Dan Brickley (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102171#comment-13102171
 ] 

Dan Brickley commented on MAHOUT-524:
-

Not sure if you're mixing me up with Danny Bickson, but I've certainly seen these 
errors mentioning tmp/data paths. In my case the problem came up when attempting 
spectral clustering; I didn't get as far as having any results to display.





[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-09-10 Thread Shannon Quinn (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102165#comment-13102165
 ] 

Shannon Quinn commented on MAHOUT-524:
--

I believe this is the exact problem Dan Bickson raised in his thread on the 
users list; I'm working on this. The problem is somewhere in how I set up the 
Paths used by SpectralKMeansDriver. Will update this week.





[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-08-31 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095021#comment-13095021
 ] 

Lance Norskog commented on MAHOUT-524:
--

Running DisplaySpectralKMeans gives this error:

FileNotFoundException:
examples/output/calculations/laplacian-48/tmp/data not found

In fact, the data is stored here:

examples/output/calculations/laplacian-48/tmp/1314835934416372000/DistributedMatrix.times.inputVector/

Any hints on exactly which API call is wrong?
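The report suggests one step reads `tmp/data` while the vector was written under a timestamped subdirectory. One way to make such a mismatch obvious is to check a job's input path before submitting it and, if it is missing, list what the parent directory actually contains. A minimal stdlib sketch follows; `InputPathCheck.requireExists` is a hypothetical helper, not a Mahout or Hadoop API, and it uses `java.nio.file` on the local filesystem rather than Hadoop's `FileSystem`.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (not part of Mahout or Hadoop): verifies a job input path on the
// local filesystem before submission and, if it is missing, reports what is actually there.
final class InputPathCheck {
    private InputPathCheck() {}

    static Path requireExists(Path input) throws IOException {
        if (Files.exists(input)) {
            return input;
        }
        Path parent = input.toAbsolutePath().getParent();
        String contents = "<no such directory>";
        if (parent != null && Files.isDirectory(parent)) {
            // List sibling entries, e.g. a timestamped directory written by an earlier step.
            try (Stream<Path> entries = Files.list(parent)) {
                contents = entries.map(p -> p.getFileName().toString())
                        .sorted()
                        .collect(Collectors.joining(", "));
            }
        }
        throw new FileNotFoundException(
                input + " does not exist; " + parent + " contains [" + contents + "]");
    }
}
```

Called as `requireExists(tmp.resolve("data"))` just before job submission, the error would name the timestamped directory alongside the missing `data` path instead of surfacing deep inside `getSplits`.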





[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-08-28 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092642#comment-13092642
 ] 

Lance Norskog commented on MAHOUT-524:
--

+1

I'm documenting the Display outputs and it would be nice to have all of them :)





[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-08-17 Thread Jeff Eastman (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086595#comment-13086595
 ] 

Jeff Eastman commented on MAHOUT-524:
-

The original example was extracting 5 eigenvectors and thus returned 5-d 
results. I changed it to extract 2 vectors and it used to run but displayed 
incorrect results.

I'm still getting a FileNotFoundException (since pre-0.5 testing, IIRC) in the 
bowels of DRM.times while running this in local Hadoop mode. I wonder if it is 
possible to add a --method sequential implementation for SpectralKMeans to help 
separate the algorithmic issues from the file bookkeeping ones?

We have a sequential Lanczos implementation...

Exception in thread "main" java.lang.IllegalStateException: java.io.FileNotFoundException: File file:/home/dev/workspace/mahout/examples/output/calculations/laplacian-33/tmp/data does not exist.
at org.apache.mahout.math.hadoop.DistributedRowMatrix.times(DistributedRowMatrix.java:222)
at org.apache.mahout.math.decomposer.lanczos.LanczosSolver.solve(LanczosSolver.java:104)
at org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver.runJob(DistributedLanczosSolver.java:72)
at org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.run(SpectralKMeansDriver.java:155)
at org.apache.mahout.clustering.display.DisplaySpectralKMeans.main(DisplaySpectralKMeans.java:72)
Caused by: java.io.FileNotFoundException: File file:/home/dev/workspace/mahout/examples/output/calculations/laplacian-33/tmp/data does not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:371)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:51)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:211)
at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:929)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:921)
at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:838)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:791)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:791)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:765)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1200)
at org.apache.mahout.math.hadoop.DistributedRowMatrix.times(DistributedRowMatrix.java:214)
... 4 more
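To illustrate what an in-memory, sequential path could look like for small inputs, here is a power-iteration sketch for the dominant eigenvector of a small symmetric matrix. This is a deliberately simpler, swapped-in technique, not the Lanczos solver Mahout ships, and `PowerIteration` is a hypothetical class. It assumes the largest-magnitude eigenvalue is positive and unique.

```java
import java.util.Arrays;

// Sketch of power iteration: a simple sequential stand-in, NOT the Lanczos solver.
// Assumes m is symmetric and its largest-magnitude eigenvalue is positive and unique.
final class PowerIteration {
    private PowerIteration() {}

    static double[] dominantEigenvector(double[][] m, int maxIter, double tol) {
        int n = m.length;
        double[] v = new double[n];
        Arrays.fill(v, 1.0 / Math.sqrt(n)); // unit-length start vector
        for (int iter = 0; iter < maxIter; iter++) {
            double[] w = multiply(m, v);
            double norm = Math.sqrt(dot(w, w));
            if (norm == 0.0) {
                throw new IllegalStateException("start vector is in the null space of m");
            }
            double change = 0.0;
            for (int i = 0; i < n; i++) {
                w[i] /= norm; // renormalize to unit length
                change = Math.max(change, Math.abs(w[i] - v[i]));
            }
            v = w;
            if (change < tol) {
                break; // successive iterates agree to within tol
            }
        }
        return v;
    }

    // Rayleigh quotient v'Mv; for a unit eigenvector this is its eigenvalue.
    static double rayleigh(double[][] m, double[] v) {
        return dot(v, multiply(m, v));
    }

    private static double[] multiply(double[][] m, double[] v) {
        double[] out = new double[m.length];
        for (int i = 0; i < m.length; i++) {
            out[i] = dot(m[i], v);
        }
        return out;
    }

    private static double dot(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            s += a[i] * b[i];
        }
        return s;
    }
}
```

This only yields one eigenpair, whereas the example needs several eigenvectors (one per dimension of the projected points, e.g. 5 or 2 as discussed above); getting more would need deflation or a block method, which is where Lanczos earns its keep.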


> DisplaySpectralKMeans example fails
> ---
>
> Key: MAHOUT-524
> URL: https://issues.apache.org/jira/browse/MAHOUT-524
> Project: Mahout
>  Issue Type: Bug
>  Components: Clustering
>Affects Versions: 0.4, 0.5
>Reporter: Jeff Eastman
>Assignee: Jeff Eastman
>  Labels: clustering, k-means, visualization
> Fix For: 0.6
>
> Attachments: aff.txt, raw.txt, spectralkmeans.png
>
>
> I've committed a new display example that attempts to push the standard 
> mixture of models data set through spectral k-means. After some tweaking of 
> configuration arguments and a bug fix in EigenCleanupJob it runs spectral 
> k-means to completion. The display example is expecting 2-d clustered points 
> and the example is producing 5-d points. Additional I/O work is needed before 
> this will play with the rest of the clustering algorithms. 





Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-05-25 Thread Shannon Quinn
> Let me see if I'm following you. In the display example, there are 1100, 2d
> vectors generated as raw data D which is 1100x2. Then, the preprocessing
> step uses a distance measure to produce A, which is 1100x1100. They are not
> really affinities, more like distances, so I may have missed the boat on
> that step. Since the distance measure is reducing the [2] dimensionality of
> the Di and Dj vectors with a scalar (aij), I don't see how to reconstruct D
> from A.
>
>
You don't necessarily need to be able to reconstruct D from A, so I suppose
this is where the Fourier transform analogy breaks down. A is indexed by row
and column according to the original data, so as long as you know the
order in which the rows and columns of A were derived from D, you can
indirectly identify the points in D by index.


> KMeans will cluster all the input vectors in an arbitrary order if on a N>1
> cluster and so Di and Dj will lose their index positions in the result. If
> the D vectors are NamedVectors, with their index as the name, then this will
> flow through to the clustered points at the output. The order of those
> points won't bear much relation to the order of the input, but the names
> will be preserved. KMeans does not mess with the order of the elements
> within each D vector. I don't know if this is sufficient or if Lanczos does
> anything similar.
>

Like Ted mentioned, NamedVector may be the key here to identifying the
original points from the clustered projected data. That's probably the right
way to go.
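The name-based approach can be sketched without Hadoop. Each original point is keyed by a name (its row index); clustering of the projected rows returns name-to-cluster assignments in any order; the two are joined on the name. The sketch also computes each cluster's center and radius in the original 2-d space, which is what the other display examples superimpose on the points. `BackMapping` and `ClusterSummary` are hypothetical names; in Mahout the name would travel on a `NamedVector`, and "radius" is taken here as the RMS distance to the center, which is an assumption and may not match the display code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: join cluster assignments made in eigenspace back to the original
// points by NAME (not by position), then summarize each cluster in the original space.
final class BackMapping {
    private BackMapping() {}

    // Center and RMS radius of one cluster, in the original coordinates.
    static final class ClusterSummary {
        final double[] center;
        final double radius;
        final int size;

        ClusterSummary(double[] center, double radius, int size) {
            this.center = center;
            this.radius = radius;
            this.size = size;
        }
    }

    /**
     * @param original    point name -> original coordinates (e.g. the 2-d display samples)
     * @param assignments point name -> cluster id, computed on the projected rows; order is irrelevant
     */
    static Map<Integer, ClusterSummary> summarize(Map<String, double[]> original,
                                                  Map<String, Integer> assignments) {
        // Group original points by cluster id via the shared name.
        Map<Integer, List<double[]>> members = new TreeMap<>();
        for (Map.Entry<String, Integer> e : assignments.entrySet()) {
            double[] p = original.get(e.getKey());
            if (p == null) {
                throw new IllegalArgumentException("no original point named " + e.getKey());
            }
            members.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(p);
        }
        Map<Integer, ClusterSummary> out = new TreeMap<>();
        for (Map.Entry<Integer, List<double[]>> e : members.entrySet()) {
            List<double[]> pts = e.getValue();
            int dim = pts.get(0).length;
            double[] center = new double[dim];
            for (double[] p : pts) {
                for (int d = 0; d < dim; d++) {
                    center[d] += p[d] / pts.size(); // mean of the original coordinates
                }
            }
            double sumSq = 0.0;
            for (double[] p : pts) {
                for (int d = 0; d < dim; d++) {
                    double diff = p[d] - center[d];
                    sumSq += diff * diff;
                }
            }
            out.put(e.getKey(), new ClusterSummary(center, Math.sqrt(sumSq / pts.size()), pts.size()));
        }
        return out;
    }
}
```

Because the join uses names rather than positions, it does not matter whether Lanczos or a multi-node k-means run reorders the projected rows.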


>
> -Original Message-
> From: [email protected] [mailto:[email protected]] On Behalf
> Of Shannon Quinn
> Sent: Tuesday, May 24, 2011 2:10 PM
> To: [email protected]
> Subject: Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example
> fails
>
> You're right, that would give you the affinity matrix. However, the
> affinity
> matrix is an easier beast to tame since the matrix is constructed with all
> the points' orders preserved: aff[i][j] is the relationship between
> original_point[i] and original_point[j], so for all practical purposes I
> treat this as the "original data" (since it's easy to go back and forth
> between the two).
>
> Problem is, I'm not sure if the Lanczos solver or K-Means preserve this
> ordering of indices. Does the nth point with label y from the result of
> K-means correspond to the nth row of the column matrix of eigenvectors? If
> so, then does that nth row from the eigenvector matrix also correspond to
> the nth original data point (the one represented by proxy by row n and
> column n of the affinity matrix)? If both these conditions are true, then
> and only then can we say that original_point[n]'s cluster is y.
>
> On Tue, May 24, 2011 at 4:39 PM, Jeff Eastman  wrote:
>
> > Would that give you the original data matrix, the clustered data matrix,
> or
> > the clustered affinity matrix? Even with the analogy in mind I'm having
> > trouble connecting the dots. Seems like I lost the original data matrix
> in
> > step 1 when I used a distance measure to produce A from it. If the
> returned
> > eigenvectors define Q, then what is the significance of QAQ^-1? And, more
> > importantly, if the Q eigenvectors define the clusters in eigenspace,
> what
> > is the inverse transformation?
> >
> > -----Original Message-
> > From: [email protected] [mailto:[email protected]] On Behalf
> > Of Shannon Quinn
> > Sent: Tuesday, May 24, 2011 12:07 PM
> > To: [email protected]
> > Subject: Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans
> example
> > fails
> >
> > That's an excellent analogy! Employing that strategy, would it be
> possible
> > (and not too expensive) to do the QAQ^-1 operation to get the original
> data
> > matrix, after we've clustered the points in eigenspace?
> >
> > On Tue, May 24, 2011 at 2:59 PM, Jeff Eastman 
> wrote:
> >
> > > For the display example, it is not necessary to cluster the original
> > > points. The other clustering display examples only train the clusters
> and
> > do
> > > not classify the points. They are drawn first and the cluster centers &
> > > radii are superimposed afterwards. Thus I think it is only necessary to
> > > back-transform the clusters.
> > >
> > > My EE gut tells me this is like Fourier transforms between time- and
> > > frequency-domains. If this is true then what we need is the inverse
> > > transform. Is this a correct analogy?
> > >
> > > -Original Message-
> > &

Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-05-24 Thread Ted Dunning
Nahh... the names are the key.

On Tue, May 24, 2011 at 2:49 PM, Jeff Eastman  wrote:

> If the D vectors are NamedVectors, with their index as the name, then this
> will flow through to the clustered points at the output. The order of those
> points won't bear much relation to the order of the input, but the names
> will be preserved. KMeans does not mess with the order of the elements
> within each D vector. I don't know if this is sufficient or if Lanczos does
> anything similar.


Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-05-24 Thread Ted Dunning
Well, it isn't entirely simple, but for suitable distances, D can often be
reverse engineered subject to various isometries like rotation and inversion
that don't change distance.
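This is the idea behind classical (Torgerson) multidimensional scaling, named here as one standard technique rather than something the Mahout example implements. Assuming A holds plain Euclidean distances d_ij (not kernel-transformed affinities), the points can be recovered up to rotation, reflection and translation:

```latex
\begin{aligned}
D^{(2)}_{ij} &= d_{ij}^{2}, \qquad J = I_n - \tfrac{1}{n}\,\mathbf{1}\mathbf{1}^{\top},\\
B &= -\tfrac{1}{2}\, J\, D^{(2)} J \;=\; \tilde{X}\tilde{X}^{\top}, \qquad \tilde{X} = JX \ \text{(the centered data)},\\
B &= V \Lambda V^{\top}, \qquad \hat{X} = V_k \Lambda_k^{1/2} \quad (k = 2 \text{ for the display data}).
\end{aligned}
```

Here Λ_k holds the k largest eigenvalues of B and V_k the corresponding eigenvectors. If the entries of A have been passed through an affinity kernel, that kernel would have to be inverted before this applies.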

On Tue, May 24, 2011 at 2:49 PM, Jeff Eastman  wrote:

> Since the distance measure is reducing the [2] dimensionality of the Di and
> Dj vectors with a scalar (aij), I don't see how to reconstruct D from A.


RE: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-05-24 Thread Jeff Eastman
Let me see if I'm following you. In the display example, there are 1100 2-d 
vectors generated as raw data D, which is 1100x2. Then, the preprocessing step 
uses a distance measure to produce A, which is 1100x1100. They are not really 
affinities, more like distances, so I may have missed the boat on that step. 
Since the distance measure reduces the [2] dimensionality of the Di and Dj 
vectors to a scalar (aij), I don't see how to reconstruct D from A.

On an N>1 node cluster, KMeans will cluster the input vectors in an arbitrary 
order, so Di and Dj will lose their index positions in the result. If the D 
vectors are NamedVectors, with their index as the name, then the names will flow 
through to the clustered points at the output. The order of those points won't 
bear much relation to the order of the input, but the names will be preserved. 
KMeans does not mess with the order of the elements within each D vector. I 
don't know if this is sufficient or if Lanczos does anything similar.
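On the point that the values in A are "more like distances" than affinities: a common choice in spectral clustering is the Gaussian kernel a_ij = exp(-d_ij^2 / (2 sigma^2)), which maps distance 0 to affinity 1 and large distances toward 0. The sketch below builds such an A from raw points. The class name and the choice of sigma are assumptions, and this is not necessarily what the Mahout example's preprocessing does. Because A[i][j] is built from D[i] and D[j], row i and column i of A keep the identity of point i.

```java
// Hypothetical sketch: Gaussian (RBF) affinity matrix from raw points.
// sigma is a user-chosen bandwidth; this is an assumption, not Mahout's preprocessing.
final class GaussianAffinity {
    private GaussianAffinity() {}

    /** Builds the n x n matrix a[i][j] = exp(-||x_i - x_j||^2 / (2 sigma^2)). */
    static double[][] build(double[][] points, double sigma) {
        if (sigma <= 0) {
            throw new IllegalArgumentException("sigma must be positive");
        }
        int n = points.length;
        double[][] a = new double[n][n];
        double denom = 2.0 * sigma * sigma;
        for (int i = 0; i < n; i++) {
            for (int j = i; j < n; j++) {
                double sq = 0.0; // squared Euclidean distance between points i and j
                for (int d = 0; d < points[i].length; d++) {
                    double diff = points[i][d] - points[j][d];
                    sq += diff * diff;
                }
                double v = Math.exp(-sq / denom);
                a[i][j] = v;
                a[j][i] = v; // symmetric: row i and column i both refer to point i
            }
        }
        return a;
    }
}
```

For the 1100 display samples this produces the 1100x1100 A described above; sigma controls how quickly affinity decays with distance.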

-Original Message-
From: [email protected] [mailto:[email protected]] On Behalf Of 
Shannon Quinn
Sent: Tuesday, May 24, 2011 2:10 PM
To: [email protected]
Subject: Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

You're right, that would give you the affinity matrix. However, the affinity
matrix is an easier beast to tame since the matrix is constructed with all
the points' orders preserved: aff[i][j] is the relationship between
original_point[i] and original_point[j], so for all practical purposes I
treat this as the "original data" (since it's easy to go back and forth
between the two).

Problem is, I'm not sure if the Lanczos solver or K-Means preserve this
ordering of indices. Does the nth point with label y from the result of
K-means correspond to the nth row of the column matrix of eigenvectors? If
so, then does that nth row from the eigenvector matrix also correspond to
the nth original data point (the one represented by proxy by row n and
column n of the affinity matrix)? If both these conditions are true, then
and only then can we say that original_point[n]'s cluster is y.

On Tue, May 24, 2011 at 4:39 PM, Jeff Eastman  wrote:

> Would that give you the original data matrix, the clustered data matrix, or
> the clustered affinity matrix? Even with the analogy in mind I'm having
> trouble connecting the dots. Seems like I lost the original data matrix in
> step 1 when I used a distance measure to produce A from it. If the returned
> eigenvectors define Q, then what is the significance of QAQ^-1? And, more
> importantly, if the Q eigenvectors define the clusters in eigenspace, what
> is the inverse transformation?
>
> -Original Message-
> From: [email protected] [mailto:[email protected]] On Behalf
> Of Shannon Quinn
> Sent: Tuesday, May 24, 2011 12:07 PM
> To: [email protected]
> Subject: Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example
> fails
>
> That's an excellent analogy! Employing that strategy, would it be possible
> (and not too expensive) to do the QAQ^-1 operation to get the original data
> matrix, after we've clustered the points in eigenspace?
>
> On Tue, May 24, 2011 at 2:59 PM, Jeff Eastman  wrote:
>
> > For the display example, it is not necessary to cluster the original
> > points. The other clustering display examples only train the clusters and
> do
> > not classify the points. They are drawn first and the cluster centers &
> > radii are superimposed afterwards. Thus I think it is only necessary to
> > back-transform the clusters.
> >
> > My EE gut tells me this is like Fourier transforms between time- and
> > frequency-domains. If this is true then what we need is the inverse
> > transform. Is this a correct analogy?
> >
> > -----Original Message-
> > From: [email protected] [mailto:[email protected]] On Behalf
> > Of Shannon Quinn
> > Sent: Tuesday, May 24, 2011 11:39 AM
> > To: [email protected]
> > Subject: Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans
> example
> > fails
> >
> > This is actually something I could use a little expert Hadoop assistance
> > on.
> > The general idea is that the points that are clustered in eigenspace have
> a
> > 1-to-1 correspondence with the original points (which is how you get your
> > cluster assignments), but this back-mapping after clustering isn't
> > explicitly implemented yet, since that's the core of the IO issue.
> >
> > My block on this is my lack of understanding in how the actual ordering
> of
> > the points change (or not?) from when they are projected into eigenspace
> > (the Lanczos solver) a

Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-05-24 Thread Ted Dunning
Ordering matters less than labeling.

And another way to put it is that the affinity or distance matrix A should
have the same labels on the rows AND on the columns as were on the rows of
the original matrix.  Thus, the labels on the rows of Q should be the same
as the original labels.

Forming Q' A Q (not QAQ', btw) only gives us the diagonalized form of A
which is just the affinity matrix of the eigen-representations.  That isn't
all that interesting.
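The correction can be written out. Assuming A is symmetric (as a distance or affinity matrix is) and Q holds its orthonormal eigenvectors as columns, Q^{-1} = Q^T, so:

```latex
A = Q \Lambda Q^{\top}, \quad Q^{\top} Q = I
\;\Longrightarrow\;
Q^{\top} A Q = (Q^{\top} Q)\, \Lambda\, (Q^{\top} Q) = \Lambda .
```

Row i of Q (and of any truncated Q_k) is still indexed by point i, which is why carrying the row labels through, rather than relying on output order, is enough to get from a cluster assignment back to an original point.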


On Tue, May 24, 2011 at 2:09 PM, Shannon Quinn  wrote:

> Problem is, I'm not sure if the Lanczos solver or K-Means preserve this
> ordering of indices. Does the nth point with label y from the result of
> K-means correspond to the nth row of the column matrix of eigenvectors?
>


Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-05-24 Thread Shannon Quinn
You're right, that would give you the affinity matrix. However, the affinity
matrix is an easier beast to tame since the matrix is constructed with all
the points' orders preserved: aff[i][j] is the relationship between
original_point[i] and original_point[j], so for all practical purposes I
treat this as the "original data" (since it's easy to go back and forth
between the two).

Problem is, I'm not sure if the Lanczos solver or K-Means preserve this
ordering of indices. Does the nth point with label y from the result of
K-means correspond to the nth row of the column matrix of eigenvectors? If
so, then does that nth row from the eigenvector matrix also correspond to
the nth original data point (the one represented by proxy by row n and
column n of the affinity matrix)? If both these conditions are true, then
and only then can we say that original_point[n]'s cluster is y.

On Tue, May 24, 2011 at 4:39 PM, Jeff Eastman  wrote:

> Would that give you the original data matrix, the clustered data matrix, or
> the clustered affinity matrix? Even with the analogy in mind I'm having
> trouble connecting the dots. Seems like I lost the original data matrix in
> step 1 when I used a distance measure to produce A from it. If the returned
> eigenvectors define Q, then what is the significance of QAQ^-1? And, more
> importantly, if the Q eigenvectors define the clusters in eigenspace, what
> is the inverse transformation?

RE: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-05-24 Thread Jeff Eastman
Would that give you the original data matrix, the clustered data matrix, or the 
clustered affinity matrix? Even with the analogy in mind I'm having trouble 
connecting the dots. Seems like I lost the original data matrix in step 1 when 
I used a distance measure to produce A from it. If the returned eigenvectors 
define Q, then what is the significance of QAQ^-1? And, more importantly, if 
the Q eigenvectors define the clusters in eigenspace, what is the inverse 
transformation?
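The following note on the QAQ^-1 question uses the textbook formulation of spectral clustering and has not been checked against Mahout's code. A symmetric affinity matrix A, or the normalized matrix built from it, diagonalizes with an orthogonal Q. That gives Q^-1 = Q^T, and multiplying the factors back together recovers A, which is the affinity matrix and not the original data. The clustering step only uses U, the n x k matrix whose columns are the k eigenvectors the solver returns. Row i of U is the projection of original point x_i. So there is no inverse transform to compute, only a row-index lookup:

```latex
\begin{aligned}
A &= Q \Lambda Q^{\top}, \qquad Q^{\top} Q = I \;\Rightarrow\; Q^{-1} = Q^{\top} \\
U &= \begin{bmatrix} q_1 & q_2 & \cdots & q_k \end{bmatrix} \in \mathbb{R}^{n \times k} \\
y_i &= U_{i,:} \in \mathbb{R}^{k}, \qquad \operatorname{cluster}(x_i) = \operatorname{cluster}(y_i), \quad i = 1, \dots, n
\end{aligned}
```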

-Original Message-
From: [email protected] [mailto:[email protected]] On Behalf Of 
Shannon Quinn
Sent: Tuesday, May 24, 2011 12:07 PM
To: [email protected]
Subject: Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

That's an excellent analogy! Employing that strategy, would it be possible
(and not too expensive) to do the QAQ^-1 operation to get the original data
matrix, after we've clustered the points in eigenspace?


Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-05-24 Thread Shannon Quinn
More or less follow the data through the pipeline?

On Tue, May 24, 2011 at 3:08 PM, Ted Dunning  wrote:

> Yes.  That can be done, but you probably can just remember the references.


Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-05-24 Thread Ted Dunning
Yes.  That can be done, but you probably can just remember the references.

On Tue, May 24, 2011 at 12:06 PM, Shannon Quinn  wrote:

> That's an excellent analogy! Employing that strategy, would it be possible
> (and not too expensive) to do the QAQ^-1 operation to get the original data
> matrix, after we've clustered the points in eigenspace?
>
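Ted's suggestion to "remember the references" can be sketched as a keyed join. Every vector carries the key of the original point it came from, so cluster ids are matched back by key and never by position. This is a minimal illustration, assuming the keys survive the Lanczos and k-means steps. `KeyedBackMapping`, `AnnotatedPoint` and `annotate` are hypothetical names, not Mahout APIs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch of "remember the references": each projected vector keeps the key of
 * the original point it came from, so cluster ids are joined back by key, not by position.
 */
public final class KeyedBackMapping {

    private KeyedBackMapping() {}

    /** An original point annotated with the cluster id of its eigenspace counterpart. */
    public static final class AnnotatedPoint {
        public final int key;
        public final double[] coords;
        public final int clusterId;

        AnnotatedPoint(int key, double[] coords, int clusterId) {
            this.key = key;
            this.coords = coords;
            this.clusterId = clusterId;
        }
    }

    /**
     * @param originalPoints key -> original point (for the display example, a 2-d sample point)
     * @param clusterOfKey   key -> cluster id that k-means gave the projected point with that key
     * @return key -> annotated original point, in the iteration order of originalPoints
     */
    public static Map<Integer, AnnotatedPoint> annotate(Map<Integer, double[]> originalPoints,
                                                        Map<Integer, Integer> clusterOfKey) {
        Map<Integer, AnnotatedPoint> result = new LinkedHashMap<>();
        for (Map.Entry<Integer, double[]> entry : originalPoints.entrySet()) {
            Integer clusterId = clusterOfKey.get(entry.getKey());
            if (clusterId == null) {
                // A missing key means a reference was lost somewhere in the pipeline.
                throw new IllegalStateException("no cluster assignment for key " + entry.getKey());
            }
            result.put(entry.getKey(), new AnnotatedPoint(entry.getKey(), entry.getValue(), clusterId));
        }
        return result;
    }
}
```

Joining by key makes the answer independent of how a multi-node job orders its output, which is the uncertainty raised earlier in the thread.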


Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-05-24 Thread Shannon Quinn
That's an excellent analogy! Employing that strategy, would it be possible
(and not too expensive) to do the QAQ^-1 operation to get the original data
matrix, after we've clustered the points in eigenspace?

On Tue, May 24, 2011 at 2:59 PM, Jeff Eastman  wrote:

> For the display example, it is not necessary to cluster the original
> points. The other clustering display examples only train the clusters and do
> not classify the points. They are drawn first and the cluster centers &
> radii are superimposed afterwards. Thus I think it is only necessary to
> back-transform the clusters.
>
> My EE gut tells me this is like Fourier transforms between time- and
> frequency-domains. If this is true then what we need is the inverse
> transform. Is this a correct analogy?

RE: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-05-24 Thread Jeff Eastman
For the display example, it is not necessary to cluster the original points. 
The other clustering display examples only train the clusters and do not 
classify the points. They are drawn first and the cluster centers & radii are 
superimposed afterwards. Thus I think it is only necessary to back-transform 
the clusters. 

My EE gut tells me this is like Fourier transforms between time- and 
frequency-domains. If this is true then what we need is the inverse transform. 
Is this a correct analogy?
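Suppose each original point already carries the cluster label of its projected counterpart. Then "back-transforming the clusters" for the display can be as simple as recomputing each cluster's center and radius from the original 2-d points, with no inverse of the eigendecomposition involved. The sketch below is minimal. `ClusterSummaries` is a hypothetical name, and using the per-dimension population standard deviation as the "radius" is an assumption. It is not necessarily what the Display* examples draw.

```java
/**
 * Hypothetical sketch: summarize each cluster directly in the original data space, using the
 * cluster label carried back to every original point.
 */
public final class ClusterSummaries {

    private ClusterSummaries() {}

    /** Per-cluster center, per-dimension radius (population std. deviation) and point count. */
    public static final class Summary {
        public final double[][] centers;
        public final double[][] radii;
        public final int[] counts;

        Summary(double[][] centers, double[][] radii, int[] counts) {
            this.centers = centers;
            this.radii = radii;
            this.counts = counts;
        }
    }

    public static Summary summarize(double[][] points, int[] labels, int numClusters) {
        if (points.length == 0 || points.length != labels.length) {
            throw new IllegalArgumentException("need one label per point and at least one point");
        }
        int dims = points[0].length;
        double[][] sum = new double[numClusters][dims];
        double[][] sumSq = new double[numClusters][dims];
        int[] counts = new int[numClusters];
        for (int i = 0; i < points.length; i++) {
            int c = labels[i]; // an out-of-range label throws ArrayIndexOutOfBoundsException below
            counts[c]++;
            for (int d = 0; d < dims; d++) {
                sum[c][d] += points[i][d];
                sumSq[c][d] += points[i][d] * points[i][d];
            }
        }
        double[][] centers = new double[numClusters][dims];
        double[][] radii = new double[numClusters][dims];
        for (int c = 0; c < numClusters; c++) {
            if (counts[c] == 0) {
                continue; // empty cluster: center and radius stay at zero
            }
            for (int d = 0; d < dims; d++) {
                double mean = sum[c][d] / counts[c];
                // E[x^2] - E[x]^2, clamped at zero to absorb rounding error.
                double variance = Math.max(0.0, sumSq[c][d] / counts[c] - mean * mean);
                centers[c][d] = mean;
                radii[c][d] = Math.sqrt(variance);
            }
        }
        return new Summary(centers, radii, counts);
    }
}
```

The centers and radii are computed in the same 2-d space as the plotted points, so they can be superimposed directly, as the other display examples do.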

-Original Message-
From: [email protected] [mailto:[email protected]] On Behalf Of 
Shannon Quinn
Sent: Tuesday, May 24, 2011 11:39 AM
To: [email protected]
Subject: Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

This is actually something I could use a little expert Hadoop assistance on.
The general idea is that the points that are clustered in eigenspace have a
1-to-1 correspondence with the original points (which is how you get your
cluster assignments), but this back-mapping after clustering isn't
explicitly implemented yet, since that's the core of the IO issue.

My block on this is my lack of understanding of how the actual ordering of
the points changes (or not?) between when they are projected into eigenspace
(the Lanczos solver) and when K-means makes its cluster assignments. On a
one-node setup the original ordering appears to be preserved through all the
operations, so the labels of the original points can be assigned by giving
original_point[i] the label of projected_point[i], hence the cluster
assignments are easy to determine. For multi-node setups, however, I simply
don't know if this heuristic holds.

But I believe the immediate issue here is that we're feeding the projected
points to the display, when it should be the original points *annotated*
with the cluster assignments from the corresponding projected points. The
question is how to shift those assignments over robustly; right now it's
just a hack job in the SpectralKMeansDriver...or maybe (hopefully!) it's
just the version I have locally :o)



Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-05-24 Thread Shannon Quinn
This is actually something I could use a little expert Hadoop assistance on.
The general idea is that the points that are clustered in eigenspace have a
1-to-1 correspondence with the original points (which is how you get your
cluster assignments), but this back-mapping after clustering isn't
explicitly implemented yet, since that's the core of the IO issue.

My block on this is my lack of understanding of how the actual ordering of
the points changes (or not?) between when they are projected into eigenspace
(the Lanczos solver) and when K-means makes its cluster assignments. On a
one-node setup the original ordering appears to be preserved through all the
operations, so the labels of the original points can be assigned by giving
original_point[i] the label of projected_point[i], hence the cluster
assignments are easy to determine. For multi-node setups, however, I simply
don't know if this heuristic holds.

But I believe the immediate issue here is that we're feeding the projected
points to the display, when it should be the original points *annotated*
with the cluster assignments from the corresponding projected points. The
question is how to shift those assignments over robustly; right now it's
just a hack job in the SpectralKMeansDriver...or maybe (hopefully!) it's
just the version I have locally :o)

On Tue, May 24, 2011 at 2:13 PM, Jeff Eastman  wrote:

> Yes, I expect it is pilot error on my part. The original implementation was
> failing in this manner because I was requesting 5 eigenvectors (clusters). I
> changed it to 2 and now it displays something but it is not even close to
> correct. I think this is because I have not transformed back from eigen
> space to vector space. This all relates to the IO issue for the spectral
> clustering code which I don't grok.
>
> The display driver begins with the sample points and generates the affinity
> matrix using a distance measure. Not clear this is even a correct
> interpretation of that matrix. Then spectral kmeans runs and produces 2
> clusters which I display directly. Seems like this number should be more
> like the k in kmeans, and 5 was more realistic given the data. I believe
> there is a missing output transformation to recover the clusters from the
> eigenvectors but I don't know how to do that.
>
> I bet you do :)


RE: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-05-24 Thread Jeff Eastman
Yes, I expect it is pilot error on my part. The original implementation was 
failing in this manner because I was requesting 5 eigenvectors (clusters). I 
changed it to 2 and now it displays something but it is not even close to 
correct. I think this is because I have not transformed back from eigen space 
to vector space. This all relates to the IO issue for the spectral clustering 
code which I don't grok.

The display driver begins with the sample points and generates the affinity 
matrix using a distance measure. Not clear this is even a correct 
interpretation of that matrix. Then spectral kmeans runs and produces 2 
clusters which I display directly. Seems like this number should be more like 
the k in kmeans, and 5 was more realistic given the data. I believe there is a 
missing output transformation to recover the clusters from the eigenvectors but 
I don't know how to do that.

I bet you do :)
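Jeff's guess that the number of eigenvectors "should be more like the k in kmeans" matches the usual formulation. It also offers a plausible account of the 2-D vs 5-D mismatch: each point's projection is one row of the n x k eigenvector matrix, so requesting 5 eigenvectors yields 5-dimensional points. The sketch below is minimal and assumes the eigenvectors arrive as k arrays of length n. `SpectralEmbedding` is a hypothetical name. The row normalization follows the Ng-Jordan-Weiss variant and may not match what SpectralKMeansDriver actually does.

```java
/**
 * Hypothetical sketch: turn k eigenvectors (each of length n) into n points of dimension k.
 * Row i of the result is the eigenspace projection of original point i.
 */
public final class SpectralEmbedding {

    private SpectralEmbedding() {}

    public static double[][] embed(double[][] eigenvectors) {
        int k = eigenvectors.length;
        if (k == 0) {
            throw new IllegalArgumentException("need at least one eigenvector");
        }
        int n = eigenvectors[0].length;
        double[][] rows = new double[n][k];
        for (int j = 0; j < k; j++) {
            if (eigenvectors[j].length != n) {
                throw new IllegalArgumentException("eigenvectors must all have length " + n);
            }
            for (int i = 0; i < n; i++) {
                rows[i][j] = eigenvectors[j][i]; // eigenvector j becomes column j
            }
        }
        // Scale each row to unit length (the Ng-Jordan-Weiss step; an assumption, see above).
        for (double[] row : rows) {
            double norm = 0.0;
            for (double v : row) {
                norm += v * v;
            }
            norm = Math.sqrt(norm);
            if (norm > 0.0) {
                for (int j = 0; j < k; j++) {
                    row[j] /= norm;
                }
            }
        }
        return rows;
    }
}
```

With 5 eigenvectors every row has length 5, which is consistent with the 5-d vectors the display received. The display needs the k-means labels of these rows carried back to the original 2-d points by row index, not the rows themselves.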

-Original Message-
From: Shannon Quinn (JIRA) [mailto:[email protected]] 
Sent: Tuesday, May 24, 2011 8:07 AM
To: [email protected]
Subject: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails


[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038608#comment-13038608
 ] 

Shannon Quinn commented on MAHOUT-524:
--

+1, I'm on it.

I'm a little unclear as to the context of the initial Hudson comment: the 
display method is expecting 2D vectors, but getting 5D ones?

> DisplaySpectralKMeans example fails
> ---
>
> Key: MAHOUT-524
> URL: https://issues.apache.org/jira/browse/MAHOUT-524
> Project: Mahout
>  Issue Type: Bug
>  Components: Clustering
>Affects Versions: 0.4, 0.5
>Reporter: Jeff Eastman
>Assignee: Jeff Eastman
>  Labels: clustering, k-means, visualization
> Fix For: 0.6
>
> Attachments: aff.txt, raw.txt, spectralkmeans.png
>
>
> I've committed a new display example that attempts to push the standard 
> mixture of models data set through spectral k-means. After some tweaking of 
> configuration arguments and a bug fix in EigenCleanupJob it runs spectral 
> k-means to completion. The display example is expecting 2-d clustered points 
> and the example is producing 5-d points. Additional I/O work is needed before 
> this will play with the rest of the clustering algorithms. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-05-24 Thread Shannon Quinn (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038608#comment-13038608
 ] 

Shannon Quinn commented on MAHOUT-524:
--

+1, I'm on it.

I'm a little unclear as to the context of the initial Hudson comment: the 
display method is expecting 2D vectors, but getting 5D ones?

> DisplaySpectralKMeans example fails
> ---
>
> Key: MAHOUT-524
> URL: https://issues.apache.org/jira/browse/MAHOUT-524
> Project: Mahout
>  Issue Type: Bug
>  Components: Clustering
>Affects Versions: 0.4, 0.5
>Reporter: Jeff Eastman
>Assignee: Jeff Eastman
>  Labels: clustering, k-means, visualization
> Fix For: 0.6
>
> Attachments: aff.txt, raw.txt, spectralkmeans.png
>
>
> I've committed a new display example that attempts to push the standard 
> mixture of models data set through spectral k-means. After some tweaking of 
> configuration arguments and a bug fix in EigenCleanupJob it runs spectral 
> k-means to completion. The display example is expecting 2-d clustered points 
> and the example is producing 5-d points. Additional I/O work is needed before 
> this will play with the rest of the clustering algorithms. 



[jira] Commented: (MAHOUT-524) DisplaySpectralKMeans example fails

2011-01-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982912#action_12982912
 ] 

Hudson commented on MAHOUT-524:
---

Integrated in Mahout-Quality #567 (See 
[https://hudson.apache.org/hudson/job/Mahout-Quality/567/])


> DisplaySpectralKMeans example fails
> ---
>
> Key: MAHOUT-524
> URL: https://issues.apache.org/jira/browse/MAHOUT-524
> Project: Mahout
>  Issue Type: Bug
>  Components: Clustering
>Affects Versions: 0.4
>Reporter: Jeff Eastman
> Fix For: 0.5
>
>
> I've committed a new display example that attempts to push the standard 
> mixture of models data set through spectral k-means. After some tweaking of 
> configuration arguments and a bug fix in EigenCleanupJob it runs spectral 
> k-means to completion. The display example is expecting 2-d clustered points 
> and the example is producing 5-d points. Additional I/O work is needed before 
> this will play with the rest of the clustering algorithms. 




[jira] Commented: (MAHOUT-524) DisplaySpectralKMeans example fails

2011-01-17 Thread Jeff Eastman (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982737#action_12982737
 ] 

Jeff Eastman commented on MAHOUT-524:
-

The Display algorithm now runs without errors but the 2 clusters it produces 
are clearly not what I was expecting. Probably a gross misunderstanding on my 
part and a final output processing step that needs to be invented. 

> DisplaySpectralKMeans example fails
> ---
>
> Key: MAHOUT-524
> URL: https://issues.apache.org/jira/browse/MAHOUT-524
> Project: Mahout
>  Issue Type: Bug
>  Components: Clustering
>Affects Versions: 0.4
>Reporter: Jeff Eastman
> Fix For: 0.5
>
>
> I've committed a new display example that attempts to push the standard 
> mixture of models data set through spectral k-means. After some tweaking of 
> configuration arguments and a bug fix in EigenCleanupJob it runs spectral 
> k-means to completion. The display example is expecting 2-d clustered points 
> and the example is producing 5-d points. Additional I/O work is needed before 
> this will play with the rest of the clustering algorithms. 




[jira] Commented: (MAHOUT-524) DisplaySpectralKMeans example fails

2011-01-17 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982681#action_12982681
 ] 

Sean Owen commented on MAHOUT-524:
--

Jeff, it sounds like there is no outstanding issue here at the moment; or is there 
something more to track here?

> DisplaySpectralKMeans example fails
> ---
>
> Key: MAHOUT-524
> URL: https://issues.apache.org/jira/browse/MAHOUT-524
> Project: Mahout
>  Issue Type: Bug
>  Components: Clustering
>Affects Versions: 0.4
>Reporter: Jeff Eastman
> Fix For: 0.5
>
>
> I've committed a new display example that attempts to push the standard 
> mixture of models data set through spectral k-means. After some tweaking of 
> configuration arguments and a bug fix in EigenCleanupJob it runs spectral 
> k-means to completion. The display example is expecting 2-d clustered points 
> and the example is producing 5-d points. Additional I/O work is needed before 
> this will play with the rest of the clustering algorithms. 




[jira] Commented: (MAHOUT-524) DisplaySpectralKMeans example fails

2010-10-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920226#action_12920226
 ] 

Hudson commented on MAHOUT-524:
---

Integrated in Mahout-Quality #392 (See 
[https://hudson.apache.org/hudson/job/Mahout-Quality/392/])
MAHOUT-524: Moved numEigensWritten initialization out of loop. 
SpectralKMeans now runs to completion but the display routine is expecting a 2-d 
vector and is getting a 5-d vector. Not clustering the original input points. 
More to test but CleanEigensJob is working.


> DisplaySpectralKMeans example fails
> ---
>
> Key: MAHOUT-524
> URL: https://issues.apache.org/jira/browse/MAHOUT-524
> Project: Mahout
>  Issue Type: Bug
>  Components: Clustering
>Affects Versions: 0.4
>Reporter: Jeff Eastman
> Fix For: 0.4
>
>
> I've committed a new display example that attempts to push the standard 
> mixture of models data set through spectral k-means. After some tweaking of 
> configuration arguments it gets remarkably far through, finally failing on 
> W.transpose() after the eigen cleanup. I can't imagine this would all be 
> pilot error so I'm opening an issue to track it. 
