[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206416#comment-13206416 ]
Dan Brickley commented on MAHOUT-524:
Shannon informs me I'm getting this error because node IDs must be counted from zero. I've updated the wiki to say this more explicitly. So this JIRA can stay closed, phew.
> DisplaySpectralKMeans example fails
> ---
>
> Key: MAHOUT-524
> URL: https://issues.apache.org/jira/browse/MAHOUT-524
> Project: Mahout
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 0.4, 0.5
> Reporter: Jeff Eastman
> Assignee: Shannon Quinn
> Labels: clustering, k-means, visualization
> Fix For: 0.6
>
> Attachments: EclipseLog_20110918.txt, MAHOUT-524.patch,
> MAHOUT-524.patch, MAHOUT-524.patch, SpectralKMeans_fail_20110919.txt,
> aff.txt, raw.txt, screenshot-1.jpg, spectralkmeans.png
>
>
> I've committed a new display example that attempts to push the standard
> mixture of models data set through spectral k-means. After some tweaking of
> configuration arguments and a bug fix in EigenCleanupJob it runs spectral
> k-means to completion. The display example is expecting 2-d clustered points
> and the example is producing 5-d points. Additional I/O work is needed before
> this will play with the rest of the clustering algorithms.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
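The zero-based node ID requirement Dan mentions can be met by renumbering the IDs before the affinity file is uploaded. The following is only a minimal sketch: the class `ZeroBasedIds` and its methods are hypothetical helpers, not part of Mahout, and the `i,j,value` row layout is an assumption about the affinity .csv.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of Mahout): renumbers node IDs from zero. */
class ZeroBasedIds {

  /** Returns the zero-based ID for a key, assigning the next free one on first sight. */
  static int idFor(String key, Map<String, Integer> dictionary) {
    Integer id = dictionary.get(key);
    if (id == null) {
      id = dictionary.size(); // IDs are handed out as 0, 1, 2, ...
      dictionary.put(key, id);
    }
    return id;
  }

  /** Rewrites "i,j,value" rows (assumed layout) so that i and j are contiguous and start at 0. */
  static List<String> remap(List<String> rows, Map<String, Integer> dictionary) {
    List<String> out = new ArrayList<>();
    for (String row : rows) {
      String[] fields = row.trim().split("\\s*,\\s*");
      if (fields.length != 3) {
        throw new IllegalArgumentException("expected i,j,value but got: " + row);
      }
      int i = idFor(fields[0], dictionary);
      int j = idFor(fields[1], dictionary);
      out.add(i + "," + j + "," + fields[2]); // the affinity value is passed through untouched
    }
    return out;
  }

  public static void main(String[] args) {
    Map<String, Integer> dictionary = new LinkedHashMap<>();
    List<String> rows = Arrays.asList("5,9,1.0", "9,7,0.5");
    for (String row : remap(rows, dictionary)) {
      System.out.println(row);
    }
    System.out.println("distinct nodes: " + dictionary.size());
  }
}
```

The dictionary doubles as the int-to-original-ID lookup, and its size is the number of distinct nodes, which is presumably the value the `-d` option expects.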
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206256#comment-13206256 ]
Dan Brickley commented on MAHOUT-524:
I just tried spectral k-means with some wikipedia/dbpedia data (1.0 affinities for every page and topic category URL pair in the Wiki). Data came from http://downloads.dbpedia.org/3.7/en/article_categories_en.nt.bz2 and is dropped in the Web at http://danbri.org/2012/spectral/dbpedia/ (I posted .csv plus an int-to-URL dictionary file).
My best guess at commandline (running this w/ today's trunk + a fresh 0.20.203.0 hadoop pseudo-cluster) was this:
mahout spectralkmeans -i wiki/ -o output1 -k 20 -d 4192499 --maxIter 10
(where hdfs wiki/ subdir contains the .csv data file)
Unfortunately I'm hitting one of the various problems discussed above. If anyone else can reproduce this, perhaps a fresh JIRA is needed. It gets stuck after the first job, with an essentially empty seqfile (checked with "mahout seqdumper --seqFile output1/calculations/diagonal/part-r-0"). Full transcript here: https://gist.github.com/1804016
This is essentially the same experience I had back in Sept (see above) running a similar test.
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177385#comment-13177385 ]
Hudson commented on MAHOUT-524:
Integrated in Mahout-Quality #1279 (See [https://builds.apache.org/job/Mahout-Quality/1279/])
MAHOUT-524: committing patch since Shannon has no internet. All tests run
jeastman : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1225596
Files :
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/spectral/common/VectorMatrixMultiplicationJob.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/spectral/kmeans/SpectralKMeansDriver.java
* /mahout/trunk/examples/src/main/java/org/apache/mahout/clustering/display/DisplayClustering.java
* /mahout/trunk/examples/src/main/java/org/apache/mahout/clustering/display/DisplaySpectralKMeans.java
* /mahout/trunk/math/src/main/java/org/apache/mahout/math/decomposer/lanczos/LanczosSolver.java
* /mahout/trunk/math/src/main/java/org/apache/mahout/math/decomposer/lanczos/LanczosState.java
Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
I just committed the patch with numClusters = 3 in DisplaySpectralKMeans.
Jeff
Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
Sorry for replying via the dev list, but I am without Internet access beyond my phone.
Yes, unless anyone testing can find issues with this patch (or with the one Grant posted earlier, as mine contains his), it is meant to be committed. Due to the aforementioned lack of Internet, if someone could commit this for me that'd be fantastic.
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176876#comment-13176876 ]
Jeff Eastman commented on MAHOUT-524:
Shannon, is this patch ready to commit? I've installed it and verified that DisplaySpectralKMeans is indeed finding clusters. By increasing the numClusters from 2 to 3 it now does a credible job of finding the 3 clusters present in the generated data.
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176203#comment-13176203 ]
Dan Brickley commented on MAHOUT-524:
Great to see this getting wrapped up. Can you suggest what commandline(s) and test input others might try to verify this? I have some py-generated afftest.txt left from previous investigations but forget its exact origins. I also have some real world similarity data with labeled items; how would I use those?
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164871#comment-13164871 ]
Shannon Quinn commented on MAHOUT-524:
I believe you and Rozemary need to apply the patch that is attached to this issue to get past the "tmp/data" error. It stems from the Lanczos solver, but is likely a symptom of being called by SKM incorrectly. I'm still working on the patch for this, will hopefully be done soon...
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164713#comment-13164713 ]
Kevin Findlay commented on MAHOUT-524:
Slightly confused. I have checked that the MAHOUT-524 patches are included in my current build of the trunk. However, I still get the file-not-found error for "tmp/data" as described in the subtask. Have I got the versions right?
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163860#comment-13163860 ]
Rozemary Scarlat commented on MAHOUT-524:
Hi! I am new to Mahout and I have been trying to use the K-means Spectral Clustering, but I ran into the problem described in the comments above: the Lanczos solver tries to read the output of the VectorMatrixMultiplication from a "calculations/laplacian-166/tmp/data" file instead of from "calculations/laplacian-166/part-m-0". I was wondering if there is currently a way to run the Spectral Clustering.
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155306#comment-13155306 ]
Shannon Quinn commented on MAHOUT-524:
Unknown. Still coding up a way of coloring the dots rather than drawing circles.
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155303#comment-13155303 ]
Dan Brickley commented on MAHOUT-524:
(hmm this issue seems something of a proxy for general code rot and problems with the spectral piece of Mahout)
Where are we with this? I see "a symptom of us calling the job wrong", and "throwing off the final results". Is the problem purely in the displaying of spectral k-means, or something deeper e.g. if I want eigenvectors and values of laplacian re-representation of an affinity matrix, is the underlying code in a happy state?
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13143512#comment-13143512 ]
Grant Ingersoll commented on MAHOUT-524:
bq. If at all possible, my suggestion would be colored dots to indicate the clusters.
There is no requirement that we have to draw circles or leverage the old code, we just need something that works.
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13143510#comment-13143510 ]
Shannon Quinn commented on MAHOUT-524:
After implementing the same code in Python, my suspicion is actually that the K-means step at the conclusion of the spectral algorithm is throwing off the results. Regular K-means is running on the spectral data: the top k eigenvectors of the affinities, rather than the original data. I don't know K-means well enough to know for sure, but my guess is that all the distance measurements that come back in its output format are relative to the spectral data, rather than the original data. So what you see in the end-result graph are circles around where the spectral data are. That'd be my first guess, anyway.
I'm working on a couple of things to help with this: a sequential version of spectral k-means, and a job to read raw data (text format: whitespace or comma-separated n-dimensional points) and convert it to affinities (a la issue 518, finally!). Hopefully these will help diagnose spectral k-means. But if it is a data issue, I'm not sure how we can translate the distance measurements on the spectral data back onto the original data for the DisplaySKM code.
I would argue, though, that since spectral k-means doesn't operate on the same GMM-type basis that regular K-means does, overlaying K gaussians isn't really what we want here, anyway. If at all possible, my suggestion would be colored dots to indicate the clusters.
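The raw-data-to-affinities conversion Shannon describes could look roughly like the sketch below. It is not the job from issue 518: the class name `AffinitySketch`, the Gaussian kernel, the `sigma` parameter and the `i,j,value` output layout are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch (not Mahout code): raw n-dimensional points to pairwise affinities. */
class AffinitySketch {

  /** Parses one line of whitespace- or comma-separated coordinates. */
  static double[] parsePoint(String line) {
    String[] fields = line.trim().split("[,\\s]+");
    double[] point = new double[fields.length];
    for (int i = 0; i < fields.length; i++) {
      point[i] = Double.parseDouble(fields[i]);
    }
    return point;
  }

  /** Emits one "i,j,affinity" row per ordered pair of distinct points, with zero-based indices. */
  static List<String> toAffinities(List<String> lines, double sigma) {
    List<double[]> points = new ArrayList<>();
    for (String line : lines) {
      points.add(parsePoint(line));
    }
    List<String> rows = new ArrayList<>();
    for (int i = 0; i < points.size(); i++) {
      for (int j = 0; j < points.size(); j++) {
        if (i == j) {
          continue; // self-affinities are skipped in this sketch
        }
        double squaredDistance = 0.0;
        for (int d = 0; d < points.get(i).length; d++) {
          double diff = points.get(i)[d] - points.get(j)[d];
          squaredDistance += diff * diff;
        }
        // Gaussian kernel: an assumed choice, any similarity measure could be substituted.
        double affinity = Math.exp(-squaredDistance / (2.0 * sigma * sigma));
        rows.add(i + "," + j + "," + affinity);
      }
    }
    return rows;
  }

  public static void main(String[] args) {
    List<String> raw = Arrays.asList("0 0", "3,4", "0.5 0.5");
    for (String row : toAffinities(raw, 5.0)) {
      System.out.println(row);
    }
  }
}
```

The quadratic loop is only suitable for small inputs such as the display example's generated data; a distributed job would need a different structure.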
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142635#comment-13142635
]
Grant Ingersoll commented on MAHOUT-524:
bq. I applied your patch but I'm having trouble following where you fixed the
Lanczos issue
I put in a sanity check at
{code}
int size = ejCol.size();
for (int j = 0; j < size; j++) {
{code}
so that we don't overrun the basis vector size.
However, based on Jake's comments, I'd say that is a symptom of us calling the
job wrong.
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142633#comment-13142633 ]
Grant Ingersoll commented on MAHOUT-524:
bq. I applied your patch but I'm having trouble following where you fixed the Lanczos issue (though from within Eclipse I'm getting OutOfMemory errors...).
Yeah, up your heap to 1024M
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142629#comment-13142629
]
Jake Mannix commented on MAHOUT-524:
I don't really know anything about the way that SKMD works, so all I can weigh
in on is what's going on in Lanczos:
You take an input matrix with some number of rows (this number doesn't matter,
doesn't show up anywhere) and numCols columns (this number matters a lot). You
want desiredRank eigenvectors to pop out in the end. So you start with some
initial basisVector (number 0), and you iterate again and again, taking your
input corpus.timesSquared(basisIminusOne) (the resultant vector is of size
numCols), doing some orthogonalization against previous vectors, and hanging
onto this vector.
Eventually you have desiredRank basisVectors, arranged in the LanczosState
object in a Map (it could certainly be a Matrix, and effectively it is one, but
we're just hanging onto the vectors before building a matrix soon enough).
Meanwhile, we're building up a desiredRank x desiredRank tri-diagonal (i.e. very
sparse) matrix using these basis vectors and their inner products.
Now we ask COLT to get the eigenvectors and eigenvalues of the tridiagonal
matrix; there will be desiredRank eigenvalues and desiredRank eigenVectors
(each of dimension desiredRank).
Here we get to where you're getting an NPE. We walk along the desiredRank^2
values in the eigenvector matrix ("eigenVects"), and for each of 0...
desiredRank, we grab the basisVector (we have desiredRank of them, each of size
numCols) and add a linear multiple of it onto something which will be the final
eigenvector we'll return at the end of the day.
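The combination step described above can be sketched in isolation. This is a minimal illustration in plain Java, not the LanczosSolver code: the class name BasisCombination, the method combine, and the use of double[] in place of Mahout's Vector are all assumptions made for the example. It shows how a missing basis vector surfaces as a failure once the loop bound exceeds the number of vectors actually stored.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: hypothetical names, not Mahout's LanczosSolver. */
final class BasisCombination {

  /**
   * Builds one approximate eigenvector as a linear combination of basis vectors.
   * coefficients[j] plays the role of the j-th entry of one eigenvector of the
   * small tridiagonal matrix; basis.get(j) is the j-th basis vector of size numCols.
   */
  static double[] combine(Map<Integer, double[]> basis, double[] coefficients, int numCols) {
    double[] result = new double[numCols];
    for (int j = 0; j < coefficients.length; j++) {
      double[] basisVector = basis.get(j);
      if (basisVector == null) {
        // Without this check the failure would appear later as a NullPointerException.
        throw new IllegalStateException("no basis vector " + j + ": state holds "
            + basis.size() + " vectors but " + coefficients.length + " were requested");
      }
      for (int c = 0; c < numCols; c++) {
        result[c] += coefficients[j] * basisVector[c];
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Map<Integer, double[]> basis = new HashMap<>();
    basis.put(0, new double[] {1.0, 0.0, 0.0});
    basis.put(1, new double[] {0.0, 1.0, 0.0});
    double[] v = combine(basis, new double[] {0.5, 2.0}, 3);
    System.out.println(Arrays.toString(v)); // prints [0.5, 2.0, 0.0]
  }
}
```

Asking for more coefficients than there are stored basis vectors makes combine throw with a message naming both counts, which is easier to diagnose than an NPE deep inside the arithmetic.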
What is SKMD doing?
{code}
LanczosState state = new LanczosState(L, overshoot, numDims,
    solver.getInitialVector(L));
Path lanczosSeqFiles = new Path(outputCalc, "eigenvectors-" +
    (System.nanoTime() & 0xFF));
solver.runJob(conf,
    state,
    overshoot,
    true,
    lanczosSeqFiles.toString());
{code}
We're making a LanczosState specifying numCols = overshoot and desiredRank =
numDims.
Then we run the solver with desiredRank = overshoot.
This looks inconsistent; shouldn't the desiredRank be the same in both places?
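One way to catch this kind of mismatch early is to compare the two ranks before solving. The following is a hypothetical sketch, assuming a state object that records the rank it was built with; RankCheck, State and requireConsistentRank are made-up names and not part of Mahout, and the numbers in main are illustrative values only.

```java
/** Illustrative only: hypothetical names, not Mahout's LanczosState or solver API. */
final class RankCheck {

  /** Minimal stand-in for a solver state: it only remembers the sizes it was built with. */
  static final class State {
    final int numCols;
    final int desiredRank;

    State(int numCols, int desiredRank) {
      this.numCols = numCols;
      this.desiredRank = desiredRank;
    }
  }

  /** Fails fast when the rank given to the solver differs from the rank the state was sized for. */
  static void requireConsistentRank(State state, int solverRank) {
    if (state.desiredRank != solverRank) {
      throw new IllegalArgumentException("state was built for rank " + state.desiredRank
          + " but the solver was asked for rank " + solverRank);
    }
  }

  public static void main(String[] args) {
    State state = new State(1100, 4);      // illustrative sizes
    requireConsistentRank(state, 4);       // consistent: returns normally
    try {
      requireConsistentRank(state, 1100);  // inconsistent: rejected up front
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

A check like this would not decide which of the two values is the right one, only that the caller has to pick one.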
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142624#comment-13142624
]
Jeff Eastman commented on MAHOUT-524:
This result looks like the original result I got when it worked for a while. I'm treating the SKMD output as though it were clusters like the other Display routines. I think this is not correct but I don't understand what is wrong.
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142620#comment-13142620
]
Shannon Quinn commented on MAHOUT-524:
Similar results were actually what this issue was *originally* created to solve, before code rot created the other problems. The fact that I got actual clustering results when I was testing this code two summers ago would seem to imply that it's an API issue; DisplaySKM vs SKMDriver data format clashes would be my first guess.
I applied your patch but I'm having trouble following where you fixed the Lanczos issue (though from within Eclipse I'm getting OutOfMemory errors...).
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142588#comment-13142588
]
Grant Ingersoll commented on MAHOUT-524:
Seems the numDims == 1100 there is supposed to be the size of the affinity matrix, which is what we have generated from the sample data, so I guess that makes sense.
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142583#comment-13142583
]
Grant Ingersoll commented on MAHOUT-524:
In this particular case, the state has 4 basis vectors, but the "size" that j is being iterated over is 1100. Someone isn't going to be happy. I can see the easy fix (don't loop past that), but I don't know enough about Lanczos or SKMD to know whether what we are seeing is an artifact of SKMD or a bug in Lanczos.
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142586#comment-13142586
]
Grant Ingersoll commented on MAHOUT-524:
I guess the 1100 comes from how we are calling all of this:
{code}
SpectralKMeansDriver.run(new Configuration(), affinities, output, 1100,
    2, measure, convergenceDelta, maxIter);
{code}
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142557#comment-13142557
]
Grant Ingersoll commented on MAHOUT-524:
The NPE is from one of the rowJ values being null (the 4th one). Line 156 in
Lanczos:
{code}
Vector rowJ = state.getBasisVector(j);
{code}
This looks like an issue in Lanczos. Namely, we are assuming the size of the
basis vectors from the state matches the size of the ejCol stuff. Of
course, this might mean SKMD is doing something wrong. Perhaps Jake can weigh
in here.
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142539#comment-13142539
]
Shannon Quinn commented on MAHOUT-524:
I'm just now getting in on this (my environment completely died after a failed attempt to upgrade from Ubuntu 10.04 to 10.10...). Could the NullPointerException have anything to do with SKMD invoking the runJob() in the LanczosSolver that I alluded to in my previous comment, i.e. the one for which SKMD is the only caller?
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142530#comment-13142530
]
Grant Ingersoll commented on MAHOUT-524:
Making this change does indeed get us well past that problem and leads to:
{quote}
Exception in thread "main" java.lang.NullPointerException
    at org.apache.mahout.math.DenseVector.assign(DenseVector.java:133)
    at org.apache.mahout.math.decomposer.lanczos.LanczosSolver.solve(LanczosSolver.java:160)
    at org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver.runJob(DistributedLanczosSolver.java:72)
    at org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.run(SpectralKMeansDriver.java:155)
    at org.apache.mahout.clustering.display.DisplaySpectralKMeans.main(DisplaySpectralKMeans.java:72)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
{quote}
Not sure whether that is a direct result of my change or not, but I'll continue
to debug.
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142510#comment-13142510
]
Grant Ingersoll commented on MAHOUT-524:
Realizing now that Jeff already said that above. Digging deeper, however, it
seems to me that the issue is that Hadoop is not expecting there to be a
directory (tmp) in that directory. From the looks of it, we just want the part-m-
file in there, but file status is also returning the tmp dir that gets created
when we do:
{code}
DistributedRowMatrix L =
    VectorMatrixMultiplicationJob.runJob(affSeqFiles, D,
        new Path(outputCalc, "laplacian-" + (System.nanoTime() & 0xFF)));
{code}
on line 142 of SpectralKMeansDriver. I wonder whether all will be well if we
simply put that tmp directory elsewhere, or make sure that it is deleted when
that job is done?
Perhaps a red herring, testing more.
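A third option, alongside moving or deleting the tmp directory, would be to filter the listing so that only part files are picked up. The sketch below illustrates that idea on the local filesystem with java.nio rather than with the Hadoop FileSystem API, so it is an analogue and not a patch; PartFileLister, listPartFiles and the file name part-m-00000 are hypothetical names chosen for the example.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: a local-filesystem analogue, not Hadoop or Mahout code. */
final class PartFileLister {

  /** Returns the regular files in dir whose names start with "part-", skipping subdirectories. */
  static List<Path> listPartFiles(Path dir) throws IOException {
    List<Path> parts = new ArrayList<>();
    // The glob narrows the listing by name; isRegularFile drops directories such as tmp.
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir, "part-*")) {
      for (Path entry : entries) {
        if (Files.isRegularFile(entry)) {
          parts.add(entry);
        }
      }
    }
    return parts;
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("laplacian-demo");
    Files.createFile(dir.resolve("part-m-00000")); // hypothetical part file name
    Files.createDirectory(dir.resolve("tmp"));     // stray directory next to the data
    System.out.println(listPartFiles(dir).size()); // prints 1
  }
}
```

Whether filtering is preferable to relocating the tmp directory depends on who else reads that output directory, which this sketch does not address.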
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142477#comment-13142477
]
Grant Ingersoll commented on MAHOUT-524:
Tracing into the Hadoop code, this "data" dir gets appended via a MapFile. For some reason it thinks it has a MapFile here, which points to something not getting configured correctly.
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142461#comment-13142461
]
Grant Ingersoll commented on MAHOUT-524:
bq. Is there any way we could simplify TimesSquaredJob?
Seems like there is an awful lot of deprecated Hadoop stuff in there.
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141973#comment-13141973
]
Dan Brickley commented on MAHOUT-524:
Shannon, "I'll investigate the manipulation of Configuration objects in SKMD" ... did you get a chance to do that?
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133524#comment-13133524
]
Shannon Quinn commented on MAHOUT-524:
If there are two DLS.runJob() methods and the spectral code is the only bit of code that calls one of the two runJob() methods, then in the interest of making the codebase just a tiny bit more maintainable I would vote for switching out the runJob() invoked by the spectral code and deleting the other one in DLS entirely.
Regarding your tracing of the DRM.times() method, I was having the same problem: the fact that there exist so many chained job constructors makes it difficult to follow. Is there any way we could simplify TimesSquaredJob? Are each of those job creation methods called multiple times throughout the code base?
Regarding this issue, it sounds like the problem either resides in TimesSquared not correctly setting the path as you mentioned (but this begs the question why no other algorithm which uses DRM.times() is running into the same problem), or the Configuration voodoo in SKMD is causing problems. I'll investigate the manipulation of Configuration objects in SKMD this week. If you have any thoughts on the other points, please let me know.
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132870#comment-13132870
]
Jeff Eastman commented on MAHOUT-524:
I'm running in the Eclipse debugger, debugging DisplaySpectralKMeans. This runs in local mode, and fails as reported above.
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132867#comment-13132867
]
Dan Brickley commented on MAHOUT-524:
re Sean's "I'd restart your cluster."; should it be fine to run the whole thing in MAHOUT_LOCAL=true mode, and bypass any complexity / issues from having a separate Hadoop cluster / pseudo-cluster?
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132810#comment-13132810
]
Jeff Eastman commented on MAHOUT-524:
All of this is buried inside of DistributedLanczosSolver. Either the problem resides in there and should impact all users of DLS, or it is in the SpectralKMeansDriver setup which invokes the DLS. Turns out the DLS.runJob(...) method employed (line 65) is only called by spectral clustering (KMeans and Eigencuts). The one other caller, DLS.runJob(...) (line 80), is itself never called.
Just looking at the invocation site (SpectralKMeansDriver.run() line 155), I see two file paths being passed into DLS.runJob(...): the lanczosSeqFiles path is output/calculations/eigenvectors-17, the desired output path, and the LanczosState is constructed with L, a DRM with inputPath examples/output/calculations/laplacian-89. This is the input path which is failing in getFileStatus and causing the exception. Both of these look reasonable to me.
There are, however, several different Configuration objects being manipulated by SKMD. I'm suspicious there is something horked in one of them which is causing the DLS file not found.
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132104#comment-13132104 ]
Jeff Eastman commented on MAHOUT-524:
-
I've found where the /data is being added to the input path: it's in
SequenceFileInputFormat.listStatus(JobConf), where MapFile.DATA_FILE_NAME is appended
to get the dataFile path. This does not seem to be the source of the problem,
however; rather, I'm looking in DRM.times() where it calls
TimesSquaredJob.createTimesJobConf(...). Looks to me like this method is setting the
conf feature "DistributedMatrix.times.inputVector" to the correct file path
(examples/output/calculations/laplacian-25/tmp//DistributedMatrix.times.inputVector/),
but is not setting the job's input paths, since
FileInputFormat.getInputPaths(new JobConf(conf)) returns only
"examples/output/calculations/laplacian-25". By the time the thread gets to
listStatus() after kicking off DRM.times(), the JobConf input paths contain only
"examples/output/calculations/laplacian-113/tmp" and /data is appended to that.
The whole handling of Configurations and JobConfs is very twisted and difficult to
follow.
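The path handling described above can be modelled in a few lines. This is a simplified Python sketch, not Hadoop code: it assumes, per the comment, that SequenceFileInputFormat.listStatus(JobConf) treats any directory among the job inputs as a MapFile and substitutes its data file, whose name is MapFile.DATA_FILE_NAME ("data"). The function name resolve_inputs and the part-file name in the usage note are hypothetical.

```python
import posixpath

# Mirrors the value of Hadoop's MapFile.DATA_FILE_NAME quoted in this thread.
DATA_FILE_NAME = "data"


def resolve_inputs(listing):
    """Model of the directory-to-data-file substitution.

    `listing` is a list of (path, is_dir) pairs, standing in for the file
    statuses of a job's input paths. Directories are assumed to be MapFiles,
    so their path is replaced by "<dir>/data"; plain files pass through.
    """
    resolved = []
    for path, is_dir in listing:
        if is_dir:
            resolved.append(posixpath.join(path, DATA_FILE_NAME))
        else:
            resolved.append(path)
    return resolved
```

Under this model, an input listing such as [("laplacian-113/part-00000", False), ("laplacian-113/tmp", True)] resolves the second entry to "laplacian-113/tmp/data". If no such file exists, a later file-status lookup would fail, which is consistent with the tmp/data paths reported in this thread.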
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109296#comment-13109296 ]
Sean Owen commented on MAHOUT-524:
--
That again looks like an environment issue; the reducer couldn't get data off the
mapper. I don't know why in this case; you'd have to dig into the logs. I'd restart
your cluster.
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109295#comment-13109295
]
Lance Norskog commented on MAHOUT-524:
--
Yes, it was MAVEN_OPTS; that one helps.
With today's patch (Sep. 20, 2011, setting the job jars), I eventually get
this error:
{code}
11/09/20 23:10:35 INFO mapred.JobClient: Running job: job_201109191821_0016
11/09/20 23:10:36 INFO mapred.JobClient: map 0% reduce 0%
11/09/20 23:11:01 INFO mapred.JobClient: map 100% reduce 0%
11/09/20 23:11:15 INFO mapred.JobClient: Task Id : attempt_201109191821_0016_r_00_0, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
11/09/20 23:11:15 WARN mapred.JobClient: Error reading task outputHost is down
11/09/20 23:11:15 WARN mapred.JobClient: Error reading task outputHost is down
11/09/20 23:12:00 INFO mapred.JobClient: Task Id : attempt_201109191821_0016_r_00_1, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
{code}
I stripped aff.txt down to a file with 20 nodes, and get the above error. This
is on a single-node cluster on my laptop. Is it possible to run this job on
such a small device? (If not, then DisplaySpectralKMeans as a Swing app might
not be realistic :).
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108742#comment-13108742 ]
Shannon Quinn commented on MAHOUT-524:
--
The full fix is MAHOUT-518 (in progress), where you no longer have to input
affinities but raw data instead. I can certainly edit the affinity input for the time
being, but once 518 is finished this point will be moot.
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108739#comment-13108739 ]
Sean Owen commented on MAHOUT-524:
--
OK, is there an easy fix for your first point? Seems like a matter of input parsing.
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108721#comment-13108721 ]
Shannon Quinn commented on MAHOUT-524:
--
Sean: #4 is actually an off-by-one error that results from specifying
"--dimensions 37" when the nodes are indexed in the input file as 1-37, whereas the
program expects 0-36. Changing the input parameter to "--dimensions 38" is kind of a
fix, although it will leave the first row and first column of Mahout's internal
representation of the affinity matrix all 0s.
Regarding the jobs, I have no idea how they ran previously; I never ran into that
problem when first writing the jobs. Apparently there's a widely-employed use case I
simply didn't test?
Beyond that, I still can't find the source of the error in the attached EclipseLog;
wherever that "/tmp" is being appended at the end, it isn't in any of the core Mahout
code.
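The off-by-one described above can also be avoided on the input side by renumbering the node IDs before running the job. The following is a minimal Python sketch under stated assumptions: each non-blank input line is taken to have the form "i,j,value" with 1-based integer node IDs, and the function name to_zero_based is hypothetical, not part of Mahout.

```python
def to_zero_based(lines):
    """Shift 1-based node IDs in 'i,j,value' affinity lines to 0-based.

    Blank lines are dropped. The affinity value is passed through as text,
    so its formatting is left untouched.
    """
    shifted = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        i, j, value = line.split(",")
        i, j = int(i), int(j)
        if i < 1 or j < 1:
            # A zero or negative ID means the input was not 1-based after all.
            raise ValueError(f"expected 1-based node IDs, got {line!r}")
        shifted.append(f"{i - 1},{j - 1},{value}")
    return shifted
```

With IDs renumbered to 0-36, "--dimensions 37" describes the matrix exactly, and no all-zero row and column are introduced.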
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108486#comment-13108486 ]
Hudson commented on MAHOUT-524:
---
Integrated in Mahout-Quality #1051 (See [https://builds.apache.org/job/Mahout-Quality/1051/])
MAHOUT-524 added danbri's setJarByClass() patch and logging
srowen : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1172995
Files :
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/spectral/common/AffinityMatrixInputJob.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/spectral/common/AffinityMatrixInputMapper.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/spectral/common/MatrixDiagonalizeJob.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/spectral/common/UnitVectorizerJob.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/spectral/common/VectorCache.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/spectral/common/VectorMatrixMultiplicationJob.java
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108386#comment-13108386 ]
Sean Owen commented on MAHOUT-524:
--
Lance, in your command line you use "MAVENOPTS" and not "MAVEN_OPTS". Is that the
issue?
I think I agree with Dan's patch, but wonder how these jobs ever worked otherwise.
But yes, everything needs to call setJar() or setJarByClass(). AbstractJob takes care
of this for almost all the M/Rs in the project; these are not using it.
I think you're welcome to propose patches for your improvements #2 and #3.
I don't know the answer for #4: if it's OK for there to be nothing in the vector
cache at this point, the code shouldn't assume there is. And if the cache should have
something, I don't know why there isn't.
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108353#comment-13108353 ]
Dan Brickley commented on MAHOUT-524:
-
Re job jar error, see MAHOUT-428 and MAHOUT-197.
Draft patch: https://raw.github.com/gist/1200439/4ad433b51e9d963cff5d500d974fa5cb6904b9c3/gistfile1.txt
I posted a patch that got me past those errors in the recent mailing list thread
'Spectral clustering - a bundle of issues'. I'll paste the relevant chunk of my email
below. See http://comments.gmane.org/gmane.comp.apache.mahout.user/9319
-
Trying to run https://cwiki.apache.org/MAHOUT/spectral-clustering.html ... seems
perhaps some code rot? Can anyone else report success with spectral clustering
against recent trunk?
Trying
bin/mahout spectralkmeans -k 2 -i speccy -o specout --maxIter 10 --dimensions 37
...with the small example affinity file we discussed yesterday, I hit a series of
problems.
data: http://danbri.org/2011/mahout/afftest.txt
1. As I mentioned in comments in
http://spectrallyclustered.wordpress.com/2010/07/14/sprint-3-quick-update/ (both for
a local pseudo-cluster, and a real one) I had to patch in calls to job.setJarByClass
before job.waitForCompletion. This problem occurred for others elsewhere in Mahout,
e.g. MAHOUT-428 and MAHOUT-197, but I presume it can't be hitting everyone. From
grepping around, this might not be the only component missing setJarByClass calls. Or
is this just me, somehow?
2. Newlines in the input data made it fail, but the associated warning from
AffinityMatrixInputMapper was very vague. I'd suggest allowing those and #-comments,
but maybe it's not a good idea to make per-component syntax designs? I'd also suggest
it's worth printing the problem line (see patch below) when complaining.
3. Failing to load the affinity matrix (surely a requirement for further progress?)
does not seem to halt the job; I see exceptions mixed in with ongoing processing
(until a later problem hits us). Transcript: https://gist.github.com/1200455 ...
actually it wasn't clear if the newline problem was more of a warning, and other rows
from the input data were accepted. In which case, reporting them as
java.io.IOException seems a bit draconian. So maybe bits of the input file were in
fact loaded. It would be great to clarify what the expected behaviour is.
4. After all that, the job still fails. Full transcript here:
https://gist.github.com/1200428
Excerpt (I've added a bit more reporting output in a few places):
11/09/07 14:25:06 INFO common.VectorCache: Loading vector from: specout/calculations/diagonal/part-r-0
Exception in thread "main" java.util.NoSuchElementException
at com.google.common.collect.AbstractIterator.next(AbstractIterator.java:152)
at org.apache.mahout.clustering.spectral.common.VectorCache.load(VectorCache.java:121)
However, that file does exist in hdfs, and seqdumper seems to accept it; it just
seems empty:
Input Path: specout/calculations/diagonal/part-r-0
Key class: class org.apache.hadoop.io.NullWritable Value Class: class org.apache.mahout.math.VectorWritable
Count: 0
I've posted an informal composite patch at
https://raw.github.com/gist/1200439/4ad433b51e9d963cff5d500d974fa5cb6904b9c3/gistfile1.txt
... if you can confirm the above issues and a breakdown into JIRAs, I'll attach
cleaner patches where appropriate.
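Points 2 and 3 above (tolerate blank lines and #-comments, print the offending line, and stop rather than carry on) can be sketched as a small parser. This is an illustration in Python, not the AffinityMatrixInputMapper code or the linked patch; it assumes each data line has the form "i,j,value", and the name parse_affinity is hypothetical.

```python
def parse_affinity(lines):
    """Parse 'i,j,value' affinity lines into (int, int, float) tuples.

    Blank lines and anything after a '#' are ignored. A malformed line
    raises ValueError naming the line number and quoting the raw text,
    so the failure is immediate and easy to locate.
    """
    entries = []
    for lineno, raw in enumerate(lines, start=1):
        text = raw.split("#", 1)[0].strip()  # drop trailing comment
        if not text:
            continue  # blank or comment-only line
        parts = text.split(",")
        if len(parts) != 3:
            raise ValueError(f"line {lineno}: expected 'i,j,value', got {raw!r}")
        try:
            entries.append((int(parts[0]), int(parts[1]), float(parts[2])))
        except ValueError as exc:
            raise ValueError(f"line {lineno}: cannot parse {raw!r}") from exc
    return entries
```

Raising on the first bad line is one possible answer to the "warning or error?" question in point 3; a more lenient variant could collect the bad lines and report them together at the end.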
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108324#comment-13108324 ]
Lance Norskog commented on MAHOUT-524:
--
1) I hiked it up and up. Is it possible the option did not transmit to the JVM that
runs the job?
2) I did not have this problem under Eclipse.
In a separate investigation, running _spectralkmeans_ gives the command-line failure
log attached as SpectralKMeans_fail_20110919.txt. Yes, this is the 'get jars out to
the hadoop executor' problem. The 'job' jar does not seem to do what it needs.
Again, note that one failure does not cause the whole job to exit.
I submit that there are multiple problems inside the job, and somehow there is a
problem where the main job configurations do not get transmitted to a subsidiary
executor.
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107668#comment-13107668 ]
Sean Owen commented on MAHOUT-524:
--
This is just an OutOfMemoryError. You have to tell Maven to use more memory for its
JVM or else most M/R jobs will fail like this locally. Use MAVEN_OPTS=-Xmx1g.
I'm afraid this isn't the issue.
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107640#comment-13107640 ]
Lance Norskog commented on MAHOUT-524:
--
As for 5-d points vs. 2-d points, SVD does a great job, followed by random
projection.
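Of the two options mentioned, random projection is the simpler one to sketch without a linear-algebra library. The Python below is a minimal illustration, assuming the 5-d points are plain lists of floats; the Gaussian matrix scaled by 1/sqrt(out_dim) is a common textbook choice rather than anything taken from this thread, and the function name random_projection is hypothetical.

```python
import random


def random_projection(points, out_dim=2, seed=0):
    """Project points to `out_dim` dimensions with a random Gaussian matrix.

    `points` is a non-empty list of equal-length lists of floats. A fixed
    seed makes the projection repeatable, so the same input always lands
    on the same 2-d coordinates.
    """
    rng = random.Random(seed)
    in_dim = len(points[0])
    scale = 1.0 / (out_dim ** 0.5)
    # in_dim x out_dim matrix of scaled standard-normal entries.
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(out_dim)]
              for _ in range(in_dim)]
    return [[sum(p[k] * matrix[k][c] for k in range(in_dim))
             for c in range(out_dim)]
            for p in points]
```

A projection like this only approximately preserves distances, so for a display example with five dimensions an SVD-based reduction would likely give a cleaner picture, as the comment suggests.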
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107635#comment-13107635 ]
Lance Norskog commented on MAHOUT-524:
--
For completeness, the log when running under Eclipse is attached as
EclipseLog_20110918.txt.
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107633#comment-13107633
]
Lance Norskog commented on MAHOUT-524:
--
Possibly a little help. When run from the command line via mvn exec, this is
the error log. Note that
a) an exception happens in an early m/r pass, and
b) the exception is ignored by the full job executor.
(MacOS X "Kitty Liver")
_lance$ MAVENOPTS=Xmx1000m mvn -q exec:java
-Dexec.mainClass="org.apache.mahout.clustering.display.DisplaySpectralKMeans"_
{code}
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/Users/lancenorskog/.m2/repository/org/slf4j/slf4j-log4j12/1.6.1/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/lancenorskog/.m2/repository/org/slf4j/slf4j-jcl/1.6.1/slf4j-jcl-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
11/09/18 22:25:26 INFO common.HadoopUtil: Deleting samples
11/09/18 22:25:26 INFO common.HadoopUtil: Deleting output
11/09/18 22:25:26 INFO display.DisplayClustering: Generating 500 samples m=[1.0, 1.0] sd=3.0
11/09/18 22:25:26 INFO display.DisplayClustering: Generating 300 samples m=[1.0, 0.0] sd=0.5
11/09/18 22:25:26 INFO display.DisplayClustering: Generating 300 samples m=[0.0, 2.0] sd=0.1
11/09/18 22:25:28 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/09/18 22:25:28 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
11/09/18 22:25:28 INFO input.FileInputFormat: Total input paths to process : 1
11/09/18 22:25:28 INFO mapred.JobClient: Running job: job_local_0001
11/09/18 22:25:28 INFO mapred.MapTask: io.sort.mb = 100
*11/09/18 22:25:29 WARN mapred.LocalJobRunner: job_local_0001
java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:949)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:674)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)*
11/09/18 22:25:29 INFO mapred.JobClient: map 0% reduce 0%
11/09/18 22:25:29 INFO mapred.JobClient: Job complete: job_local_0001
{code}
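Point (b) describes the behaviour a multi-job driver would normally want to avoid: once one pass fails, later passes run on missing or empty output. A minimal sketch of the stricter control flow, in Python with a hypothetical run_pipeline helper and hypothetical stage names; in Hadoop's Java API the corresponding check would be on the boolean returned by Job.waitForCompletion(...).

```python
def run_pipeline(stages):
    """Run (name, callable) stages in order and stop at the first failure.

    Each callable returns True on success and False on failure. Returns the
    list of completed stage names, or raises RuntimeError naming the stage
    that failed so that no later stage runs on incomplete output.
    """
    completed = []
    for name, stage in stages:
        if not stage():
            raise RuntimeError(
                f"stage {name!r} failed; aborting after {completed}")
        completed.append(name)
    return completed
```

For example, run_pipeline([("affinity-input", load), ("diagonalize", diag)]) would raise as soon as load returns False, instead of letting diag continue with an empty matrix; both callables are placeholders here.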
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103497#comment-13103497 ]
Dan Brickley commented on MAHOUT-524:
-
I had a look around and failed to find a string in the Mahout Java source
corresponding to that path; I presume it's coming from an included module or config
file.
Hadoop, by the way, has in ./io/MapFile.java:
public static final String DATA_FILE_NAME = "data";
though I don't see any direct use of MapFile or DATA_FILE_NAME. I'm only grepping
around textually; Eclipse might have smarter tooling.
http://lucene.472066.n3.nabble.com/Overhauled-org-apache-mahout-cf-taste-hadoop-item-td745286.html
suggests MapFile isn't used so much any more, so this might be a false lead.
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102404#comment-13102404 ]
Shannon Quinn commented on MAHOUT-524:
--
Just for grins, I tried FileInputFormat.getInputPaths(conf).length right before the
TimesSquaredJob started, and it was 1, not 2. Ever more confused.
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102380#comment-13102380 ] Shannon Quinn commented on MAHOUT-524: -- I've been tooling around with this code for a few hours now and cannot figure out where the pesky "/data" is being appended to the overall path...or why the second Path that Lance mentioned isn't what is actually being used. It has to be somewhere in the Lanczos solver code (filtering into the DistributedRowMatrix and its TimesSquaredJob, as the latter is what is actually causing the exception), but in all my searching and println()-ing of paths I can't seem to find it. I'm going to keep looking, but any help in finding this bug would be greatly appreciated.
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102171#comment-13102171 ] Dan Brickley commented on MAHOUT-524: - Not sure if you're mixing me and Danny Bickson, but I've certainly seen these errors mentioning tmp/data paths, ... but the problem was when attempting spectral clustering; I didn't get as far as having any results to display.
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102165#comment-13102165 ] Shannon Quinn commented on MAHOUT-524: -- I believe this is the exact problem Dan Bickson picked up on his thread to the users list; I'm working on this. The problem is somewhere in the SpectralKMeansDriver in how I set up the Paths that are used. Will update this week.
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095021#comment-13095021 ] Lance Norskog commented on MAHOUT-524: -- Running DisplaySpectralKMeans gives this error: FileNotFoundException: examples/output/calculations/laplacian-48/tmp/data not found. In fact, the data is stored here: examples/output/calculations/laplacian-48/tmp/1314835934416372000/DistributedMatrix.times.inputVector/ Any hints on exactly which API call is wrong?
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092642#comment-13092642 ] Lance Norskog commented on MAHOUT-524: -- +1 I'm documenting the Display outputs and it would be nice to have all of them :)
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086595#comment-13086595 ] Jeff Eastman commented on MAHOUT-524: - The original example was extracting 5 eigenvectors and thus returned 5-d results. I changed it to extract 2 vectors and it used to run but displayed incorrect results. I'm (still since pre 0.5 testing, IIRC) getting a FileNotFoundException in the bowels of DRM.times while running this in local Hadoop mode. I wonder if it is possible to add a --method sequential implementation for SpectralKMeans to help separate the algorithmic issues from the file bookkeeping ones? We have a sequential Lanczos implementation...

Exception in thread "main" java.lang.IllegalStateException: java.io.FileNotFoundException: File file:/home/dev/workspace/mahout/examples/output/calculations/laplacian-33/tmp/data does not exist.
    at org.apache.mahout.math.hadoop.DistributedRowMatrix.times(DistributedRowMatrix.java:222)
    at org.apache.mahout.math.decomposer.lanczos.LanczosSolver.solve(LanczosSolver.java:104)
    at org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver.runJob(DistributedLanczosSolver.java:72)
    at org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.run(SpectralKMeansDriver.java:155)
    at org.apache.mahout.clustering.display.DisplaySpectralKMeans.main(DisplaySpectralKMeans.java:72)
Caused by: java.io.FileNotFoundException: File file:/home/dev/workspace/mahout/examples/output/calculations/laplacian-33/tmp/data does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:371)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
    at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:51)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:211)
    at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:929)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:921)
    at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:838)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:791)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:791)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:765)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1200)
    at org.apache.mahout.math.hadoop.DistributedRowMatrix.times(DistributedRowMatrix.java:214)
    ... 4 more

> DisplaySpectralKMeans example fails
> ---
>
> Key: MAHOUT-524
> URL: https://issues.apache.org/jira/browse/MAHOUT-524
> Project: Mahout
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 0.4, 0.5
> Reporter: Jeff Eastman
> Assignee: Jeff Eastman
> Labels: clustering, k-means, visualization
> Fix For: 0.6
>
> Attachments: aff.txt, raw.txt, spectralkmeans.png
>
>
> I've committed a new display example that attempts to push the standard
> mixture of models data set through spectral k-means. After some tweaking of
> configuration arguments and a bug fix in EigenCleanupJob it runs spectral
> k-means to completion. The display example is expecting 2-d clustered points
> and the example is producing 5-d points. Additional I/O work is needed before
> this will play with the rest of the clustering algorithms.
Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
> Let me see if I'm following you. In the display example, there are 1100, 2d
> vectors generated as raw data D which is 1100x2. Then, the preprocessing
> step uses a distance measure to produce A, which is 1100x1100. They are not
> really affinities, more like distances, so I may have missed the boat on
> that step. Since the distance measure is reducing the [2] dimensionality of
> the Di and Dj vectors with a scalar (aij), I don't see how to reconstruct D
> from A.
>

You don't necessarily need to be able to reconstruct D from A, so I suppose this is where the Fourier transform analogy breaks down. A is indexed by row and column according to the original data, so as long as you know the order in which the rows and columns of A were derived from D, then you can transiently identify the points in D by index.

> KMeans will cluster all the input vectors in an arbitrary order if on a N>1
> cluster and so Di and Dj will lose their index positions in the result. If
> the D vectors are NamedVectors, with their index as the name, then this will
> flow through to the clustered points at the output. The order of those
> points won't bear much relation to the order of the input, but the names
> will be preserved. KMeans does not mess with the order of the elements
> within each D vector. I don't know if this is sufficient or if Lanczos does
> anything similar.
>

Like Ted mentioned, NamedVector may be the key here to identifying the original points from the clustered projected data. That's probably the right way to go.

>
> -Original Message-
> From: [email protected] [mailto:[email protected]] On Behalf
> Of Shannon Quinn
> Sent: Tuesday, May 24, 2011 2:10 PM
> To: [email protected]
> Subject: Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example
> fails
>
> You're right, that would give you the affinity matrix.
However, the > affinity > matrix is an easier beast to tame since the matrix is constructed with all > the points' orders preserved: aff[i][j] is the relationship between > original_point[i] and original_point[j], so for all practical purposes I > treat this as the "original data" (since it's easy to go back and forth > between the two). > > Problem is, I'm not sure if the Lanczos solver or K-Means preserve this > ordering of indices. Does the nth point with label y from the result of > K-means correspond to the nth row of the column matrix of eigenvectors? If > so, then does that nth row from the eigenvector matrix also correspond to > the nth original data point (the one represented by proxy by row n and > column n of the affinity matrix)? If both these conditions are true, then > and only then can we say that original_point[n]'s cluster is y. > > On Tue, May 24, 2011 at 4:39 PM, Jeff Eastman wrote: > > > Would that give you the original data matrix, the clustered data matrix, > or > > the clustered affinity matrix? Even with the analogy in mind I'm having > > trouble connecting the dots. Seems like I lost the original data matrix > in > > step 1 when I used a distance measure to produce A from it. If the > returned > > eigenvectors define Q, then what is the significance of QAQ^-1? And, more > > importantly, if the Q eigenvectors define the clusters in eigenspace, > what > > is the inverse transformation? > > > > -----Original Message- > > From: [email protected] [mailto:[email protected]] On Behalf > > Of Shannon Quinn > > Sent: Tuesday, May 24, 2011 12:07 PM > > To: [email protected] > > Subject: Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans > example > > fails > > > > That's an excellent analogy! Employing that strategy, would it be > possible > > (and not too expensive) to do the QAQ^-1 operation to get the original > data > > matrix, after we've clustered the points in eigenspace? 
> > > > On Tue, May 24, 2011 at 2:59 PM, Jeff Eastman > wrote: > > > > > For the display example, it is not necessary to cluster the original > > > points. The other clustering display examples only train the clusters > and > > do > > > not classify the points. They are drawn first and the cluster centers & > > > radii are superimposed afterwards. Thus I think it is only necessary to > > > back-transform the clusters. > > > > > > My EE gut tells me this is like Fourier transforms between time- and > > > frequency-domains. If this is true then what we need is the inverse > > > transform. Is this a correct analogy? > > > > > > -Original Message- > > &
Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
Nahh... the names are the key.

On Tue, May 24, 2011 at 2:49 PM, Jeff Eastman wrote:
> If the D vectors are NamedVectors, with their index as the name, then this
> will flow through to the clustered points at the output. The order of those
> points won't bear much relation to the order of the input, but the names
> will be preserved. KMeans does not mess with the order of the elements
> within each D vector. I don't know if this is sufficient or if Lanczos does
> anything similar.
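To make the "names are the key" point concrete, here is a minimal sketch in plain Java. It uses no Mahout classes: `NamedRow` is a hypothetical stand-in for a NamedVector whose name is the original row index, and `nearestCenter` is a hypothetical stand-in for whatever the clusterer does, not k-means itself. It only illustrates that once each projected row carries its name, the assignment can be mapped back to original_point[i] regardless of the order in which rows come out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class LabelBackMapping {

    /** Hypothetical stand-in for a named vector: an eigenspace row tagged with its original index. */
    static final class NamedRow {
        final String name;      // original row index rendered as a string
        final double[] values;  // coordinates in eigenspace

        NamedRow(String name, double[] values) {
            this.name = name;
            this.values = values;
        }
    }

    /** Stand-in for a clusterer: index of the center closest to the row (squared Euclidean). */
    static int nearestCenter(double[] row, double[][] centers) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centers.length; c++) {
            double d = 0.0;
            for (int j = 0; j < row.length; j++) {
                double diff = row[j] - centers[c][j];
                d += diff * diff;
            }
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        return best;
    }

    /** Returns name -> cluster id; the result does not depend on the order of the rows. */
    static Map<String, Integer> assign(List<NamedRow> rows, double[][] centers) {
        Map<String, Integer> byName = new HashMap<>();
        for (NamedRow r : rows) {
            byName.put(r.name, nearestCenter(r.values, centers));
        }
        return byName;
    }

    public static void main(String[] args) {
        double[][] projected = {{0.1, 0.0}, {0.9, 1.0}, {0.0, 0.2}, {1.1, 0.8}};
        double[][] centers = {{0.0, 0.0}, {1.0, 1.0}};

        List<NamedRow> rows = new ArrayList<>();
        for (int i = 0; i < projected.length; i++) {
            rows.add(new NamedRow(Integer.toString(i), projected[i]));
        }
        // Simulate a multi-node run handing the rows back in arbitrary order.
        Collections.shuffle(rows, new Random(42));

        Map<String, Integer> clusterOf = assign(rows, centers);
        for (int i = 0; i < projected.length; i++) {
            System.out.println("original_point[" + i + "] -> cluster "
                    + clusterOf.get(Integer.toString(i)));
        }
    }
}
```

The design choice is simply to look results up by name rather than by position, so nothing has to be assumed about how the intermediate jobs order their output.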
Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
Well, it isn't entirely simple, but for suitable distances, D can often be reverse engineered subject to various isometries like rotation and inversion that don't change distance.

On Tue, May 24, 2011 at 2:49 PM, Jeff Eastman wrote:
> Since the distance measure is reducing the [2] dimensionality of the Di and
> Dj vectors with a scalar (aij), I don't see how to reconstruct D from A.
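One standard way to do this kind of reverse engineering is classical multidimensional scaling. The sketch below is an illustration only, and it assumes the entries of A are Euclidean distances between the rows of D; it is not necessarily the method meant above, and the symbols J, B, Λ and R are introduced here for the derivation.

```latex
% Assumption: a_{ij} = \lVert D_i - D_j \rVert_2 for the n rows D_i of the data matrix D.
% A^{(2)} denotes the matrix of squared distances, (A^{(2)})_{ij} = a_{ij}^2.
\begin{align*}
  J &= I_n - \tfrac{1}{n}\,\mathbf{1}\mathbf{1}^{\top}
      && \text{(centering matrix)} \\
  B &= -\tfrac{1}{2}\, J A^{(2)} J \;=\; (JD)(JD)^{\top}
      && \text{(Gram matrix of the centered points)} \\
  B &= Q \Lambda Q^{\top}
      && \text{(eigendecomposition, eigenvalues in decreasing order)} \\
  \hat{D} &= Q_k \Lambda_k^{1/2}
      && \text{(top } k \text{ eigenpairs; } k = 2 \text{ for 2-d data)}
\end{align*}
% \hat{D} reproduces the distances in A, but only up to isometry:
% for any orthogonal R (rotation or reflection), \hat{D} R gives the same A,
% and the centering step discards any translation of the original points.
```

This is consistent with the caveat above: the recovered points match the originals only up to rotation, reflection and translation, since none of those change pairwise distances.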
RE: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
Let me see if I'm following you. In the display example, there are 1100, 2d vectors generated as raw data D which is 1100x2. Then, the preprocessing step uses a distance measure to produce A, which is 1100x1100. They are not really affinities, more like distances, so I may have missed the boat on that step. Since the distance measure is reducing the [2] dimensionality of the Di and Dj vectors with a scalar (aij), I don't see how to reconstruct D from A. KMeans will cluster all the input vectors in an arbitrary order if on a N>1 cluster and so Di and Dj will lose their index positions in the result. If the D vectors are NamedVectors, with their index as the name, then this will flow through to the clustered points at the output. The order of those points won't bear much relation to the order of the input, but the names will be preserved. KMeans does not mess with the order of the elements within each D vector. I don't know if this is sufficient or if Lanczos does anything similar. -Original Message- From: [email protected] [mailto:[email protected]] On Behalf Of Shannon Quinn Sent: Tuesday, May 24, 2011 2:10 PM To: [email protected] Subject: Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails You're right, that would give you the affinity matrix. However, the affinity matrix is an easier beast to tame since the matrix is constructed with all the points' orders preserved: aff[i][j] is the relationship between original_point[i] and original_point[j], so for all practical purposes I treat this as the "original data" (since it's easy to go back and forth between the two). Problem is, I'm not sure if the Lanczos solver or K-Means preserve this ordering of indices. Does the nth point with label y from the result of K-means correspond to the nth row of the column matrix of eigenvectors? 
If so, then does that nth row from the eigenvector matrix also correspond to the nth original data point (the one represented by proxy by row n and column n of the affinity matrix)? If both these conditions are true, then and only then can we say that original_point[n]'s cluster is y. On Tue, May 24, 2011 at 4:39 PM, Jeff Eastman wrote: > Would that give you the original data matrix, the clustered data matrix, or > the clustered affinity matrix? Even with the analogy in mind I'm having > trouble connecting the dots. Seems like I lost the original data matrix in > step 1 when I used a distance measure to produce A from it. If the returned > eigenvectors define Q, then what is the significance of QAQ^-1? And, more > importantly, if the Q eigenvectors define the clusters in eigenspace, what > is the inverse transformation? > > -Original Message- > From: [email protected] [mailto:[email protected]] On Behalf > Of Shannon Quinn > Sent: Tuesday, May 24, 2011 12:07 PM > To: [email protected] > Subject: Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example > fails > > That's an excellent analogy! Employing that strategy, would it be possible > (and not too expensive) to do the QAQ^-1 operation to get the original data > matrix, after we've clustered the points in eigenspace? > > On Tue, May 24, 2011 at 2:59 PM, Jeff Eastman wrote: > > > For the display example, it is not necessary to cluster the original > > points. The other clustering display examples only train the clusters and > do > > not classify the points. They are drawn first and the cluster centers & > > radii are superimposed afterwards. Thus I think it is only necessary to > > back-transform the clusters. > > > > My EE gut tells me this is like Fourier transforms between time- and > > frequency-domains. If this is true then what we need is the inverse > > transform. Is this a correct analogy? 
> > > > -----Original Message- > > From: [email protected] [mailto:[email protected]] On Behalf > > Of Shannon Quinn > > Sent: Tuesday, May 24, 2011 11:39 AM > > To: [email protected] > > Subject: Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans > example > > fails > > > > This is actually something I could use a little expert Hadoop assistance > > on. > > The general idea is that the points that are clustered in eigenspace have > a > > 1-to-1 correspondence with the original points (which is how you get your > > cluster assignments), but this back-mapping after clustering isn't > > explicitly implemented yet, since that's the core of the IO issue. > > > > My block on this is my lack of understanding in how the actual ordering > of > > the points change (or not?) from when they are projected into eigenspace > > (the Lanczos solver) a
Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
Ordering matters less than labeling. And another way to put it is that the affinity or distance matrix A should have the same labels on the rows AND on the columns as were on the rows of the original matrix. Thus, the labels on the rows of Q should be the same as the original labels. Forming Q' A Q (not QAQ', btw) only gives us the diagonalized form of A which is just the affinity matrix of the eigen-representations. That isn't all that interesting.

On Tue, May 24, 2011 at 2:09 PM, Shannon Quinn wrote:
> Problem is, I'm not sure if the Lanczos solver or K-Means preserve this
> ordering of indices. Does the nth point with label y from the result of
> K-means correspond to the nth row of the column matrix of eigenvectors?
>
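The Q' A Q remark can be written out as a short derivation. This is a sketch under the assumption that A is symmetric and that the columns of Q are orthonormal eigenvectors of A; Λ and the subscript k are notation introduced here.

```latex
% Assumptions: A is n x n and symmetric; the columns of Q are orthonormal
% eigenvectors of A, so A Q = Q \Lambda with \Lambda diagonal and Q^{\top} Q = I.
\begin{align*}
  Q^{\top} A Q &= Q^{\top} Q \Lambda = \Lambda
      && \text{(diagonalized form of } A\text{)} \\
  Q_k^{\top} A Q_k &= \Lambda_k
      && \text{(} Q_k \text{ is } n \times k \text{, result is } k \times k\text{)}
\end{align*}
% With only k < n eigenvectors, Q_k A Q_k^{\top} is not even defined,
% since an (n x k) matrix cannot left-multiply an (n x n) matrix.
% Row i of Q_k is the eigen-representation of original point i, so the row
% labels of Q_k are the row (and column) labels of A.
```

Either way the product contains only eigenvalues, which is why it carries no information about the original data points; the row labels of Q are what tie the clustered rows back to them.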
Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
You're right, that would give you the affinity matrix. However, the affinity matrix is an easier beast to tame since the matrix is constructed with all the points' orders preserved: aff[i][j] is the relationship between original_point[i] and original_point[j], so for all practical purposes I treat this as the "original data" (since it's easy to go back and forth between the two). Problem is, I'm not sure if the Lanczos solver or K-Means preserve this ordering of indices. Does the nth point with label y from the result of K-means correspond to the nth row of the column matrix of eigenvectors? If so, then does that nth row from the eigenvector matrix also correspond to the nth original data point (the one represented by proxy by row n and column n of the affinity matrix)? If both these conditions are true, then and only then can we say that original_point[n]'s cluster is y. On Tue, May 24, 2011 at 4:39 PM, Jeff Eastman wrote: > Would that give you the original data matrix, the clustered data matrix, or > the clustered affinity matrix? Even with the analogy in mind I'm having > trouble connecting the dots. Seems like I lost the original data matrix in > step 1 when I used a distance measure to produce A from it. If the returned > eigenvectors define Q, then what is the significance of QAQ^-1? And, more > importantly, if the Q eigenvectors define the clusters in eigenspace, what > is the inverse transformation? > > -Original Message- > From: [email protected] [mailto:[email protected]] On Behalf > Of Shannon Quinn > Sent: Tuesday, May 24, 2011 12:07 PM > To: [email protected] > Subject: Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example > fails > > That's an excellent analogy! Employing that strategy, would it be possible > (and not too expensive) to do the QAQ^-1 operation to get the original data > matrix, after we've clustered the points in eigenspace? 
> > On Tue, May 24, 2011 at 2:59 PM, Jeff Eastman wrote: > > > For the display example, it is not necessary to cluster the original > > points. The other clustering display examples only train the clusters and > do > > not classify the points. They are drawn first and the cluster centers & > > radii are superimposed afterwards. Thus I think it is only necessary to > > back-transform the clusters. > > > > My EE gut tells me this is like Fourier transforms between time- and > > frequency-domains. If this is true then what we need is the inverse > > transform. Is this a correct analogy? > > > > -Original Message- > > From: [email protected] [mailto:[email protected]] On Behalf > > Of Shannon Quinn > > Sent: Tuesday, May 24, 2011 11:39 AM > > To: [email protected] > > Subject: Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans > example > > fails > > > > This is actually something I could use a little expert Hadoop assistance > > on. > > The general idea is that the points that are clustered in eigenspace have > a > > 1-to-1 correspondence with the original points (which is how you get your > > cluster assignments), but this back-mapping after clustering isn't > > explicitly implemented yet, since that's the core of the IO issue. > > > > My block on this is my lack of understanding in how the actual ordering > of > > the points change (or not?) from when they are projected into eigenspace > > (the Lanczos solver) and when K-means makes its cluster assignments. On a > > one-node setup the original ordering appears to be preserved through all > > the > > operations, so the labels of the original points can be assigned by > giving > > original_point[i] the label of projected_point[i], hence the cluster > > assignments are easy to determine. For multi-node setups, however, I > simply > > don't know if this heuristic holds. 
> > > > But I believe the immediate issue here is that we're feeding the > projected > > points to the display, when it should be the original points *annotated* > > with the cluster assignments from the corresponding projected points. The > > question is how to shift those assignments over robustly; right now it's > > just a hack job in the SpectralKMeansDriver...or maybe (hopefully!) it's > > just the version I have locally :o) > > > > On Tue, May 24, 2011 at 2:13 PM, Jeff Eastman > wrote: > > > > > Yes, I expect it is pilot error on my part. The original implementation > > was > > > failing in this manner because I was requesting 5 eigenvectors > > (clusters). I > > > changed it to 2 and now it dis
RE: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
Would that give you the original data matrix, the clustered data matrix, or the clustered affinity matrix? Even with the analogy in mind I'm having trouble connecting the dots. Seems like I lost the original data matrix in step 1 when I used a distance measure to produce A from it. If the returned eigenvectors define Q, then what is the significance of QAQ^-1? And, more importantly, if the Q eigenvectors define the clusters in eigenspace, what is the inverse transformation? -Original Message- From: [email protected] [mailto:[email protected]] On Behalf Of Shannon Quinn Sent: Tuesday, May 24, 2011 12:07 PM To: [email protected] Subject: Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails That's an excellent analogy! Employing that strategy, would it be possible (and not too expensive) to do the QAQ^-1 operation to get the original data matrix, after we've clustered the points in eigenspace? On Tue, May 24, 2011 at 2:59 PM, Jeff Eastman wrote: > For the display example, it is not necessary to cluster the original > points. The other clustering display examples only train the clusters and do > not classify the points. They are drawn first and the cluster centers & > radii are superimposed afterwards. Thus I think it is only necessary to > back-transform the clusters. > > My EE gut tells me this is like Fourier transforms between time- and > frequency-domains. If this is true then what we need is the inverse > transform. Is this a correct analogy? > > -Original Message- > From: [email protected] [mailto:[email protected]] On Behalf > Of Shannon Quinn > Sent: Tuesday, May 24, 2011 11:39 AM > To: [email protected] > Subject: Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example > fails > > This is actually something I could use a little expert Hadoop assistance > on. 
> The general idea is that the points that are clustered in eigenspace have a > 1-to-1 correspondence with the original points (which is how you get your > cluster assignments), but this back-mapping after clustering isn't > explicitly implemented yet, since that's the core of the IO issue. > > My block on this is my lack of understanding in how the actual ordering of > the points change (or not?) from when they are projected into eigenspace > (the Lanczos solver) and when K-means makes its cluster assignments. On a > one-node setup the original ordering appears to be preserved through all > the > operations, so the labels of the original points can be assigned by giving > original_point[i] the label of projected_point[i], hence the cluster > assignments are easy to determine. For multi-node setups, however, I simply > don't know if this heuristic holds. > > But I believe the immediate issue here is that we're feeding the projected > points to the display, when it should be the original points *annotated* > with the cluster assignments from the corresponding projected points. The > question is how to shift those assignments over robustly; right now it's > just a hack job in the SpectralKMeansDriver...or maybe (hopefully!) it's > just the version I have locally :o) > > On Tue, May 24, 2011 at 2:13 PM, Jeff Eastman wrote: > > > Yes, I expect it is pilot error on my part. The original implementation > was > > failing in this manner because I was requesting 5 eigenvectors > (clusters). I > > changed it to 2 and now it displays something but it is not even close to > > correct. I think this is because I have not transformed back from eigen > > space to vector space. This all relates to the IO issue for the spectral > > clustering code which I don't grok. > > > > The display driver begins with the sample points and generates the > affinity > > matrix using a distance measure. Not clear this is even a correct > > interpretation of that matrix. 
Then spectral kmeans runs and produces 2 > > clusters which I display directly. Seems like this number should be more > > like the k in kmeans, and 5 was more realistic given the data. I believe > > there is a missing output transformation to recover the clusters from the > > eigenvectors but I don't know how to do that. > > > > I bet you do :) > > > > -Original Message- > > From: Shannon Quinn (JIRA) [mailto:[email protected]] > > Sent: Tuesday, May 24, 2011 8:07 AM > > To: [email protected] > > Subject: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example > > fails > > > > > >[ > > > https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comm
Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
More or less follow the data through the pipeline?

On Tue, May 24, 2011 at 3:08 PM, Ted Dunning wrote:
> Yes. That can be done, but you probably can just remember the references.
>
> On Tue, May 24, 2011 at 12:06 PM, Shannon Quinn wrote:
>
> > That's an excellent analogy! Employing that strategy, would it be possible
> > (and not too expensive) to do the QAQ^-1 operation to get the original data
> > matrix, after we've clustered the points in eigenspace?
Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
Yes. That can be done, but you probably can just remember the references.

On Tue, May 24, 2011 at 12:06 PM, Shannon Quinn wrote:

> That's an excellent analogy! Employing that strategy, would it be possible
> (and not too expensive) to do the QAQ^-1 operation to get the original data
> matrix, after we've clustered the points in eigenspace?
Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
That's an excellent analogy! Employing that strategy, would it be possible (and not too expensive) to do the QAQ^-1 operation to get the original data matrix, after we've clustered the points in eigenspace?

On Tue, May 24, 2011 at 2:59 PM, Jeff Eastman wrote:

> For the display example, it is not necessary to cluster the original
> points. The other clustering display examples only train the clusters and do
> not classify the points. They are drawn first and the cluster centers &
> radii are superimposed afterwards. Thus I think it is only necessary to
> back-transform the clusters.
>
> My EE gut tells me this is like Fourier transforms between time- and
> frequency-domains. If this is true then what we need is the inverse
> transform. Is this a correct analogy?
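For reference, a hedged note on the inverse-transform question above, assuming the matrix being decomposed is the symmetric normalized affinity matrix. The symbols $L$, $Q_k$, $\Lambda_k$ and $y_i$ are notation introduced for this sketch and do not come from the thread. Under that assumption the eigendecomposition recovers the affinity matrix rather than the original point coordinates:

```latex
% L is n x n and symmetric, so its eigenvector matrix Q is orthogonal.
L = Q \Lambda Q^{-1} = Q \Lambda Q^{\top}

% Keeping only k eigenvectors (Q_k is n x k) gives a rank-k approximation.
L \approx Q_k \Lambda_k Q_k^{\top}

% Row i of Q_k is the eigenspace image of original point i.
y_i = (Q_k)_{i,:} \in \mathbb{R}^{k}, \qquad i = 0, \dots, n-1
```

If that assumption holds, undoing the transform yields affinities, not 2-d points, so the row index i is the only link back to the original data. That is consistent with Ted Dunning's reply that one can just remember the references.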
RE: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
For the display example, it is not necessary to cluster the original points. The other clustering display examples only train the clusters and do not classify the points. They are drawn first and the cluster centers & radii are superimposed afterwards. Thus I think it is only necessary to back-transform the clusters.

My EE gut tells me this is like Fourier transforms between time- and frequency-domains. If this is true then what we need is the inverse transform. Is this a correct analogy?

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Shannon Quinn
Sent: Tuesday, May 24, 2011 11:39 AM
To: [email protected]
Subject: Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

This is actually something I could use a little expert Hadoop assistance on. The general idea is that the points that are clustered in eigenspace have a 1-to-1 correspondence with the original points (which is how you get your cluster assignments), but this back-mapping after clustering isn't explicitly implemented yet, since that's the core of the IO issue.

My block on this is my lack of understanding in how the actual ordering of the points change (or not?) from when they are projected into eigenspace (the Lanczos solver) and when K-means makes its cluster assignments. On a one-node setup the original ordering appears to be preserved through all the operations, so the labels of the original points can be assigned by giving original_point[i] the label of projected_point[i], hence the cluster assignments are easy to determine. For multi-node setups, however, I simply don't know if this heuristic holds.

But I believe the immediate issue here is that we're feeding the projected points to the display, when it should be the original points *annotated* with the cluster assignments from the corresponding projected points. The question is how to shift those assignments over robustly; right now it's just a hack job in the SpectralKMeansDriver...or maybe (hopefully!) it's just the version I have locally :o)
Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
This is actually something I could use a little expert Hadoop assistance on. The general idea is that the points that are clustered in eigenspace have a 1-to-1 correspondence with the original points (which is how you get your cluster assignments), but this back-mapping after clustering isn't explicitly implemented yet, since that's the core of the IO issue.

My block on this is my lack of understanding in how the actual ordering of the points change (or not?) from when they are projected into eigenspace (the Lanczos solver) and when K-means makes its cluster assignments. On a one-node setup the original ordering appears to be preserved through all the operations, so the labels of the original points can be assigned by giving original_point[i] the label of projected_point[i], hence the cluster assignments are easy to determine. For multi-node setups, however, I simply don't know if this heuristic holds.

But I believe the immediate issue here is that we're feeding the projected points to the display, when it should be the original points *annotated* with the cluster assignments from the corresponding projected points. The question is how to shift those assignments over robustly; right now it's just a hack job in the SpectralKMeansDriver...or maybe (hopefully!) it's just the version I have locally :o)

On Tue, May 24, 2011 at 2:13 PM, Jeff Eastman wrote:

> Yes, I expect it is pilot error on my part. The original implementation was
> failing in this manner because I was requesting 5 eigenvectors (clusters). I
> changed it to 2 and now it displays something but it is not even close to
> correct. I think this is because I have not transformed back from eigen
> space to vector space. This all relates to the IO issue for the spectral
> clustering code which I don't grok.
>
> The display driver begins with the sample points and generates the affinity
> matrix using a distance measure. Not clear this is even a correct
> interpretation of that matrix. Then spectral kmeans runs and produces 2
> clusters which I display directly. Seems like this number should be more
> like the k in kmeans, and 5 was more realistic given the data. I believe
> there is a missing output transformation to recover the clusters from the
> eigenvectors but I don't know how to do that.
>
> I bet you do :)
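The positional back-mapping described above, where original_point[i] takes the label of projected_point[i], only works if row order survives every job. A minimal sketch of a key-based alternative follows, assuming each projected row carries the zero-based ID of the point it came from. The function name `annotate_by_key` and the sample data are hypothetical and are not part of Mahout or SpectralKMeansDriver.

```python
def annotate_by_key(original_points, keyed_labels):
    """Attach a cluster label to each original point using its row ID.

    original_points: dict mapping row ID -> original (e.g. 2-d) point
    keyed_labels:    iterable of (row ID, cluster label) pairs, in any order
    """
    labels = dict(keyed_labels)
    # Fail loudly if any original point never received a label.
    missing = sorted(set(original_points) - set(labels))
    if missing:
        raise KeyError(f"no cluster label for row IDs {missing}")
    return {rid: (pt, labels[rid]) for rid, pt in original_points.items()}


# Hypothetical data: three 2-d points, with labels arriving out of order,
# as they might after passing through a multi-node job.
points = {0: (1.0, 1.0), 1: (1.1, 0.9), 2: (5.0, 5.2)}
shuffled_labels = [(2, 1), (0, 0), (1, 0)]
annotated = annotate_by_key(points, shuffled_labels)
```

Because the join is on the row ID rather than on position, the result does not depend on the order in which the labeled rows come back.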
RE: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
Yes, I expect it is pilot error on my part. The original implementation was failing in this manner because I was requesting 5 eigenvectors (clusters). I changed it to 2 and now it displays something but it is not even close to correct. I think this is because I have not transformed back from eigen space to vector space. This all relates to the IO issue for the spectral clustering code which I don't grok.

The display driver begins with the sample points and generates the affinity matrix using a distance measure. Not clear this is even a correct interpretation of that matrix. Then spectral kmeans runs and produces 2 clusters which I display directly. Seems like this number should be more like the k in kmeans, and 5 was more realistic given the data. I believe there is a missing output transformation to recover the clusters from the eigenvectors but I don't know how to do that.

I bet you do :)

-----Original Message-----
From: Shannon Quinn (JIRA) [mailto:[email protected]]
Sent: Tuesday, May 24, 2011 8:07 AM
To: [email protected]
Subject: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038608#comment-13038608 ]

Shannon Quinn commented on MAHOUT-524:
--------------------------------------

+1, I'm on it.

I'm a little unclear as to the context of the initial Hudson comment: the display method is expecting 2D vectors, but getting 5D ones?
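The message above says the display driver builds the affinity matrix from a distance measure and questions whether that is a correct interpretation. Affinity is a similarity, so a raw distance usually has to be inverted in some way. One common choice, used here purely as an assumption and not necessarily what DisplaySpectralKMeans does, is a Gaussian kernel, exp(-d^2 / (2 sigma^2)). The helper name `gaussian_affinities` and the default `sigma` are hypothetical.

```python
import math


def gaussian_affinities(points, sigma=1.0):
    """Return (i, j, affinity) triples for every ordered pair with i != j.

    Row and column IDs are zero-based. Affinity is exp(-d^2 / (2 * sigma^2)),
    where d is the Euclidean distance, so nearby points score close to 1 and
    distant points score close to 0.
    """
    triples = []
    for i, p in enumerate(points):
        for j, q in enumerate(points):
            if i == j:
                continue
            d2 = sum((a - b) ** 2 for a, b in zip(p, q))
            triples.append((i, j, math.exp(-d2 / (2.0 * sigma ** 2))))
    return triples


# Hypothetical sample: two nearby points and one far away.
sample = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0)]
triples = gaussian_affinities(sample)
affinity = {(i, j): a for i, j, a in triples}
```

With this kernel the nearby pair (0, 1) gets a higher affinity than the distant pair (0, 2), which is the opposite ordering from the raw distances.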
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038608#comment-13038608 ]

Shannon Quinn commented on MAHOUT-524:
--------------------------------------

+1, I'm on it.

I'm a little unclear as to the context of the initial Hudson comment: the display method is expecting 2D vectors, but getting 5D ones?

> DisplaySpectralKMeans example fails
> -----------------------------------
>
>              Key: MAHOUT-524
>              URL: https://issues.apache.org/jira/browse/MAHOUT-524
>          Project: Mahout
>       Issue Type: Bug
>       Components: Clustering
> Affects Versions: 0.4, 0.5
>         Reporter: Jeff Eastman
>         Assignee: Jeff Eastman
>           Labels: clustering, k-means, visualization
>          Fix For: 0.6
>
>      Attachments: aff.txt, raw.txt, spectralkmeans.png
>
>
> I've committed a new display example that attempts to push the standard
> mixture of models data set through spectral k-means. After some tweaking of
> configuration arguments and a bug fix in EigenCleanupJob it runs spectral
> k-means to completion. The display example is expecting 2-d clustered points
> and the example is producing 5-d points. Additional I/O work is needed before
> this will play with the rest of the clustering algorithms.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (MAHOUT-524) DisplaySpectralKMeans example fails
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982912#action_12982912 ]

Hudson commented on MAHOUT-524:
-------------------------------

Integrated in Mahout-Quality #567 (See [https://hudson.apache.org/hudson/job/Mahout-Quality/567/])

> DisplaySpectralKMeans example fails
> -----------------------------------
>
>              Key: MAHOUT-524
>              URL: https://issues.apache.org/jira/browse/MAHOUT-524
>          Project: Mahout
>       Issue Type: Bug
>       Components: Clustering
> Affects Versions: 0.4
>         Reporter: Jeff Eastman
>          Fix For: 0.5
>
>
> I've committed a new display example that attempts to push the standard
> mixture of models data set through spectral k-means. After some tweaking of
> configuration arguments and a bug fix in EigenCleanupJob it runs spectral
> k-means to completion. The display example is expecting 2-d clustered points
> and the example is producing 5-d points. Additional I/O work is needed before
> this will play with the rest of the clustering algorithms.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAHOUT-524) DisplaySpectralKMeans example fails
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982737#action_12982737 ]

Jeff Eastman commented on MAHOUT-524:
-------------------------------------

The Display algorithm now runs without errors but the 2 clusters it produces are clearly not what I was expecting. Probably a gross misunderstanding on my part and a final output processing step that needs to be invented.

> DisplaySpectralKMeans example fails
> -----------------------------------
>
>              Key: MAHOUT-524
>              URL: https://issues.apache.org/jira/browse/MAHOUT-524
>          Project: Mahout
>       Issue Type: Bug
>       Components: Clustering
> Affects Versions: 0.4
>         Reporter: Jeff Eastman
>          Fix For: 0.5
>
>
> I've committed a new display example that attempts to push the standard
> mixture of models data set through spectral k-means. After some tweaking of
> configuration arguments and a bug fix in EigenCleanupJob it runs spectral
> k-means to completion. The display example is expecting 2-d clustered points
> and the example is producing 5-d points. Additional I/O work is needed before
> this will play with the rest of the clustering algorithms.
[jira] Commented: (MAHOUT-524) DisplaySpectralKMeans example fails
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982681#action_12982681 ]

Sean Owen commented on MAHOUT-524:
----------------------------------

Jeff sounds like there is no outstanding issue here at the moment, or something more to track here?

> DisplaySpectralKMeans example fails
> -----------------------------------
>
>              Key: MAHOUT-524
>              URL: https://issues.apache.org/jira/browse/MAHOUT-524
>          Project: Mahout
>       Issue Type: Bug
>       Components: Clustering
> Affects Versions: 0.4
>         Reporter: Jeff Eastman
>          Fix For: 0.5
>
>
> I've committed a new display example that attempts to push the standard
> mixture of models data set through spectral k-means. After some tweaking of
> configuration arguments and a bug fix in EigenCleanupJob it runs spectral
> k-means to completion. The display example is expecting 2-d clustered points
> and the example is producing 5-d points. Additional I/O work is needed before
> this will play with the rest of the clustering algorithms.
[jira] Commented: (MAHOUT-524) DisplaySpectralKMeans example fails
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920226#action_12920226 ]

Hudson commented on MAHOUT-524:
-------------------------------

Integrated in Mahout-Quality #392 (See [https://hudson.apache.org/hudson/job/Mahout-Quality/392/])
MAHOUT-524: Moved numEigensWritten initialization out of loop. SpectralKMeans now runs to completion but display routing is expecting a 2-d vector and is getting a 5-d vector. Not clustering the original input points. More to test but CleanEigensJob is working.

> DisplaySpectralKMeans example fails
> -----------------------------------
>
>              Key: MAHOUT-524
>              URL: https://issues.apache.org/jira/browse/MAHOUT-524
>          Project: Mahout
>       Issue Type: Bug
>       Components: Clustering
> Affects Versions: 0.4
>         Reporter: Jeff Eastman
>          Fix For: 0.4
>
>
> I've committed a new display example that attempts to push the standard
> mixture of models data set through spectral k-means. After some tweaking of
> configuration arguments it gets remarkably far through, finally failing on
> W.transpose() after the eigen cleanup. I can't imagine this would all be
> pilot error so I'm opening an issue to track it.
