Thanks, it was a permission issue. I had to change the group owner to the
current user's group, and it's now building. I had moved the build from one
server to another, which caused the user-sync problem.
2010/9/30 Jeff Eastman <[email protected]>:
> Don't think so. Try "mvn clean install" and let me know what happens.
>
> On 9/30/10 12:48 PM, Matt Tanquary wrote:
>>
>> Hi Jeff,
>>
>> Thanks for your reply. I just got trunk and started the install. It
>> ended with this error:
>>
>> Error loading supplemental data models: Cannot create file-based resource.
>> org.codehaus.plexus.resource.loader.FileResourceCreationException:
>> Cannot create file-based resource.
>>
>> A lot built, so I went ahead and tried your command-line example, but got:
>>
>> ERROR: Could not find mahout-examples-*.job in
>> /mnt/install/tools/mahout or
>> /mnt/install/tools/mahout/examples/target, please run 'mvn install' to
>> create the .job file
>>
>> I retrieved trunk as follows:
>> svn co http://svn.apache.org/repos/asf/mahout/trunk
>>
>> Then I ran 'mvn install' in the trunk folder.
>>
>> Any issues with trunk today?
>>
>> Thanks,
>> Matt
>>
>> On Wed, Sep 29, 2010 at 12:29 PM, Jeff Eastman
>> <[email protected]> wrote:
>>>
>>> Hi Matt,
>>>
>>> From your command arguments, it looks like you are running 0.3. Due to
>>> the rate of change in Mahout, we recommend you check out trunk and use
>>> that instead. With a little tweaking (adding --charset ASCII on
>>> seqdirectory) I was able to get as far as you did on trunk, but
>>> seq2sparse is not what you want to use.
>>>
>>> The utilities you are using are intended for text preprocessing:
>>> word-counting documents into term-vector sequence files, then running
>>> TF and/or TF-IDF processing on the results to produce VectorWritable
>>> sequence files suitable for clustering. For your problem, I suggest you
>>> instead look at the Synthetic Control clustering examples, starting
>>> with Canopy. These use an InputDriver to process text files containing
>>> space-delimited numbers like your data.dat file and produce the
>>> VectorWritable sequence files directly.
>>>
>>> I was able to run this on your data using trunk and it produced 3
>>> clusters. You should be able to run the other synthetic control jobs
>>> on it too:
>>>
>>> Command line:
>>> ./bin/mahout org.apache.mahout.clustering.syntheticcontrol.canopy.Job \
>>>   -i data \
>>>   -o output \
>>>   -t1 3 \
>>>   -t2 2 \
>>>   -ow \
>>>   -dm org.apache.mahout.common.distance.EuclideanDistanceMeasure
>>>
>>> Clusters output:
>>> C-0{n=1 c=[22.000, 21.000] r=[0.000, 0.000]}
>>>   Weight:  Point:
>>>   1.0: [22.000, 21.000]
>>> C-1{n=2 c=[18.250, 21.500] r=[0.250, 0.500]}
>>>   Weight:  Point:
>>>   1.0: [19.000, 20.000]
>>>   1.0: [18.000, 22.000]
>>> C-2{n=2 c=[2.500, 2.250] r=[0.500, 0.250]}
>>>   Weight:  Point:
>>>   1.0: [1.000, 3.000]
>>>   1.0: [3.000, 2.000]
>>>
>>> Good hunting,
>>> Jeff
>>>
>>> On 9/29/10 2:26 PM, Matt Tanquary wrote:
>>>>
>>>> I was able to run the tutorials, etc. Now I would like to generate my
>>>> own small test.
>>>>
>>>> I created a data.dat file with these contents:
>>>> 22 21
>>>> 19 20
>>>> 18 22
>>>> 1 3
>>>> 3 2
>>>>
>>>> Then I ran: mahout seqdirectory -i ~/data/kmeans/data.dat -o kmeans/seqdir
>>>>
>>>> This created kmeans/seqdir/chunk-0 in my DFS with the following content:
>>>> ź/%
>>>> /data.dat22 21
>>>> 19 20
>>>> 18 22
>>>> 1 3
>>>> 3 2
>>>>
>>>> Next I ran: mahout seq2sparse -i kmeans/seqdir -o kmeans/input
>>>>
>>>> This generated several things in kmeans/input, including the
>>>> 'tfidf/vectors' folder. Inside the vectors folder I get part-00000,
>>>> which contains:
>>>> řĎân
>>>> /data.dat7org.apache.mahout.math.RandomAccessSparseVectorWritable
>>>> /data.dat@@
>>>>
>>>> It does not seem to have the numeric data at this point.
>>>>
>>>> I am hoping someone can shed some light on how I can get my datapoint
>>>> file into the proper vector format for running mahout kmeans.
>>>>
>>>> Just fyi, when I run kmeans against that file (mahout kmeans -i
>>>> kmeans/input/tfidf/vectors -c kmeans/clusters -o kmeans/output -k 2
>>>> -w) I get:
>>>>
>>>> Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
>>>>         at java.util.ArrayList.RangeCheck(ArrayList.java:547)
>>>>
>>>> which tells me it was unable to find even 1 vector in the given input
>>>> folder.
>>>>
>>>> Thanks for any comments you provide.
>>>> -M@
>>>
>>

--
Have you thanked a teacher today? ---> http://www.liftateacher.org
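For anyone following this thread later: the canopy step Jeff runs above can be
illustrated with a short standalone sketch. This is the textbook canopy
algorithm (pick a point as a center, collect everything within T1, retire
candidates within T2), not Mahout's actual implementation, and the T2 value
here (2.5 instead of the 2 used in the command line above) is an assumption
chosen purely so this simplified version reproduces the same three groups on
the five data.dat points:

```python
import math

def canopy(points, t1, t2):
    """Textbook canopy clustering: repeatedly take a remaining point as a
    canopy center, make every point within t1 a member of that canopy, and
    drop points within t2 from the candidate list (t2 < t1)."""
    candidates = list(points)
    canopies = []
    while candidates:
        center = candidates.pop(0)
        members = [p for p in points if math.dist(center, p) < t1]
        candidates = [p for p in candidates if math.dist(center, p) >= t2]
        canopies.append((center, members))
    return canopies

# The five points from data.dat in the thread.
points = [(22, 21), (19, 20), (18, 22), (1, 3), (3, 2)]
for center, members in canopy(points, t1=3.0, t2=2.5):
    print(center, members)
```

This yields three canopies, {(22,21)}, {(19,20),(18,22)}, and {(1,3),(3,2)},
matching the three clusters reported above; Mahout's own centers and radii
come from its weighted canopy bookkeeping, which this sketch does not model.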
