I know it's more complicated since there are multiple jobs within one run
of KMeans clustering, but with other Hadoop jobs, I've done something along
the lines of:
for (Job job : parallelJobs) {
    job.submit();  // returns immediately; does not wait for completion
}
And then I just watch that list of jobs and wait for them all to complete.
That's the sort of thing I want to be able to do with KMeans on multiple
separate datasets.
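
A rough sketch of the same "launch everything, then wait" pattern for blocking calls, using only java.util.concurrent. This is an illustration under assumptions, not a confirmed way to drive Mahout: runClustering is a hypothetical placeholder for whatever blocking call runs one clustering pass on one input directory, and the directory names are made up. Static methods are not a concurrency problem by themselves; trouble only arises if the code behind them shares mutable static state, and whether that holds for KMeansDriver is not something this sketch settles.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelClusteringRuns {

    // Hypothetical placeholder for one blocking clustering run on a single
    // input directory. The sleep simulates waiting on the cluster; replace
    // the body with the real driver call.
    static String runClustering(String inputDir) throws Exception {
        Thread.sleep(100);
        return inputDir + "/clusters";
    }

    // Starts one task per input directory, with at most maxConcurrent running
    // at once, and returns the results in the same order as inputDirs.
    static List<String> runAll(List<String> inputDirs, int maxConcurrent) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (final String dir : inputDirs) {
                tasks.add(() -> runClustering(dir));
            }
            List<String> outputs = new ArrayList<>();
            // invokeAll blocks until every task has finished (or failed).
            for (Future<String> result : pool.invokeAll(tasks)) {
                // get() rethrows a task failure wrapped in ExecutionException.
                outputs.add(result.get());
            }
            return outputs;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical directory names, one per cluster split off earlier.
        List<String> inputDirs = Arrays.asList("split/0", "split/1", "split/2");
        System.out.println(runAll(inputDirs, 3));
    }
}
```

Each thread simply blocks on its own run, so the per-run sequence of jobs stays intact while the separate datasets proceed side by side. The pool size caps how many runs compete for the cluster at once.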
On Tue, Nov 20, 2012 at 11:58 AM, Matt Molek <[email protected]> wrote:
> I've given up on the CLI and I'm trying to do this in Java now, but it
> looks like I can't launch multiple KMeans drivers at once since
> KMeansDriver and many of its underlying classes are static. Am I right that
> that will cause problems? (Sorry for the beginner question. I'm not too
> familiar with concurrency in Java).
>
> I'd really like to be able to launch multiple clustering runs at the same
> time since launching them one at a time and waiting for each to finish is
> killing my overall performance.
>
>
>
> On Thu, Nov 8, 2012 at 1:48 PM, Matt Molek <[email protected]> wrote:
>
>> When doing top-down clustering, I'm running a first pass of kmeans, and
>> then splitting the different clusters off into their own directories with
>> clusterpp. So I have a bunch of input directories that I want to run kmeans
>> jobs on at the same time.
>>
>> Can I do that from a bash script? Right now I'm running over each input
>> directory with a for loop, and each kmeans job is waiting for completion
>> before the next one starts.
>>
>> If I can't do it with a script, could I do it in Java without having to
>> modify the Mahout source?
>>
>> Thanks for the help!
>>
>
>