Hi all,

I'm trying to run the Mahout canopy clustering algorithm from a
Python-embedded Pig script. The embedded Pig part of the script works (using
compileFromFile, bind, runSingle), but I can't figure out how to run Mahout
from the same script. I first tried invoking Mahout via subprocess.call,
but importing subprocess fails with:

ImportError: No module named subprocess

Similar errors occur when I try to import the sys or os modules.

Next I tried using Mahout's CanopyDriver class directly, but the import
statement itself fails with a similar error:

from org.apache.mahout.clustering.canopy import CanopyDriver

#=> ImportError: No module named mahout

The ImportErrors don't occur when I run Python interactively. Is this a
Jython problem? Am I not setting some path properly?
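
To narrow down which imports fail under the interpreter that Pig embeds, a
small check along these lines might help. This is only a sketch: the helper
name check_imports is made up for illustration, and the module list is just
the set of names mentioned above. It avoids "except ... as" so that it should
also parse on older Jython releases.

```python
def check_imports(names):
    """Try to import each module name; map the name to True or False."""
    results = {}
    for name in names:
        try:
            __import__(name)
            results[name] = True
        except ImportError:
            # The module (or one of its parent packages) was not found
            # on the interpreter's module/class path.
            results[name] = False
    return results


if __name__ == "__main__":
    # Names taken from the failing imports described in this post.
    report = check_imports([
        "sys",
        "os",
        "subprocess",
        "org.apache.mahout.clustering.canopy",
    ])
    for name in sorted(report):
        print("%s: %s" % (name, report[name]))
```

Running the same file once with `pig myscript.py` and once with a standalone
interpreter would show whether the standard-library modules and the Mahout
package are missing only in the embedded case.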

Other possibly useful info:
- I'm including the Mahout jars in the pig.additional.jars property.
- I'm running the script using Pig, i.e., `pig myscript.py`

Thanks,
Chun
