Hi all,

I'm trying to run the Mahout canopy clustering algorithm from a Python-embedded Pig script. The embedded Pig part of the script works (using compileFromFile, bind, runSingle), but I can't figure out how to run Mahout from the same script. Originally I tried running mahout via subprocess.call, but as soon as I try to import subprocess, I get:

    ImportError: No module named subprocess

Similar errors occur when I try to import sys or os modules.

Next I tried just instantiating the CanopyClustering class, but got a similar error when using the following import statement:

    from org.apache.mahout.clustering.canopy import CanopyDriver
    #=> ImportError: No module named mahout

The ImportErrors don't occur when I run Python interactively. Is this a Jython problem? Am I not setting some path properly?

Other possibly useful info:
- I'm including the mahout jars in the pig.additional.jars property.
- I'm running the script using Pig, i.e., `pig myscript.py`

Thanks,
Chun
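
P.S. In case it helps anyone reproduce this, here is a minimal sketch of how the failing imports could be checked in one go. The helper name check_imports is hypothetical (it is not part of Pig, Jython, or Mahout), and the module list is just the names mentioned above. It deliberately avoids importing sys or os, since those imports are among the ones that fail for me.

```python
def check_imports(module_names):
    """Try to import each module name; map name -> True (ok) or False (failed).

    check_imports is a hypothetical helper for illustration only.
    """
    results = {}
    for name in module_names:
        try:
            __import__(name)
            results[name] = True
        except ImportError:
            results[name] = False
    return results


# Module names taken from the errors described in this post.
report = check_imports([
    "subprocess",
    "os",
    "org.apache.mahout.clustering.canopy",
])

for name in sorted(report):
    if report[name]:
        status = "ok"
    else:
        status = "ImportError"
    print("%s: %s" % (name, status))
```

Running the same file once with `pig myscript.py` and once with a standalone interpreter should show which modules are missing only in the embedded case.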
