The ShellBolt looks for "scancount.py" in the resources/ directory in your JAR, which will be extracted to each worker machine. It then simply invokes "python scancount.py" in that directory. So you need to make sure the scancount.py file will be on the classpath under resources/, as well as the storm.py interop library it depends upon.
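
One quick way to check the packaging is to list the JAR's contents before you submit it. Here's a minimal Python sketch of that check. The function name missing_resources is made up for illustration, and it assumes both files sit directly under resources/ at the root of the JAR:

```python
import zipfile

# Entries the ShellBolt subprocess needs, relative to the JAR root.
# Assumption: the bolt script and storm.py both live directly under resources/.
REQUIRED = ("resources/scancount.py", "resources/storm.py")


def missing_resources(jar_path, required=REQUIRED):
    """Return the required entries that are absent from the JAR, in order."""
    # A JAR is a ZIP archive, so the standard zipfile module can read it.
    with zipfile.ZipFile(jar_path) as jar:
        present = set(jar.namelist())
    return [name for name in required if name not in present]
```

Calling missing_resources("target/my-topology.jar") (a hypothetical path) should return an empty list once both files are packaged; anything it does return is a file the worker won't be able to find.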
Based on the official word count example
<https://github.com/apache/incubator-storm/blob/master/examples/storm-starter/src/jvm/storm/starter/WordCountTopology.java#L40-L56>,
your Java ShellBolt definition looks OK.

The storm.py interop library that you're probably using then communicates with the rest of Storm via the Multi-lang Protocol
<https://storm.incubator.apache.org/documentation/Multilang-protocol.html>.
This means your Python process is really sending JSON messages over stdout and receiving JSON messages over stdin. That's the relationship between Python & Java (& Storm) in this case.

A library I'm working on with my team, streamparse
<https://github.com/Parsely/streamparse>, makes this workflow easier by bundling a command-line tool for building/submitting/running Python topologies. For example, getting a Storm + Python "wordcount" example to run locally is just a matter of:

    sparse quickstart wordcount
    cd wordcount
    sparse run

It also eliminates the need to write the Java glue code you're putting together here. It's still in early development but we're already using it for real Storm 0.8 and 0.9 production clusters & local development.

---
Andrew Montalenti
Co-Founder & CTO
http://parse.ly

On Mon, Jun 2, 2014 at 12:37 PM, Ashu Goel <a...@shopkick.com> wrote:

> Hi all,
>
> I am experimenting with writing bolts in Python and was wondering how the
> relationship between the Java and Python code works.
> For example, I have a
> Python bolt that looks like this:
>
> class ScanCountBolt(storm.BasicBolt):
>
>     def __init__(self):
>         #super(ScanCountBolt, self).__init__(script='scancount.py')
>         self._count = defaultdict(int)
>
>     def process(self, tup):
>         product = tup.values[0]
>         self._count[product] += 1
>         storm.emit([product, self._count[product]])
>
> ScanCountBolt().run()
>
>
> And my corresponding Java code looks like this:
>
> public static class ScanCount extends ShellBolt implements IRichBolt {
>
>     public ScanCount() {
>         super("python", "scancount.py");
>     }
>
>     @Override
>     public void declareOutputFields(OutputFieldsDeclarer declarer) {
>         declarer.declare(new Fields("product", "scans"));
>     }
>
>     @Override
>     public Map<String, Object> getComponentConfiguration() {
>         return null;
>     }
> }
>
> Is that all I need to make it work or do I need to declare the data
> structures in the Java code as well. I am a bit confused...
>
> -Ashu