Hi Sebastian,

Thanks for coming back to me.

> Adding
>   set -x
> to bin/nutch and then running bin/crawl with a sample crawl which includes
> all steps
> should log all commands with a full list of arguments.
Yes, that's a great idea. Thanks.

> But on EMR it should be possible to directly reference the Nutch job file
> by a s3:// URL. (but haven't tried it this way)

Yes, that is possible. You add an S3 URL to the Jar= argument in the step definition of your create-cluster command.

> aws emr terminate-cluster ...

Ah, yes. I did wonder whether the master instance had the appropriate instance-role privileges to do this. I'll try.

Unfortunately, it still doesn't solve the iteration issue. Short of defining many, many repeated sets of steps, I don't see how I would get multiple rounds. What am I missing?

Thanks,
Jim

