Hello, I'm looking for a way to use R in Nutch, particularly in the HTML parser, though using it in other parts could be interesting as well. For each parsed document I would like to run an R script and pass its results back to the system, e.g. topic detection for the document.

NB: I'm not looking for a way of scaling R to Hadoop or HDFS, like Microsoft R Server. That approach uses Hadoop as an execution engine after the crawling process. In other words, first comes the computationally intensive full crawl, and then another computationally intensive R/Hadoop process.

Instead, I'm looking for a way of calling R scripts directly from the Java code of map or reduce jobs. Any ideas on how to do this? One way is "Rserve - Binary R server", but I'm looking for alternatives so that I can compare efficiency.
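To make the comparison concrete, here is a minimal sketch of the simplest alternative to Rserve that I can think of: starting one external process per document from Java. It uses only the standard `ProcessBuilder` API. The class name `RScriptRunner`, the method `run`, and the script name `detect_topic.R` are hypothetical, and the sketch assumes the script reads the file path given as its last command-line argument and prints its result to standard output.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Nutch or R: runs an external command
// (for example Rscript) once per document and returns what it printed.
public class RScriptRunner {

    // baseCommand: the command without the input file, e.g.
    //   ["Rscript", "--vanilla", "detect_topic.R"]   (script name is an assumption)
    // documentText: the parsed text of one document.
    public static String run(List<String> baseCommand, String documentText)
            throws IOException, InterruptedException {
        // Hand the text over through a temporary file, so there is no risk
        // of blocking on the child's stdin while it is writing to stdout.
        Path input = Files.createTempFile("nutch-doc-", ".txt");
        try {
            Files.write(input, documentText.getBytes(StandardCharsets.UTF_8));

            // The temp file path becomes the last argument of the command.
            List<String> command = new ArrayList<>(baseCommand);
            command.add(input.toString());

            Process process = new ProcessBuilder(command)
                    .redirectErrorStream(true) // merge stderr into stdout
                    .start();

            // Read everything the child prints, then wait for it to exit.
            byte[] output = process.getInputStream().readAllBytes();
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                throw new IOException("Command exited with code " + exitCode + ": "
                        + new String(output, StandardCharsets.UTF_8));
            }
            return new String(output, StandardCharsets.UTF_8);
        } finally {
            Files.deleteIfExists(input);
        }
    }
}
```

From a map task or parse filter this would be called roughly as `RScriptRunner.run(List.of("Rscript", "--vanilla", "detect_topic.R"), text)`. Each call pays the full R start-up cost, so I would expect it to be slow for a large crawl; it is meant only as a baseline to measure Rserve and other options against.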
Semyon.

