Hi Tony,

As I remember, some phases in Nutch (INJECT, GENERATE, ...) set a specific mark (a marker field): for example, the INJECT phase sets "mk:_injmrk_" and the GENERATE phase sets "mk:_gnmrk_". It is also worth pointing out that each phase depends on the results of the previous phases (e.g. FETCH will only fetch URLs that were successfully processed by the GENERATE phase, i.e. URLs that have the gen mark set). Check that the entries in your collection carry these marks. If an entry has only the inject mark, the GENERATE phase did not select that URL for fetching. In that case, check that you pass the "curTime" parameter with the current timestamp after you run INJECT.
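To make the check above concrete, here is a rough Python sketch that classifies stored entries by the marks they carry. The two mark names come from this message; everything else is an assumption on my side: that each entry is a dict-like document with a "markers" sub-document, and that the keys may appear with or without the "mk:" prefix. The function names are made up for illustration and are not part of Nutch.

```python
# Mark names as quoted in this message.
INJECT_MARK = "_injmrk_"
GENERATE_MARK = "_gnmrk_"


def has_mark(markers, name):
    """True if the mark is present, with or without the "mk:" prefix (assumed layout)."""
    return name in markers or ("mk:" + name) in markers


def marker_state(doc):
    """Classify one stored entry by the phase marks it carries."""
    # Assumption: marks live in a "markers" sub-document of each entry.
    markers = doc.get("markers") or {}
    if has_mark(markers, GENERATE_MARK):
        return "generated"
    if has_mark(markers, INJECT_MARK):
        return "injected-only"
    return "no-marks"


def summarize(docs):
    """Count entries per state so a stuck GENERATE phase is easy to spot."""
    counts = {"generated": 0, "injected-only": 0, "no-marks": 0}
    for doc in docs:
        counts[marker_state(doc)] += 1
    return counts
```

You can feed it any iterable of documents, for example the cursor returned by a pymongo find() on your crawl collection. If every entry ends up in "injected-only", GENERATE did not pick anything up.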
From my experience, it is better to download the Nutch sources and check what it is doing from the code.

Hope that helps.

Best Regards,
Dzmitry

On Fri, Jun 26, 2015 at 11:00 PM, Tony Colletti <[email protected]> wrote:

> After searching your site and then having to resort to S/O, I've finally
> figured out how to create a full crawl using each command to the REST
> endpoint. However, I've noticed that after my final step is done
> (UPDATEDB), I check my db and there are many fields missing. The ones I'm
> most concerned about is the "status" and "baseUrl" field. I'm not even sure
> if the crawl is actually being executed or not. I'm assuming it's something
> I have wrong. I've followed the examples in this
> <https://docs.google.com/document/d/1OGg22ATohapP2ycewIaTcUnENc2FeyYzni0ED_Jjxz8/edit>
> document that I found on another mailing list topic. What am I doing wrong?
> I'm using Nutch 2.3 and tying it into MongoDB as my database.
>
> Also, I've found that even after just running the command to INJECT the
> seedlist, my db already has a new collection with information in it. That
> information is the same information in the end, so it never changes. But
> when checking the status of the other commands, they all say FINISHED and
> OK. What's going on?
>
> Thanks for the help!
>
> ~ Tony

