Hi, if this question has already been answered, please forgive me and point me to the appropriate thread.
I'd like to be able to find the ids of all new pages crawled by Nutch, or of pages modified since a fixed point in the past. I'm using Nutch 2.1 with MySQL as the back-end, and it seems like the appropriate back-end query should be something like:

select id from webpage where (prevFetchTime is null and fetchTime >= X) or (modifiedTime >= X)

where X is some point in the past.

What I've found is that modifiedTime is always null. I am using the adaptive scheduler and the default MD5 signature class. I've tried both re-injecting the seed URLs and not re-injecting them; it seems to make no difference, and modifiedTime remains null.

I am most grateful for any help or advice. If my nutch-site.xml file would help, I can forward it along.

Thanks,
jacob
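For anyone trying the query above, here is a minimal sketch that runs it against an in-memory SQLite table standing in for the MySQL `webpage` table. It assumes only the four columns named in the question and assumes the times are stored as epoch milliseconds; the row ids, timestamps, and helper names (`build_demo_db`, `new_or_modified_ids`, `count_with_eq_null`) are hypothetical. It also shows why `prevFetchTime = null` has to be written `prevFetchTime IS NULL`.

```python
import sqlite3

# The query from the question, with "= null" replaced by "IS NULL".
# :since is a cutoff in epoch milliseconds (an assumption about the column type).
QUERY = """
SELECT id FROM webpage
WHERE (prevFetchTime IS NULL AND fetchTime >= :since)
   OR (modifiedTime >= :since)
ORDER BY id
"""


def build_demo_db():
    """Create a hypothetical, cut-down webpage table with a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE webpage (id TEXT PRIMARY KEY, prevFetchTime INTEGER, "
        "fetchTime INTEGER, modifiedTime INTEGER)"
    )
    rows = [
        ("com.example:http/new", None, 2000, None),   # first fetch, after cutoff
        ("com.example:http/old", 500, 600, None),     # refetched, modifiedTime null
        ("com.example:http/mod", 500, 2500, 1800),    # modified after cutoff
        ("com.example:http/stale", None, 900, None),  # first fetch, before cutoff
    ]
    conn.executemany("INSERT INTO webpage VALUES (?, ?, ?, ?)", rows)
    return conn


def new_or_modified_ids(conn, since_ms):
    """Return ids of pages new or modified since since_ms."""
    return [row[0] for row in conn.execute(QUERY, {"since": since_ms})]


def count_with_eq_null(conn):
    # "= NULL" evaluates to NULL (not true) for every row, so this counts 0.
    sql = "SELECT COUNT(*) FROM webpage WHERE prevFetchTime = NULL"
    return conn.execute(sql).fetchone()[0]


if __name__ == "__main__":
    db = build_demo_db()
    print(new_or_modified_ids(db, 1000))
    print(count_with_eq_null(db))
```

Note that the `modifiedTime >= X` branch can only match rows where modifiedTime is populated: a comparison with NULL is never true, so while the column stays null only brand-new pages are returned. The SQL itself should be the same in MySQL.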

