Hello,

I was trying to crawl a website listed in my urls directory,
but the injection step never finishes.

It prints the following output endlessly.

Command:

Crawl c:\urls -thread 1 -depth 1


Output:

060118 162225 parsing file:/C:/Documents%20and%20Settings/Sameer%20Tamsekar/My%20Documents/project/NutchNight/bin/nutch-default.xml
060118 162225 parsing file:/C:/Documents%20and%20Settings/Sameer%20Tamsekar/My%20Documents/project/NutchNight/bin/crawl-tool.xml
060118 162225 parsing file:/C:/Documents%20and%20Settings/Sameer%20Tamsekar/My%20Documents/project/NutchNight/bin/mapred-default.xml
060118 162225 parsing file:/C:/Documents%20and%20Settings/Sameer%20Tamsekar/My%20Documents/project/NutchNight/bin/nutch-site.xml
060118 162225 crawl started in: crawl-20060118162225
060118 162225 rootUrlDir = c:\urls
060118 162225 threads = 10
060118 162225 depth = 5
060118 162225 parsing file:/C:/Documents%20and%20Settings/Sameer%20Tamsekar/My%20Documents/project/NutchNight/bin/nutch-default.xml
060118 162225 parsing file:/C:/Documents%20and%20Settings/Sameer%20Tamsekar/My%20Documents/project/NutchNight/bin/crawl-tool.xml
060118 162225 parsing file:/C:/Documents%20and%20Settings/Sameer%20Tamsekar/My%20Documents/project/NutchNight/bin/nutch-site.xml
060118 162225 Injector: starting
060118 162225 Injector: crawlDb: crawl-20060118162225\crawldb
060118 162225 Injector: urlDir: c:\urls
060118 162225 Injector: Converting injected urls to crawl db entries.
060118 162225 parsing file:/C:/Documents%20and%20Settings/Sameer%20Tamsekar/My%20Documents/project/NutchNight/bin/nutch-default.xml
060118 162225 parsing file:/C:/Documents%20and%20Settings/Sameer%20Tamsekar/My%20Documents/project/NutchNight/bin/crawl-tool.xml
060118 162225 parsing file:/C:/Documents%20and%20Settings/Sameer%20Tamsekar/My%20Documents/project/NutchNight/bin/mapred-default.xml
060118 162225 parsing file:/C:/Documents%20and%20Settings/Sameer%20Tamsekar/My%20Documents/project/NutchNight/bin/mapred-default.xml
060118 162225 parsing file:/C:/Documents%20and%20Settings/Sameer%20Tamsekar/My%20Documents/project/NutchNight/bin/nutch-site.xml
060118 162225 Running job: job_ipqvtf
060118 162225 parsing file:/C:/Documents%20and%20Settings/Sameer%20Tamsekar/My%20Documents/project/NutchNight/bin/nutch-default.xml
060118 162225 parsing file:/C:/Documents%20and%20Settings/Sameer%20Tamsekar/My%20Documents/project/NutchNight/bin/mapred-default.xml
060118 162225 parsing \tmp\nutch\mapred\local\localRunner\job_ipqvtf.xml
060118 162225 parsing file:/C:/Documents%20and%20Settings/Sameer%20Tamsekar/My%20Documents/project/NutchNight/bin/nutch-site.xml
060118 162225 Using URL normalizer: org.apache.nutch.net.BasicUrlNormalizer
060118 162225 Plugins: looking in: C:\Documents and Settings\Sameer Tamsekar\My Documents\project\NutchNight\bin\plugins
060118 162226 Plugin Auto-activation mode: [true]
060118 162226 Registered Plugins:
060118 162226     URL Query Filter (query-url)
060118 162226     Site Query Filter (query-site)
060118 162226     Http / Https Protocol Plug-in (protocol-httpclient)
060118 162226     Html Parse Plug-in (parse-html)
060118 162226     the nutch core extension points (nutch-extensionpoints)
060118 162226     Basic Indexing Filter (index-basic)
060118 162226     Text Parse Plug-in (parse-text)
060118 162226     JavaScript Parser (parse-js)
060118 162226     Regex URL Filter (urlfilter-regex)
060118 162226     Basic Query Filter (query-basic)
060118 162226 Registered Extension-Points:
060118 162226     Nutch Protocol (org.apache.nutch.protocol.Protocol)
060118 162226     Nutch URL Filter (org.apache.nutch.net.URLFilter)
060118 162226     HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
060118 162226     Nutch Online Search Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
060118 162226     Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
060118 162226     Nutch Content Parser (org.apache.nutch.parse.Parser)
060118 162226     Ontology Model Loader (org.apache.nutch.ontology.Ontology)
060118 162226     Nutch Analysis (org.apache.nutch.analysis.NutchAnalyzer)
060118 162226     Nutch Query Filter (org.apache.nutch.searcher.QueryFilter)
060118 162226 found resource crawl-urlfilter.txt at file:/C:/Documents%20and%20Settings/Sameer%20Tamsekar/My%20Documents/project/NutchNight/bin/crawl-urlfilter.txt
060118 162226  map 0%
060118 162226 c:\urls\urllist.txt:0+20
060118 162227  map -485200%
060118 162227 c:\urls\urllist.txt:0+20
060118 162228  map -1433905%
060118 162228 c:\urls\urllist.txt:0+20
060118 162229  map -2369110%
060118 162229 c:\urls\urllist.txt:0+20
060118 162230  map -3320875%
060118 162230 c:\urls\urllist.txt:0+20
060118 162231  map -4257505%
060118 162231 c:\urls\urllist.txt:0+20
060118 162232  map -5189935%
060118 162232 c:\urls\urllist.txt:0+20
060118 162233  map -6120770%
060118 162233 c:\urls\urllist.txt:0+20
060118 162234  map -7072030%
060118 162234 c:\urls\urllist.txt:0+20
060118 162235  map -7974285%
060118 162235 c:\urls\urllist.txt:0+20
060118 162236  map -8906220%
060118 162236 c:\urls\urllist.txt:0+20
060118 162237  map -9854044%
060118 162237 c:\urls\urllist.txt:0+20
060118 162238  map -10800620%
060118 162238 c:\urls\urllist.txt:0+20
060118 162239  map -11773884%
060118 162240 c:\urls\urllist.txt:0+20
060118 162240  map -12733334%
060118 162241 c:\urls\urllist.txt:0+20
060118 162241  map -13682200%
060118 162242 c:\urls\urllist.txt:0+20
060118 162242  map -14657790%
060118 162243 c:\urls\urllist.txt:0+20
060118 162243  map -15636600%
060118 162244 c:\urls\urllist.txt:0+20
060118 162244  map -16615990%
060118 162245 c:\urls\urllist.txt:0+20
060118 162245  map -17590724%
060118 162246 c:\urls\urllist.txt:0+20
060118 162246  map -18569596%
060118 162247 c:\urls\urllist.txt:0+20
060118 162247  map -19514354%
060118 162248 c:\urls\urllist.txt:0+20
060118 162248  map -20477716%
060118 162249 c:\urls\urllist.txt:0+20
060118 162249  map -21455666%
060118 162250 c:\urls\urllist.txt:0+20
060118 162250  map -22386566%
