Page search2.net deleted from Nutch Wiki
Dear wiki user,

You have subscribed to a wiki page "Nutch Wiki" for change notification.

The page "search2.net" has been deleted by search2.net.
The comment on this change is: empty page.

http://wiki.apache.org/nutch/search2.net
[jira] Issue Comment Edited: (NUTCH-780) Nutch crawler did not read configuration files
[ https://issues.apache.org/jira/browse/NUTCH-780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803193#action_12803193 ]

Vu Hoang edited comment on NUTCH-780 at 1/27/10 2:54 AM:
---------------------------------------------------------

Add this method:

{code:java|title=org/apache/nutch/crawl/Crawl.java|borderStyle=solid}
public static Configuration overwrite(Configuration nutchConfig) {
    Configuration crawlConfig = NutchConfiguration.createCrawlConfiguration();
    Iterator<Entry<String, String>> entries = nutchConfig.iterator();
    while (entries.hasNext()) {
        Entry<String, String> entry = (Entry<String, String>) entries.next();
        crawlConfig.set(entry.getKey(), entry.getValue());
    }
    return crawlConfig;
}
{code}

Add the lines below to class org.apache.nutch.crawl.Crawl:

{code:java|title=org/apache/nutch/crawl/Crawl.java|borderStyle=solid}
public static Configuration nutchConfig = null;

public static void setNutchConfig(Configuration config) {
    nutchConfig = config;
}
{code}

and set up the Nutch configuration inside method main as below:

{code:java|title=org/apache/nutch/crawl/Crawl.java|borderStyle=solid}
Configuration conf = null;
if (nutchConfig != null)
    conf = nutchConfig;
else
    conf = NutchConfiguration.createCrawlConfiguration();
{code}

I recommend that solution :)

was (Author: vushogerts):

Add the lines below to class org.apache.nutch.crawl.Crawl:

{code:java|title=org/apache/nutch/crawl/Crawl.java|borderStyle=solid}
public static Configuration nutchConfig = null;

public static void setNutchConfig(Configuration config) {
    nutchConfig = config;
}
{code}

and set up the Nutch configuration inside method main as below:

{code:java|title=org/apache/nutch/crawl/Crawl.java|borderStyle=solid}
Configuration conf = null;
if (nutchConfig != null)
    conf = nutchConfig;
else
    conf = NutchConfiguration.createCrawlConfiguration();
{code}

I recommend that solution :)

Nutch crawler did not read configuration files
----------------------------------------------

Key: NUTCH-780
URL: https://issues.apache.org/jira/browse/NUTCH-780
Project: Nutch
Issue Type: Bug
Components: fetcher
Affects Versions: 1.0.0
Reporter: Vu Hoang

The Nutch searcher can read properties through its constructor ...

{code:java|title=NutchSearcher.java|borderStyle=solid}
NutchBean bean = new NutchBean(getFilesystem().getConf(), fs);
... // put search engine code here
{code}

... but the Nutch crawler cannot; it only reads data from arguments.

{code:java|title=NutchCrawler.java|borderStyle=solid}
StringBuilder builder = new StringBuilder();
builder.append(domainlist + SPACE);
builder.append(ARGUMENT_CRAWL_DIR);
builder.append(domainlist + SUBFIX_CRAWLED + SPACE);
builder.append(ARGUMENT_CRAWL_THREADS);
builder.append(threads + SPACE);
builder.append(ARGUMENT_CRAWL_DEPTH);
builder.append(depth + SPACE);
builder.append(ARGUMENT_CRAWL_TOPN);
builder.append(topN + SPACE);
Crawl.main(builder.toString().split(SPACE));
{code}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
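The argument-only invocation in the description joins everything with spaces and splits it again, which would break on any path that contains a space. A minimal sketch of building the argument array directly is shown here. The class name `CrawlArgs` and its parameters are hypothetical, and the flag strings assume the `-dir`, `-threads`, `-depth` and `-topN` options of the Nutch 1.0 `Crawl` command; the reporter's `ARGUMENT_CRAWL_*` constants are not shown in the issue, so their actual values may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: builds crawl arguments as a list instead of a space-joined string. */
final class CrawlArgs {
    private CrawlArgs() {}

    /** Flag names are an assumption based on the Nutch 1.0 Crawl usage string. */
    static String[] build(String urlDir, String crawlDir, int threads, int depth, long topN) {
        List<String> args = new ArrayList<>();
        args.add(urlDir);                       // seed URL directory
        args.add("-dir");
        args.add(crawlDir);                     // output directory for the crawl
        args.add("-threads");
        args.add(Integer.toString(threads));
        args.add("-depth");
        args.add(Integer.toString(depth));
        args.add("-topN");
        args.add(Long.toString(topN));
        return args.toArray(new String[0]);
    }

    public static void main(String[] unused) {
        // A path with a space stays one argument because nothing is split afterwards.
        String[] args = build("my urls", "my urls.crawled", 10, 3, 50);
        System.out.println(String.join("|", args));
        // In a Nutch 1.0 setup the array would then be passed to Crawl.main(args).
    }
}
```

This only addresses how the arguments are passed; it does not change the fact that `Crawl.main` creates its own configuration, which is the subject of the issue.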
[jira] Issue Comment Edited: (NUTCH-780) Nutch crawler did not read configuration files
[ https://issues.apache.org/jira/browse/NUTCH-780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803193#action_12803193 ]

Vu Hoang edited comment on NUTCH-780 at 1/27/10 2:55 AM:
---------------------------------------------------------

Add this method:

{code:java|title=org/apache/nutch/crawl/Crawl.java|borderStyle=solid}
public static Configuration overwrite(Configuration nutchConfig) {
    Configuration crawlConfig = NutchConfiguration.createCrawlConfiguration();
    Iterator<Entry<String, String>> entries = nutchConfig.iterator();
    while (entries.hasNext()) {
        Entry<String, String> entry = (Entry<String, String>) entries.next();
        crawlConfig.set(entry.getKey(), entry.getValue());
    }
    return crawlConfig;
}
{code}

Add the lines below to class org.apache.nutch.crawl.Crawl:

{code:java|title=org/apache/nutch/crawl/Crawl.java|borderStyle=solid}
public static Configuration nutchConfig = null;

public static void setNutchConfig(Configuration config) {
    nutchConfig = config;
}
{code}

and set up the Nutch configuration inside method main as below:

{code:java|title=org/apache/nutch/crawl/Crawl.java|borderStyle=solid}
Configuration conf = null;
if (nutchConfig != null)
    conf = overwrite(nutchConfig);
else
    conf = NutchConfiguration.createCrawlConfiguration();
{code}

I recommend that solution :)
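The overlay idea behind the proposed `overwrite` method (start from the crawl defaults, then copy every caller-supplied entry on top) can be sketched without Hadoop on the classpath. In this sketch plain `Map` objects stand in for Hadoop's `Configuration`; the class `ConfigOverlay` and its method are hypothetical, and the property values are illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for the overlay logic; Maps replace Hadoop's Configuration. */
final class ConfigOverlay {
    private ConfigOverlay() {}

    /** Returns a new map holding the defaults, with each override entry replacing or adding a key. */
    static Map<String, String> overlay(Map<String, String> defaults, Map<String, String> overrides) {
        Map<String, String> merged = new LinkedHashMap<>(defaults);
        if (overrides != null) {
            merged.putAll(overrides);   // same effect as calling set(key, value) per entry
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> defaults = new LinkedHashMap<>();
        defaults.put("http.agent.name", "default-agent");
        defaults.put("fetcher.threads.fetch", "10");

        Map<String, String> overrides = new LinkedHashMap<>();
        overrides.put("http.agent.name", "my-crawler");

        // The overridden key takes the caller's value; untouched defaults survive.
        System.out.println(overlay(defaults, overrides));
    }
}
```

Returning a fresh object rather than mutating the caller's configuration mirrors the comment's approach, where `overwrite` builds a new crawl configuration and leaves `nutchConfig` unchanged.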
[jira] Updated: (NUTCH-780) Nutch crawler did not read configuration files
[ https://issues.apache.org/jira/browse/NUTCH-780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vu Hoang updated NUTCH-780:
--------------------------

Attachment: NUTCH-780.patch

Nutch crawler did not read configuration files
----------------------------------------------

Key: NUTCH-780
URL: https://issues.apache.org/jira/browse/NUTCH-780
Project: Nutch
Issue Type: Bug
Components: fetcher
Affects Versions: 1.0.0
Reporter: Vu Hoang
Attachments: NUTCH-780.patch
[jira] Updated: (NUTCH-780) Nutch crawler did not read configuration files
[ https://issues.apache.org/jira/browse/NUTCH-780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vu Hoang updated NUTCH-780:
--------------------------

Patch Info: [Patch Available]