[Nutch-dev] [jira] Commented: (NUTCH-518) Fix OpicScoringFilter to respect scoring filter chaining

2007-07-19 Thread Enis Soztutar (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513819 ] Enis Soztutar commented on NUTCH-518: - Since there is no ordering among scoring filters, if we do something

[Nutch-dev] [jira] Commented: (NUTCH-518) Fix OpicScoringFilter to respect scoring filter chaining

2007-07-19 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513821 ] Doğacan Güney commented on NUTCH-518: - This is another alternative. I am not suggesting that we use it but just

[Nutch-dev] [jira] Commented: (NUTCH-518) Fix OpicScoringFilter to respect scoring filter chaining

2007-07-19 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513823 ] Doğacan Güney commented on NUTCH-518: - Btw, I think removing initial score arguments and merging scores in

[Nutch-dev] [jira] Commented: (NUTCH-518) Fix OpicScoringFilter to respect scoring filter chaining

2007-07-19 Thread Enis Soztutar (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513826 ] Enis Soztutar commented on NUTCH-518: - I think removing initial score arguments and merging scores in

[Nutch-dev] resending this query on running nutch on nfs

2007-07-19 Thread prem kumar
I tried running Hadoop on an NFS-mounted home directory on a single node. But as the following tip on the wiki says, I am stuck with an issue: "Don't use DFS on an NFS mount. DFS uses locks, and NFS may be configured to not allow them." How can I figure out whether NFS uses locks? Is there a work around

[Nutch-dev] [jira] Created: (NUTCH-520) A common infrastructure for different index backends

2007-07-19 Thread JIRA
A common infrastructure for different index backends Key: NUTCH-520 URL: https://issues.apache.org/jira/browse/NUTCH-520 Project: Nutch Issue Type: Improvement Components:

[Nutch-dev] [jira] Commented: (NUTCH-518) Fix OpicScoringFilter to respect scoring filter chaining

2007-07-19 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513853 ] Andrzej Bialecki commented on NUTCH-518: - IMHO this change is not helpful. It takes away too much control

[Nutch-dev] [jira] Created: (NUTCH-521) Modified injector to allow newly injected CrawlDatum to overwrite original

2007-07-19 Thread Rob Young (JIRA)
Modified injector to allow newly injected CrawlDatum to overwrite original -- Key: NUTCH-521 URL: https://issues.apache.org/jira/browse/NUTCH-521 Project: Nutch Issue

[Nutch-dev] [jira] Updated: (NUTCH-521) Modified injector to allow newly injected CrawlDatum to overwrite original

2007-07-19 Thread Rob Young (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rob Young updated NUTCH-521: Attachment: inject.patch Modified injector to allow newly injected CrawlDatum to overwrite original

[Nutch-dev] Looking to fix relative path issue in linkdb

2007-07-19 Thread Robert Young
In org.apache.nutch.crawl.LinkDb on line 261 it creates a working directory (newLinkDb) based on the current working directory. This should be configurable rather than being based on where Tomcat was started. I am planning on writing a patch to pull the hadoop.tmp.dir setting if it is available,
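A minimal sketch of the idea described above, not the actual patch: choose the base for the temporary working directory from the hadoop.tmp.dir setting when it is present, and otherwise fall back to the current working directory. The class LinkDbTmpDir, the method resolveWorkDir and the directory name "linkdb-work" are hypothetical names used only for illustration, and java.util.Properties stands in for the real Nutch/Hadoop configuration object. The fallback to the current directory is an assumption, since the message is cut off before it says what should happen when the setting is missing.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

public class LinkDbTmpDir {

    // Hypothetical helper (not part of Nutch): builds an absolute path for a
    // temporary working directory. Uses hadoop.tmp.dir when it is set;
    // otherwise falls back to the JVM's current working directory (assumed).
    static Path resolveWorkDir(Properties conf, String dirName) {
        String tmp = conf.getProperty("hadoop.tmp.dir");
        Path base;
        if (tmp == null || tmp.trim().isEmpty()) {
            // Same behaviour as a relative path: depends on where the JVM started.
            base = Paths.get(System.getProperty("user.dir"));
        } else {
            base = Paths.get(tmp.trim());
        }
        return base.resolve(dirName).toAbsolutePath().normalize();
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // Without the setting, the result depends on the start directory.
        System.out.println(resolveWorkDir(conf, "linkdb-work"));

        // With the setting, the result no longer depends on the start directory.
        conf.setProperty("hadoop.tmp.dir", "/tmp/hadoop-example");
        System.out.println(resolveWorkDir(conf, "linkdb-work"));
    }
}
```

With the setting in place, the temporary directory no longer depends on where Tomcat (or any other launcher) was started, which is the problem described in the message.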

Re: [Nutch-dev] Looking to fix relative path issue in linkdb

2007-07-19 Thread Andrzej Bialecki
Robert Young wrote: In org.apache.nutch.crawl.LinkDb on line 261 it creates a working directory (newLinkDb) based on the current working directory. This should be configurable rather than being based on where Tomcat was started. I am planning on writing a patch to pull the hadoop.tmp.dir

[Nutch-dev] [jira] Commented: (NUTCH-521) Modified injector to allow newly injected CrawlDatum to overwrite original

2007-07-19 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513870 ] Doğacan Güney commented on NUTCH-521: - AFAICS, you didn't give users a way to specify whether they want to

[Nutch-dev] [jira] Created: (NUTCH-522) Use URLValidator in the Injector

2007-07-19 Thread Emmanuel Joke (JIRA)
Use URLValidator in the Injector Key: NUTCH-522 URL: https://issues.apache.org/jira/browse/NUTCH-522 Project: Nutch Issue Type: Improvement Components: injector Reporter: Emmanuel Joke

[Nutch-dev] [jira] Updated: (NUTCH-522) Use URLValidator in the Injector

2007-07-19 Thread Emmanuel Joke (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Emmanuel Joke updated NUTCH-522: Attachment: NUTCH-522.patch Patch provided Use URLValidator in the Injector

Re: [Nutch-dev] Looking to fix relative path issue in linkdb

2007-07-19 Thread Robert Young
Tomcat only comes into it because we have to start Tomcat in the searcher directory, I'm guessing it's the same however you choose to use Nutch. It would still have to do a rename across physical volumes if searcher.dir is set to something different would it not? How does this sound as a

Re: [Nutch-dev] Looking to fix relative path issue in linkdb

2007-07-19 Thread Briggs
I don't use the Nutch web application, but you don't have to start Nutch in the searcher directory. You can set the location of the searcher dir within the nutch-site.xml config file. Add this node and set the location of your index: <property> <name>searcher.dir</name>

Re: [Nutch-dev] OOM error during parsing with nekohtml

2007-07-19 Thread Shailendra Mudgal
Hi, After replacing it with Throwable, it safely parsed that page, but I got the same OOM error during the parse of http://lcweb2.loc.gov/ndlpcoop/nicmoas/livn-2/livn0181.sgm. But this time it seems that the error occurred at line 78. Here is the stacktrace. (The same page we can't parse

[Nutch-dev] [jira] Commented: (NUTCH-522) Use URLValidator in the Injector

2007-07-19 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513895 ] Doğacan Güney commented on NUTCH-522: - I like the idea, but your patch seems to have a bug. Now injector only

Re: [Nutch-dev] Looking to fix relative path issue in linkdb

2007-07-19 Thread Robert Young
Yes, I do this for the searcher directory but in the LinkDb class it makes a reference to a Path which is relative (just for a temporary working directory). This is the problem, because if I start tomcat in a path where the java user does not have permissions to create a directory then LinkDb

Re: [Nutch-dev] Looking to fix relative path issue in linkdb

2007-07-19 Thread Briggs
Ahh, now I see what you are referring to. Thanks for the question. Now I know why I was getting garbage in my directory a while back. So, I guess you may need to edit that class. Are you using hadoop in local mode? On 7/19/07, Robert Young [EMAIL PROTECTED] wrote: Yes, I do this for the

[Nutch-dev] [jira] Updated: (NUTCH-522) Use URLValidator in the Injector

2007-07-19 Thread Emmanuel Joke (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Emmanuel Joke updated NUTCH-522: Attachment: NUTCH-522_v2.patch Oops, my mistake. Please find an updated patch. Actually I've a
