Re: Focused crawling with nutch

2012-02-01 Thread Vijith
I am not sure whether I did the modification right, because some code was missing in Fetcher.java (1.4) compared to Patch - 2. That's why I didn't attach the patch to the issue. I need confirmation on that. I have somehow modified the patch to reflect Nutch 1.4

Re: Focused crawling with nutch

2012-02-01 Thread Lewis John Mcgibbney
Are you running in deploy (distributed) mode? Have you rebuilt your job jar? On Wed, Feb 1, 2012 at 10:53 AM, Vijith vijithkv...@gmail.com wrote: I am not sure whether I did the modification right, because some code was missing in Fetcher.java (1.4) compared to the Patch

Re: Error with solrindex

2012-02-01 Thread Markus Jelsma
Check Solr's log output. On Wednesday 01 February 2012 15:00:55 Joshua J Pavel wrote: When I run my solrindex: bin/nutch solrindex http://testsite:port/solr/ crawl/crawldb crawl/linkdb crawl/segments/* I get this output: SolrIndexer: starting at 2012-02-01 13:32:56

Re: why nutch dosen't crawl Arabic sites well?

2012-02-01 Thread mina
I have no errors in my log. Does Nutch have a problem crawling Arabic sites? Please help. On 1/31/12, remi tassing [via Lucene] ml-node+s472066n3704067...@n3.nabble.com wrote: Check your log for any errors. On Tuesday, January 31, 2012, Markus Jelsma markus.jel...@openindex.io wrote: By the way, please

Re: Bad Request in nutch when i use parsechecker?

2012-02-01 Thread Markus Jelsma
bin/nutch parsechecker

Re: Bad Request in nutch when i use parsechecker?

2012-02-01 Thread mina
How can I force Nutch to encode this URL? I want to give it this URL and have Nutch encode it; I want to set this task for Nutch. I want Nutch to: 1. get the URL, then 2. encode it. Which command encodes a URL? On 2/1/12, Markus Jelsma-2 [via Lucene] ml-node+s472066n3706875...@n3.nabble.com wrote:

Re: Bad Request in nutch when i use parsechecker?

2012-02-01 Thread Markus Jelsma
Nutch cannot do this right now. However, there's a patch that does the encoding: https://issues.apache.org/jira/browse/NUTCH-1098 On Wednesday 01 February 2012 16:26:06 mina wrote: How can I force Nutch to encode this URL? I want to give it this URL and have Nutch encode it; I want to set this task
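For readers who want to see what "encoding the URL" means here, the following is a minimal Python sketch of percent-encoding a URL before handing it to a fetcher. It is not taken from NUTCH-1098 and does not show how that patch works; the function name `percent_encode_url` and the example.com address are hypothetical, and UTF-8 is assumed as the encoding.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def percent_encode_url(url: str) -> str:
    """Percent-encode spaces and non-ASCII characters in a URL's path and query.

    Hypothetical helper for illustration; assumes UTF-8 and leaves the
    host name untouched.
    """
    parts = urlsplit(url)
    # "%" is kept as a safe character so that escapes which are already
    # present (for example %20) are not encoded a second time.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


if __name__ == "__main__":
    # Hypothetical URL with a space and a non-ASCII character in the path.
    print(percent_encode_url("http://example.com/a b/é"))
```

The encoded string could then be passed to a command such as bin/nutch parsechecker in place of the raw URL. Because "%" is treated as safe, a literal percent sign that is not part of an escape is left as-is, which is a simplification.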

Re: Error with solrindex

2012-02-01 Thread Joshua J Pavel
Thanks, that was just the nudge I needed. :-) So, the import is now working, but one of my custom fields isn't returned by the query: http://site:port/solr/select?indent=on&version=2.2&q=query&fq=&start=0&rows=1&fl=mtime,title,Metatags&qt=&wt=&explainOther=&hl.fl=content,title&hl=on Apparently the

Re: why nutch dosen't crawl Arabic sites well?

2012-02-01 Thread remi tassing
Try the following command. It'll export all the URLs that were crawled. [1] http://wiki.apache.org/nutch/bin/nutch_readdb Remi On Wednesday, February 1, 2012, mina tahereganji...@gmail.com wrote: I have no errors in my log. Does Nutch have a problem crawling Arabic sites? Please help. On 1/31/12, remi

Re: Error with solrindex

2012-02-01 Thread Joshua J Pavel
Update: I can retrieve the results if I include an fq parameter in the query. I would expect to be able to retrieve the field by setting fl to *. Can anyone explain this behavior to me: why does running a Filter Query display my field, but listing all fields does not? Is it time to take this to Solr support?
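To compare the two requests side by side, it can help to build them programmatically so the parameters are properly separated and escaped. The Python sketch below only constructs the URLs and does not explain the behaviour described above. The helper name `build_select_url` is hypothetical, the base URL is borrowed from the solrdedup example elsewhere in this archive, and the filter value `title:nutch` is a made-up placeholder.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, fields="*", filter_query=None, rows=1):
    """Build a Solr /select URL from q, fl, start, rows and an optional fq.

    Hypothetical helper for illustration only.
    """
    params = [("q", query), ("fl", fields), ("start", 0), ("rows", rows)]
    if filter_query:
        # fq is only sent when a filter query is actually supplied.
        params.append(("fq", filter_query))
    return base_url.rstrip("/") + "/select?" + urlencode(params)


if __name__ == "__main__":
    base = "http://127.0.0.1:8983/solr/"  # assumed Solr location
    # Request listing all fields, no filter query.
    print(build_select_url(base, "query", fields="*"))
    # Request naming the fields explicitly, with a placeholder filter query.
    print(build_select_url(base, "query", fields="mtime,title,Metatags",
                           filter_query="title:nutch"))
```

Fetching both URLs and diffing the returned documents would show exactly which fields differ between the two requests.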

Re: invalid uri with three dots

2012-02-01 Thread remi tassing
Problem solved! I replaced all whitespace with %20 in the URL before getting the content in HttpResponse.java (httpclient plugin). Dirty solution? Yes, but it works for me now. Remi On Thursday, January 26, 2012, remi tassing tassingr...@gmail.com wrote: Hey guys, any ideas on how to
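The substitution described above amounts to a one-line string replacement. The Python sketch below illustrates the idea only; the actual change was made in the Java plugin and is not shown in this thread, and the function name `replace_spaces` and the example URL are hypothetical.

```python
def replace_spaces(url: str) -> str:
    """Replace every space character in a URL with its percent-escape, %20.

    Hypothetical helper; all other characters are left exactly as they are.
    """
    return url.replace(" ", "%20")


if __name__ == "__main__":
    # Hypothetical URL containing spaces in the path.
    print(replace_spaces("http://example.com/my file name.html"))
```

This only handles spaces; other characters that are not valid in a URI would still need separate treatment.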

Re: why nutch dosen't crawl all links

2012-02-01 Thread mina
Hi, I use this command: bin/nutch parsechecker -dumpText http://www.irna.ir/News/30786427/سوء-استفاده-از-نام-كمیته-امداد-برای-جمع-آوری-رای-در-مناطق-محروم/سياسي/ and see this in the log: fetching: http://www.irna.ir/News/30786427/سوء-استفاده-از-نام-كمیته-امداد-برای-جمع-آوری-رای-در-مناطق-محروم/سياسي/

org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException:

2012-02-01 Thread kaveh minooie
I do apologize in advance if what I am about to ask is strictly a Hadoop problem, but I get it when I am trying to parse in Nutch. I am running Nutch 1.4 over Hadoop 0.20.203 on 7 computers (7 datanodes, one of which is also the namenode and a tasktracker) and I usually get this after a

Re: Solrdedup fails due to date format

2012-02-01 Thread alxsss
Hello, I took a look at the source of the SolrDeleteDuplicates class. The patch is already applied. Any ideas what might be wrong? I issue this command: bin/nutch solrdedup http://127.0.0.1:8983/solr/ and the Solr schema is the one that comes with Nutch. Thanks in advance. Alex.

Re: Focused crawling with nutch

2012-02-01 Thread Vijith
I am running it in local mode and built Nutch after patching. Without patching it works fine. On Wed, Feb 1, 2012 at 6:38 PM, Lewis John Mcgibbney lewis.mcgibb...@gmail.com wrote: Are you running in deploy (distributed) mode? Have you rebuilt your job jar? On Wed, Feb 1, 2012 at 10:53 AM,