Re: Proxy Authentication

2010-03-15 Thread Graziano Aliberti
On 13/03/2010 22.55, Susam Pal wrote: On Fri, Mar 12, 2010 at 3:17 PM, Susam Pal wrote: On Fri, Mar 12, 2010 at 2:09 PM, Graziano Aliberti wrote: On 11/03/2010 16.20, Susam Pal wrote: On Thu, Mar 11, 2010 at 8:24 PM, Graziano Aliberti wrote: Hi ev

Re: Content of redirected urls empty

2010-03-15 Thread Julien Nioche
Adam, Could you please tell us what the http and https entries look like in the crawlDB (using readdb -url)? J. -- DigitalPebble Ltd http://www.digitalpebble.com On 13 March 2010 04:29, BELLINI ADAM wrote: > > no one have an answer !? > > > > > > > From: mbel...@msn.com > > To: nutch-user@luc

RE: Content of redirected urls empty

2010-03-15 Thread BELLINI ADAM
Hi, thanks for your help. This is a fresh crawl from today: 1- HTTP: bin/nutch readdb crawl_portal/crawldb/ -url http://myDNS/index.html URL: http://myDNS/index.html Version: 7 Status: 4 (db_redir_temp) Fetch time: Mon Mar 15 12:15:52 EDT 2010 Modified time: Wed Dec 31 19:00:00 EST 1969 Retries sinc
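For readers following this thread, the status field in a `readdb -url` record like the one quoted above can be pulled out with a regular expression. This is only a minimal sketch: `parse_status` is a hypothetical helper, not part of Nutch, and it assumes nothing beyond the `Status: 4 (db_redir_temp)` shape shown in the message.

```python
import re

# Matches the "Status: <code> (<name>)" field as it appears in the quoted record.
STATUS_RE = re.compile(r"Status:\s*(\d+)\s*\((\w+)\)")


def parse_status(record):
    """Return (code, name) from a readdb record, or None if no status is found.

    Hypothetical helper for illustration; the record format is assumed from
    the message above.
    """
    match = STATUS_RE.search(record)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


# Sample text copied from the record quoted in the message.
sample = "URL: http://myDNS/index.html Version: 7 Status: 4 (db_redir_temp)"
print(parse_status(sample))
```

Running the same helper on both the http and https records makes it easy to see which of the two was stored as a redirect.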

Problem with ANT in building new Plugin for Nutch 1.0 ----- error in finding classes in packages

2010-03-15 Thread Arnaud Garcia
Hello everyone, I'm trying to add a new plugin to Nutch/Solr so that I get new fields and can then search on them in the terminal interface. To do that, I have read the howto WritingPluginexample 0.9 from the Apache wiki and I'm trying to follow it. I have a problem with the building of the

Re: Content of redirected urls empty

2010-03-15 Thread Julien Nioche
> > and as i said the last day, on my segment the https has an empty content. hmm it's not what you said in your previous message + I can see it has a signature in the crawlDB so it must have a content. I expect that the content would be indexed under the http:// URL thanks to *_repr_: **http:/

Re: Problem with ANT in building new Plugin for Nutch 1.0 ----- error in finding classes in packages

2010-03-15 Thread Arnaud Garcia
Hello, does anyone know where this problem comes from? Please help. 2010/3/15 Arnaud Garcia > Hello everyone > > I'm trying to add a new plugin to Nutch/Solr so that I get new fields and > can then search on them in the terminal interface. > > To do that, I have read the howto

Re: Problem with ANT in building new Plugin for Nutch 1.0 ----- error in finding classes in packages

2010-03-15 Thread Alexander Aristov
Hi, you evidently didn't include the necessary references to other JARs or source directories. These are configured in plugin.xml. Check the file and add all the necessary references there. Remember that each plugin is compiled separately and doesn't know about other plugins. Compare your ANT files with files

RE: Content of redirected urls empty

2010-03-15 Thread BELLINI ADAM
Oh sorry, I was mistaken again, and yes, you are completely right. 1- The HTTPS has content in my segment. 2- The HTTP has empty content. In my index I have the HTTPS URL with the empty content (...it's exactly what you said: it's just mixing the HTTPS URL with the content of the HTTP one,)

RE: Content of redirected urls empty

2010-03-15 Thread BELLINI ADAM
Hi again, I forgot to ask: what does _repr_ mean? > From: mbel...@msn.com > To: nutch-user@lucene.apache.org > Subject: RE: Content of redirected urls empty > Date: Mon, 15 Mar 2010 15:29:48 + > > > > > Oh sorry, I was mistaken again, and yes, you are completely right > 1- The HTTPS h

Re: Content of redirected urls empty

2010-03-15 Thread Julien Nioche
> my index i have the HTTPS url with the empty content (...it's exactely > what you said : it's just mixing the HTTPS url with > the content of the HTTP one,) and i expected the other way round : the > HTTPS content *with* the HTTP URL. > strange > > i dont know if i have the HTTP url in my ind

problem crawling entire internal website

2010-03-15 Thread ksee
Hi, I'm a new Nutch user. My company wants me to look into using this technology to index our internal wiki website as well as SharePoint docs (using Tika). Right now, I just want Nutch to index the entire wiki site but I'm having problems. I've read other people's problems with this but I haven

Re: Proxy Authentication

2010-03-15 Thread Susam Pal
On Mon, Mar 15, 2010 at 2:32 PM, Graziano Aliberti wrote: > On 13/03/2010 22.55, Susam Pal wrote: >> >> On Fri, Mar 12, 2010 at 3:17 PM, Susam Pal wrote: >> >>> >>> On Fri, Mar 12, 2010 at 2:09 PM, Graziano Aliberti >>> wrote: >>> On 11/03/2010 16.20, Susam Pal wrote: >

RE: Content of redirected urls empty

2010-03-15 Thread BELLINI ADAM
Hi, I finally learned how to display only the indexed URLs in the Solr index. The URL is http://localhost:8080/solr/select/?q=*:*&fl=url,content where q=*:* matches all entries in the index and &fl=url,content displays only the URLs and their content. Now I'm 100% sure that I don't have the source HTTP URLs in
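The select URL in this message can also be assembled programmatically, which helps when trying other field lists. The sketch below is only an illustration: `build_select_url` is a hypothetical helper, and the base address is simply the one quoted in the message, so adjust it to your own Solr instance.

```python
from urllib.parse import urlencode


def build_select_url(base="http://localhost:8080/solr/select/",
                     query="*:*",
                     fields=("url", "content")):
    """Build a Solr select URL (hypothetical helper, for illustration only).

    q  selects the documents (*:* matches every entry in the index)
    fl limits the returned fields to the ones listed
    """
    params = {"q": query, "fl": ",".join(fields)}
    # Keep '*', ':' and ',' unescaped so the URL reads like the one in the message.
    return base + "?" + urlencode(params, safe="*:,")


print(build_select_url())
```

Passing a different `fields` tuple, for example `("url",)`, lists only the indexed URLs without their content.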

Re: Proxy Authentication

2010-03-15 Thread Susam Pal
On Tue, Mar 16, 2010 at 12:55 AM, Susam Pal wrote: > On Mon, Mar 15, 2010 at 2:32 PM, Graziano Aliberti > wrote: >> On 13/03/2010 22.55, Susam Pal wrote: >>> >>> On Fri, Mar 12, 2010 at 3:17 PM, Susam Pal wrote: >>> On Fri, Mar 12, 2010 at 2:09 PM, Graziano Aliberti wrote: