On 13/03/2010 22:55, Susam Pal wrote:
On Fri, Mar 12, 2010 at 3:17 PM, Susam Pal wrote:
On Fri, Mar 12, 2010 at 2:09 PM, Graziano Aliberti wrote:
On 11/03/2010 16:20, Susam Pal wrote:
On Thu, Mar 11, 2010 at 8:24 PM, Graziano Aliberti wrote:
Hi ev
Adam,
Could you please tell us what the http and https entries look like in the
crawlDB (using readdb -url)?
J.
--
DigitalPebble Ltd
http://www.digitalpebble.com
On 13 March 2010 04:29, BELLINI ADAM wrote:
>
> Doesn't anyone have an answer?
>
> > From: mbel...@msn.com
> > To: nutch-user@luc
Hi,
Thanks for your help.
This is a fresh crawl from today:
1- HTTP:
bin/nutch readdb crawl_portal/crawldb/ -url http://myDNS/index.html
URL: http://myDNS/index.html
Version: 7
Status: 4 (db_redir_temp)
Fetch time: Mon Mar 15 12:15:52 EDT 2010
Modified time: Wed Dec 31 19:00:00 EST 1969
Retries sinc
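When comparing the http and https entries, it can help to turn the `readdb -url` output into key/value pairs. Below is a minimal Python sketch. It assumes the output has one `Key: value` pair per line, as in the listing above. The helper names `parse_readdb_entry` and `status_name` are hypothetical and are not part of Nutch.

```python
import re
from typing import Dict, Optional


def parse_readdb_entry(text: str) -> Dict[str, str]:
    """Split `bin/nutch readdb <crawldb> -url <url>` output into a dict.

    Assumes one "Key: value" pair per line; lines without ": " are skipped.
    """
    entry: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:  # keep only lines that actually contain "Key: value"
            entry[key.strip()] = value.strip()
    return entry


def status_name(entry: Dict[str, str]) -> Optional[str]:
    """Extract the symbolic status, e.g. 'db_redir_temp' from '4 (db_redir_temp)'."""
    match = re.search(r"\((\w+)\)", entry.get("Status", ""))
    return match.group(1) if match else None


# Sample copied from the http entry shown above.
sample = """URL: http://myDNS/index.html
Version: 7
Status: 4 (db_redir_temp)
Fetch time: Mon Mar 15 12:15:52 EDT 2010
Modified time: Wed Dec 31 19:00:00 EST 1969"""

entry = parse_readdb_entry(sample)
print(entry["URL"], status_name(entry))  # http://myDNS/index.html db_redir_temp
```

You can run the same parser on the https entry to compare fields such as Status or Signature side by side.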
Hello everyone,
I'm trying to add a new plugin to Nutch/Solr in order to have new fields and
eventually search on them from the terminal interface.
For that, I have read the how-to WritingPluginexample 0.9 from the Apache
wiki and I'm trying to follow it.
I have a problem with the building of the
>
> and as I said the other day, in my segment the HTTPS page has empty content.
Hmm, that's not what you said in your previous message, and I can see it has a
signature in the crawlDB, so it must have content.
I would expect the content to be indexed under the http:// URL thanks to
_repr_: http:/
Hello,
Isn't there anyone who knows where this problem comes from?
Please help.
2010/3/15 Arnaud Garcia
Hi,
Evidently you didn't include the necessary references to other JARs or source
directories. These are configured in plugin.xml.
Check that file and add all the necessary references there. Remember that each
plugin is compiled separately and doesn't know about other plugins.
Compare your Ant files with files
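To check which references a plugin actually declares, you can read the `<requires>` and `<runtime>` sections of its plugin.xml. Below is a minimal Python sketch. It assumes the usual Nutch plugin.xml layout, with `<import plugin="..."/>` entries under `<requires>` and `<library name="..."/>` entries under `<runtime>`. The plugin id `my-plugin` and the jar names in the sample are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def declared_references(plugin_xml: str) -> Tuple[List[str], List[str]]:
    """Return (imported plugin ids, runtime library names) from a plugin.xml string."""
    root = ET.fromstring(plugin_xml)
    # Other plugins this plugin depends on.
    imports = [imp.get("plugin", "") for imp in root.findall("./requires/import")]
    # Jars placed on this plugin's own classpath.
    libraries = [lib.get("name", "") for lib in root.findall("./runtime/library")]
    return imports, libraries


# Hypothetical plugin descriptor, for illustration only.
SAMPLE = """<plugin id="my-plugin" name="My Plugin" version="1.0.0" provider-name="example">
   <runtime>
      <library name="my-plugin.jar">
         <export name="*"/>
      </library>
      <library name="some-dependency.jar"/>
   </runtime>
   <requires>
      <import plugin="nutch-extensionpoints"/>
   </requires>
</plugin>"""

imports, libraries = declared_references(SAMPLE)
print("requires:", imports)     # requires: ['nutch-extensionpoints']
print("libraries:", libraries)  # libraries: ['my-plugin.jar', 'some-dependency.jar']
```

If your plugin's code uses classes from another plugin, that plugin's id would normally need to appear in the imports list. A third-party jar would need to appear as a runtime library.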
Oh sorry, I was mistaken again, and yes, you are completely right:
1- The HTTPS page has content in my segment.
2- The HTTP page has empty content.
In my index I have the HTTPS URL with the empty content (...it's exactly
what you said: it's just mixing the HTTPS URL with
the content of the HTTP one)
Hi again,
I forgot to ask: what does _repr_ mean?
> From: mbel...@msn.com
> To: nutch-user@lucene.apache.org
> Subject: RE: Content of redirected urls empty
> Date: Mon, 15 Mar 2010 15:29:48 +
> Oh sorry, I was mistaken again, and yes, you are completely right
> 1- The HTTPS h
> my index I have the HTTPS URL with the empty content (...it's exactly
> what you said: it's just mixing the HTTPS URL with
> the content of the HTTP one) and I expected it the other way round: the
> HTTPS content *with* the HTTP URL.
>
Strange.
>
> I don't know if I have the HTTP URL in my ind
Hi,
I'm a new Nutch user. My company wants me to look into using this technology
to index our internal wiki website as well as SharePoint docs (using Tika).
Right now, I just want Nutch to index the entire wiki site, but I'm having
problems. I've read about other people's problems with this, but I haven
On Mon, Mar 15, 2010 at 2:32 PM, Graziano Aliberti wrote:
> On 13/03/2010 22:55, Susam Pal wrote:
>> On Fri, Mar 12, 2010 at 3:17 PM, Susam Pal wrote:
>>> On Fri, Mar 12, 2010 at 2:09 PM, Graziano Aliberti wrote:
On 11/03/2010 16:20, Susam Pal wrote:
>
Hi,
I finally learned how to display only the indexed URLs in the Solr index.
The URL is http://localhost:8080/solr/select/?q=*:*&fl=url,content
q=*:* matches all entries in the index;
&fl=url,content displays only the URLs and their content.
Now I'm 100% sure that I don't have the source HTTP URLs in
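You can also build the same Solr query programmatically, which avoids escaping `*:*` by hand. Below is a minimal Python sketch that uses only the standard library. It assumes Solr is reachable at the `http://localhost:8080/solr/select/` URL from the message above. The `wt=json` parameter asks Solr for a JSON response, and `rows` caps how many documents come back. The function names are illustrative.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL taken from the message above; adjust for your own Solr instance.
SOLR_SELECT = "http://localhost:8080/solr/select/"


def build_query_url(query: str = "*:*", fields: str = "url,content", rows: int = 10) -> str:
    """Build a Solr select URL: q picks the documents, fl limits the returned fields."""
    params = {"q": query, "fl": fields, "rows": rows, "wt": "json"}
    return SOLR_SELECT + "?" + urlencode(params)  # urlencode percent-escapes * : ,


def fetch_urls(rows: int = 10) -> list:
    """Query a running Solr instance and return the 'url' field of each returned doc."""
    with urlopen(build_query_url(rows=rows)) as resp:
        data = json.load(resp)
    return [doc.get("url") for doc in data["response"]["docs"]]


print(build_query_url())
# http://localhost:8080/solr/select/?q=%2A%3A%2A&fl=url%2Ccontent&rows=10&wt=json
```

Calling `fetch_urls()` against the live index returns the stored URLs as a plain list. That makes it easy to check whether any `http://` source URLs are present.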
On Tue, Mar 16, 2010 at 12:55 AM, Susam Pal wrote:
> On Mon, Mar 15, 2010 at 2:32 PM, Graziano Aliberti wrote:
>> On 13/03/2010 22:55, Susam Pal wrote:
>>> On Fri, Mar 12, 2010 at 3:17 PM, Susam Pal wrote:
On Fri, Mar 12, 2010 at 2:09 PM, Graziano Aliberti wrote: