Hi Fred,
Please ensure that the linkdb command was executed successfully. The output
logs do not indicate this.
Looks like you've got a '-' minus character in front of the relative linkdb
directory as well.
HTH
On Wed, Oct 26, 2011 at 1:27 AM, Fred Zimmerman zimzaz@gmail.com wrote:
I'm still
Besides, the -linkdb param is 1.4, not 1.3; that's what's wrong here. Bai
explicitly mentioned 1.4.
OK, I've fixed the problem with the parameters giving incorrect paths to the
files. Now I get this:
$ bin/nutch solrindex http://search.zimzaz.com:8983/solr crawl/crawldb
crawl/linkdb crawl/segments/*
SolrIndexer: starting at 2011-10-26 12:57:57
java.io.IOException: Job failed!
that's it.
org.apache.solr.common.SolrException: ERROR:unknown field 'content'
request: http://search.zimzaz.com:8983/solr/update?wt=javabin&version=2
at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:436)
Add the schema.xml from nutch/conf to your Solr core.
btw: be careful with your host and port in the mailing lists. If it's open
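Copying Nutch's bundled schema over is typically a one-liner; this is only a sketch, assuming default directory layouts for a Nutch 1.4 build and the single-core Solr example install, so adjust the paths (and the NUTCH_HOME/SOLR_HOME placeholders) to your own setup:

```sh
# Sketch, not a definitive recipe: NUTCH_HOME and SOLR_HOME are
# placeholders for wherever you unpacked each distribution.
cp $NUTCH_HOME/conf/schema.xml $SOLR_HOME/example/solr/conf/schema.xml

# Restart Solr so it loads the new schema.
cd $SOLR_HOME/example && java -jar start.jar
```

If you have already customised Solr's schema.xml, merge Nutch's field definitions into it instead of overwriting the file.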
On Wednesday 26 October 2011 15:07:56 Fred Zimmerman wrote:
I added just the content field ... I have already modified Solr's
schema.xml to accommodate some other data types.
Now when starting solr ...
INFO: SolrUpdateServlet.init() done
2011-10-26 13:29:50.849:INFO::Started SocketConnector@0.0.0.0:8983
2011-10-26 13:30:23.129:WARN::/solr/admin/
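For reference, a minimal declaration for the content field in Solr's schema.xml might look like the snippet below. The attribute values follow the schema Nutch ships in its conf directory, but this is a sketch: Nutch's indexer writes several other fields too (title, url, and so on), so a single field is usually not enough.

```xml
<!-- Field Nutch's indexer writes page text into. The referenced type
     ("text" here) must exist as a fieldType in the same schema. -->
<field name="content" type="text" stored="false" indexed="true"/>
```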
Hi Fred,
These are clearly Solr-aimed questions, which I would observe are specific
to your schema. Maybe try searching the Solr archives for key words, or else
try the Solr user lists. I think you are much more likely to get a
substantiated response there.
Thank you
On Wed, Oct 26, 2011 at 3:31 PM,
will do. Of course I have already googled these terms without much luck.
Fred
On Wed, Oct 26, 2011 at 9:34 AM, lewis john mcgibbney
lewis.mcgibb...@gmail.com wrote:
On Tue, Oct 25, 2011 at 1:25 PM, Markus Jelsma
markus.jel...@openindex.io wrote:
Is there a reason to keep a segment around after it's been indexed? When
following the tutorial, I ended up sending the same segment to the Solr
server multiple times because I was using segments/* as my
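One way to avoid re-sending already-indexed segments is to pass only the newest segment rather than the whole segments/* glob. This is a sketch assuming the standard timestamped segment directory names; the URL and crawl paths are from the earlier command in this thread and will differ on your setup:

```sh
# Pick the most recent segment (directory names are timestamps, so
# lexicographic sort order matches creation order) and index only it.
SEGMENT=$(ls -d crawl/segments/* | tail -1)
bin/nutch solrindex http://localhost:8983/solr crawl/crawldb crawl/linkdb $SEGMENT
```

Alternatively, segments that have been indexed and merged can simply be deleted if you no longer need to re-parse or re-index them.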
Gotcha. Maybe I'll see about starting a 1.4 version of the tutorial. Not
sure if I'll have time, though.
On Tue, Oct 25, 2011 at 2:14 PM, lewis john mcgibbney
lewis.mcgibb...@gmail.com wrote:
Thanks, this is now sorted out.
For reference, you can sign up and commit your own changes to the
On Wednesday 26 October 2011 16:24:15 Bai Shen wrote:
1) I resolved the issues with solrindex. It turned out to be a matter of
adding all the Nutch schema-specific fields to Solr's schema.xml. There was
one gotcha, which is that the latest Solr schema does not have a default
fieldtype text as in Nutch 1.3/schema.xml; you must use text_general. A
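As an illustration of that gotcha (the field name below follows Nutch's bundled schema; verify against your own schema.xml), a field typed text in the Nutch 1.3 schema needs to reference the text_general fieldType that newer Solr example schemas define instead:

```xml
<!-- Nutch 1.3 schema.xml style: -->
<field name="content" type="text" stored="false" indexed="true"/>

<!-- Against a recent Solr example schema, where only text_general exists: -->
<field name="content" type="text_general" stored="false" indexed="true"/>
```

Either rename the type in each Nutch field definition, or copy the old text fieldType definition into your Solr schema.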
1.3 will cover 1.4. The main point was regarding the change in architecture
when taking into consideration the new runtime directory structure which was
introduced in Nutch 1.3.
Feel free to join me on getting a Hadoop tutorial for 1.4. It's been on the
agenda but somewhat shelved.
On Wed, Oct
I've got a few very large (upwards of 3 MB) XML files I'm trying to index, and
I'm having trouble. Previously I'd had trouble with the fetch; now that seems
to be okay, but due to the size of the files the parse takes much too long.
Is there a good way to optimize this that I'm missing? Is
On Wednesday 26 October 2011 16:37:14 Fred Zimmerman wrote:
The actual parse which is producing time outs happens early in the process.
There are, to my knowledge, no Nutch settings to make this faster or change
its behaviour, it's all about the parser implementation.
Try increasing your parser.timeout setting.
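The parser.timeout setting mentioned above is set in conf/nutch-site.xml, overriding the default from nutch-default.xml. A sketch, using the value that worked later in this thread:

```xml
<!-- In conf/nutch-site.xml. Value is in seconds; the shipped default
     is much lower. Setting -1 disables the parse timeout entirely,
     which may suit a handful of very large documents. -->
<property>
  <name>parser.timeout</name>
  <value>3600</value>
</property>
```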
On Wednesday 26 October 2011 16:45:33
Increasing parser.timeout to 3600 got me what I needed. I only have a few files
this huge, so I'll live with that.
-----Original Message-----
From: Markus Jelsma [mailto:markus.jel...@openindex.io]
Sent: Wednesday, October 26, 2011 10:55 AM
To: user@nutch.apache.org
Subject: Re: Extremely long
Hi Markus,
the error resembles a problem I've observed some time ago but never managed
to open an issue. Opened right now:
https://issues.apache.org/jira/browse/NUTCH-1182
The stack you observed is the same.
Sebastian
On 10/19/2011 05:01 PM, Markus Jelsma wrote:
Hi,
We sometimes see a