Hi,
I am working with a Nutch 1.4 snapshot and having a very strange problem that
makes the system run out of memory when indexing into Solr. This does not look
like a trivial lack of memory problem that can be solved by giving more memory
to the JVM. I've increased the max memory size from 2Gb
Hi Markus,
the error resembles a problem I observed some time ago but never managed
to open an issue for. I've opened one now:
https://issues.apache.org/jira/browse/NUTCH-1182
The stack you observed is the same.
Sebastian
On 10/19/2011 05:01 PM, Markus Jelsma wrote:
Hi,
We sometimes see a fetche
Increasing parser.timeout to 3600 got me what I needed. I only have a few files
this huge, so I'll live with that.
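For reference, parser.timeout is normally overridden in conf/nutch-site.xml. A minimal sketch of such an override, assuming the value is in seconds as in nutch-default.xml (3600 matches the figure mentioned above):

```xml
<!-- nutch-site.xml: raise the parse timeout for very large documents. -->
<property>
  <name>parser.timeout</name>
  <value>3600</value>
</property>
```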
-Original Message-
From: Markus Jelsma [mailto:markus.jel...@openindex.io]
Sent: Wednesday, October 26, 2011 10:55 AM
To: user@nutch.apache.org
Subject: Re: Extremely long p
The actual parse which is producing timeouts happens early in the process.
There are, to my knowledge, no Nutch settings to make this faster or change
its behaviour; it's all about the parser implementation.
Try increasing your parser.timeout setting.
On Wednesday 26 October 2011 16:45:33 Chip
On Wednesday 26 October 2011 16:37:14 Fred Zimmerman wrote:
> 1) I resolved the issues with solrindex. It turned out to be a matter of
> adding all the nutch schema-specific fields to solr's schema.xml. there
> was one gotcha which is that the latest solr schema does not have a
> default fieldty
I've got a few very large (upwards of 3 MB) XML files I'm trying to index, and
I'm having trouble. Previously I'd had trouble with the fetch; now that seems
to be okay, but due to the size of the files the parse takes much too long.
Is there a good way to optimize this that I'm missing? Is lengt
1.3 will cover 1.4. The main point concerned the change in architecture,
in particular the new runtime directory structure introduced in Nutch 1.3.
Feel free to join me on getting a Hadoop tutorial for 1.4. It's been on the
agenda but somewhat shelved.
On Wed, Oct 26
1) I resolved the issues with solrindex. It turned out to be a matter of
adding all the Nutch schema-specific fields to Solr's schema.xml. There was
one gotcha: the latest Solr schema does not have a default fieldtype "text"
as in Nutch 1.3/schema.xml; you must use "text_general". A
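As a hedged sketch, a Nutch field declared in Solr's schema.xml might look like this — the field name `content` is taken from later in this thread, and the attribute values are illustrative, not the authoritative Nutch schema:

```xml
<!-- Example Nutch field in Solr's schema.xml, using the "text_general"
     fieldtype in place of the absent "text" type. -->
<field name="content" type="text_general" stored="false" indexed="true"/>
```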
On Wednesday 26 October 2011 16:24:15 Bai Shen wrote:
> On Tue, Oct 25, 2011 at 1:25 PM, Markus Jelsma
>
> wrote:
> > > Is there a reason to keep a segment around after it's been indexed?
> > > When following the tutorial, I ended up sending the same segment to
> > > the solr server multiple ti
Gotcha. Maybe I'll see about starting a 1.4 version of the tutorial. Not
sure if I'll have time, though.
On Tue, Oct 25, 2011 at 2:14 PM, lewis john mcgibbney <
lewis.mcgibb...@gmail.com> wrote:
> Thanks, this is now sorted out.
>
> For reference, you can sign up and commit your own changes to t
On Tue, Oct 25, 2011 at 1:25 PM, Markus Jelsma
wrote:
> > Is there a reason to keep a segment around after it's been indexed? When
> > following the tutorial, I ended up sending the same segment to the solr
> > server multiple times because I was using segments/* as my argument.
>
> Only send the
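One way to avoid resending already-indexed segments is to pass only the newest one instead of segments/*. A minimal sketch, assuming the crawl/segments layout from the thread (the timestamped directory names below are made up for illustration):

```shell
# Stand-in segments layout (illustration only).
mkdir -p crawl/segments/20111026120000 crawl/segments/20111026130000

# Segment directories are timestamped, so lexical sort order is
# chronological: the last entry is the newest segment.
SEGMENT=$(ls -d crawl/segments/* | sort | tail -n 1)
echo "newest segment: $SEGMENT"

# Then index just that segment rather than segments/*, e.g.:
# bin/nutch solrindex http://localhost:8983/solr crawl/crawldb "$SEGMENT"
```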
will do. Of course I have already googled these terms without much luck.
Fred
On Wed, Oct 26, 2011 at 9:34 AM, lewis john mcgibbney <
lewis.mcgibb...@gmail.com> wrote:
> Hi Fred,
>
> These are clearly Solr aimed questions, which I would observe are specific
> to your schema. Maybe try the Solr
Hi Fred,
These are clearly Solr-aimed questions, which I would observe are specific
to your schema. Maybe try the Solr archives for keywords, or else try the
Solr user lists. I think that you are much more likely to get a substantiated
response there.
Thank you
On Wed, Oct 26, 2011 at 3:31 PM, Fr
I added just the field ... I have already modified solr's
schema.xml to accommodate some other data types.
Now when starting solr ...
INFO: SolrUpdateServlet.init() done
2011-10-26 13:29:50.849:INFO::Started SocketConnector@0.0.0.0:8983
2011-10-26 13:30:23.129:WARN::/solr/admin/
java.lang.Illega
Add the schema.xml from nutch/conf to your Solr core.
btw: be careful with your host and port in the mailing lists. If it's open
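As a sketch of that copy step, with placeholder paths (your actual Nutch conf dir and Solr core conf dir will differ):

```shell
# Stand-in directories for illustration; substitute your real
# Nutch conf dir and Solr core conf dir.
mkdir -p nutch/conf solr/example/solr/conf
printf '<schema/>' > nutch/conf/schema.xml   # placeholder file

# The actual step: copy Nutch's schema.xml into the Solr core.
cp nutch/conf/schema.xml solr/example/solr/conf/schema.xml
ls solr/example/solr/conf
```

Restart Solr afterwards so the new schema is picked up.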
On Wednesday 26 October 2011 15:07:56 Fred Zimmerman wrote:
> that's it.
>
> org.apache.solr.common.SolrException: ERROR:unknown field 'content'
>
> *ERROR:unknow
that's it.
org.apache.solr.common.SolrException: ERROR:unknown field 'content'
*ERROR:unknown field 'content'*
request: http://search.zimzaz.com:8983/solr/update?wt=javabin&version=2
at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:436)
Check your hadoop.log and the Solr log. If that happens there's usually a
field mismatch when indexing.
On Wednesday 26 October 2011 14:59:02 Fred Zimmerman wrote:
> OK, I've fixed the problem with the parameters giving incorrect paths to
> the files. Now I get this:
>
> $ bin/nutch solrindex http:/
OK, I've fixed the problem with the parameters giving incorrect paths to the
files. Now I get this:
$ bin/nutch solrindex http://search.zimzaz.com:8983/solr crawl/crawldb
crawl/linkdb crawl/segments/*
SolrIndexer: starting at 2011-10-26 12:57:57
java.io.IOException: Job failed!
Besides, the -linkdb param is 1.4, not 1.3; that's what's wrong here. Bai
explicitly mentioned 1.4.
> Hi Fred,
>
> Please ensure that the linkdb command was executed successfully. The output
> logs do not indicate this.
> Looks like you've got a '-' minus character in front of the relative linkdb
> d