Congrats Feng. Welcome onboard.
On Tue, Mar 12, 2013 at 6:43 PM, lewis john mcgibbney lewi...@apache.org wrote:
Hi Everyone,
On behalf of the Nutch PMC I would like to announce and welcome Feng Lu on
board as a PMC member and Committer on the project.
Amongst others, Feng has been an important part
Hi Folks,
I found out where the issue was. Just thought it might be useful for others.
The performance issue I was facing in the parse step was due to the regular
expression URL filter (the regex-urlfilter plugin) and some unusual URLs. One of
the regular expressions was taking a long... very long time to process for some
Thank you Ye for updating us with your findings. It is best to use the
latest version of Nutch since there are updates and fixes in each release
On Sun, Mar 17, 2013 at 3:48 AM, ytthet yethura.t...@gmail.com wrote:
Hi Folks,
I found out where the issue was. Just thought it might be useful
Thanks a lot to everyone for inviting me.
I'm a software engineer in China, and I have been using Apache Nutch for three
years. In our team I am mainly responsible for modifying Nutch 1.x to suit
the requirements of our database, MongoDB. So I also wrote a simple database
abstraction layer to adapt
Yes, the property is fetcher.timelimit.mins. If you do not set this property,
the QueueFeeder will not filter the URLs and the log output may look like this:
QueueFeeder finished: total 36651 records + hit by time limit :0
Do you use the bin/crawl command script? It sets the time limit for
fetching to 180.
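If you want a different limit without editing the script, the property can
also be set in conf/nutch-site.xml (the value below is only an example; keep
in mind that a value passed on the command line by the script will normally
take precedence):

    <!-- conf/nutch-site.xml; the value 360 is just an example -->
    <property>
      <name>fetcher.timelimit.mins</name>
      <value>360</value>
      <description>Maximum number of minutes the fetch job may run before the
      remaining queues are skipped; -1 disables the limit.</description>
    </property>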
I am using bin/crawl - I'll change the timeLimitFetch to something a bit
higher.
Thanks!
On Sun, Mar 17, 2013 at 5:07 PM, feng lu amuseme...@gmail.com wrote:
Yes, the property is fetcher.timelimit.mins. If you do not set this property,
the QueueFeeder will not filter the URLs and the log output may
Hi
Are there any plans to make Nutch 1.x support SolrCloud?
I'm using Nutch 1.4 and Solr 4.0.
So far I've managed to work with this because SolrJ's CommonsHttpSolrServer
somehow works with SolrCloud, though it doesn't exist in Solr 4.0.
This is inconvenient because CommonsHttpSolrServer gets a
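For reference, the SolrCloud-aware client in SolrJ 4.x is CloudSolrServer,
which talks to ZooKeeper directly. A minimal sketch of how it is used (the
ZooKeeper hosts and collection name below are placeholders, not values from a
real setup):

    // Minimal SolrJ 4.x sketch using CloudSolrServer; zk hosts and the
    // collection name are placeholders.
    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class CloudIndexSketch {
      public static void main(String[] args) throws Exception {
        CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        server.setDefaultCollection("collection1");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        doc.addField("title", "example");
        server.add(doc);
        server.commit();
        server.shutdown();
      }
    }

As far as I can tell, switching to it would mean patching Nutch's indexer code
rather than just changing configuration.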
I am getting the following exception when indexing documents to Solr from Nutch.
org.apache.solr.common.SolrException: An invalid XML character (Unicode:
0x) was found in the element content of the document.
Please let me know how to resolve this.
I am using Nutch 1.6 for crawling and
Hi,
You are always encouraged to look at our Jira instance before asking
questions. It really helps both you and us solve problems efficiently.
Please check out
https://issues.apache.org/jira/browse/NUTCH-1377
And comment where you can.
When we eventually do the entire out of the box upgrade to
Hi Neeraj
schema-solr4.xml does not work with Solr 4.1.0. Maybe you can apply this
patch [0] and run again.
[0] https://issues.apache.org/jira/browse/NUTCH-1486
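Assuming the patch is downloaded from the issue as NUTCH-1486.patch, applying
it from the top of the Nutch source tree usually looks like this (the -p level
depends on how the diff was generated):

    cd apache-nutch-1.6
    patch -p0 < NUTCH-1486.patch    # try -p1 if the paths do not line up
    ant clean runtime               # rebuild runtime/local so the change is picked up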
On Mon, Mar 18, 2013 at 2:34 AM, neeraj neerajbhol...@yahoo.com wrote:
I am getting following exception when indexing documents to
Amuseme,
Thanks for the reply. I reviewed the exceptions listed at the link and I
am not getting any of those. I have more than 5 million documents crawled
and was able to index 120K documents to Solr before this invalid XML
character exception occurred.
I was trying to investigate around
Yes, NUTCH-1016 already fixed this problem.
The property parser.character.encoding.default is used when the
EncodingDetector cannot detect the content encoding; it sets the default
encoding for the page content. If this detection is wrong, it can sometimes
produce unreadable parsed content.
I am not sure whether this error is caused by the
parser.character.encoding.default property. Can you trace this error back to a
specific document? Then you could create a test environment and parse and index
that document again to see what happens.
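For checking a single suspect URL without re-running the whole crawl, the
checker tools may help (the URL below is a placeholder; indexchecker is only
present in recent 1.x releases):

    bin/nutch parsechecker http://www.example.com/problem-page.html
    bin/nutch indexchecker http://www.example.com/problem-page.html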
On Mon, Mar 18, 2013 at 12:17 PM, neeraj