Not sure if it matters, but what data center are you using? Maybe the data
center region uses different characters if the native language isn't English.
On Wed, Aug 8, 2012 at 7:25 AM, Niccolò Becchi niccolo.bec...@gmail.com wrote:
Hi,
I have been using Nutch for fetching English sites (UTF-8
Saravanan,
Did you add the HTTP agent name to nutch-site.xml? If not, add the lines below:
<property>
<name>http.agent.name</name>
<value>My Nutch Spider</value>
</property>
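One way to confirm the property really made it into the file is to parse the config and look it up. The following is only a minimal sketch in Python, assuming the usual Hadoop-style layout (a `<configuration>` root holding `<property>` elements); the helper name `get_property` and the sample XML are illustrative and not part of Nutch.

```python
import xml.etree.ElementTree as ET

# Sample nutch-site.xml content (illustrative; substitute your own file's text).
SAMPLE_SITE_XML = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>http.agent.name</name>
    <value>My Nutch Spider</value>
  </property>
</configuration>
"""


def get_property(xml_text, name):
    """Return the stripped value of the named property, or None if it is absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None


if __name__ == "__main__":
    agent = get_property(SAMPLE_SITE_XML, "http.agent.name")
    print("http.agent.name is set" if agent else "http.agent.name is missing or empty")
```

To check a real installation, read the text of conf/nutch-site.xml and pass it in place of the sample string.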
On Fri, Aug 3, 2012 at 7:01 AM, Saravanan S saravanat...@gmail.com wrote:
Dear all,
I am new to nutch. Just installed and
Hello,
I wanted to know at what point does Nutch stop keeping the HTML page? My
issue is I need to be able to extract certain info from a page, for example:
username
description
photo
profile link
there may be multiple profiles on each page, and my understanding is
currently Nutch has an issue
it is only done when parsing during fetch is set to
true, otherwise it is not loaded at all. A separate (re)parser job is able
to load the DOM too.
Ferdy
On Fri, Aug 3, 2012 at 1:43 PM, X3C TECH t...@x3chaos.com wrote:
Hello,
I wanted to know at what point does Nutch stop keeping the HTML
Sorry I couldn't be more help. One thing I'd suggest is, instead of Cygwin,
try using VMware Player or VirtualBox. Both are free and provide a native
virtual Linux environment, which is much easier to manage than Cygwin.
On Fri, Aug 3, 2012 at 5:45 AM, veryblues_cn lhn...@gmail.com wrote:
Hi X3C TECH
AHHH, try replacing the current schema with the one from SVN; if I'm not
mistaken, they just updated the schema due to this issue.
On Wed, Aug 1, 2012 at 1:52 AM, veryblues_cn lhn...@gmail.com wrote:
Hi X3C TECH,
An error is thrown if I replace Solr's schema.xml with Nutch's
schema.xml
http://svn.apache.org/viewvc/nutch/branches/branch-1.5.1/conf/schema.xml
On Wed, Aug 1, 2012 at 12:22 PM, X3C TECH t...@x3chaos.com wrote:
AHHH, try replacing the current schema with the one from SVN; if I'm not
mistaken, they just updated the schema due to this issue.
On Wed, Aug 1, 2012 at 1
For Cygwin, make sure the Privileged User is set as administrator in your
Windows setup.
On Mon, Jul 30, 2012 at 10:19 PM, veryblues_cn lhn...@gmail.com wrote:
Hi,
I just installed the same versions of Nutch and Solr in Cygwin, but I hit an
error like this:
*ERROR security.UserGroupInformation -
Not sure if I'm correct, but it's worth a try.
In your regex
-^http://([a-z0-9\-A-Z]*\.)*www.elaweb.org.uk/resources/type.aspx.*
To me it looks like the . at the end of aspx shouldn't be there, as it
would ignore aspx.php or aspx.?, let's say, but not aspx?.
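It may help to test the rule against a few sample URLs before changing it. The sketch below uses Python's `re` module as a stand-in for the Java regex engine Nutch uses; for this particular pattern the two should behave the same, but that is an assumption, as is the find-style (search anywhere) matching. The helper `is_excluded` and the sample URLs are made up for illustration. Under standard regex semantics the trailing `.*` means "any character, zero or more times", so the rule would also match when nothing at all follows aspx.

```python
import re

# The exclusion rule from the message, minus the leading "-" (which marks the
# rule as an exclude in a Nutch URL filter file).
PATTERN = re.compile(
    r"^http://([a-z0-9\-A-Z]*\.)*www.elaweb.org.uk/resources/type.aspx.*"
)


def is_excluded(url):
    """True if the pattern is found in the URL (assumed find-style matching)."""
    return PATTERN.search(url) is not None


if __name__ == "__main__":
    base = "http://www.elaweb.org.uk/resources/type.aspx"
    samples = (
        base,                                   # nothing after aspx
        base + "?id=3",                         # query string after aspx
        base + ".php",                          # extra extension after aspx
        "http://www.elaweb.org.uk/about.aspx",  # different path
    )
    for url in samples:
        print(url, "->", "excluded" if is_excluded(url) else "kept")
```

If the first three samples all come back as excluded, the trailing `.*` is probably not what stops the rule from working, and the cause is more likely elsewhere (for example, rule order in the filter file).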
On Tue, Jul 31, 2012 at 3:59 AM, Ian Piper
What's the actual error thrown? Or is it just completing without anything
being added to the index?
On Tue, Jul 31, 2012 at 3:30 AM, veryblues_cn lhn...@gmail.com wrote:
My environment is Win7, Tomcat 6.0, Cygwin, Nutch 1.5.1, Solr 3.60
I downloaded both the nutch-1.5.1 src and bin zip packages, hadoop
Check the Hadoop log and see if there are any obvious exceptions...
Also try running the individual commands and see if something strange happens,
e.g. parse says there is nothing to parse. This may help with command usage:
http://wiki.apache.org/nutch/NutchTutorial#A6._Integrate_Solr_with_Nutch
also I
Hello,
Has anyone been successful in hooking up Nutch 2 with Solr4?
I seem to have my config screwed up somehow. I've added the Nutch fields to
Solr's example schema and changed the field type from 'text' to
'text_general'.
However when I index, I get the message
SolrIndexerJob:starting
Forgot to do Specs
VMWare Machine with CentOS 6.3
On Sun, Jul 29, 2012 at 1:53 PM, X3C TECH t...@x3chaos.com wrote:
Hello,
Has anyone been successful in hooking up Nutch 2 with Solr4?
I seem to have my config screwed up somehow. I've added the Nutch fields
to Solr's example schema
-
From: X3C TECH t...@x3chaos.com
To: user user@nutch.apache.org
Sent: Sun, Jul 29, 2012 10:58 am
Subject: Re: Nutch 2.0 Solr 4.0 Alpha
Forgot to do Specs
VMWare Machine with CentOS 6.3
On Sun, Jul 29, 2012 at 1:53 PM, X3C TECH t...@x3chaos.com wrote:
Hello,
Has anyone been successful
:53 PM, X3C TECH t...@x3chaos.com wrote:
Hello,
Has anyone been successful in hooking up Nutch 2 with Solr4?
I seem to have my config screwed up somehow. I've added the Nutch fields
to
Solr's example schema and changed the field type from 'text' to
'text_general'.
However when I index, I get
Lewis,
I just ran the crawl command with the solrindex argument. I'll try to rerun
it with single commands.
On Sun, Jul 29, 2012 at 2:53 PM, Lewis John Mcgibbney
lewis.mcgibb...@gmail.com wrote:
Sorry
On Sun, Jul 29, 2012 at 7:53 PM, Lewis John Mcgibbney
lewis.mcgibb...@gmail.com wrote:
the SolrIndexerJob alone (outside the Crawler
tool)? Please do so and post commandline / nutch config / logs.
Cheers,
Mathijs
On Jul 29, 2012, at 20:33 , X3C TECH t...@x3chaos.com wrote:
Hi Lewis,
Thanks for below, I just ran it on the new schema. Funny thing is in
Solr's
example/logs
the system throws a runtime error. Should I
just add it to schema? and if so what would be the definition, i.e.
text_general, indexed?
On Sun, Jul 29, 2012 at 2:56 PM, X3C TECH t...@x3chaos.com wrote:
OK I reran it with the commands
inject,generate,fetch,parse,updatedb,solrindex
solrindex -all