Re: Nutch Encoding on AWS

2012-08-08 Thread X3C TECH
Not sure if it matters, but what data center are you using? Maybe the data center region uses different characters if the native language isn't English. On Wed, Aug 8, 2012 at 7:25 AM, Niccolò Becchi niccolo.bec...@gmail.com wrote: Hi, I have been using Nutch for fetching English sites (UTF-8

Re: Need help in setting up my First Crawler

2012-08-03 Thread X3C TECH
Saravanan, Did you add the HTTP agent name to nutch-site.xml? If not, add the property below: <property> <name>http.agent.name</name> <value>My Nutch Spider</value> </property> On Fri, Aug 3, 2012 at 7:01 AM, Saravanan S saravanat...@gmail.com wrote: Dear all, I am new to nutch. Just installed and
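
A quick way to confirm the property from the reply above is actually present is to parse the config file and look it up. The sketch below is only an illustration: the sample XML string and the helper name `get_property` are assumptions made for this example, not part of Nutch, and a real nutch-site.xml will usually hold more properties.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal nutch-site.xml content, using the property from the post.
SAMPLE_SITE_XML = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>http.agent.name</name>
    <value>My Nutch Spider</value>
  </property>
</configuration>
"""


def get_property(xml_text, prop_name):
    """Return the value of a named <property>, or None if it is missing or empty."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == prop_name:
            value = prop.findtext("value", default="").strip()
            return value or None
    return None


if __name__ == "__main__":
    agent = get_property(SAMPLE_SITE_XML, "http.agent.name")
    print("http.agent.name =", agent)
```

To check a real installation, read the contents of conf/nutch-site.xml into a string and pass it in place of the sample.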

Custom Meta Plugin

2012-08-03 Thread X3C TECH
Hello, I wanted to know at what point Nutch stops keeping the HTML page. My issue is I need to be able to extract certain info from a page, for example: username, description, photo, profile link. There may be multiple profiles on each page, and my understanding is currently Nutch has an issue

Re: Custom Meta Plugin

2012-08-03 Thread X3C TECH
it is only done when parsing during fetch is set to true, otherwise it is not loaded at all. A separate (re)parser job is able to load the DOM too. Ferdy On Fri, Aug 3, 2012 at 1:43 PM, X3C TECH t...@x3chaos.com wrote: Hello, I wanted to know at what point does Nutch stop keeping the HTML

Re: No output to solr, no running error, with my install and config of nutch

2012-08-03 Thread X3C TECH
Sorry I couldn't be more help. One thing I'd suggest is, instead of Cygwin, try using VMware Player or VirtualBox. Both are free and provide a native virtual Linux environment, so much easier to manage than Cygwin. On Fri, Aug 3, 2012 at 5:45 AM, veryblues_cn lhn...@gmail.com wrote: Hi X3C TECH

Re: No output to solr, no running error, with my install and config of nutch

2012-08-01 Thread X3C TECH
AHHH try replacing current schema with the one from the SVN, if I'm not mistaken they just updated the schema due to this issue On Wed, Aug 1, 2012 at 1:52 AM, veryblues_cn lhn...@gmail.com wrote: Hi X3C TECH, There is an error thrown if I replace the solr's schema.xml with nutch's schema.xml

Re: No output to solr, no running error, with my install and config of nutch

2012-08-01 Thread X3C TECH
http://svn.apache.org/viewvc/nutch/branches/branch-1.5.1/conf/schema.xml On Wed, Aug 1, 2012 at 12:22 PM, X3C TECH t...@x3chaos.com wrote: AHHH try replacing current schema with the one from the SVN, if I'm not mistaken they just updated the schema due to this issue On Wed, Aug 1, 2012 at 1

Re: Nutch 1.5.1 Solr 3.6.1 Error

2012-07-31 Thread X3C TECH
For Cygwin, make sure the Privileged User is set as administrator in your Windows setup. On Mon, Jul 30, 2012 at 10:19 PM, veryblues_cn lhn...@gmail.com wrote: Hi, I just installed the same version of Nutch and Solr in Cygwin, but I met an error like this: *ERROR security.UserGroupInformation -

Re: Why won't my crawl ignore these urls?

2012-07-31 Thread X3C TECH
Not sure if I'm correct, but worth a try. In your regex -^http://([a-z0-9\-A-Z]*\.)*www.elaweb.org.uk/resources/type.aspx.* to me it looks like the . at the end of aspx shouldn't be there, as it would ignore aspx.php or aspx.?, let's say, but not aspx?. On Tue, Jul 31, 2012 at 3:59 AM, Ian Piper
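
One way to settle what the pattern in the post above matches is to try it on a few URLs. The sketch below uses Python's `re` module rather than the Java regex engine Nutch runs, which should agree for a pattern this simple. The leading "-" (the exclude sign in the filter file) is dropped, and the test URLs and the helper name `is_excluded` are made up for illustration. Note that an unescaped `.` matches any single character and `.*` matches any run of characters, including none.

```python
import re

# Pattern copied from the post, minus the leading "-" exclude sign.
PATTERN = re.compile(
    r"^http://([a-z0-9\-A-Z]*\.)*www.elaweb.org.uk/resources/type.aspx.*"
)


def is_excluded(url):
    """True if the URL matches the exclusion pattern from its start."""
    return PATTERN.match(url) is not None


if __name__ == "__main__":
    # Hypothetical URLs, chosen only to probe the trailing ".*".
    samples = [
        "http://www.elaweb.org.uk/resources/type.aspx",
        "http://www.elaweb.org.uk/resources/type.aspx?id=3",
        "http://www.elaweb.org.uk/resources/type.aspx.php",
        "http://www.elaweb.org.uk/resources/topic.aspx",
    ]
    for url in samples:
        print(url, "->", "excluded" if is_excluded(url) else "kept")
```

In this sketch the first three URLs match, including the one with a query string after aspx, so the trailing `.*` on its own does not stop `aspx?` URLs from being excluded.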

Re: No output to solr, no running error, with my install and config of nutch

2012-07-31 Thread X3C TECH
What's the actual error thrown? Or is it just completing without anything added to the index? On Tue, Jul 31, 2012 at 3:30 AM, veryblues_cn lhn...@gmail.com wrote: My environment is win7, Tomcat 6.0, cygwin, nutch 1.5.1, solr 3.60. I downloaded both the nutch-1.5.1 src and bin zip package, hadoop

Re: No output to solr, no running error, with my install and config of nutch

2012-07-31 Thread X3C TECH
Check the Hadoop log and see if there are any obvious exceptions. Also try running the individual commands and see if something strange happens, e.g. parse says there is nothing to parse. This may help for command usage: http://wiki.apache.org/nutch/NutchTutorial#A6._Integrate_Solr_with_Nutch also I

Nutch 2.0 Solr 4.0 Alpha

2012-07-29 Thread X3C TECH
Hello, Has anyone been successful in hooking up Nutch 2 with Solr4? I seem to have my config screwed up somehow. I've added the Nutch fields to Solr's example schema and changed the field type from 'text' to 'text_general'. However, when I index, I get the message SolrIndexerJob:starting

Re: Nutch 2.0 Solr 4.0 Alpha

2012-07-29 Thread X3C TECH
Forgot to give the specs: VMware machine with CentOS 6.3. On Sun, Jul 29, 2012 at 1:53 PM, X3C TECH t...@x3chaos.com wrote: Hello, Has anyone been successful in hooking up Nutch 2 with Solr4? I seem to have my config screwed up somehow. I've added the Nutch fields to Solr's example schema

Re: Nutch 2.0 Solr 4.0 Alpha

2012-07-29 Thread X3C TECH
- From: X3C TECH t...@x3chaos.com To: user user@nutch.apache.org Sent: Sun, Jul 29, 2012 10:58 am Subject: Re: Nutch 2.0 Solr 4.0 Alpha Forgot to do Specs VMWare Machine with CentOS 6.3 On Sun, Jul 29, 2012 at 1:53 PM, X3C TECH t...@x3chaos.com wrote: Hello, Has anyone been successful

Re: Nutch 2.0 Solr 4.0 Alpha

2012-07-29 Thread X3C TECH
:53 PM, X3C TECH t...@x3chaos.com wrote: Hello, Has anyone been successful in hooking up Nutch 2 with Solr4? I seem to have my config screwed up somehow. I've added the Nutch fields to Solr's example schema and changed the field type from 'text' to 'text_general'. However, when I index, I get

Re: Nutch 2.0 Solr 4.0 Alpha

2012-07-29 Thread X3C TECH
Lewis, I just ran the crawl command with the solrindex argument. I'll try to rerun it with single commands On Sun, Jul 29, 2012 at 2:53 PM, Lewis John Mcgibbney lewis.mcgibb...@gmail.com wrote: Sorry On Sun, Jul 29, 2012 at 7:53 PM, Lewis John Mcgibbney lewis.mcgibb...@gmail.com wrote:

Re: Nutch 2.0 Solr 4.0 Alpha

2012-07-29 Thread X3C TECH
the SolrIndexerJob alone (outside the Crawler tool)? Please do so and post commandline / nutch config / logs. Cheers, Mathijs On Jul 29, 2012, at 20:33 , X3C TECH t...@x3chaos.com wrote: Hi Lewis, Thanks for below, I just ran it on the new schema. Funny thing is in Solr's example/logs

Re: Nutch 2.0 Solr 4.0 Alpha

2012-07-29 Thread X3C TECH
the system throws a runtime error. Should I just add it to the schema? And if so, what would be the definition, i.e. text_general, indexed? On Sun, Jul 29, 2012 at 2:56 PM, X3C TECH t...@x3chaos.com wrote: OK I reran it with the commands inject,generate,fetch,parse,updatedb,solrindex solrindex -all