Re: Newbie question - syntax error on bin/nutch

2006-12-15 Thread Jonathan H
I'm having this same problem. I edited the text as you suggested, but I got the same error further down. How do I install the uname utility in Cygwin? I've looked online for this but can't seem to find anything useful. Thanks in advance.

Re: subcollections

2006-12-15 Thread liv
Thanks for your quick answer. However, I tried this several times and it doesn't work. Here is what I do: - I fetch/index 2 sites, with the subcollections.xml file as follows: <?xml version="1.0" encoding="UTF-8"?> <subcollections> <subcollection> <name>test1</name>

Re: errors with parsing and indexing

2006-12-15 Thread Zaheed Haque
Hi, Please attach the patch to a JIRA issue; my mail account gives me trouble with attachments. Kind regards, Zaheed On 12/14/06, Doğacan Güney [EMAIL PROTECTED] wrote: Doğacan Güney wrote: Hi, After hadoop-0.9.1, parsing and indexing don't seem to work. If you parse while fetching then

Re: classifying content

2006-12-15 Thread Eelco Lempsink
Hey Chad, On 7-dec-2006, at 18:52, chad savage wrote: We would like to organize information into a hierarchical category system. It's all general web content (HTML from the web). Yes, there are a number of references to varying techniques on the net (scientific papers, theoretical,

Re: pagerank implementation

2006-12-15 Thread Andrzej Bialecki
Mike Smith wrote: Hi, I remember Nutch 0.7 used to have a scoring method similar to Google PageRank that analysed links globally. But since that was computationally intensive, it was replaced by OPIC scoring. Since Nutch has now moved completely to the Map/Reduce structure, is it worth it to port

/tmp/hadoop filled up

2006-12-15 Thread Robin Haswell
Hey, I just did a mergesegs on my 29GB segments dir - the /tmp/hadoop folder grew to 66GB before the disk /tmp is on filled up and did Bad Things. Can anyone suggest why this happened and how to prevent it, or (better) how to tell Hadoop to use a tmpdir which is on a bigger disk? Thanks -Rob

Re: /tmp/hadoop filled up

2006-12-15 Thread Sean Dean
You can tell Hadoop to use a different temp directory by placing this in your hadoop-site.xml file, replacing the example mount in the value with your preferred one: <property> <name>hadoop.tmp.dir</name> <value>/usr/local/example/mount/hadoop-${user.name}</value> <description>Hadoop temp directory</description> </property>

Error on convert to 0.9 during mergesegs step

2006-12-15 Thread RP
HELP - Error on migrating from 0.8 to 0.9 following the procedures outlined in the Wiki. This is at the mergesegs step; the crawldb convert went fine - trying to deal with segments now, as I'm on a slow connection and it would be painful to re-crawl. Anyone seen this or have any ideas on how to get

Re: Newbie question - syntax error on bin/nutch

2006-12-15 Thread Wilson, Scott
Kevin and Jonathan, If you install Cygwin, I believe that uname is already there. I was having the same problems. From what I was able to find out and piece together from various sources, the /bin/nutch file contains DOS code which will not run in a UNIX environment. You have
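
If the "DOS code" mentioned above means DOS-style CRLF line endings (an assumption, since the message is cut off here), a minimal sketch of checking for and stripping them might look like the following. The helper names has_cr and strip_cr are hypothetical, and the bin/nutch path in the usage comment is taken from the thread subject.

```shell
#!/bin/sh
# Assumption: the script's only problem is carriage returns (CRLF endings).

# has_cr FILE: succeed if FILE contains at least one carriage return.
has_cr() {
    [ -n "$(tr -cd '\r' < "$1")" ]
}

# strip_cr FILE: rewrite FILE with every carriage return removed.
# The result is written back with cat so the file keeps its permissions.
strip_cr() {
    tmp="$1.tmp.$$"
    tr -d '\r' < "$1" > "$tmp" && cat "$tmp" > "$1" && rm -f "$tmp"
}

# Usage (run from the Nutch install directory):
#   has_cr bin/nutch && strip_cr bin/nutch
```

Only tr, cat and rm are used, all of which a default Cygwin install normally provides. If the dos2unix package happens to be installed, it should do the same job in one step.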

Re: Error on convert to 0.9 during mergesegs step

2006-12-15 Thread Andrzej Bialecki
RP wrote: 2006-12-15 00:52:43,895 WARN mapred.LocalJobRunner - job_dokmpz java.lang.NullPointerException at org.apache.hadoop.fs.LocalFileSystem.reportChecksumFailure(LocalFileSystem.java:380) at

Re: Error on convert to 0.9 during mergesegs step

2006-12-15 Thread RP
Pardon my ignorance, but where and how do I do this..?? Nothing in conf files that I can see as a switch rp Andrzej Bialecki wrote: RP wrote: 2006-12-15 00:52:43,895 WARN mapred.LocalJobRunner - job_dokmpz java.lang.NullPointerException at

Re: Error on convert to 0.9 during mergesegs step

2006-12-15 Thread Andrzej Bialecki
RP wrote: Pardon my ignorance, but where and how do I do this..?? Nothing in conf files that I can see as a switch Ah-ha! you can't see it because it's a _hidden_ switch ... *evil grin* Joking aside ... Yes, sorry for the confusion - it's a setting in the default Hadoop config, and

Re: error with trunk: linkdb copied to wrong dir

2006-12-15 Thread Sean Dean
I'm glad to report back that both of my fetches, one using Hadoop 0.7.1 and the other 0.9.1, were successful. It looks like the old Hadoop jar file was picked up by the java process before it was completely removed by the system. There are a few things that could have caused that, but none of them

Re: error with trunk: linkdb copied to wrong dir

2006-12-15 Thread Andrzej Bialecki
Sean Dean wrote: I'm glad to report back that both of my fetches, one using Hadoop 0.7.1 and the other 0.9.1, were successful. It looks like the old Hadoop jar file was picked up by the java process before it was completely removed by the system. There are a few things that could have caused that,

Re: Error on convert to 0.9 during mergesegs step

2006-12-15 Thread RP
Andrzej - Thanks that conf switch seemed to take care of it...! Any idea if the Hadoop native stuff will give any performance boost..?? rp Andrzej Bialecki wrote: RP wrote: Pardon my ignorance, but where and how do I do this..?? Nothing in conf files that I can see as a switch

Null Inlinks with rss redirect

2006-12-15 Thread sdeck
I am using the RSS parser, and as the fetcher finds the URLs within the RSS feed, I am attaching the description of each item and placing it into the anchors of the outlinks. When I get to the indexer, for any item URL that did not redirect, I can get the Inlink and see the anchor text and