NullPointerException mapred

2009-04-10 Thread MyD
Hi @ all, I am using the newest trunk source code. I get every time this error msg: 2009-04-10 20:08:23,816 INFO indexer.Indexer - Indexer: done 2009-04-10 20:08:23,817 INFO indexer.DeleteDuplicates - Dedup: starting 2009-04-10 20:08:23,818 INFO indexer.DeleteDuplicates - Dedup: adding

[jira] Commented: (NUTCH-186) mapred-default.xml is over ridden by nutch-site.xml

2008-01-17 Thread Andrzej Bialecki (JIRA)
moved to Hadoop. mapred-default.xml is over ridden by nutch-site.xml --- Key: NUTCH-186 URL: https://issues.apache.org/jira/browse/NUTCH-186 Project: Nutch Issue Type: Bug Affects Versions

[jira] Resolved: (NUTCH-186) mapred-default.xml is over ridden by nutch-site.xml

2008-01-17 Thread Andrzej Bialecki (JIRA)
mapred-default.xml is over ridden by nutch-site.xml --- Key: NUTCH-186 URL: https://issues.apache.org/jira/browse/NUTCH-186 Project: Nutch Issue Type: Bug Affects Versions: 0.8

Usage of mapred-default.xml is deprecated in hadoop0.15.0

2007-11-08 Thread Ned Rockson
Right now I put default mapred data that I may override in mapred-default.xml rather than in nutch-site.xml (or hadoop-site.xml) because I can't override anything in *-site.xml. Has the new version of hadoop changed this and that's why mapred-default.xml is deprecated? If not, where should

[jira] Closed: (NUTCH-209) include nutch jar in mapred jobs

2006-10-24 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-209?page=all ] Sami Siren closed NUTCH-209. include nutch jar in mapred jobs Key: NUTCH-209 URL: http://issues.apache.org/jira/browse/NUTCH-209

Re: mapred question

2006-05-02 Thread Doug Cutting
[EMAIL PROTECTED] wrote: As far as we understood from MapRed documentation all reduce tasks must be launched after last map task is finished e.g map and reduce must not work simultaneously. But often in logs we see such records: map 80%, reduce 10% and many more records where map is less
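A note on the numbers in this question: a log line elsewhere in this listing shows a reduce task reporting progress while still in its "reduce copy" phase, so a non-zero reduce percentage does not by itself mean that reduce functions are running before the maps finish. The sketch below is a rough model only. The equal three-way split between copy, sort and reduce phases, and the name `reduce_progress`, are assumptions for illustration and are not taken from the mapred code.

```python
def reduce_progress(copy_done, sort_done, reduce_done):
    """Overall progress of one reduce task, given per-phase fractions in [0, 1].

    Assumption: the three phases are weighted equally.
    """
    for fraction in (copy_done, sort_done, reduce_done):
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("phase fractions must lie in [0, 1]")
    return (copy_done + sort_done + reduce_done) / 3.0


# While maps are still running, only the copy phase can advance.
overall = reduce_progress(copy_done=0.3, sort_done=0.0, reduce_done=0.0)
print(f"reduce {overall:.0%}")
```

Under this model a reduce task that has copied 30% of the map outputs reports about 10% overall, even though no reduce function has been called yet.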

mapred branch

2006-04-10 Thread Anton Potehin
Where is the mapred branch of nutch located now?

Re: mapred branch

2006-04-10 Thread Piotr Kosiorowski
Anton Potehin wrote: Where is the mapred branch of nutch located now? It is developed in trunk now. P.

[jira] Commented: (NUTCH-209) include nutch jar in mapred jobs

2006-02-09 Thread Andrzej Bialecki (JIRA)
... A wild idea: could we put this jar on NDFS, sorry, DFS, implement a DFSClassLoader and point all the tasks' classloaders there? Eventually, when DFS grows the locality mechanism, we would avoid transmitting this data unless it's really changed... include nutch jar in mapred jobs

[jira] Resolved: (NUTCH-209) include nutch jar in mapred jobs

2006-02-09 Thread Doug Cutting (JIRA)
you're asking for. include nutch jar in mapred jobs Key: NUTCH-209 URL: http://issues.apache.org/jira/browse/NUTCH-209 Project: Nutch Type: Improvement Versions: 0.8-dev Reporter: Doug Cutting Priority: Minor

[jira] Commented: (NUTCH-209) include nutch jar in mapred jobs

2006-02-09 Thread Doug Cutting (JIRA)
could also try to make the job jar smaller, e.g., by only including enabled plugins. include nutch jar in mapred jobs Key: NUTCH-209 URL: http://issues.apache.org/jira/browse/NUTCH-209 Project: Nutch Type: Improvement Versions

[jira] Commented: (NUTCH-209) include nutch jar in mapred jobs

2006-02-09 Thread Andrzej Bialecki (JIRA)
it would be a small change. Re: including only enabled plugins: potentially you would have to build a custom jar for each job, because the list of active plugins depends on the job's Configuration. I think I would prefer the replication trick. include nutch jar in mapred jobs

mapred: config parameters

2006-01-31 Thread Michael Nebel
Hi, the last days I gave the mapred-branch a try and I was impressed! But I still have a problem with the incremental crawling. My setup: I have 4 boxes (1x namenode/jobtracker - 3x datanode/tasktracker). Running one round of crawling consists out of the steps: - generate (I set a limit

[jira] Commented: (NUTCH-186) mapred-default.xml is over ridden by nutch-site.xml

2006-01-25 Thread Gal Nitzan (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-186?page=comments#action_12364010 ] Gal Nitzan commented on NUTCH-186: -- After reading the code I think I figured it out... :) The issue of the mapred-default.xml is totally misleading. Actually

[jira] Commented: (NUTCH-186) mapred-default.xml is over ridden by nutch-site.xml

2006-01-24 Thread Andrzej Bialecki (JIRA)
and use a pair of mapred-default/mapred-site.xml ... It would be more understandable for users. mapred-default.xml is over ridden by nutch-site.xml --- Key: NUTCH-186 URL: http://issues.apache.org/jira/browse/NUTCH-186

[jira] Commented: (NUTCH-186) mapred-default.xml is over ridden by nutch-site.xml

2006-01-24 Thread Gal Nitzan (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-186?page=comments#action_12363903 ] Gal Nitzan commented on NUTCH-186: -- ok, JobConf extends NutchConf and in the (JobConf) constructor it adds the mapred-default.xml resource. the call to add resource
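The ordering problem discussed in this issue can be pictured with a small layered-configuration model. This is a sketch only, written in Python rather than Java. `LayeredConf`, `add_default_resource` and `add_final_resource` are hypothetical names, not the NutchConf or JobConf API. It assumes that site files are consulted after all default files, so a value in nutch-site.xml wins even when mapred-default.xml is added later by a subclass constructor.

```python
class LayeredConf:
    """Toy model of a configuration built from ordered resources (hypothetical)."""

    def __init__(self):
        self.default_resources = []  # consulted first, in insertion order
        self.final_resources = []    # consulted last, so their values win

    def add_default_resource(self, name, props):
        self.default_resources.append((name, dict(props)))

    def add_final_resource(self, name, props):
        self.final_resources.append((name, dict(props)))

    def get(self, key, fallback=None):
        value = fallback
        # Later resources overwrite earlier ones; final resources come last.
        for _name, props in self.default_resources + self.final_resources:
            if key in props:
                value = props[key]
        return value


conf = LayeredConf()
conf.add_default_resource("nutch-default.xml", {"mapred.reduce.tasks": "1"})
conf.add_final_resource("nutch-site.xml", {"mapred.reduce.tasks": "4"})
# A job-level subclass adds its own defaults afterwards (assumed ordering).
conf.add_default_resource("mapred-default.xml", {"mapred.reduce.tasks": "20"})

print(conf.get("mapred.reduce.tasks"))
```

In this model the value from nutch-site.xml is returned regardless of when mapred-default.xml is added, which matches the symptom in the issue title.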

[jira] Updated: (NUTCH-186) mapred-default.xml is over ridden by nutch-site.xml

2006-01-24 Thread Gal Nitzan (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-186?page=all ] Gal Nitzan updated NUTCH-186: - Attachment: myBeautifulPatch.patch the patch attached mapred-default.xml is over ridden by nutch-site.xml

Separating mapred/ndfs and nutch search engine

2006-01-15 Thread Dominik Friedrich
While going through the nutch sources for creating an updated nutch-default.xml I got some ideas. Currently the mapred/ndfs engine is just seen as one part of nutch and so it makes sense to have mapred/ndfs properties set in the same file as the rest of the nutch config properties

Re: mapred crawling exception - Job failed!

2006-01-06 Thread Lukas Vlcek
Huh... anybody interested in this? Normally I wouldn't be so pushy, but to me it seems that Nutch dies if it meets a Word document which can't be parsed. This seems like a serious issue to me. Or did I overlook something important/fundamental? Lukas On 1/6/06, Lukas Vlcek [EMAIL PROTECTED] wrote:

Re: mapred crawling exception - Job failed!

2006-01-05 Thread Andrzej Bialecki
Lukas Vlcek wrote: How can I learn that? What I do is running regular one-step command [/bin/nutch crawl] In that case your nutch-default.xml / nutch-site.xml decides, there is a boolean option there. If you didn't change this, then it defaults to true (i.e. your fetcher is parsing the

Re: mapred crawling exception - Job failed!

2006-01-05 Thread Lukas Vlcek
Hi, I found the reason of that exception! If you look into my crawl.log carefully then you notice these lines: 060104 213608 Parsing [http://220.000.000.001/otd_04_Detailed_Design_Document.doc] with [EMAIL PROTECTED] 060104 213609 Unable to successfully parse content

Re: mapred crawling exception - Job failed!

2006-01-04 Thread Gal Nitzan
Yes it was fixed. just update your code from trunk. On Wed, 2006-01-04 at 08:51 +0100, Andrzej Bialecki wrote: Lukas Vlcek wrote: Hi, I am trying to use the latest nutch-trunk version but I am facing unexpected Job failed! exception. It seems that all crawling work has been already done

Re: mapred crawling exception - Job failed!

2006-01-04 Thread Lukas Vlcek
Hmmm... If I am looking correctly into my local SVN copy then I see I last updated yesterday - thus I have revision 365850 (Update of HTTPClient to v3.0). So this should be already fixed... :-( Andrzej, since you did probably the fix, is there anything special I should check to be sure I have

Re: mapred crawling exception - Job failed!

2006-01-04 Thread Byron Miller
Fixed in the copy i run as i've been able to get my 100k pages indexed without getting that error. -byron --- Andrzej Bialecki [EMAIL PROTECTED] wrote: Lukas Vlcek wrote: Hi, I am trying to use the latest nutch-trunk version but I am facing unexpected Job failed! exception. It seems

Re: mapred crawling exception - Job failed!

2006-01-04 Thread Lukas Vlcek
Thanks guys! I really didn't have the latest copy... L. On 1/4/06, Byron Miller [EMAIL PROTECTED] wrote: Fixed in the copy i run as i've been able to get my 100k pages indexed without getting that error. -byron --- Andrzej Bialecki [EMAIL PROTECTED] wrote: Lukas Vlcek wrote: Hi,

Re: mapred crawling exception - Job failed!

2006-01-04 Thread Lukas Vlcek
I gave it a next try this night and I still have troubles. This is the very end of my log (full version is attached) and you can see another nasty exception: ... 060104 213644 map 100% 060104 213645 Optimizing index. java.lang.NullPointerException: value cannot be null at

Re: mapred crawling exception - Job failed!

2006-01-04 Thread Andrzej Bialecki
Lukas Vlcek wrote: I gave it a next try this night and I still have troubles. This is the very end of my log (full version is attached) and you can see another nasty exception: Do you use the Fetcher in parsing or non-parsing mode, i.e. do you run a ParseSegment as a separate step? --

mapred crawling exception - Job failed!

2006-01-03 Thread Lukas Vlcek
<property> <name>fetcher.verbose</name> <value>true</value> <description>If true, fetcher will log more verbosely.</description> </property> <property> <name>mapred.local.dir</name> <value>/home/lukas/nutch/mapred/local</value> <description>The local directory where MapReduce stores intermediate data files

Re: mapred crawling exception - Job failed!

2006-01-03 Thread Lukas Vlcek
Note: I mistakenly used nutch-user email for reply-to value. Feel free to reply to either nutch-dev or nutch-user as I monitor both of them :-) Anyway can anybody tell me how I can easily change reply-to value in gmail? I am struggling with this all the time especially when replying to multiple

Re: mapred crawling exception - Job failed!

2006-01-03 Thread Andrzej Bialecki
Lukas Vlcek wrote: Hi, I am trying to use the latest nutch-trunk version but I am facing unexpected Job failed! exception. It seems that all crawling work has been already done but some threads are hung, which results in an exception after some timeout. This was fixed (or should be fixed

[jira] Closed: (NUTCH-121) SegmentReader for mapred

2005-12-29 Thread Andrzej Bialecki (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-121?page=all ] Andrzej Bialecki closed NUTCH-121: --- Fix Version: 0.8-dev Resolution: Fixed Assign To: Andrzej Bialecki Commited. Thanks! SegmentReader for mapred

Latest version of Mapred

2005-12-19 Thread Rafi Iz
Hi all, I am currently working with Nutch 0.7.1, I want to start using the mapred, any ideas where I can find the latest version. B.T.W I looked at the path: http://svn.apache.org/repos/asf/lucene/nutch/branches/ but the only directory that exists there is branch-0.7/ Thanks, Raffi

Re: Latest version of Mapred

2005-12-19 Thread Stefan Groschupf
mapred is now trunk... Am 19.12.2005 um 18:46 schrieb Rafi Iz: Hi all, I am currently working with Nutch 0.7.1, I want to start using the mapred, any ideas where I can find the latest version. B.T.W I looked at the path: http://svn.apache.org/repos/asf/lucene/nutch/branches/ but the only

Re: Latest version of Mapred

2005-12-19 Thread Rafi Iz
Thanks for the fast response, Do you know where I can find a compressed version? Thanks, Rafi From: Stefan Groschupf [EMAIL PROTECTED] Reply-To: nutch-dev@lucene.apache.org To: nutch-dev@lucene.apache.org Subject: Re: Latest version of Mapred Date: Mon, 19 Dec 2005 19:00:29 +0100 mapred

Re: Latest version of Mapred

2005-12-19 Thread Jérôme Charron
Thanks for the fast response, Do you know where I can find a compressed version? Here are the nightly builds: http://cvs.apache.org/dist/lucene/nutch/nightly/ Regards Jérôme -- http://motrech.free.fr/ http://www.frutch.org/

mapred merge to trunk

2005-12-15 Thread Doug Cutting
Sami Siren wrote: +1. I think this is good time to merge now as the mapred is fully usable. Barring objections, I will do this tomorrow morning, Pacific time. Doug

mapred crawl

2005-11-23 Thread Anton Potehin
dedup.tmp After each iteration we produce new segment and may use it for search. Now we try mapred. How we can use crawl in similar way? We need results in process, but not in the end of crawling (since is very long process - weeks).

Re: svn commit: r348431 - in /lucene/nutch/branches/mapred/src/java/org/apache/nutch/crawl: CrawlDatum.java CrawlDbReader.java

2005-11-23 Thread Andrzej Bialecki
Sami Siren wrote: + if (k.contains(score)) { Since: 1.5 Ah, indeed. Fixed - thanks! -- Best regards, Andrzej Bialecki

Re: svn commit: r348431 - in /lucene/nutch/branches/mapred/src/java/org/apache/nutch/crawl: CrawlDatum.java CrawlDbReader.java

2005-11-23 Thread Doug Cutting
[EMAIL PROTECTED] wrote: Implement a reader for CrawlDB, loosely inspired by NUTCH-114 (thanks Stefan!). The reader offers similar functionality to the classic readdb command. This looks great! Thanks, Andrzej. I just ran it on a 50M page crawl. It took longer than I expected. The reduce

Re: svn commit: r348431 - in /lucene/nutch/branches/mapred/src/java/org/apache/nutch/crawl: CrawlDatum.java CrawlDbReader.java

2005-11-23 Thread Doug Cutting
Doug Cutting wrote: I just ran it on a 50M page crawl. FYI, here's the output: 051123 191703 TOTAL urls: 167780785 051123 191703 avg score:1.152 051123 191703 max score:47357.137 051123 191703 min score:1.0 051123 191703 retry 0: 167780785 051123 191703 status 1

Re: problem with inject url on mapred

2005-11-16 Thread Doug Cutting
[EMAIL PROTECTED] wrote: Yes, problem in negative progress percentages. Is /usr/root/seeds/urls the same file on all hosts? How big is it? Doug

Re: problem with inject url on mapred

2005-11-10 Thread Paul Baclace
[regarding mapred ver 0.8] Anton Potehin wrote: I tried to launch mapred on 2 machines: 192.168.0.250 and 192.168.0.111. 051123 053136 task_m_xaynqo -14885.741% /user/root/seeds/urls:31+31 Please help me to find out what the problem is? And what I did wrong? Is the problem the negative

Re: mapred bug -- bad part calculation?

2005-11-09 Thread Paul Baclace
Rod Taylor wrote: The attached patches for Generator.java and Injector.java allow a specific temporary directory to be specified. This gives Nutch the full path to these temporary directories and seems to fix the No input directories issue when using a local filesystem with multiple task

Re: mapred bug -- bad part calculation?

2005-11-08 Thread Doug Cutting
Rod Taylor wrote: The attached patches for Generator.java and Injector.java allow a specific temporary directory to be specified. This gives Nutch the full path to these temporary directories and seems to fix the No input directories issue when using a local filesystem with multiple task

Re: mapred bug -- bad part calculation?

2005-11-07 Thread Rod Taylor
/nutch-default.xml 051107 091256 parsing file:/opt/nutch-0.8_7/conf/mapred-default.xml 051107 091256 parsing file:/opt/nutch-0.8_7/conf/nutch-site.xml 051107 091256 parsing file:/opt/nutch-0.8_7/conf/nutch-default.xml 051107 091256 parsing file:/opt/nutch-0.8_7/conf/nutch-site.xml 051107 091256 Client

Re: mapred bug -- bad part calculation?

2005-11-07 Thread Massimo Miccoli
Hello Nutch devs, I have the same problems. I have 10 hosts and one master. For each host I have a datanode and tasktracker. My mapred conf is 100 maps and 25 reducers. Below are the logs with errors. Thanks 051107 144101 task_r_pd3ybk 0.224% reduce copy 051107 144102 Moving bad file /tmp

Re: mapred bug -- bad part calculation?

2005-11-07 Thread Rod Taylor
urls due for fetch. 051107 091256 parsing file:/opt/nutch-0.8_7/conf/nutch-default.xml 051107 091256 parsing file:/opt/nutch-0.8_7/conf/mapred-default.xml 051107 091256 parsing file:/opt/nutch-0.8_7/conf/nutch-site.xml 051107 091256 parsing file:/opt/nutch-0.8_7/conf/nutch-default.xml 051107

Re: mapred bug -- bad part calculation?

2005-11-07 Thread Paul Baclace
Rod Taylor wrote: The attached patches for Generator.java and Injector.java allow a specific temporary directory to be specified. This gives Nutch the full path to these temporary directories and seems to fix the No input directories issue when using a local filesystem with multiple task

Re: mapred bug -- bad part calculation?

2005-11-07 Thread Rod Taylor
On Mon, 2005-11-07 at 17:26 -0800, Paul Baclace wrote: Rod Taylor wrote: The attached patches for Generator.java and Injector.java allow a specific temporary directory to be specified. This gives Nutch the full path to these temporary directories and seems to fix the No input directories

Re: mapred bug -- bad part calculation?

2005-11-07 Thread Paul Baclace
Rod Taylor wrote: NDFS accomplishes the above path finding by auto-prefixing any path not beginning with / with a /user/$USER. I didn't think it was appropriate for LocalFileSystem.java to be mucking around trying to automatically adjust paths to what the user may have intended. Grep-ing for
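The prefixing rule Rod describes can be written down in a few lines. This is a minimal sketch of the rule as stated in the message, not the NDFS implementation. The function name `resolve_dfs_path` is hypothetical.

```python
import posixpath


def resolve_dfs_path(path, user):
    """Return the absolute path for `path`, as the message describes for NDFS.

    Paths that already begin with "/" are left alone. Anything else is
    placed under /user/<user>.
    """
    if path.startswith("/"):
        return path
    return posixpath.join("/user", user, path)


print(resolve_dfs_path("seeds/urls", "root"))
print(resolve_dfs_path("/opt/data/segments", "root"))
```

A local filesystem that resolves relative paths against each process's working directory has no equivalent of this rule, which may be why passing full paths to the temporary directories helped in this thread.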

Re: mapred bug -- bad part calculation?

2005-11-07 Thread Rod Taylor
On Mon, 2005-11-07 at 18:12 -0800, Paul Baclace wrote: Rod Taylor wrote: NDFS accomplishes the above path finding by auto-prefixing any path not beginning with / with a /user/$USER. I didn't think it was appropriate for LocalFileSystem.java to be mucking around trying to automatically

Re: mapred bug -- bad part calculation?

2005-11-05 Thread Stefan Groschupf
I tried running one datanode per machine connecting back to the same SAN but it seemed pretty clunky. SAN in general is a bad idea. A SAN is too slow for a serious setup. ... and it is the single point of failure... Better use many local hdd. Stefan

Re: mapred bug -- bad part calculation?

2005-11-04 Thread Doug Cutting
Rod Taylor wrote: Every segment that I fetch seems to be missing a part when stored on the filesystem. The stranger thing is it is always the same part (very reproducible). This sounds strange. Are the datanode errors always on the same host? How many hosts are you running this on? Doug

Re: mapred questions

2005-11-04 Thread Doug Cutting
Ken van Mulder wrote: First is that the fetcher slows down over time and continues to use more and more memory as it goes (which I think is eventually hanging the process). What parser plugins do you have enabled? These are usually the culprit. Try using 'kill -QUIT' to see what various

Re: mapred bug -- bad part calculation?

2005-11-04 Thread Rod Taylor
On Fri, 2005-11-04 at 13:43 -0800, Doug Cutting wrote: Rod Taylor wrote: Every segment that I fetch seems to be missing a part when stored on the filesystem. The stranger thing is it is always the same part (very reproducible). This sounds strange. Are the datanode errors always on the

Re: mapred bug -- bad part calculation?

2005-11-04 Thread Rod Taylor
On Fri, 2005-11-04 at 13:43 -0800, Doug Cutting wrote: Rod Taylor wrote: Every segment that I fetch seems to be missing a part when stored on the filesystem. The stranger thing is it is always the same part (very reproducible). This sounds strange. Are the datanode errors always on the

Re: mapred bug -- bad part calculation?

2005-11-04 Thread Doug Cutting
Rod Taylor wrote: There is only a single datanode and there are 20 hosts. That's a lot of load on one datanode. I typically run a datanode on every host, accessing the local drives on that host. Doug

Re: mapred bug -- bad part calculation?

2005-11-04 Thread Doug Cutting
Rod Taylor wrote: I tried running one datanode per machine connecting back to the same SAN but it seemed pretty clunky. A crash of any datanode would take down the entire system (no data replication since it's a common data-store in the end). Reducing it to a single datanode did not have this

Re: mapred bug -- bad part calculation?

2005-11-04 Thread Rod Taylor
in the end). Reducing it to a single datanode did not have this impact. Why use NDFS at all? Why not just mount the SAN on all hosts? You're not using NDFS as a distributed file system, but rather as a centralized file system. I was unable to make the mapred branch work by using 'local

Re: mapred bug -- bad part calculation?

2005-11-04 Thread Rod Taylor
was unable to make the mapred branch work by using 'local' as the filesystem and having more than one tasktracker. Tasktrackers were unable to complete any work, although it was quite a while ago when I last tried (September). Here you go. local filesystem and a single job tracker on another

Re: mapred bug -- bad part calculation?

2005-11-04 Thread Doug Cutting
Rod Taylor wrote: Here you go. local filesystem and a single job tracker on another machine. When the tasktracker and jobtracker are on the same box there isn't a problem. When they are on different machines it runs into issues. This is using mapred.local.dir on the local machine (not sharedd

Re: mapred bug -- bad part calculation?

2005-11-04 Thread Rod Taylor
235806 Lost connection to JobTracker [sbider5.sitebuildit.com/192.168.100.14:5464]. Retrying... 051104 235811 parsing file:/opt/nutch-0.8_7/conf/nutch-default.xml 051104 235811 parsing file:/opt/nutch-0.8_7/conf/mapred-default.xml 051104 235811 parsing /home/sitesell/local/taskTracker/task_r_mdnul7

mapred bug -- bad part calculation?

2005-11-03 Thread Rod Taylor
Sources are from October 31st. Sun Standard Edition 1.5.0_02-b09 for amd64 Every segment that I fetch seems to be missing a part when stored on the filesystem. The stranger thing is it is always the same part (very reproducible). If I have mapred.reduce.tasks set to 20, the hole is at part 13.
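For readers unfamiliar with how a segment is split into parts, the sketch below shows one common scheme: each key is assigned to one of N reduce tasks and each task writes one `part-NNNNN` output. Hash partitioning and the CRC32 stand-in are assumptions for illustration, not taken from the Nutch sources. The helper names are hypothetical. Only the `part-00013` naming pattern comes from this thread.

```python
import zlib


def partition_for(key, num_reduce_tasks):
    """Assign a key to a reduce task by hashing (assumed scheme)."""
    return zlib.crc32(key.encode("utf-8")) % num_reduce_tasks


def part_name(index):
    """Format a part index the way the thread's listing shows it."""
    return f"part-{index:05d}"


def missing_parts(present_names, num_reduce_tasks):
    """List expected part names that are absent from a directory listing."""
    expected = [part_name(i) for i in range(num_reduce_tasks)]
    return [name for name in expected if name not in present_names]


# Simulate a segment written by 20 reduce tasks with one part lost.
present = {part_name(i) for i in range(20) if i != 13}
print(missing_parts(present, 20))
```

A check like `missing_parts` is a quick way to confirm whether the hole is always at the same index when `mapred.reduce.tasks` is changed.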

Re: mapred bug -- bad part calculation?

2005-11-03 Thread Rod Taylor
I forgot to provide this earlier. Here is nutch ndfs -ls output for the directory structure of a segment with a failed part-00013. [EMAIL PROTECTED] ~]$ /opt/nutch/bin/nutch ndfs -ls /opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133 051103 162002 parsing

Re: mapred patch for improved error message and some javadoc comments

2005-09-19 Thread Doug Cutting
Paul Baclace wrote: Here is a patch for improving the error message that is displayed when an intranet crawl commandline has a file instead of a directory of files containing URLs. I have committed this to the mapred branch. Thanks, Paul! Doug

Re: merge mapred to trunk

2005-09-15 Thread Doug Cutting
I will postpone the merge of the mapred branch into trunk until I have a chance to (a) add some MapReduce documentation; and (b) implement MapReduce-based dedup. Doug Doug Cutting wrote: Currently we have three versions of nutch: trunk, 0.7 and mapred. This increases the chances

Re: [Nutch-cvs] svn commit: r280368 - /lucene/nutch/branches/mapred/src/java/org/apache/nutch/fs/TestClient.java

2005-09-12 Thread Andrzej Bialecki
[EMAIL PROTECTED] wrote: Author: cutting Date: Mon Sep 12 10:03:00 2005 New Revision: 280368 URL: http://svn.apache.org/viewcvs?rev=280368&view=rev Log: Change so that -du and -ls commands work with zero arguments. Come to think of that... Shouldn't the enigmatic TestClient be renamed to

nutch/mapred tutorial

2005-09-07 Thread Earl Cahill
howdy, I have been looking around for a nutch/mapred tutorial and haven't had much luck. I found this one http://lucene.apache.org/nutch/tutorial.html which did help me get a crawl going on trunk, but no such luck in branches/mapred. I set the urls file and the filter in the same way that I

Re: nutch/mapred tutorial

2005-09-07 Thread Fredrik Andersson
--- Earl Cahill [EMAIL PROTECTED] wrote: howdy, I have been looking around for a nutch/mapred tutorial and haven't had much luck. I found this one http://lucene.apache.org/nutch/tutorial.html which did help me get a crawl going on trunk, but no such luck in branches/mapred

Re: To mapred or not

2005-09-01 Thread Stefan Groschupf
In some cases, though, focused crawling requirements may require extra data to be stored, which is not useful for whole-web, for example, storing a url's parent and seed url and its depth (essential for crawl scopes). Sounds like meta data for a page. :) Some time ago I submit a patch to

Re: merge mapred to trunk

2005-08-31 Thread Piotr Kosiorowski
Doug Cutting wrote: Currently we have three versions of nutch: trunk, 0.7 and mapred. This increases the chances for conflicts. I would thus like to merge the mapred branch into trunk soon. The soonest I could actually start this is next week. Are there any objections? Doug +1 P.

Re: merge mapred to trunk

2005-08-31 Thread ogjunk-nutch
Currently we have three versions of nutch: trunk, 0.7 and mapred. This increases the chances for conflicts. I would thus like to merge the mapred branch into trunk soon. The soonest I could actually start this is next week. Are there any objections? I, too, am looking forward

Re: merge mapred to trunk

2005-08-31 Thread Doug Cutting
[EMAIL PROTECTED] wrote: I, too, am looking forward to this, but I am wondering what that will do to Kelvin Tan's recent contribution, especially since I saw that both MapReduce and Kelvin's code change how FetchListEntry works. If merging mapred to trunk means losing Kelvin's changes, then I

Re: merge mapred to trunk

2005-08-31 Thread ogjunk-nutch
mapred to trunk means losing Kelvin's changes, then I suggest one of Nutch developers evaluates Kelvin's modifications and, if they are good, commits them to trunk, and then makes the final pre-mapred release (e.g. release-0.8). It won't lose Kelvin's patch: it will still be a patch to 0.7

Re: merge mapred to trunk

2005-08-31 Thread Kelvin Tan
. If merging mapred to trunk means losing Kelvin's changes, then I suggest one of Nutch developers evaluates Kelvin's modifications and, if they are good, commits them to trunk, and then makes the final pre-mapred release (e.g. release-0.8). It won't lose Kelvin's patch: it will still be a patch

Re: [mapred] Possible bug, static primitives holding config values?

2005-08-30 Thread Doug Cutting
tasks, but have run into issues with the tasks timing out. I attempted to both override the mapred.tasks.timeout option in mapred-default.xml and in the actual code for my Mapper class, but my timeout durations remained steady at the default 10 minutes. I looked at TaskTracker and I see

(mapred branch) Job.xml as a directory instead of a file, other issues.

2005-08-16 Thread Jeremy Bensley
I have been attempting to get the mapred branch version of the crawler working and have hit some snags. First, I have observed the same behavior as a previous poster from yesterday who, instead of specifying a file for the URLs to be read from, must now specify a directory (full path) to which

Re: mapred

2005-08-15 Thread Doug Cutting
Jay Pound wrote: is the org.apache.nutch.crawl package a part of the nightly builds? No. Nightly builds are from trunk. The mapred code is in a separate branch in subversion. After the 0.7 release, when the mapred branch is folded into trunk, then it will be in nightly builds. Until

Re: MapRed - Injector - urlDir - Format?

2005-08-15 Thread Doug Cutting
Fuad Efendi wrote: Which parameter should I pass to Crawl? It should be directory containing smth. in which format? As before, inject takes a flat text files of urls, one per line. If you wish to inject DMOZ urls, there is now a utility main() that will convert the DMOZ file to such a file.
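The input layout discussed in this thread (a directory containing flat text files, one URL per line) can be illustrated as follows. This is a sketch under stated assumptions, not the Injector code. The function name `read_seed_urls` is hypothetical, and skipping blank lines is an assumption.

```python
import os
import tempfile


def read_seed_urls(url_dir):
    """Collect URLs from every plain-text file directly inside url_dir."""
    if not os.path.isdir(url_dir):
        # Mirrors the situation in the thread where a file was passed
        # where a directory was expected.
        raise ValueError(f"expected a directory of URL files, got: {url_dir}")
    urls = []
    for name in sorted(os.listdir(url_dir)):
        full = os.path.join(url_dir, name)
        if not os.path.isfile(full):
            continue
        with open(full, encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if line:  # assumption: blank lines are ignored
                    urls.append(line)
    return urls


# Build a throwaway seed directory holding a single URL file.
with tempfile.TemporaryDirectory() as seed_dir:
    with open(os.path.join(seed_dir, "urls"), "w", encoding="utf-8") as out:
        out.write("http://lucene.apache.org/nutch/\n\nhttp://www.apache.org/\n")
    print(read_seed_urls(seed_dir))
```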

RE: MapRed - Injector - urlDir - Format?

2005-08-15 Thread Fuad Efendi
Thanks, It works now, I pass a folder to Crawl containing plain text file with URLs. I am testing, and I pass single URL. At some point I have: 050815 162137 parsing \tmp\nutch\mapred\local\job_q3s4ai.xml 050815 162137 parsing file:/C:/workspace/MapRed/conf/nutch-site.xml java.io.IOException

mapred

2005-08-12 Thread webmaster
I need some help with how to use mapred, what are the commands to use with it? Thanks, Jay Pound -- Pound Web Hosting www.poundwebhosting.com (607)-435-3048

mapred question

2005-08-06 Thread Jay Pound
how would I setup mapred for smp machines, I understand it will split up big jobs like indexing or updating the db into a bunch of chunks to be processed by separate machines, I have machines that are multiple processor machines that I want to test this with internally, makes sense to utilize

mapred branch Revision 226742

2005-08-01 Thread Yitao Duan
I saw this revision fixed something that has been puzzling me. However, if the fix is applied, NDFS can't handle 0-byte files anymore. It will simply hang. I didn't look into the code yet. Maybe this case is something that needs to be handled specially? Yitao

NDFS Bug, Mapred from SVN - Tokenizer and New Line Error

2005-07-28 Thread Jon Shoberg
I'm trying to start a NDFS datanode and keep getting the following error: [EMAIL PROTECTED] nutchmapre]$ bin/nutch datanode 050728 213401 10 parsing file:/usr/local/nutchmapre/conf/nutch-default.xml 050728 213402 10 parsing file:/usr/local/nutchmapre/conf/nutch-site.xml 050728 213402 10 Opened