Re: VOTE: clustering plugin update for Rel 0.7

2005-08-15 Thread Dawid Weiss
Ok, let it be 0.7.1 then, Andrzej. No worries. D. Andrzej Bialecki wrote: Hi, This is yet another request for exception from the no-commit rule before release ... *sigh* Dawid Weiss reported that he prepared an updated version of the Carrot2 clustering plugin, which contains significant u

RE: MapRed - Injector - urlDir - Format?

2005-08-15 Thread Fuad Efendi
I downloaded the code just a few hours ago... Windows XP; I have SUSE Linux 9.3 on another PC but I am too lazy... If nobody has this error under Linux, I suppose I am wrong... I run this inside Eclipse, J2SE 1.4.2_08, with classpath links to CONF and the directory containing Plugins. I need to check c

Re: MapRed - Injector - urlDir - Format?

2005-08-15 Thread Doug Cutting
Fuad Efendi wrote: It works now, I pass a folder to Crawl containing a plain text file with URLs. I am testing, and I pass a single URL. At some point I have: 050815 162137 parsing \tmp\nutch\mapred\local\job_q3s4ai.xml 050815 162137 parsing file:/C:/workspace/MapRed/conf/nutch-site.xml java.io.IOEx

RE: MapRed - Injector - urlDir - Format?

2005-08-15 Thread Fuad Efendi
Thanks, It works now, I pass a folder to Crawl containing a plain text file with URLs. I am testing, and I pass a single URL. At some point I have: 050815 162137 parsing \tmp\nutch\mapred\local\job_q3s4ai.xml 050815 162137 parsing file:/C:/workspace/MapRed/conf/nutch-site.xml java.io.IOException: Fil

Re: MapRed - Injector - urlDir - Format?

2005-08-15 Thread Doug Cutting
Fuad Efendi wrote: Which parameter should I pass to Crawl? It should be a directory containing something, but in which format? As before, inject takes flat text files of URLs, one per line. If you wish to inject DMOZ URLs, there is now a utility main() that will convert the DMOZ file to such a file.
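The input format described above (a directory holding flat text files, one URL per line) can be prepared with a few lines of Java. This is only a minimal sketch: the class name `SeedDir`, the file name `seeds.txt`, and the example URLs are hypothetical, and it uses `java.nio.file` (Java 7+) rather than anything from Nutch itself.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Nutch: prepares a URL directory for inject.
class SeedDir {

    /** Writes the given URLs, one per line, to dir/fileName and returns the file. */
    static Path writeSeeds(Path dir, String fileName, List<String> urls) throws IOException {
        Files.createDirectories(dir);            // make sure the URL directory exists
        Path file = dir.resolve(fileName);
        Files.write(file, urls, StandardCharsets.UTF_8); // each list element becomes one line
        return file;
    }

    public static void main(String[] args) throws IOException {
        // A temporary directory stands in for the real URL directory.
        Path dir = Files.createTempDirectory("urls");
        Path file = writeSeeds(dir, "seeds.txt",
                Arrays.asList("http://www.example.com/", "http://www.example.org/"));
        System.out.println("Wrote " + Files.readAllLines(file).size() + " URLs to " + file);
    }
}
```

The resulting directory (not the file) is what would be passed as the URL directory argument, as described in the thread.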

MapRed - Injector - urlDir - Format?

2005-08-15 Thread Fuad Efendi
Which parameter should I pass to Crawl? It should be a directory containing something, but in which format? Thanks, Fuad

[jira] Updated: (NUTCH-83) Release deliverable as zip

2005-08-15 Thread AJ Banck (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-83?page=all ] AJ Banck updated NUTCH-83: -- Attachment: build_zip.patch Patch to build.xml that adds a "package-zip" target and calls it from the dist target. > Release deliverable as zip > -

Re: page ranking weights

2005-08-15 Thread Piotr Kosiorowski
The boost for a page may be calculated in a few different ways (and in a few different places in Nutch): 1) PageRank-based score - calculated by the "nutch analyze" command based on WebDB - during fetchlist generation, scores from WebDB are stored in the segment - indexing phase uses scor

[jira] Created: (NUTCH-83) Release deliverable as zip

2005-08-15 Thread AJ Banck (JIRA)
Release deliverable as zip -- Key: NUTCH-83 URL: http://issues.apache.org/jira/browse/NUTCH-83 Project: Nutch Type: Improvement Environment: Windows Reporter: AJ Banck Like Lucene, Nutch could be delivered as a .zip file so it can be us

Re: FW: Fetcher, ParseText, ParseData - need to modify

2005-08-15 Thread Piotr Kosiorowski
Hello, To change Nutch's standard HTML parsing, the best place to start would probably be the parse-html plugin. Regards Piotr Fuad Efendi wrote: 1. This is part of ParseText: Any Accessories Backup Devices & Media Barebone Systems Camcorder Accessories Camcorders Cases & External Enclosures CD / DVD

Re: VOTE: clustering plugin update for Rel 0.7

2005-08-15 Thread Jérôme Charron
-1 Maybe it would be a better idea to go for a 0.7 branch and schedule a new > 0.7.1 release in a short time? But +1 to include it in a 0.7.1 release !! Regards Jérôme -- http://motrech.free.fr/ http://www.frutch.org/

[jira] Closed: (NUTCH-80) Web UI only works when project deployed in root

2005-08-15 Thread Piotr Kosiorowski (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-80?page=all ] Piotr Kosiorowski closed NUTCH-80: -- Resolution: Invalid Closed according to reporter comment. > Web UI only works when project deployed in root > --

Re: VOTE: clustering plugin update for Rel 0.7

2005-08-15 Thread Piotr Kosiorowski
Hi, Maybe it would be a better idea to go for a 0.7 branch and schedule a new 0.7.1 release in a short time? It is difficult for me to judge whether a patch I have not seen is good for release. So I would say 0 from me (if you think it is good enough I will not object). Regards, Piotr Andrzej Bialeck

Re: VOTE: clustering plugin update for Rel 0.7

2005-08-15 Thread Dawid Weiss
I'm not desperate to have it included. But it is stable and I don't think you need to worry. I checked that it works with the newest SVN code (I was mostly worried about the glue code -- the extension). I wouldn't worry much about the clustering part that's compiled into JARs -- these are un

Re: [Nutch-dev] turn on Log

2005-08-15 Thread Hasan Diwan
Michael: On Aug 14, 2005, at 3:54 PM, Michael Ji wrote: I saw several LOG calls for debugging the code, such as LOG.fine and LOG.warning; I wonder if the debugging text is output to a particular log file. If so, which command could I use to turn it on? Though Nutch uses the Jakarta commons logging syst

RE: VOTE: clustering plugin update for Rel 0.7

2005-08-15 Thread Sébastien LE CALLONNEC
-1 (Not that my vote really matters but in fairness, I think that's too short a notice for you, guys. I am sure Dawid did a great job, but better be cautious IMHO). Sébastien. --- Andrzej Bialecki <[EMAIL PROTECTED]> a écrit : > Hi, > > This is yet another request for exception from the no-co

[jira] Updated: (NUTCH-82) Nutch Commands should run on Windows without external tools

2005-08-15 Thread AJ Banck (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-82?page=all ] AJ Banck updated NUTCH-82: -- Attachment: nutch.bat Update, remove obsolete comment > Nutch Commands should run on Windows without external tools > ---

[jira] Updated: (NUTCH-82) Nutch Commands should run on Windows without external tools

2005-08-15 Thread AJ Banck (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-82?page=all ] AJ Banck updated NUTCH-82: -- Attachment: nutch.bat nutch.bat to be placed in bin folder. This allows running all Nutch commandline tools from Windows. Tested for Windows 2000 and XP. > Nutch Commands s

[jira] Created: (NUTCH-82) Nutch Commands should run on Windows without external tools

2005-08-15 Thread AJ Banck (JIRA)
Nutch Commands should run on Windows without external tools --- Key: NUTCH-82 URL: http://issues.apache.org/jira/browse/NUTCH-82 Project: Nutch Type: New Feature Environment: Windows 2000 Reporter: AJ Banck

VOTE: clustering plugin update for Rel 0.7

2005-08-15 Thread Andrzej Bialecki
Hi, This is yet another request for exception from the no-commit rule before release ... *sigh* Dawid Weiss reported that he prepared an updated version of the Carrot2 clustering plugin, which contains significant updates and improvements. He suggests that it would be a good idea to include

Re: mapred

2005-08-15 Thread Doug Cutting
Jay Pound wrote: is the org.apache.nutch.crawl package a part of the nightly builds? No. Nightly builds are from trunk. The mapred code is in a separate branch in subversion. After the 0.7 release, when the mapred branch is folded into trunk, then it will be in nightly builds. Until then

Re: mapred

2005-08-15 Thread Jay Pound
is the org.apache.nutch.crawl package a part of the nightly builds? -J - Original Message - From: "Doug Cutting" <[EMAIL PROTECTED]> To: Sent: Monday, August 15, 2005 1:13 PM Subject: Re: mapred > webmaster wrote: > > I need some help with how to use mapred, what are the commands to use

FW: Fetcher, ParseText, ParseData - need to modify

2005-08-15 Thread Fuad Efendi
1. This is part of ParseText: Any Accessories Backup Devices & Media Barebone Systems Camcorder Accessories Camcorders Cases & External Enclosures CD / DVD Drives & Media Cooling Devices Digital Camera Accessories Digital Cameras - it is content of Dropdown, in HTML 2. I have some sub-text in P

Fetcher, ParseText, ParseData - need to modify

2005-08-15 Thread Fuad Efendi
I just caught some output from Fetcher.FetcherThread.outputPage(.) and noticed that some anchors are in the text, and some tags are within the text too. LOG.info("ParseText = "+text); LOG.info("ParseData = "+ parseData); I'd like to modify the behaviour: ParseText should contain subset o

Re: mapred

2005-08-15 Thread Doug Cutting
webmaster wrote: I need some help with how to use mapred, what are the commands to use with it? The mapred work is in progress and is not yet ready for production use. In the mapred branch most of the Nutch backend crawling and indexing commands have been rewritten in terms of MapReduce. The

Re: Indexing the whole WebDB or get Pages out of WebDB that are Indexed

2005-08-15 Thread Doug Cutting
Nils Hoeller wrote: is there a way to index the whole WebDB, which means the normal sites that have been indexed + the sites that are one depth deeper and so are only stored in the WebDB? This is supposed to be possible, but I think no one has tried this in a while and fear it may no longe

[jira] Commented: (NUTCH-81) Webapp only works when deployed in root

2005-08-15 Thread AJ Banck (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-81?page=comments#action_12318805 ] AJ Banck commented on NUTCH-81: --- The description is inaccurate: the current index.jsp does a forward, not a redirect. If a forward is needed, the XSL or XML must be changed. > Weba

[jira] Commented: (NUTCH-80) Web UI only works when project deployed in root

2005-08-15 Thread AJ Banck (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-80?page=comments#action_12318803 ] AJ Banck commented on NUTCH-80: --- Please close this case. Created by mistake. > Web UI only works when project deployed in root > --- >

[jira] Updated: (NUTCH-81) Webapp only works when deployed in root

2005-08-15 Thread AJ Banck (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-81?page=all ] AJ Banck updated NUTCH-81: -- Attachment: nutch_indexjsp_redirect.txt Possible solution for working from a non-root application, by doing a redirect with the language path. This changes the URL in t

[jira] Created: (NUTCH-81) Webapp only works when deployed in root

2005-08-15 Thread AJ Banck (JIRA)
Webapp only works when deployed in root --- Key: NUTCH-81 URL: http://issues.apache.org/jira/browse/NUTCH-81 Project: Nutch Type: Bug Components: web gui Environment: Windows 2000 / Jakarta Tomcat 5.0.25 / JDK 1.5 / 0.7-dev fr

[jira] Created: (NUTCH-80) Web UI only works when project deployed in root

2005-08-15 Thread AJ Banck (JIRA)
Web UI only works when project deployed in root --- Key: NUTCH-80 URL: http://issues.apache.org/jira/browse/NUTCH-80 Project: Nutch Type: Bug Components: web gui Environment: Windows / Jakarta Tomcat 5.0.25 / JDK 1.5

Re: Injecting documents manually.

2005-08-15 Thread Dawid Weiss
Thanks, this helps. D. Andrzej Bialecki wrote: Andy Liu wrote: This is built into Nutch. Instead of injecting http:// URLs, use file:// , and Nutch will use protocol-file to fetch the files locally. Andy Also, there is a tool I created to skip importing these URLs into the database first..
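Andy's suggestion to inject file:// URLs instead of http:// ones needs each local path turned into a URL string first. A minimal sketch of that conversion using only the standard library follows; the class name `FileUrls` and the example path are hypothetical, and nothing here calls into Nutch or the tool Andrzej mentions.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Nutch: builds file:// URLs for local documents.
class FileUrls {

    /** Returns the absolute file:// URL for a local path, e.g. file:///home/user/doc.txt. */
    static String toFileUrl(Path path) {
        // toUri() resolves the path against the working directory and escapes
        // characters such as spaces that are not legal in a URL.
        return path.toAbsolutePath().normalize().toUri().toString();
    }

    public static void main(String[] args) {
        // Prints one URL per argument, ready to be written one per line
        // into a flat text file of URLs for injection.
        String[] inputs = args.length > 0 ? args : new String[] {"docs/report.txt"};
        for (String arg : inputs) {
            System.out.println(toFileUrl(Paths.get(arg)));
        }
    }
}
```

The printed lines could then be saved to a flat text file and injected in the usual way, assuming the protocol-file plugin is enabled in the configuration.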