Ok, let it be 0.7.1 then, Andrzej. No worries.
D.
Andrzej Bialecki wrote:
Hi,
This is yet another request for exception from the no-commit rule before
release ... *sigh*
Dawid Weiss reported that he prepared an updated version of the Carrot2
clustering plugin, which contains significant u
I downloaded code just a few hours ago... Windows XP, I have a Suse
Linux 9.3 on another PC but I am too lazy...
If nobody has this error under Linux, I suppose I am wrong...
I run this inside Eclipse, J2SE 1.4.2_08, with classpath links to CONF
and directory containing Plugins. I need to check c
Fuad Efendi wrote:
It works now; I pass Crawl a folder containing a plain text file with
URLs. I am testing, and I pass a single URL.
At some point I have:
050815 162137 parsing \tmp\nutch\mapred\local\job_q3s4ai.xml
050815 162137 parsing file:/C:/workspace/MapRed/conf/nutch-site.xml
java.io.IOEx
Thanks,
It works now; I pass Crawl a folder containing a plain text file with
URLs. I am testing, and I pass a single URL.
At some point I have:
050815 162137 parsing \tmp\nutch\mapred\local\job_q3s4ai.xml
050815 162137 parsing file:/C:/workspace/MapRed/conf/nutch-site.xml
java.io.IOException: Fil
Fuad Efendi wrote:
Which parameter should I pass to Crawl? It should be a directory
containing something, but in which format?
As before, inject takes a flat text file of URLs, one per line. If you
wish to inject DMOZ urls, there is now a utility main() that will
convert the DMOZ file to such a file.
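A minimal sketch of preparing such a seed file, assuming only what is said above (a directory holding a flat text file with one URL per line). The class name, method name, file name and example URLs are made up for illustration and are not part of Nutch.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class SeedFileSketch {
    /**
     * Writes the given URLs, one per line, into a plain text file
     * inside seedDir and returns the path of that file.
     * (Hypothetical helper, not a Nutch API.)
     */
    static Path writeSeedFile(Path seedDir, String fileName, List<String> urls)
            throws IOException {
        Files.createDirectories(seedDir);           // make sure the directory exists
        Path seedFile = seedDir.resolve(fileName);
        Files.write(seedFile, urls, StandardCharsets.UTF_8); // one list element per line
        return seedFile;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("seeds");
        Path file = writeSeedFile(dir, "urls.txt",
                Arrays.asList("http://example.org/", "http://example.com/"));
        System.out.println("Seed file written to " + file);
    }
}
```

The directory (not the file) would then be the argument handed to the crawl command, going by the messages in this thread.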
Which parameter should I pass to Crawl? It should be a directory
containing something, but in which format?
Thanks,
Fuad
[ http://issues.apache.org/jira/browse/NUTCH-83?page=all ]
AJ Banck updated NUTCH-83:
--
Attachment: build_zip.patch
Patch for build.xml that adds a "package-zip" target and calls it from the dist
target.
> Release deliverable as zip
> -
The boost for a page may be calculated in a few different ways (and in a few
different places in Nutch):
1) PageRank based score
- calculated by "nutch analyze" command based on WebDB
- during fetchlist generation scores from WebDB are stored in segment
- indexing phase uses scor
Release deliverable as zip
--
Key: NUTCH-83
URL: http://issues.apache.org/jira/browse/NUTCH-83
Project: Nutch
Type: Improvement
Environment: Windows
Reporter: AJ Banck
Like Lucene, Nutch could be delivered as a .zip file so it can be us
Hello,
To change Nutch's standard HTML parsing, the best place to start would
probably be the parse-html plugin.
Regards
Piotr
Fuad Efendi wrote:
1. This is part of ParseText:
Any Accessories Backup Devices & Media Barebone Systems Camcorder
Accessories Camcorders Cases & External Enclosures CD / DVD
-1
> Maybe it would be a better idea to go for 0.7 branch and schedule a new
> 0.7.1 release in short time?
But +1 to include it in a 0.7.1 release !!
Regards
Jérôme
--
http://motrech.free.fr/
http://www.frutch.org/
[ http://issues.apache.org/jira/browse/NUTCH-80?page=all ]
Piotr Kosiorowski closed NUTCH-80:
--
Resolution: Invalid
Closed according to reporter comment.
> Web UI only works when project deployed in root
> --
Hi,
Maybe it would be a better idea to go for 0.7 branch and schedule a new
0.7.1 release in short time?
It is difficult for me to judge if the patch I had not seen is good for
release. So I would say 0 from me (if you think it is good enough I will
not object).
Regards,
Piotr
Andrzej Bialeck
I'm not desperate to have it included. But it is stable and I don't
think you need to worry.
I checked that it works with the newest SVN code (I was mostly worried
about the glue code -- the extension). I wouldn't worry much about the
clustering part that's compiled into JARs -- these are un
Michael:
On Aug 14, 2005, at 3:54 PM, Michael Ji wrote:
I saw several LOG calls used to debug the code, such as
LOG.fine and LOG.warning;
I wonder if the debugging text is output to a particular
log file. If so, which command could I use to
turn it on?
Though nutch uses the Jakarta commons logging syst
-1
(Not that my vote really matters but in fairness, I think that's too
short a notice for you, guys. I am sure Dawid did a great job, but
better be cautious IMHO).
Sébastien.
--- Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> Hi,
>
> This is yet another request for exception from the no-co
[ http://issues.apache.org/jira/browse/NUTCH-82?page=all ]
AJ Banck updated NUTCH-82:
--
Attachment: nutch.bat
Update, remove obsolete comment
> Nutch Commands should run on Windows without external tools
> ---
[ http://issues.apache.org/jira/browse/NUTCH-82?page=all ]
AJ Banck updated NUTCH-82:
--
Attachment: nutch.bat
nutch.bat to be placed in bin folder.
This allows running all Nutch commandline tools from Windows. Tested for
Windows 2000 and XP.
> Nutch Commands s
Nutch Commands should run on Windows without external tools
---
Key: NUTCH-82
URL: http://issues.apache.org/jira/browse/NUTCH-82
Project: Nutch
Type: New Feature
Environment: Windows 2000
Reporter: AJ Banck
Hi,
This is yet another request for exception from the no-commit rule before
release ... *sigh*
Dawid Weiss reported that he prepared an updated version of the Carrot2
clustering plugin, which contains significant updates and improvements.
He suggests that it would be a good idea to include
Jay Pound wrote:
is the org.apache.nutch.crawl package a part of the nightly builds?
No. Nightly builds are from trunk. The mapred code is in a separate
branch in subversion. After the 0.7 release, when the mapred branch is
folded into trunk, then it will be in nightly builds. Until then
is the org.apache.nutch.crawl package a part of the nightly builds?
-J
- Original Message -
From: "Doug Cutting" <[EMAIL PROTECTED]>
To:
Sent: Monday, August 15, 2005 1:13 PM
Subject: Re: mapred
> webmaster wrote:
> > I need some help with how to use mapred, what are the commands to use
1. This is part of ParseText:
Any Accessories Backup Devices & Media Barebone Systems Camcorder
Accessories Camcorders Cases & External Enclosures CD / DVD Drives &
Media Cooling Devices Digital Camera Accessories Digital Cameras
- this is the content of a dropdown in the HTML
2. I have some sub-text in P
I just caught some output from Fetcher.FetcherThread.outputPage(.) and
noticed that some anchors are in the text, and some tags are within
the text too.
LOG.info("ParseText = "+text);
LOG.info("ParseData = "+ parseData);
I'd like to modify behaviour, ParseText should contain subset o
webmaster wrote:
I need some help with how to use mapred, what are the commands to use with it?
The mapred work is in progress and is not yet ready for production use.
In the mapred branch most of the Nutch backend crawling and indexing
commands have been rewritten in terms of MapReduce. The
Nils Hoeller wrote:
is there a way to index the whole WebDB,
which means the normal sites that have been indexed
+ the sites that are one depth deeper and so
are only stored in the WebDB
This is supposed to be possible, but I think no one has tried this in a
while and fear it may no longe
[
http://issues.apache.org/jira/browse/NUTCH-81?page=comments#action_12318805 ]
AJ Banck commented on NUTCH-81:
---
The description is inaccurate: the current index.jsp does a forward, not a redirect.
If a forward is needed, the XSL or XML must be changed.
> Weba
[
http://issues.apache.org/jira/browse/NUTCH-80?page=comments#action_12318803 ]
AJ Banck commented on NUTCH-80:
---
Please close this case. Created by mistake.
> Web UI only works when project deployed in root
> ---
>
[ http://issues.apache.org/jira/browse/NUTCH-81?page=all ]
AJ Banck updated NUTCH-81:
--
Attachment: nutch_indexjsp_redirect.txt
Possible solution for working from a non-root application. It works by doing a
redirect with the language path. This changes the URL in t
Webapp only works when deployed in root
---
Key: NUTCH-81
URL: http://issues.apache.org/jira/browse/NUTCH-81
Project: Nutch
Type: Bug
Components: web gui
Environment: Windows 2000 / Jakarta Tomcat 5.0.25 / JDK 1.5 / 0.7-dev fr
Web UI only works when project deployed in root
---
Key: NUTCH-80
URL: http://issues.apache.org/jira/browse/NUTCH-80
Project: Nutch
Type: Bug
Components: web gui
Environment: Windows / Jakarta Tomcat 5.0.25 / JDK 1.5
Thanks, this helps.
D.
Andrzej Bialecki wrote:
Andy Liu wrote:
This is built into Nutch. Instead of injecting http:// url's, use
file:// , and Nutch will use protocol-file to fetch the files locally.
Andy
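A minimal sketch of turning local paths into file: URLs that could go into the injected URL list, as Andy describes. The class and method names and the example path are hypothetical. The code only shows the path-to-URL conversion with the standard Java library and does not call any Nutch code.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class FileUrlSketch {
    /**
     * Converts a local filesystem path into an absolute file: URL string.
     * (Hypothetical helper, not a Nutch API.)
     */
    static String toFileUrl(String localPath) {
        // Resolve against the working directory and drop "." / ".." segments.
        Path absolute = Paths.get(localPath).toAbsolutePath().normalize();
        // Path.toUri() yields a URI with the "file" scheme.
        return absolute.toUri().toString();
    }

    public static void main(String[] args) {
        System.out.println(toFileUrl("docs/readme.txt"));
    }
}
```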
Also, there is a tool I created to skip importing these URLs into
database first..