Build failed in Jenkins: Nutch-trunk #1572

2011-08-09 Thread Apache Jenkins Server
See -- [...truncated 924 lines...] A src/plugin/parse-tika/src/java/org/apache/nutch/parse/tika/TikaConfig.java A src/plugin/parse-tika/plugin.xml A src/plugin/parse-tika/build.xml A

[jira] [Updated] (NUTCH-208) http: proxy exception list:

2011-08-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-208: --- Attachment: NUTCH-208-branch-1.4-20110809-v2.patch v2 of the patch for branch-1.4

[Nutch Wiki] Trivial Update of "SetupProxyForNutch" by LewisJohnMcgibbney

2011-08-09 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "SetupProxyForNutch" page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/SetupProxyForNutch?action=diff&rev1=15&rev2=16 Tinyproxy supports filtering of web

[Nutch Wiki] Trivial Update of "SetupProxyForNutch" by LewisJohnMcgibbney

2011-08-09 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "SetupProxyForNutch" page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/SetupProxyForNutch?action=diff&rev1=14&rev2=15 Tinyproxy supports filtering of web si

[Nutch Wiki] Trivial Update of "SetupProxyForNutch" by LewisJohnMcgibbney

2011-08-09 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "SetupProxyForNutch" page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/SetupProxyForNutch?action=diff&rev1=13&rev2=14 If necessary these will act as a black

[jira] [Closed] (NUTCH-296) Image Search

2011-08-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney closed NUTCH-296. -- Resolution: Won't Fix Assignee: Lewis John McGibbney As there has been no progress

[jira] [Created] (NUTCH-1077) Nutch 2 DbUpdateMapper throws ArrayOutOfBoundsException when running update

2011-08-09 Thread Tom Davidson (JIRA)
Nutch 2 DbUpdateMapper throws ArrayOutOfBoundsException when running update --- Key: NUTCH-1077 URL: https://issues.apache.org/jira/browse/NUTCH-1077 Project: Nutch Issu

[Nutch Wiki] Trivial Update of "SetupProxyForNutch" by LewisJohnMcgibbney

2011-08-09 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "SetupProxyForNutch" page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/SetupProxyForNutch?action=diff&rev1=12&rev2=13 google.com apache.org }}} + for th

RE: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]

2011-08-09 Thread Tom Davidson
Hi All, I have been using Nutch 1.x for the last 9 months or so and it works well for large scale crawls up to around a billion pages. However, the inherent lack of random access in HDFS really starts to become a burden on our hadoop cluster when going through the whole generate/update/fetch cy

[jira] [Commented] (NUTCH-666) Analysis plugins for multiple language and new Language Identifier Tool

2011-08-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081757#comment-13081757 ] Lewis John McGibbney commented on NUTCH-666: Thank you Dennis for confirming.

[jira] [Commented] (NUTCH-849) different versions of the same library in nutch-2.0-dev.job and local\lib directory

2011-08-09 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081755#comment-13081755 ] Markus Jelsma commented on NUTCH-849: - I see it in my 1.4-build too with several deps.

[jira] [Commented] (NUTCH-666) Analysis plugins for multiple language and new Language Identifier Tool

2011-08-09 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081752#comment-13081752 ] Dennis Kubes commented on NUTCH-666: I am still here. I still keep track of the lists

[jira] [Commented] (NUTCH-296) Image Search

2011-08-09 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081750#comment-13081750 ] Markus Jelsma commented on NUTCH-296: - Would be a nice feature but no patches. +1 close

[Nutch Wiki] Trivial Update of "GORA_HBase" by LewisJohnMcgibbney

2011-08-09 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "GORA_HBase" page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/GORA_HBase?action=diff&rev1=6&rev2=7 }}} * Compile Nutch -> ant runtime - * Make sure

[jira] [Closed] (NUTCH-666) Analysis plugins for multiple language and new Language Identifier Tool

2011-08-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney closed NUTCH-666. -- Resolution: Won't Fix I understand that this is not my issue to close, however I have no

[jira] [Updated] (NUTCH-1067) Configure minimum throughput for fetcher

2011-08-09 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1067: - Attachment: NUTCH-1067-1.4-3.patch Another patch. It cleans the queue the same as time bomb and r

[jira] [Commented] (NUTCH-849) different versions of the same library in nutch-2.0-dev.job and local\lib directory

2011-08-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081731#comment-13081731 ] Lewis John McGibbney commented on NUTCH-849: I checked out the latest trunk 2.0

[jira] [Commented] (NUTCH-623) Change plugin source directory "languageidentifier" to "language-identifier"

2011-08-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081719#comment-13081719 ] Lewis John McGibbney commented on NUTCH-623: Yes, you're right Julien... sorry my

[jira] [Commented] (NUTCH-296) Image Search

2011-08-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081714#comment-13081714 ] Lewis John McGibbney commented on NUTCH-296: The parsing and extraction of meta

[jira] [Commented] (NUTCH-342) Nutch commands log to nutch/logs/hadoop.logs by default

2011-08-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081708#comment-13081708 ] Lewis John McGibbney commented on NUTCH-342: OK well I think that sets a preced

[jira] [Commented] (NUTCH-978) [GSoC 2011] A Plugin for extracting certain element of a web page on html page parsing.

2011-08-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081703#comment-13081703 ] Lewis John McGibbney commented on NUTCH-978: If there has been a plugin written

Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]

2011-08-09 Thread Kirby Bohling
Julien, On Tue, Aug 9, 2011 at 10:10 AM, Julien Nioche < lists.digitalpeb...@gmail.com> wrote: > Hi Kirby, > > Grumble, Grumble. (adding dev@nutch, as that is more than likely >> where this discussion really belongs)... >> > > am adding gora-...@incubator.apache.org as well > > >> It'd be reall

[jira] [Commented] (NUTCH-619) Another Language Identifier Plugin using Unicode code point range

2011-08-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081699#comment-13081699 ] Lewis John McGibbney commented on NUTCH-619: If language identification is dele

[jira] [Closed] (NUTCH-537) TestMP3Parser.java, TestRTFParser.java, TestMSWordParser.java compile

2011-08-09 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche closed NUTCH-537. --- Resolution: Won't Fix > TestMP3Parser.java, TestRTFParser.java, TestMSWordParser.java compile > --

[jira] [Commented] (NUTCH-537) TestMP3Parser.java, TestRTFParser.java, TestMSWordParser.java compile

2011-08-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081697#comment-13081697 ] Lewis John McGibbney commented on NUTCH-537: This issue is well and truly of no

[jira] [Closed] (NUTCH-463) Nutch powerpoint parser plugin fails to parse ppt with images

2011-08-09 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche closed NUTCH-463. --- Resolution: Won't Fix Parsing delegated to Tika > Nutch powerpoint parser plugin fails to parse ppt w

[jira] [Commented] (NUTCH-463) Nutch powerpoint parser plugin fails to parse ppt with images

2011-08-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081695#comment-13081695 ] Lewis John McGibbney commented on NUTCH-463: Can we close this issue? .ppt det

[jira] [Commented] (NUTCH-314) Multiple language identifier instances

2011-08-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081692#comment-13081692 ] Lewis John McGibbney commented on NUTCH-314: As language identification is bein

[jira] [Commented] (NUTCH-623) Change plugin source directory "languageidentifier" to "language-identifier"

2011-08-09 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081693#comment-13081693 ] Julien Nioche commented on NUTCH-623: - Lewis, Again this is a separate issue from the

Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]

2011-08-09 Thread Julien Nioche
Hi Kirby, Grumble, Grumble. (adding dev@nutch, as that is more than likely > where this discussion really belongs)... > am adding gora-...@incubator.apache.org as well > It'd be really nice if folks could just follow the commands in the > nightly build, and get a build pushed out. I've pointe

[jira] [Commented] (NUTCH-839) nutch doesnt run under 0.20.2+228-1~karmic-cdh3b1 version of hadoop

2011-08-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081686#comment-13081686 ] Lewis John McGibbney commented on NUTCH-839: It would appear that a very simila

[jira] [Commented] (NUTCH-623) Change plugin source directory "languageidentifier" to "language-identifier"

2011-08-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081677#comment-13081677 ] Lewis John McGibbney commented on NUTCH-623: If we wished to fix this, then it

Re: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk

2011-08-09 Thread lewis john mcgibbney
Hi Kirby, I was aware that this had been a concern for some time and was unfamiliar with the process for nightly builds with trunk and dependencies resulting in the Nutch trunk build consistently failing. I thought it was to do with [1], however we still seem to be having major problems with this.

[jira] [Commented] (NUTCH-623) Change plugin source directory "languageidentifier" to "language-identifier"

2011-08-09 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081663#comment-13081663 ] Julien Nioche commented on NUTCH-623: - The functionality being delegated to Tika does m

[jira] [Issue Comment Edited] (NUTCH-623) Change plugin source directory "languageidentifier" to "language-identifier"

2011-08-09 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081663#comment-13081663 ] Julien Nioche edited comment on NUTCH-623 at 8/9/11 2:34 PM: - T

[jira] [Commented] (NUTCH-623) Change plugin source directory "languageidentifier" to "language-identifier"

2011-08-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081642#comment-13081642 ] Lewis John McGibbney commented on NUTCH-623: On second thoughts, and taking int

[jira] [Assigned] (NUTCH-623) Change plugin source directory "languageidentifier" to "language-identifier"

2011-08-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned NUTCH-623: -- Assignee: Lewis John McGibbney > Change plugin source directory "languageidentif

[jira] [Commented] (NUTCH-881) Good quality documentation for Nutch

2011-08-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081639#comment-13081639 ] Lewis John McGibbney commented on NUTCH-881: In Nutch trunk we currently only h

[jira] [Assigned] (NUTCH-881) Good quality documentation for Nutch

2011-08-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned NUTCH-881: -- Assignee: Lewis John McGibbney > Good quality documentation for Nutch >

Re: Nutch 2.0 Documentation

2011-08-09 Thread lewis john mcgibbney
Hi Markus, This is correct, in Nutch trunk we currently only have the wiki as a repository for any Nutch 2.0 information. Is this satisfactory? As far as I can tell, the documentation for Gora_trunk is produced using Apache Forrest. I am reasonably familiar with using Forrest and it would be a gr

[jira] [Commented] (NUTCH-1028) Log parser keys

2011-08-09 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081587#comment-13081587 ] Markus Jelsma commented on NUTCH-1028: -- Distributed mode is recommended indeed and in

[jira] [Commented] (NUTCH-1028) Log parser keys

2011-08-09 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081585#comment-13081585 ] Julien Nioche commented on NUTCH-1028: -- You can see the progression of the parsing on

[jira] [Updated] (NUTCH-1028) Log parser keys

2011-08-09 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1028: - Patch Info: [Patch Available] > Log parser keys > --- > > Key: NUTCH-

[jira] [Updated] (NUTCH-1028) Log parser keys

2011-08-09 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1028: - Attachment: NUTCH-1028-1.4-1.patch Patch for 1.4 > Log parser keys > --- > >

Re: Nutch 2.0 Documentation

2011-08-09 Thread Markus Jelsma
Hi, Maybe a stupid question, but I don't see a trunk/docs? Cheers On Thursday 04 August 2011 12:47:54 lewis john mcgibbney wrote: > Hi, > > Was mucking around on a totally separate personal issue with Gora today and > couldn't help but like the /docs directory which is bundled when you svn co >