[jira] [Updated] (NUTCH-1075) Delegate language identification to Tika

2011-08-19 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1075: - Attachment: NUTCH-1075-v3.patch Added parameter to bypass the check on isReasonablyCertain

[jira] [Assigned] (NUTCH-1064) o.a.n.util.MimeUtil uses deprecated Tika methods

2011-08-19 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche reassigned NUTCH-1064: Assignee: Julien Nioche > o.a.n.util.MimeUtil uses deprecated Tika meth

[jira] [Commented] (NUTCH-1075) Delegate language identification to Tika

2011-08-19 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087685#comment-13087685 ] Julien Nioche commented on NUTCH-1075: -- the identification should not be affecte

[jira] [Resolved] (NUTCH-1075) Delegate language identification to Tika

2011-08-19 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1075. -- Resolution: Fixed Committed revision 1159621. Thanks for reviewing it! > Delegate langu

[jira] [Updated] (NUTCH-1085) Nutch script does not require HADOOP_HOME

2011-08-22 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1085: - Attachment: NUTCH-1085.patch > Nutch script does not require HADOOP_H

[jira] [Created] (NUTCH-1085) Nutch script does not require HADOOP_HOME

2011-08-22 Thread Julien Nioche (JIRA)
Reporter: Julien Nioche Assignee: Julien Nioche Fix For: 1.4, 2.0 The Nutch script currently requires HADOOP_HOME to be set and point to a valid HADOOP setup in order to run in distributed mode. What it actually needs is not the location of the whole Hadoop setup but just to

[jira] [Commented] (NUTCH-1067) Configure minimum throughput for fetcher

2011-08-22 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088602#comment-13088602 ] Julien Nioche commented on NUTCH-1067: -- Looks good but 2 comments th

[jira] [Commented] (NUTCH-1073) Rename parameters 'fetcher.threads.per.host.by.ip' and 'fetcher.threads.per.host'

2011-08-22 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088597#comment-13088597 ] Julien Nioche commented on NUTCH-1073: -- Will commit shortly unless someone ha

[jira] [Commented] (NUTCH-1067) Configure minimum throughput for fetcher

2011-08-22 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088677#comment-13088677 ] Julien Nioche commented on NUTCH-1067: -- {quote} * this is going to be diffi

[jira] [Commented] (NUTCH-1073) Rename parameters 'fetcher.threads.per.host.by.ip' and 'fetcher.threads.per.host'

2011-08-22 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088712#comment-13088712 ] Julien Nioche commented on NUTCH-1073: -- Don't think it can be applied as is

[jira] [Resolved] (NUTCH-1085) Nutch script does not require HADOOP_HOME

2011-08-23 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1085. -- Resolution: Fixed Trunk : Committed revision 1160738 1.4 : Committed revision 1160734 > Nu

[jira] [Resolved] (NUTCH-1089) short compressed pages caused Exception

2011-08-23 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1089. -- Resolution: Fixed 1.4 Committed revision 1160753. trunk Committed revision 1160754 Thanks

[jira] [Commented] (NUTCH-1057) Make fetcher thread time out configurable

2011-08-24 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090081#comment-13090081 ] Julien Nioche commented on NUTCH-1057: -- Haven't you committed it already?

[jira] [Commented] (NUTCH-1024) Dynamically set fetchInterval by MIME-type

2011-08-24 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090082#comment-13090082 ] Julien Nioche commented on NUTCH-1024: -- Do you mind if we wait a bit? I'

[jira] [Commented] (NUTCH-1024) Dynamically set fetchInterval by MIME-type

2011-08-24 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090115#comment-13090115 ] Julien Nioche commented on NUTCH-1024: -- There is a JIRA issue for 2.0 h

[jira] [Commented] (NUTCH-1095) remove i18n from Nutch site to archive and legacy secton of wiki

2011-08-24 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090477#comment-13090477 ] Julien Nioche commented on NUTCH-1095: -- +1 thanks! > remove i18n from Nutch

[jira] [Commented] (NUTCH-937) When nutch is run on hadoop > 0.20.2 (or cdh) it will not find plugins because MapReduce will not unpack plugin/ directory from the job's pack (due to MAPREDUCE-967)

2011-08-24 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090811#comment-13090811 ] Julien Nioche commented on NUTCH-937: - Markus, The param plugin.folder is multiva

[jira] [Reopened] (NUTCH-990) protocol-httpclient fails with short pages

2011-08-26 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche reopened NUTCH-990: - > protocol-httpclient fails with short pa

[jira] [Resolved] (NUTCH-990) protocol-httpclient fails with short pages

2011-08-26 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-990. - Resolution: Fixed Fix Version/s: (was: 1.3) 1.4 A patch has been

[jira] [Commented] (NUTCH-937) When nutch is run on hadoop > 0.20.2 (or cdh) it will not find plugins because MapReduce will not unpack plugin/ directory from the job's pack (due to MAPREDUCE-967)

2011-08-26 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091747#comment-13091747 ] Julien Nioche commented on NUTCH-937: - @Radim : Nutch is based on the Ap

[jira] [Commented] (NUTCH-937) When nutch is run on hadoop > 0.20.2 (or cdh) it will not find plugins because MapReduce will not unpack plugin/ directory from the job's pack (due to MAPREDUCE-967)

2011-08-30 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093807#comment-13093807 ] Julien Nioche commented on NUTCH-937: - @Ferdy - good detective work! I like

[jira] [Assigned] (NUTCH-937) When nutch is run on hadoop > 0.20.2 (or cdh) it will not find plugins because MapReduce will not unpack plugin/ directory from the job's pack (due to MAPREDUCE-967)

2011-09-01 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche reassigned NUTCH-937: --- Assignee: Julien Nioche (was: Markus Jelsma) > When nutch is run on hadoop > 0.20.2 (

[jira] [Updated] (NUTCH-937) When nutch is run on hadoop > 0.20.2 (or cdh) it will not find plugins because MapReduce will not unpack plugin/ directory from the job's pack (due to MAPREDUCE-967)

2011-09-01 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-937: Priority: Minor (was: Major) Patch Info: [Patch Available] Issue Type: Improvement (was

[jira] [Commented] (NUTCH-937) When nutch is run on hadoop > 0.20.2 (or cdh) it will not find plugins because MapReduce will not unpack plugin/ directory from the job's pack (due to MAPREDUCE-967)

2011-09-01 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095229#comment-13095229 ] Julien Nioche commented on NUTCH-937: - Works fine on Hadoop-0.20.203.0 and

[jira] [Resolved] (NUTCH-1073) Rename parameters 'fetcher.threads.per.host.by.ip' and 'fetcher.threads.per.host'

2011-09-01 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1073. -- Resolution: Fixed Committed revision 1164064. > Rename paramet

[jira] [Resolved] (NUTCH-1096) Empty (not null) ContentLength results in failure of fetch

2011-09-01 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1096. -- Resolution: Fixed trunk : Committed revision 1164107 1.4 : Committed revision 1164108 Thanks

[jira] [Commented] (NUTCH-1102) Fetcher, rely on fetcher.parse directive only

2011-09-06 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097948#comment-13097948 ] Julien Nioche commented on NUTCH-1102: -- @Markus : in the future maybe try and ha

[jira] [Commented] (NUTCH-1067) Configure minimum throughput for fetcher

2011-09-06 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097950#comment-13097950 ] Julien Nioche commented on NUTCH-1067: -- see comments on NUTCH-1102 Patch for

[jira] [Commented] (NUTCH-1101) Options to purge db_gone records in updatedb

2011-09-06 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097955#comment-13097955 ] Julien Nioche commented on NUTCH-1101: -- The functionality makes sense but I am

[jira] [Commented] (NUTCH-1108) Index image and video format with nutch 1.3

2011-09-10 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102011#comment-13102011 ] Julien Nioche commented on NUTCH-1108: -- Can you parse the file successfully with

[jira] [Closed] (NUTCH-914) Implement Apache Project Branding Requirements

2011-09-11 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche closed NUTCH-914. --- Resolution: Fixed Thanks Lewis > Implement Apache Project Branding Requireme

[jira] [Commented] (NUTCH-1005) Index headings plugin

2011-09-14 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104427#comment-13104427 ] Julien Nioche commented on NUTCH-1005: -- Can't you do that with urlmet

[jira] [Reopened] (NUTCH-1067) Configure minimum throughput for fetcher

2011-09-14 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche reopened NUTCH-1067: -- At revision 1170548. ant clean then ant => compile-core: [javac] /data/nutch-1.4/build.

[jira] [Commented] (NUTCH-1005) Index headings plugin

2011-09-15 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105456#comment-13105456 ] Julien Nioche commented on NUTCH-1005: -- you are right. I'd read your com

[jira] [Closed] (NUTCH-1112) off-by-one error in protocol-httpclient; truncates up to HttpBase.BUFFER_SIZE content

2011-09-16 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche closed NUTCH-1112. Resolution: Duplicate https://issues.apache.org/jira/browse/NUTCH-1089 already fixed this. Thanks

[jira] [Commented] (NUTCH-1052) Multiple deletes of the same URL using SolrClean

2011-09-20 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108633#comment-13108633 ] Julien Nioche commented on NUTCH-1052: -- I like the original idea and agree

[jira] [Commented] (NUTCH-1052) Multiple deletes of the same URL using SolrClean

2011-09-20 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108701#comment-13108701 ] Julien Nioche commented on NUTCH-1052: -- Yep, that's the idea. The class

[jira] [Commented] (NUTCH-1052) Multiple deletes of the same URL using SolrClean

2011-09-20 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108757#comment-13108757 ] Julien Nioche commented on NUTCH-1052: -- {quote} Julien, will it break on Ha

[jira] [Commented] (NUTCH-1005) Index headings plugin

2011-09-21 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109435#comment-13109435 ] Julien Nioche commented on NUTCH-1005: -- let's try and come up with a sing

[jira] [Commented] (NUTCH-1115) Option to disable fixing of embedded params in DomContentUtils

2011-09-22 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112576#comment-13112576 ] Julien Nioche commented on NUTCH-1115: -- +1 Don't forget to add the same

[jira] [Commented] (NUTCH-1129) Any23 Nutch plugin

2011-09-24 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114077#comment-13114077 ] Julien Nioche commented on NUTCH-1129: -- Any23 might graduate into a Tika subpro

[jira] [Created] (NUTCH-1131) Rely on published artefacts for GORA

2011-09-25 Thread Julien Nioche (JIRA)
Reporter: Julien Nioche Fix For: 2.0 We had to build GORA locally prior to building Nutch 2.0 but can now rely on the published artefacts with version 0.1.1-incubation

[jira] [Closed] (NUTCH-1131) Rely on published artefacts for GORA

2011-09-25 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche closed NUTCH-1131. > Rely on published artefacts for GORA > > >

[jira] [Resolved] (NUTCH-1131) Rely on published artefacts for GORA

2011-09-25 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1131. -- Resolution: Fixed Committed revision 1175571. > Rely on published artefacts for G

[jira] [Commented] (NUTCH-882) Design a Host table in GORA

2012-04-26 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13262500#comment-13262500 ] Julien Nioche commented on NUTCH-882: - Ferdy I'll let you close it. I don'

[jira] [Commented] (NUTCH-1347) fetcher politeness related to map-reduce

2012-05-01 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265728#comment-13265728 ] Julien Nioche commented on NUTCH-1347: -- Not clear what the issue is. You can g

[jira] [Commented] (NUTCH-1347) fetcher politeness related to map-reduce

2012-05-01 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265804#comment-13265804 ] Julien Nioche commented on NUTCH-1347: -- bq. i can not recognize your solution th

[jira] [Commented] (NUTCH-809) Parse-metatags plugin

2012-05-08 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13270304#comment-13270304 ] Julien Nioche commented on NUTCH-809: - Kristof, please use the mailing list ins

[jira] [Updated] (NUTCH-1370) Expose exact number of urls injected @runtime

2012-05-22 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1370: - Priority: Minor (was: Major) Running in pseudo-distributed mode gives you more information if

[jira] [Created] (NUTCH-1371) Replace Ivy with Maven Ant tasks

2012-05-22 Thread Julien Nioche (JIRA)
Julien Nioche created NUTCH-1371: Summary: Replace Ivy with Maven Ant tasks Key: NUTCH-1371 URL: https://issues.apache.org/jira/browse/NUTCH-1371 Project: Nutch Issue Type: Improvement

[jira] [Updated] (NUTCH-1371) Replace Ivy with Maven Ant tasks

2012-05-22 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1371: - Attachment: NUTCH-1371.patch Preliminary version. Needs maven-ant-tasks-2.1.3.jar in ivy dir

[jira] [Commented] (NUTCH-1375) extract main content of a html file

2012-05-22 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13281024#comment-13281024 ] Julien Nioche commented on NUTCH-1375: -- your patch generates nois

[jira] [Updated] (NUTCH-1370) Expose exact number of urls injected @runtime

2012-06-07 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1370: - Affects Version/s: (was: 1.4) 1.5 Fix Version/s: (was: 1.5

[jira] [Created] (NUTCH-1396) Upgrade to Tika 1.1

2012-06-15 Thread Julien Nioche (JIRA)
Julien Nioche created NUTCH-1396: Summary: Upgrade to Tika 1.1 Key: NUTCH-1396 URL: https://issues.apache.org/jira/browse/NUTCH-1396 Project: Nutch Issue Type: Bug Affects Versions

[jira] [Updated] (NUTCH-1396) Upgrade to Tika 1.1

2012-06-15 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1396: - Attachment: NUTCH-1396.patch > Upgrade to Tika

[jira] [Closed] (NUTCH-1396) Upgrade to Tika 1.1

2012-06-15 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche closed NUTCH-1396. Assignee: Julien Nioche Thanks Lewis > Upgrade to Tika

[jira] [Commented] (NUTCH-1081) ant tests fail

2012-06-15 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295672#comment-13295672 ] Julien Nioche commented on NUTCH-1081: -- The tests for nutchgora seem to work

[jira] [Created] (NUTCH-1398) Upgrade to Hadoop 1.0.3

2012-06-15 Thread Julien Nioche (JIRA)
Julien Nioche created NUTCH-1398: Summary: Upgrade to Hadoop 1.0.3 Key: NUTCH-1398 URL: https://issues.apache.org/jira/browse/NUTCH-1398 Project: Nutch Issue Type: Improvement Affects

[jira] [Commented] (NUTCH-1398) Upgrade to Hadoop 1.0.3

2012-06-15 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295674#comment-13295674 ] Julien Nioche commented on NUTCH-1398: -- trunk : Committed revision 1350630.

[jira] [Commented] (NUTCH-1397) language-identifier incorrectly handles double-barreled language properties

2012-06-15 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295738#comment-13295738 ] Julien Nioche commented on NUTCH-1397: -- Lewis, the language identification

[jira] [Created] (NUTCH-1399) TestProtocolHttpClient fails

2012-06-17 Thread Julien Nioche (JIRA)
Julien Nioche created NUTCH-1399: Summary: TestProtocolHttpClient fails Key: NUTCH-1399 URL: https://issues.apache.org/jira/browse/NUTCH-1399 Project: Nutch Issue Type: Bug Affects

[jira] [Updated] (NUTCH-1399) TestProtocolHttpClient fails

2012-06-17 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1399: - Attachment: NUTCH-1399.patch > TestProtocolHttpClient fa

[jira] [Created] (NUTCH-1401) Upgrade to Hadoop 1.0.3

2012-06-19 Thread Julien Nioche (JIRA)
Julien Nioche created NUTCH-1401: Summary: Upgrade to Hadoop 1.0.3 Key: NUTCH-1401 URL: https://issues.apache.org/jira/browse/NUTCH-1401 Project: Nutch Issue Type: Improvement Affects

[jira] [Created] (NUTCH-1402) Create AbstractScoringFilter

2012-06-19 Thread Julien Nioche (JIRA)
Julien Nioche created NUTCH-1402: Summary: Create AbstractScoringFilter Key: NUTCH-1402 URL: https://issues.apache.org/jira/browse/NUTCH-1402 Project: Nutch Issue Type: Improvement

[jira] [Created] (NUTCH-1403) Add default ScoringFilter for manipulating metadata

2012-06-19 Thread Julien Nioche (JIRA)
Julien Nioche created NUTCH-1403: Summary: Add default ScoringFilter for manipulating metadata Key: NUTCH-1403 URL: https://issues.apache.org/jira/browse/NUTCH-1403 Project: Nutch Issue

[jira] [Updated] (NUTCH-1401) Upgrade to Hadoop 1.0.3

2012-06-19 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1401: - Fix Version/s: (was: 1.6) 1.5.1 Assignee: Julien Nioche

[jira] [Updated] (NUTCH-1400) Remove developer -core option for bin/nutch

2012-06-19 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1400: - Fix Version/s: (was: 2.1) (was: 1.6) 1.5.1

[jira] [Created] (NUTCH-1404) Nutch script fails to find job file in deploy mode

2012-06-19 Thread Julien Nioche (JIRA)
Julien Nioche created NUTCH-1404: Summary: Nutch script fails to find job file in deploy mode Key: NUTCH-1404 URL: https://issues.apache.org/jira/browse/NUTCH-1404 Project: Nutch Issue Type

[jira] [Updated] (NUTCH-1398) Upgrade to Hadoop 1.0.3

2012-06-19 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1398: - Affects Version/s: (was: nutchgora) Fix Version/s: (was: 2.1) > Upgrade

[jira] [Updated] (NUTCH-1398) Upgrade to Hadoop 1.0.3

2012-06-19 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1398: - Fix Version/s: (was: 1.6) 1.5.1 > Upgrade to Hadoop 1.

[jira] [Resolved] (NUTCH-1398) Upgrade to Hadoop 1.0.3

2012-06-19 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1398. -- Resolution: Fixed > Upgrade to Hadoop 1.

[jira] [Resolved] (NUTCH-1401) Upgrade to Hadoop 1.0.3

2012-06-19 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1401. -- Resolution: Fixed Committed revision 1351705 in branch nutchgora > Upgr

[jira] [Updated] (NUTCH-1401) Upgrade to Hadoop 1.0.3

2012-06-19 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1401: - Affects Version/s: (was: 1.5) Fix Version/s: (was: 1.5.1) > Upgrade

[jira] [Resolved] (NUTCH-1404) Nutch script fails to find job file in deploy mode

2012-06-19 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1404. -- Resolution: Fixed Nutchgora : Committed revision 1351707. Trunk : Committed revision 1351709

[jira] [Resolved] (NUTCH-1400) Remove developer -core option for bin/nutch

2012-06-20 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1400. -- Resolution: Fixed Trunk => Committed revision 1352008. NutchGora => Committed revision 1

[jira] [Updated] (NUTCH-1391) readdb -stats fires java.io.EOFException

2012-06-20 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1391: - Fix Version/s: (was: 2.1) nutchgora I think we should fix it for 2.0

[jira] [Commented] (NUTCH-1391) readdb -stats fires java.io.EOFException

2012-06-20 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13397411#comment-13397411 ] Julien Nioche commented on NUTCH-1391: -- A repeat of NUTCH-1110 -> we d

[jira] [Resolved] (NUTCH-1391) readdb -stats fires java.io.EOFException

2012-06-20 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1391. -- Resolution: Fixed Committed revision 1352037. > readdb -stats fi

[jira] [Commented] (NUTCH-1406) Metatags-index/-parse plugin: conversion to Solr date format and prevents parsing/indexing of empty tags

2012-06-21 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13398247#comment-13398247 ] Julien Nioche commented on NUTCH-1406: -- See http://wiki.apache.org/n

[jira] [Commented] (NUTCH-1031) Delegate parsing of robots.txt to crawler-commons

2012-06-21 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13398340#comment-13398340 ] Julien Nioche commented on NUTCH-1031: -- crawler-commons is not super active a

[jira] [Commented] (NUTCH-1388) Optionally maintain custom fetch interval despite AdaptiveFetchSchedule

2012-06-21 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13398363#comment-13398363 ] Julien Nioche commented on NUTCH-1388: -- Let's release 1.

[jira] [Commented] (NUTCH-1341) NotModified time set to now but page not modified

2012-06-21 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13398362#comment-13398362 ] Julien Nioche commented on NUTCH-1341: -- Let's release 1.5.1 first then add

[jira] [Commented] (NUTCH-1406) Metatags-index/-parse plugin: conversion to Solr date format and prevents parsing/indexing of empty tags

2012-06-21 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13398420#comment-13398420 ] Julien Nioche commented on NUTCH-1406: -- bq. index-metatags plugin (sometimes

[jira] [Commented] (NUTCH-1406) metadata-index plugin: conversion to Solr date format

2012-06-22 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13399250#comment-13399250 ] Julien Nioche commented on NUTCH-1406: -- BTW we have formatting rules for Eclips

[jira] [Commented] (NUTCH-1405) Allow to overwrite CrawlDatum's with injected entries

2012-06-26 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401465#comment-13401465 ] Julien Nioche commented on NUTCH-1405: -- Correct me if I'm wrong but doe

[jira] [Commented] (NUTCH-1405) Allow to overwrite CrawlDatum's with injected entries

2012-06-28 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402940#comment-13402940 ] Julien Nioche commented on NUTCH-1405: -- Can you please add some tests for thi

[jira] [Commented] (NUTCH-1405) Allow to overwrite CrawlDatum's with injected entries

2012-06-28 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402973#comment-13402973 ] Julien Nioche commented on NUTCH-1405: -- what about the command "nu

[jira] [Commented] (NUTCH-1405) Allow to overwrite CrawlDatum's with injected entries

2012-06-28 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403052#comment-13403052 ] Julien Nioche commented on NUTCH-1405: -- Markus, make sure you generate a patch

[jira] [Updated] (NUTCH-1087) Deprecate crawl command and replace with example script

2012-07-02 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1087: - Attachment: crawl WORK IN PROGRESS Need to add more comments + include the injection, linkd and

[jira] [Assigned] (NUTCH-1087) Deprecate crawl command and replace with example script

2012-07-02 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche reassigned NUTCH-1087: Assignee: Julien Nioche > Deprecate crawl command and replace with example scr

[jira] [Commented] (NUTCH-1405) Allow to overwrite CrawlDatum's with injected entries

2012-07-04 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406551#comment-13406551 ] Julien Nioche commented on NUTCH-1405: -- db.injector.preserve.metadata is prob

[jira] [Commented] (NUTCH-1405) Allow to overwrite CrawlDatum's with injected entries

2012-07-04 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406564#comment-13406564 ] Julien Nioche commented on NUTCH-1405: -- the way I was thinking about it was

[jira] [Commented] (NUTCH-1405) Allow to overwrite CrawlDatum's with injected entries

2012-07-05 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406942#comment-13406942 ] Julien Nioche commented on NUTCH-1405: -- Passes the tests, all good! +1 Thanks Ma

[jira] [Commented] (NUTCH-1414) Date extraction parse filter

2012-07-06 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408055#comment-13408055 ] Julien Nioche commented on NUTCH-1414: -- I'm concerned about the prolife

[jira] [Commented] (NUTCH-1360) Suport the storing of IP address connected to when web crawling

2012-07-09 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409578#comment-13409578 ] Julien Nioche commented on NUTCH-1360: -- Guys, unless a change is trivial pleas

[jira] [Updated] (NUTCH-1360) Suport the storing of IP address connected to when web crawling

2012-07-09 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1360: - Fix Version/s: (was: nutchgora) 2.1 > Suport the storing of

[jira] [Updated] (NUTCH-1087) Deprecate crawl command and replace with example script

2012-07-09 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1087: - Attachment: NUTCH-1087.patch First version of the nutch crawl script. Please test and review

[jira] [Updated] (NUTCH-1087) Deprecate crawl command and replace with example script

2012-07-09 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1087: - Attachment: (was: crawl) > Deprecate crawl command and replace with example scr

[jira] [Commented] (NUTCH-1087) Deprecate crawl command and replace with example script

2012-07-10 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13410344#comment-13410344 ] Julien Nioche commented on NUTCH-1087: -- Good catch Markus. Ideally we'd ne

[jira] [Updated] (NUTCH-1087) Deprecate crawl command and replace with example script

2012-07-10 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1087: - Attachment: NUTCH-1087-1.6-3.patch The script now determines where the nutch script is located
