[jira] [Resolved] (NUTCH-1640) OOM in ParseSegment Phase

2013-10-07 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1640. -- Resolution: Fixed Committed revision 1529802. Thanks Mitesh. OOM in ParseSegment Phase

[jira] [Commented] (NUTCH-1562) Order of execution for scoring filters

2013-10-07 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13788039#comment-13788039 ] Julien Nioche commented on NUTCH-1562: -- Hi Seb You are right about the order from

[jira] [Resolved] (NUTCH-1562) Order of execution for scoring filters

2013-10-07 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1562. -- Resolution: Fixed Committed revision 1529813. Order of execution for scoring filters

[jira] [Updated] (NUTCH-1606) Check that Factory classes use the cache in a thread safe way

2013-10-07 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1606: - Attachment: NUTCH-1606.patch Synchronized methods on ObjectCache + calls from

[jira] [Created] (NUTCH-1652) Avoid instanciation of MimeUtil for each Content object created

2013-10-07 Thread Julien Nioche (JIRA)
Julien Nioche created NUTCH-1652: Summary: Avoid instanciation of MimeUtil for each Content object created Key: NUTCH-1652 URL: https://issues.apache.org/jira/browse/NUTCH-1652 Project: Nutch

[jira] [Updated] (NUTCH-1653) AbstractScoringFilter

2013-10-10 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1653: - Attachment: NUTCH-1653.patch AbstractScoringFilter - Key

[jira] [Created] (NUTCH-1653) AbstractScoringFilter

2013-10-10 Thread Julien Nioche (JIRA)
Julien Nioche created NUTCH-1653: Summary: AbstractScoringFilter Key: NUTCH-1653 URL: https://issues.apache.org/jira/browse/NUTCH-1653 Project: Nutch Issue Type: Improvement Affects

[jira] [Updated] (NUTCH-1653) AbstractScoringFilter

2013-10-10 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1653: - Priority: Minor (was: Major) AbstractScoringFilter

[jira] [Commented] (NUTCH-1568) port pluggable indexing architecture to 2.x

2013-10-10 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13791396#comment-13791396 ] Julien Nioche commented on NUTCH-1568: -- It would probably be simpler to first port

[jira] [Resolved] (NUTCH-1653) AbstractScoringFilter

2013-10-10 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1653. -- Resolution: Fixed Committed revision 1530979. thanks Seb and Markus AbstractScoringFilter

[jira] [Commented] (NUTCH-1606) Check that Factory classes use the cache in a thread safe way

2013-10-11 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792635#comment-13792635 ] Julien Nioche commented on NUTCH-1606: -- Will commit shortly unless someone objects

[jira] [Resolved] (NUTCH-1606) Check that Factory classes use the cache in a thread safe way

2013-10-14 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1606. -- Resolution: Fixed Committed revision 1531833. Check that Factory classes use the cache

[jira] [Commented] (NUTCH-1371) Replace Ivy with Maven Ant tasks

2013-10-16 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796626#comment-13796626 ] Julien Nioche commented on NUTCH-1371: -- Does anyone have a bit of time to test

[jira] [Commented] (NUTCH-1377) Add option to index via CloudSolrServer instead

2013-10-16 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796642#comment-13796642 ] Julien Nioche commented on NUTCH-1377: -- Hi, What about having SOLR 4 as a separate

[jira] [Commented] (NUTCH-1656) ParseMeta not passed to CrawlDatum for not_modified

2013-10-16 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796650#comment-13796650 ] Julien Nioche commented on NUTCH-1656: -- nice one. +1 ParseMeta not passed

[jira] [Commented] (NUTCH-1371) Replace Ivy with Maven Ant tasks

2013-10-17 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797729#comment-13797729 ] Julien Nioche commented on NUTCH-1371: -- Hi Talat. That would be great, the latest one

[jira] [Commented] (NUTCH-1541) Indexer plugin to write CSV

2013-10-17 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797768#comment-13797768 ] Julien Nioche commented on NUTCH-1541: -- Hi line 342 needs to be {code} while

[jira] [Updated] (NUTCH-656) DeleteDuplicates based on crawlDB only

2013-10-19 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-656: Attachment: NUTCH-656.v2.patch Attached is a new patch which creates a new db status

[jira] [Commented] (NUTCH-1640) OOM in ParseSegment Phase

2013-10-23 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13802733#comment-13802733 ] Julien Nioche commented on NUTCH-1640: -- Can't quite believe I'd managed to screw what

[jira] [Commented] (NUTCH-1640) OOM in ParseSegment Phase

2013-11-01 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811182#comment-13811182 ] Julien Nioche commented on NUTCH-1640: -- Ian, Branches such as 1.7 are snapshots done

[jira] [Closed] (NUTCH-1664) Support for Hadoop 2.x

2013-11-06 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche closed NUTCH-1664. Resolution: Invalid Ask the mailing list if you have any specific issues when running Nutch. You

[jira] [Created] (NUTCH-1666) Optimisation for BasicURLNormalizer

2013-11-11 Thread Julien Nioche (JIRA)
Julien Nioche created NUTCH-1666: Summary: Optimisation for BasicURLNormalizer Key: NUTCH-1666 URL: https://issues.apache.org/jira/browse/NUTCH-1666 Project: Nutch Issue Type: Improvement

[jira] [Updated] (NUTCH-1666) Optimisation for BasicURLNormalizer

2013-11-11 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1666: - Attachment: NUTCH-1666.patch Optimisation for BasicURLNormalizer

[jira] [Resolved] (NUTCH-1666) Optimisation for BasicURLNormalizer

2013-11-11 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1666. -- Resolution: Fixed Committed revision 1540654. Thanks Markus! Optimisation

[jira] [Resolved] (NUTCH-1100) SolrDedup broken

2013-11-11 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1100. -- Resolution: Fixed Committed revision 1540758. We'll probably move to a more generic approach

[jira] [Updated] (NUTCH-656) DeleteDuplicates based on crawlDB only

2013-11-14 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-656: Attachment: NUTCH-656.v3.patch Thanks for your comments Seb. This new patch addresses some

[jira] [Updated] (NUTCH-656) DeleteDuplicates based on crawlDB only

2013-11-14 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-656: Attachment: (was: NUTCH-656.v3.patch) DeleteDuplicates based on crawlDB only

[jira] [Updated] (NUTCH-656) DeleteDuplicates based on crawlDB only

2013-11-14 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-656: Attachment: NUTCH-656.v3.patch correct attachment DeleteDuplicates based on crawlDB only

[jira] [Resolved] (NUTCH-656) DeleteDuplicates based on crawlDB only

2013-11-14 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-656. - Resolution: Fixed Committed revision 1541883. Committed with a few minor changes compared

[jira] [Created] (NUTCH-1668) Remove package org.apache.nutch.indexer.solr

2013-11-14 Thread Julien Nioche (JIRA)
Julien Nioche created NUTCH-1668: Summary: Remove package org.apache.nutch.indexer.solr Key: NUTCH-1668 URL: https://issues.apache.org/jira/browse/NUTCH-1668 Project: Nutch Issue Type: Task

[jira] [Resolved] (NUTCH-1621) Deprecated class o.a.n.crawl.Crawler is still in code base

2013-11-14 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1621. -- Resolution: Fixed Trunk : Committed revision 1541885. 2.x : Committed revision 1541886. I

[jira] [Updated] (NUTCH-1668) Remove package org.apache.nutch.indexer.solr

2013-11-14 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1668: - Attachment: NUTCH-1668.patch Patch which removes the indexer.solr subpackage and deprecates

[jira] [Commented] (NUTCH-656) DeleteDuplicates based on crawlDB only

2013-11-15 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13823716#comment-13823716 ] Julien Nioche commented on NUTCH-656: - [~wastl-nagel] yep, I did that as part of NUTCH

[jira] [Resolved] (NUTCH-828) Fetch Filter

2013-11-15 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-828. - Resolution: Won't Fix A better approach is to operate within the parsing step, as explained

[jira] [Resolved] (NUTCH-1607) Make inproper multiValued field configurable

2013-11-15 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1607. -- Resolution: Not A Problem Sorry for the late reply. A simple workaround is to modify

[jira] [Resolved] (NUTCH-1558) CharEncodingForConversion in ParseData's ParseMeta, not in ParseData's ContentMeta

2013-11-15 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1558. -- Resolution: Won't Fix see comments CharEncodingForConversion in ParseData's ParseMeta

[jira] [Resolved] (NUTCH-1382) Adding support for EmbeddedSolrServer to SolrIndexer

2013-11-15 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1382. -- Resolution: Won't Fix The SOLR indexer has been replaced with a generic indexing mechanism

[jira] [Resolved] (NUTCH-1668) Remove package org.apache.nutch.indexer.solr

2013-11-18 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1668. -- Resolution: Fixed Committed revision 1543010. Remove package org.apache.nutch.indexer.solr

[jira] [Resolved] (NUTCH-1309) fetch queue management

2013-11-22 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1309. -- Resolution: Incomplete Not clear what the problem or improvement is. Please reopen

[jira] [Commented] (NUTCH-1297) it is better for fetchItemQueues to select items from greater queues first

2013-11-27 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13833588#comment-13833588 ] Julien Nioche commented on NUTCH-1297: -- I think it was long as in 'has many URLs

[jira] [Commented] (NUTCH-1630) How to achieve finishing fetch approximately at the same time for each queue (a.k.a adaptive queue size)

2013-11-27 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13833653#comment-13833653 ] Julien Nioche commented on NUTCH-1630: -- This is a large patch which seems to affect

[jira] [Created] (NUTCH-1676) Add rudimentary SSL support to protocol-http

2013-11-28 Thread Julien Nioche (JIRA)
Julien Nioche created NUTCH-1676: Summary: Add rudimentary SSL support to protocol-http Key: NUTCH-1676 URL: https://issues.apache.org/jira/browse/NUTCH-1676 Project: Nutch Issue Type

[jira] [Updated] (NUTCH-1676) Add rudimentary SSL support to protocol-http

2013-11-28 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1676: - Attachment: NUTCH-1676.patch Add rudimentary SSL support to protocol-http

[jira] [Commented] (NUTCH-656) DeleteDuplicates based on crawlDB only

2013-12-08 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13842518#comment-13842518 ] Julien Nioche commented on NUTCH-656: - Please open a new issue with your patch for 2.x

[jira] [Commented] (NUTCH-1676) Add rudimentary SSL support to protocol-http

2013-12-17 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13850516#comment-13850516 ] Julien Nioche commented on NUTCH-1676: -- Have been using this for a few weeks without

[jira] [Commented] (NUTCH-1676) Add rudimentary SSL support to protocol-http

2013-12-18 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13851537#comment-13851537 ] Julien Nioche commented on NUTCH-1676: -- Thanks for your comments Markus. Shall we

[jira] [Commented] (NUTCH-1360) Suport the storing of IP address connected to when web crawling

2013-12-23 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13855745#comment-13855745 ] Julien Nioche commented on NUTCH-1360: -- Looks good mate, +1 to commit Suport

[jira] [Commented] (NUTCH-1371) Replace Ivy with Maven Ant tasks

2014-01-07 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864190#comment-13864190 ] Julien Nioche commented on NUTCH-1371: -- Talat, Moving to Maven altogether won't

[jira] [Commented] (NUTCH-1707) DummyIndexingWriter

2014-01-17 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874706#comment-13874706 ] Julien Nioche commented on NUTCH-1707: -- Doesn't https://issues.apache.org/jira/browse

[jira] [Commented] (NUTCH-1707) DummyIndexingWriter

2014-01-17 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13874713#comment-13874713 ] Julien Nioche commented on NUTCH-1707: -- makes sense. We do need a generic way

[jira] [Commented] (NUTCH-1676) Add rudimentary SSL support to protocol-http

2014-01-24 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13880935#comment-13880935 ] Julien Nioche commented on NUTCH-1676: -- Hi Markus. Isn't this patch for a different

[jira] [Commented] (NUTCH-1371) Replace Ivy with Maven Ant tasks

2014-02-04 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13890616#comment-13890616 ] Julien Nioche commented on NUTCH-1371: -- Hi Talat bq. Actually I have some problems

[jira] [Commented] (NUTCH-710) Support for rel=canonical attribute

2014-02-07 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894304#comment-13894304 ] Julien Nioche commented on NUTCH-710: - Nope. The version tag is more of a reminder

[jira] [Commented] (NUTCH-1707) DummyIndexingWriter

2014-02-10 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13896403#comment-13896403 ] Julien Nioche commented on NUTCH-1707: -- looks fine. +1 DummyIndexingWriter

[jira] [Created] (NUTCH-1729) Upgrade to Tika 1.5

2014-02-20 Thread Julien Nioche (JIRA)
Julien Nioche created NUTCH-1729: Summary: Upgrade to Tika 1.5 Key: NUTCH-1729 URL: https://issues.apache.org/jira/browse/NUTCH-1729 Project: Nutch Issue Type: Task Components

[jira] [Updated] (NUTCH-1729) Upgrade to Tika 1.5

2014-02-20 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1729: - Attachment: NUTCH-1729-2.x.patch patch for 2.x Upgrade to Tika 1.5

[jira] [Resolved] (NUTCH-1729) Upgrade to Tika 1.5

2014-02-21 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1729. -- Resolution: Fixed Upgrade to Tika 1.5 --- Key: NUTCH-1729

[jira] [Commented] (NUTCH-1729) Upgrade to Tika 1.5

2014-02-21 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13908211#comment-13908211 ] Julien Nioche commented on NUTCH-1729: -- Thanks Markus Trunk Committed revision

[jira] [Commented] (NUTCH-1729) Upgrade to Tika 1.5

2014-02-21 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13908229#comment-13908229 ] Julien Nioche commented on NUTCH-1729: -- Not sure it was there in the first place

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2014-02-28 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13915917#comment-13915917 ] Julien Nioche commented on NUTCH-1113: -- Well done, thanks guys! Merging segments

[jira] [Commented] (NUTCH-1736) Can't fetch page if http response header contains Transfer-Encoding:chunked

2014-03-27 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13949513#comment-13949513 ] Julien Nioche commented on NUTCH-1736: -- Looks good and seems to have fixed the issue

[jira] [Created] (NUTCH-1745) Upgrade to ElasticSearch 1.1.0

2014-04-02 Thread Julien Nioche (JIRA)
Julien Nioche created NUTCH-1745: Summary: Upgrade to ElasticSearch 1.1.0 Key: NUTCH-1745 URL: https://issues.apache.org/jira/browse/NUTCH-1745 Project: Nutch Issue Type: Improvement

[jira] [Updated] (NUTCH-1745) Upgrade to ElasticSearch 1.1.0

2014-04-02 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1745: - Attachment: NUTCH-1745.trunk.patch Upgrade to ElasticSearch 1.1.0

[jira] [Resolved] (NUTCH-351) Protocol forward proxy

2014-04-03 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-351. - Resolution: Won't Fix This issue has received no interest in nearly 8 years. Protocol forward

[jira] [Resolved] (NUTCH-1739) ExecutorService field in ParseUtil.java not be right use and cause memory leak

2014-04-03 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1739. -- Resolution: Not a Problem Marked as not a problem. [~yangshangchuan] please close the issue

[jira] [Resolved] (NUTCH-1745) Upgrade to ElasticSearch 1.1.0

2014-04-04 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1745. -- Resolution: Fixed Fix Version/s: 1.9 Trunk = Committed revision 1584722. Thanks

[jira] [Created] (NUTCH-1747) Use AtomicInteger as semaphore in Fetcher

2014-04-05 Thread Julien Nioche (JIRA)
Julien Nioche created NUTCH-1747: Summary: Use AtomicInteger as semaphore in Fetcher Key: NUTCH-1747 URL: https://issues.apache.org/jira/browse/NUTCH-1747 Project: Nutch Issue Type

[jira] [Updated] (NUTCH-1747) Use AtomicInteger as semaphore in Fetcher

2014-04-05 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1747: - Attachment: NUTCH-1747-trunk.patch Use AtomicInteger as semaphore in Fetcher

[jira] [Assigned] (NUTCH-207) Bandwidth target for fetcher rather than a thread count

2014-04-05 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche reassigned NUTCH-207: --- Assignee: Julien Nioche Will see if I can port this patch to the current version

[jira] [Commented] (NUTCH-1735) code dedup fetcher queue redirects

2014-04-05 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13961002#comment-13961002 ] Julien Nioche commented on NUTCH-1735: -- +1 Nice to simplify the code of the Fetcher

[jira] [Commented] (NUTCH-1687) Pick queue in Round Robin

2014-04-05 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13961009#comment-13961009 ] Julien Nioche commented on NUTCH-1687: -- I like the idea but am a bit concerned

[jira] [Resolved] (NUTCH-385) Server delay feature conflicts with maxThreadsPerHost

2014-04-05 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-385. - Resolution: Not a Problem This is not a problem but a discussion of how things work

[jira] [Updated] (NUTCH-490) Extension point with filters for Neko HTML parser (with patch)

2014-04-05 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-490: Component/s: (was: fetcher) parser Extension point with filters for Neko HTML

[jira] [Resolved] (NUTCH-1297) it is better for fetchItemQueues to select items from greater queues first

2014-04-05 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1297. -- Resolution: Won't Fix NUTCH-1687 is a nicer approach + no feedback from original contributor

[jira] [Resolved] (NUTCH-1278) Fetch Improvement in threads per host

2014-04-05 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1278. -- Resolution: Won't Fix No follow up from contributor + solution proposed quite invasive

[jira] [Updated] (NUTCH-827) HTTP POST Authentication

2014-04-05 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-827: Component/s: (was: fetcher) protocol HTTP POST Authentication

[jira] [Updated] (NUTCH-1342) Read time out protocol-http

2014-04-05 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1342: - Component/s: (was: fetcher) protocol Read time out protocol-http

[jira] [Created] (NUTCH-1750) Improvement of Fetcher's reportStatus

2014-04-06 Thread Julien Nioche (JIRA)
Julien Nioche created NUTCH-1750: Summary: Improvement of Fetcher's reportStatus Key: NUTCH-1750 URL: https://issues.apache.org/jira/browse/NUTCH-1750 Project: Nutch Issue Type: Improvement

[jira] [Updated] (NUTCH-1750) Improvement of Fetcher's reportStatus

2014-04-06 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1750: - Attachment: NUTCH-1750.patch Improvement of Fetcher's reportStatus

[jira] [Commented] (NUTCH-1750) Improvement of Fetcher's reportStatus

2014-04-06 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13961401#comment-13961401 ] Julien Nioche commented on NUTCH-1750: -- The patch attached improves a few things

[jira] [Reopened] (NUTCH-385) Server delay feature conflicts with maxThreadsPerHost

2014-04-07 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche reopened NUTCH-385: - Reopening as per Chris' comments. Chris, do you want to contribute a better description

[jira] [Closed] (NUTCH-1750) Improvement of Fetcher's reportStatus

2014-04-09 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche closed NUTCH-1750. Improvement of Fetcher's reportStatus - Key

[jira] [Resolved] (NUTCH-1750) Improvement of Fetcher's reportStatus

2014-04-09 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1750. -- Resolution: Fixed Thanks Sebastian Committed revision 1585905. Improvement of Fetcher's

[jira] [Commented] (NUTCH-1676) Add rudimentary SSL support to protocol-http

2014-04-16 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13971452#comment-13971452 ] Julien Nioche commented on NUTCH-1676: -- Hi Markus - any progress on this issue? Would

[jira] [Resolved] (NUTCH-1720) Duplicate lines in HttpBase.java

2014-04-16 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1720. -- Resolution: Fixed Thanks Walter! Committed revision 1587923. Duplicate lines

[jira] [Commented] (NUTCH-1147) WebGraph nodeDumper uses only 1 reducer

2014-04-16 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13971471#comment-13971471 ] Julien Nioche commented on NUTCH-1147: -- Good idea not to force it to 1 but what about

[jira] [Resolved] (NUTCH-1603) ZIP parser complains about truncated PDF file

2014-04-16 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1603. -- Resolution: Fixed Committed revision 1587928. ZIP parser complains about truncated PDF file

[jira] [Commented] (NUTCH-1521) CrawlDbFilter pass null url to urlNormailzers

2014-04-16 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13971486#comment-13971486 ] Julien Nioche commented on NUTCH-1521: -- Can we close this one? CrawlDbFilter pass

[jira] [Commented] (NUTCH-1697) SegmentMerger to implement Tool

2014-04-16 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13971477#comment-13971477 ] Julien Nioche commented on NUTCH-1697: -- Hi Markus. Actually it does matter and BTW

[jira] [Resolved] (NUTCH-1743) parsechecker to show outlinks

2014-04-16 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1743. -- Resolution: Fixed Committed revision 1587935. parsechecker to show outlinks

[jira] [Comment Edited] (NUTCH-1743) parsechecker to show outlinks

2014-04-16 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13971508#comment-13971508 ] Julien Nioche edited comment on NUTCH-1743 at 4/16/14 2:56 PM

[jira] [Commented] (NUTCH-1743) parsechecker to show outlinks

2014-04-16 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13971511#comment-13971511 ] Julien Nioche commented on NUTCH-1743: -- 2-x : Committed revision 1587936

[jira] [Issue Comment Deleted] (NUTCH-1743) parsechecker to show outlinks

2014-04-16 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1743: - Comment: was deleted (was: Trunk Committed revision 1587935. ) parsechecker to show outlinks

[jira] [Created] (NUTCH-1757) ParserChecker to take custom metadata as input

2014-04-16 Thread Julien Nioche (JIRA)
Julien Nioche created NUTCH-1757: Summary: ParserChecker to take custom metadata as input Key: NUTCH-1757 URL: https://issues.apache.org/jira/browse/NUTCH-1757 Project: Nutch Issue Type

[jira] [Updated] (NUTCH-1757) ParserChecker to take custom metadata as input

2014-04-16 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1757: - Attachment: NUTCH-1757.patch ParserChecker to take custom metadata as input

[jira] [Created] (NUTCH-1758) IndexChecker to send document to IndexWriters

2014-04-16 Thread Julien Nioche (JIRA)
Julien Nioche created NUTCH-1758: Summary: IndexChecker to send document to IndexWriters Key: NUTCH-1758 URL: https://issues.apache.org/jira/browse/NUTCH-1758 Project: Nutch Issue Type

[jira] [Updated] (NUTCH-1758) IndexChecker to send document to IndexWriters

2014-04-16 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1758: - Attachment: NUTCH-1758.patch IndexChecker to send document to IndexWriters

[jira] [Commented] (NUTCH-1758) IndexChecker to send document to IndexWriters

2014-04-16 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13971551#comment-13971551 ] Julien Nioche commented on NUTCH-1758: -- The parameter -D doIndex=true must be either

[jira] [Resolved] (NUTCH-1760) Crawl script fails to find job file if called from outside bin dir

2014-04-17 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1760. -- Resolution: Duplicate Crawl script fails to find job file if called from outside bin dir

[jira] [Resolved] (NUTCH-1761) Crawl script fails to find job file if not started from inside bin dir

2014-04-17 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1761. -- Resolution: Fixed Fix Version/s: 1.9 2.3 Thanks David. I have
