[
https://issues.apache.org/jira/browse/NUTCH-817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche reassigned NUTCH-817:
---
Assignee: Julien Nioche
parse-(html) does follow links of full html page, parse-(tika) does
[
https://issues.apache.org/jira/browse/NUTCH-710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859286#action_12859286
]
Julien Nioche commented on NUTCH-710:
-
As suggested previously we could either treat
[
https://issues.apache.org/jira/browse/NUTCH-808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856349#action_12856349
]
Julien Nioche commented on NUTCH-808:
-
Hi Enis,
{quote}
On the other hand, current
[
https://issues.apache.org/jira/browse/NUTCH-808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-808:
Fix Version/s: 2.0
Evaluate ORM Frameworks which support non-relational column-oriented
Upgrade to Tika 0.7
---
Key: NUTCH-810
URL: https://issues.apache.org/jira/browse/NUTCH-810
Project: Nutch
Issue Type: Improvement
Components: parser
Affects Versions: 1.0.0
Reporter: Julien Nioche
[
https://issues.apache.org/jira/browse/NUTCH-789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-789:
Component/s: (was: fetcher)
parser
Fix Version/s: (was: 1.1)
Have
[
https://issues.apache.org/jira/browse/NUTCH-810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche closed NUTCH-810.
---
Resolution: Fixed
Committed in rev 931098.
http://issues.apache.org/jira/browse/TIKA-317 changed the
[
https://issues.apache.org/jira/browse/NUTCH-789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853251#action_12853251
]
Julien Nioche commented on NUTCH-789:
-
Will upgrade as soon as 0.7 is available from
Parse-metatags plugin
-
Key: NUTCH-809
URL: https://issues.apache.org/jira/browse/NUTCH-809
Project: Nutch
Issue Type: New Feature
Components: parser
Reporter: Julien Nioche
Assignee:
[
https://issues.apache.org/jira/browse/NUTCH-809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-809:
Attachment: NUTCH-809.patch
Parse-metatags plugin
-
Key:
[
https://issues.apache.org/jira/browse/NUTCH-809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-809:
Attachment: (was: NUTCH-809.patch)
Parse-metatags plugin
-
[
https://issues.apache.org/jira/browse/NUTCH-809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-809:
Attachment: NUTCH-809.patch
Modified version of the plugin which is compatible with parse-tika
[
https://issues.apache.org/jira/browse/NUTCH-809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-809:
Description:
h2. Parse-metatags plugin
The parse-metatags plugin consists of a HTMLParserFilter
[
https://issues.apache.org/jira/browse/NUTCH-706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-706:
Fix Version/s: (was: 1.1)
Both variants of the substitution rule above break existing tests.
[
https://issues.apache.org/jira/browse/NUTCH-779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche resolved NUTCH-779.
-
Resolution: Fixed
Fix Version/s: 1.1
Committed revision 929038.
Thanks Andrzej for your
[
https://issues.apache.org/jira/browse/NUTCH-785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche closed NUTCH-785.
---
Resolution: Fixed
Committed revision 929039
Thanks Andrzej for reviewing it
Fetcher : copy
[
https://issues.apache.org/jira/browse/NUTCH-789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851316#action_12851316
]
Julien Nioche commented on NUTCH-789:
-
Shall we postpone the work on this issue to after
[
https://issues.apache.org/jira/browse/NUTCH-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851545#action_12851545
]
Julien Nioche commented on NUTCH-570:
-
{quote}Julien, want to take this?{quote}
Not
[
https://issues.apache.org/jira/browse/NUTCH-784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche closed NUTCH-784.
---
Resolution: Fixed
Committed revision 928746
CrawlDBScanner
---
Key:
[
https://issues.apache.org/jira/browse/NUTCH-784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-784:
Fix Version/s: 1.1
CrawlDBScanner
---
Key: NUTCH-784
Merge CrawlDBScanner with CrawlDBReader
---
Key: NUTCH-806
URL: https://issues.apache.org/jira/browse/NUTCH-806
Project: Nutch
Issue Type: Improvement
Reporter: Julien Nioche
[
https://issues.apache.org/jira/browse/NUTCH-783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-783:
Fix Version/s: (was: 1.1)
Removed tag 1.1
Will rename to IndexingPluginsChecker later
[
https://issues.apache.org/jira/browse/NUTCH-785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850912#action_12850912
]
Julien Nioche commented on NUTCH-785:
-
Could anyone please review this issue? I would
[
https://issues.apache.org/jira/browse/NUTCH-779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850915#action_12850915
]
Julien Nioche commented on NUTCH-779:
-
Could anyone please review this issue? I would
[
https://issues.apache.org/jira/browse/NUTCH-776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-776:
Fix Version/s: (was: 1.1)
Moving this issue post 1.1
Needs a patch file, some description of
[
https://issues.apache.org/jira/browse/NUTCH-740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche closed NUTCH-740.
---
Resolution: Fixed
Assignee: Julien Nioche
Committed in rev 926003
Thanks Marcin for
[
https://issues.apache.org/jira/browse/NUTCH-762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-762:
Fix Version/s: 1.1
Alternative Generator which can generate several segments in one parse of the
[
https://issues.apache.org/jira/browse/NUTCH-762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-762:
Attachment: NUTCH-762-v3.patch
new patch which reintroduces the 'generator.update.crawldb'
[
https://issues.apache.org/jira/browse/NUTCH-762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848095#action_12848095
]
Julien Nioche commented on NUTCH-762:
-
{quote}
I just noticed that the new Generator
[
https://issues.apache.org/jira/browse/NUTCH-762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848140#action_12848140
]
Julien Nioche commented on NUTCH-762:
-
The change of prefix also reflected that we now
[
https://issues.apache.org/jira/browse/NUTCH-762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche closed NUTCH-762.
---
Resolution: Fixed
Committed revision 926155
Have reverted the prefix for params to 'generate.' +
[
https://issues.apache.org/jira/browse/NUTCH-740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-740:
Attachment: NUTCH-740.patch
Slightly modified version of the patch with modifs for protocol-http.
[
https://issues.apache.org/jira/browse/NUTCH-762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846910#action_12846910
]
Julien Nioche commented on NUTCH-762:
-
OK, there was indeed an assumption that the
[
https://issues.apache.org/jira/browse/NUTCH-762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846930#action_12846930
]
Julien Nioche commented on NUTCH-762:
-
Yes, I came across that situation too on a large
[
https://issues.apache.org/jira/browse/NUTCH-469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-469:
Fix Version/s: (was: 1.1)
There have not been any changes to this issue since February 09 and it
[
https://issues.apache.org/jira/browse/NUTCH-740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845886#action_12845886
]
Julien Nioche commented on NUTCH-740:
-
A nice contribution but should not this be
[
https://issues.apache.org/jira/browse/NUTCH-762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846141#action_12846141
]
Julien Nioche commented on NUTCH-762:
-
If I am not mistaken the point of having
[
https://issues.apache.org/jira/browse/NUTCH-710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-710:
Fix Version/s: (was: 1.1)
Great idea. Won't be included in 1.1 though so moving to *fix :
[
https://issues.apache.org/jira/browse/NUTCH-692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche resolved NUTCH-692.
-
Resolution: Cannot Reproduce
Fix Version/s: 1.1
I cannot reproduce the issue since we
[
https://issues.apache.org/jira/browse/NUTCH-798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche resolved NUTCH-798.
-
Resolution: Fixed
Updated SOLRJ's dependencies at the same time :
Deleting
[
https://issues.apache.org/jira/browse/NUTCH-801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche resolved NUTCH-801.
-
Resolution: Fixed
Committed revision 921840.
Remove RTF and MP3 parse plugins
[
https://issues.apache.org/jira/browse/NUTCH-762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-762:
Attachment: (was: NUTCH-762-MultiGenerator.patch)
Alternative Generator which can generate
[
https://issues.apache.org/jira/browse/NUTCH-762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-762:
Attachment: NUTCH-762-v2.patch
Improved version of the patch :
- fixed a few minor bugs
- renamed
SOLRIndexer to commit once all reducers have finished
-
Key: NUTCH-799
URL: https://issues.apache.org/jira/browse/NUTCH-799
Project: Nutch
Issue Type: Improvement
Components:
[
https://issues.apache.org/jira/browse/NUTCH-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-799:
Attachment: NUTCH-799.patch
SOLRIndexer to commit once all reducers have finished
[
https://issues.apache.org/jira/browse/NUTCH-782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche closed NUTCH-782.
---
Resolution: Fixed
Committed revision 917557
Ability to order htmlparsefilters
Upgrade to SOLR1.4
--
Key: NUTCH-798
URL: https://issues.apache.org/jira/browse/NUTCH-798
Project: Nutch
Issue Type: Improvement
Components: indexer
Reporter: Julien Nioche
Fix For: 1.1
[
https://issues.apache.org/jira/browse/NUTCH-719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche resolved NUTCH-719.
-
Resolution: Fixed
Fix Version/s: 1.1
Committed revision 911905.
Thanks to S. Dennis for
[
https://issues.apache.org/jira/browse/NUTCH-719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche closed NUTCH-719.
---
fetchQueues.totalSize incorrect in Fetcher2
---
[
https://issues.apache.org/jira/browse/NUTCH-705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche resolved NUTCH-705.
-
Resolution: Fixed
RTF parsing is now handled by the TikaPlugin (NUTCH-766). Please open an issue
[
https://issues.apache.org/jira/browse/NUTCH-644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche resolved NUTCH-644.
-
Resolution: Fixed
RTF parsing is now handled by the TikaPlugin (NUTCH-766) which solves the issue
Tika parser does not keep attributes on html tag
Key: NUTCH-794
URL: https://issues.apache.org/jira/browse/NUTCH-794
Project: Nutch
Issue Type: Bug
Reporter: Julien Nioche
[
https://issues.apache.org/jira/browse/NUTCH-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-794:
Description:
The following HTML document :
<html lang=fi><head>document 1 title</head><body>jotain
[
https://issues.apache.org/jira/browse/NUTCH-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-794:
Attachment: NUTCH-794.patch
Tika parser does identify lang attributes on html tag
[
https://issues.apache.org/jira/browse/NUTCH-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834147#action_12834147
]
Julien Nioche commented on NUTCH-794:
-
Committed patch in revision 910454
Waiting for
[
https://issues.apache.org/jira/browse/NUTCH-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-794:
Summary: Language Identification must use check the parse metadata for
language values (was: Tika
[
https://issues.apache.org/jira/browse/NUTCH-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Work on NUTCH-794 started by Julien Nioche.
Language Identification must use check the parse metadata for language values
[
https://issues.apache.org/jira/browse/NUTCH-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-794:
Component/s: parser
Language Identification must use check the parse metadata for language values
[
https://issues.apache.org/jira/browse/NUTCH-782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-782:
Component/s: parser
Ability to order htmlparsefilters
-
[
https://issues.apache.org/jira/browse/NUTCH-766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche closed NUTCH-766.
---
Have added small improvement in revision 910187 (Prioritise default Tika parser
when discovering plugins
[
https://issues.apache.org/jira/browse/NUTCH-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832454#action_12832454
]
Julien Nioche commented on NUTCH-766:
-
@Chris : I just did a fresh co from svn, applied
[
https://issues.apache.org/jira/browse/NUTCH-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832564#action_12832564
]
Julien Nioche commented on NUTCH-766:
-
I had a closer look at the HTML parsing issue.
[
https://issues.apache.org/jira/browse/NUTCH-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832564#action_12832564
]
Julien Nioche edited comment on NUTCH-766 at 2/11/10 5:22 PM:
--
[
https://issues.apache.org/jira/browse/NUTCH-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832583#action_12832583
]
Julien Nioche commented on NUTCH-766:
-
@Chris : did you do
ant -f
[
https://issues.apache.org/jira/browse/NUTCH-787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-787:
Fix Version/s: 1.1
Upgrade Lucene to 3.0.0.
Key:
Better list of suffix domains
-
Key: NUTCH-786
URL: https://issues.apache.org/jira/browse/NUTCH-786
Project: Nutch
Issue Type: Improvement
Affects Versions: 1.0.0
Reporter: Julien Nioche
[
https://issues.apache.org/jira/browse/NUTCH-786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-786:
Attachment: NUTCH-786.patch
Small improvement to the content of domain-suffixes.xml : added
[
https://issues.apache.org/jira/browse/NUTCH-786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche closed NUTCH-786.
---
Resolution: Fixed
Committed revision 906907
Better list of suffix domains
[
https://issues.apache.org/jira/browse/NUTCH-781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828548#action_12828548
]
Julien Nioche commented on NUTCH-781:
-
did you forget to update conf/tika-mimetypes.xml
Update Tika to v0.6 for the MimeType detection
---
Key: NUTCH-781
URL: https://issues.apache.org/jira/browse/NUTCH-781
Project: Nutch
Issue Type: Improvement
Reporter: Julien Nioche
[
https://issues.apache.org/jira/browse/NUTCH-781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche resolved NUTCH-781.
-
Resolution: Fixed
Committed revision 905228
Update Tika to v0.6 for the MimeType detection
[
https://issues.apache.org/jira/browse/NUTCH-781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche closed NUTCH-781.
---
Update Tika to v0.6 for the MimeType detection
---
[
https://issues.apache.org/jira/browse/NUTCH-766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-766:
Attachment: NUTCH-766-v3.patch
Updated version of the plugin : uses Tika 0.6
Tika parser
[
https://issues.apache.org/jira/browse/NUTCH-766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-766:
Attachment: (was: Nutch-766.ParserFactory.patch)
Tika parser
---
[
https://issues.apache.org/jira/browse/NUTCH-766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-766:
Attachment: (was: NUTCH-766.tika.patch)
Tika parser
---
Key:
Ability to order htmlparsefilters
-
Key: NUTCH-782
URL: https://issues.apache.org/jira/browse/NUTCH-782
Project: Nutch
Issue Type: New Feature
Reporter: Julien Nioche
Assignee: Julien
[
https://issues.apache.org/jira/browse/NUTCH-782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-782:
Attachment: NUTCH-782.patch
Ability to order htmlparsefilters
-
IndexerChecker Utilty
-
Key: NUTCH-783
URL: https://issues.apache.org/jira/browse/NUTCH-783
Project: Nutch
Issue Type: New Feature
Components: indexer
Reporter: Julien Nioche
Fix For:
[
https://issues.apache.org/jira/browse/NUTCH-783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche reassigned NUTCH-783:
---
Assignee: Julien Nioche
IndexerChecker Utilty
-
Key:
[
https://issues.apache.org/jira/browse/NUTCH-783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-783:
Attachment: NUTCH-783.patch
IndexerChecker Utilty
-
Key:
[
https://issues.apache.org/jira/browse/NUTCH-779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche reassigned NUTCH-779:
---
Assignee: Julien Nioche
Mechanism for passing metadata from parse to crawldb
[
https://issues.apache.org/jira/browse/NUTCH-779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-779:
Attachment: NUTCH-779-v2.patch
Improved version of the patch. Followed AB's recommendations and
CrawlDBScanner
---
Key: NUTCH-784
URL: https://issues.apache.org/jira/browse/NUTCH-784
Project: Nutch
Issue Type: New Feature
Reporter: Julien Nioche
Assignee: Julien Nioche
Attachments:
[
https://issues.apache.org/jira/browse/NUTCH-784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-784:
Attachment: NUTCH-784.patch
CrawlDBScanner
---
Key: NUTCH-784
Fetcher : copy metadata from origin URL when redirecting + call
scfilters.initialScore on newly created URL
---
Key: NUTCH-785
URL:
[
https://issues.apache.org/jira/browse/NUTCH-785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-785:
Attachment: NUTCH-785.patch
Fetcher : copy metadata from origin URL when redirecting + call
[
https://issues.apache.org/jira/browse/NUTCH-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805892#action_12805892
]
Julien Nioche commented on NUTCH-766:
-
Here is a slightly better version of the patch
[
https://issues.apache.org/jira/browse/NUTCH-766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-766:
Attachment: NUTCH-766.v2
sample.tar.gz
new version of the patch + archive
[
https://issues.apache.org/jira/browse/NUTCH-778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche resolved NUTCH-778.
-
Resolution: Invalid
Fix Version/s: (was: 1.0.0)
This is likely to be a problem with
[
https://issues.apache.org/jira/browse/NUTCH-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803670#action_12803670
]
Julien Nioche commented on NUTCH-766:
-
I think the end result of this plugin should be
[
https://issues.apache.org/jira/browse/NUTCH-779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802172#action_12802172
]
Julien Nioche commented on NUTCH-779:
-
The property needs some documentation in
Mechanism for passing metadata from parse to crawldb
Key: NUTCH-779
URL: https://issues.apache.org/jira/browse/NUTCH-779
Project: Nutch
Issue Type: New Feature
Reporter:
[
https://issues.apache.org/jira/browse/NUTCH-779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-779:
Attachment: NUTCH-779
Mechanism for passing metadata from parse to crawldb
[
https://issues.apache.org/jira/browse/NUTCH-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche closed NUTCH-767.
---
Resolution: Fixed
Committed revision 897825
Update Tika to v0.5 for the MimeType detection
[
https://issues.apache.org/jira/browse/NUTCH-751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche resolved NUTCH-751.
-
Resolution: Later
The changes in the underlying API are quite substantial and this would need a
[
https://issues.apache.org/jira/browse/NUTCH-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798727#action_12798727
]
Julien Nioche commented on NUTCH-766:
-
Hi Chris,
No worries, I'd rather wait for you
[
https://issues.apache.org/jira/browse/NUTCH-269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche reassigned NUTCH-269:
---
Assignee: Julien Nioche
CrawlDbReducer: OOME because no upper-bound on inlinks count
[
https://issues.apache.org/jira/browse/NUTCH-269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797990#action_12797990
]
Julien Nioche commented on NUTCH-269:
-
I will shortly commit a variant of this approach
[
https://issues.apache.org/jira/browse/NUTCH-269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche resolved NUTCH-269.
-
Resolution: Fixed
Fix Version/s: 1.1
Committed revision 897180
CrawlDbReducer: OOME
[
https://issues.apache.org/jira/browse/NUTCH-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797653#action_12797653
]
Julien Nioche commented on NUTCH-776:
-
Did you notice any improvement in the fetch rate