[
https://issues.apache.org/jira/browse/NUTCH-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171725#comment-15171725
]
Tien Nguyen Manh commented on NUTCH-2236:
-
No problem, just to make it run on Hadoop 2.7.1
>
[
https://issues.apache.org/jira/browse/NUTCH-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171264#comment-15171264
]
Tien Nguyen Manh commented on NUTCH-2234:
-
Elasticsearch 2.1.1 uses httpclient 4.3.6
> Upgrade to
[
https://issues.apache.org/jira/browse/NUTCH-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tien Nguyen Manh updated NUTCH-2236:
Attachment: NUTCH-2236.patch
I run Nutch 1.11 on Hadoop 2.7.1 with this patch.
We also need
Tien Nguyen Manh created NUTCH-2236:
---
Summary: Upgrade to Hadoop 2.7.1
Key: NUTCH-2236
URL: https://issues.apache.org/jira/browse/NUTCH-2236
Project: Nutch
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/NUTCH-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tien Nguyen Manh updated NUTCH-2234:
Attachment: NUTCH-2234.patch
> Upgrade to elasticsearch 2.1.1
>
[
https://issues.apache.org/jira/browse/NUTCH-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tien Nguyen Manh updated NUTCH-1687:
Attachment: NUTCH-1687-2.patch
Here it is:
I updated my initial patch for version 1.11.
I
[
https://issues.apache.org/jira/browse/NUTCH-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tien Nguyen Manh updated NUTCH-1687:
Attachment: (was: NUTCH-1687-2.patch)
> Pick queue in Round Robin
>
[
https://issues.apache.org/jira/browse/NUTCH-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tien Nguyen Manh updated NUTCH-1687:
Comment: was deleted
(was: I update my initial patch for ver 1.11.
I crawl large number of
[
https://issues.apache.org/jira/browse/NUTCH-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tien Nguyen Manh updated NUTCH-2234:
Attachment: (was: NUTCH-2234.patch)
> Upgrade to elasticsearch 2.1.1
>
[
https://issues.apache.org/jira/browse/NUTCH-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tien Nguyen Manh updated NUTCH-2234:
Attachment: NUTCH-2234.patch
> Upgrade to elasticsearch 2.1.1
>
Tien Nguyen Manh created NUTCH-2234:
---
Summary: Upgrade to elasticsearch 2.1.1
Key: NUTCH-2234
URL: https://issues.apache.org/jira/browse/NUTCH-2234
Project: Nutch
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/NUTCH-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tien Nguyen Manh updated NUTCH-1687:
Attachment: NUTCH-1687-2.patch
I updated my initial patch for version 1.11.
I crawl large number
[
https://issues.apache.org/jira/browse/NUTCH-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tien Nguyen Manh updated NUTCH-2225:
Attachment: NUTCH-2225.patch
> Parsed time not include time to parse
>
Tien Nguyen Manh created NUTCH-2225:
---
Summary: Parsed time not include time to parse
Key: NUTCH-2225
URL: https://issues.apache.org/jira/browse/NUTCH-2225
Project: Nutch
Issue Type: Bug
[
https://issues.apache.org/jira/browse/NUTCH-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tien Nguyen Manh updated NUTCH-2224:
Attachment: NUTCH-2224.patch
> Wrong metric compute in Fetcher status report
>
Tien Nguyen Manh created NUTCH-2224:
---
Summary: Wrong metric compute in Fetcher status report
Key: NUTCH-2224
URL: https://issues.apache.org/jira/browse/NUTCH-2224
Project: Nutch
Issue
[
https://issues.apache.org/jira/browse/NUTCH-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tien Nguyen Manh updated NUTCH-2223:
Attachment: NUTCH-2223.patch
Patch for nutch 1.11
> Upgrade xercesImpl to 2.11.0 to fix
[
https://issues.apache.org/jira/browse/NUTCH-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tien Nguyen Manh updated NUTCH-2223:
Fix Version/s: 1.13
> Upgrade xercesImpl to 2.11.0 to fix hang on issue in tika mimetype
Tien Nguyen Manh created NUTCH-2223:
---
Summary: Upgrade xercesImpl to 2.11.0 to fix hang on issue in tika
mimetype detection
Key: NUTCH-2223
URL: https://issues.apache.org/jira/browse/NUTCH-2223
[
https://issues.apache.org/jira/browse/NUTCH-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tien Nguyen Manh updated NUTCH-2223:
Fix Version/s: (was: 1.13)
> Upgrade xercesImpl to 2.11.0 to fix hang on issue in tika
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15117020#comment-15117020
]
Tien Nguyen Manh commented on NUTCH-961:
Can NUTCH-1233: use tika to extract outlink solve that
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116772#comment-15116772
]
Tien Nguyen Manh edited comment on NUTCH-961 at 1/26/16 6:57 AM:
-
Ah yes,
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116772#comment-15116772
]
Tien Nguyen Manh commented on NUTCH-961:
Ah yes, could you explain why we need to parse it twice?
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114658#comment-15114658
]
Tien Nguyen Manh commented on NUTCH-961:
One note on boilerpipe support: it is significantly slower
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15110217#comment-15110217
]
Tien Nguyen Manh commented on NUTCH-961:
I'm using this patch, NUTCH-961-1.11-1.patch, and it works fine
[
https://issues.apache.org/jira/browse/NUTCH-1679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tien Nguyen Manh updated NUTCH-1679:
Attachment: NUTCH-1679-2.patch
I have another solution.
With a new link in DbUpdaterReducer
Tien Nguyen Manh created NUTCH-1702:
---
Summary: Port HostNormalizer to 2.x
Key: NUTCH-1702
URL: https://issues.apache.org/jira/browse/NUTCH-1702
Project: Nutch
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/NUTCH-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tien Nguyen Manh updated NUTCH-1702:
Attachment: NUTCH-1702.patch
Port HostNormalizer to 2.x
--
[
https://issues.apache.org/jira/browse/NUTCH-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tien Nguyen Manh updated NUTCH-1702:
Fix Version/s: 2.3
Port HostNormalizer to 2.x
--
[
https://issues.apache.org/jira/browse/NUTCH-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tien Nguyen Manh updated NUTCH-1702:
Attachment: NUTCH-1702.patch
Port HostNormalizer to 2.x
--
[
https://issues.apache.org/jira/browse/NUTCH-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tien Nguyen Manh updated NUTCH-1702:
Attachment: (was: NUTCH-1702.patch)
Port HostNormalizer to 2.x
[
https://issues.apache.org/jira/browse/NUTCH-1704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tien Nguyen Manh updated NUTCH-1704:
Attachment: NUTCH-1704.patch
Port DomainBlacklist urlfilter to 2.x
[
https://issues.apache.org/jira/browse/NUTCH-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tien Nguyen Manh updated NUTCH-1478:
Attachment: NUTCH-1478-parse-v2.patch
I ported parse-metatags to 2.x; this patch supports
[
https://issues.apache.org/jira/browse/NUTCH-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tien Nguyen Manh updated NUTCH-1705:
Attachment: NUTCH-1705.patch
Make configuration option for HtmlParser TikaParser to
Tien Nguyen Manh created NUTCH-1705:
---
Summary: Make configuration option for HtmlParser TikaParser to
extract text or title for noIndex page
Key: NUTCH-1705
URL:
Tien Nguyen Manh created NUTCH-1701:
---
Summary: Make Solr Document Boost as an option
Key: NUTCH-1701
URL: https://issues.apache.org/jira/browse/NUTCH-1701
Project: Nutch
Issue Type:
[
https://issues.apache.org/jira/browse/NUTCH-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tien Nguyen Manh updated NUTCH-1701:
Fix Version/s: 1.8
2.3
Make Solr Document Boost as an option
[
https://issues.apache.org/jira/browse/NUTCH-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tien Nguyen Manh updated NUTCH-1701:
Attachment: NUTCH-1701-2x.patch
Make Solr Document Boost as an option
[
https://issues.apache.org/jira/browse/NUTCH-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861142#comment-13861142
]
Tien Nguyen Manh commented on NUTCH-1686:
-
In this patch I also fixed a bug with
[
https://issues.apache.org/jira/browse/NUTCH-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tien Nguyen Manh updated NUTCH-1693:
Issue Type: New Feature (was: Bug)
TextMD5Signatue compute on textual content
[
https://issues.apache.org/jira/browse/NUTCH-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tien Nguyen Manh updated NUTCH-1693:
Fix Version/s: 2.3
TextMD5Signatue compute on textual content
[
https://issues.apache.org/jira/browse/NUTCH-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861195#comment-13861195
]
Tien Nguyen Manh commented on NUTCH-1693:
-
This patch only works with a minor
[
https://issues.apache.org/jira/browse/NUTCH-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13859364#comment-13859364
]
Tien Nguyen Manh commented on NUTCH-1687:
-
It is nice!
Pick queue in Round Robin
[
https://issues.apache.org/jira/browse/NUTCH-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tien Nguyen Manh updated NUTCH-1687:
Attachment: NUTCH-1687.patch
add Apache Header
fixed lost tail pointer when deleting
[
https://issues.apache.org/jira/browse/NUTCH-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tien Nguyen Manh updated NUTCH-1687:
Attachment: (was: NUTCH-1687.patch)
Pick queue in Round Robin
45 matches