[jira] [Commented] (NUTCH-1854) ./bin/crawl fails with a parsing fetcher

2015-04-08 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14485299#comment-14485299 ] lufeng commented on NUTCH-1854: --- if we set fetcher.store.content=false and

[jira] [Commented] (NUTCH-1939) Fetcher fails to follow redirects

2015-02-10 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14315374#comment-14315374 ] lufeng commented on NUTCH-1939: --- Hi Sebastian One question. How do you use the FetchItem

[jira] [Comment Edited] (NUTCH-1939) Fetcher fails to follow redirects

2015-02-10 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14315374#comment-14315374 ] lufeng edited comment on NUTCH-1939 at 2/11/15 2:16 AM: I think

[jira] [Commented] (NUTCH-1829) Generator : unable to distinguish real errors

2014-08-25 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14110193#comment-14110193 ] lufeng commented on NUTCH-1829: --- yes, I think we should distinguish different return result

[jira] [Commented] (NUTCH-385) Improve description of thread related configuration for Fetcher

2014-06-26 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14045525#comment-14045525 ] lufeng commented on NUTCH-385: -- Hi Julien I see the description of fetcher.threads.per.queue

[jira] [Commented] (NUTCH-1785) Ability to index raw content

2014-05-28 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010889#comment-14010889 ] lufeng commented on NUTCH-1785: --- +1 elasticsearch 1.2.0 test ok. one question is why

[jira] [Closed] (NUTCH-1521) CrawlDbFilter pass null url to urlNormailzers

2014-04-16 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng closed NUTCH-1521. - Resolution: Fixed Fix Version/s: (was: 2.4) 1.9 CrawlDbFilter pass null url to

[jira] [Commented] (NUTCH-1726) HeadingsFilter does not find nested nodes

2014-04-15 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13969601#comment-13969601 ] lufeng commented on NUTCH-1726: --- Hi all, could someone who is free check this patch? Thanks.

[jira] [Commented] (NUTCH-1752) cache robots.txt rules per protocol:host:port

2014-04-09 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13964219#comment-13964219 ] lufeng commented on NUTCH-1752: --- Do you mean different port with same protocol and host has

[jira] [Commented] (NUTCH-1733) parse-html to support HTML5 charset definitions

2014-03-18 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13938867#comment-13938867 ] lufeng commented on NUTCH-1733: --- +1 pass all tests parse-html to support HTML5 charset

[jira] [Commented] (NUTCH-1736) Can't fetch page if http response header contains Transfer-Encoding:chunked

2014-03-16 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937426#comment-13937426 ] lufeng commented on NUTCH-1736: --- Hi ysc you can check the content size to fix this issue

[jira] [Commented] (NUTCH-1726) HeadingsFilter does not find nested nodes

2014-02-24 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910355#comment-13910355 ] lufeng commented on NUTCH-1726: --- Hi Markus It seems that HeadingsFilter does not find

[jira] [Comment Edited] (NUTCH-1726) HeadingsFilter does not find nested nodes

2014-02-24 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910355#comment-13910355 ] lufeng edited comment on NUTCH-1726 at 2/24/14 2:41 PM: Hi Markus

[jira] [Commented] (NUTCH-1726) HeadingsFilter does not find nested nodes

2014-02-13 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13900432#comment-13900432 ] lufeng commented on NUTCH-1726: --- Hi Markus. But I didn't find any error using your newest

[jira] [Updated] (NUTCH-1726) HeadingsFilter does not find nested nodes

2014-02-12 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng updated NUTCH-1726: -- Attachment: NUTCH-1726-trunk-v2.patch add a test case to check HeadingsFilter patch. :) HeadingsFilter does

[jira] [Commented] (NUTCH-1691) DomainBlacklist url filter does not allow -D filter file override

2014-01-03 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861502#comment-13861502 ] lufeng commented on NUTCH-1691: --- like urlfilter-prefix plugin, we can move WARN code to

[jira] [Commented] (NUTCH-1667) Updatedb always ignore batchId

2013-11-22 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830525#comment-13830525 ] lufeng commented on NUTCH-1667: --- yes, u are right. +1 Updatedb always ignore batchId

[jira] [Commented] (NUTCH-1671) indexchecker to add digest field

2013-11-22 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830530#comment-13830530 ] lufeng commented on NUTCH-1671: --- yes, this field can be used by indexing filters. +1

[jira] [Created] (NUTCH-1670) set same crawldb directory in mergedb parameter

2013-11-20 Thread lufeng (JIRA)
lufeng created NUTCH-1670: - Summary: set same crawldb directory in mergedb parameter Key: NUTCH-1670 URL: https://issues.apache.org/jira/browse/NUTCH-1670 Project: Nutch Issue Type: Bug

[jira] [Updated] (NUTCH-1670) set same crawldb directory in mergedb parameter

2013-11-20 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng updated NUTCH-1670: -- Attachment: NUTCH-1670.patch set same crawldb directory in mergedb parameter

[jira] [Work started] (NUTCH-1670) set same crawldb directory in mergedb parameter

2013-11-20 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-1670 started by lufeng. set same crawldb directory in mergedb parameter --- Key: NUTCH-1670

[jira] [Commented] (NUTCH-1651) modifiedTime and prevmodifiedTime never set

2013-11-04 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13812840#comment-13812840 ] lufeng commented on NUTCH-1651: --- Hi Lewis yes, the patch is ok, and this a way to set

[jira] [Commented] (NUTCH-1651) modifiedTime and prevmodifiedTime never set

2013-10-30 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809081#comment-13809081 ] lufeng commented on NUTCH-1651: --- Hi Talat yes, u are right, lastModified is a fetch

[jira] [Commented] (NUTCH-1651) modifiedTime and prevmodifiedTime never set

2013-10-29 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13808045#comment-13808045 ] lufeng commented on NUTCH-1651: --- Hi Talat but I think get last modified from header is not

[jira] [Updated] (NUTCH-1645) Junit Test Case for Adaptive Fetch Schedule class

2013-10-28 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng updated NUTCH-1645: -- Attachment: NUTCH-1645-v3.patch 1. add an implementation of reaches a lower number of misses would cause the

[jira] [Updated] (NUTCH-1645) Junit Test Case for Adaptive Fetch Schedule class

2013-10-06 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng updated NUTCH-1645: -- Attachment: NUTCH-1645-v2.patch add two test case, one is use default parameters and another without open sync

[jira] [Commented] (NUTCH-1650) Adaptive Fetch Scheduler interval Wrong Set

2013-10-06 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13787664#comment-13787664 ] lufeng commented on NUTCH-1650: --- yes , this code in Nutch 1.x is correct. +1 Adaptive

[jira] [Commented] (NUTCH-1556) enabling updatedb to accept batchId

2013-09-12 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13765410#comment-13765410 ] lufeng commented on NUTCH-1556: --- oh, I'm so sorry, I already fixed this problem. commit

[jira] [Commented] (NUTCH-1636) Indexer to normalize and filter repr URL

2013-09-09 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13761888#comment-13761888 ] lufeng commented on NUTCH-1636: --- yes, this patch can solve the issue reported by lain. +1

[jira] [Commented] (NUTCH-1556) enabling updatedb to accept batchId

2013-09-05 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13759123#comment-13759123 ] lufeng commented on NUTCH-1556: --- Committed revision 1520332 in 2.x HEAD Thanks kaveh.

[jira] [Resolved] (NUTCH-1556) enabling updatedb to accept batchId

2013-09-05 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng resolved NUTCH-1556. --- Resolution: Fixed enabling updatedb to accept batchId

[jira] [Commented] (NUTCH-1556) enabling updatedb to accept batchId

2013-09-02 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756080#comment-13756080 ] lufeng commented on NUTCH-1556: --- I will commit this unless there are objections

[jira] [Commented] (NUTCH-1556) enabling updatedb to accept batchId

2013-08-28 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13752432#comment-13752432 ] lufeng commented on NUTCH-1556: --- thanks kaveh. +1 enabling updatedb to

[jira] [Updated] (NUTCH-1556) enabling updatedb to accept batchId

2013-08-27 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng updated NUTCH-1556: -- Attachment: NUTCH-1556-v2.patch new patch merged with issue 1632 enabling updatedb to accept

[jira] [Created] (NUTCH-1632) add batchId argument for DbUpdaterJob

2013-08-26 Thread lufeng (JIRA)
lufeng created NUTCH-1632: - Summary: add batchId argument for DbUpdaterJob Key: NUTCH-1632 URL: https://issues.apache.org/jira/browse/NUTCH-1632 Project: Nutch Issue Type: Improvement

[jira] [Updated] (NUTCH-1632) add batchId argument for DbUpdaterJob

2013-08-26 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng updated NUTCH-1632: -- Attachment: NUTCH-1632.patch add batchId argument for DbUpdaterJob -

[jira] [Closed] (NUTCH-1632) add batchId argument for DbUpdaterJob

2013-08-26 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng closed NUTCH-1632. - Resolution: Duplicate add batchId argument for DbUpdaterJob -

[jira] [Commented] (NUTCH-1556) enabling updatedb to accept batchId

2013-08-26 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750803#comment-13750803 ] lufeng commented on NUTCH-1556: --- Hi Lewis, I'm sorry, I generate a duplicate issue. I will

[jira] [Commented] (NUTCH-1632) add batchId argument for DbUpdaterJob

2013-08-26 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750804#comment-13750804 ] lufeng commented on NUTCH-1632: --- Hi kaveh, I'm sorry and I will close this issue and merge

[jira] [Commented] (NUTCH-1619) Writes Dmoz Description and Title information to db with snippet argument

2013-08-25 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13749663#comment-13749663 ] lufeng commented on NUTCH-1619: --- Hi Julien,I have already fixed the compilation bug, and I

[jira] [Commented] (NUTCH-1619) Writes Dmoz Description and Title information to db with snippet argument

2013-08-24 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13749409#comment-13749409 ] lufeng commented on NUTCH-1619: --- Committed @revision 1517147 in 2.x HEAD Thank you very much

[jira] [Resolved] (NUTCH-1619) Writes Dmoz Description and Title information to db with snippet argument

2013-08-24 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng resolved NUTCH-1619. --- Resolution: Fixed Writes Dmoz Description and Title information to db with snippet argument

[jira] [Commented] (NUTCH-1619) Writes Dmoz Description and Title information to db with snippet argument

2013-08-24 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13749419#comment-13749419 ] lufeng commented on NUTCH-1619: --- I'm so sorry, DataStore may not throw IOException. It has

[jira] [Commented] (NUTCH-1631) Display Document Count Added To Solr Server

2013-08-23 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13748595#comment-13748595 ] lufeng commented on NUTCH-1631: --- Good statistical methods. +1 Display

[jira] [Commented] (NUTCH-1619) Writes Dmoz Description and Title information to db with snippet argument

2013-08-22 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13747558#comment-13747558 ] lufeng commented on NUTCH-1619: --- Thanks Talat. +1 for commit. Writes Dmoz

[jira] [Commented] (NUTCH-1619) Writes Dmoz Description and Title information to db with snippet argument

2013-08-19 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13743621#comment-13743621 ] lufeng commented on NUTCH-1619: --- Hi Yasin, Do you forget to close the data store? good.

[jira] [Commented] (NUTCH-1294) IndexClean job with solr implementation.

2013-08-14 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13739731#comment-13739731 ] lufeng commented on NUTCH-1294: --- Hi Lewis. Very pleasure. But What can I do something for

[jira] [Commented] (NUTCH-1613) Timeouts in protocol-httpclient when crawling same host with 2 threads and added cookie strings for both http protocols

2013-07-21 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13714701#comment-13714701 ] lufeng commented on NUTCH-1613: --- ok, Does this cookie will effect other urls that these urls

[jira] [Commented] (NUTCH-1613) Timeouts in protocol-httpclient when crawling same host with 2 threads and added cookie strings for both http protocols

2013-07-17 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13711150#comment-13711150 ] lufeng commented on NUTCH-1613: --- Does this specified cookie string will effect all crawling

[jira] [Commented] (NUTCH-1602) improve the readability of metadata in readdb dump normal

2013-07-04 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13700082#comment-13700082 ] lufeng commented on NUTCH-1602: --- Hi Markus, this output format only used in *normal* output

[jira] [Resolved] (NUTCH-1602) improve the readability of metadata in readdb dump normal

2013-07-04 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng resolved NUTCH-1602. --- Resolution: Fixed improve the readability of metadata in readdb dump normal

[jira] [Commented] (NUTCH-1600) Injector overwrite does not always work properly

2013-07-03 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13699034#comment-13699034 ] lufeng commented on NUTCH-1600: --- test work fine. +1 Injector overwrite

[jira] [Updated] (NUTCH-1602) improve the readability of metadata in readdb dump normal

2013-07-03 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng updated NUTCH-1602: -- Attachment: NUTCH-1602.patch improve the readability of metadata in readdb dump normal

[jira] [Created] (NUTCH-1602) improve the readability of metadata in readdb dump normal

2013-07-03 Thread lufeng (JIRA)
lufeng created NUTCH-1602: - Summary: improve the readability of metadata in readdb dump normal Key: NUTCH-1602 URL: https://issues.apache.org/jira/browse/NUTCH-1602 Project: Nutch Issue Type:

[jira] [Commented] (NUTCH-1594) count variable is never changed in ParseUtil class

2013-07-01 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13696798#comment-13696798 ] lufeng commented on NUTCH-1594: --- Committed @revision 1498437 in 2.x HEAD. Thanks Canan and

[jira] [Commented] (NUTCH-1327) QueryStringNormalizer

2013-07-01 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13696854#comment-13696854 ] lufeng commented on NUTCH-1327: --- Hi Markus, I tested you patch, Do you forget to add deploy

[jira] [Created] (NUTCH-1594) count variable is never in ParseUtil

2013-06-29 Thread lufeng (JIRA)
lufeng created NUTCH-1594: - Summary: count variable is never in ParseUtil Key: NUTCH-1594 URL: https://issues.apache.org/jira/browse/NUTCH-1594 Project: Nutch Issue Type: Bug Components:

[jira] [Updated] (NUTCH-1594) count variable is never changed in ParseUtil class

2013-06-29 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng updated NUTCH-1594: -- Description: in ParseUtil class the count variable is never change. the code is like this for (int i = 0;

[jira] [Updated] (NUTCH-1594) count variable is never changed in ParseUtil class

2013-06-29 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng updated NUTCH-1594: -- Patch Info: Patch Available count variable is never changed in ParseUtil class

[jira] [Updated] (NUTCH-1594) count variable is never changed in ParseUtil class

2013-06-29 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng updated NUTCH-1594: -- Attachment: NUTCH-1594.patch count variable is never changed in ParseUtil class

[jira] [Assigned] (NUTCH-1594) count variable is never changed in ParseUtil class

2013-06-29 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng reassigned NUTCH-1594: - Assignee: lufeng count variable is never changed in ParseUtil class

[jira] [Commented] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch

2013-06-18 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13686830#comment-13686830 ] lufeng commented on NUTCH-1527: --- Thanks Markus, I try the patch and can index the document

[jira] [Commented] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch

2013-06-17 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13685661#comment-13685661 ] lufeng commented on NUTCH-1527: --- Hi Markus, I have already tested the newest patch on my

[jira] [Commented] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch

2013-06-13 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682380#comment-13682380 ] lufeng commented on NUTCH-1527: --- Hi Markus 1. Elastic search will load the configure file

[jira] [Closed] (NUTCH-1575) support solr authentication in nutch 2.x

2013-06-03 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng closed NUTCH-1575. - support solr authentication in nutch 2.x Key:

[jira] [Updated] (NUTCH-1545) capture batchId and remove references to segments in 2.x crawl script.

2013-05-30 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng updated NUTCH-1545: -- Fix Version/s: (was: 2.3) 2.2 capture batchId and remove references to segments in

[jira] [Commented] (NUTCH-1545) capture batchId and remove references to segments in 2.x crawl script.

2013-05-30 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13670376#comment-13670376 ] lufeng commented on NUTCH-1545: --- Committed for nutch 2.2 revision 1487875. by Feng. Thanks

[jira] [Resolved] (NUTCH-1545) capture batchId and remove references to segments in 2.x crawl script.

2013-05-30 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng resolved NUTCH-1545. --- Resolution: Fixed capture batchId and remove references to segments in 2.x crawl script.

[jira] [Resolved] (NUTCH-1563) FetchSchedule#getFields is never used by GeneraterJob

2013-05-29 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng resolved NUTCH-1563. --- Resolution: Fixed FetchSchedule#getFields is never used by GeneraterJob

[jira] [Closed] (NUTCH-1563) FetchSchedule#getFields is never used by GeneraterJob

2013-05-29 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng closed NUTCH-1563. - FetchSchedule#getFields is never used by GeneraterJob -

[jira] [Resolved] (NUTCH-1575) support solr authentication in nutch 2.x

2013-05-29 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng resolved NUTCH-1575. --- Resolution: Fixed support solr authentication in nutch 2.x

[jira] [Commented] (NUTCH-1575) support solr authentication in nutch 2.x

2013-05-29 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13669351#comment-13669351 ] lufeng commented on NUTCH-1575: --- Committed for 2.2 revision 1487521 by Feng. Thanks Lewis

[jira] [Commented] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch

2013-05-27 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13667766#comment-13667766 ] lufeng commented on NUTCH-1527: --- Hi luca,sorry for my delayed reply, yes, you can improve

[jira] [Updated] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch

2013-05-27 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng updated NUTCH-1527: -- Assignee: (was: lufeng) Port nutch-elasticsearch-indexer to Nutch

[jira] [Commented] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch

2013-05-27 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13667775#comment-13667775 ] lufeng commented on NUTCH-1527: --- Hi luca, now you can click assign to me,and then attach you

[jira] [Updated] (NUTCH-1563) FetchSchedule#getFields is never used by GeneraterJob

2013-05-23 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng updated NUTCH-1563: -- Fix Version/s: (was: 2.3) 2.2 FetchSchedule#getFields is never used by

[jira] [Commented] (NUTCH-1563) FetchSchedule#getFields is never used by GeneraterJob

2013-05-23 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13665161#comment-13665161 ] lufeng commented on NUTCH-1563: --- Hi Tejas, yes, I pushed this patch to 2.x.

[jira] [Created] (NUTCH-1575) support solr authentication in nutch 2.x

2013-05-22 Thread lufeng (JIRA)
lufeng created NUTCH-1575: - Summary: support solr authentication in nutch 2.x Key: NUTCH-1575 URL: https://issues.apache.org/jira/browse/NUTCH-1575 Project: Nutch Issue Type: Improvement

[jira] [Work started] (NUTCH-1575) support solr authentication in nutch 2.x

2013-05-22 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-1575 started by lufeng. support solr authentication in nutch 2.x Key: NUTCH-1575

[jira] [Updated] (NUTCH-1575) support solr authentication in nutch 2.x

2013-05-22 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng updated NUTCH-1575: -- Attachment: NUTCH-1575.patch add solr authentication support solr authentication in nutch

[jira] [Commented] (NUTCH-1563) FetchSchedule#getFields is never used by GeneraterJob

2013-05-20 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13662003#comment-13662003 ] lufeng commented on NUTCH-1563: --- Committed for 2.2 revision 1484482 by Feng. Thanks Canan

[jira] [Commented] (NUTCH-1545) capture batchId and remove references to segments in 2.x crawl script.

2013-05-20 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13662057#comment-13662057 ] lufeng commented on NUTCH-1545: --- Hi Tejas yes, the patch is just put random batchId

[jira] [Commented] (NUTCH-1486) Upgrade to Solr 4.2.1

2013-05-08 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13651936#comment-13651936 ] lufeng commented on NUTCH-1486: --- Hi Lewis The dependency version of solr-solrj in pom.xml is

[jira] [Assigned] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch

2013-05-08 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng reassigned NUTCH-1527: - Assignee: lufeng Port nutch-elasticsearch-indexer to Nutch

[jira] [Updated] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch

2013-05-08 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng updated NUTCH-1527: -- Attachment: NUTCH-1527.patch port elasticsearch indexer plugin to nutch trunk. Before u install this patch,

[jira] [Updated] (NUTCH-1555) Move to commons-cli for command line parsing

2013-04-25 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng updated NUTCH-1555: -- Attachment: NUTCH-1555-v1.patch Lewis: 1. fixed the fetch NPE bug 2. fixed the update not work bug Should we

[jira] [Comment Edited] (NUTCH-1555) Move to commons-cli for command line parsing

2013-04-25 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13641869#comment-13641869 ] lufeng edited comment on NUTCH-1555 at 4/25/13 2:48 PM: Lewis: 1.

[jira] [Comment Edited] (NUTCH-1555) Move to commons-cli for command line parsing

2013-04-23 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13639131#comment-13639131 ] lufeng edited comment on NUTCH-1555 at 4/23/13 2:58 PM: already

[jira] [Commented] (NUTCH-1562) Order of execution for scoring filters

2013-04-20 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13637247#comment-13637247 ] lufeng commented on NUTCH-1562: --- Hi Julien, if someone define the scoring.filter.order like

[jira] [Assigned] (NUTCH-1563) FetchSchedule#getFields is never used by GeneraterJob

2013-04-20 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng reassigned NUTCH-1563: - Assignee: lufeng FetchSchedule#getFields is never used by GeneraterJob

[jira] [Created] (NUTCH-1563) FetchSchedule#getFields is never used by GeneraterJob

2013-04-18 Thread lufeng (JIRA)
lufeng created NUTCH-1563: - Summary: FetchSchedule#getFields is never used by GeneraterJob Key: NUTCH-1563 URL: https://issues.apache.org/jira/browse/NUTCH-1563 Project: Nutch Issue Type: Bug

[jira] [Assigned] (NUTCH-1555) Move to commons-cli for command line parsing

2013-04-16 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng reassigned NUTCH-1555: - Assignee: lufeng Move to commons-cli for command line parsing

[jira] [Commented] (NUTCH-1555) bug in 2.x ParserJob command line parsing

2013-04-10 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13627917#comment-13627917 ] lufeng commented on NUTCH-1555: --- Hi Lewis, yes, like you said that we can choose an

[jira] [Commented] (NUTCH-1555) bug in 2.x ParserJob command line parsing

2013-04-08 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13625432#comment-13625432 ] lufeng commented on NUTCH-1555: --- Hi Lewis, as you said that FetchJob also has this bug too.

[jira] [Updated] (NUTCH-1545) capture batchId and remove references to segments in 2.x crawl script.

2013-04-06 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng updated NUTCH-1545: -- Attachment: NUTCH-1545-v2.patch 1. remove any concept of crawldb and segments in bin/crawl script 2. fix the

[jira] [Resolved] (NUTCH-1547) BasicIndexingFilter - Problem to index full title

2013-03-28 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng resolved NUTCH-1547. --- Resolution: Fixed BasicIndexingFilter - Problem to index full title

[jira] [Commented] (NUTCH-1547) BasicIndexingFilter - Problem to index full title

2013-03-28 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616227#comment-13616227 ] lufeng commented on NUTCH-1547: --- Feng Committed revision 1462078 to trunk and 2.x revision

[jira] [Commented] (NUTCH-1538) tuning of loaded fields during fetcherJob start-up

2013-03-28 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616250#comment-13616250 ] lufeng commented on NUTCH-1538: --- yes, However, we can not guarantee that other plugin that

[jira] [Updated] (NUTCH-1547) BasicIndexingFilter - Problem to index full title

2013-03-27 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng updated NUTCH-1547: -- Attachment: NUTCH-1547-2x.patch add patch to Nutch 2.x BasicIndexingFilter - Problem to

[jira] [Commented] (NUTCH-1389) parsechecker and indexchecker to report truncated content

2013-03-27 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13615360#comment-13615360 ] lufeng commented on NUTCH-1389: --- +1 Sebastian parsechecker and indexchecker
