[jira] Created: (NUTCH-624) Better parsed text

2008-03-30 Thread Vinci (JIRA)
Better parsed text. Key: NUTCH-624; URL: https://issues.apache.org/jira/browse/NUTCH-624; Project: Nutch; Issue Type: Improvement; Reporter: Vinci. I found the parsed text by default parser Neko is not easy to

[jira] Created: (NUTCH-625) Non-ascii character broken in dumped content for mixed encoding (utf-8 and multi-byte)

2008-03-30 Thread Vinci (JIRA)
Non-ascii character broken in dumped content for mixed encoding (utf-8 and multi-byte). Key: NUTCH-625; URL: https://issues.apache.org/jira/browse/NUTCH-625

Re: [jira] Created: (NUTCH-624) Better parsed text

2008-03-30 Thread ogjunk-nutch
Vinci, Please use the mailing list to ask questions and discuss first, not JIRA. Also, please include an example of what you are describing, if you can. Thanks, Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch -- Original Message -- From: Vinci (JIRA) [EMAIL PROTECTED]

Re: Why is Nutch not involved in Google Summer of Code - 2008?

2008-03-30 Thread Dennis Kubes
How much of a time commitment would we need to make? Dennis [EMAIL PROTECTED] wrote: Hi Susam, Good question, and I'm afraid we may be a little late: http://wiki.apache.org/general/SummerOfCodeMentor I think the main problem is that nobody has time to be the mentor. As for ideas, I

Re: Why is Nutch not involved in Google Summer of Code - 2008?

2008-03-30 Thread Andrzej Bialecki
[EMAIL PROTECTED] wrote: Hi Susam, Good question, and I'm afraid we may be a little late: http://wiki.apache.org/general/SummerOfCodeMentor I think the main problem is that nobody has time to be the mentor. As for ideas, I think Solr integration would be very nice to have. Solr, with

Re: Why is Nutch not involved in Google Summer of Code - 2008?

2008-03-30 Thread Dennis Kubes
Ok, I should be able to be a mentor. Besides Solr integration, are there other ideas for the project? Also, is it too late? Dennis Susam Pal wrote: I believe a couple of hours every week should be enough. Last year, I signed up as a mentor for OSVDB and we managed to get some useful work done.

Re: Why is Nutch not involved in Google Summer of Code - 2008?

2008-03-30 Thread ogjunk-nutch
Hi Dennis, Not too late, I think, just add the Nutch + Solr idea to http://wiki.apache.org/general/SummerOfCode2008 on Monday. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch -- Original Message -- From: Dennis Kubes [EMAIL PROTECTED] To: nutch-dev@lucene.apache.org Sent:

[jira] Resolved: (NUTCH-555) StackOverflowError in DomContentUtils

2008-03-30 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes resolved NUTCH-555. Resolution: Duplicate. Solved by NUTCH-497. StackOverflowError in DomContentUtils

[jira] Closed: (NUTCH-555) StackOverflowError in DomContentUtils

2008-03-30 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes closed NUTCH-555. Assignee: Dennis Kubes. Issue closed, fixed by NUTCH-497. StackOverflowError in DomContentUtils

[jira] Assigned: (NUTCH-500) Add hadoop masters configuration file into conf folder

2008-03-30 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes reassigned NUTCH-500. Assignee: Dennis Kubes. Add hadoop masters configuration file into conf folder

[jira] Closed: (NUTCH-447) Dmoz Structure Parser Tool

2008-03-30 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes closed NUTCH-447. Closed. Dmoz Structure Parser Tool. Key: NUTCH-447

[jira] Resolved: (NUTCH-447) Dmoz Structure Parser Tool

2008-03-30 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes resolved NUTCH-447. Resolution: Won't Fix. Tool is in JIRA, no need to add to main trunk. Dmoz Structure Parser Tool

[jira] Assigned: (NUTCH-249) black- white list url filtering

2008-03-30 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes reassigned NUTCH-249. Assignee: Dennis Kubes. black- white list url filtering

[jira] Assigned: (NUTCH-291) OpenSearchServlet should return date as well as lastModified

2008-03-30 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes reassigned NUTCH-291. Assignee: Dennis Kubes. OpenSearchServlet should return date as well as lastModified

[jira] Assigned: (NUTCH-295) More description for fetcher.threads.fetch property

2008-03-30 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes reassigned NUTCH-295. Assignee: Dennis Kubes. More description for fetcher.threads.fetch property

[jira] Assigned: (NUTCH-213) checkstyle

2008-03-30 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes reassigned NUTCH-213. Assignee: Dennis Kubes. checkstyle. Key: NUTCH-213

[jira] Closed: (NUTCH-75) Patch for WebDBReader to get more detailed information about WebDBs

2008-03-30 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-75?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes closed NUTCH-75. Resolution: Won't Fix. WebDBReader is part of Version 0.7, which is no longer supported. Patch for

[jira] Assigned: (NUTCH-48) Did you mean query enhancement/refignment feature request

2008-03-30 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-48?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes reassigned NUTCH-48. Assignee: Dennis Kubes (was: Sami Siren). Did you mean query enhancement/refignment feature

[jira] Assigned: (NUTCH-16) boost documents matching a url pattern

2008-03-30 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-16?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes reassigned NUTCH-16. Assignee: Dennis Kubes. boost documents matching a url pattern