Better parsed text
--
Key: NUTCH-624
URL: https://issues.apache.org/jira/browse/NUTCH-624
Project: Nutch
Issue Type: Improvement
Reporter: Vinci
I found that the text parsed by the default parser, Neko, is not easy to
Non-ASCII characters broken in dumped content for mixed encoding (UTF-8 and
multi-byte)
--
Key: NUTCH-625
URL: https://issues.apache.org/jira/browse/NUTCH-625
Vinci,
Please use the mailing list to ask questions and discuss first, not JIRA.
Also, please include an example of what you are describing, if you can.
Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
----- Original Message -----
From: Vinci (JIRA) [EMAIL PROTECTED]
How much of a time commitment would we need to make?
Dennis
[EMAIL PROTECTED] wrote:
Hi Susam,
Good question, and I'm afraid we may be a little late:
http://wiki.apache.org/general/SummerOfCodeMentor
I think the main problem is that nobody has time to be the mentor.
As for ideas, I think Solr integration would be very nice to have. Solr, with
OK, I should be able to be a mentor. Besides Solr integration, are there
other ideas for the project? Also, is it too late?
Dennis
Susam Pal wrote:
I believe a couple of hours every week should be enough. Last year, I
signed up as a mentor for OSVDB and we managed to get some useful work
done.
Hi Dennis,
Not too late, I think; just add the Nutch + Solr idea to
http://wiki.apache.org/general/SummerOfCode2008 on Monday.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
----- Original Message -----
From: Dennis Kubes [EMAIL PROTECTED]
To: nutch-dev@lucene.apache.org
Sent:
[
https://issues.apache.org/jira/browse/NUTCH-555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes resolved NUTCH-555.
Resolution: Duplicate
Solved by NUTCH-497
StackOverflowError in DomContentUtils
[
https://issues.apache.org/jira/browse/NUTCH-555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes closed NUTCH-555.
--
Assignee: Dennis Kubes
Issue closed, fixed by NUTCH-497
StackOverflowError in DomContentUtils
[
https://issues.apache.org/jira/browse/NUTCH-500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes reassigned NUTCH-500:
--
Assignee: Dennis Kubes
Add hadoop masters configuration file into conf folder
[
https://issues.apache.org/jira/browse/NUTCH-447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes closed NUTCH-447.
--
Closed
Dmoz Structure Parser Tool
--
Key: NUTCH-447
[
https://issues.apache.org/jira/browse/NUTCH-447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes resolved NUTCH-447.
Resolution: Won't Fix
Tool is in JIRA, no need to add to main trunk.
Dmoz Structure Parser Tool
[
https://issues.apache.org/jira/browse/NUTCH-249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes reassigned NUTCH-249:
--
Assignee: Dennis Kubes
black-/white-list URL filtering
---
[
https://issues.apache.org/jira/browse/NUTCH-291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes reassigned NUTCH-291:
--
Assignee: Dennis Kubes
OpenSearchServlet should return date as well as lastModified
[
https://issues.apache.org/jira/browse/NUTCH-295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes reassigned NUTCH-295:
--
Assignee: Dennis Kubes
More description for fetcher.threads.fetch property
[
https://issues.apache.org/jira/browse/NUTCH-213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes reassigned NUTCH-213:
--
Assignee: Dennis Kubes
checkstyle
--
Key: NUTCH-213
[
https://issues.apache.org/jira/browse/NUTCH-75?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes closed NUTCH-75.
-
Resolution: Won't Fix
WebDBReader is part of Version 0.7 which is no longer supported.
Patch for
[
https://issues.apache.org/jira/browse/NUTCH-48?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes reassigned NUTCH-48:
-
Assignee: Dennis Kubes (was: Sami Siren)
Did you mean query enhancement/refinement feature
[
https://issues.apache.org/jira/browse/NUTCH-16?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes reassigned NUTCH-16:
-
Assignee: Dennis Kubes
boost documents matching a url pattern