[jira] Closed: (NUTCH-71) Search web page doesn't not focus on query input
[ http://issues.apache.org/jira/browse/NUTCH-71?page=all ]

Jerome Charron closed NUTCH-71:
---

Fix Version: 0.8-dev
Resolution: Fixed
Assign To: Jerome Charron

Thanks Christophe for reporting it and for your piece of code.

Search web page doesn't not focus on query input

Key: NUTCH-71
URL: http://issues.apache.org/jira/browse/NUTCH-71
Project: Nutch
Type: Bug
Components: searcher
Reporter: Christophe Noel
Assignee: Jerome Charron
Priority: Minor
Fix For: 0.8-dev
Attachments: searchQueryFocus.patch

In search.html and search.jsp, the keyboard cursor is not focused on the query input of the form. I've made a patch for the en and fr search.html and for search.jsp.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
[jira] Updated: (NUTCH-74) French Analyzer Plugin
[ http://issues.apache.org/jira/browse/NUTCH-74?page=all ]

Jerome Charron updated NUTCH-74:

Component: indexer
Fix Version: 0.8-dev
Version: 0.7 0.6 0.8-dev

French Analyzer Plugin
--

Key: NUTCH-74
URL: http://issues.apache.org/jira/browse/NUTCH-74
Project: Nutch
Type: New Feature
Components: indexer
Versions: 0.7, 0.8-dev, 0.6
Environment: Nutch
Reporter: Christophe Noel
Assignee: Jerome Charron
Fix For: 0.8-dev
Attachments: analyze-french.zip, analyzers-050705.patch

This is a DRAFT of a new plugin for French analysis (all Java files come from the Lucene project sandbox). It includes an ISO Latin-1 accent filter, removal of plural forms, ...

Analyze-french should be used instead of NutchDocumentAnalysis, as described by Jerome Charron in the New Language Identifier project. It should also be used as a query parser in the Nutch searcher.

We are missing an EXTENSION-POINT for including this kind of plugin in Nutch. Could anyone help me build this new extension point, please?
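To illustrate the accent-filtering step mentioned in the issue, here is a minimal sketch of accent folding in plain Java. It is not the Lucene sandbox filter attached to the issue: the class name AccentFolder is hypothetical, and it uses the JDK's java.text.Normalizer instead of a Lucene TokenFilter, so it only shows the idea of mapping accented Latin-1 letters to their base letters.

```java
import java.text.Normalizer;

/** Hypothetical helper for illustration; not part of Nutch or Lucene. */
final class AccentFolder {
    private AccentFolder() {}

    /**
     * Decomposes each accented letter into base letter + combining mark (NFD),
     * then drops the combining marks. Ligatures such as \u0153 (oe) have no
     * decomposition and are left unchanged.
     */
    static String fold(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file ASCII-only.
        // "\u00c9l\u00e8ve tr\u00e8s \u00e2g\u00e9" is printed as "Eleve tres age".
        System.out.println(fold("\u00c9l\u00e8ve tr\u00e8s \u00e2g\u00e9"));
    }
}
```

In a real analyzer the same folding would have to be applied both at indexing time and at query time, which is why the issue asks for the plugin to serve as a query parser as well.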
RE: [Nutch-dev] Outlink metadata?
To this end, would it suffice to abstract the Page and Link classes and make expanded implementations of these?

Jeremy

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Erik Hatcher
Sent: Thursday, August 18, 2005 11:45 AM
To: nutch-dev@lucene.apache.org
Subject: [Nutch-dev] Outlink metadata?

First, a question about the current behavior: does Nutch adhere to the <a rel="nofollow" ...> convention? If so, where is that coded?

On a related note, it seems that carrying metadata around on Outlink would be beneficial, not just anchor text and URL. For example, my application will crawl HTML sites with a HEAD link to RDF data. I'd like to add ParseData metadata in an HtmlParseFilter, so that an indexer (a custom one currently, not the Nutch one) can get at the RDF data fetched from the URL stored in the metadata.

Make sense? Would my use case indicate that Outlink should carry along metadata, or is there another way to achieve this (besides writing a custom HTML parser)?

Thanks,
Erik

___
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
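As a rough sketch of what "an outlink that carries metadata" could look like, assuming nothing about Nutch's actual Outlink class: the class OutlinkWithMetadata, its methods, and the metadata keys used in main are all hypothetical. The isNoFollow helper only shows how a rel attribute value could be tested for the nofollow token; it does not describe what Nutch does today.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical value class; Nutch's own Outlink is neither used nor modified here. */
final class OutlinkWithMetadata {
    private final String toUrl;
    private final String anchor;
    private final Map<String, String> metadata = new LinkedHashMap<>();

    OutlinkWithMetadata(String toUrl, String anchor) {
        this.toUrl = toUrl;
        this.anchor = anchor;
    }

    /** Adds one metadata entry and returns this object so calls can be chained. */
    OutlinkWithMetadata put(String key, String value) {
        metadata.put(key, value);
        return this;
    }

    String getToUrl() { return toUrl; }

    String getAnchor() { return anchor; }

    /** Read-only view of the metadata, in insertion order. */
    Map<String, String> getMetadata() {
        return Collections.unmodifiableMap(metadata);
    }

    /** True when the whitespace-separated rel value contains the token "nofollow". */
    static boolean isNoFollow(String rel) {
        if (rel == null) {
            return false;
        }
        for (String token : rel.trim().split("\\s+")) {
            if (token.equalsIgnoreCase("nofollow")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The URL and the keys "rel" and "type" are made up for this example.
        OutlinkWithMetadata link = new OutlinkWithMetadata("http://example.org/data.rdf", "")
                .put("rel", "alternate")
                .put("type", "application/rdf+xml");
        System.out.println(link.getToUrl() + " " + link.getMetadata());
    }
}
```

With something like this, a parse filter could record the type of the linked resource on the link itself, and an indexer could later pick out the RDF links by inspecting the metadata map.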
Re: Parse-html should be enhanced!
Would an extension of an existing extension point be a solution? Our ongoing project also needs to deal with specific crawling cases on some sites. We are thinking about extending the current Java class to fit our usage.

Michael Ji

--- Jack Tang [EMAIL PROTECTED] wrote:

Hi Nutchers

I think the parse-html parser should be enhanced. In some of my projects (intranet search engines), we only need the content inside specified detectors and want to filter out the junk, say the content between <div class="start-here"> and </div>, or some detectors like XPath.

Any thoughts on this enhancement?

Regards
/Jack

--
Keep Discovering ... ...
http://www.jroller.com/page/jmars
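The filtering Jack describes, keeping only what sits between <div class="start-here"> and its matching </div>, could be sketched as follows. This is a regex-based illustration under simplifying assumptions, not the parse-html plugin and not a real HTML parser: the class RegionExtractor is hypothetical, the class attribute must hold exactly the given name, and matching is case-insensitive throughout.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of Nutch's parse-html plugin. */
final class RegionExtractor {
    // Matches any opening or closing div tag; group 1 is "/" for a closing tag.
    private static final Pattern DIV_TAG =
            Pattern.compile("<(/?)div\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    private RegionExtractor() {}

    /**
     * Returns the markup between the first div whose class attribute equals
     * cssClass and its matching closing tag, or null when no such div exists
     * or the div tags are unbalanced. Nested divs are handled by depth counting.
     */
    static String extract(String html, String cssClass) {
        Pattern open = Pattern.compile(
                "<div\\b[^>]*\\bclass\\s*=\\s*[\"']?" + Pattern.quote(cssClass)
                        + "(?=[\"'\\s>])[^>]*>",
                Pattern.CASE_INSENSITIVE);
        Matcher start = open.matcher(html);
        if (!start.find()) {
            return null;
        }
        int contentStart = start.end();
        Matcher tags = DIV_TAG.matcher(html);
        int depth = 1;
        int pos = contentStart;
        while (tags.find(pos)) {
            depth += tags.group(1).isEmpty() ? 1 : -1;
            if (depth == 0) {
                return html.substring(contentStart, tags.start());
            }
            pos = tags.end();
        }
        return null;
    }

    public static void main(String[] args) {
        String html = "<p>junk</p><div class=\"start-here\">keep <div>nested</div> this</div>";
        System.out.println(extract(html, "start-here"));
    }
}
```

For real pages, running an XPath expression over a DOM produced by a tolerant HTML parser would be more robust than regular expressions; the sketch only shows where such a "detector" would cut the document.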
RE: Parse-html should be enhanced!
The existing PARSE-HTML plugin simply stores clean text (without HTML tags) for future indexing. It stores, for instance, the content of huge OPTION tags, which we don't need at all in 99.99% of cases.

I found this idea very interesting, Web-SQL: http://www.lotontech.com

I've bought a book, Tony Loton's "Web Content Mining with Java"; it consists 90% of code which I don't really need... However, I am going to implement some kind of Web-SQL and Math. Statistics. Usually web sites have 90% similar HTML, and I need only a subset.

Also, I need to find a point in Nutch where I can replace the Analyzer with my own non-analyzer; I don't need to remove stop words etc. I'd like to use Lucene as a database too... to perform a lot of queries, to calculate some statistics...

-Fuad

-----Original Message-----
From: Jack Tang [mailto:[EMAIL PROTECTED]
Sent: Thursday, August 18, 2005 10:15 PM
To: nutch-dev@lucene.apache.org
Subject: Parse-html should be enhanced!

Hi Nutchers

I think the parse-html parser should be enhanced. In some of my projects (intranet search engines), we only need the content inside specified detectors and want to filter out the junk, say the content between <div class="start-here"> and </div>, or some detectors like XPath.

Any thoughts on this enhancement?

Regards
/Jack

--
Keep Discovering ... ...
http://www.jroller.com/page/jmars
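One way to keep drop-down lists out of the extracted text, sketched here with plain regular expressions: remove each <select> ... </select> block (and with it all of its <option> entries) before stripping the remaining tags. The class FormNoiseStripper is hypothetical and is not how the PARSE-HTML plugin works; it assumes reasonably well-formed markup and ignores scripts, comments and character entities.

```java
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of Nutch's parse-html plugin. */
final class FormNoiseStripper {
    // A whole select element, including every option inside it.
    private static final Pattern SELECT_BLOCK = Pattern.compile(
            "<select\\b[^>]*>.*?</select\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // Any remaining tag.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    private FormNoiseStripper() {}

    /** Drops select blocks, then tags, then collapses runs of whitespace. */
    static String toText(String html) {
        String withoutSelects = SELECT_BLOCK.matcher(html).replaceAll(" ");
        String withoutTags = TAG.matcher(withoutSelects).replaceAll(" ");
        return withoutTags.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String html = "<p>Choose a country:</p>"
                + "<select name=\"c\"><option>France</option><option>Canada</option></select>"
                + "<p>Done</p>";
        // Prints "Choose a country: Done"
        System.out.println(toText(html));
    }
}
```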