[jira] Closed: (NUTCH-71) Search web page doesn't focus on query input

2005-08-18 Thread Jerome Charron (JIRA)
 [ http://issues.apache.org/jira/browse/NUTCH-71?page=all ]
 
Jerome Charron closed NUTCH-71:
---

Fix Version: 0.8-dev
 Resolution: Fixed
  Assign To: Jerome Charron

Thanks Christophe for reporting it and for your piece of code.

 Search web page doesn't focus on query input
 

  Key: NUTCH-71
  URL: http://issues.apache.org/jira/browse/NUTCH-71
  Project: Nutch
 Type: Bug
   Components: searcher
 Reporter: Christophe Noel
 Assignee: Jerome Charron
 Priority: Minor
  Fix For: 0.8-dev
  Attachments: searchQueryFocus.patch

 In search.html and search.jsp, the keyboard cursor is not focused on the
 form's query input.
 I've made a patch for the en and fr search.html and for search.jsp.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira



[jira] Updated: (NUTCH-74) French Analyzer Plugin

2005-08-18 Thread Jerome Charron (JIRA)
 [ http://issues.apache.org/jira/browse/NUTCH-74?page=all ]

Jerome Charron updated NUTCH-74:


  Component: indexer
Fix Version: 0.8-dev
Version: 0.7
 0.6
 0.8-dev

 French Analyzer Plugin
 --

  Key: NUTCH-74
  URL: http://issues.apache.org/jira/browse/NUTCH-74
  Project: Nutch
 Type: New Feature
   Components: indexer
 Versions: 0.7, 0.8-dev, 0.6
  Environment: Nutch
 Reporter: Christophe Noel
 Assignee: Jerome Charron
  Fix For: 0.8-dev
  Attachments: analyze-french.zip, analyzers-050705.patch

 This is a DRAFT of a new plugin for French analysis (all Java files come from
 the Lucene project sandbox). It includes an ISO Latin-1 accent filter, plural
 form removal, ...
 Analyze-french should be used instead of NutchDocumentAnalysis, as described
 by Jerome Charron in the New Language Identifier project. It should also be
 used as a query parser in the Nutch searcher.
 Nutch is missing an EXTENSION-POINT for including this kind of plugin. Could
 anyone help me build this new extension point, please?




RE: [Nutch-dev] Outlink metadata?

2005-08-18 Thread Jeremy Calvert
To this end, would it suffice to abstract the Page and Link classes and
make expanded implementations of these?

Jeremy

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Erik
Hatcher
Sent: Thursday, August 18, 2005 11:45 AM
To: nutch-dev@lucene.apache.org
Subject: [Nutch-dev] Outlink metadata?

First a question about the current behavior... does Nutch adhere to
the <a rel="nofollow"...> convention?  If so, where is that coded?

On a related note, it seems carrying metadata around on Outlink would  
be beneficial, not just anchor text and URL.  For example, my  
application will crawl HTML sites with a HEAD link to RDF data.   
I'd like to, in an HtmlParseFilter, add ParseData metadata so that an  
indexer (a custom one currently, not the Nutch one) can get at the  
RDF data that has been fetched by the URL stored in the metadata.   
Make sense?

Would my use indicate that Outlink should carry along metadata or is  
there another way to achieve this (besides writing a custom HTML  
parser)?
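
As a rough illustration of the question above, here is a minimal Java sketch of an outlink that carries arbitrary key/value metadata next to its URL and anchor text. MetadataOutlink is a hypothetical class written for this example only; it is not Nutch's Outlink, and the "rel" and "type" keys are assumed names, not an existing convention in Nutch.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for an outlink that carries metadata besides
// URL and anchor text. This is NOT Nutch's Outlink class.
final class MetadataOutlink {
    private final String toUrl;
    private final String anchor;
    private final Map<String, String> metadata = new LinkedHashMap<>();

    MetadataOutlink(String toUrl, String anchor) {
        this.toUrl = toUrl;
        this.anchor = anchor;
    }

    // Attach one key/value pair; returns this so calls can be chained.
    MetadataOutlink with(String key, String value) {
        metadata.put(key, value);
        return this;
    }

    String getToUrl() { return toUrl; }

    String getAnchor() { return anchor; }

    // Returns the value stored for key, or null when the key was never set.
    String getMeta(String key) { return metadata.get(key); }

    // Read-only view, e.g. for an indexer that walks all entries.
    Map<String, String> getMetadata() {
        return Collections.unmodifiableMap(metadata);
    }
}
```

Under these assumptions, a parse filter could record the RDF link as new MetadataOutlink("http://example.org/data.rdf", "").with("rel", "meta").with("type", "application/rdf+xml"), and a custom indexer could later select outlinks by their "type" entry.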

Thanks,
 Erik





Re: Parse-html should be enhanced!

2005-08-18 Thread Michael Ji
Would extending an existing extension point be a solution?

Our ongoing project also needs to deal with specific
crawling cases on some sites. We are thinking about extending
the current Java class to fit our usage.

Michael Ji,

--- Jack Tang [EMAIL PROTECTED] wrote:

 Hi Nutchers
 
 I think the parse-html parser should be enhanced. In some of my
 projects (intranet search engine), we only need the content inside
 specified detectors and want to filter out the junk, say the content
 between <div class="start-here"> and </div>, or detectors like XPath.
 Any thoughts on this enhancement?
 
 Regards
 /Jack
 -- 
 Keep Discovering ... ...
 http://www.jroller.com/page/jmars
 




RE: Parse-html should be enhanced!

2005-08-18 Thread Fuad Efendi
The existing parse-html plugin simply stores clean text (without HTML tags)
for later indexing. It stores, for instance, the content of huge <OPTION>
tags, which we don't need at all in 99.99% of cases.
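
To make the point concrete, here is a minimal Java sketch that drops whole <select> lists (and the <option> text inside them) before reducing a page to plain text. SelectStripper and textWithoutSelects are hypothetical names, and this regex-based approach is only an illustration on simple markup; it is not how the parse-html plugin works.

```java
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only: not part of Nutch.
final class SelectStripper {
    // Matches one whole <select>...</select> element, case-insensitively and
    // across line breaks. The reluctant ".*?" stops at the first closing tag.
    private static final Pattern SELECT_BLOCK = Pattern.compile(
        "<select\\b[^>]*>.*?</select\\s*>",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Matches any remaining tag.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Matches runs of whitespace.
    private static final Pattern SPACES = Pattern.compile("\\s+");

    // Removes select lists first, then all other tags, then collapses
    // whitespace so the result is a single line of plain text.
    static String textWithoutSelects(String html) {
        String noSelects = SELECT_BLOCK.matcher(html).replaceAll(" ");
        String noTags = TAG.matcher(noSelects).replaceAll(" ");
        return SPACES.matcher(noTags).replaceAll(" ").trim();
    }
}
```

Regular expressions are fragile on real-world HTML (comments, scripts, unclosed tags), so a real filter would more likely work on the parsed DOM tree.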

I found this idea very interesting, Web-SQL:
http://www.lotontech.com
I bought a book, Tony Loton's "Web Content Mining with Java"; 90% of it
is code that I don't really need...
However, I am going to implement some kind of Web-SQL and mathematical
statistics. Usually the pages of a web site share 90% of their HTML, and
I need only a subset.

Also, I need to find the point in Nutch where I can replace the Analyzer
with my own non-analyzer; I don't need to remove stop words, etc.

I'd like to use Lucene as a database too, to run a lot of queries and
to calculate some statistics...

-Fuad


-----Original Message-----
From: Jack Tang [mailto:[EMAIL PROTECTED] 
Sent: Thursday, August 18, 2005 10:15 PM
To: nutch-dev@lucene.apache.org
Subject: Parse-html should be enhanced!


Hi Nutchers

I think the parse-html parser should be enhanced. In some of my
projects (intranet search engine), we only need the content inside
specified detectors and want to filter out the junk, say the content
between <div class="start-here"> and </div>, or detectors like XPath.
Any thoughts on this enhancement?
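
A minimal Java sketch of the XPath variant of this idea, using only the JDK's XML APIs. RegionExtractor and extract are hypothetical names, and the sketch assumes the page is well-formed XHTML. Real pages usually are not, so a tolerant HTML parser would be needed in front of it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only: not part of Nutch.
final class RegionExtractor {
    // Evaluates an XPath expression against well-formed XHTML and returns the
    // text of the first matching node (empty string when nothing matches).
    static String extract(String xhtml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
            return XPathFactory.newInstance().newXPath()
                .evaluate(xpath, doc)
                .trim();
        } catch (Exception e) {
            // Malformed markup or an invalid expression ends up here.
            throw new IllegalArgumentException("cannot extract region", e);
        }
    }
}
```

For example, extract(page, "//div[@class='start-here']") would keep only the text inside the marked div and ignore navigation and footer blocks around it.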

Regards
/Jack
-- 
Keep Discovering ... ...
http://www.jroller.com/page/jmars