Confusion about query languages
-------------------------------
Key: NUTCH-316
URL: http://issues.apache.org/jira/browse/NUTCH-316
Project: Nutch
Type: Bug
Components: web gui
Versions: 0.8-dev
Environment: n/a
Reporter: KuroSaka TeruHiko
In the 2006-06-16 nightly source code, src/web/jsp/search.jsp contains these lines:
String queryLang = request.getParameter("lang");
if (queryLang == null) { queryLang = ""; }
Query query = Query.parse(queryString, queryLang, nutchConf);
Judging from the URLs shown in the browser, the lang parameter reflects the
language of the GUI (the language in which GUI elements are labeled). It changes
as the user clicks on the two-letter codes near the bottom of each Nutch GUI
screen.
The Java API doc for Query does not make clear what queryLang means. Is it the
language of the query (how the query should be lemmatized, if the analyzer
supports that, and which stop word list should be applied), or is it the
language of the documents to be searched?
Although these two concepts are closely related, neither of them is tied to the
GUI language.
As a Japanese user, I might prefer to see the entire GUI in Japanese, but I
would still need to search English documents for English words. The current
implementation of search.jsp seems to restrict the search to the GUI language
in one of two ways: either by treating the search terms as words of the GUI
language, or by restricting the search domain to documents in the GUI language.
Ideally, there would be a drop-down list for selecting the language of the
query analyzer and a set of check boxes for selecting the document languages,
in addition to the existing row of two-letter codes for choosing the GUI
language. But that would make the screen too cluttered.
Google uses a separate configuration screen that lets the user choose the set
of document languages to search. That might be a good middle-of-the-road
approach.
Because Google does no language processing on search terms, it does not need
to know the language of the query. Nutch, however, does, so the Nutch GUI
might offer a drop-down list from which the query language can be chosen, with
the GUI language pre-selected.
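A minimal sketch of the proposed separation, in Java (the language search.jsp is written in). It assumes hypothetical request parameters "qlang" (language of the query analyzer) and "doclang" (document languages, repeatable, as from a set of check boxes) alongside the existing "lang"; these parameter names and the SearchLanguages class are illustrations only, not part of Nutch. The input map has the String-to-String[] shape of a servlet request's parameter map.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch: keeps the GUI language, the query-analysis language
 * and the set of document languages as three independent settings.
 * Only "lang" exists in search.jsp; "qlang" and "doclang" are assumed names.
 */
public class SearchLanguages {
    final String guiLang;        // language of GUI labels
    final String queryLang;      // language used to analyze the query terms
    final Set<String> docLangs;  // document languages to search; empty = all

    SearchLanguages(String guiLang, String queryLang, Set<String> docLangs) {
        this.guiLang = guiLang;
        this.queryLang = queryLang;
        this.docLangs = docLangs;
    }

    static SearchLanguages fromParams(Map<String, String[]> params) {
        String gui = first(params, "lang", "");
        // The query language defaults to the GUI language ("pre-selected"),
        // but the user may override it independently.
        String query = first(params, "qlang", gui);
        Set<String> docs = new LinkedHashSet<>();
        String[] selected = params.get("doclang");
        if (selected != null) {
            for (String s : selected) {
                if (s != null && !s.isEmpty()) {
                    docs.add(s);
                }
            }
        }
        return new SearchLanguages(gui, query, Collections.unmodifiableSet(docs));
    }

    // Returns the first non-empty value of a parameter, or the default.
    private static String first(Map<String, String[]> params, String name, String def) {
        String[] v = params.get(name);
        if (v == null || v.length == 0 || v[0] == null || v[0].isEmpty()) {
            return def;
        }
        return v[0];
    }

    public static void main(String[] args) {
        // A Japanese GUI searching English documents for English words.
        Map<String, String[]> params = Map.of(
                "lang", new String[] {"ja"},
                "qlang", new String[] {"en"},
                "doclang", new String[] {"en"});
        SearchLanguages s = fromParams(params);
        System.out.println(s.guiLang + " " + s.queryLang + " " + s.docLangs);
        // prints "ja en [en]"
    }
}
```

Under this sketch, the resolved queryLang, rather than the GUI language, is what would be handed to Query.parse; how the document-language restriction would be enforced during search is left open, as it is in the report.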
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers