[Nutch Wiki] Update of LanguageIdentifier by JeromeCharron

2006-01-11 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Nutch Wiki for change 
notification.

The following page has been changed by JeromeCharron:
http://wiki.apache.org/nutch/LanguageIdentifier

The comment on the change is:
Add some doc about generating new language profiles

--
  
  == Generating some NGrams profiles ==
  
- TODO
+ Generating a new language profile in Nutch is really easy.
+ Simply launch the following command:
+ {{{
+ java org.apache.nutch.analysis.lang.NGramProfile -create profile-name filename encoding
+ }}}
+ where
+  * '''profile-name''' is the [http://www.w3.org/WAI/ER/IG/ert/iso639.htm ISO-639 2-letter code] of the new language.
+  * '''filename''' is the name of the file used to build the new language profile (the bigger it is, and the more varied the sources and subjects it contains, the better the profile will be).
+  * '''encoding''' is the character encoding of the file used to build the new profile ('''filename''').
+ 
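As a concrete illustration of the command described above, here is a minimal sketch for building a Swedish profile. The file name `swedish-corpus.txt`, the `UTF-8` encoding, and the assumption that the Nutch classes are already on the Java classpath are hypothetical and not taken from the wiki page. The script only assembles and prints the command so it can be checked before it is run.

```shell
# Hypothetical inputs: adjust to your own corpus.
PROFILE_NAME="sv"                 # ISO-639 2-letter code (Swedish)
CORPUS_FILE="swedish-corpus.txt"  # assumed name of the training text
ENCODING="UTF-8"                  # assumed encoding of the training text

# Assemble the invocation in the argument order given above:
# -create profile-name filename encoding
CMD="java org.apache.nutch.analysis.lang.NGramProfile -create $PROFILE_NAME $CORPUS_FILE $ENCODING"

# Print the command; run it with: eval "$CMD"
# (assumes the Nutch classes are on the Java classpath).
echo "$CMD"
```

Keeping the three arguments in named variables makes it harder to swap the file name and the encoding, which are both plain positional arguments.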
  
  == Open Issues ==
  


svn commit: r368167 - in /lucene/nutch/trunk/src/java/org/apache/nutch: fetcher/Fetcher.java parse/ParseSegment.java

2006-01-11 Thread ab
Author: ab
Date: Wed Jan 11 15:24:40 2006
New Revision: 368167

URL: http://svn.apache.org/viewcvs?rev=368167&view=rev
Log:
Make sure we always have the segment name and score values in
ParseData.metadata. Sometimes plugins would fail to copy them through,
or a parsing error would produce empty ParseData.metadata.

Modified:
lucene/nutch/trunk/src/java/org/apache/nutch/fetcher/Fetcher.java
lucene/nutch/trunk/src/java/org/apache/nutch/parse/ParseSegment.java

Modified: lucene/nutch/trunk/src/java/org/apache/nutch/fetcher/Fetcher.java
URL: http://svn.apache.org/viewcvs/lucene/nutch/trunk/src/java/org/apache/nutch/fetcher/Fetcher.java?rev=368167&r1=368166&r2=368167&view=diff
==
--- lucene/nutch/trunk/src/java/org/apache/nutch/fetcher/Fetcher.java (original)
+++ lucene/nutch/trunk/src/java/org/apache/nutch/fetcher/Fetcher.java Wed Jan 11 15:24:40 2006
@@ -223,6 +223,9 @@
 parse.getData().getMetadata().setProperty(SIGNATURE_KEY, StringUtil.toHexString(signature));
 datum.setSignature(signature);
   }
+  // add segment name and score to parseData metadata
+  parse.getData().getMetadata().setProperty(SEGMENT_NAME_KEY, segmentName);
+  parse.getData().getMetadata().setProperty(SCORE_KEY, Float.toString(datum.getScore()));
 
   try {
 output.collect

Modified: lucene/nutch/trunk/src/java/org/apache/nutch/parse/ParseSegment.java
URL: http://svn.apache.org/viewcvs/lucene/nutch/trunk/src/java/org/apache/nutch/parse/ParseSegment.java?rev=368167&r1=368166&r2=368167&view=diff
==
--- lucene/nutch/trunk/src/java/org/apache/nutch/parse/ParseSegment.java (original)
+++ lucene/nutch/trunk/src/java/org/apache/nutch/parse/ParseSegment.java Wed Jan 11 15:24:40 2006
@@ -58,9 +58,16 @@
   status = new ParseStatus(e);
 }
 
+ContentProperties metadata = parse.getData().getMetadata();
 // compute the new signature
 byte[] signature = SignatureFactory.getSignature(getConf()).calculate(content, parse);
-parse.getData().getMetadata().setProperty(Fetcher.SIGNATURE_KEY, StringUtil.toHexString(signature));
+metadata.setProperty(Fetcher.SIGNATURE_KEY, StringUtil.toHexString(signature));
+// copy segment name and score
+String segmentName = content.getMetadata().getProperty(Fetcher.SEGMENT_NAME_KEY);
+String score = content.getMetadata().getProperty(Fetcher.SCORE_KEY);
+metadata.setProperty(Fetcher.SEGMENT_NAME_KEY, segmentName);
+metadata.setProperty(Fetcher.SCORE_KEY, score);
+
 if (status.isSuccess()) {
   output.collect(key, new ParseImpl(parse.getText(), parse.getData()));
 } else {




svn commit: r368172 - in /lucene/nutch/trunk: docs/ca/ docs/de/ docs/en/ docs/es/ docs/fi/ docs/fr/ docs/hu/ docs/jp/ docs/ms/ docs/nl/ docs/pl/ docs/pt/ docs/sv/ docs/th/ docs/zh/ src/java/org/apache

2006-01-11 Thread jerome
Author: jerome
Date: Wed Jan 11 15:50:13 2006
New Revision: 368172

URL: http://svn.apache.org/viewcvs?rev=368172&view=rev
Log:
Add a style for summary highlight and ellipsis

Modified:
lucene/nutch/trunk/docs/ca/about.html
lucene/nutch/trunk/docs/ca/help.html
lucene/nutch/trunk/docs/ca/search.html
lucene/nutch/trunk/docs/de/about.html
lucene/nutch/trunk/docs/de/help.html
lucene/nutch/trunk/docs/de/search.html
lucene/nutch/trunk/docs/en/about.html
lucene/nutch/trunk/docs/en/help.html
lucene/nutch/trunk/docs/en/search.html
lucene/nutch/trunk/docs/es/about.html
lucene/nutch/trunk/docs/es/help.html
lucene/nutch/trunk/docs/es/search.html
lucene/nutch/trunk/docs/fi/about.html
lucene/nutch/trunk/docs/fi/help.html
lucene/nutch/trunk/docs/fi/search.html
lucene/nutch/trunk/docs/fr/about.html
lucene/nutch/trunk/docs/fr/search.html
lucene/nutch/trunk/docs/hu/about.html
lucene/nutch/trunk/docs/hu/help.html
lucene/nutch/trunk/docs/hu/search.html
lucene/nutch/trunk/docs/jp/about.html
lucene/nutch/trunk/docs/jp/help.html
lucene/nutch/trunk/docs/jp/search.html
lucene/nutch/trunk/docs/ms/about.html
lucene/nutch/trunk/docs/ms/help.html
lucene/nutch/trunk/docs/ms/search.html
lucene/nutch/trunk/docs/nl/about.html
lucene/nutch/trunk/docs/nl/help.html
lucene/nutch/trunk/docs/nl/search.html
lucene/nutch/trunk/docs/pl/about.html
lucene/nutch/trunk/docs/pl/help.html
lucene/nutch/trunk/docs/pl/search.html
lucene/nutch/trunk/docs/pt/about.html
lucene/nutch/trunk/docs/pt/help.html
lucene/nutch/trunk/docs/pt/search.html
lucene/nutch/trunk/docs/sv/about.html
lucene/nutch/trunk/docs/sv/help.html
lucene/nutch/trunk/docs/sv/search.html
lucene/nutch/trunk/docs/th/about.html
lucene/nutch/trunk/docs/th/help.html
lucene/nutch/trunk/docs/th/search.html
lucene/nutch/trunk/docs/zh/about.html
lucene/nutch/trunk/docs/zh/help.html
lucene/nutch/trunk/docs/zh/search.html
lucene/nutch/trunk/src/java/org/apache/nutch/searcher/Summary.java
lucene/nutch/trunk/src/web/include/style.html

Modified: lucene/nutch/trunk/docs/ca/about.html
URL: http://svn.apache.org/viewcvs/lucene/nutch/trunk/docs/ca/about.html?rev=368172&r1=368171&r2=368172&view=diff
==
--- lucene/nutch/trunk/docs/ca/about.html (original)
+++ lucene/nutch/trunk/docs/ca/about.html Wed Jan 11 15:50:13 2006
@@ -16,6 +16,8 @@
 h3 {font-family: Arial, Helvetica, sans-serif; font-size: 16px; color: #00;}
 h4 {font-family: Arial, Helvetica, sans-serif; font-size: 14px; color: #00;}
 .url {color: #996600;}
+.highlight {font-weight: bold;}
+.ellipsis {font-weight: bold;}
 </style>
 <link type="image/x-icon" href="../img/favicon.ico" rel="icon">
 <link type="image/x-icon" href="../img/favicon.ico" rel="shortcut icon">

Modified: lucene/nutch/trunk/docs/ca/help.html
URL: http://svn.apache.org/viewcvs/lucene/nutch/trunk/docs/ca/help.html?rev=368172&r1=368171&r2=368172&view=diff
==
--- lucene/nutch/trunk/docs/ca/help.html (original)
+++ lucene/nutch/trunk/docs/ca/help.html Wed Jan 11 15:50:13 2006
@@ -16,6 +16,8 @@
 h3 {font-family: Arial, Helvetica, sans-serif; font-size: 16px; color: #00;}
 h4 {font-family: Arial, Helvetica, sans-serif; font-size: 14px; color: #00;}
 .url {color: #996600;}
+.highlight {font-weight: bold;}
+.ellipsis {font-weight: bold;}
 </style>
 <link type="image/x-icon" href="../img/favicon.ico" rel="icon">
 <link type="image/x-icon" href="../img/favicon.ico" rel="shortcut icon">

Modified: lucene/nutch/trunk/docs/ca/search.html
URL: http://svn.apache.org/viewcvs/lucene/nutch/trunk/docs/ca/search.html?rev=368172&r1=368171&r2=368172&view=diff
==
--- lucene/nutch/trunk/docs/ca/search.html (original)
+++ lucene/nutch/trunk/docs/ca/search.html Wed Jan 11 15:50:13 2006
@@ -16,6 +16,8 @@
 h3 {font-family: Arial, Helvetica, sans-serif; font-size: 16px; color: #00;}
 h4 {font-family: Arial, Helvetica, sans-serif; font-size: 14px; color: #00;}
 .url {color: #996600;}
+.highlight {font-weight: bold;}
+.ellipsis {font-weight: bold;}
 </style>
 <link type="image/x-icon" href="../img/favicon.ico" rel="icon">
 <link type="image/x-icon" href="../img/favicon.ico" rel="shortcut icon">

Modified: lucene/nutch/trunk/docs/de/about.html
URL: http://svn.apache.org/viewcvs/lucene/nutch/trunk/docs/de/about.html?rev=368172&r1=368171&r2=368172&view=diff
==
--- lucene/nutch/trunk/docs/de/about.html (original)
+++ lucene/nutch/trunk/docs/de/about.html Wed Jan 11 15:50:13 2006
@@ -16,6 +16,8 @@
 h3 {font-family: Arial, Helvetica, sans-serif; font-size: 16px; color: #00;}
 h4 {font-family: Arial, Helvetica, sans-serif; font-size: 14px; color: #00;}
 .url