[jira] Updated: (NUTCH-162) country code jp is used instead of language code ja for Japanese

2010-05-10 Thread Hiroaki Kawai (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hiroaki Kawai updated NUTCH-162:


Attachment: anchors_ja.properties
cached_ja.properties
explain_ja.properties

We need some Japanese property files so that ja can be picked as the default 
language. For example, search.jsp derives the language from the bundle that 
was actually loaded:

String language = ResourceBundle.getBundle("org.nutch.jsp.search", 
request.getLocale()).getLocale().getLanguage();

I'll submit those property files.
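
As a rough illustration of why the _ja files matter: ResourceBundle tries a 
fixed sequence of candidate locales for the requested locale and reports the 
locale of whichever bundle it actually finds. The sketch below only prints that 
candidate sequence for a Japanese browser locale. The class name 
BundleLookupSketch and the helper candidates are made up for this example, and 
Locale.JAPAN stands in for whatever request.getLocale() returns.

```java
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

public class BundleLookupSketch {

    /** Lists the locales ResourceBundle would try for a request, most specific first. */
    static List<Locale> candidates(String baseName, Locale requested) {
        ResourceBundle.Control control =
                ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_DEFAULT);
        return control.getCandidateLocales(baseName, requested);
    }

    public static void main(String[] args) {
        // Locale.JAPAN (ja_JP) is an assumed stand-in for request.getLocale().
        for (Locale locale : candidates("org.nutch.jsp.search", Locale.JAPAN)) {
            // The root locale has an empty name and maps to the base file.
            String suffix = locale.toString().isEmpty() ? "" : "_" + locale;
            System.out.println("search" + suffix + ".properties");
        }
    }
}
```

If neither search_ja_JP.properties nor search_ja.properties exists, the lookup 
ends at some other bundle (one for the JVM default locale, or the base 
search.properties), so getLocale().getLanguage() does not return ja.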

 country code jp is used instead of language code ja for Japanese
 

             Key: NUTCH-162
             URL: https://issues.apache.org/jira/browse/NUTCH-162
         Project: Nutch
      Issue Type: Bug
      Components: web gui
Affects Versions: 0.7.1
     Environment: n/a
        Reporter: KuroSaka TeruHiko
        Priority: Trivial
     Attachments: anchors_ja.properties, cached_ja.properties, 
                  explain_ja.properties


 In the locale switching link for Japanese, jp is used as the language code, 
 but jp is an ISO country code.  The language code ja should be used.
 By the way, I don't think many users are familiar with the ISO language 
 codes.  A Canadian user may click on ca without knowing that ca stands for 
 Catalan, not Canadian English or French. Rather than listing the language 
 codes, listing the language names in the respective languages may be better. 
 (I say "may be" because the browser could show some language names as 
 corrupted text if the current font does not support that language; this is 
 a difficult problem.)
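
The distinction between language codes and country codes, and the idea of 
showing each language under its own name, can be sketched with java.util.Locale. 
This is only an illustration, not the Nutch web GUI code: the class 
LanguageCodeSketch and its helper methods are hypothetical, and the names 
printed depend on the locale data shipped with the JVM.

```java
import java.util.Arrays;
import java.util.Locale;

public class LanguageCodeSketch {

    /** True if the code is a two-letter ISO 639 language code known to the JVM. */
    static boolean isLanguageCode(String code) {
        return Arrays.asList(Locale.getISOLanguages()).contains(code.toLowerCase(Locale.ROOT));
    }

    /** True if the code is a two-letter ISO 3166 country code known to the JVM. */
    static boolean isCountryCode(String code) {
        return Arrays.asList(Locale.getISOCountries()).contains(code.toUpperCase(Locale.ROOT));
    }

    /** Name of the language written in that same language. */
    static String nativeName(String languageCode) {
        Locale locale = Locale.forLanguageTag(languageCode);
        return locale.getDisplayLanguage(locale);
    }

    public static void main(String[] args) {
        for (String code : new String[] {"ja", "jp", "ca"}) {
            System.out.println(code
                    + " language=" + isLanguageCode(code)
                    + " country=" + isCountryCode(code));
        }
        // Label a language link with the language's own name instead of its code.
        System.out.println(nativeName("ja"));
        System.out.println(nativeName("ca"));
    }
}
```

Note that ca is valid as both a language code (Catalan) and a country code 
(Canada), which is exactly the ambiguity described above.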

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (NUTCH-162) country code jp is used instead of language code ja for Japanese

2010-05-10 Thread Hiroaki Kawai (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hiroaki Kawai updated NUTCH-162:


Attachment: search_ja.properties
text_ja.properties

Please put these property files in src/web/locale/org/nutch/jsp/ .


 country code jp is used instead of language code ja for Japanese
 

             Key: NUTCH-162
             URL: https://issues.apache.org/jira/browse/NUTCH-162
         Project: Nutch
      Issue Type: Bug
      Components: web gui
Affects Versions: 0.7.1
     Environment: n/a
        Reporter: KuroSaka TeruHiko
        Priority: Trivial
     Attachments: anchors_ja.properties, cached_ja.properties, 
                  explain_ja.properties, search_ja.properties, text_ja.properties


 In the locale switching link for Japanese, jp is used as the language code, 
 but jp is an ISO country code.  The language code ja should be used.
 By the way, I don't think many users are familiar with the ISO language 
 codes.  A Canadian user may click on ca without knowing that ca stands for 
 Catalan, not Canadian English or French. Rather than listing the language 
 codes, listing the language names in the respective languages may be better. 
 (I say "may be" because the browser could show some language names as 
 corrupted text if the current font does not support that language; this is 
 a difficult problem.)




[Nutch Wiki] Update of RunningNutchAndSolr by Dmitrius

2010-05-10 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Nutch Wiki for change 
notification.

The RunningNutchAndSolr page has been changed by Dmitrius.
The comment on this change is: Fixed command (single quotes missed).
http://wiki.apache.org/nutch/RunningNutchAndSolr?action=diff&rev1=28&rev2=29

--

  = New in Nutch 1.0-dev =
- Please note that in the nightly version of Apache Nutch there is now a Solr 
integration embedded so you can start to use a lot easier. Just download a 
nightly version from [[http://hudson.zones.apache.org/hudson/job/Nutch-trunk/]].
+ Please note that in the nightly version of Apache Nutch there is now a Solr 
integration embedded so you can start to use a lot easier. Just download a 
nightly version from http://hudson.zones.apache.org/hudson/job/Nutch-trunk/.
  
  = Pre Solr Nutch integration =
- This is just a quick first pass at a guide for getting Nutch running with 
Solr.  I'm sure there are better ways of doing some/all of it, but I'm not 
aware of them.  By all means, please do correct/update this if someone has a 
better idea.  Many thanks to [[http://variogram.com||Brian Whitman at 
Variogr.am]] and [[http://blog.foofactory.fi||Sami Siren at FooFactory]] for 
all the help!  You guys saved me a lot of time! :)
+ This is just a quick first pass at a guide for getting Nutch running with 
Solr.  I'm sure there are better ways of doing some/all of it, but I'm not 
aware of them.  By all means, please do correct/update this if someone has a 
better idea.  Many thanks to http://variogram.com and http://blog.foofactory.fi 
for all the help!  You guys saved me a lot of time! :)
  
  I'm posting it under Nutch rather than Solr on the presumption that people 
are more likely to be learning/using Solr first, then come here looking to 
combine it with Nutch.  I'm going to skip over doing command by command for 
right now.  I'm running/building on Ubuntu 7.10 using Java 1.6.0_05.  I'm 
assuming that the Solr trunk code is checked out into solr-trunk and Nutch 
trunk code is checked out into nutch-trunk.
  
@@ -12, +12 @@

   * apt-get install sun-java6-jdk subversion ant patch unzip
  
  == Steps ==
- 
  The first step to get started is to download the required software 
components, namely Apache Solr and Nutch.
  
  '''1.''' Download Solr version 1.3.0 or LucidWorks for Solr from Download page
@@ -23, +22 @@

  
  '''4.''' Extract the Nutch package   tar xzf apache-nutch-1.0.tar.gz
  
+ '''5.''' Configure Solr For the sake of simplicity we are going to use the 
example configuration of Solr as a base.
- '''5.''' Configure Solr
- For the sake of simplicity we are going to use the example
- configuration of Solr as a base.
  
- '''a.''' Copy the provided Nutch schema from directory
- apache-nutch-1.0/conf to directory apache-solr-1.3.0/example/solr/conf 
(override the existing file)
+ '''a.''' Copy the provided Nutch schema from directory apache-nutch-1.0/conf 
to directory apache-solr-1.3.0/example/solr/conf (override the existing file)
  
  We want to allow Solr to create the snippets for search results so we need to 
store the content in addition to indexing it:
  
@@ -52, +48 @@

  
  <str name="qf">
  
- content^0.5 anchor^1.0 title^1.2
+ content^0.5 anchor^1.0 title^1.2 </str>
- </str>
  
- <str name="pf">
- content^0.5 anchor^1.5 title^1.2 site^1.5
+ <str name="pf"> content^0.5 anchor^1.5 title^1.2 site^1.5 </str>
- </str>
  
+ <str name="fl"> url </str>
- <str name="fl">
- url
- </str>
  
+ <str name="mm"> 2<-1 5<-2 6<90% </str>
- <str name="mm">
- 2&lt;-1 5&lt;-2 6&lt;90%
- </str>
  
  <int name="ps">100</int>
  
@@ -91, +80 @@

  
  '''6.''' Start Solr
  
+ cd apache-solr-1.3.0/example java -jar start.jar
- cd apache-solr-1.3.0/example
- java -jar start.jar
  
  '''7. Configure Nutch'''
  
  a. Open nutch-site.xml in directory apache-nutch-1.0/conf, replace its 
contents with the following (we specify our crawler name, active plugins and 
limit the maximum URL count for a single host per run to 100):
  
+ <?xml version="1.0"?> <configuration>
- <?xml version="1.0"?>
- <configuration>
  
  <property>
  
@@ -109, +96 @@

  
  </property>
  
- <property>
- <name>generate.max.per.host</name>
+ <property> <name>generate.max.per.host</name>
  
  <value>100</value>
  
@@ -126, +112 @@

  
  </configuration>
  
- 
  '''b.''' Open regex-urlfilter.txt in directory apache-nutch-1.0/conf, replace 
its content with the following:
  
  -^(https|telnet|file|ftp|mailto):
+ 
-  
- # skip some suffixes
- 
-\.(swf|SWF|doc|DOC|mp3|MP3|WMV|wmv|txt|TXT|rtf|RTF|avi|AVI|m3u|M3U|flv|FLV|WAV|wav|mp4|MP4|avi|AVI|rss|RSS|xml|XML|pdf|PDF|js|JS|gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|jpeg|JPEG|bmp|BMP)$
+ # skip some suffixes 
-\.(swf|SWF|doc|DOC|mp3|MP3|WMV|wmv|txt|TXT|rtf|RTF|avi|AVI|m3u|M3U|flv|FLV|WAV|wav|mp4|MP4|avi|AVI|rss|RSS|xml|XML|pdf|PDF|js|JS|gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|jpeg|JPEG|bmp|BMP)$
-  
+ 
- # skip URLs 
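
The regex-urlfilter.txt rules quoted above each start with - (exclude) followed 
by a regular expression. As far as I understand Nutch's regex URL filter, rules 
are tried in order, the first pattern found in the URL decides, and a URL that 
matches no rule is dropped. The following is a simplified sketch of that 
behaviour, not the Nutch implementation: UrlFilterSketch is a made-up class, the 
suffix rule is shortened, and the final "+." catch-all rule is an assumption 
because it is not part of the excerpt shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class UrlFilterSketch {

    /** One parsed rule: include (+) or exclude (-) plus its pattern. */
    private static final class Rule {
        final boolean include;
        final Pattern pattern;

        Rule(boolean include, Pattern pattern) {
            this.include = include;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    UrlFilterSketch(List<String> lines) {
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // blank lines and comments carry no rule
            }
            char sign = trimmed.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + trimmed);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(trimmed.substring(1))));
        }
    }

    /** The first rule whose pattern occurs in the URL decides; no match means reject. */
    boolean accepts(String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.include;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        UrlFilterSketch filter = new UrlFilterSketch(Arrays.asList(
                "-^(https|telnet|file|ftp|mailto):",
                "# skip some suffixes (list shortened for this sketch)",
                "-\\.(swf|SWF|pdf|PDF|gif|GIF)$",
                "+.")); // assumed catch-all, not in the quoted excerpt
        for (String url : new String[] {
                "http://example.com/index.html",
                "https://example.com/",
                "http://example.com/paper.pdf"}) {
            System.out.println(url + " accepted=" + filter.accepts(url));
        }
    }
}
```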

[Nutch Wiki] Update of RunningNutchAndSolr by Dmitrius

2010-05-10 Thread Apache Wiki

The RunningNutchAndSolr page has been changed by Dmitrius.
The comment on this change is: It's a problem to make the wiki display a grave 
accent. Managed to do that using HTML codes.
http://wiki.apache.org/nutch/RunningNutchAndSolr?action=diff&rev1=29&rev2=30

--

  
  The above command will generate a new segment directory under crawl/segments 
that at this point contains files that store the url(s) to be fetched. In the 
following commands we need the latest segment dir as parameter so we’ll store 
it in an environment variable:
  
- export SEGMENT=crawl/segments/``ls -tr crawl/segments|tail -1``
+ export SEGMENT=crawl/segments/`ls -tr crawl/segments|tail -1`
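
The export line relies on shell command substitution (the backticks) to pick 
the most recently modified entry under crawl/segments. For readers who drive 
Nutch from Java rather than a shell, roughly the same selection might look like 
the sketch below. The class LatestSegment and its methods are hypothetical 
names for this example. Unlike ls, it considers directories only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.Optional;
import java.util.stream.Stream;

public class LatestSegment {

    /** Returns the most recently modified subdirectory, like `ls -tr | tail -1`. */
    static Optional<Path> latest(Path segmentsDir) throws IOException {
        try (Stream<Path> entries = Files.list(segmentsDir)) {
            return entries
                    .filter(Files::isDirectory)
                    .max(Comparator.comparingLong(LatestSegment::modifiedMillis));
        }
    }

    private static long modifiedMillis(Path path) {
        try {
            return Files.getLastModifiedTime(path).toMillis();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path segments = Paths.get("crawl", "segments");
        if (!Files.isDirectory(segments)) {
            System.err.println("No segments directory at " + segments);
            return;
        }
        Optional<Path> segment = latest(segments);
        if (segment.isPresent()) {
            System.out.println("SEGMENT=" + segment.get());
        } else {
            System.err.println("No segment found under " + segments);
        }
    }
}
```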
  
  Now I launch the fetcher that actually goes to get the content:
  


[Nutch Wiki] Update of RunningNutchAndSolr by Dmitrius

2010-05-10 Thread Apache Wiki

The RunningNutchAndSolr page has been changed by Dmitrius.
http://wiki.apache.org/nutch/RunningNutchAndSolr?action=diff&rev1=30&rev2=31

--

  
  The above command will generate a new segment directory under crawl/segments 
that at this point contains files that store the url(s) to be fetched. In the 
following commands we need the latest segment dir as parameter so we’ll store 
it in an environment variable:
  
- export SEGMENT=crawl/segments/`ls -tr crawl/segments|tail -1`
+ export SEGMENT=crawl/segments/&#96;ls -tr crawl/segments|tail -1&#96;
  
  Now I launch the fetcher that actually goes to get the content: