RE: implement thai language analyzer in nutch

2006-11-10 Teruhiko Kurosaka
Oct 27 02:08 .svn/
-rw-rw-r-- 1 otis otis 1528 Jun 5 14:27 ThaiAnalyzer.java
-rw-rw-r-- 1 otis otis 2437 Jun 5 14:27 ThaiWordFilter.java
Otis
- Original Message
From: Teruhiko Kurosaka [EMAIL PROTECTED]
To: sanjeev [EMAIL PROTECTED]; nutch-dev@lucene.apache.org
Sent
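ThaiWordFilter.java in that listing comes from Lucene's contrib analyzers. To my understanding it splits Thai runs with the JDK's dictionary-based java.text.BreakIterator rather than relying on spaces. The sketch below shows only that underlying JDK mechanism; it is not the ThaiWordFilter source, the class name ThaiSegmentSketch is hypothetical, and the exact segmentation depends on the Thai dictionary bundled with the JDK in use.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ThaiSegmentSketch {

    // Splits text into word-like segments with the JDK's locale-sensitive
    // word BreakIterator; for Thai the JDK can use a dictionary to find
    // word boundaries inside runs that contain no spaces.
    static List<String> segment(String text) {
        BreakIterator words = BreakIterator.getWordInstance(Locale.forLanguageTag("th"));
        words.setText(text);
        List<String> out = new ArrayList<>();
        int start = words.first();
        for (int end = words.next(); end != BreakIterator.DONE; start = end, end = words.next()) {
            String piece = text.substring(start, end);
            // Drop whitespace-only segments; keep everything else.
            if (!piece.trim().isEmpty()) {
                out.add(piece);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // "ภาษาไทย" ("Thai language") is written without a space between its words.
        System.out.println(segment("ภาษาไทย ง่าย"));
    }
}
```

Whatever boundaries the dictionary picks, concatenating the segments gives back the input minus its spaces, which makes the approach safe to layer under an existing whitespace tokenizer.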

RE: implement thai language analyzer in nutch

2006-11-08 Teruhiko Kurosaka
Sanjay, I don't think you should follow the Chinese example and extend the CJK range. That was needed because Chinese and Japanese don't use spaces to separate words. I believe Thai uses spaces, right? If so, you should extend the LETTER range to include Thai characters rather than the CJK range. Another place
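Extending the LETTER range means adding the Thai code points to LETTER's character list in NutchAnalysis.jj. In the grammar this would presumably be one more range such as "\u0e00"-"\u0e7f", but that is an assumption about the edit, not the actual Nutch change. The Java sketch below only shows which characters such a range admits. ThaiLetterSketch and isGrammarLetter are hypothetical names, and the ASCII-only base is a simplification of the real LETTER definition.

```java
public class ThaiLetterSketch {

    // True if c is in the Unicode Thai block, U+0E00 to U+0E7F.
    // The block also contains Thai digits and punctuation, not only letters.
    static boolean isThai(char c) {
        return c >= '\u0E00' && c <= '\u0E7F';
    }

    // Simplified stand-in for a grammar LETTER class: ASCII letters plus the
    // whole Thai block. The real LETTER definition has more ranges than this.
    static boolean isGrammarLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || isThai(c);
    }

    public static void main(String[] args) {
        // Cross-check the hard-coded range against the JDK's block table.
        System.out.println(Character.UnicodeBlock.of('\u0E01') == Character.UnicodeBlock.THAI);
        for (char c : "nutch ไทย".toCharArray()) {
            System.out.printf("U+%04X %b%n", (int) c, isGrammarLetter(c));
        }
    }
}
```

Taking the whole block (rather than Character.isLetter) also keeps Thai combining vowel marks inside the token, since those are not classified as letters by the JDK.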

RE: What javacc options should I use to compile NutchAnalysis.jj?

2006-10-19 Teruhiko Kurosaka
Please disregard this posting; it was my oversight. build.xml does have a javacc rule. So is this just a difference between javacc versions? -kuro -Original Message- From: Teruhiko Kurosaka Sent: 2006-10-18 17:42 To: nutch-dev@lucene.apache.org Cc: Teruhiko Kurosaka Subject: What

What javacc options should I use to compile NutchAnalysis.jj?

2006-10-18 Teruhiko Kurosaka
I am trying to modify the JavaCC rules in NutchAnalysis.jj. As preparation, I ran javacc (version 3.2) on the NutchAnalysis.jj of Nutch 0.8, but the generated Java files are a little different from those in the src/java directory. Am I supposed to use some javacc command-line options?

Why does nutch plugin say the plugin is not present or inactive?

2006-09-05 Teruhiko Kurosaka
I developed a plugin and tried to run it with the Nutch 0.8 command nutch plugin plugin-name plugin-fully-qualified-class-name arg1 arg2, but it says my plugin is not present or inactive. I also tried the nutch plugin command with a known plugin, language-identifier, as: ./nutch plugin languageidentifier
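One frequent cause of this message, offered here as an assumption since the excerpt does not give the answer, is that the plugin's id is not matched by the plugin.includes regular expression (or is matched by plugin.excludes) in the Nutch configuration, so the plugin repository never activates it. The sketch below mimics only the include test with java.util.regex. The includes value and the id my-analyzer are made-up examples, and Nutch's own matching code may differ in detail.

```java
import java.util.regex.Pattern;

public class PluginIncludesSketch {

    // True if the whole plugin id matches the includes regex, i.e. the id
    // would pass a plugin.includes-style filter. Illustrates the idea only,
    // not Nutch's actual implementation.
    static boolean isIncluded(String includesRegex, String pluginId) {
        return Pattern.compile(includesRegex).matcher(pluginId).matches();
    }

    public static void main(String[] args) {
        // Hypothetical plugin.includes value; "my-analyzer" is a made-up id.
        String includes = "protocol-http|parse-(text|html)|index-basic|language-identifier";
        System.out.println(isIncluded(includes, "language-identifier"));          // true
        System.out.println(isIncluded(includes, "my-analyzer"));                  // false
        System.out.println(isIncluded(includes + "|my-analyzer", "my-analyzer")); // true
    }
}
```

If this is the cause, appending the new plugin's id to the alternation in nutch-site.xml would be the fix.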

Why are lib- plugins needed?

2006-08-31 Teruhiko Kurosaka
Hello, I see many plugins whose names start with lib- that are wrappers around other, non-plugin .jar files. For example, the analysis-de plugin uses the lib-lucene-analyzers plugin, which in turn references the jar file that contains GermanAnalyzer. What is the reason for this indirection? The plugins called by

RE: 0.8 release

2006-07-07 Teruhiko Kurosaka
May I suggest that someone take a look at NUTCH-266 before releasing 0.8? The Nutch build from half a month ago was not working for me or for another person. -kuro -Original Message- From: Stefan Groschupf [mailto:[EMAIL PROTECTED] Sent: 2006-7-05 11:53 To: nutch-dev@lucene.apache.org

RE: [jira] Commented: (NUTCH-266) hadoop bug when doing updatedb

2006-06-22 Teruhiko Kurosaka
Thank you for your reply, Sami. I do not intend to run Hadoop at all, so my hadoop-site.xml is empty. ... You should at least set values for 'mapred.system.dir' and 'mapred.local.dir' and point them to a dir that has enough space available (I think they default to under /tmp at least on
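Sami's suggestion amounts to a hadoop-site.xml containing two property entries. The Java sketch below simply prints such a file from a name/value map so the expected shape is visible; the helper name and both directory paths are placeholders, not recommended values, and in practice the file would just be edited by hand.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HadoopSiteSketch {

    // Renders Hadoop-style <configuration> XML from name/value pairs.
    // No XML escaping is done, so values must not contain <, > or &.
    static String render(Map<String, String> props) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\"?>\n<configuration>\n");
        for (Map.Entry<String, String> e : props.entrySet()) {
            sb.append("  <property>\n")
              .append("    <name>").append(e.getKey()).append("</name>\n")
              .append("    <value>").append(e.getValue()).append("</value>\n")
              .append("  </property>\n");
        }
        sb.append("</configuration>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        // Placeholder paths: point these at a disk with enough free space.
        props.put("mapred.system.dir", "/data/nutch/mapred/system");
        props.put("mapred.local.dir", "/data/nutch/mapred/local");
        System.out.print(render(props));
    }
}
```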

RE: IncrediBILL's Random Rants: How Much Nutch is TOO MUCH Nutch?

2006-06-16 Teruhiko Kurosaka
How about introducing these changes in an effort to force Nutch admins to properly edit the bot identity strings? 1. Add the http.agent.* entries to nutch-site.xml with the value EDITME. The description should clearly state that these values *must* be edited to reflect the true
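A check in the spirit of this proposal could refuse to crawl while any agent property is missing or still holds the placeholder. The sketch below uses java.util.Properties as a stand-in for Nutch's configuration object. The keys follow the http.agent.* pattern (names such as http.agent.name and http.agent.email appear in nutch-default.xml), but treating exactly these four as required, and the class and method names, are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class AgentCheckSketch {

    static final String PLACEHOLDER = "EDITME";

    // Assumed set of keys that must be edited before crawling.
    static final String[] REQUIRED = {
        "http.agent.name", "http.agent.description",
        "http.agent.url", "http.agent.email"
    };

    // Returns the required keys that are missing, blank, or still EDITME.
    static List<String> unedited(Properties conf) {
        List<String> bad = new ArrayList<>();
        for (String key : REQUIRED) {
            String v = conf.getProperty(key);
            if (v == null || v.trim().isEmpty() || v.trim().equals(PLACEHOLDER)) {
                bad.add(key);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("http.agent.name", "ExampleBot"); // made-up value
        conf.setProperty("http.agent.description", "EDITME");
        List<String> bad = unedited(conf);
        if (!bad.isEmpty()) {
            // A crawler would abort here instead of fetching anything.
            System.err.println("Refusing to crawl; please edit: " + bad);
        }
    }
}
```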

how to turn on logging, exercise analyzer, tips on debugging plugins?

2006-06-01 Teruhiko Kurosaka
Nutch developers, I'm writing a language analyzer and have three questions. Any pointers would be appreciated.
1. How do I turn on the logging facility?
2. Is there an easy way to run just an analyzer plugin, rather than running nutch crawl?
3. How do I run a debugger (Eclipse, in my case) over

i18n in nutch home page is misnomer

2006-06-01 Teruhiko Kurosaka
Dear Webmaster of http://lucene.apache.org/nutch/
In the menu bar, under the Documentation heading, there is an item called i18n. The web page linked from i18n talks about how to translate (localize) the search GUI. This is not i18n (internationalization), which should mean designing and
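To make the distinction concrete: internationalized code delegates locale-dependent behavior to locale-aware APIs instead of hard-coding one convention, and localization then supplies translations and locale data for particular markets. The Java sketch below illustrates only the first half using java.text.NumberFormat; it illustrates the term and is not taken from the Nutch site or code.

```java
import java.text.NumberFormat;
import java.util.Locale;

public class I18nSketch {

    // Internationalized: one code path, correct grouping for any locale,
    // because the formatting rules come from the Locale, not from this code.
    static String formatCount(long n, Locale locale) {
        return NumberFormat.getIntegerInstance(locale).format(n);
    }

    public static void main(String[] args) {
        System.out.println(formatCount(1234567, Locale.US));      // 1,234,567
        System.out.println(formatCount(1234567, Locale.GERMANY)); // 1.234.567
    }
}
```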

Do analyzer plugins have access to the Configuration?

2006-05-30 Teruhiko Kurosaka
Jérôme, or anybody familiar with the language plugin architecture: I am writing a language analyzer plugin. This plugin has configurable parameters, which I hope to put in nutch-site.xml. But the German and French plugin examples don't access the Configuration object. Does the current
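If an analyzer plugin is handed the configuration (many Nutch extensions receive it through a setConf method in the style of Hadoop's Configurable interface; whether analyzer plugins did at the time is exactly the open question here), reading a parameter from nutch-site.xml reduces to a lookup with a default. The sketch below is self-contained: java.util.Properties stands in for the Configuration object, and the class name and the property analysis.xx.min.token.length are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class ConfigurableAnalyzerSketch {

    // Hypothetical property name; a real plugin would document its own.
    static final String MIN_LEN_KEY = "analysis.xx.min.token.length";
    static final int DEFAULT_MIN_LEN = 2;

    private Properties conf = new Properties();
    private int minTokenLength = DEFAULT_MIN_LEN;

    // Stand-in for a setConf(Configuration) callback: the host passes the
    // configuration once, and the plugin caches the values it needs.
    public void setConf(Properties conf) {
        this.conf = conf;
        String v = conf.getProperty(MIN_LEN_KEY);
        this.minTokenLength = (v == null) ? DEFAULT_MIN_LEN : Integer.parseInt(v.trim());
    }

    public Properties getConf() {
        return conf;
    }

    // Toy "analysis": split on whitespace and drop tokens that are too short.
    public List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (t.length() >= minTokenLength) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(MIN_LEN_KEY, "3"); // as if set in nutch-site.xml
        ConfigurableAnalyzerSketch a = new ConfigurableAnalyzerSketch();
        a.setConf(conf);
        System.out.println(a.tokens("a nutch plugin is ok")); // [nutch, plugin]
    }
}
```

Caching the parsed value in setConf keeps the per-token path free of configuration lookups.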

Status of language plugin

2006-05-19 Teruhiko Kurosaka
Hello Jérôme, Because of other issues at work, I was away from Nutch. Now I'm back, and from your notes in JIRA I see you are making progress. Is there an API doc or design doc that I can read to understand where you are? Is the language plugin architecture already in the main trunk?