There is no tutorial, but this can be solved with the plugin mechanism. Please refer to how another language is implemented in the plugin folder src\plugin\analysis-de.

On Wed, Oct 20, 2010 at 9:53 AM, Dennis <[email protected]> wrote:
> Hi,
> I am trying to crawl Chinese web pages using nutch-1.2. Does anyone have
> any tutorial on this? I got a lot of errors during the configuration. See the
> following:
>
> Thanks
> Dennis
>
> b...@ubuntu:~/workspacecloud2/analysis$ javacc NutchAnalysis.jj
> Java Compiler Compiler Version 5.0 (Parser Generator)
> (type "javacc" with no arguments for help)
> Reading from file NutchAnalysis.jj . . .
> Warning: Line 23, Column 3: Bad option name "OPTIMIZE_TOKEN_MANAGER". Option setting will be ignored.
> Note: UNICODE_INPUT option is specified. Please make sure you create the parser/lexer using a Reader with the correct character encoding.
> File "TokenMgrError.java" does not exist. Will create one.
> File "ParseException.java" does not exist. Will create one.
> File "Token.java" does not exist. Will create one.
> File "CharStream.java" does not exist. Will create one.
> Parser generated with 0 errors and 1 warnings.
>
> b...@ubuntu:~/workspacecloud2/nutch-1.2$ bin/nutch crawl urls -dir crawl -depth 2
> crawl started in: crawl
> rootUrlDir = urls
> threads = 10
> depth = 2
> indexer=lucene
> Injector: starting at 2010-10-20 09:35:50
> Injector: crawlDb: crawl/crawldb
> Injector: urlDir: urls
> Injector: Converting injected urls to crawl db entries.
> Injector: Merging injected urls into crawl db.
> Injector: finished at 2010-10-20 09:36:29, elapsed: 00:00:38
> Generator: starting at 2010-10-20 09:36:29
> Generator: Selecting best-scoring urls due for fetch.
> Generator: filtering: true
> Generator: normalizing: true
> Generator: jobtracker is 'local', generating exactly one partition.
> Exception in thread "main" java.io.IOException: Job failed!
>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
>     at org.apache.nutch.crawl.Generator.generate(Generator.java:526)
>     at org.apache.nutch.crawl.Generator.generate(Generator.java:431)
>     at org.apache.nutch.crawl.Crawl.main(Crawl.java:126)

