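You're storing the language value in the session, aren't you? Well, there is your problem. If the crawler does not send a session cookie back with each request, every fetch starts a fresh session and is served the default language, which would explain why you only get the German content.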
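To illustrate the point, here is a minimal sketch in Python. It is not Nutch code and not your site's code: the `Site` class, the `/lang/en` switch path, and the page texts are all hypothetical, and it assumes the crawler does not return the session cookie between requests.

```python
DEFAULT_LANG = "de"
PAGES = {"de": "Willkommen", "en": "Welcome"}


class Site:
    """Hypothetical site that keeps the chosen language in a server-side session."""

    def __init__(self):
        self.sessions = {}  # session id -> language code
        self._next_id = 0

    def handle(self, path, cookie=None):
        """Serve one request and return (body, session cookie)."""
        if cookie not in self.sessions:
            # Unknown or missing cookie: start a new session in the default language.
            self._next_id += 1
            cookie = f"s{self._next_id}"
            self.sessions[cookie] = DEFAULT_LANG
        if path == "/lang/en":
            # The language switch only changes the session, then redirects.
            self.sessions[cookie] = "en"
            return "302 -> /", cookie
        return PAGES[self.sessions[cookie]], cookie


def fetch_all(site, paths, keep_cookie):
    """Fetch paths in order; keep_cookie=False models a client without sessions."""
    cookie = None
    bodies = []
    for path in paths:
        body, new_cookie = site.handle(path, cookie)
        bodies.append(body)
        if keep_cookie:
            cookie = new_cookie
    return bodies


paths = ["/", "/lang/en", "/"]
browser = fetch_all(Site(), paths, keep_cookie=True)
crawler = fetch_all(Site(), paths, keep_cookie=False)
print("browser:", browser)
print("crawler:", crawler)
```

The browser-like client ends up on the English page, while the cookieless client lands back on the German one after following the switch link. If that matches your setup, putting the language into the URL itself (again only a suggestion, I have not seen your site's code) would give the crawler a distinct address for each language.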
On Thursday 02 December 2010 12:56:31 Klaus Tachtler wrote:
> Hi List,
>
> i'm new to nutch, and try to index my own (very small) homepage, with
> success!
>
> My homepage is reachable in german an english, but when I try to crawl
> it with nutch, I only get the german content?
>
> Here the command-line I used to crawl my site:
>
> # bin/nutch crawl urls -dir crawl -depth 10
>
> Here my crawl-urlfilter.txt
>
> # skip file:, ftp:, & mailto: urls
> -^(file|ftp|mailto):
>
> # skip image and other suffixes we can't yet parse
> -\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|jpeg|JPEG|bmp|BMP)$
>
> # skip URLs containing certain characters as probable queries, etc.
> -[...@=]
>
> # skip URLs with slash-delimited segment that repeats 3+ times, to break loops
> -.*(/[^/]+)/[^/]+\1/[^/]+\1/
>
> # accept hosts in MY.DOMAIN.NAME
> # Tachtler
> # default: +^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/
> +^http://www.tachtler.net/
>
> # skip everything else
> -.
>
> Here my nutch-default.xml (section: plugin.includes)
>
> <property>
>   <name>plugin.includes</name>
>   <value>protocol-http|urlfilter-regex|parse-(text|html|js|tika)|index-(basic|anchor|more)|query-(basic|site|url|lang)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)|analysis-(de|en)|language-identifier</value>
>
> Please, can anyone help me?
>
>
> Klaus.
>
>
> --
>
> ------------------------------------------------
> e-Mail : [email protected]
> Homepage: http://www.tachtler.net
> DokuWiki: http://www.dokuwiki.tachtler.net
> ------------------------------------------------

--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350

