help with parse-rss

2009-01-04 Thread Vlad Cananau

Hello
I'm trying to make RSSParser do something similar to FeedParser (which 
doesn't work quite right): instead of indexing the whole contents 
of the feed, I want it to show individual items, each with its 
title and a proper link to the article.


For each item in each RSS channel (the code does not differ much for 
getParse() of RSSParser.java) I do something like


 Outlink[] outlinks = new Outlink[1];
 try {
   outlinks[0] = new Outlink(whichLink, theRSSItem.getTitle());
 } catch (Exception e) {
   continue;
 }

 parseResult.put(
   whichLink,
   new ParseText(theRSSItem.getDescription()),
   new ParseData(
     ParseStatus.STATUS_SUCCESS,
     theRSSItem.getTitle(),
     outlinks,
     new Metadata() // was content.getMetadata()
   )
 );

The problem, however, is that only one item from the whole RSS feed gets into 
the index, although I can see them all in the log (I've tried it with 
feeds from CNN and Reuters). What happens? Why do they get overwritten 
in a seemingly random order? The item that makes it into the index is 
neither the first nor the last, but it appears to stay the same until new 
items appear in the feed.
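
To check whether the per-item loop itself can lose entries, its logic can be
reproduced outside Nutch. The following is only a minimal sketch under stated
assumptions: Item, FeedItemSketch and entriesPerItem are hypothetical names, a
plain Map stands in for the parse result, and a URI check stands in for whatever
makes the Outlink constructor fail. It does not claim to mirror Nutch internals.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FeedItemSketch {

    // Hypothetical stand-in for one RSS item; not a Nutch or feed-library class.
    static final class Item {
        final String link;
        final String title;
        final String description;

        Item(String link, String title, String description) {
            this.link = link;
            this.title = title;
            this.description = description;
        }
    }

    // Builds one entry per item, keyed by the item's link.
    // Items whose link is not a valid absolute URL are skipped,
    // mirroring the try/catch + continue in the snippet above.
    static Map<String, String> entriesPerItem(List<Item> items) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Item item : items) {
            try {
                new URI(item.link).toURL(); // assumption: stands in for the Outlink check
            } catch (Exception e) {
                continue;
            }
            // A second put() with an equal key replaces the earlier entry.
            result.put(item.link, item.title + ": " + item.description);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Item> distinct = Arrays.asList(
            new Item("http://example.com/a", "A", "first"),
            new Item("http://example.com/b", "B", "second"),
            new Item("not a url", "C", "skipped"));
        List<Item> sameLink = Arrays.asList(
            new Item("http://example.com/feed", "A", "first"),
            new Item("http://example.com/feed", "B", "second"));

        System.out.println("distinct links: " + entriesPerItem(distinct).size());
        System.out.println("same link:      " + entriesPerItem(sameLink).size());
    }
}
```

In this sketch, items with distinct links all survive the loop, while items that
share a link collapse into a single entry. If whichLink really differs per item
(as the log suggests), the loss probably happens in a later step rather than in
this loop.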


Thank you,
Vlad


Build failed in Hudson: Nutch-trunk #683

2009-01-04 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Nutch-trunk/683/changes

--
[...truncated 6523 lines...]
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 2.363 sec

init:

init-plugin:

deps-jar:

init:

init-plugin:

deps-jar:

compile:
 [echo] Compiling plugin: lib-regex-filter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 3.195 sec

jar:

init:

init-plugin:

deps-jar:

compile:
 [echo] Compiling plugin: lib-regex-filter

compile-test:

compile:
 [echo] Compiling plugin: urlfilter-regex

compile-test:
[javac] Compiling 1 source file to 
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-regex/test
 

init:

init-plugin:

deps-jar:

compile:
 [echo] Compiling plugin: urlfilter-suffix

compile-test:
[javac] Compiling 1 source file to 
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-suffix/test
 

jar:

deps-test:

init:

init-plugin:

deps-jar:

compile:
 [echo] Compiling plugin: lib-regex-filter

jar:

deps-test:

deploy:

copy-generated-lib:

deploy:

copy-generated-lib:

test:
 [echo] Testing plugin: urlfilter-regex

jar:

deps-test:

deploy:

copy-generated-lib:

test:
 [echo] Testing plugin: urlfilter-suffix
[junit] Running org.apache.nutch.urlfilter.suffix.TestSuffixURLFilter
[junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 0.17 sec

init:

init-plugin:

deps-jar:

compile:
 [echo] Compiling plugin: urlnormalizer-basic

compile-test:
[javac] Compiling 1 source file to 
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-basic/test
 
[junit] Running org.apache.nutch.urlfilter.regex.TestRegexURLFilter

jar:

deps-test:

deploy:

copy-generated-lib:

test:
 [echo] Testing plugin: urlnormalizer-basic
[junit] Running 
org.apache.nutch.net.urlnormalizer.basic.TestBasicURLNormalizer
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.028 sec

init:

init-plugin:

deps-jar:

compile:
 [echo] Compiling plugin: urlnormalizer-pass

compile-test:
[javac] Compiling 1 source file to 
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-pass/test
 

jar:

deps-test:

deploy:

copy-generated-lib:

test:
 [echo] Testing plugin: urlnormalizer-pass
[junit] Running 
org.apache.nutch.net.urlnormalizer.pass.TestPassURLNormalizer
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.173 sec

init:

init-plugin:

deps-jar:

compile:
 [echo] Compiling plugin: urlnormalizer-regex

compile-test:
[javac] Compiling 1 source file to 
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-regex/test
 
[javac] Note: 
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/src/plugin/urlnormalizer-regex/src/test/org/apache/nutch/net/urlnormalizer/regex/TestRegexURLNormalizer.java
  uses unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

jar:

deps-test:

init:

init-plugin:

compile:

jar:
  [jar] Warning: skipping jar archive 
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/nutch-extensionpoints/nutch-extensionpoints.jar
  because no files were included.

deps-test:

deploy:

copy-generated-lib:

deploy:

copy-generated-lib:

test:
 [echo] Testing plugin: urlnormalizer-regex
[junit] Running 
org.apache.nutch.net.urlnormalizer.regex.TestRegexURLNormalizer
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.259 sec
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 6.392 sec

BUILD FAILED
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build.xml :306: 
The following error occurred while executing this line:
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/src/plugin/build.xml
 :109: The following error occurred while executing this line:
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/src/plugin/build-plugin.xml
 :200: Tests failed!

Total time: 7 minutes 2 seconds
Publishing Javadoc
FATAL: Unable to copy Javadoc from 
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/docs/api  
to /export/home/hudson/hudson/jobs/Nutch-trunk/javadoc
hudson.util.IOException2: hudson.util.IOException2: null
stream=
at hudson.FilePath.readFromTar(FilePath.java:1013)
at hudson.FilePath.copyRecursiveTo(FilePath.java:936)
at hudson.FilePath.copyRecursiveTo(FilePath.java:848)
at hudson.tasks.JavadocArchiver.perform(JavadocArchiver.java:68)
at 
hudson.model.AbstractBuild$AbstractRunner.performAllBuildStep(AbstractBuild.java:322)
at 
hudson.model.AbstractBuild$AbstractRunner.performAllBuildStep(AbstractBuild.java:310)
at hudson.model.Build$RunnerImpl.post2(Build.java:126)
at 
hudson.model.AbstractBuild$AbstractRunner.post(AbstractBuild.java:295)
at 

Re: RSS-fecter and index individul-how can i realize this function

2009-01-04 Thread Vlad Cananau
Hello
I'm trying to make RSSParser do something similar to FeedParser (which
doesn't work quite right): instead of indexing the whole contents
of the feed, I want it to show individual items, each with its title
and a proper link to the article. I realize that I could crawl one level
deeper, but I'd like to index just the feed, not the articles that go with it
(to keep the index small and the crawl fast).

For each item in each RSS channel (the code does not differ much for
getParse() of RSSParser.java) I do something like

 Outlink[] outlinks = new Outlink[1];
 try {
   outlinks[0] = new Outlink(whichLink, theRSSItem.getTitle());
 } catch (Exception e) {
   continue;
 }

 parseResult.put(
   whichLink,
   new ParseText(theRSSItem.getTitle() + theRSSItem.getDescription()),
   new ParseData(
     ParseStatus.STATUS_SUCCESS,
     theRSSItem.getTitle(),
     outlinks,
     new Metadata() // was content.getMetadata()
   )
 );

The problem, however, is that only one item from the whole RSS feed gets into the
index, although I can see them all in the log (I've tried it with feeds
from CNN and Reuters). What happens? Why do they get overwritten in a
seemingly random order? The item that makes it into the index is neither the
first nor the last, but it appears to stay the same until new items appear in the
feed.

Thank you,
Vlad