It was http://www.sipri.org/yearbook/2011/files/SIPRIYB11summaryNL.pdf IIRC.
On Mon, Nov 21, 2011 at 3:12 PM, Markus Jelsma wrote:
> Can you pass me the URL?
>
> > Nothing shows up for me. It just sits there like it's waiting on
> > something or processing.
> >
> > On Thu, Nov 10, 2011 at 3:30 PM, Markus Jelsma wrote:
> > > Uh, the filter checker immediately produces output.
> > >
> > > > Interesting. What kind of output should I expect to see? So far it's
> > > > been running for a while with no output.
> > > >
> > > > On Thu, Nov 10, 2011 at 1:51 PM, Markus Jelsma wrote:
> > > > > You can use bin/nutch org.apache.nutch.net.URLFilterChecker
> > > > > -allCombined to test.
> > > > >
> > > > > > Okay. So I would just put that above the +. line, right?
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > > On Thu, Nov 10, 2011 at 10:42 AM, Markus Jelsma wrote:
> > > > > > > If I want to remove example.org from my CrawlDB using regex
> > > > > > > filters, I'll add:
> > > > > > >
> > > > > > > -^http://example\.org/
> > > > > > >
> > > > > > > and run updatedb with filtering enabled. The URLs will then be
> > > > > > > deleted.
> > > > > > >
> > > > > > > On Thursday 10 November 2011 16:36:24 Bai Shen wrote:
> > > > > > > > Can you give me an example of how I would set my URL filter
> > > > > > > > to do this? Right now I'm just using the default.
> > > > > > > >
> > > > > > > > On Mon, Oct 31, 2011 at 3:47 PM, Markus Jelsma wrote:
> > > > > > > > > Hi
> > > > > > > > >
> > > > > > > > > Write a regex URL filter and use it the next time you update
> > > > > > > > > the db; the URLs will disappear. Be sure to back up the db
> > > > > > > > > first in case your regex catches valid URLs. Nutch 1.5 will
> > > > > > > > > have an option to keep the previous version of the DB after
> > > > > > > > > update.
> > > > > > > > >
> > > > > > > > > cheers
> > > > > > > > >
> > > > > > > > > > We accidentally injected some URLs into the crawl database
> > > > > > > > > > and I need to go remove them. From what I understand, in
> > > > > > > > > > 1.4 I can view and modify the URLs and indexes. But I can't
> > > > > > > > > > seem to find any information on how to do this.
> > > > > > > > > >
> > > > > > > > > > Is there anything regarding this available?
> > > > > > >
> > > > > > > --
> > > > > > > Markus Jelsma - CTO - Openindex
> > > > > > > http://www.linkedin.com/in/markus17
> > > > > > > 050-8536620 / 06-50258350
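For readers landing on this thread: the advice rests on how the regex URL filter evaluates its rules. As far as I know, Nutch's regex-urlfilter.txt is read top to bottom, the first rule whose pattern matches the URL decides (`+` keeps it, `-` drops it), and a URL that matches no rule is dropped. That is why the `-^http://example\.org/` line has to sit above the catch-all `+.` line. The sketch below is a hypothetical Python re-creation of that logic, not Nutch's RegexURLFilter class; `parse_rules`, `accepts`, `check_lines` and the sample rule text are illustrative names of my own. Python's `re.search` stands in for Java's `Matcher.find()`, which I believe is what the filter uses; for patterns this simple the two engines agree. The `+url`/`-url` output loosely follows what URLFilterChecker prints. If I remember the 1.x checker correctly, it reads URLs from standard input one per line, so when run with nothing piped in it simply waits, which would match the "just sits there" symptom described above.

```python
import re

# Hypothetical rule file mirroring the thread's advice: the "-" rule for
# example.org sits ABOVE the catch-all "+." line from the default file.
RULES_TEXT = r"""
# skip example.org (the rule suggested in this thread)
-^http://example\.org/
# accept anything else (the default catch-all)
+.
"""


def parse_rules(text):
    """Turn '+regex' / '-regex' lines into (accept, compiled_pattern) pairs.

    Blank lines and lines starting with '#' are skipped.
    """
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError(f"rule must start with '+' or '-': {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """First matching rule decides; a URL that matches no rule is rejected."""
    for accept, pattern in rules:
        # search() matches anywhere in the URL, like Java's Matcher.find()
        if pattern.search(url):
            return accept
    return False


def check_lines(rules, lines):
    """Format each non-blank URL as '+url' (kept) or '-url' (dropped)."""
    out = []
    for raw in lines:
        url = raw.strip()
        if url:
            out.append(("+" if accepts(rules, url) else "-") + url)
    return out


if __name__ == "__main__":
    rules = parse_rules(RULES_TEXT)
    demo = ["http://example.org/page.html", "http://www.sipri.org/yearbook/2011/"]
    for line in check_lines(rules, demo):
        print(line)
```

Running it prints `-http://example.org/page.html` followed by `+http://www.sipri.org/yearbook/2011/`. Swapping the two rules would let every example.org URL through, because `+.` would match first. Note also that `^http://example\.org/` only matches the bare host, so `http://www.example.org/` still passes via `+.`. With the real checker, piping a URL in (for example `echo "http://example.org/page.html" | bin/nutch org.apache.nutch.net.URLFilterChecker -allCombined`) should give an immediate answer, assuming the stdin behavior described above.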

