Re: ontology implementation

2010-01-07 Thread Brian Ulicny
whole query. If you had queried on [Italian wine], I don't believe it would match. In any case, what it does out of the box is pretty crude, but the infrastructure is probably there for doing something more interesting. Brian Ulicny On Thu, 07 Jan 2010 17:21 +0100, "Claudio Marte

Re: Event search engine

2009-09-23 Thread Brian Ulicny
Here's a recently announced event search engine: http://searchengineland.com/what-where-when-travel-local-search-combine-goby-com-26395 Just heard of it today. Brian Ulicny On Wed, 23 Sep 2009 09:27 +0200, "Michael Wechner" wrote: > Mitia Notaras schrieb: > > Hi ther

Re: Why cant I inject a google link to the database?

2009-07-17 Thread Brian Ulicny
sv&start=100&sa=N > >>>> > >>>> And all I get is "no more URLs to fetch" > >>>> > >>>> The reason I want to do this is that I thought maybe > >>>> I > >>>> could use Google to generate my start list of URLs by injecting pages > >>&

Re: Fetch only Blogs.

2009-02-05 Thread Brian Ulicny
This again will help identify self-hosted blogs, which are the main issue here. In any case, there is no foolproof formula for identifying all blogs by URL format only, so if you only want to crawl blogs, you'll have to do some upfront work to identify the ones you want to crawl. Brian Ulicny

Re: test

2009-01-13 Thread Brian Ulicny
Got it. On Wed, 14 Jan 2009 01:32:25 +0530, "Mayank Kamthan" said: > test to check if I am able to post to this.. > somebody please reply to confirm, as just now I received a mailer daemon for > the > last mail. > > regards, > Mayank. > > -- > Mayank Kamt

Re: issue with search.jsp in nutch-0.9.war

2008-10-07 Thread Brian Ulicny
ght index > database--that is,"crawl" in my case > still thanks for your advice > Mr Shore > > Brian Ulicny wrote: > > > > No, that's a relative path. It says, look inside the subdirectory root > > of the current directory from which you've d

Re: issue with search.jsp in nutch-0.9.war

2008-10-07 Thread Brian Ulicny
On Tue, 7 Oct 2008 07:51:57 -0700 (PDT), "Mr Shore" <[EMAIL PROTECTED]> said: > > you mean in nutch-site.xml, the value of "searcher.dir"? > it's already an absolute value, > in my case, it's /root/crawl > Mr Shore > > Brian Ulicny wrote:

Re: issue with search.jsp in nutch-0.9.war

2008-10-07 Thread Brian Ulicny
tly appreciated! > -- > View this message in context: > http://www.nabble.com/issue-with-search.jsp-in-nutch-0.9.war-tp19855907p19855907.html > Sent from the Nutch - User mailing list archive at Nabble.com. > -- Brian Ulicny bulicny at alum dot mit dot edu home: 781-721-5746 fax: 360-361-5746

Re: Most Common Anchor Text list?

2008-08-20 Thread Brian Ulicny
r whatever. Brian Ulicny On Tue, 19 Aug 2008 22:47:19 -0700 (PDT), "dealmaker" <[EMAIL PROTECTED]> said: > > Hi, > I am looking for a list of the most common anchor text (words and phrases). > Something like a list of stop words, but I am looking for one that is > specif

RE: Extracting Embedded Outlinks

2008-04-23 Thread Brian Ulicny
"src", 0)); > > linkParams.put("script", new LinkParams("script", "src", 0)); > > linkParams.put("link", new LinkParams("link", "href", 0)); > > linkParams.put("img", new LinkParams("img", "src", 0)); > > } > > > > But nothing happens. These links are always ignored. In fact, the > > print statement never prints. > > > > How can I extract these outlinks? > > > > Brian > > > > > > -- > > Brian Ulicny > > bulicny at alum dot mit dot edu > > home: 781-721-5746 > > fax: 360-361-5746 > > > > > > _ > In a rush? Get real-time answers with Windows Live Messenger. > http://www.windowslive.com/messenger/overview.html?ocid=TXT_TAGLM_WL_Refresh_realtime_042008 -- Brian Ulicny bulicny at alum dot mit dot edu home: 781-721-5746 fax: 360-361-5746

Extracting Embedded Outlinks

2008-04-23 Thread Brian Ulicny
linkParams.put("form", new LinkParams("form", "action", 1)); } linkParams.put("frame", new LinkParams("frame", "src", 0)); linkParams.put("iframe", new LinkParams("iframe", "src", 0)); linkParams.put("

Re: Search for Just PDF documents

2008-04-16 Thread Brian Ulicny
Have you tried using url:pdf ? You'll need to use the index-more and query-more plugins for this. Brian Ulicny On Wed, 16 Apr 2008 06:12:33 -0700 (PDT), "oddaniel" <[EMAIL PROTECTED]> said: > > Hi please how can I do a Nutch search for just PDF document results o

RE: nutch: creating new plugins: query plugin

2008-03-26 Thread Brian Ulicny
t; > Direct: +41 (0)22 596 10 35 > > Cross Systems - Groupe Micropole Univers > Route des Acacias 45 B > 1227 Carouge / Genève > Tél: +41 (0)22 308 48 60 > Fax: +41 (0)22 308 48 68 > > > > > > > -Original Message- > From: Brian Ulicny

RE: nutch: creating new plugins: query plugin

2008-03-26 Thread Brian Ulicny
y got indexed and you are pointing at the right index. Brian Ulicny On Wed, 26 Mar 2008 11:50:43 +0100, "POIRIER David" <[EMAIL PROTECTED]> said: > Hello, > > I really need your help here please. I tried a few more things; I > deleted my two plugins and instead of

Re: Dublin core metadata fields

2008-02-28 Thread Brian Ulicny
Did you write a query filter? Brian Ulicny On Thu, 28 Feb 2008 19:40:58 +0100, "Syed Ahmed" <[EMAIL PROTECTED]> said: > Yes, I did it, and it also includes it when parsing... but the result is > the > same... I can't search for dc fields. > > On Thu, Feb 28, 2008 a

Re: Dublin core metadata fields

2008-02-28 Thread Brian Ulicny
Did you include it in the list of plugins to be invoked in nutch-site.xml? What happens when you try it? Brian Ulicny On Tue, 26 Feb 2008 20:28:22 +0100, "Syed Ahmed" <[EMAIL PROTECTED]> said: > hello, > I have written a parser and indexer for dublin core metadata. is

Re: Indexing Feeds & Blog Posts with Nutch

2007-10-11 Thread Brian Ulicny
at present? There's something in the code that looks for application/rss+xml as the mime type. Brian Ulicny On Thu, 11 Oct 2007 15:23:04 -0700, "Chris Mattmann" <[EMAIL PROTECTED]> said: > Hi Rick, > > Glad to hear that you're interested in using Nutch! > >

Re: searching on date field

2007-09-05 Thread Brian Ulicny
Or you could simply use this hack: query [url:http date:20070206-20070326] (assuming that all of your documents have URLs beginning with http). Brian Ulicny On Wed, 05 Sep 2007 08:32:45 -0500, "Sagar Naik" <[EMAIL PROTECTED]> said: > Hey, > Two options : > 1. s
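
The query string in the hack above can be assembled programmatically. The following is only a sketch: the class and method names are hypothetical, it uses plain Java rather than any Nutch API, and it assumes the date field accepts a yyyyMMdd-yyyyMMdd range, as the format in the message suggests.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper (not part of Nutch): builds a query string of the
// form "<anchor clause> date:yyyyMMdd-yyyyMMdd".
public class DateRangeQuery {
    // Assumption: dates are written as eight digits, e.g. 20070326.
    static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("uuuuMMdd");

    // anchorClause is a clause that matches every document of interest,
    // such as "url:http" when all indexed URLs begin with http.
    static String build(String anchorClause, LocalDate from, LocalDate to) {
        if (from.isAfter(to)) {
            throw new IllegalArgumentException("from must not be after to");
        }
        return anchorClause + " date:" + DAY.format(from) + "-" + DAY.format(to);
    }

    public static void main(String[] args) {
        // Example range: 6 February 2007 to 26 March 2007.
        System.out.println(
            build("url:http", LocalDate.of(2007, 2, 6), LocalDate.of(2007, 3, 26)));
    }
}
```

The anchor clause is passed in rather than hard-coded because the hack only works if that clause matches every document you want considered.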

Re: opensearch error nutch 9

2007-08-30 Thread Brian Ulicny
.doGet > (OpenSearchServlet.java:250) > javax.servlet.http.HttpServlet.service(HttpServlet.java:690) > javax.servlet.http.HttpServlet.service(HttpServlet.java:803) > > note The full stack trace of the root cause is available in the > Apache Tomcat/5.5.23 logs. > > >

Re: search by field

2007-08-30 Thread Brian Ulicny
;apache" > > > > Using luke, query the same index using query > > date:20070101-20070701 apache > > returned 2 documents. > > > > What did I miss here? Is there some logic implemented for the "site:" > > keyword that I have to implement for other keywords? > > > > --Kevin

Re: Nutch on Windows

2007-05-23 Thread Brian Ulicny
The latter: text files with urls in them. They must be compatible with your URL filter to actually get crawled, of course. Brian Ulicny On Wed, 23 May 2007 13:52:29 -0500, "Aaron Green" <[EMAIL PROTECTED]> said: > Thanks for your reply. I'm still a little cloudy ab
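
To check ahead of a crawl whether a seed URL would survive the URL filter, the filter rules can be approximated in a few lines. This is a rough sketch and not Nutch's own code: the class name, the sample rules, and the example.org domain are all hypothetical. It assumes the regex filter file format, where each line is a `+` (accept) or `-` (reject) sign followed by a regular expression, the first matching rule decides, and a URL matching no rule is rejected.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for a regex URL filter; it does not use the Nutch API.
public class SeedFilterSketch {
    // One rule: accept ('+') or reject ('-') URLs in which the pattern is found.
    static final class Rule {
        final boolean accept;
        final Pattern pattern;
        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    // Parses filter lines, skipping blank lines and '#' comments.
    static List<Rule> parseRules(List<String> lines) {
        List<Rule> rules = new ArrayList<>();
        for (String line : lines) {
            String t = line.trim();
            if (t.isEmpty() || t.startsWith("#")) continue;
            char sign = t.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + t);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(t.substring(1))));
        }
        return rules;
    }

    // The first rule whose pattern is found in the URL decides; no match means reject.
    static boolean accepts(List<Rule> rules, String url) {
        for (Rule r : rules) {
            if (r.pattern.matcher(url).find()) return r.accept;
        }
        return false;
    }

    public static void main(String[] args) {
        List<Rule> rules = parseRules(Arrays.asList(
            "# skip common image suffixes",
            "-\\.(gif|jpg|png)$",
            "+^http://([a-z0-9]*\\.)*example\\.org/",
            "-."));
        for (String seed : Arrays.asList(
                "http://www.example.org/index.html",
                "http://www.example.org/logo.png",
                "http://other.example.com/")) {
            System.out.println(seed + " -> " + (accepts(rules, seed) ? "kept" : "filtered out"));
        }
    }
}
```

Running each line of a seed file through a check like this shows which seeds the filter would silently drop, which is a common reason a crawl fetches nothing.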

Re: Nutch on Windows

2007-05-23 Thread Brian Ulicny
name of a directory, not the name of a file. Hope that helps. Brian Ulicny On Wed, 23 May 2007 11:11:26 -0500, "Aaron Green" <[EMAIL PROTECTED]> said: > I have read through an archive message dealing with Nutch on Windows. It > was helpful, but I'm still having proble

Re: Ontology plugin in 0.8

2006-09-22 Thread Brian Ulicny
ou, after that proceed to more complex > ontologies (with which > I have no experience) > > -- > Sami Siren > > Brian Ulicny wrote: > > Florian, > > > > Thanks for replying: my nutch-site.xml under webapps was wrong. > > Correcting it gets me a

Re: Ontology plugin in 0.8

2006-09-22 Thread Brian Ulicny
> query-(basic|site|url)|lib-jakarta-poi|lib-lucene-analyzers|scoring-opic > > > That's it... > > Regards > > > Brian Ulicny wrote: > > I'm trying to get the Ontology plugin to work in an 0.8 environment. > > > > I set extension.ontology.extension-nam

Ontology plugin in 0.8

2006-09-21 Thread Brian Ulicny
tologyImpl', trying the default What am I doing wrong? B Ulicny -- Brian Ulicny [EMAIL PROTECTED]