whole query. If you had queried
on [Italian wine], I don't believe it would match. In any case, what it
does out of the box is pretty crude, but the infrastructure is probably
there for doing something more interesting.
Brian Ulicny
On Thu, 07 Jan 2010 17:21 +0100, "Claudio Marte
Here's a recently announced event search engine:
http://searchengineland.com/what-where-when-travel-local-search-combine-goby-com-26395
Just heard of it today.
Brian Ulicny
On Wed, 23 Sep 2009 09:27 +0200, "Michael Wechner"
wrote:
> Mitia Notaras wrote:
> > Hi ther
sv&start=100&sa=N
> >>>>
> >>>> And all I get is "no more URLs to fetch"
> >>>>
> >>>> The reason I want to do this is that I thought maybe I could use
> >>>> Google to generate my start list of URLs by injecting pages
> >>&
This again
will help identify self-hosted blogs, which are the main issue here.
In any case, there is no foolproof formula for identifying all blogs by
URL format only, so if you only want to crawl blogs, you'll have to do
some upfront work to identify the ones you want to crawl.
Brian Ulicny
got it.
On Wed, 14 Jan 2009 01:32:25 +0530, "Mayank Kamthan"
said:
> Test to check whether I am able to post to this.
> Somebody please reply to confirm, as I just now received a mailer-daemon
> message for the last mai.
>
> regards,
> Mayank.
>
> --
> Mayank Kamt
ght index
> database--that is, "crawl" in my case.
> Still, thanks for your advice.
> Mr Shore
>
> Brian Ulicny wrote:
> >
> > No, that's a relative path. It says, look inside the subdirectory root
> > of the current directory from which you've d
On Tue, 7 Oct 2008 07:51:57 -0700 (PDT), "Mr Shore"
<[EMAIL PROTECTED]> said:
>
> You mean the value of "searcher.dir" in nutch-site.xml?
> It's already an absolute value;
> in my case, it's /root/crawl
> Mr Shore
>
> Brian Ulicny wrote:
tly appreciated!
> --
> View this message in context:
> http://www.nabble.com/issue-with-search.jsp-in-nutch-0.9.war-tp19855907p19855907.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>
--
Brian Ulicny
bulicny at alum dot mit dot edu
home: 781-721-5746
fax: 360-361-5746
r whatever.
Brian Ulicny
On Tue, 19 Aug 2008 22:47:19 -0700 (PDT), "dealmaker" <[EMAIL PROTECTED]>
said:
>
> Hi,
> I am looking for a list of the most common anchor text (words and phrases).
> Something like a list of stop words, but I am looking for one that is
> specif
"src", 0));
> > linkParams.put("script", new LinkParams("script", "src", 0));
> > linkParams.put("link", new LinkParams("link", "href", 0));
> > linkParams.put("img", new LinkParams("img", "src", 0));
> > }
> >
> > But nothing happens. These links are always ignored. In fact, the
> > print statement never prints.
> >
> > How can I extract these outlinks?
> >
> > Brian
> >
> >
> > --
> > Brian Ulicny
> > bulicny at alum dot mit dot edu
> > home: 781-721-5746
> > fax: 360-361-5746
> >
> >
>
Brian Ulicny
linkParams.put("form", new LinkParams("form", "action", 1));
}
linkParams.put("frame", new LinkParams("frame", "src", 0));
linkParams.put("iframe", new LinkParams("iframe", "src", 0));
linkParams.put("
Have you tried using url:pdf ?
You'll need to use the index-more and query-more plugins for this.
Brian Ulicny
On Wed, 16 Apr 2008 06:12:33 -0700 (PDT), "oddaniel" <[EMAIL PROTECTED]>
said:
>
> Hi please how can I do a Nutch search for just PDF document results o
> Direct: +41 (0)22 596 10 35
>
> Cross Systems - Groupe Micropole Univers
> Route des Acacias 45 B
> 1227 Carouge / Genève
> Tél: +41 (0)22 308 48 60
> Fax: +41 (0)22 308 48 68
>
>
>
>
>
>
> -----Original Message-----
> From: Brian Ulicny
y got indexed and you
are pointing at the right index.
Brian Ulicny
On Wed, 26 Mar 2008 11:50:43 +0100, "POIRIER David"
<[EMAIL PROTECTED]> said:
> Hello,
>
> I really need your help here please. I tried a few more things; I
> deleted my two plugins and instead of
Did you write a query filter?
Brian Ulicny
On Thu, 28 Feb 2008 19:40:58 +0100, "Syed Ahmed"
<[EMAIL PROTECTED]> said:
> Yes, I did, and it is also included when parsing, but the result is
> the same: I can't search for dc fields.
>
> On Thu, Feb 28, 2008 a
Did you include it in the list of plugins to be invoked in
nutch-site.xml? What happens when you try it?
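
A quick way to check the first question is to test the plugin id against the plugin.includes value. This is a minimal sketch, assuming that Nutch treats plugin.includes as a regular expression that has to match the whole plugin id; the includes value and the plugin ids below are hypothetical examples, not taken from the poster's nutch-site.xml.

```java
import java.util.regex.Pattern;

final class PluginIncludesCheck {
    /**
     * Returns true if pluginId is matched in full by the plugin.includes
     * expression. Assumption: whole-id matching, as described in the lead-in.
     */
    static boolean isIncluded(String pluginIncludes, String pluginId) {
        return Pattern.compile(pluginIncludes).matcher(pluginId).matches();
    }

    public static void main(String[] args) {
        // Hypothetical plugin.includes value, for illustration only.
        String includes =
            "protocol-http|parse-(text|html)|index-(basic|more)|query-(basic|site|url|more)";

        for (String id : new String[] {"query-more", "index-more", "parse-pdf"}) {
            System.out.println(id + " included: " + isIncluded(includes, id));
        }
    }
}
```

If the id of a custom plugin is not matched here, it would not be loaded at all, which is worth ruling out before looking at the plugin code itself.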
Brian Ulicny
On Tue, 26 Feb 2008 20:28:22 +0100, "Syed Ahmed"
<[EMAIL PROTECTED]> said:
> hello,
> I have written a parser and indexer for dublin core metadata. is
at present? There's
something in the code that looks for application/rss+xml as the mime
type.
Brian Ulicny
On Thu, 11 Oct 2007 15:23:04 -0700, "Chris Mattmann"
<[EMAIL PROTECTED]> said:
> Hi Rick,
>
> Glad to hear that you're interested in using Nutch!
>
>
Or you could simply use this hack:
query [url:http date:20070206-20070326]
(assuming that all of your documents have URLs beginning with http.)
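
If you need to generate such queries, the string can be built along these lines. This is only a sketch: it assumes the date field takes eight-digit yyyyMMdd values separated by a hyphen, as in the hack above, and the class name, method name, and the optional extra search terms are made up for illustration.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

final class DateRangeQuery {
    // Assumed date form: eight digits, yyyyMMdd.
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");

    /**
     * Builds "url:http date:FROM-TO", optionally prefixed with extra search
     * terms. The url:http clause is the workaround from the message: it is
     * meant to match every document whose URL begins with http.
     */
    static String build(String terms, LocalDate from, LocalDate to) {
        if (from.isAfter(to)) {
            throw new IllegalArgumentException("start date is after end date");
        }
        String range = "url:http date:" + DAY.format(from) + "-" + DAY.format(to);
        return terms.isEmpty() ? range : terms + " " + range;
    }

    public static void main(String[] args) {
        System.out.println(build("", LocalDate.of(2007, 2, 6), LocalDate.of(2007, 3, 26)));
    }
}
```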
Brian Ulicny
On Wed, 05 Sep 2007 08:32:45 -0500, "Sagar Naik" <[EMAIL PROTECTED]> said:
> Hey,
> Two options :
> 1. s
.doGet
> (OpenSearchServlet.java:250)
> javax.servlet.http.HttpServlet.service(HttpServlet.java:690)
> javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
>
> note The full stack trace of the root cause is available in the
> Apache Tomcat/5.5.23 logs.
>
>
>
apache"
> >
> > Using Luke, querying the same index with the query
> > date:20070101-20070701 apache
> > returned 2 documents.
> >
> > What did I miss here? Is there some logic implemented for the "site:"
> > keyword that I have to implement for other keywords?
> >
> > --Kevin
> >
> >
> >
>
Brian Ulicny
The latter: text files with urls in them. They must be compatible with
your URL filter to actually get crawled, of course.
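
To see ahead of time which seed URLs would survive the filter, you can run them through the same kind of rules. The sketch below is not Nutch code. It assumes the regex URL filter convention of one rule per line, a leading + to accept or - to reject, the first matching rule deciding, and rejection when no rule matches. The rules and URLs are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class SeedFilterCheck {
    /** One filter rule: a sign (+ accept, - reject) and the regex that triggers it. */
    private static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    /** Parses rule lines; blank lines and lines starting with '#' are skipped. */
    SeedFilterCheck(String... lines) {
        for (String line : lines) {
            String rule = line.trim();
            if (rule.isEmpty() || rule.startsWith("#")) {
                continue;
            }
            char sign = rule.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + rule);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(rule.substring(1))));
        }
    }

    /** The first rule whose regex is found in the URL decides; no match means rejected. */
    boolean accepts(String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        SeedFilterCheck filter = new SeedFilterCheck(
            "# hypothetical rules, for illustration only",
            "-\\.(gif|jpg|png)$",
            "+^http://([a-z0-9]*\\.)*example\\.com/");

        String[] seeds = {
            "http://www.example.com/index.html",
            "http://www.example.com/logo.gif",
            "http://other.org/"
        };
        for (String seed : seeds) {
            System.out.println((filter.accepts(seed) ? "crawlable: " : "filtered out: ") + seed);
        }
    }
}
```

Any seed reported as filtered out would be injected but never fetched, which is a common reason for a crawl that stops with nothing to do.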
Brian Ulicny
On Wed, 23 May 2007 13:52:29 -0500, "Aaron Green" <[EMAIL PROTECTED]>
said:
> Thanks for your reply. I'm still a little cloudy ab
name of a directory, not the name of a file.
Hope that helps.
Brian Ulicny
On Wed, 23 May 2007 11:11:26 -0500, "Aaron Green" <[EMAIL PROTECTED]>
said:
> I have read through an archive message dealing with Nutch on Windows. It
> was helpful, but I'm still having proble
ou, after that proceed to more complex
> ontologies (with which
> I have no experience)
>
> --
> Sami Siren
>
> Brian Ulicny wrote:
> > Florian,
> >
> > Thanks for replying: my nutch-site.xml under webapps was wrong.
> > Correcting it gets me a
> query-(basic|site|url)|lib-jakarta-poi|lib-lucene-analyzers|scoring-opic
>
>
> That's it...
>
> Regards
>
>
> Brian Ulicny wrote:
> > I'm trying to get the Ontology plugin to work in an 0.8 environment.
> >
> > I set extension.ontology.extension-nam
tologyImpl', trying the default
What am I doing wrong?
B Ulicny
--
Brian Ulicny
[EMAIL PROTECTED]