org.apache.nutch.net.URLFilterChecker -allCombined
Remi
On Tuesday, April 10, 2012, alessio crisantemi wrote:
Dear All,
I am trying to exclude some URLs of my website from the crawling process, but
without success.
To exclude them, I added this rule to my regex-urlfilter.txt file BEFORE the
rule that accepts the home page
and the
port number plus search query...
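A minimal regex-urlfilter.txt along those lines might look like this (the host, port, and paths below are placeholders, not the real site; rules are evaluated top-down and the first match wins, so exclusions must come before the accept rule):

```
# skip the search-result pages (port number plus query string)
-^http://www\.example\.com:8983/search\?
# skip anything under a section we do not want crawled
-^http://www\.example\.com/private/
# accept everything else on the site
+^http://www\.example\.com/
```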
If you can provide the URL you wish to remove some particular HTML tag from
then at least we can see what it is that you are having trouble with. Sorry
if I've not made myself clear enough.
Lewis
2012/4/6 alessio crisantemi alessio.crisant...@gmail.com
to speed with plugins on our wiki.[0]
Once you have something that requires help get on to the list and let us
know.
Lewis
[0] http://wiki.apache.org/nutch/PluginCentral
On Sat, Apr 7, 2012 at 2:33 PM, alessio crisantemi
alessio.crisant...@gmail.com wrote:
maybe that is the cause of my problem?
2012/4/6 alessio crisantemi alessio.crisant...@gmail.com
any suggestions for my problem?
On 05 April 2012 23:20, alessio crisantemi
alessio.crisant...@gmail.com wrote:
here a part of results:
[2] Live Score - GiocoNews - Tutto su casinò, poker, giochi online
http
or this:
http://pc-alessio:8983/WoWSolrWebApp/search?query=gioco&submit=Search
-- Forwarded message --
From: alessio crisantemi alessio.crisant...@gmail.com
Date: 06 April 2012 22:42
Subject: Re: request about snippets (with attachment)
To: user@nutch.apache.org
that's can
-- Forwarded message --
From: alessio crisantemi alessio.crisant...@gmail.com
Date: 05 April 2012 22:32
Subject: request about snippets
To: user@nutch.apache.org
Dear all,
I configured my Nutch (1.4) to work with Solr (1.4.1), and I crawl and
index my website successfully.
I
wrote:
Hi Alessio,
You need to determine in which field the unwanted content exists. Once
you've done this you could write an indexing filter to remove this from
your document prior to indexing.
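Nutch indexing filters are Java plugins, but the idea Lewis describes can be sketched in a few lines: take the document's fields, strip the unwanted fragment from the offending field, and pass the cleaned document on to indexing. The field name "content" and the Italian-weekday date pattern below are assumptions for illustration, not Alessio's actual data:

```python
import re

# The unwanted row apparently starts with an Italian weekday name followed
# by a month and day, e.g. "Mercoledì Apr 04" (pattern is an assumption).
DATE_ROW = re.compile(
    r"(Lunedì|Martedì|Mercoledì|Giovedì|Venerdì|Sabato|Domenica)\s+\w+\s+\d+"
)

def clean_document(doc: dict) -> dict:
    """Return a copy of the document with the date row stripped from
    the 'content' field, leaving all other fields untouched."""
    cleaned = dict(doc)
    if "content" in cleaned:
        cleaned["content"] = DATE_ROW.sub("", cleaned["content"])
    return cleaned

doc = {"url": "http://www.example.com/",
       "content": "Live scores Mercoledì Apr 04 more text"}
print(clean_document(doc)["content"])
```

In an actual Nutch plugin the same transformation would live in an `IndexingFilter` implementation registered via plugin.xml, so it runs on every document before it reaches Solr.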
Lewis
On Thu, Apr 5, 2012 at 9:41 PM, alessio crisantemi
alessio.crisant...@gmail.com wrote:
I can't see any of your attachments as they're not permitted on list.
Can you provide an URL?
On Thu, Apr 5, 2012 at 9:56 PM, alessio crisantemi
alessio.crisant...@gmail.com wrote:
Dear Lewis, thank you for your fast reply.
But that is exactly my problem! I don't understand which field creates
this row.
But I see a date (e.g. Mercoledì Apr 04
http://ww.mywebsite.com/alpha,
http://ww.mywebsite.com/beta,
http://ww.mywebsite.com/gamma
-^http://ww.mywebsite.com/.*/$
This will exclude any URL that ends with /
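Before running the crawler, the rule can be sanity-checked against sample URLs (this is the same kind of check the `URLFilterChecker` tool mentioned above performs against the configured filter chain; the quick sketch below just uses Python's `re` on the one pattern):

```python
import re

# The exclusion pattern from the filter rule: reject URLs ending with "/".
rule = re.compile(r"^http://ww\.mywebsite\.com/.*/$")

for url in [
    "http://ww.mywebsite.com/alpha/",   # trailing slash -> excluded
    "http://ww.mywebsite.com/alpha",    # no trailing slash -> kept
    "http://ww.mywebsite.com/beta/x/",  # trailing slash -> excluded
]:
    verdict = "excluded" if rule.match(url) else "kept"
    print(url, "->", verdict)
```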
I would suggest you get familiar with regular expressions (in case you
aren't already).
Remi
On Sun, Apr 1, 2012 at 6:27 PM, alessio crisantemi
<str name="url">file:/C:/Documents and Settings/Alessio/Documenti/</str>
</doc>
suggestions?
tx
alessio
On 12 March 2012 09:39, alessio crisantemi
alessio.crisant...@gmail.com wrote:
I added the path of my directory to regex-urlfilter, but Nutch also crawls
other directories...
And more: I followed
I would like the result of my search to be the text of my PDF file, not the
list of documents in the directory and the path address...
On 17 March 2012 21:11, Lewis John Mcgibbney
lewis.mcgibb...@gmail.com wrote:
Hi Alessio,
On Sat, Mar 17, 2012 at 5:31 PM, alessio crisantemi
06:06, remi tassing tassingr...@gmail.com wrote:
Using crawl-urlfilter (or regex-urlfilter depending on which one you're
using), you should be able to solve this. Unless you're not clear on what
folders to exclude...?
On Sunday, March 11, 2012, alessio crisantemi
alessio.crisant
file, I have just this row... and that's not a simple way.
Is there another method, perhaps?
thank you
alessio
On 11 March 2012 18:32, Lewis John Mcgibbney
lewis.mcgibb...@gmail.com wrote:
Please see below
On Sun, Mar 11, 2012 at 5:10 PM, alessio crisantemi
alessio.crisant
what the guide gives for crawl-urlfilter
into regex-urlfilter, everything works.
I would like to understand this case.
thank you
alessio
On 04 March 2012 17:02, alessio crisantemi
alessio.crisant...@gmail.com wrote:
Hi all,
I need to crawl a directory with a lot of PDF files.
But I only know the step-by-step
/nutch/FAQ#How_do_I_index_my_local_file_system.3F
[2]
http://www.folge2.de/tp/search/1/crawling-the-local-filesystem-with-nutch
[3]
http://stackoverflow.com/questions/941519/how-to-make-nutch-crawl-file-system
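The guides linked above usually boil down to two configuration changes, sketched here from memory (verify against your own Nutch 1.4 conf/ files, since defaults vary between versions):

```
# conf/regex-urlfilter.txt — the default rules skip file: URLs with a line
# like:
#   -^(file|ftp|mailto):
# Change it to keep skipping only ftp/mailto, and accept the directory you
# want crawled (spaces in the path may need escaping):
-^(ftp|mailto):
+^file:/C:/Documents and Settings/Alessio/Documenti/

# conf/nutch-site.xml — override plugin.includes to swap protocol-http for
# protocol-file, e.g. (value shown is the 1.4-era default with that one swap):
#   <property>
#     <name>plugin.includes</name>
#     <value>protocol-file|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
#   </property>
```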
On Sun, Mar 4, 2012 at 5:02 PM, alessio crisantemi
alessio.crisant...@gmail.com