Re: crawling local file system

2009-10-03 Thread Niall Pemberton
The FAQ has the following: http://wiki.apache.org/nutch/FAQ#How_do_I_index_my_local_file_system.3F Niall On Sat, Oct 3, 2009 at 2:48 PM, jkimathi wrote: > > Hi, > I have installed nutch on Ubuntu 8.04 and I need it to search the local file > system. How can I configure nutch to

crawling local file system

2009-10-03 Thread jkimathi
Hi, I have installed Nutch on Ubuntu 8.04 and I need it to search the local file system. How can I configure Nutch to achieve this? Also, how can I map the local file system to the HTTP interface? Regards, John. -- View this message in context: http://www.nabble.com/crawling-local-file-system
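
Editorial sketch: the FAQ linked in the reply remains the reference for the actual Nutch configuration. Assuming the crawl is seeded with a plain list of file: URLs (an assumption here, not something stated in this thread), a small helper such as the hypothetical `seed_urls` below could generate that list from a directory tree. It uses only the Python standard library and does not touch Nutch itself.

```python
from pathlib import Path


def seed_urls(root):
    """Return a sorted list of file:// URLs, one per regular file under root.

    Hypothetical helper for building a seed list; not part of Nutch.
    """
    base = Path(root).resolve()  # as_uri() requires an absolute path
    return sorted(p.as_uri() for p in base.rglob("*") if p.is_file())
```

Writing the returned lines to a text file, one URL per line, gives a seed list that can be inspected before any crawl is started.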

Re: File system

2008-12-16 Thread Dennis Kubes
If you are talking about Nutch Contents, which are stored in the segments during fetching of pages, then you would need to write a MapReduce job to read in the Contents objects and do whatever processing you desire. Dennis oSilvio wrote: Very useful information, thanks! But in order to extract t

Re: File system

2008-12-16 Thread oSilvio
>> files and writable formats. >> >> Dennis >> >> oSilvio wrote: >>> Does anybody know how the file structure works, briefly? >>> It seems that the data is compressed or something; it's not possible to >>> understand what's recorded in th

Re: File system

2008-12-16 Thread oSilvio
>> It seems that the data is compressed or something; it's not possible to >> understand what's recorded in the data or index files. >> Thanks >> Silvio > > -- View this message in context: http://www.nabble.com/File-system-tp21022587p21032357.html Sent from the Nutch - Dev mailing list archive at Nabble.com.

Re: File system

2008-12-15 Thread Dennis Kubes
The Nutch databases are in either SequenceFile or MapFile format, both of which store key/value pairs. Their keys and values are Writable implementations, which translate an object into its byte equivalent and vice versa. The data and index files make up the MapFile format. Data is a SequenceFile, index is an i
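
Editorial sketch: the idea of keys and values that serialize themselves to bytes and back can be illustrated in a few lines. The `Text` class, `write_pairs`, and `read_pairs` below are hypothetical stand-ins written for this illustration. They are not Hadoop's API, and the byte layout (a 4-byte length followed by UTF-8 data) is an assumption chosen for simplicity, not the actual SequenceFile layout.

```python
import io
import struct


class Text:
    """Hypothetical stand-in for a Writable: it writes and reads itself."""

    def __init__(self, value=""):
        self.value = value

    def write(self, out):
        data = self.value.encode("utf-8")
        out.write(struct.pack(">I", len(data)))  # 4-byte big-endian length
        out.write(data)

    def read_fields(self, inp):
        (length,) = struct.unpack(">I", inp.read(4))
        self.value = inp.read(length).decode("utf-8")


def write_pairs(out, pairs):
    """Append each (key, value) pair to the binary stream, key first."""
    for key, value in pairs:
        Text(key).write(out)
        Text(value).write(out)


def read_pairs(raw):
    """Decode all (key, value) pairs from a bytes object."""
    inp = io.BytesIO(raw)
    pairs = []
    while inp.tell() < len(raw):
        key, value = Text(), Text()
        key.read_fields(inp)
        value.read_fields(inp)
        pairs.append((key.value, value.value))
    return pairs
```

This also shows why the raw files look unreadable in a text editor: length prefixes and encoded objects are binary data, so the files are meant to be read back through the same classes that wrote them.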

File system

2008-12-15 Thread oSilvio
Does anybody know how the file structure works, briefly? It seems that the data is compressed or something; it's not possible to understand what's recorded in the data or index files. Thanks Silvio -- View this message in context: http://www.nabble.com/File-system-tp21022587p21022587.html

Enable Nutch to search for local file system

2007-12-23 Thread Torontoer
Hi guys, I am very new to Nutch. I followed the Nutch web site for configuration and eventually got it working. However, the web site didn't mention how to configure Nutch to search the local file system. If anyone has experience with it, please help. Thanks Benson -- View this me

Solved: Downloading file types to file system

2007-10-09 Thread eyal edri
> > > > property in nutch-default.xml) > > > > 3. create a dump with the "readseg" command and the "-dump" option > > > > 4. process the dump file and cut out what is necessary > > > > > > > > Just interested if that could work . .

Re: Downloading file types to file system

2007-10-09 Thread eyal edri
work . . . however: > > > I had a look at the class implementing the readseg command and found > > > that > > > the dump file is created with a "PrintWriter". This will create > > > trouble I > > > think. Maybe you can modify the SegmentReader (

Re: Downloading file types to file system

2007-09-22 Thread eyal edri
ader (use an OutputStream). > > Regarding the fetcher - it's using a binary stream to store the content > > (FSDataOutputStream). > > > > > > Cheers, > > > > Martin > > > > > > On 9/11/07, eyal edri <[EMAIL PROTECTED]> wrote: >

Re: Downloading file types to file system

2007-09-20 Thread eyal edri
the SegmentReader (use an OutputStream). > Regarding the fetcher - it's using a binary stream to store the content > (FSDataOutputStream). > > > Cheers, > > Martin > > > On 9/11/07, eyal edri <[EMAIL PROTECTED]> wrote: > > > > Hi, > > > > I

Downloading file types to file system

2007-09-11 Thread eyal edri
Hi, I've asked this question before on a different mailing list, with no real response. I hope someone has seen the need for this and can help. I'm trying to configure Nutch to download certain file types (exe/zip) to the file system while crawling. I know nutch doesn't have a p

Re: File system watching for intranets

2006-09-13 Thread Ben Ogle
a map/reduce job to generate fetch list, fetch, update, etc. - name node (master node?) would be notified of the change to the file system and index is updated I don't really know how well that would work, though. Can slave nodes start map/reduce jobs? Should they? Would the ta

Re: File system watching for intranets

2006-09-13 Thread Michael Wechner
content and a simpler solution, IMO, would be to monitor file system events and just recrawl the necessary pages each time something changes. That way our index would always be up to date and there would be no reason to do a brute force recrawl every night. I am willing to write this functionality

File system watching for intranets

2006-09-12 Thread Ben Ogle
simpler solution, IMO, would be to monitor file system events and just recrawl the necessary pages each time something changes. That way our index would always be up to date and there would be no reason to do a brute force recrawl every night. I am willing to write this functionality and contribute it
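
Editorial sketch: the proposal is to react to changes instead of recrawling everything each night. The code below illustrates that idea with periodic polling of modification times rather than OS-level file system events, which is a deliberate simplification. The names `snapshot` and `changed_paths` are hypothetical, and nothing here is Nutch code.

```python
import os


def snapshot(root):
    """Map each file path under root to its last-modified time in nanoseconds."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            state[path] = os.stat(path).st_mtime_ns
    return state


def changed_paths(old, new):
    """Compare two snapshots and return (to_recrawl, to_delete).

    to_recrawl holds new or modified files; to_delete holds files that vanished.
    """
    to_recrawl = sorted(p for p, mtime in new.items() if old.get(p) != mtime)
    to_delete = sorted(p for p in old if p not in new)
    return to_recrawl, to_delete
```

Taking a snapshot on a schedule and comparing it with the previous one yields only the paths that need attention, so the work done per run scales with the number of changes rather than with the size of the intranet.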

Re: best file system for NDFS?

2005-12-13 Thread Leen Toelen
<[EMAIL PROTECTED]> wrote: > Stefan Groschupf wrote: > > > Hi geeks, > > > > I don't have much deep knowledge about Unix file systems, > > so my question is: what would be the best file system for Nutch > > distributed file system data nodes? > >

Re: best file system for NDFS?

2005-12-13 Thread Rod Taylor
On Tue, 2005-12-13 at 21:43 +0100, Andrzej Bialecki wrote: > > Most of the time we deal with very large files, with sequential > access. > Only in a few places do we deal with a lot of small files (e.g. indexing). > So, I think the best would be an FS optimized for efficient > sequential > write/rea

Re: best file system for NDFS?

2005-12-13 Thread Andrzej Bialecki
Stefan Groschupf wrote: Hi geeks, I don't have much deep knowledge about Unix file systems, so my question is: what would be the best file system for Nutch distributed file system data nodes? Does it make any difference which file system is used? Would ReiserFS be a

best file system for NDFS?

2005-12-13 Thread Stefan Groschupf
Hi geeks, I don't have much deep knowledge about Unix file systems, so my question is: what would be the best file system for Nutch distributed file system data nodes? Does it make any difference which file system is used? Would ReiserFS be a good choice? Thanks for any

Re: use nutch file system independence ...

2005-09-18 Thread Doug Cutting
NDFS is not recommended in 0.7. The version of NDFS in the mapred branch is much improved. Note, however, that the mapred branch is substantially different from 0.7 and is still incomplete. Doug Transbuerg Tian wrote: Hi all, I downloaded Nutch 0.7 and want to use NDFS independently. So

use nutch file system independence ...

2005-09-18 Thread Transbuerg Tian
Hi all, I downloaded Nutch 0.7 and want to use NDFS independently. First I started the NameNode; it started successfully, and the console message is below: -- D:\workspace\nutch_src\bin>java -cp D:\ja