The FAQ has the following:
http://wiki.apache.org/nutch/FAQ#How_do_I_index_my_local_file_system.3F
Niall
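For anyone who cannot reach the FAQ: the approach it describes boils down to enabling Nutch's file protocol plugin and letting file: URLs through the URL filters. A sketch of the relevant nutch-site.xml overrides follows; the property names come from the standard Nutch configuration, but the exact plugin list is an assumption and should be checked against your version's nutch-default.xml:

```xml
<!-- nutch-site.xml: hedged sketch for local file system crawling -->
<property>
  <name>plugin.includes</name>
  <!-- protocol-file replaces protocol-http so file: URLs can be fetched;
       the rest of the list must match the plugins shipped with your release -->
  <value>protocol-file|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)</value>
</property>
<property>
  <name>file.content.limit</name>
  <!-- -1 removes the default truncation limit for fetched files -->
  <value>-1</value>
</property>
```

You will also need to edit the regex URL filter file so its default `-^(file|ftp|mailto):` rule no longer rejects file: URLs, and seed the crawl with file:/// URLs.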
On Sat, Oct 3, 2009 at 2:48 PM, jkimathi wrote:
Hi,
I have installed nutch on Ubuntu 8.04 and I need it to search the local file
system. How can I configure nutch to achieve this? Also, how can I map the
local file system to the http interface?
Regards,
John.
--
View this message in context:
http://www.nabble.com/crawling-local-file-system
If you are talking about Nutch Content objects, which are stored in the segments
during fetching of pages, then you would need to write a MapReduce job to
read in the Content objects and do whatever processing you desire.
Dennis
oSilvio wrote:
Very useful information, thanks!
But in order to extract t
>> files and writable formats.
>>
>> Dennis
>>
>> oSilvio wrote:
>>> Does somebody know how the file structure works, briefly?
>>> It seems that the data are compressed or something; it's not possible to
>>> understand what's recorded in the data or index files.
>>> Thanks
>>> Silvio
>
>
--
View this message in context:
http://www.nabble.com/File-system-tp21022587p21032357.html
Sent from the Nutch - Dev mailing list archive at Nabble.com.
The nutch databases are either SequenceFile or MapFile formats, which
store key and value pairs. Their keys and values are Writable
implementations, which translate an object into its byte equivalent and
vice versa.
Data and index files make up the MapFile format. Data is a SequenceFile; the
index is a smaller SequenceFile mapping a sample of the keys to their positions
in the data file.
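To make the byte translation concrete, here is a minimal stand-in for the Writable pattern using only the JDK. The UrlScore record and its fields are invented for illustration; real Nutch/Hadoop classes implement the same write(DataOutput)/readFields(DataInput) pair from the org.apache.hadoop.io.Writable interface:

```java
import java.io.*;

// Illustrative record: serializes itself to bytes and back,
// in the spirit of Hadoop's Writable contract.
public class UrlScore {
    String url;
    float score;

    UrlScore() {}
    UrlScore(String url, float score) { this.url = url; this.score = score; }

    // Like Writable.write(DataOutput): object -> bytes
    void write(DataOutput out) throws IOException {
        out.writeUTF(url);
        out.writeFloat(score);
    }

    // Like Writable.readFields(DataInput): bytes -> object
    void readFields(DataInput in) throws IOException {
        url = in.readUTF();
        score = in.readFloat();
    }

    static byte[] toBytes(UrlScore r) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        r.write(new DataOutputStream(buf));
        return buf.toByteArray();
    }

    static UrlScore fromBytes(byte[] b) throws IOException {
        UrlScore r = new UrlScore();
        r.readFields(new DataInputStream(new ByteArrayInputStream(b)));
        return r;
    }
}
```

A SequenceFile is essentially a long run of such serialized key/value byte pairs, which is why the raw data and index files look opaque in a text editor.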
Does somebody know how the file structure works, briefly?
It seems that the data are compressed or something; it's not possible to
understand what's recorded in the data or index files.
Thanks
Silvio
--
View this message in context:
http://www.nabble.com/File-system-tp21022587p21022587.html
Hi guys,
I am very new to Nutch. I did follow the Nutch web site for configuration
and it works fine eventually.
However, the web site didn't mention how to configure Nutch to search
the local file system. If anyone has experience with it, please help.
Thanks
Benson
> > > > property in nutch-default.xml)
> > > > 3. create a dump with the "readseg" command and the "-dump" option
> > > > 4. process the dump file and cut out what is necessary
> > > >
> > > > Just interested if that could work . . .
> > > however:
> > > I had a look at the class implementing the readseg command and found
> > > that
> > > the dump file is created with a "PrintWriter". This will create
> > > trouble I
> > > think. Maybe you can modify the SegmentReader (use an OutputStream).
> > Regarding the fetcher - it's using a binary stream to store the content
> > (FSDataOutputStream).
> >
> >
> > Cheers,
> >
> > Martin
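Martin's worry about PrintWriter is easy to demonstrate: a character Writer pushes every byte through a charset encoder, which silently replaces bytes it cannot map, while a raw OutputStream copies bytes verbatim. A self-contained sketch, with class and method names of my own (not Nutch's):

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class WriterCorruption {
    // Simulates dumping binary content through a character Writer:
    // bytes become chars, then get re-encoded as US-ASCII, so anything
    // above 0x7F is silently replaced with '?'.
    static byte[] throughWriter(byte[] raw) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        PrintWriter pw = new PrintWriter(
                new OutputStreamWriter(buf, StandardCharsets.US_ASCII));
        pw.print(new String(raw, StandardCharsets.ISO_8859_1));
        pw.flush();
        return buf.toByteArray();
    }

    // Dumping through a raw OutputStream keeps every byte intact.
    static byte[] throughStream(byte[] raw) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        buf.write(raw);
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // High bytes like 0x89 and 0x9C appear in any binary format
        byte[] binary = {(byte) 0x47, (byte) 0x49, (byte) 0x46,
                         (byte) 0x89, (byte) 0x9C};
        System.out.println(Arrays.equals(binary, throughWriter(binary))); // corrupted
        System.out.println(Arrays.equals(binary, throughStream(binary))); // intact
    }
}
```

This is why the fetcher's use of a binary FSDataOutputStream is the right model for dumping raw content.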
> >
> >
> > On 9/11/07, eyal edri <[EMAIL PROTECTED]> wrote:
Hi,
I've asked this question before on a different mailing list, with no real
response.
I hope someone sees the need for these actions and can help.
I'm trying to configure Nutch to download certain file types (exe/zip) to the
file system while crawling.
I know nutch doesn't have a p
a map/reduce job to generate fetch list, fetch, update, etc.
- name node (master node?) would be notified of the change to the file
system and index is updated
I don't really know how well that would work, though. Can slave nodes
start map/reduce jobs? Should they? Would the ta
content and a
simpler solution, IMO, would be to monitor file system events and just
recrawl the necessary pages each time something changes. That way our index
would always be up to date and there would be no reason to do a brute force
recrawl every night. I am willing to write this functionality and contribute
it
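The event-driven recrawl idea can be prototyped with the JDK's java.nio.file.WatchService before wiring anything into Nutch. Everything below (the class name, the idea of collecting changed paths to feed a recrawl list) is my sketch under that assumption, not existing Nutch code:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.*;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class RecrawlWatcher {
    // Registers for create/modify events on dir, runs the given mutation,
    // and returns the names of entries the OS reported as changed.
    // A real watcher would loop forever and push these names into a
    // recrawl queue instead of returning.
    static List<String> collectChanges(Path dir, Runnable mutation, long timeoutSeconds)
            throws IOException, InterruptedException {
        List<String> changed = new ArrayList<>();
        try (WatchService ws = dir.getFileSystem().newWatchService()) {
            dir.register(ws, StandardWatchEventKinds.ENTRY_CREATE,
                             StandardWatchEventKinds.ENTRY_MODIFY);
            mutation.run();
            WatchKey key = ws.poll(timeoutSeconds, TimeUnit.SECONDS);
            if (key != null) {
                for (WatchEvent<?> ev : key.pollEvents()) {
                    changed.add(ev.context().toString());
                }
                key.reset();
            }
        }
        return changed;
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("recrawl-demo");
        List<String> changed = collectChanges(dir, () -> {
            try {
                Files.createFile(dir.resolve("changed-page.html"));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }, 10);
        System.out.println(changed);
    }
}
```

One caveat for the proposal above: WatchService only watches a single directory per registration, so a full crawl root would need recursive registration, and event delivery latency varies by platform.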
<[EMAIL PROTECTED]> wrote:
> Stefan Groschupf wrote:
>
> > Hi geeks,
> >
> > I do not have much deep knowledge about the unix file systems,
> > so my question is: what would be the best file system for nutch
> > distributed file systems data nodes?
> >
On Tue, 2005-12-13 at 21:43 +0100, Andrzej Bialecki wrote:
>
> Most of the time we deal with very large files, with sequential access.
> Only in a few places do we deal with a lot of small files (e.g. indexing).
> So, I think the best would be an FS optimized for efficient sequential
> write/rea
Hi geeks,
I do not have much deep knowledge about the unix file systems,
so my question is: what would be the best file system for nutch
distributed file systems data nodes?
Does it make any difference using the one or the other file system?
Would reiserFS be a good choice?
Thanks for any
NDFS is not recommended in 0.7. The version of NDFS in the mapred
branch is much improved. Note however that the mapred branch is
substantially different than 0.7 and is still incomplete.
Doug
Transbuerg Tian wrote:
hi, all friends,
I downloaded nutch 0.7 and want to use NDFS independently.
So, first I started the NameNode. It successfully started; the console message is
below:
--
D:\workspace\nutch_src\bin>java -cp D:\ja