Dan - this sounds really good! Participation in an Open Source project is new to me as well, but hey, that's why we get to start in the sandbox :)
I was also thinking about source control. We definitely need a repository, don't you think?

Rgrds, Thomas

On 6/3/06, Dan Morrill <[EMAIL PROTECTED]> wrote:
> Well, I can do the project management side of it and can volunteer some
> time, but I have never done this in an open source model before. Still, I can
> do documentation and project management support, and I make a decent
> cheerleader as well.
>
> Let me know.
> r/d
>
> -----Original Message-----
> From: TDLN [mailto:[EMAIL PROTECTED]
> Sent: Saturday, June 03, 2006 9:59 AM
> To: [email protected]
> Subject: Re: Re[2]: Image Search
>
> OK, I created a Jira issue for this:
>
> http://issues.apache.org/jira/browse/NUTCH-296
>
> I did not assign the issue to any component. Maybe we can have a
> "Sandbox" component?
>
> Now the question is how we can support several people working on this
> from a project management or code management perspective.
>
> I mean, if we want the Sandbox to flourish, we need some kind of
> infrastructure, right?
>
> Rgrds, Thomas Delnoij
>
> On 6/3/06, Dan Morrill <[EMAIL PROTECTED]> wrote:
> > It sounds like everyone, even me, is interested in being able to provide
> > this service.
> >
> > If the process requires that we break it off from the Nutch code, what
> > would be required to make this happen?
> >
> > r/d
> >
> > -----Original Message-----
> > From: Zaheed Haque [mailto:[EMAIL PROTECTED]
> > Sent: Saturday, June 03, 2006 9:28 AM
> > To: [email protected]
> > Subject: Re: Re[2]: Image Search
> >
> > Yes! I am very interested.
> >
> > Regards
> >
> > On 6/3/06, Dima Mazmanov <[EMAIL PROTECTED]> wrote:
> > > Hi, Stefan.
> > >
> > > That would be great!!!
> > > I think many people would vote for this.
> > > Since Nutch is a really powerful search engine, it would be nice to
> > > see several types of search in it.
> > >
> > > You wrote on June 3, 2006, 20:17:06:
> > >
> > > > Having an image search component for Nutch would be nice.
> > > > However, I think we need to implement this as a kind of separate tool
> > > > outside of the Nutch code itself, since it is not 100% integrable
> > > > into the Nutch code
> > > > (e.g. Nutch defines one URL == one index document).
> > > > Maybe this would be a nice project for a Nutch sandbox.
> > > > If you like, you can open an issue to request a Nutch sandbox project,
> > > > "image search".
> > > > If we get enough people to vote for this issue, we may have a chance to
> > > > get it created.
> > > >
> > > > Stefan
> > > >
> > > > On 03.06.2006 at 10:38, TDLN wrote:
> > > >
> > > >> I am interested in developing such a solution as well.
> > > >>
> > > >> I am currently storing the thumbnails on the file system under a
> > > >> system-generated name. My indexing plugin stores the filename in the
> > > >> index. Thumbnails are later served to the client by a separate Apache
> > > >> HTTP server. This required some changes but is otherwise pretty
> > > >> straightforward, and it performs very well for my current 300,000+
> > > >> images of around 15 KB each.
> > > >>
> > > >> If you are developing the more "Nutch-like" solution, I could
> > > >> contribute to that. For instance, I have some code that generates the
> > > >> thumbnails using ImageJ and yields very good results.
> > > >>
> > > >> But I would definitely need some guidance in writing the Hadoop
> > > >> MapReduce job. We could even contribute this back and base a small
> > > >> tutorial on this work.
> > > >>
> > > >> What do you think?
> > > >>
> > > >> Rgrds, Thomas
> > > >>
> > > >> On 6/2/06, Stefan Groschupf <[EMAIL PROTECTED]> wrote:
> > > >>> Hi,
> > > >>> using a search for "http" is a bad idea, since you get many but not
> > > >>> all pages.
> > > >>> Just write a Hadoop MapReduce job that processes the fetched content
> > > >>> in your segments; that should be easy.
> > > >>> Storing images in a file system will be very slow as soon as you have
> > > >>> too many.
> > > >>> Personally, I don't like databases, since compared to Nutch they are
> > > >>> as slow as a snail.
> > > >>> For another project, also related to images, I created my own
> > > >>> ImageWritable that contained the binary data of a compressed image
> > > >>> combined with some metadata.
> > > >>> If you use a MapFile, finding an image based on a key should be very
> > > >>> fast; I think much faster than a database with binary content.
> > > >>>
> > > >>> HTH
> > > >>> Stefan
> > > >>>
> > > >>> On 02.06.2006 at 21:10, Marco Pereira wrote:
> > > >>>
> > > >>> > Hi everybody,
> > > >>> >
> > > >>> > I've got Nutch to index images by searching their URL and their
> > > >>> > alt and title tags.
> > > >>> > But the problem comes when storing the thumbnails.
> > > >>> > I've indexed 3 million images for a national search engine.
> > > >>> > I was unsure whether to use a file system scheme or a database to
> > > >>> > store the thumbnails.
> > > >>> > The thumbnails are created with a script that gets the image URLs
> > > >>> > from the Nutch index by doing a search for http
> > > >>> > (search.jsp?query=http).
> > > >>> >
> > > >>> > Do you have any tips or ideas on this?
> > > >>> >
> > > >>> > Thank you,
> > > >>> > Marco
> > > >>>
> > > >>> ---------------------------------------------
> > > >>> blog: http://www.find23.org
> > > >>> company: http://www.media-style.com
> > >
> > > --
> > > Regards,
> > > Dima mailto:[EMAIL PROTECTED]

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
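Stefan's ImageWritable suggestion in the thread can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not his actual class: the field names (url, contentType, width, height, data) and the roundTrip helper are hypothetical. The write(DataOutput) / readFields(DataInput) pair follows the shape of Hadoop's org.apache.hadoop.io.Writable contract, but the class deliberately does not import Hadoop so it compiles on its own. In a real job it would implement Writable and could be stored as the value in a MapFile keyed by, for example, the image URL.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/**
 * Hypothetical "ImageWritable" sketch: compressed image bytes plus a little metadata.
 * write/readFields mirror the shape of Hadoop's Writable contract, but there is no
 * Hadoop dependency here; all field names are assumptions for illustration.
 */
class ImageWritable {
    private String url = "";           // where the image was fetched from (assumed field)
    private String contentType = "";   // e.g. "image/jpeg" (assumed field)
    private int width;                 // thumbnail width in pixels (assumed field)
    private int height;                // thumbnail height in pixels (assumed field)
    private byte[] data = new byte[0]; // already-compressed image bytes

    /** No-arg constructor: an empty instance to be filled in by readFields. */
    ImageWritable() {
    }

    ImageWritable(String url, String contentType, int width, int height, byte[] data) {
        this.url = url;
        this.contentType = contentType;
        this.width = width;
        this.height = height;
        this.data = data.clone(); // defensive copy
    }

    /** Serializes every field in a fixed order. */
    public void write(DataOutput out) throws IOException {
        out.writeUTF(url);
        out.writeUTF(contentType);
        out.writeInt(width);
        out.writeInt(height);
        out.writeInt(data.length); // length prefix so readFields knows how many bytes follow
        out.write(data);
    }

    /** Reads the fields back in exactly the order write() produced them. */
    public void readFields(DataInput in) throws IOException {
        url = in.readUTF();
        contentType = in.readUTF();
        width = in.readInt();
        height = in.readInt();
        data = new byte[in.readInt()];
        in.readFully(data);
    }

    String getUrl() { return url; }
    String getContentType() { return contentType; }
    int getWidth() { return width; }
    int getHeight() { return height; }
    byte[] getData() { return data.clone(); }

    /** Serializes to an in-memory buffer and reads it back, as a sanity check of the format. */
    static ImageWritable roundTrip(ImageWritable img) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(buf)) {
                img.write(out);
            }
            ImageWritable copy = new ImageWritable();
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(buf.toByteArray()))) {
                copy.readFields(in);
            }
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with byte-array streams
        }
    }
}
```

The length prefix written before the image bytes is what lets readFields allocate a buffer of the right size; every field has to be read back in exactly the order it was written, or the record is silently misparsed.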
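Why a key lookup in a MapFile can be fast: as I understand it, a Hadoop MapFile is a data file sorted by key plus a small index that holds only every Nth key (the default interval is 128 and is configurable). A lookup binary-searches that small in-memory index and then scans at most N entries of the sorted data. The sketch below illustrates only that idea with plain in-memory arrays; the class name SparseIndexLookup and its constructor are hypothetical and are not Hadoop's MapFile API.

```java
/**
 * Hypothetical in-memory illustration of a MapFile-style lookup (not Hadoop's API):
 * all entries are sorted by key, and a sparse index keeps only every interval-th key.
 */
class SparseIndexLookup {
    private final String[] keys;      // all keys, sorted ascending
    private final byte[][] values;    // values[i] belongs to keys[i]
    private final int interval;       // how many entries each index slot covers
    private final String[] indexKeys; // keys[0], keys[interval], keys[2 * interval], ...

    SparseIndexLookup(String[] sortedKeys, byte[][] values, int interval) {
        if (sortedKeys.length != values.length || interval < 1) {
            throw new IllegalArgumentException("keys/values length mismatch or bad interval");
        }
        this.keys = sortedKeys;
        this.values = values;
        this.interval = interval;
        int slots = (sortedKeys.length + interval - 1) / interval; // ceiling division
        this.indexKeys = new String[slots];
        for (int s = 0; s < slots; s++) {
            indexKeys[s] = sortedKeys[s * interval];
        }
    }

    /** Returns the value stored under key, or null if the key is absent. */
    byte[] get(String key) {
        // 1. Binary-search the small index for the last indexed key <= key.
        int lo = 0, hi = indexKeys.length - 1, slot = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (indexKeys[mid].compareTo(key) <= 0) {
                slot = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        if (slot < 0) {
            return null; // key sorts before the first entry
        }
        // 2. Scan at most `interval` sorted entries starting at that slot.
        int from = slot * interval;
        int to = Math.min(from + interval, keys.length);
        for (int i = from; i < to; i++) {
            int cmp = keys[i].compareTo(key);
            if (cmp == 0) {
                return values[i];
            }
            if (cmp > 0) {
                break; // already past the position where key would be
            }
        }
        return null;
    }
}
```

With Marco's 3 million thumbnails and an interval of 128, the in-memory index would hold about 23,400 keys, so each lookup costs one small binary search plus a short sequential scan rather than a search over all 3 million entries.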
