Re: [Nutch-general] Image Search

TDLN Sat, 03 Jun 2006 10:25:48 -0700

Dan - this sounds really good! Participation in an Open Source project
is new to me as well, but hey, that's why we get to start in the
sandbox :)


I was also thinking about source control. We definitely need a
repository, don't you think?

Rgrds, Thomas

On 6/3/06, Dan Morrill <[EMAIL PROTECTED]> wrote:
> Well, I can do the project management side of it and can volunteer some
> time, but I have never done this in an open source model before. I can do
> documentation and project management support, and I make a decent
> cheerleader as well.
>
> Let me know.
> r/d
>
> -----Original Message-----
> From: TDLN [mailto:[EMAIL PROTECTED]
> Sent: Saturday, June 03, 2006 9:59 AM
> To: [email protected]
> Subject: Re: Re[2]: Image Search
>
> Ok, I created a Jira Issue for this:
>
> http://issues.apache.org/jira/browse/NUTCH-296
>
> I did not assign the Issue to any component. Maybe we can have a
> "Sandbox" component?
>
> Now, the question is how we can support several people working on this
> from a "project management" or code management perspective?
>
> I mean, if we want the Sandbox to flourish, we need some kind of
> infrastructure, right?
>
> Rgrds, Thomas Delnoij
>
>
>
> On 6/3/06, Dan Morrill <[EMAIL PROTECTED]> wrote:
> > Sounds like everyone, even me, is interested in being able to provide this
> > service.
> >
> > If the process requires that we split it off from the Nutch code, what
> > would be required to make this happen?
> >
> > r/d
> >
> > -----Original Message-----
> > From: Zaheed Haque [mailto:[EMAIL PROTECTED]
> > Sent: Saturday, June 03, 2006 9:28 AM
> > To: [email protected]
> > Subject: Re: Re[2]: Image Search
> >
> > Yes! I am very interested.
> >
> > Regards
> >
> >
> > On 6/3/06, Dima Mazmanov <[EMAIL PROTECTED]> wrote:
> > > Hi, Stefan.
> > >
> > > That would be great!!!
> > > I think many people would vote for this.
> > > Since Nutch is a really powerful search engine, it would be nice to
> > > see several types of search in it.
> > >
> > > You wrote on June 3, 2006, 20:17:06:
> > >
> > > > Having an image search component for Nutch would be nice.
> > > > However, I think we need to implement this as a separate tool
> > > > outside of the Nutch code itself, since it cannot be fully integrated
> > > > into the Nutch code.
> > > > (E.g., Nutch defines one URL == one index document.)
> > > > Maybe this would be a nice project for a Nutch sandbox.
> > > > If you like, you can open an issue to request a Nutch sandbox project,
> > > > "image search".
> > > > If we get enough people to vote for this issue, we may have a chance
> > > > of getting it created.
> > >
> > > > Stefan
> > >
> > > > On 03.06.2006 at 10:38, TDLN wrote:
> > >
> > > >> I am interested in developing such a solution as well.
> > > >>
> > > >> I am currently storing the thumbnails on the file system under a
> > > >> system-generated name. My indexing plugin stores the filename in the
> > > >> index. Thumbnails are later served to the client by a separate Apache
> > > >> HTTP server. This required some changes but is otherwise pretty
> > > >> straightforward and performs very well for my current 300,000+
> > > >> images, around 15 KB each.
> > > >>
> > > >> If you are developing the more "Nutch-like" solution, I could
> > > >> contribute to that. For instance, I have some code that generates the
> > > >> thumbs using ImageJ that yields very good results.
> > > >>
> > > >> But I would definitely need some guidance in writing the Hadoop
> > > >> MapReduce job. We could even contribute this back and base a small
> > > >> tutorial on this work.
> > > >>
> > > >> What do you think?
> > > >>
> > > >> Rgrds, Thomas
> > > >>
> > > >> On 6/2/06, Stefan Groschupf <[EMAIL PROTECTED]> wrote:
> > > >>> Hi,
> > > >>> Using a search for "http" is a bad idea, since you get many but not
> > > >>> all pages.
> > > >>> Just write a Hadoop MapReduce job that processes the fetched content
> > > >>> in your segments; that should be easy.
> > > >>> Storing images in a file system will be very slow as soon as you have
> > > >>> too many.
> > > >>> Personally, I don't like databases since, compared to Nutch, they are
> > > >>> as slow as a snail.
> > > >>> For another project also related to images, I created my own
> > > >>> ImageWritable that contained the binary data of a compressed image
> > > >>> combined with some metadata.
> > > >>> If you use a MapFile, finding an image based on a key should be very
> > > >>> fast. I think much faster than a database with binary content.
> > > >>>
> > > >>> HTH
> > > >>> Stefan
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>> On 02.06.2006 at 21:10, Marco Pereira wrote:
> > > >>>
> > > >>> > Hi Everybody,
> > > >>> >
> > > >>> > I've got Nutch to index images, searching their URLs and alt and
> > > >>> > title tags.
> > > >>> > But the problem comes when storing the thumbnails.
> > > >>> > I've indexed 3 million images for a national search engine.
> > > >>> > I was unsure whether to use a file system scheme or a database to
> > > >>> > store the thumbnails.
> > > >>> > The thumbnails are created with a script that gets the image
> > > >>> > URLs from the Nutch index by doing a search for http
> > > >>> > (search.jsp?query=http).
> > > >>> >
> > > >>> > Do you have any tips or ideas on this?
> > > >>> >
> > > >>> > Thank you,
> > > >>> > Marco
> > > >>>
> > > >>> ---------------------------------------------
> > > >>> blog: http://www.find23.org
> > > >>> company: http://www.media-style.com
> > > >>>
> > > >>>
> > > >>>
> > > >>
> > >
> > >
> > >
> > >
> > > --
> > > Regards,
> > >  Dima                          mailto:[EMAIL PROTECTED]
> > >
> > >
> >
> >
>
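Editorial sketch of the record idea Stefan describes in the quoted thread above: the binary data of a compressed image kept together with some metadata, so that the whole record can be stored as one value and looked up by key. This is not Stefan's actual ImageWritable; the class name ThumbnailRecord and its fields are hypothetical. The write/readFields pair mirrors the method shape of Hadoop's Writable interface, but the sketch uses only java.io so that it runs without Hadoop on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical record: compressed image bytes plus a little metadata.
final class ThumbnailRecord {
    String mimeType = "";
    int width;
    int height;
    byte[] data = new byte[0];

    // Same shape as Hadoop's Writable.write(DataOutput).
    void write(DataOutput out) throws IOException {
        out.writeUTF(mimeType);
        out.writeInt(width);
        out.writeInt(height);
        out.writeInt(data.length); // length prefix so the reader knows how much to read
        out.write(data);
    }

    // Same shape as Hadoop's Writable.readFields(DataInput).
    void readFields(DataInput in) throws IOException {
        mimeType = in.readUTF();
        width = in.readInt();
        height = in.readInt();
        data = new byte[in.readInt()];
        in.readFully(data);
    }

    // Convenience helper: serialize a record to a byte array.
    static byte[] toBytes(ThumbnailRecord record) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            record.write(new DataOutputStream(buffer));
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Convenience helper: rebuild a record from a byte array.
    static ThumbnailRecord fromBytes(byte[] bytes) {
        try {
            ThumbnailRecord record = new ThumbnailRecord();
            record.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return record;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a real Hadoop job the class would presumably declare that it implements Writable and be used as the value type of the MapFile mentioned in the thread, keyed by something like the image URL; that wiring is left out here.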
>
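Editorial sketch for the file-system side of the discussion quoted above. Thomas stores thumbnails under a system-generated name, and Stefan warns that a file system slows down once it holds too many images. One common way to soften that, not something proposed in the thread, is to spread files over nested directories derived from a hash of the image URL. The class name ThumbnailPaths, the two-level layout, and the ".jpg" suffix are all assumptions made for illustration.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: maps an image URL to a sharded relative file path.
final class ThumbnailPaths {
    // Returns "ab/cd/<32 hex chars>.jpg", where ab and cd are the first
    // four hex digits of the MD5 of the URL, giving up to 65,536 directories.
    static String relativePath(String imageUrl) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(imageUrl.getBytes(StandardCharsets.UTF_8));
            // Zero-padded so the name is always exactly 32 hex characters.
            String hex = String.format("%032x", new BigInteger(1, digest));
            return hex.substring(0, 2) + "/" + hex.substring(2, 4) + "/" + hex + ".jpg";
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on standard Java platforms.
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

The same URL always maps to the same path, so the indexing plugin would only need to store the URL (or the hash) rather than a separately generated filename. MD5 is used here purely to spread names evenly, not for security.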

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
