We can use SourceForge for CVS hosting.

In the worst case we can use SF. However, I would like to wait and hear what Doug thinks about having a sandbox repository in the Nutch svn with limited access.


-----Original Message-----
From: TDLN [mailto:[EMAIL PROTECTED]
Sent: Saturday, June 03, 2006 10:25 AM
To: [email protected]
Subject: Re: Re[2]: Image Search

Dan - this sounds really good! Participation in an Open Source project
is new to me as well, but hey, that's why we get to start in the
sandbox :)

I was also thinking about source control. We definitely need a
repository, don't you think?

Rgrds, Thomas

On 6/3/06, Dan Morrill <[EMAIL PROTECTED]> wrote:
Well, I can do the project management side of it and can volunteer some
time, but I have never done this in an open source model before. I can do
documentation and project management support, and make a decent
cheerleader as well.

Let me know.
r/d

-----Original Message-----
From: TDLN [mailto:[EMAIL PROTECTED]
Sent: Saturday, June 03, 2006 9:59 AM
To: [email protected]
Subject: Re: Re[2]: Image Search

Ok, I created a Jira Issue for this:

http://issues.apache.org/jira/browse/NUTCH-296

I did not assign the Issue to any component. Maybe we can have a
"Sandbox" component?

Now, the question is how we can support several people working on this
from a "project management" or code management perspective?

I mean, if we want the Sandbox to flourish, we need some kind of
infrastructure, right?

Rgrds, Thomas Delnoij



On 6/3/06, Dan Morrill <[EMAIL PROTECTED]> wrote:
Sounds like everyone, even me, is interested in being able to provide
this service.

If the process requires that we break it off from the Nutch code, what
would be required to make this happen?

r/d

-----Original Message-----
From: Zaheed Haque [mailto:[EMAIL PROTECTED]
Sent: Saturday, June 03, 2006 9:28 AM
To: [email protected]
Subject: Re: Re[2]: Image Search

Yes! I am very interested.

Regards


On 6/3/06, Dima Mazmanov <[EMAIL PROTECTED]> wrote:
Hi, Stefan.

That would be great!!!
I think many people would vote for this.
Since Nutch is a really powerful search engine, it would be nice to
see several types of search in it.

You wrote on June 3, 2006, 20:17:06:

Having an image search component for Nutch would be nice.
However, I think we need to implement this as a separate tool
outside of the Nutch code itself, since it cannot be 100% integrated
into the Nutch code.
(E.g. Nutch defines one URL == one index document.)
Maybe this would be a nice project for a Nutch sandbox.
If you like, you can open an issue to request a Nutch sandbox project
"image search".
If we get enough people to vote for this issue, we may have a chance of
getting it created.

Stefan

On 03.06.2006 at 10:38, TDLN wrote:

I am interested in developing such a solution as well.

I am currently storing the thumbnails on the file system under a
system-generated name. My indexing plugin stores the filename in the
index. Thumbnails are later served to the client by a separate Apache
HTTP server. This required some changes but is otherwise pretty
straightforward and performs very well for my current 300,000+
images, around 15 KB each.
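
A minimal sketch of one way such a file system scheme could look. The
thread does not say how the system-generated names are produced; the
approach assumed here hashes the image URL and uses the first hex digits
as directory levels so that no single directory grows too large. The
class name ThumbnailPaths, the choice of MD5, the two-level layout, and
the .jpg extension are all illustrative assumptions, not the code
described above.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: maps an image URL to a sharded thumbnail path. */
class ThumbnailPaths {

    /** Hex-encoded MD5 of the URL; used only as a stable name, not for security. */
    static String hashName(String url) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(url.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 must be available on every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    /** Two directory levels (256 * 256 buckets) keep each directory small. */
    static Path pathFor(Path root, String url) {
        String name = hashName(url);
        return root.resolve(name.substring(0, 2))
                   .resolve(name.substring(2, 4))
                   .resolve(name + ".jpg");
    }

    public static void main(String[] args) {
        // The root directory and URL are made-up example values.
        System.out.println(pathFor(Paths.get("thumbs"), "http://example.com/a.png"));
    }
}
```

With two hex-digit levels there are 65,536 buckets, so 300,000
thumbnails would average fewer than 5 files per directory. The value
stored in the index would then be the hash name or the relative path.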

If you are developing the more "Nutch-like" solution, I could
contribute to that. For instance, I have some code that generates the
thumbs using ImageJ and yields very good results.
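
For readers who want a feel for the thumbnail step, here is a rough
sketch that uses only the JDK (java.awt and BufferedImage) in place of
ImageJ, so it is not the ImageJ-based code mentioned above. The class
name Thumbnails, the method name scale, and the maxSide parameter are
hypothetical.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

/** Hypothetical JDK-only thumbnailer; a stand-in for an ImageJ-based one. */
class Thumbnails {

    /** Scales src to fit inside maxSide x maxSide, keeping aspect ratio; never upscales. */
    static BufferedImage scale(BufferedImage src, int maxSide) {
        int longest = Math.max(src.getWidth(), src.getHeight());
        double factor = Math.min(1.0, (double) maxSide / longest);
        int w = Math.max(1, (int) Math.round(src.getWidth() * factor));
        int h = Math.max(1, (int) Math.round(src.getHeight() * factor));

        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        try {
            // Bilinear interpolation is a reasonable default for downscaling.
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                               RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(src, 0, 0, w, h, null);
        } finally {
            g.dispose();
        }
        return dst;
    }

    public static void main(String[] args) {
        // Synthetic blank image used only to demonstrate the call.
        BufferedImage src = new BufferedImage(400, 200, BufferedImage.TYPE_INT_RGB);
        BufferedImage thumb = scale(src, 100);
        System.out.println(thumb.getWidth() + "x" + thumb.getHeight());
    }
}
```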

But I would definitely need some guidance in writing the Hadoop map
reduce job. We could even contribute this back and base a small
tutorial on this work.

What do you think?

Rgrds, Thomas

On 6/2/06, Stefan Groschupf <[EMAIL PROTECTED]> wrote:
Hi,
using a search for "http" is a bad idea, since you get many but not
all pages.
Just write a Hadoop map reduce job that processes the fetched content
in your segments; that should be easy.
Storing images in a file system will be very slow as soon as you have
too many.
I personally don't like databases since, compared to Nutch, they are
as slow as a snail.
For another project also related to images, I created my own
ImageWritable that contained the binary data of a compressed image
together with some metadata.
If you use a MapFile, finding an image based on a key should be very
fast. I think much faster than a database with binary content.
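
To illustrate the idea of a record that carries compressed image bytes
plus metadata, here is a sketch under stated assumptions. It is not the
ImageWritable mentioned above, whose fields are not given in this
thread. The class name ImageRecord and its fields (sourceUrl, width,
height, data) are hypothetical. It deliberately does not import Hadoop
so that it runs with the JDK alone; its write and readFields methods
only mirror the shape of Hadoop's Writable interface. In a real job the
class would implement Writable and be stored as the value in a MapFile
keyed by something like the image URL.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical record: compressed image bytes plus a little metadata. */
class ImageRecord {
    String sourceUrl = "";
    int width;
    int height;
    byte[] data = new byte[0]; // already-compressed image bytes, e.g. JPEG

    /** Same signature as Writable.write: serialize the fields in a fixed order. */
    void write(DataOutput out) throws IOException {
        out.writeUTF(sourceUrl);
        out.writeInt(width);
        out.writeInt(height);
        out.writeInt(data.length); // length prefix so the reader knows how much to read
        out.write(data);
    }

    /** Same signature as Writable.readFields: read the fields back in the same order. */
    void readFields(DataInput in) throws IOException {
        sourceUrl = in.readUTF();
        width = in.readInt();
        height = in.readInt();
        data = new byte[in.readInt()];
        in.readFully(data);
    }

    /** Convenience round-trip helpers for trying the format without Hadoop. */
    static byte[] toBytes(ImageRecord r) {
        try (ByteArrayOutputStream buf = new ByteArrayOutputStream();
             DataOutputStream out = new DataOutputStream(buf)) {
            r.write(out);
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static ImageRecord fromBytes(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            ImageRecord r = new ImageRecord();
            r.readFields(in);
            return r;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because a MapFile keeps its entries sorted by key with an index on top,
a lookup by key avoids scanning the whole file, which is the property
relied on above for fast retrieval.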

HTH
Stefan




On 02.06.2006 at 21:10, Marco Pereira wrote:

Hi Everybody,

I've got Nutch to index images, searching their URL and their alt and
title tags.
But the problem comes when storing the thumbnails.
I've indexed 3 million images for a national search engine.
I was in doubt whether to use a file system scheme or a database to
store the thumbnails.
The thumbnails are created with a script that gets the image URLs from
the Nutch index by doing a search for http (search.jsp?query=http).

Do you have any tips or ideas on this?

Thank you,
Marco

---------------------------------------------
blog: http://www.find23.org
company: http://www.media-style.com

--
Regards,
 Dima                          mailto:[EMAIL PROTECTED]