> We can use source forge as the cvs,

In the worst case we can use SourceForge. However, I would prefer to wait
and hear what Doug thinks about having a sandbox repository with limited
access in the Nutch svn.

>
> -----Original Message-----
> From: TDLN [mailto:[EMAIL PROTECTED]
> Sent: Saturday, June 03, 2006 10:25 AM
> To: [email protected]
> Subject: Re: Re[2]: Image Search
>
> Dan - this sounds really good! Participation in an Open Source project
> is new to me as well, but hey, that's why we get to start in the
> sandbox :)
>
> I was also thinking about source control. We definitely need a
> repository, don't you think?
>
> Rgrds, Thomas
>
> On 6/3/06, Dan Morrill <[EMAIL PROTECTED]> wrote:
>> Well, I can do the project management side of it and can volunteer some
>> time, but I have never done this in an open source model before. But I can
>> do documentation and project management support, and make a decent
>> cheerleader as well.
>>
>> Let me know.
>> r/d
>>
>> -----Original Message-----
>> From: TDLN [mailto:[EMAIL PROTECTED]
>> Sent: Saturday, June 03, 2006 9:59 AM
>> To: [email protected]
>> Subject: Re: Re[2]: Image Search
>>
>> Ok, I created a Jira Issue for this:
>>
>> http://issues.apache.org/jira/browse/NUTCH-296
>>
>> I did not assign the Issue to any component. Maybe we can have a
>> "Sandbox" component?
>>
>> Now, the question is how we can support several people working on this
>> from a "project management" or code management perspective?
>>
>> I mean, if we want the Sandbox to flourish, we need some kind of
>> infrastructure, right?
>>
>> Rgrds, Thomas Delnoij
>>
>>
>>
>> On 6/3/06, Dan Morrill <[EMAIL PROTECTED]> wrote:
>>> Sounds like everyone, even me, is interested in being able to provide this
>>> service.
>>>
>>> If the process requires that we break it off from the Nutch code, what
>>> would be required to make this happen?
>>>
>>> r/d
>>>
>>> -----Original Message-----
>>> From: Zaheed Haque [mailto:[EMAIL PROTECTED]
>>> Sent: Saturday, June 03, 2006 9:28 AM
>>> To: [email protected]
>>> Subject: Re: Re[2]: Image Search
>>>
>>> Yes! I am very interested.
>>>
>>> Regards
>>>
>>>
>>> On 6/3/06, Dima Mazmanov <[EMAIL PROTECTED]> wrote:
>>>> Hi, Stefan.
>>>>
>>>> That would be great!!!
>>>> I think many people would vote for this.
>>>> Since Nutch is a really powerful search engine, it would be nice to
>>>> see several types of search in it.
>>>>
>>>> You wrote on June 3, 2006, 20:17:06:
>>>>
>>>>> Having an image search component for Nutch would be nice.
>>>>> However, I think we need to implement this as a kind of separate tool
>>>>> outside of the Nutch code itself, since it cannot be 100% integrated
>>>>> into the Nutch code.
>>>>> (E.g., Nutch defines one URL == one index document.)
>>>>> Maybe this would be a nice project for a Nutch sandbox.
>>>>> If you like, you can open an issue to request a Nutch sandbox project
>>>>> "image search".
>>>>> If we get enough people to vote for this issue, we may have a chance to
>>>>> get it created.
>>>>
>>>>> Stefan
>>>>
>>>>> On 03.06.2006 at 10:38, TDLN wrote:
>>>>
>>>>>> I am interested in developing such a solution as well.
>>>>>>
>>>>>> I am currently storing the thumbnails on the file system under a
>>>>>> system-generated name. My indexing plugin stores the filename in the
>>>>>> index. Thumbnails are later served to the client by a separate Apache
>>>>>> HTTP server. This required some changes but is otherwise pretty
>>>>>> straightforward and performs very well for my current 300,000+
>>>>>> images, around 15 KB each.
>>>>>>
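One way to picture the "system-generated name" scheme described above is the rough sketch below. It is an assumption for illustration, not Thomas's actual code: the SHA-1 naming, the two-level directory fan-out, and the helper names thumbnail_path and store_thumbnail are all hypothetical, and it is written in Python rather than Nutch's Java to keep it short.

```python
import hashlib
from pathlib import Path


def thumbnail_path(root: str, image_url: str, fanout: int = 2) -> Path:
    """Map an image URL to a stable, system-generated thumbnail path.

    The SHA-1 hex digest of the URL is used as the file name, and its
    leading characters select nested subdirectories so that no single
    directory has to hold every thumbnail.
    """
    digest = hashlib.sha1(image_url.encode("utf-8")).hexdigest()
    # With fanout=2 the layout is root/<chars 0-1>/<chars 2-3>/<digest>.jpg
    shards = [digest[i * 2:(i + 1) * 2] for i in range(fanout)]
    return Path(root, *shards, digest + ".jpg")


def store_thumbnail(root: str, image_url: str, data: bytes) -> str:
    """Write thumbnail bytes to disk and return the name to store in the index."""
    path = thumbnail_path(root, image_url)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    # Only the path relative to the root needs to go into the index document.
    return path.relative_to(root).as_posix()
```

The returned relative name is what an indexing plugin could store in the index, and a separate HTTP server could serve files from the same root. Spreading files over subdirectories is one common way to soften the "too many files in one directory" problem raised later in the thread.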
>>>>>> If you are developing the more "Nutch-like" solution, I could
>>>>>> contribute to that. For instance, I have some code that generates the
>>>>>> thumbs using ImageJ that yields very good results.
>>>>>>
>>>>>> But I would definitely need some guidance in writing the Hadoop map
>>>>>> reduce job. We could even contribute this back and base a small
>>>>>> tutorial on this work.
>>>>>>
>>>>>> What do you think?
>>>>>>
>>>>>> Rgrds, Thomas
>>>>>>
>>>>>> On 6/2/06, Stefan Groschupf <[EMAIL PROTECTED]> wrote:
>>>>>>> Hi,
>>>>>>> using a search for "http" is a bad idea, since you get many but not
>>>>>>> all pages.
>>>>>>> Just write a Hadoop map reduce job that processes the fetched content
>>>>>>> in your segments; that should be easy.
>>>>>>> Storing images in a file system will be very slow as soon as you have
>>>>>>> too many.
>>>>>>> I personally don't like databases since, compared to Nutch, they are
>>>>>>> slow as a snail.
>>>>>>> For another project also related to images, I created my own
>>>>>>> ImageWritable that contained the binary data of a compressed image
>>>>>>> combined with some metadata.
>>>>>>> If you use a MapFile, finding an image based on a key should be very
>>>>>>> fast. I think much faster than a database with binary content.
>>>>>>>
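The idea above (a record holding image bytes plus some metadata, stored sorted by key so a lookup is a binary search and one seek) can be sketched roughly as follows. This is a toy illustration only: it is neither Hadoop's MapFile format nor its API, it is not the ImageWritable mentioned above, and the record layout, the single MIME-type metadata field, and the function names are all assumptions. It is in Python rather than Java for brevity.

```python
import bisect
import io
import struct


def encode_record(key: str, mime: str, image: bytes) -> bytes:
    """Pack the key, one metadata field and the image bytes, length-prefixed."""
    k, m = key.encode("utf-8"), mime.encode("utf-8")
    return struct.pack(">III", len(k), len(m), len(image)) + k + m + image


def write_sorted(out, records):
    """Write records in key order and return a (keys, offsets) index.

    `records` maps key -> (mime, image_bytes). The index here is kept
    fully in memory, which is a simplification for this sketch.
    """
    keys, offsets = [], []
    for key in sorted(records):
        mime, image = records[key]
        keys.append(key)
        offsets.append(out.tell())
        out.write(encode_record(key, mime, image))
    return keys, offsets


def lookup(inp, index, key):
    """Binary-search the index, then seek directly to the matching record."""
    keys, offsets = index
    i = bisect.bisect_left(keys, key)
    if i == len(keys) or keys[i] != key:
        return None  # key not present
    inp.seek(offsets[i])
    klen, mlen, ilen = struct.unpack(">III", inp.read(12))
    inp.seek(klen, io.SEEK_CUR)  # skip over the stored key
    mime = inp.read(mlen).decode("utf-8")
    return mime, inp.read(ilen)
```

A quick way to try it is with an in-memory buffer: create `buf = io.BytesIO()`, call `index = write_sorted(buf, {...})`, then `lookup(buf, index, some_url)`. The point of the design is that a lookup costs one binary search plus one seek, however many images are stored.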
>>>>>>> HTH
>>>>>>> Stefan
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 02.06.2006 at 21:10, Marco Pereira wrote:
>>>>>>>
>>>>>>>> Hi Everybody,
>>>>>>>>
>>>>>>>> I've got Nutch to index images by searching their URLs and their alt
>>>>>>>> and title tags.
>>>>>>>> But the problem comes when storing the thumbnails.
>>>>>>>> I've indexed 3 million images for a national search engine.
>>>>>>>> I was unsure whether to use a file system scheme or a database to
>>>>>>>> store the thumbnails.
>>>>>>>> The thumbnails are created with a script that gets the image URLs
>>>>>>>> from the Nutch index by doing a search for http
>>>>>>>> (search.jsp?query=http).
>>>>>>>>
>>>>>>>> Do you have any tips, ideas on this?
>>>>>>>>
>>>>>>>> Thank you,
>>>>>>>> Marco
>>>>>>>
>>>>>>> ---------------------------------------------
>>>>>>> blog: http://www.find23.org
>>>>>>> company: http://www.media-style.com
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>>  Dima                          mailto:[EMAIL PROTECTED]
>>>>
>>>>
>>>
>>>
>>
>>
>
>



_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
