[Nutch-general] Re: pooling for nutch bean

Raghavendra Prabhu Sun, 08 Jan 2006 12:24:57 -0800

I know that in the current code there is no chance of multiple bean
instances.


But say you have movie sites saved under one index
and song sites saved under another folder.

In this case we have to extend Nutch so that the user can search different
things.

The basic parameter which we pass to the NutchBean is the location of the
index and segment files.

So the user can control what is searched by implementing his own code
which instantiates NutchBean.

This would be an added feature, and in this case, if the user so decides,
there will be multiple NutchBean instances (each corresponding to its own
search scope).
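
A rough sketch of what I mean, in modern Java. SearchBean and
ScopedBeanRegistry are made-up names standing in for NutchBean and the
user's own code; this is not the actual Nutch API, and the directory names
are only examples.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical stand-in for a searcher bound to one index directory (not the real NutchBean). */
interface SearchBean {
    String search(String query);
}

/** Keeps one bean per search scope so each scope's index is opened only once. */
final class ScopedBeanRegistry {
    private final Map<String, SearchBean> beans = new ConcurrentHashMap<>();
    private final Function<String, SearchBean> factory;

    ScopedBeanRegistry(Function<String, SearchBean> factory) {
        this.factory = factory;
    }

    /** Returns the shared bean for this scope, creating it on first use. */
    SearchBean forScope(String indexDir) {
        return beans.computeIfAbsent(indexDir, factory);
    }

    /** Number of scopes that currently have a bean. */
    int size() {
        return beans.size();
    }
}

class ScopedBeanDemo {
    public static void main(String[] args) {
        // The factory here only fakes a search; real code would open the index in dir.
        ScopedBeanRegistry registry = new ScopedBeanRegistry(
            dir -> query -> "results for '" + query + "' from " + dir);

        SearchBean movies = registry.forScope("/indexes/movies");
        SearchBean songs = registry.forScope("/indexes/songs");

        System.out.println(movies.search("matrix"));
        System.out.println(songs.search("yesterday"));
        System.out.println("scopes open: " + registry.size());
    }
}
```

With this arrangement the number of beans grows with the number of scopes,
not with the number of users, since every user of a scope gets the same
instance.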

This kind of search scope restriction cannot be done using query filters.

It is very much possible to have multiple NutchBeans, but we have to consider
how much this slows things down: how much memory a NutchBean consumes and
which file readers the IndexSearcher opens.
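
For reference, the check-out and check-in pooling proposed earlier in this
thread might look roughly like the sketch below. SimplePool is a
hypothetical class, not part of Nutch, and the cap on idle instances is an
assumption added here to keep memory bounded.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

/** Hypothetical minimal object pool: reuse an idle instance if one exists, else create one. */
final class SimplePool<T> {
    private final BlockingQueue<T> idle;
    private final Supplier<T> factory;

    SimplePool(int maxIdle, Supplier<T> factory) {
        this.idle = new ArrayBlockingQueue<>(maxIdle);
        this.factory = factory;
    }

    /** Hands out an idle instance, or a new one only when none is available. */
    T borrow() {
        T item = idle.poll();
        return item != null ? item : factory.get();
    }

    /** Returns an instance to the pool; it is dropped if the pool is already full. */
    void giveBack(T item) {
        idle.offer(item);
    }

    /** Number of instances currently waiting to be reused. */
    int idleCount() {
        return idle.size();
    }
}

class SimplePoolDemo {
    public static void main(String[] args) {
        // Object stands in for an expensive searcher instance.
        SimplePool<Object> pool = new SimplePool<>(2, Object::new);

        Object first = pool.borrow();   // pool is empty, so this is newly created
        pool.giveBack(first);
        Object second = pool.borrow();  // reuses the instance that was returned

        System.out.println("reused: " + (first == second));
    }
}
```

Dropped instances in this sketch are simply left to the garbage collector;
a real pool of searchers would also have to close their open files.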



Rgds
Raghavendra Prabhu



On 1/8/06, Stefan Groschupf <[EMAIL PROTECTED]> wrote:
>
> Limiting search to a subset of documents is normally done via query
> filters.
> From my point of view, having several nutch beans will slow things down,
> since you have to open several index searchers, detailers, tcp ip
> connections etc.
> I'm actually not sure if it is possible to have multiple bean instances.
>
> Stefan
>
> On 08.01.2006 at 16:30, Raghavendra Prabhu wrote:
>
> > I would like to explain the benefits of pooling -
> >
> > Let us say we try to have a categorisation.
> >
> > Say that nutch needs to search different sets of indexes and the
> > user will be able to select the categories.
> >
> > In this scenario the nutchbeans instantiated will be different
> >
> > For example, folderone has some segments and
> > foldertwo has some segments.
> >
> > The user should be able to search either folderone or foldertwo or
> > both of
> > them
> >
> > If he selects folderone, a nutchbean will be created;
> > if he selects foldertwo, a nutchbean will be created.
> >
> > In this case the same bean cannot be made persistent, as the beans will
> > be different.
> >
> > So by implementing pooling for nutchbean we can scale nutch to a
> > higher level which can support the above combination.
> >
> > In the above case, different users select different nutchbeans
> > (according to what they want to search).
> >
> > We can provide a solution to the above problem later on if we
> > implement pooling.
> >
> > Please let me know if what I am saying is correct. If I am wrong,
> > please feel free to correct me.
> >
> > Rgds
> > Raghavendra Prabhu
> >
> >
> >
> > On 1/8/06, Stefan Groschupf <[EMAIL PROTECTED]> wrote:
> >>
> >> If it is cached in the servlet context then the bean is shared
> >> between all users, isn't it?
> >>
> >>
> >>
> >> On 08.01.2006 at 07:35, Raghavendra Prabhu wrote:
> >>
> >>> I know that it is cached in the servlet context
> >>>
> >>> But the mechanism of pooling will help nutchbean be shared across
> >>> different users.
> >>>
> >>> Whereas right now it will be persistent for a single user only.
> >>>
> >>> So people accessing from different machines will use the same
> >>> nutchbean and then check it back into the pool.
> >>>
> >>> So in a way it provides persistence across different users.
> >>>
> >>> Rgds
> >>> Prabhu
> >>>
> >>>
> >>> On 1/6/06, Stefan Groschupf <[EMAIL PROTECTED] > wrote:
> >>>>
> >>>> NutchBean is cached in the servlet context. So I guess it is not
> >>>> recreated for each request.
> >>>>
> >>>> Stefan
> >>>>
> >>>> On 05.01.2006 at 22:38, Raghavendra Prabhu wrote:
> >>>>
> >>>>> No, I don't think so.
> >>>>>
> >>>>> What I am suggesting is that we have nutch beans instantiated and
> >>>>> we store them.
> >>>>>
> >>>>> Whenever a user comes and searches, he will be given a
> >>>>> NutchBean.
> >>>>>
> >>>>>
> >>>>> After he searches he returns it to the pool, and when someone else
> >>>>> searches at that time he would get the same bean (note that a new
> >>>>> bean is not created).
> >>>>>
> >>>>> Only if a bean is not available does a new bean get created.
> >>>>>
> >>>>> This makes it faster, as different users share the same NutchBean
> >>>>> and no new nutchbean is created.
> >>>>>
> >>>>> Note: NutchBean is shared across different users, whereas right now
> >>>>> it is only for a single user and is then garbage collected.
> >>>>>
> >>>>> Here we control the NutchBean instantiation, and we have to come up
> >>>>> with a way to free it.
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 1/6/06, Byron Miller <[EMAIL PROTECTED]> wrote:
> >>>>>>
> >>>>>> If I'm not mistaken, doesn't the opensearch servlet get
> >>>>>> around this issue? You could then post-process the xml
> >>>>>> through a stylesheet/css or your favorite scripting
> >>>>>> language.
> >>>>>>
> >>>>>> -byron
> >>>>>>
> >>>>>> --- Raghavendra Prabhu < [EMAIL PROTECTED]> wrote:
> >>>>>>
> >>>>>>> Right now,
> >>>>>>>
> >>>>>>> whenever a user comes and searches, a NutchBean is
> >>>>>>> created.
> >>>>>>>
> >>>>>>> We should have a mechanism where this nutchbean is
> >>>>>>> pooled. I mean it is created
> >>>>>>> and stored so that it can be given to the user.
> >>>>>>>
> >>>>>>> Immediately after the user has used the NutchBean,
> >>>>>>> he returns it.
> >>>>>>>
> >>>>>>> (For example, at orkut we get a message saying doughnut
> >>>>>>> not available.)
> >>>>>>>
> >>>>>>> This will make search results faster and more
> >>>>>>> efficient.
> >>>>>>>
> >>>>>>> Only when there are parallel users will more nutchbeans
> >>>>>>> get created.
> >>>>>>>
> >>>>>>> Any comments on the above issue?
> >>>>>>>
> >>>>>>>
> >>>>>>> Rgds
> >>>>>>> Prabhu
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>
> >>>>
> >>
> >> ---------------------------------------------------------------
> >> company:        http://www.media-style.com
> >> forum:        http://www.text-mining.org
> >> blog:            http://www.find23.net
> >>
> >>
> >>
> >>
>
