Hi,
Look at HBase vs Hypertable. Both are implementations of the same concept
(BigTable). HBase is in Java, Hypertable is in C++. Search the web and you
can find tons of flame discussions. I am not sure one can really say that
one implementation is superior to the other, mainly because both projects
are still very young and each community focuses on different implementation
priorities. Could Nutch benefit from similar flame wars? Wouldn't it be
more of a waste of energy? A pragmatic approach would be to identify
bottlenecks in the current Nutch code and try to improve its Java
implementation. If that is not possible, and a C++ implementation of the
critical functionality would provide a significant overall performance
boost, then that could be a valid justification...

Regards,
Lukas

http://blog.lukas-vlcek.com/


On Wed, Aug 5, 2009 at 12:45 AM, Iain Downs <[email protected]> wrote:

> I wasn't advocating this.
>
> ' (and you may not see much if any)'.
>
> Comparisons of managed languages vs C++ seem to have widely varied results.
> Some claim the managed language is faster, some that it is slower.
>
> The simple tests I've done with C# (which is sort of like Java but faster
> ... [no flames please.  I don't really care if this statement is true or
> not!]) make me think C++ is 1.5 to 2 times faster for array-intensive
> work, mainly because the managed runtime checks array bounds a lot.  And I
> would guess that some of Nutch falls into this category, but by no means
> all.
>
> Personally, I would guess that you could get some 10-20 percent higher
> throughput if Nutch and Lucene were all native C++.  But then you would
> have taken twice as long to write the code.
>
> And I find writing in managed languages (Java, .net) so much less
> frustrating and so much more productive, that any small performance gains
> are irrelevant!
>
> Iain
>
> -----Original Message-----
> From: reinhard schwab [mailto:[email protected]]
> Sent: 04 August 2009 18:36
> To: [email protected]
> Subject: Re: Nutch in C++
>
> Iain Downs schrieb:
> > I think there is probably a subtext here (I'm putting words in Otis'
> > mouth, for which my apologies).
> >
> > 'Yes, you could rewrite Nutch in C++ and have that use CLucene.'  But
> > you'd be mad to do so!
> >
> > I'm a bit out of date with Nutch, but it's large.  And Java to C++ is not
> > an easy conversion because of the different memory management systems.
> >
> > And why?  I guess you may see some performance improvement, but it would
> > be a LOT cheaper to throw hardware at the problem (and you may not see
> > much if any).
> >
> Performance improvement?
> Can you prove that C++ will be faster?
> > So if you have a few months to spare ....
> >
> >
> > Iain
> >
> > -----Original Message-----
> > From: Otis Gospodnetic [mailto:[email protected]]
> > Sent: 04 August 2009 04:49
> > To: [email protected]
> > Subject: Re: Nutch in C++
> >
> > CLucene is just like Lucene (except a few versions behind), but written
> > in C++.
> >
> > Yes, you could rewrite Nutch in C++ and have that use CLucene.
> >
> > Otis
> > --
> > Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> > Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
> >
> >
> >
> > ----- Original Message ----
> >
> >> From: "[email protected]" <[email protected]>
> >> To: [email protected]
> >> Sent: Monday, August 3, 2009 2:29:40 PM
> >> Subject: Re: Nutch in C++
> >>
> >>
> >>
> >>
> >>
> >> Hi,
> >>
> >> I know Nutch uses Lucene. But what is CLucene for, then? Only for
> >> indexing files on a hard drive?
> >>
> >>
> >> I have knowledge of C++ and some experience. I wanted to code the Nutch
> >> crawler in C++ to get more experience and make it open source, but only
> >> if it will be useful for the open source community.
> >> My goal is to get more experience in C++ and make a contribution to open
> >> source.
> >> If you know other projects that may be more useful, please let me know.
> >>
> >> thanks.
> >> Alex.
> >>
> >>
> >> -----Original Message-----
> >> From: Otis Gospodnetic
> >> To: [email protected]
> >> Sent: Sun, Aug 2, 2009 8:15 pm
> >> Subject: Re: Nutch in C++
> >>
> >> Nutch uses Lucene (Java), not CLucene (C++).
> >>
> >> Why are you looking to rewrite Nutch in C++ anyway?  Sounds scary.
> >>
> >> Otis
> >> --
> >> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> >> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
> >>
> >>
> >>
> >> ----- Original Message ----
> >>
> >>> From: "[email protected]"
> >>> To: [email protected]
> >>> Sent: Thursday, July 30, 2009 3:13:16 PM
> >>> Subject: Nutch in C++
> >>>
> >>> Hi,
> >>>
> >>> As I understand it, only the indexing part of Nutch exists in C++, as
> >>> CLucene. I want to code Nutch in C++, but only if it is worth doing. I
> >>> wondered whether it is worth coding the remaining parts of Nutch in C++,
> >>> let's say the crawler. Can someone give me directions on where to start?
> >>>
> >>> Thanks
> >>> Alex.
> >>>
> >
> >
> >
>
>
