Hi Lewis, this was also my impression when I found out that Lucidworks 
hasn’t used Nutch for their product. I also think that Nutch is great, and I 
initially thought that Fusion used Nutch, but as I haven’t had the opportunity 
to test the product or work with it, I posed the question, mainly to 
satisfy my curiosity and to get a peek under the hood and opinions from those 
who have tested Fusion. 

In terms of development I think that Nutch is more mature than Aperture, and I 
definitely thought that if Lucidworks used it, it would mean a lot of work 
contributed back into the Nutch source code, making a better product in the end 
and perhaps helping to address some weak points of Nutch (e.g. the ability to 
control the crawl via a REST API, despite the pending JIRA on the Fjodor work). 

So in the end, the decision to use Aperture instead of Nutch really struck 
me.

Regards,

On Oct 3, 2014, at 4:27 AM, Lewis John Mcgibbney <[email protected]> 
wrote:

> Hi Folks,
> 
> On Thu, Oct 2, 2014 at 4:01 PM, <[email protected]> wrote:
> 
>> 
>> Hi, the new Fusion product from Lucidworks provides “advanced filesystem
>> and web crawlers”. Has anyone had time to check this out and see how it
>> compares to the current and future plans for Nutch?
> 
> 
> I am always disappointed (but never surprised) when people go and make
> their own crawlers, then run them on 'Hadoop'.
> Nutch is THE native Hadoop application... why people go and write their own
> is utterly beyond me. Maybe they like MatLab too much or something ;) ...
> or maybe modern Fortran.
> 
> I do not speak on behalf of the Nutch PMC, however what I will say is this.
> I know that there are many CIOs, CTOs as well as many engineers on this
> list and I know they are watching this thread. Nutch is a different product
> now than it was <1.5 years ago. The work that has been done is unparalleled
> in the Python community, and I make this statement boldly. From what I have
> seen, Nutch is the most comprehensive (if a bit challenging w.r.t
> configuration) product out there for crawling. There are a number of issues
> to be addressed in Jira. We know this. But this still does not change my
> opinion on the software.
> 
> I have been corrected before for making such statements, however
> my justification is as follows:
> 
> * There is a HUGE difference between crawling and scraping.
> * There is a huge difference between leveraging Apache Tika within the
> Nutch framework for metadata augmentation of URLs over scraping.
> * There is a HUGE benefit to be obtained by utilising the Nutch community...
> which is sh*t hot in comparison to ~2-3 years ago. The same community has
> also ensured that Nutch has been making regular releases for a number of
> years now.
> 
> 
> 
>> Just interested. I personally haven’t been able to download the product and
>> test it, but I’m a bit curious and would appreciate your comments on this
>> topic.
>> 
>> 
> Hopefully the above gives my take on things. If LucidWorks have some magic
> sauce then great. Hopefully they consider bringing some of it back into
> Nutch rather than writing some Perl or Python scripts. I would never expect
> this to happen, however I am utterly depressed at how often I see this
> happening.
> Many software projects are failures.
> Nutch is not. It is a decade old.
> Nutch is a success.
> 
> hth
> Lewis

