Re: Building Lucene index with Nutch 1.4

Lewis John Mcgibbney Fri, 08 Jun 2012 04:01:02 -0700

Hi Emre,

If you could open a Jira issue for this and submit a patch of
your hack, it would be excellent to have the code available to the
community.


All the best, oh and glad you got your application working.
Lewis

On Fri, Jun 8, 2012 at 4:22 AM, Emre Çelikten <[email protected]> wrote:
> Hello again,
>
> I managed to do it. Getting the entire thing to work was tricky. I had to
> resort to a hack.
>
> I will post how I managed to do it here soon, for anyone who might be
> interested in the future.
>
> Thanks again.
>
> Best,
>
> Emre
>
> On Fri, Jun 8, 2012 at 12:33 AM, Emre Çelikten <[email protected]> wrote:
>
>> Hello Markus,
>>
>> Thanks very much for your help.
>>
>> I have looked at the Nutch source. I think I need to make a different version
>> of the indexSolr method in SolrIndexer.java, yes? The current version is:
>>
>> public void indexSolr(String solrUrl, Path crawlDb, Path linkDb,
>>       List<Path> segments, boolean noCommit, boolean deleteGone,
>>       String solrParams)
>>
>> I will try to change the "String solrUrl" parameter to "SolrServer server" in the
>> new method and pass in my own SolrServer that was created in the application.
>> Do you think this is a correct approach?
>>
>> Best,
>>
>> Emre
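Emre's proposal, passing an already-built server object into indexSolr instead of a URL string, is essentially dependency injection. The following is a minimal sketch of that shape using only the JDK. `IndexServer`, `InMemoryIndexServer` and `SketchIndexer` are hypothetical names, not Nutch or SolrJ APIs; in an actual patch the parameter would presumably be SolrJ's `SolrServer` (which `EmbeddedSolrServer` extends in SolrJ 3.x), and the pages would be read from the crawlDb, linkDb and segments rather than passed in as a map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Nutch's SolrIndexer: the indexing method takes an
// already-constructed server object instead of building one from a URL.
class SketchIndexer {

    // Mirrors the proposed "String solrUrl" -> "SolrServer server" change.
    // Assumption: parsed pages arrive as a url -> text map; the real method
    // reads them from crawlDb, linkDb and the segment paths.
    void indexSolr(IndexServer server, Map<String, String> pages, boolean noCommit) {
        for (Map.Entry<String, String> page : pages.entrySet()) {
            Map<String, String> doc = new LinkedHashMap<>();
            doc.put("url", page.getKey());
            doc.put("content", page.getValue());
            server.add(doc);
        }
        if (!noCommit) {
            server.commit(); // make the added documents visible to searches
        }
    }

    public static void main(String[] args) {
        InMemoryIndexServer server = new InMemoryIndexServer();
        Map<String, String> pages = new LinkedHashMap<>();
        pages.put("http://example.com/a", "language models for speech recognition");
        pages.put("http://example.com/b", "crawling the web with nutch");
        new SketchIndexer().indexSolr(server, pages, false);
        System.out.println(server.committed.size()); // prints 2
    }
}

// Hypothetical narrow interface standing in for SolrJ's SolrServer.
interface IndexServer {
    void add(Map<String, String> doc);
    void commit();
}

// Hypothetical in-process implementation: documents never leave the JVM,
// which is the property an embedded server provides.
class InMemoryIndexServer implements IndexServer {
    final List<Map<String, String>> pending = new ArrayList<>();
    final List<Map<String, String>> committed = new ArrayList<>();

    @Override
    public void add(Map<String, String> doc) {
        pending.add(doc);
    }

    @Override
    public void commit() {
        committed.addAll(pending);
        pending.clear();
    }
}
```

Because the indexer only sees the interface, the caller decides whether documents go to a remote Solr over HTTP or to an in-process instance; the indexing loop itself does not change.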
>>
>>
>> On Thu, Jun 7, 2012 at 11:27 PM, Markus Jelsma <[email protected]> wrote:
>>
>>> Hello!
>>>
>>> Sounds very interesting. Anyway, Solr can run embedded in a Java
>>> application using EmbeddedSolrServer. You do need to make some changes to
>>> the SolrIndexer tools in Nutch.
>>>
>>> Cheers
>>>
>>> -----Original message-----
>>> > From: Emre Çelikten <[email protected]>
>>> > Sent: Thu 07-Jun-2012 22:24
>>> > To: [email protected]
>>> > Subject: Building Lucene index with Nutch 1.4
>>> >
>>> > Hello everybody,
>>> >
>>> > As part of a project, I am working on a FOSS tool that will build language
>>> > models from data obtained from the web, which will then be used for speech
>>> > recognition. I plan to make this tool quite compact by encapsulating as
>>> > much as I can in a single Java application and not requiring the user to
>>> > install/configure tons of stuff.
>>> >
>>> > I have managed to set up Nutch and am able to crawl a website inside a Java
>>> > application. The next thing I need to do is to search for certain keywords
>>> > in the obtained data. I have read that the ability to build Lucene indexes
>>> > has been removed from Nutch and that we now need to use Solr instead. The
>>> > way Solr works (servlets, HTTP) is not really appropriate for a tool that
>>> > only needs search functionality that is invisible to the user.
>>> >
>>> > What would you recommend I do in this case? Is there absolutely no way
>>> > of building Lucene indexes? I could not find anything other than
>>> > recommendations to use Solr instead. Should I try to use an older version
>>> > of Nutch?
>>> >
>>> > Thanks in advance,
>>> >
>>> > Emre
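If the goal is only keyword lookup over crawled text, it may help to see what an index actually provides. The following is a toy in-memory inverted index in plain Java, explicitly not Lucene and not Lucene's API: it maps each lowercased token to the set of page ids containing it and answers AND queries by intersecting those sets. The class name `TinyInvertedIndex` and its tokenizer are assumptions made for illustration; Lucene itself can also be used directly as a Java library for the real thing.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy inverted index for illustration only; this is not Lucene's API.
class TinyInvertedIndex {
    // term -> ids of the documents (e.g. page URLs) that contain the term
    private final Map<String, Set<String>> postings = new TreeMap<>();

    // Crude analyzer (assumption): lowercase, then split on any run of
    // characters that are neither letters nor digits.
    void addDocument(String docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Boolean AND query: ids of documents containing every keyword.
    Set<String> search(String... keywords) {
        Set<String> result = null;
        for (String keyword : keywords) {
            Set<String> docs = postings.getOrDefault(
                    keyword.toLowerCase(Locale.ROOT), Collections.<String>emptySet());
            if (result == null) {
                result = new TreeSet<>(docs); // copy so the postings stay untouched
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? Collections.<String>emptySet() : result;
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.addDocument("http://example.com/a", "Language models for speech recognition.");
        index.addDocument("http://example.com/b", "Speech data crawled from the web.");
        System.out.println(index.search("speech"));             // prints [http://example.com/a, http://example.com/b]
        System.out.println(index.search("speech", "language")); // prints [http://example.com/a]
    }
}
```

Lucene adds much more on top of this idea (analyzers, relevance scoring, on-disk segments), so for real crawl data either Lucene directly or the embedded Solr route is likely the better choice.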
>>> >
>>>
>>
>>



-- 
Lewis
