Hi Emre,

This is great thank you for the contribution and of course the tutorial
which I managed to read yesterday.

Have a great weekend

Lewis

On Fri, Jun 8, 2012 at 4:42 PM, Emre Çelikten <[email protected]> wrote:
> Hello,
>
> I have done as you asked. I hope I have done it correctly as this was my
> first patch. Here's the issue:
> https://issues.apache.org/jira/browse/NUTCH-1382
>
> Here's a tutorial for people that might be interested:
> http://cmusphinx.sourceforge.net/2012/06/building-a-java-application-with-apache-nutch-and-solr/
>
> It might have slight changes soon since I will proof-read it this evening.
>
> Hope that helps.
>
> Best,
>
> Emre
>
> On Fri, Jun 8, 2012 at 2:00 PM, Lewis John Mcgibbney <[email protected]> wrote:
>
>> Hi Emre,
>>
>> Even if you were to open a Jira issue for this and submit a patch of
>> your hack it would be excellent to have the code available to the
>> community.
>>
>> All the best, oh and glad you got your application working.
>> Lewis
>>
>> On Fri, Jun 8, 2012 at 4:22 AM, Emre Çelikten <[email protected]> wrote:
>> > Hello again,
>> >
>> > I managed to do it. Getting the entire thing to work was tricky. I had to
>> > resort to a hack.
>> >
>> > I will post how I managed to do it here soon, for people that might be
>> > interested in the future.
>> >
>> > Thanks again.
>> >
>> > Best,
>> >
>> > Emre
>> >
>> > On Fri, Jun 8, 2012 at 12:33 AM, Emre Çelikten <[email protected]> wrote:
>> >
>> >> Hello Markus,
>> >>
>> >> Thanks very much for your help.
>> >>
>> >> I have looked at Nutch source. I think I need to make a different version
>> >> of indexSolr method in SolrIndexer.java, yes? The current version is:
>> >>
>> >> public void indexSolr(String solrUrl, Path crawlDb, Path linkDb,
>> >> List<Path> segments, boolean noCommit, boolean deleteGone, String
>> >> solrParams)
>> >>
>> >> I will try to change "String solrUrl" part to "SolrServer server" in the
>> >> new method and use my own SolrServer that was created in the application.
>> >> Do you think this is a correct approach?
>> >>
>> >> Best,
>> >>
>> >> Emre
>> >>
>> >>
>> >> On Thu, Jun 7, 2012 at 11:27 PM, Markus Jelsma <[email protected]> wrote:
>> >>
>> >>> Hello!
>> >>>
>> >>> Sounds very interesting. Anyway, Solr can run embedded in a Java
>> >>> application called EmbeddedSolrServer. You do need to make some changes to
>> >>> the SolrIndexer tools in Nutch.
>> >>>
>> >>> Cheers
>> >>>
>> >>> -----Original message-----
>> >>> > From:Emre Çelikten <[email protected]>
>> >>> > Sent: Thu 07-Jun-2012 22:24
>> >>> > To: [email protected]
>> >>> > Subject: Building Lucene index with Nutch 1.4
>> >>> >
>> >>> > Hello everybody,
>> >>> >
>> >>> > As part of a project, I am working on a FOSS tool that will build language
>> >>> > models using data obtained from the web which will then be used for speech
>> >>> > recognition. I plan to make this tool quite compact by encapsulating as
>> >>> > much as I can in a single Java application and not requiring the user to
>> >>> > install/configure tons of stuff.
>> >>> >
>> >>> > I have managed to set up Nutch and am able to crawl a website inside a Java
>> >>> > application. The next thing I need to do is to search for certain keywords
>> >>> > in the obtained data. I have read that the ability to build Lucene indexes
>> >>> > has been removed from Nutch and we now need to use Solr instead. The way
>> >>> > Solr works (servlets, HTTP) is not really appropriate for a tool that only
>> >>> > needs search functionality that is invisible to the user.
>> >>> >
>> >>> > What would you recommend me to do in this case? Is there absolutely no way
>> >>> > of building Lucene indexes? I could not find anything other than
>> >>> > recommendations to use Solr instead. Should I try to use an older version
>> >>> > of Nutch?
>> >>> >
>> >>> > Thanks in advance,
>> >>> >
>> >>> > Emre
>> >>> >
>> >>>
>> >>
>>
>> --
>> Lewis
>>

--
Lewis
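
For readers following the thread, the signature change discussed above (taking a server object instead of "String solrUrl") can be illustrated with a minimal sketch. This is not the code from NUTCH-1382 and it does not use the real Nutch or SolrJ classes: IndexClient, InMemoryClient and index are hypothetical stand-ins invented for illustration, chosen so the example compiles with only the JDK. In a real application the client role would presumably be played by a SolrServer such as EmbeddedSolrServer, as the thread suggests.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Sketch only: shows an indexing method that receives a client object
 * created by the application, rather than a URL it must connect to.
 * Every type here is a hypothetical stand-in, not a Nutch or SolrJ class.
 */
public class IndexerSketch {

    /** Hypothetical stand-in for the small part of a Solr client the indexer needs. */
    public interface IndexClient {
        void add(String document);
        void commit();
    }

    /** In-process client, playing the role an embedded server might play. */
    public static class InMemoryClient implements IndexClient {
        public final List<String> pending = new ArrayList<String>();
        public final List<String> committed = new ArrayList<String>();

        @Override
        public void add(String document) {
            pending.add(document);
        }

        @Override
        public void commit() {
            // Make everything added so far visible, then clear the buffer.
            committed.addAll(pending);
            pending.clear();
        }
    }

    /**
     * Hypothetical variant of an indexing entry point: the caller passes
     * the client in, so no HTTP endpoint or servlet container is involved.
     * Returns the number of documents sent to the client.
     */
    public static int index(IndexClient client, List<String> documents, boolean noCommit) {
        for (String document : documents) {
            client.add(document);
        }
        if (!noCommit) {
            client.commit();
        }
        return documents.size();
    }

    public static void main(String[] args) {
        InMemoryClient client = new InMemoryClient();
        int sent = index(client, Arrays.asList("first page text", "second page text"), false);
        System.out.println(sent + " sent, " + client.committed.size() + " committed");
    }
}
```

The point of the design is that the method no longer decides how to reach the index; the application constructs whichever client it wants and hands it over, which is what makes an in-process setup possible.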

