Re: Connecting MySQL to Apache Nutch

Iker Huerga Thu, 13 Jan 2011 06:01:48 -0800

Hi,

I did it following the tutorial at [1], but using Nutch 2.0. It uses Apache
Gora as the ORM for data persistence.


I will try to follow this procedure to do the same in Nutch 1.2; I will let
you know.

[1]  http://techvineyard.blogspot.com/2010/12/build-nutch-20.html
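For the Nutch 1.2 route discussed in the quoted thread (reading a document's
fields inside SolrWriter.write and turning them into a SQL insert), here is a
minimal sketch using only java.sql. The class name MySqlDocWriter, its method
names, and the table and column names are hypothetical and not part of Nutch;
the Connection is assumed to be obtained with DriverManager as in the quoted
code. Values are bound as parameters instead of being concatenated into the
SQL string, so quotes in the crawled text cannot break the statement.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Nutch.
class MySqlDocWriter {

  // Builds e.g. "INSERT INTO data (url, title) VALUES (?, ?)".
  // Table and column names are inserted as-is, so they must come from
  // trusted configuration, never from crawled content.
  static String buildInsertSql(String table, List<String> columns) {
    StringBuilder cols = new StringBuilder();
    StringBuilder marks = new StringBuilder();
    for (String column : columns) {
      if (cols.length() > 0) {
        cols.append(", ");
        marks.append(", ");
      }
      cols.append(column);
      marks.append("?");
    }
    return "INSERT INTO " + table + " (" + cols + ") VALUES (" + marks + ")";
  }

  // Inserts one row. Each field value is bound to a "?" placeholder in the
  // same order as the column list built from the map's keys.
  static void insert(Connection conn, String table, Map<String, String> fields)
      throws SQLException {
    List<String> columns = new ArrayList<String>(fields.keySet());
    try (PreparedStatement ps =
        conn.prepareStatement(buildInsertSql(table, columns))) {
      for (int i = 0; i < columns.size(); i++) {
        ps.setString(i + 1, fields.get(columns.get(i))); // JDBC indexes from 1
      }
      ps.executeUpdate();
    }
  }
}
```

One possible arrangement, offered as an assumption rather than a tested Nutch
patch: open the Connection once in open(), call insert(...) once per document
from write(), and close the Connection in close(), rather than reconnecting
and looping over the whole inputDocs buffer on every write() call.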

2011/1/13 Markus Jelsma <[email protected]>

> Try using the logger; this way you can check hadoop.log for your output.
>
> import:
> import org.apache.commons.logging.Log;
> import org.apache.commons.logging.LogFactory;
>
> declare:
> public static Log LOG = LogFactory.getLog(SolrWriter.class);
>
> use:
> LOG.info("bla bla");
>
>
>
>
>
> On Thursday 13 January 2011 14:28:21 PEEYUSH CHANDEL wrote:
> > Hi Markus,
> >
> > Here is my modified SolrWriter class. Please check it and correct me
> > if I am doing something wrong.
> >
> > I tried this code, but nothing happens.
> >
> > package org.apache.nutch.indexer.solr;
> >
> > import java.io.IOException;
> > import java.util.ArrayList;
> > import java.util.List;
> > import java.util.Map.Entry;
> > import java.util.Iterator;
> > import java.sql.*;
> >
> > import org.apache.hadoop.mapred.JobConf;
> > import org.apache.nutch.indexer.NutchDocument;
> > import org.apache.nutch.indexer.NutchField;
> > import org.apache.nutch.indexer.NutchIndexWriter;
> > import org.apache.solr.client.solrj.SolrServer;
> > import org.apache.solr.client.solrj.SolrServerException;
> > import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
> > import org.apache.solr.common.SolrInputDocument;
> >
> > public class SolrWriter implements NutchIndexWriter {
> >
> >   private SolrServer solr;
> >   private SolrMappingReader solrMapping;
> >
> >   private final List<SolrInputDocument> inputDocs =
> >     new ArrayList<SolrInputDocument>();
> >
> >   private int commitSize;
> >
> >   public void open(JobConf job, String name) throws IOException {
> >     solr = new CommonsHttpSolrServer(job.get(SolrConstants.SERVER_URL));
> >     commitSize = job.getInt(SolrConstants.COMMIT_SIZE, 1000);
> >     solrMapping = SolrMappingReader.getInstance(job);
> >   }
> >
> >   public void write(NutchDocument doc) throws IOException {
> >     final SolrInputDocument inputDoc = new SolrInputDocument();
> >     for(final Entry<String, NutchField> e : doc) {
> >       for (final Object val : e.getValue().getValues()) {
> >         inputDoc.addField(solrMapping.mapKey(e.getKey()), val,
> >             e.getValue().getWeight());
> >         String sCopy = solrMapping.mapCopyKey(e.getKey());
> >         if (sCopy != e.getKey()) {
> >               inputDoc.addField(sCopy, val, e.getValue().getWeight());
> >         }
> >       }
> >     }
> >     inputDoc.setDocumentBoost(doc.getWeight());
> >     inputDocs.add(inputDoc);
> >
> > //here is my modified code
> >
> >     SolrInputDocument abc;
> >     Iterator it=inputDocs.iterator();
> >     while(it.hasNext())
> >     {
> >       abc=(SolrInputDocument)it.next();
> >       String test=(abc.toString());
> >
> >         Connection conn = null;
> >         String url = "jdbc:mysql://localhost:3306/";
> >         String dbName = "data";
> >         String driver = "com.mysql.jdbc.Driver";
> >         String userName = "root";
> >         String password = "passwd";
> >         try {
> >             Class.forName(driver).newInstance();
> >             conn = DriverManager.getConnection(url+dbName,userName,password);
> >             System.out.println("Connected to the database");
> >
> >             java.sql.Statement s = conn.createStatement();
> >             int r = s.executeUpdate("INSERT INTO data(data) VALUES('"+test+"')");
> >
> >             System.out.println("Done");
> >             conn.close();
> >             System.out.println("Disconnected from database");
> >
> >         }
> >         catch (Exception e) {
> >             System.out.println(e);
> >             System.exit(0);
> >         }
> >
> >     }
> >
> >     if (inputDocs.size() > commitSize) {
> >       try {
> >         solr.add(inputDocs);
> >
> >       } catch (final SolrServerException e) {
> >         throw makeIOException(e);
> >       }
> >       inputDocs.clear();
> >     }
> >   }
> >
> >   public void close() throws IOException {
> >     try {
> >       if (!inputDocs.isEmpty()) {
> >         solr.add(inputDocs);
> >         inputDocs.clear();
> >       }
> >       // solr.commit();
> >     } catch (final SolrServerException e) {
> >       throw makeIOException(e);
> >     }
> >   }
> >
> >   public static IOException makeIOException(SolrServerException e) {
> >     final IOException ioe = new IOException();
> >     ioe.initCause(e);
> >     return ioe;
> >   }
> >
> > }
> >
> > -Thank you very much
> >
> > On 1/13/11, Markus Jelsma <[email protected]> wrote:
> > > public void write gets called for each NutchDocument and collects them in
> > > inputDocs. You could, after line 60, call a custom method to read all
> > > fields and create a SQL insert statement from them.
> > >
> > > On Thursday 13 January 2011 13:55:14 PEEYUSH CHANDEL wrote:
> > >> Hi Markus,
> > >>
> > >> I tried to modify the SolrWriter.java class and place my MySQL connector
> > >> there, but nothing happens. Could you please explain a little more, with
> > >> example code, exactly which part of the SolrWriter class should be
> > >> replaced by the MySQL connector?
> > >>
> > >> -Thank you very much
> > >>
> > >> On 1/13/11, Markus Jelsma <[email protected]> wrote:
> > >> > Here's the class you need to look at:
> > >> >
> > >> > http://svn.apache.org/viewvc/nutch/branches/branch-1.2/src/java/org/apache/nutch/indexer/solr/SolrWriter.java?view=markup
> > >> >
> > >> >> Modifying the Solr index writer to use a MySQL connector is surely
> > >> >> the easiest short cut.
> > >> >>
> > >> >> > Hi O. Klein,
> > >> >> >
> > >> >> > Thanks for the answer, but I am using Nutch 1.2. Is there any solution
> > >> >> > for this version?
> > >> >> >
> > >> >> > On 1/13/11, O. Klein <[email protected]> wrote:
> > >> >> > > Nutch 2.0 supports storage of data in MySQL DB.
> > >> >> > >
> > >> >> > > But that version is not for production yet.
> > >> >> > >
> > >> >> > > Check
> > >> >> > > http://techvineyard.blogspot.com/2010/12/build-nutch-20.html on
> > >> >> > > how to get it running.
> > >> >> > > --
> > >> >> > > View this message in context:
> > >> >> > > http://lucene.472066.n3.nabble.com/Connecting-MySQL-to-Apache-Nutch-tp2243983p2244263.html
> > >> >> > > Sent from the Nutch - User mailing list archive at Nabble.com.
> > >
> > > --
> > > Markus Jelsma - CTO - Openindex
> > > http://www.linkedin.com/in/markus17
> > > 050-8536620 / 06-50258350
>
> --
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350
>



-- 
Iker Huerga
http://www.linkatu.net
