Re: Connecting MySQL to Apache Nutch

PEEYUSH CHANDEL Thu, 13 Jan 2011 09:46:30 -0800

Hi Markus,

I tried the logger as well, but I still have the same problem.

Sorry if I am wrong, but I think this problem can be solved by changing the
LuceneWriter.java class, because the default indexer in Nutch 1.2 is Lucene.

On 1/13/11, Markus Jelsma <[email protected]> wrote:
> Try using the logger; that way you can check hadoop.log for your output.
>
> import:
> import org.apache.commons.logging.Log;
> import org.apache.commons.logging.LogFactory;
>
> declare:
> public static Log LOG = LogFactory.getLog(SolrWriter.class);
>
> use:
> LOG.info("bla bla");
>
> On Thursday 13 January 2011 14:28:21 PEEYUSH CHANDEL wrote:
>> Hi Markus,
>>
>> Here is my modified SolrWriter class. Please check it and correct me
>> if I am doing something wrong.
>>
>> I tried this code, but nothing happens.
>>
>> package org.apache.nutch.indexer.solr;
>>
>> import java.io.IOException;
>> import java.util.ArrayList;
>> import java.util.List;
>> import java.util.Map.Entry;
>> import java.util.Iterator;
>> import java.sql.*;
>>
>> import org.apache.hadoop.mapred.JobConf;
>> import org.apache.nutch.indexer.NutchDocument;
>> import org.apache.nutch.indexer.NutchField;
>> import org.apache.nutch.indexer.NutchIndexWriter;
>> import org.apache.solr.client.solrj.SolrServer;
>> import org.apache.solr.client.solrj.SolrServerException;
>> import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
>> import org.apache.solr.common.SolrInputDocument;
>>
>> public class SolrWriter implements NutchIndexWriter {
>>
>>   private SolrServer solr;
>>   private SolrMappingReader solrMapping;
>>
>>   private final List<SolrInputDocument> inputDocs =
>>     new ArrayList<SolrInputDocument>();
>>
>>   private int commitSize;
>>
>>   public void open(JobConf job, String name) throws IOException {
>>     solr = new CommonsHttpSolrServer(job.get(SolrConstants.SERVER_URL));
>>     commitSize = job.getInt(SolrConstants.COMMIT_SIZE, 1000);
>>     solrMapping = SolrMappingReader.getInstance(job);
>>   }
>>
>>   public void write(NutchDocument doc) throws IOException {
>>     final SolrInputDocument inputDoc = new SolrInputDocument();
>>     for(final Entry<String, NutchField> e : doc) {
>>       for (final Object val : e.getValue().getValues()) {
>>         inputDoc.addField(solrMapping.mapKey(e.getKey()), val,
>>             e.getValue().getWeight());
>>         String sCopy = solrMapping.mapCopyKey(e.getKey());
>>         if (sCopy != e.getKey()) {
>>              inputDoc.addField(sCopy, val, e.getValue().getWeight());
>>         }
>>       }
>>     }
>>     inputDoc.setDocumentBoost(doc.getWeight());
>>     inputDocs.add(inputDoc);
>>
>> //here is my modified code
>>
>>     SolrInputDocument abc;
>>     Iterator it = inputDocs.iterator();
>>     while (it.hasNext()) {
>>       abc = (SolrInputDocument) it.next();
>>       String test = abc.toString();
>>
>>       Connection conn = null;
>>       String url = "jdbc:mysql://localhost:3306/";
>>       String dbName = "data";
>>       String driver = "com.mysql.jdbc.Driver";
>>       String userName = "root";
>>       String password = "passwd";
>>       try {
>>         Class.forName(driver).newInstance();
>>         conn = DriverManager.getConnection(url + dbName, userName, password);
>>         System.out.println("Connected to the database");
>>
>>         java.sql.Statement s = conn.createStatement();
>>         int r = s.executeUpdate("INSERT INTO data(data) VALUES('" + test + "')");
>>
>>         System.out.println("Done");
>>         conn.close();
>>         System.out.println("Disconnected from database");
>>       } catch (Exception e) {
>>         System.out.println(e);
>>         System.exit(0);
>>       }
>>     }
>>
>>     if (inputDocs.size() > commitSize) {
>>       try {
>>         solr.add(inputDocs);
>>
>>       } catch (final SolrServerException e) {
>>         throw makeIOException(e);
>>       }
>>       inputDocs.clear();
>>     }
>>   }
>>
>>   public void close() throws IOException {
>>     try {
>>       if (!inputDocs.isEmpty()) {
>>         solr.add(inputDocs);
>>         inputDocs.clear();
>>       }
>>       // solr.commit();
>>     } catch (final SolrServerException e) {
>>       throw makeIOException(e);
>>     }
>>   }
>>
>>   public static IOException makeIOException(SolrServerException e) {
>>     final IOException ioe = new IOException();
>>     ioe.initCause(e);
>>     return ioe;
>>   }
>>
>> }
>>
>> Thank you very much
>>
>> On 1/13/11, Markus Jelsma <[email protected]> wrote:
>> > public void write gets called for each NutchDocument and collects them
>> > in inputDocs. You could, after line 60, call a custom method to read
>> > all fields and create a SQL insert statement out of it.
>> >
>> > On Thursday 13 January 2011 13:55:14 PEEYUSH CHANDEL wrote:
>> >> Hi Markus,
>> >>
>> >> I tried to modify the SolrWriter.java class and place my MySQL
>> >> connector there, but nothing happens. Can you please explain a little
>> >> more, with example code, exactly which part of the SolrWriter class is
>> >> to be replaced by the MySQL connector?
>> >>
>> >> Thank you very much
>> >>
>> >> On 1/13/11, Markus Jelsma <[email protected]> wrote:
>> >> > Here's the class you need to look at:
>> >> > http://svn.apache.org/viewvc/nutch/branches/branch-1.2/src/java/org/apache/nutch/indexer/solr/SolrWriter.java?view=markup
>> >> >
>> >> >> Modifying the Solr index writer to use a MySQL connector is surely
>> >> >> the easiest short cut.
>> >> >>
>> >> >> > Hi O. Klein,
>> >> >> >
>> >> >> > Thanks for the answer, but I am using Nutch 1.2. Is there any
>> >> >> > solution for this version?
>> >> >> >
>> >> >> > On 1/13/11, O. Klein <[email protected]> wrote:
>> >> >> > > Nutch 2.0 supports storage of data in MySQL DB.
>> >> >> > >
>> >> >> > > But that version is not for production yet.
>> >> >> > >
>> >> >> > > Check
>> >> >> > > http://techvineyard.blogspot.com/2010/12/build-nutch-20.html on
>> >> >> > > how to get it running.
>> >> >> > > --
>> >> >> > > View this message in context:
>> >> >> > > http://lucene.472066.n3.nabble.com/Connecting-MySQL-to-Apache-Nutch-tp2243983p2244263.html
>> >> >> > > Sent from the Nutch - User mailing list archive at Nabble.com.
>> >
>> > --
>> > Markus Jelsma - CTO - Openindex
>> > http://www.linkedin.com/in/markus17
>> > 050-8536620 / 06-50258350
>
> --
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350
>
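
For reference, the custom method Markus suggests in the quoted thread (read the
fields of each document and build a SQL insert from them) could look roughly
like the sketch below. It is only an illustration, not code from Nutch: the
class SqlInsertSketch, its methods buildInsert and insertRow, and the table and
column names are all hypothetical. It assumes the field values have already
been copied into a Map of strings, and that the table and column names are
trusted identifiers. It uses a PreparedStatement so that values are bound by
JDBC instead of being concatenated into the SQL string.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Nutch: turns one document's fields into a
// parameterized INSERT and executes it on an existing connection.
class SqlInsertSketch {

  // Builds a statement such as "INSERT INTO pages (url, title) VALUES (?, ?)".
  // Assumption: table and column names are trusted, fixed identifiers.
  static String buildInsert(String table, List<String> columns) {
    StringBuilder sql = new StringBuilder("INSERT INTO ").append(table).append(" (");
    StringBuilder marks = new StringBuilder();
    for (int i = 0; i < columns.size(); i++) {
      if (i > 0) {
        sql.append(", ");
        marks.append(", ");
      }
      sql.append(columns.get(i));
      marks.append("?");
    }
    return sql.append(") VALUES (").append(marks).append(")").toString();
  }

  // Inserts one row. The connection is passed in so it can be opened once
  // and reused, rather than opened and closed for every document.
  static void insertRow(Connection conn, String table, Map<String, String> fields)
      throws SQLException {
    List<String> columns = new ArrayList<String>(fields.keySet());
    PreparedStatement ps = conn.prepareStatement(buildInsert(table, columns));
    try {
      for (int i = 0; i < columns.size(); i++) {
        // JDBC parameter indexes start at 1.
        ps.setString(i + 1, fields.get(columns.get(i)));
      }
      ps.executeUpdate();
    } finally {
      ps.close();
    }
  }
}
```

One way to use it, under the same assumptions, would be to open the connection
once in open(), call insertRow for the current document only inside write(),
and close the connection in close(). Binding values with setString also avoids
the broken SQL that a quote character in the document text would cause when
the value is concatenated into the statement.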
