Hi, I did it by following the tutorial at [1], but using Nutch 2.0. It uses Apache Gora as the ORM for data persistence.
I will try to follow this procedure to do the same in Nutch 1.2; I will let you know.

[1] http://techvineyard.blogspot.com/2010/12/build-nutch-20.html

2011/1/13 Markus Jelsma <[email protected]>

> Try using the logger, this way you can check hadoop.log for your output.
>
> import:
> import org.apache.commons.logging.Log;
> import org.apache.commons.logging.LogFactory;
>
> declare:
> public static Log LOG = LogFactory.getLog(SolrWriter.class);
>
> use:
> LOG.info("bla bla");
>
> On Thursday 13 January 2011 14:28:21 PEEYUSH CHANDEL wrote:
> > hi markus
> >
> > here is my modified SolarWriter class,please check it and correct me
> > if i am doing something wrong.
> >
> > i tried this code but nothing happens.
> >
> > package org.apache.nutch.indexer.solr;
> >
> > import java.io.IOException;
> > import java.util.ArrayList;
> > import java.util.List;
> > import java.util.Map.Entry;
> > import java.util.Iterator;
> > import java.sql.*;
> >
> > import org.apache.hadoop.mapred.JobConf;
> > import org.apache.nutch.indexer.NutchDocument;
> > import org.apache.nutch.indexer.NutchField;
> > import org.apache.nutch.indexer.NutchIndexWriter;
> > import org.apache.solr.client.solrj.SolrServer;
> > import org.apache.solr.client.solrj.SolrServerException;
> > import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
> > import org.apache.solr.common.SolrInputDocument;
> >
> > public class SolrWriter implements NutchIndexWriter {
> >
> >   private SolrServer solr;
> >   private SolrMappingReader solrMapping;
> >
> >   private final List<SolrInputDocument> inputDocs =
> >     new ArrayList<SolrInputDocument>();
> >
> >   private int commitSize;
> >
> >   public void open(JobConf job, String name) throws IOException {
> >     solr = new CommonsHttpSolrServer(job.get(SolrConstants.SERVER_URL));
> >     commitSize = job.getInt(SolrConstants.COMMIT_SIZE, 1000);
> >     solrMapping = SolrMappingReader.getInstance(job);
> >   }
> >
> >   public void write(NutchDocument doc) throws IOException {
> >     final SolrInputDocument inputDoc = new SolrInputDocument();
> >     for(final Entry<String, NutchField> e : doc) {
> >       for (final Object val : e.getValue().getValues()) {
> >         inputDoc.addField(solrMapping.mapKey(e.getKey()), val,
> >           e.getValue().getWeight());
> >         String sCopy = solrMapping.mapCopyKey(e.getKey());
> >         if (sCopy != e.getKey()) {
> >           inputDoc.addField(sCopy, val, e.getValue().getWeight());
> >         }
> >       }
> >     }
> >     inputDoc.setDocumentBoost(doc.getWeight());
> >     inputDocs.add(inputDoc);
> >
> >     //here is my modified code
> >
> >     SolrInputDocument abc;
> >     Iterator it=inputDocs.iterator();
> >     while(it.hasNext())
> >     {
> >       abc=(SolrInputDocument)it.next();
> >       String test=(abc.toString());
> >
> >       Connection conn = null;
> >       String url = "jdbc:mysql://localhost:3306/";
> >       String dbName = "data";
> >       String driver = "com.mysql.jdbc.Driver";
> >       String userName = "root";
> >       String password = "passwd";
> >       try {
> >         Class.forName(driver).newInstance();
> >         conn =
> > DriverManager.getConnection(url+dbName,userName,password);
> >         System.out.println("Connected to the database");
> >
> >         java.sql.Statement s = conn.createStatement();
> >         int r = s.executeUpdate("INSERT INTO data(data)
> > VALUES('"+test+"')");
> >
> >         System.out.println("Done");
> >         conn.close();
> >         System.out.println("Disconnected from database");
> >
> >       }
> >       catch (Exception e) {
> >         System.out.println(e);
> >         System.exit(0);
> >       }
> >
> >     }
> >
> >     if (inputDocs.size() > commitSize) {
> >       try {
> >         solr.add(inputDocs);
> >
> >       } catch (final SolrServerException e) {
> >         throw makeIOException(e);
> >       }
> >       inputDocs.clear();
> >     }
> >   }
> >
> >   public void close() throws IOException {
> >     try {
> >       if (!inputDocs.isEmpty()) {
> >         solr.add(inputDocs);
> >         inputDocs.clear();
> >       }
> >       // solr.commit();
> >     } catch (final SolrServerException e) {
> >       throw makeIOException(e);
> >     }
> >   }
> >
> >   public static IOException makeIOException(SolrServerException e) {
> >     final IOException ioe = new IOException();
> >     ioe.initCause(e);
> >     return ioe;
> >   }
> >
> > }
> >
> > -Thanks you very much
> >
> > On 1/13/11, Markus Jelsma <[email protected]> wrote:
> > > public void write gets called for each NutchDocument and collects them in
> > > inputDocs. You could, after line 60, call a customer method to read all
> > > fields and create a SQL insert statement out of it.
> > >
> > > On Thursday 13 January 2011 13:55:14 PEEYUSH CHANDEL wrote:
> > >> hi markus,
> > >>
> > >> i try to modify the SolrWriter.java class and place my mysql connecter
> > >> their but nothing happens so can please explain a little more with
> > >> example of code that exactly which part of SolrWriter class is going
> > >> to be replace by mysql connecter.
> > >>
> > >> -Thanks You Very Much
> > >>
> > >> On 1/13/11, Markus Jelsma <[email protected]> wrote:
> > >> > Here's the class you need to look at:
> > >> > http://svn.apache.org/viewvc/nutch/branches/branch-1.2/src/java/org/apache/nutch/indexer/solr/SolrWriter.java?view=markup
> > >> >
> > >> >> Modifying the Solr index writer to use a MySQL connector is surely
> > >> >> the easiest short cut.
> > >> >>
> > >> >> > hi O.Klein
> > >> >> >
> > >> >> > thanks for the answer but i am using nutch 1.2 so any solution for
> > >> >> > this version.
> > >> >> >
> > >> >> > On 1/13/11, O. Klein <[email protected]> wrote:
> > >> >> > > Nutch 2.0 supports storage of data in MySQL DB.
> > >> >> > >
> > >> >> > > But that version is not for production yet.
> > >> >> > >
> > >> >> > > Check
> > >> >> > > http://techvineyard.blogspot.com/2010/12/build-nutch-20.html on
> > >> >> > > how to get it running.
> > >> >> > > --
> > >> >> > > View this message in context:
> > >> >> > > http://lucene.472066.n3.nabble.com/Connecting-MySQL-to-Apache-Nutch-tp2243983p2244263.html
> > >> >> > > Sent from the Nutch - User mailing list archive at Nabble.com.
> > >
> > > --
> > > Markus Jelsma - CTO - Openindex
> > > http://www.linkedin.com/in/markus17
> > > 050-8536620 / 06-50258350
>
> --
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350
>

--
Iker Huerga
http://www.linkatu.net
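
P.S. For the Nutch 1.2 route, the suggestion quoted above (read the fields of each document and create a SQL insert statement out of them) could be sketched roughly as follows. This is only a minimal sketch, not the actual Nutch code: the class InsertSqlSketch, its two methods, and the example column names "url" and "title" are all hypothetical, and it assumes the target table has one column per field. It uses a PreparedStatement with placeholders instead of concatenating the value into the SQL string as the quoted code does, because a value containing a single quote would break the concatenated statement.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of Nutch or Solr. */
public class InsertSqlSketch {

    /** Builds "INSERT INTO table (a, b) VALUES (?, ?)" for the given column names. */
    static String buildInsertSql(String table, List<String> columns) {
        if (columns.isEmpty()) {
            throw new IllegalArgumentException("at least one column is required");
        }
        StringBuilder names = new StringBuilder();
        StringBuilder marks = new StringBuilder();
        for (String column : columns) {
            if (names.length() > 0) {
                names.append(", ");
                marks.append(", ");
            }
            names.append(column);
            marks.append("?");
        }
        return "INSERT INTO " + table + " (" + names + ") VALUES (" + marks + ")";
    }

    /** Inserts one row; values are bound as parameters, so quotes in them are harmless. */
    static void insert(Connection conn, String table, Map<String, Object> fields)
            throws SQLException {
        List<String> columns = new ArrayList<>(fields.keySet());
        try (PreparedStatement ps = conn.prepareStatement(buildInsertSql(table, columns))) {
            int index = 1; // JDBC parameter indexes start at 1
            for (String column : columns) {
                ps.setObject(index++, fields.get(column));
            }
            ps.executeUpdate();
        }
    }

    public static void main(String[] args) {
        // Assumed example fields; a real writer would fill this from the document.
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("url", "http://example.com/");
        fields.put("title", "O'Reilly's page");
        System.out.println(buildInsertSql("data", new ArrayList<>(fields.keySet())));
    }
}
```

In a writer like the one quoted above, insert() would presumably be called once per new document rather than in a loop over all of inputDocs, since that loop re-inserts every document collected so far on each call to write(). Opening the connection once in open() and closing it in close() would also avoid reconnecting for every row.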

