Hi Lewis,

Sorry for the late reply. Please find the complete log here: http://pastebin.com/EqeMtsb2
We can see that some of the parse processes did not complete successfully.
These are the commands I used for the crawling and indexing steps.

*[Crawling step]*
bayu@thinkpato:/opt/searchengine/nutch$ ./bin/nutch crawl urls -depth 3 -topN 5

*[Indexing step]*
bayu@thinkpato:/opt/searchengine/nutch$ ./bin/nutch solrindex http://localhost:8080/solr -reindex
SolrIndexerJob: starting
Adding 1 documents
SolrIndexerJob: done.

No matter how many times I repeat the crawl, the indexing step only ever adds 1 document.

Below is the parsechecker output for one file that parsed successfully and one that failed.

*[success]* -- but it's inconsistent, since another .odt file FAILS to be parsed by Nutch (see the hadoop log).

bayu@thinkpato:/opt/searchengine/nutch$ ./bin/nutch parsechecker http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.odt
--------- Url ---------------
http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.odt
--------- Metadata ---------
Page-Count : 1
dc:creator : Bayu Widyasanyata
meta:character-count : 532
Paragraph-Count : 2
nbWord : 69
meta:paragraph-count : 2
Character Count : 532
Last-Save-Date : 2012-12-21T05:37:30
dcterms:modified : 2012-12-21T05:37:30
Object-Count : 0
meta:object-count : 0
Author : Bayu Widyasanyata
nbObject : 0
creator : Bayu Widyasanyata
xmpTPg:NPages : 1
meta:image-count : 0
Table-Count : 0
nbCharacter : 532
Word-Count : 69
meta:table-count : 0
meta:initial-author : Bayu Widyasanyata
Last-Modified : 2012-12-21T05:37:30
Creation-Date : 2012-12-21T05:33:12
generator : OpenOffice.org/3.2$Linux OpenOffice.org_project/320m12$Build-9483
meta:creation-date : 2012-12-21T05:33:12
meta:word-count : 69
Image-Count : 0
nbImg : 0
meta:author : Bayu Widyasanyata
nbTab : 0
nbPage : 1
editing-cycles : 2
Content-Type : application/vnd.oasis.opendocument.text
meta:save-date : 2012-12-21T05:37:30
meta:page-count : 1
Edit-Time : PT00H04M18S
initial-creator : Bayu Widyasanyata
nbPara : 2
modified : 2012-12-21T05:37:30
date : 2012-12-21T05:33:12
dcterms:created : 2012-12-21T05:33:12

*[failed]*

bayu@thinkpato:/opt/searchengine/nutch$ ./bin/nutch parsechecker http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.pdf
--------- Url ---------------
http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.pdf
--------- Metadata ---------
xmp:CreatorTool : Writer
meta:author : Bayu Widyasanyata
xmpTPg:NPages : 1
dc:creator : Bayu Widyasanyata
Content-Type : application/pdf
created : Sun Dec 23 19:23:22 WIT 2012
Author : Bayu Widyasanyata
Creation-Date : 2012-12-23T12:23:22Z
date : 2012-12-23T12:23:22Z
producer : OpenOffice.org 3.2
meta:creation-date : 2012-12-23T12:23:22Z
creator : Bayu Widyasanyata
dcterms:created : 2012-12-23T12:23:22Z

Thanks.

On Fri, Jan 11, 2013 at 11:09 AM, Lewis John Mcgibbney <[email protected]> wrote:

> I can't see any log output. Can you fetch and parse the pdfs with the
> parsechecker tool?
>
> On Thursday, January 10, 2013, Bayu Widyasanyata <[email protected]>
> wrote:
> > For clarity, the log below is about 4 of my 5 PDF docs, which can't be
> > parsed by nutch.
> >
> > On Fri, Jan 11, 2013 at 8:29 AM, Bayu Widyasanyata
> > <[email protected]> wrote:
> >
> >> nutch parsing is still a problem on pdf files.
> >> Only 1 pdf can be parsed successfully.
> >>
> >> 2013-01-11 08:11:23,679 WARN parse.ParseUtil - Unable to successfully
> >> parse content
> >> http://localhost/sapi/nospasi_Akhirat_Lebih_Utama_Daripada_Dunia.pdf of
> >> type application/pdf
> >>
> >> Even though I had explicitly added this to parse-plugins.xml:
> >>
> >> <mimeType name="application/pdf">
> >> <plugin id="parse-tika" />
> >> </mimeType>
> >>
> >> What am I missing?
> >>
> >> On Fri, Jan 11, 2013 at 7:55 AM, Lewis John Mcgibbney <
> >> [email protected]> wrote:
> >>
> >>> No problem at all.
> >>>
> >>> Better safe than sorry.
> >>>
> >>> Lewis
> >>>
> >>> On Thu, Jan 10, 2013 at 4:43 PM, Bayu Widyasanyata
> >>> <[email protected]> wrote:
> >>>
> >>> > Yes, I forgot about that, even though I had already put it in my
> >>> > notes from the previous installation.
> >>> > I'm quite new to nutch and also to Java development :)
> >>> >
> >>> > Thanks!
> >>> >
> >>> > On Fri, Jan 11, 2013 at 7:01 AM, Lewis John Mcgibbney <
> >>> > [email protected]> wrote:
> >>> >
> >>> > > Hi,
> >>> > >
> >>> > > > java.io.IOException: java.lang.ClassNotFoundException:
> >>> > > > com.mysql.jdbc.Driver
> >>> > >
> >>> > > If you look at ivy.xml [0] you will see that the
> >>> > > mysql-connector-java dependency is commented out. Please
> >>> > > uncomment it, then build Nutch 2.x src again.
> >>> > >
> >>> > > This will download the dependency and make it available on your
> >>> > > classpath.
> >>> > >
> >>> > > Thank you
> >>> > >
> >>> > > Lewis
> >>> > >
> >>> > > [0]
> >>> > > http://svn.apache.org/viewvc/nutch/branches/2.x/ivy/ivy.xml?view=markup
> >>> >
> >>> > --
> >>> > wassalam,
> >>> > [bayu]
> >>>
> >>> --
> >>> *Lewis*
> >>
> >> --
> >> wassalam,
> >> [bayu]
> >
> > --
> > wassalam,
> > [bayu]
>
> --
> *Lewis*

--
wassalam,
[bayu]
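
P.S. In case it helps anyone following the thread: below is a minimal Python sketch, not part of Nutch, that lists the URLs reported in "Unable to successfully parse content" warnings like the one quoted above. It assumes each warning sits on a single line of the log, in the format shown in that quoted message. The function name failed_parses and the regular expression are my own and are for illustration only.

```python
import re

# Matches the warning format quoted in this thread (assumed to be one log line):
#   "... WARN parse.ParseUtil - Unable to successfully parse content <url> of type <mime>"
FAILED_PARSE = re.compile(
    r"WARN\s+parse\.ParseUtil\s+-\s+"
    r"Unable to successfully parse content\s+(\S+)\s+of type\s+(\S+)"
)


def failed_parses(log_lines):
    """Return (url, mime_type) pairs for failed-parse warnings.

    Pairs are returned in first-seen order; a URL that fails repeatedly
    (for example across several crawl rounds) is reported only once.
    """
    seen = set()
    failures = []
    for line in log_lines:
        match = FAILED_PARSE.search(line)
        if match and match.group(1) not in seen:
            seen.add(match.group(1))
            failures.append((match.group(1), match.group(2)))
    return failures


# Sample input: the warning from this thread, joined onto one line,
# plus an unrelated line that should be ignored.
sample_log = [
    "2013-01-11 08:11:23,679 WARN parse.ParseUtil - Unable to successfully "
    "parse content "
    "http://localhost/sapi/nospasi_Akhirat_Lebih_Utama_Daripada_Dunia.pdf "
    "of type application/pdf",
    "SolrIndexerJob: starting",
]

for url, mime in failed_parses(sample_log):
    print(f"{mime}\t{url}")
```

To run it against a real log, pass an open file object instead of sample_log, since a file iterates line by line. Counting the reported URLs per MIME type should show quickly whether only application/pdf is affected or .odt files fail as well.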

