Hi,

I tried *bin/nutch org.apache.nutch.parse.ParserChecker http://www.fcgroningen.nl/uploads/media/hollabovenplaat_01.jpg* using the latest trunk from SVN and I am getting:

---------
> Version: 5
> Status: success(1,0)
> Title:
> Outlinks: 0
> Content Metadata: ETag="15dab-8280a1c0" Date=Mon, 17 May 2010 13:55:16 GMT
> Content-Length=89515 Expires=Mon, 26 Jul 2010 13:55:16 GMT
> Last-Modified=Mon, 26 Jan 2009 13:13:51 GMT Content-Type=image/jpeg
> Connection=close Accept-Ranges=bytes Server=Apache/2.2.3 (Debian)
> PHP/5.2.0-8+etch16 Cache-Control=max-age=6048000
> Parse Metadata: Software=Adobe Photoshop CS2 Windows Number of Components=3
> Orientation=Top, left side (Horizontal / normal) Color Space=sRGB Image
> Height=156 pixels Data Precision=8 bits Exif Image Width=992 pixels
> Component 1=Y component: Quantization table 0, Sampling factors 1 horiz/1
> vert Component 2=Cb component: Quantization table 1, Sampling factors 1
> horiz/1 vert Compression=JPEG (old-style) Component 3=Cr component:
> Quantization table 1, Sampling factors 1 horiz/1 vert Date/Time=2009:01:26
> 14:05:22 X Resolution=72 dots per inch Thumbnail Offset=302 bytes Exif Image
> Height=156 pixels Thumbnail Length=3259 bytes Resolution Unit=Inch Image
> Width=992 pixels Thumbnail Data=[3259 bytes of thumbnail data] Y
> Resolution=72 dots per inch
>

Could you try the command above?

J.

--
DigitalPebble Ltd
http://www.digitalpebble.com

On 17 May 2010 14:26, Markus Jelsma <[email protected]> wrote:

> Hi,
>
>
> It seems it still doesn't work after all. I updated all config files and the
> JPEG (and more new as it looks like). But the log still tells me it cannot
> find a suitable parser.
>
> ---------------
> 2010-05-17 15:20:06,636 WARN parse.ParseUtil - No suitable parser found when trying to parse content http://www.fcgroningen.nl/uploads/media/hollabovenplaat_01.jpg of type image/jpeg
> 2010-05-17 15:20:06,637 WARN parse.Parser - Error parsing: http://www.fcgroningen.nl/uploads/media/hollabovenplaat_01.jpg: org.apache.nutch.parse.ParseException: parser not found for contentType=image/jpeg url=http://www.fcgroningen.nl/uploads/media/hollabovenplaat_01.jpg
>     at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:74)
>     at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:85)
>     at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:41)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> ---------------
>
>
> Cheers,
>
> On Monday 17 May 2010 14:37:54 Markus Jelsma wrote:
> > Hi,
> >
> >
> > I've got a copy of the nutch-2010-05-11_04-34-41 nightly build because I
> > need Tika to parse JPEG images and that would be in 1.1 as I read
> > somewhere [1].
> >
> > ---------------
> > 2010-05-17 14:36:13,074 WARN parse.ParseUtil - No suitable parser found when trying to parse content http://www.fcgroningen.nl/uploads/media/hollabovenplaat_01.jpg of type image/jpeg
> > 2010-05-17 14:36:13,075 WARN parse.Parser - Error parsing: http://www.fcgroningen.nl/uploads/media/hollabovenplaat_01.jpg: org.apache.nutch.parse.ParseException: parser not found for contentType=image/jpeg url=http://www.fcgroningen.nl/uploads/media/hollabovenplaat_01.jpg
> > ---------------
> >
> >
> > [1]: http://lucene.472066.n3.nabble.com/Adding-jpeg-parser-to-nutch-td710135.html
> >
> > Cheers,
> >
> > Markus Jelsma - Technisch Architect - Buyways BV
> > http://www.linkedin.com/in/markus17
> > 050-8536620 / 06-50258350
> >
>
> Markus Jelsma - Technisch Architect - Buyways BV
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350

