Hi David,

The latest Nutch release candidate (1.1, http://svn.apache.org/repos/asf/lucene/nutch/tags/1.1) includes the tika-parser plugin, which provides a JpegParser (see here: http://bit.ly/b0zRX8) that will hopefully suit your needs.
Let me know what you think.

Cheers,
Chris

On 4/10/10 6:56 AM, "Gombkötő Dávid" <madav...@gmail.com> wrote:

Hello.

I'm working on a school task, which is to modify Nutch so that it can identify and download JPEGs, create a thumbnail, and index the URLs of these JPEGs with the other crawl results so that the web interface can show images as well.

At the start I found that ParserNotFound.java can do the trick for me. I modified the constructor so that it matches the end of the URL against a pattern, and if the URL ends in jpeg it creates a file named after the md5sum of the URL and writes the URL into it, in a directory on my filesystem. Well, this is ugly. I wanted to add the working directory to ParserNotFound.java, but I couldn't.

To move forward with my work, I first need to know how to make my own JPEG parser. After that I would like to index my results somehow :)

So, my question: how can I add my JPEG parser? Or, how can I add a new parser to the Nutch system?

Thanks for your answers.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.mattm...@jpl.nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
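
For reference, the workaround described in the quoted message (match URLs that end in jpeg, then write each URL to a file named after its MD5 sum) could look roughly like the plain-Java sketch below. It uses only the JDK and no Nutch or Tika classes. The class name JpegUrlSketch and the methods looksLikeJpeg, md5Hex, and recordUrl are hypothetical names chosen for illustration, not part of any Nutch API. Also accepting ".jpg" and ignoring query strings are assumptions that go beyond what the message states.

```java
import java.io.IOException;
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

// Hypothetical helper, not part of Nutch: illustrates the URL-matching
// and MD5-naming idea from the quoted message using only the JDK.
final class JpegUrlSketch {

    private JpegUrlSketch() {
    }

    // Returns true if the URL, ignoring any query string or fragment,
    // ends in ".jpg" or ".jpeg" (case-insensitive).
    static boolean looksLikeJpeg(String url) {
        int end = url.length();
        int query = url.indexOf('?');
        if (query >= 0) {
            end = Math.min(end, query);
        }
        int fragment = url.indexOf('#');
        if (fragment >= 0) {
            end = Math.min(end, fragment);
        }
        String path = url.substring(0, end).toLowerCase(Locale.ROOT);
        return path.endsWith(".jpg") || path.endsWith(".jpeg");
    }

    // Returns the MD5 digest of the URL as 32 lowercase hex characters.
    static String md5Hex(String url) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(url.getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Writes the URL into a file named after its MD5 sum inside dir,
    // creating the directory if needed, and returns the file's path.
    static Path recordUrl(Path dir, String url) throws IOException {
        Files.createDirectories(dir);
        Path file = dir.resolve(md5Hex(url));
        Files.write(file, url.getBytes(StandardCharsets.UTF_8));
        return file;
    }
}
```

A caller would check looksLikeJpeg(url) first and, if it returns true, call recordUrl with a directory of its choice. Passing the directory as a parameter avoids hard-coding a filesystem path, which is the part the quoted message calls ugly.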