NullPointerException in ZipTextExtractor if no MIME type for zipped file
------------------------------------------------------------------------
                 Key: NUTCH-472
                 URL: https://issues.apache.org/jira/browse/NUTCH-472
             Project: Nutch
          Issue Type: Bug
          Components: indexer
    Affects Versions: 0.9.0
         Environment: Any
            Reporter: Antony Bowesman

extractText throws a NullPointerException at

    String contentType = MIME.getMimeType(fname).getName();

when a file in the zip has no configured MIME type: getMimeType returns null, and the chained getName() call then aborts parsing of the whole zip. The code should check the returned MimeType for null before using it:

public String extractText(InputStream input, String url, List outLinksList) throws IOException {
    String resultText = "";
    byte temp;

    ZipInputStream zin = new ZipInputStream(input);
    ZipEntry entry;

    while ((entry = zin.getNextEntry()) != null) {

        if (!entry.isDirectory()) {
            // Read the entry's bytes into memory
            int size = (int) entry.getSize();
            byte[] b = new byte[size];
            for (int x = 0; x < size; x++) {
                int err = zin.read();
                if (err != -1) {
                    b[x] = (byte) err;
                }
            }
            String newurl = url + "/";
            String fname = entry.getName();
            newurl += fname;
            URL aURL = new URL(newurl);
            String base = aURL.toString();
            int i = fname.lastIndexOf('.');
            if (i != -1) {
                // Trying to resolve the Mime-Type
                MimeType mt = MIME.getMimeType(fname);
                if (mt != null) {
                    String contentType = mt.getName();
                    try {
                        Metadata metadata = new Metadata();
                        metadata.set(Response.CONTENT_LENGTH, Long.toString(entry.getSize()));
                        metadata.set(Response.CONTENT_TYPE, contentType);
                        Content content = new Content(newurl, base, b, contentType, metadata, this.conf);
                        Parse parse = new ParseUtil(this.conf).parse(content);
                        ParseData theParseData = parse.getData();
                        Outlink[] theOutlinks = theParseData.getOutlinks();

                        for (int count = 0; count < theOutlinks.length; count++) {
                            outLinksList.add(new Outlink(theOutlinks[count].getToUrl(), theOutlinks[count].getAnchor(), this.conf));
                        }

                        resultText += entry.getName() + " " + parse.getText() + " ";
                    } catch (ParseException e) {
                        if (LOG.isInfoEnabled()) {
                            LOG.info("fetch okay, but can't parse " + fname + ", reason: " + e.getMessage());
                        }
                    }
                } else {
                    // No MIME type configured for this file: record only its name
                    resultText += entry.getName();
                }
            }
        }
    }

    return resultText;
}

-- 
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers