Hi:
Please attach the patch to a JIRA issue; my mail account gives me
trouble with attachments.
Kind regards
Zaheed
On 12/14/06, Doğacan Güney <[EMAIL PROTECTED]> wrote:
Doğacan Güney wrote:
> Hi,
>
> After hadoop-0.9.1, parsing and indexing don't seem to work.
> If you parse while fetching then it is fine, but if you run parse as a
> different job, it creates an essentially empty parse_data
> directory (which has index files, but doesn't have data files). I am
> looking into this, but so far I couldn't find the source of the error.
>
> Also, indexing fails at Indexer.OutputFormat.getRecordWriter. The
> parameter fs seems to be an instance of PhasedFileSystem which throws
> exceptions on delete and {start,complete}LocalOutput. The following
> patch should fix it, but may not be the best way of doing this.
>
> Index: src/java/org/apache/nutch/indexer/Indexer.java
> ===================================================================
> --- src/java/org/apache/nutch/indexer/Indexer.java (revision 487240)
> +++ src/java/org/apache/nutch/indexer/Indexer.java (working copy)
> @@ -94,11 +94,15 @@
> final Path temp =
> job.getLocalPath("index/_"+Integer.toString(new Random().nextInt()));
>
> - fs.delete(perm); // delete old, if any
> -
> + final FileSystem dfs = FileSystem.get(job);
> +
> + if (dfs.exists(perm)) {
> + dfs.delete(perm); // delete old, if any
> + }
> +
> final AnalyzerFactory factory = new AnalyzerFactory(job);
> final IndexWriter writer = // build locally first
> - new IndexWriter(fs.startLocalOutput(perm, temp).toString(),
> + new IndexWriter(dfs.startLocalOutput(perm, temp).toString(),
> new NutchDocumentAnalyzer(job), true);
>
> writer.setMergeFactor(job.getInt("indexer.mergeFactor", 10));
> @@ -146,7 +150,7 @@
> // optimize & close index
> writer.optimize();
> writer.close();
> - fs.completeLocalOutput(perm, temp); // copy to dfs
> + dfs.completeLocalOutput(perm, temp);
> fs.createNewFile(new Path(perm, DONE_NAME));
> } finally {
> closed = true;
>
Sorry about the patch, it got garbled somehow. I am attaching it; I hope
the mailing list doesn't drop attachments.
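
In case the attachment does not come through, here is a small standalone
sketch of the idea behind the patch. It does not use the Hadoop classes:
SimpleFs, BackingFs, RestrictedFs and IndexOutputSketch are hypothetical
stand-ins made up for illustration, and the choice of
UnsupportedOperationException is an assumption (the report above only says
that the wrapper "throws exceptions"). The point is just that the restricted
file system handed to the record writer is bypassed, and the old output is
deleted through the underlying file system only if it exists.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-ins for illustration only; these are NOT Hadoop classes.
interface SimpleFs {
    boolean exists(String path);
    void delete(String path);
}

/** Plays the role of the underlying file system (what FileSystem.get(job) returns in the patch). */
class BackingFs implements SimpleFs {
    private final Set<String> paths = new HashSet<>();

    void create(String path) { paths.add(path); }

    public boolean exists(String path) { return paths.contains(path); }

    public void delete(String path) { paths.remove(path); }
}

/** Plays the role of the restricted wrapper passed to getRecordWriter. */
class RestrictedFs implements SimpleFs {
    private final SimpleFs inner;

    RestrictedFs(SimpleFs inner) { this.inner = inner; }

    public boolean exists(String path) { return inner.exists(path); }

    // Assumption: the wrapper rejects deletes with an unchecked exception.
    public void delete(String path) {
        throw new UnsupportedOperationException("delete not supported by wrapper");
    }
}

public class IndexOutputSketch {
    /** Mirrors the patched logic: delete old output via the backing fs, and only if it is there. */
    static boolean deleteOldOutput(SimpleFs backing, String perm) {
        if (backing.exists(perm)) {
            backing.delete(perm);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        BackingFs dfs = new BackingFs();
        dfs.create("index/part-00000");
        SimpleFs fs = new RestrictedFs(dfs);

        try {
            fs.delete("index/part-00000"); // what the unpatched code effectively did
        } catch (UnsupportedOperationException e) {
            System.out.println("wrapper refused delete: " + e.getMessage());
        }

        System.out.println("deleted via backing fs: "
                + deleteOldOutput(dfs, "index/part-00000"));
    }
}
```

As said above, going around the wrapper like this may not be the best way of
doing it, since it skips whatever the wrapper is meant to protect.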