For further reference, below is the Hadoop task log for the
mergesegs job.
You'll see that the merge reads every part of the segment, parse_data
included: there is one map task per segment subdirectory.
Completed Tasks

Task                             Complete  Start Time            Finish Time                  Errors  Counters
task_201201111048_0031_m_000000  100.00%   11-Jan-2012 11:16:22  11-Jan-2012 11:16:25 (3sec)          9
  Status: file:/opt/nutch_1_4/data/crawl/segments/20120111111422/crawl_fetch/part-00000/data:0+259
task_201201111048_0031_m_000001  100.00%   11-Jan-2012 11:16:22  11-Jan-2012 11:16:25 (3sec)          9
  Status: file:/opt/nutch_1_4/data/crawl/segments/20120111111422/crawl_generate/part-00000:0+234
task_201201111048_0031_m_000002  100.00%   11-Jan-2012 11:16:25  11-Jan-2012 11:16:28 (3sec)          9
  Status: file:/opt/nutch_1_4/data/crawl/segments/20120111111422/content/part-00000/data:0+129
task_201201111048_0031_m_000003  100.00%   11-Jan-2012 11:16:25  11-Jan-2012 11:16:28 (3sec)          9
  Status: file:/opt/nutch_1_4/data/crawl/segments/20120111111422/crawl_parse/part-00000:0+129
task_201201111048_0031_m_000004  100.00%   11-Jan-2012 11:16:28  11-Jan-2012 11:16:31 (3sec)          9
  Status: file:/opt/nutch_1_4/data/crawl/segments/20120111111422/parse_data/part-00000/data:0+128
task_201201111048_0031_m_000005  100.00%   11-Jan-2012 11:16:28  11-Jan-2012 11:16:31 (3sec)
  Status: file:/opt/nutch_1_4/data/crawl/segments/20120111111422/parse_text/part-00000/data:0+128
And the attempt details for the parse_data map task (task_201201111048_0031_m_000004) itself:
Task Attempt: attempt_201201111048_0031_m_000004_0
Machine:      /default-rack/dhcp-192-168-4-26.semantico.net
Status:       SUCCEEDED
Progress:     100.00%
Start Time:   11-Jan-2012 11:16:28
Finish Time:  11-Jan-2012 11:16:30 (1sec)
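If you want to check the "every segment part was read" observation mechanically rather than by eye, here is a small Python sketch. It is not part of Nutch or Hadoop: the SPLITS list is simply the Status column of the task table above copied by hand, the helper name segment_parts is hypothetical, and the regex assumes the split format seen in this log (`<path>:<start offset>+<length>`, with an optional `/data` suffix on MapFile parts).

```python
import re

# Status column of the six map tasks above, copied by hand: each entry is the
# input split a map task read, in the form <path>:<start offset>+<length>.
SPLITS = [
    "file:/opt/nutch_1_4/data/crawl/segments/20120111111422/crawl_fetch/part-00000/data:0+259",
    "file:/opt/nutch_1_4/data/crawl/segments/20120111111422/crawl_generate/part-00000:0+234",
    "file:/opt/nutch_1_4/data/crawl/segments/20120111111422/content/part-00000/data:0+129",
    "file:/opt/nutch_1_4/data/crawl/segments/20120111111422/crawl_parse/part-00000:0+129",
    "file:/opt/nutch_1_4/data/crawl/segments/20120111111422/parse_data/part-00000/data:0+128",
    "file:/opt/nutch_1_4/data/crawl/segments/20120111111422/parse_text/part-00000/data:0+128",
]

# The six subdirectories of a Nutch 1.x segment.
EXPECTED_PARTS = {"content", "crawl_fetch", "crawl_generate",
                  "crawl_parse", "parse_data", "parse_text"}

# Assumed split layout: .../segments/<segment>/<subdir>/part-NNNNN[/data]:<offset>+<length>
SPLIT_RE = re.compile(
    r"/segments/(?P<segment>\d+)/(?P<subdir>[^/]+)/part-\d+(?:/data)?"
    r":(?P<offset>\d+)\+(?P<length>\d+)$"
)


def segment_parts(splits):
    """Map each segment subdirectory to the total bytes covered by its splits."""
    parts = {}
    for split in splits:
        match = SPLIT_RE.search(split)
        if match is None:
            continue  # not a segment input split; skip it
        subdir = match.group("subdir")
        parts[subdir] = parts.get(subdir, 0) + int(match.group("length"))
    return parts


if __name__ == "__main__":
    parts = segment_parts(SPLITS)
    for subdir in sorted(parts):
        print(f"{subdir}: {parts[subdir]} bytes")
    # Any standard segment subdirectory that no map task read.
    missing = EXPECTED_PARTS - set(parts)
    print("missing:", sorted(missing) if missing else "none")
```

For the log above this reports all six subdirectories and "missing: none". Summing split lengths per subdirectory (rather than just collecting names) keeps the check meaningful for larger segments, where one part file may be split across several map tasks.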