Are there any optimizations that can be done when merging segments?
I'm using -numFetchers when calling generate and then merging the resulting segments back together when done. Note the slow rec/s performance in the merge step:
050926 101850 * Merging all segments into segments
050926 102420 Processed 20000 records (60.708897 rec/s)
050926 102844 Processed 40000 records (75.714554 rec/s)
050926 103337 Processed 60000 records (68.25682 rec/s)
--
050926 101521 parsing file:/data/nutch07/conf/nutch-site.xml
050926 101522 No FS indicated, using default:local
050926 101522 * Opening 200 segments:
050926 101522 - segment 20050926073716-0: 1671 records.
050926 101522 - segment 20050926073716-1: 922 records.
050926 101522 - segment 20050926073716-10: 91 records.
050926 101522 - segment 20050926073716-11: 4928 records.
050926 101522 - segment 20050926073716-12: 946 records.
050926 101522 - segment 20050926073716-13: 3306 records.
050926 101522 - segment 20050926073716-14: 1002 records.
050926 101522 - segment 20050926073716-15: 4794 records.
050926 101522 - segment 20050926073716-16: 1542 records.
050926 101523 - segment 20050926073716-17: 218 records.
050926 101523 - segment 20050926073716-18: 1438 records.
050926 101523 - segment 20050926073716-19: 1025 records.
050926 101523 - segment 20050926073716-2: 991 records.
050926 101523 - segment 20050926073716-20: 5468 records.
050926 101523 - segment 20050926073716-21: 2992 records.
050926 101523 - segment 20050926073716-22: 1934 records.
050926 101523 - segment 20050926073716-23: 1403 records.
050926 101523 - segment 20050926073716-24: 862 records.
050926 101523 - segment 20050926073716-25: 1078 records.
050926 101524 - segment 20050926073716-26: 1412 records.
050926 101524 - segment 20050926073716-27: 4199 records.
050926 101524 - segment 20050926073716-28: 1741 records.
050926 101524 - segment 20050926073716-29: 3477 records.
050926 101524 - segment 20050926073716-3: 1853 records.
050926 101524 - segment 20050926073716-30: 1866 records.
050926 101524 - segment 20050926073716-31: 462 records.
050926 101524 - segment 20050926073716-32: 2728 records.
050926 101524 - segment 20050926073716-33: 1205 records.
050926 101524 - segment 20050926073716-34: 2244 records.
050926 101524 - segment 20050926073716-35: 1656 records.
050926 101524 - segment 20050926073716-36: 1527 records.
050926 101524 - segment 20050926073716-37: 2955 records.
050926 101524 - segment 20050926073716-38: 12739 records.
050926 101524 - segment 20050926073716-39: 530 records.
050926 101524 - segment 20050926073716-4: 2753 records.
050926 101524 - segment 20050926073716-40: 1759 records.
050926 101524 - segment 20050926073716-41: 2729 records.
050926 101524 - segment 20050926073716-42: 1050 records.
050926 101524 - segment 20050926073716-43: 3044 records.
050926 101524 - segment 20050926073716-44: 780 records.
050926 101524 - segment 20050926073716-45: 950 records.
050926 101524 - segment 20050926073716-46: 2530 records.
050926 101524 - segment 20050926073716-47: 585 records.
050926 101524 - segment 20050926073716-48: 5786 records.
050926 101524 - segment 20050926073716-49: 3371 records.
050926 101525 - segment 20050926073716-5: 4956 records.
050926 101525 - segment 20050926073716-6: 1332 records.
050926 101525 - segment 20050926073716-7: 1534 records.
050926 101525 - segment 20050926073716-8: 1970 records.
050926 101525 - segment 20050926073716-9: 3662 records.
050926 101525 * TOTAL 115996 input records in 50 segments.
050926 101525 * Creating master index...
050926 101550 Processed 20000 records (785.54596 rec/s)
050926 101610 Processed 40000 records (1009.795 rec/s)
050926 101627 Processed 60000 records (1144.4922 rec/s)
050926 101717 Processed 80000 records (405.50677 rec/s)
050926 101805 Processed 100000 records (417.52783 rec/s)
050926 101845 * Creating index took 200409 ms
050926 101845 * Optimizing index took 8 ms
050926 101845 * Removing duplicate entries...
050926 101846 Processed 20000 records (21739.13 rec/s)
050926 101847 Processed 40000 records (25477.707 rec/s)
050926 101848 Processed 60000 records (26281.209 rec/s)
050926 101849 Processed 80000 records (11363.637 rec/s)
050926 101850 Processed 100000 records (28943.56 rec/s)
050926 101850 * Deduplicating took 5368 ms
050926 101850 * Merging all segments into segments
050926 102420 Processed 20000 records (60.708897 rec/s)
050926 102844 Processed 40000 records (75.714554 rec/s)
050926 103337 Processed 60000 records (68.25682 rec/s)
