I thought you could also modify the code a little bit, to add a
ParrallelFPGGrowthPartitioner and use multiple reducer to min the FPGrowth
result. For the real mine work is processed in the reducer and without the
partitioner, the Hadoop only use 1 reducer which would be probably slow.

Best wishes,
Stanley Xu



On Tue, Apr 5, 2011 at 4:35 AM, Robin Anil <[email protected]> wrote:

> Could you try with the Performance patch on the JIRA issue page 619
>
>
> On Tue, Apr 5, 2011 at 1:14 AM, Vipul Pandey <[email protected]> wrote:
>
> > So I have a new problem now.
> > I have about 3.5M baskets and about a 100K items/features. After
> > successfully completing parallel-counting, grouping and transaction
> sorting
> > in a reasonable amount of time - PFP Growth just gets stuck in the Reduce
> > phase of ParallelFPGrowth.
> > The reducer is running since about 6:30 PM PST Friday, 1st of April -
> (it's
> > beyond noon of 4th April here - i.e. running for more than 65 hours now)
> > and
> > it's still going on.
> >
> > Anyone faced this issue before? (Console logs pasted below)
> > Also,
> > - We don't have anything else running on our cluster and it's a
> reasonably
> > powerful box.
> > - the jobtracker UI shows this status for the reducer : "Processing
> FPTree:
> > FPGrowth Algorithm for a given feature: 49797 > reduce" and the feature
> > keeps changing every minute or two- which shows that it's still doing
> it's
> > job!
> > - no other errors/messages in the logs
> > - config :  -s 100 -k 50 -g 2  - Any Recommendations on these? my usual
> > values are "-s 10 -g 10" but that doesn't seem to cut for me either.
> >
> >
> >
> > Any clue?
> >
> > Thanks
> > Vipul
> >
> >
> > Log :
> >
> > 11/04/01 18:20:07 INFO common.HadoopUtil: Deleting
> >  /run/MR20/fim/parallelcounting
> >
> > 11/04/01 18:20:07 WARN mapred.JobClient: Use GenericOptionsParser for
> > parsing the arguments. Applications should implement Tool for the same.
> >
> > 11/04/01 18:20:08 INFO input.FileInputFormat: Total input paths to
> process
> > :
> > 20
> >
> > 11/04/01 18:20:08 INFO mapred.JobClient: Running job:
> job_201103311244_0045
> >
> > 11/04/01 18:20:09 INFO mapred.JobClient:  map 0% reduce 0%
> >
> > 11/04/01 18:20:20 INFO mapred.JobClient:  map 2% reduce 0%
> >
> > 11/04/01 18:20:21 INFO mapred.JobClient:  map 8% reduce 0%
> >
> > 11/04/01 18:20:22 INFO mapred.JobClient:  map 11% reduce 0%
> >
> > 11/04/01 18:20:23 INFO mapred.JobClient:  map 13% reduce 0%
> >
> > 11/04/01 18:20:24 INFO mapred.JobClient:  map 16% reduce 0%
> >
> > 11/04/01 18:20:25 INFO mapred.JobClient:  map 17% reduce 0%
> >
> > 11/04/01 18:20:26 INFO mapred.JobClient:  map 18% reduce 0%
> >
> > 11/04/01 18:20:27 INFO mapred.JobClient:  map 21% reduce 0%
> >
> > 11/04/01 18:20:28 INFO mapred.JobClient:  map 22% reduce 0%
> >
> > 11/04/01 18:20:29 INFO mapred.JobClient:  map 24% reduce 0%
> >
> > 11/04/01 18:20:30 INFO mapred.JobClient:  map 27% reduce 0%
> >
> > 11/04/01 18:20:31 INFO mapred.JobClient:  map 28% reduce 0%
> >
> > 11/04/01 18:20:32 INFO mapred.JobClient:  map 31% reduce 0%
> >
> > 11/04/01 18:20:34 INFO mapred.JobClient:  map 33% reduce 0%
> >
> > 11/04/01 18:20:35 INFO mapred.JobClient:  map 35% reduce 0%
> >
> > 11/04/01 18:20:36 INFO mapred.JobClient:  map 38% reduce 0%
> >
> > 11/04/01 18:20:37 INFO mapred.JobClient:  map 40% reduce 0%
> >
> > 11/04/01 18:20:38 INFO mapred.JobClient:  map 42% reduce 0%
> >
> > 11/04/01 18:20:39 INFO mapred.JobClient:  map 43% reduce 0%
> >
> > 11/04/01 18:20:40 INFO mapred.JobClient:  map 45% reduce 0%
> >
> > 11/04/01 18:20:41 INFO mapred.JobClient:  map 47% reduce 0%
> >
> > 11/04/01 18:20:42 INFO mapred.JobClient:  map 50% reduce 0%
> >
> > 11/04/01 18:20:44 INFO mapred.JobClient:  map 53% reduce 0%
> >
> > 11/04/01 18:20:45 INFO mapred.JobClient:  map 55% reduce 0%
> >
> > 11/04/01 18:20:47 INFO mapred.JobClient:  map 57% reduce 0%
> >
> > 11/04/01 18:20:48 INFO mapred.JobClient:  map 60% reduce 0%
> >
> > 11/04/01 18:20:49 INFO mapred.JobClient:  map 61% reduce 0%
> >
> > 11/04/01 18:20:50 INFO mapred.JobClient:  map 63% reduce 0%
> >
> > 11/04/01 18:20:51 INFO mapred.JobClient:  map 64% reduce 0%
> >
> > 11/04/01 18:20:52 INFO mapred.JobClient:  map 66% reduce 0%
> >
> > 11/04/01 18:20:53 INFO mapred.JobClient:  map 68% reduce 0%
> >
> > 11/04/01 18:20:54 INFO mapred.JobClient:  map 69% reduce 0%
> >
> > 11/04/01 18:20:55 INFO mapred.JobClient:  map 71% reduce 0%
> >
> > 11/04/01 18:20:56 INFO mapred.JobClient:  map 72% reduce 0%
> >
> > 11/04/01 18:20:57 INFO mapred.JobClient:  map 74% reduce 0%
> >
> > 11/04/01 18:20:58 INFO mapred.JobClient:  map 75% reduce 0%
> >
> > 11/04/01 18:20:59 INFO mapred.JobClient:  map 76% reduce 0%
> >
> > 11/04/01 18:21:00 INFO mapred.JobClient:  map 78% reduce 0%
> >
> > 11/04/01 18:21:01 INFO mapred.JobClient:  map 79% reduce 0%
> >
> > 11/04/01 18:21:03 INFO mapred.JobClient:  map 81% reduce 0%
> >
> > 11/04/01 18:21:06 INFO mapred.JobClient:  map 83% reduce 0%
> >
> > 11/04/01 18:21:07 INFO mapred.JobClient:  map 85% reduce 0%
> >
> > 11/04/01 18:21:09 INFO mapred.JobClient:  map 87% reduce 0%
> >
> > 11/04/01 18:21:10 INFO mapred.JobClient:  map 88% reduce 0%
> >
> > 11/04/01 18:21:12 INFO mapred.JobClient:  map 89% reduce 0%
> >
> > 11/04/01 18:21:13 INFO mapred.JobClient:  map 90% reduce 0%
> >
> > 11/04/01 18:21:16 INFO mapred.JobClient:  map 91% reduce 0%
> >
> > 11/04/01 18:21:17 INFO mapred.JobClient:  map 92% reduce 0%
> >
> > 11/04/01 18:21:19 INFO mapred.JobClient:  map 93% reduce 0%
> >
> > 11/04/01 18:21:25 INFO mapred.JobClient:  map 94% reduce 0%
> >
> > 11/04/01 18:21:29 INFO mapred.JobClient:  map 94% reduce 5%
> >
> > 11/04/01 18:21:35 INFO mapred.JobClient:  map 95% reduce 6%
> >
> > 11/04/01 18:21:40 INFO mapred.JobClient:  map 96% reduce 6%
> >
> > 11/04/01 18:21:43 INFO mapred.JobClient:  map 97% reduce 6%
> >
> > 11/04/01 18:21:44 INFO mapred.JobClient:  map 97% reduce 10%
> >
> > 11/04/01 18:21:48 INFO mapred.JobClient:  map 98% reduce 10%
> >
> > 11/04/01 18:21:52 INFO mapred.JobClient:  map 99% reduce 10%
> >
> > 11/04/01 18:21:57 INFO mapred.JobClient:  map 100% reduce 10%
> >
> > 11/04/01 18:21:59 INFO mapred.JobClient:  map 100% reduce 13%
> >
> > 11/04/01 18:22:05 INFO mapred.JobClient:  map 100% reduce 18%
> >
> > 11/04/01 18:22:11 INFO mapred.JobClient:  map 100% reduce 25%
> >
> > 11/04/01 18:23:23 INFO mapred.JobClient:  map 100% reduce 26%
> >
> > 11/04/01 18:23:26 INFO mapred.JobClient:  map 100% reduce 33%
> >
> > 11/04/01 18:25:27 INFO mapred.JobClient:  map 100% reduce 74%
> >
> > 11/04/01 18:25:29 INFO mapred.JobClient:  map 100% reduce 76%
> >
> > 11/04/01 18:25:32 INFO mapred.JobClient:  map 100% reduce 78%
> >
> > 11/04/01 18:25:35 INFO mapred.JobClient:  map 100% reduce 80%
> >
> > 11/04/01 18:25:38 INFO mapred.JobClient:  map 100% reduce 82%
> >
> > 11/04/01 18:25:41 INFO mapred.JobClient:  map 100% reduce 84%
> >
> > 11/04/01 18:25:44 INFO mapred.JobClient:  map 100% reduce 86%
> >
> > 11/04/01 18:25:47 INFO mapred.JobClient:  map 100% reduce 88%
> >
> > 11/04/01 18:25:50 INFO mapred.JobClient:  map 100% reduce 90%
> >
> > 11/04/01 18:25:53 INFO mapred.JobClient:  map 100% reduce 92%
> >
> > 11/04/01 18:25:56 INFO mapred.JobClient:  map 100% reduce 94%
> >
> > 11/04/01 18:25:59 INFO mapred.JobClient:  map 100% reduce 96%
> >
> > 11/04/01 18:26:02 INFO mapred.JobClient:  map 100% reduce 98%
> >
> > 11/04/01 18:26:05 INFO mapred.JobClient:  map 100% reduce 100%
> >
> > 11/04/01 18:26:06 INFO mapred.JobClient: Job complete:
> > job_201103311244_0045
> >
> > 11/04/01 18:26:06 INFO mapred.JobClient: Counters: 23
> >
> > 11/04/01 18:26:06 INFO mapred.JobClient:   Job Counters
> >
> > 11/04/01 18:26:06 INFO mapred.JobClient:     Launched reduce tasks=1
> >
> > 11/04/01 18:26:06 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=2045245
> >
> > 11/04/01 18:26:06 INFO mapred.JobClient:     Total time spent by all
> > reduces
> > waiting after reserving slots (ms)=0
> >
> > 11/04/01 18:26:06 INFO mapred.JobClient:     Total time spent by all maps
> > waiting after reserving slots (ms)=0
> >
> > 11/04/01 18:26:06 INFO mapred.JobClient:     Rack-local map tasks=2
> >
> > 11/04/01 18:26:06 INFO mapred.JobClient:     Launched map tasks=23
> >
> > 11/04/01 18:26:06 INFO mapred.JobClient:     Data-local map tasks=21
> >
> > 11/04/01 18:26:06 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=287005
> >
> > 11/04/01 18:26:06 INFO mapred.JobClient:   FileSystemCounters
> >
> > 11/04/01 18:26:06 INFO mapred.JobClient:     FILE_BYTES_READ=1058213034
> >
> > 11/04/01 18:26:06 INFO mapred.JobClient:     HDFS_BYTES_READ=1565999068
> >
> > 11/04/01 18:26:06 INFO mapred.JobClient:
> FILE_BYTES_WRITTEN=1397240087
> >
> > 11/04/01 18:26:06 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=638557669
> >
> > 11/04/01 18:26:06 INFO mapred.JobClient:   Map-Reduce Framework
> >
> > 11/04/01 18:26:07 INFO mapred.JobClient:     Reduce input groups=24671345
> >
> > 11/04/01 18:26:07 INFO mapred.JobClient:     Combine output
> > records=275836241
> >
> > 11/04/01 18:26:07 INFO mapred.JobClient:     Map input records=3480795
> >
> > 11/04/01 18:26:07 INFO mapred.JobClient:     Reduce shuffle
> bytes=318629292
> >
> > 11/04/01 18:26:07 INFO mapred.JobClient:     Reduce output
> records=24671345
> >
> > 11/04/01 18:26:07 INFO mapred.JobClient:     Spilled Records=442523762
> >
> > 11/04/01 18:26:07 INFO mapred.JobClient:     Map output bytes=2873217818
> >
> > 11/04/01 18:26:07 INFO mapred.JobClient:     Combine input
> > records=373343059
> >
> > 11/04/01 18:26:07 INFO mapred.JobClient:     Map output records=164272900
> >
> > 11/04/01 18:26:07 INFO mapred.JobClient:     SPLIT_RAW_BYTES=2860
> >
> > 11/04/01 18:26:07 INFO mapred.JobClient:     Reduce input
> records=66766082
> >
> > 11/04/01 18:26:20 INFO pfpgrowth.PFPGrowth: No of Features: 103412
> >
> > 11/04/01 18:26:20 INFO common.HadoopUtil: Deleting
> > hdfs://nnn001:54310/run/MR20/fim/fList
> >
> > 11/04/01 18:26:21 INFO common.HadoopUtil: Deleting
> > hdfs://nnn001:54310/run/MR20/fim/gList
> >
> > 11/04/01 18:26:21 INFO common.HadoopUtil: Deleting
> > /run/MR20/fim/sortedoutput
> >
> > 11/04/01 18:26:21 WARN mapred.JobClient: Use GenericOptionsParser for
> > parsing the arguments. Applications should implement Tool for the same.
> >
> > 11/04/01 18:26:21 INFO input.FileInputFormat: Total input paths to
> process
> > :
> > 20
> >
> > 11/04/01 18:26:21 INFO mapred.JobClient: Running job:
> job_201103311244_0046
> >
> > 11/04/01 18:26:22 INFO mapred.JobClient:  map 0% reduce 0%
> >
> > 11/04/01 18:26:33 INFO mapred.JobClient:  map 27% reduce 0%
> >
> > 11/04/01 18:26:34 INFO mapred.JobClient:  map 58% reduce 0%
> >
> > 11/04/01 18:26:35 INFO mapred.JobClient:  map 63% reduce 0%
> >
> > 11/04/01 18:26:37 INFO mapred.JobClient:  map 81% reduce 0%
> >
> > 11/04/01 18:26:39 INFO mapred.JobClient:  map 86% reduce 0%
> >
> > 11/04/01 18:26:40 INFO mapred.JobClient:  map 95% reduce 0%
> >
> > 11/04/01 18:26:44 INFO mapred.JobClient:  map 95% reduce 21%
> >
> > 11/04/01 18:26:46 INFO mapred.JobClient:  map 98% reduce 21%
> >
> > 11/04/01 18:26:47 INFO mapred.JobClient:  map 98% reduce 31%
> >
> > 11/04/01 18:26:48 INFO mapred.JobClient:  map 100% reduce 31%
> >
> > 11/04/01 18:26:53 INFO mapred.JobClient:  map 100% reduce 33%
> >
> > 11/04/01 18:26:58 INFO mapred.JobClient:  map 100% reduce 90%
> >
> > 11/04/01 18:27:01 INFO mapred.JobClient:  map 100% reduce 100%
> >
> > 11/04/01 18:27:01 INFO mapred.JobClient: Job complete:
> > job_201103311244_0046
> >
> > 11/04/01 18:27:01 INFO mapred.JobClient: Counters: 22
> >
> > 11/04/01 18:27:01 INFO mapred.JobClient:   Job Counters
> >
> > 11/04/01 18:27:01 INFO mapred.JobClient:     Launched reduce tasks=1
> >
> > 11/04/01 18:27:01 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=232517
> >
> > 11/04/01 18:27:01 INFO mapred.JobClient:     Total time spent by all
> > reduces
> > waiting after reserving slots (ms)=0
> >
> > 11/04/01 18:27:01 INFO mapred.JobClient:     Total time spent by all maps
> > waiting after reserving slots (ms)=0
> >
> > 11/04/01 18:27:01 INFO mapred.JobClient:     Launched map tasks=20
> >
> > 11/04/01 18:27:01 INFO mapred.JobClient:     Data-local map tasks=20
> >
> > 11/04/01 18:27:01 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=24526
> >
> > 11/04/01 18:27:01 INFO mapred.JobClient:   FileSystemCounters
> >
> > 11/04/01 18:27:01 INFO mapred.JobClient:     FILE_BYTES_READ=38487572
> >
> > 11/04/01 18:27:01 INFO mapred.JobClient:     HDFS_BYTES_READ=1618968708
> >
> > 11/04/01 18:27:01 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=81300431
> >
> > 11/04/01 18:27:01 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=84985526
> >
> > 11/04/01 18:27:01 INFO mapred.JobClient:   Map-Reduce Framework
> >
> > 11/04/01 18:27:01 INFO mapred.JobClient:     Reduce input groups=48309
> >
> > 11/04/01 18:27:01 INFO mapred.JobClient:     Combine output records=0
> >
> > 11/04/01 18:27:01 INFO mapred.JobClient:     Map input records=3480795
> >
> > 11/04/01 18:27:01 INFO mapred.JobClient:     Reduce shuffle
> bytes=39454031
> >
> > 11/04/01 18:27:01 INFO mapred.JobClient:     Reduce output
> records=1539349
> >
> > 11/04/01 18:27:01 INFO mapred.JobClient:     Spilled Records=3078698
> >
> > 11/04/01 18:27:01 INFO mapred.JobClient:     Map output bytes=72092246
> >
> > 11/04/01 18:27:01 INFO mapred.JobClient:     Combine input records=0
> >
> > 11/04/01 18:27:01 INFO mapred.JobClient:     Map output records=1539349
> >
> > 11/04/01 18:27:01 INFO mapred.JobClient:     SPLIT_RAW_BYTES=2860
> >
> > 11/04/01 18:27:01 INFO mapred.JobClient:     Reduce input records=1539349
> >
> > 11/04/01 18:27:01 INFO common.HadoopUtil: Deleting run/MR20/fim/fpgrowth
> >
> > 11/04/01 18:27:01 WARN mapred.JobClient: Use GenericOptionsParser for
> > parsing the arguments. Applications should implement Tool for the same.
> >
> > 11/04/01 18:27:02 INFO input.FileInputFormat: Total input paths to
> process
> > :
> > 1
> >
> > 11/04/01 18:27:02 INFO mapred.JobClient: Running job:
> job_201103311244_0047
> >
> > 11/04/01 18:27:03 INFO mapred.JobClient:  map 0% reduce 0%
> >
> > 11/04/01 18:27:14 INFO mapred.JobClient:  map 39% reduce 0%
> >
> > 11/04/01 18:27:17 INFO mapred.JobClient:  map 44% reduce 0%
> >
> > 11/04/01 18:28:15 INFO mapred.JobClient:  map 64% reduce 0%
> >
> > 11/04/01 18:28:38 INFO mapred.JobClient:  map 74% reduce 0%
> >
> > 11/04/01 18:28:41 INFO mapred.JobClient:  map 78% reduce 0%
> >
> > 11/04/01 18:28:44 INFO mapred.JobClient:  map 88% reduce 0%
> >
> > 11/04/01 18:28:47 INFO mapred.JobClient:  map 95% reduce 0%
> >
> > 11/04/01 18:28:50 INFO mapred.JobClient:  map 100% reduce 0%
> >
> > 11/04/01 18:32:27 INFO mapred.JobClient:  map 100% reduce 33%
> >
> > 11/04/01 18:34:45 INFO mapred.JobClient:  map 100% reduce 100%
> >
> >
> > ## AND IT's been here for all this while
> >
>

Reply via email to