Hi again!

The problem is solved: the code was buggy. Details here: http://stackoverflow.com/questions/8215375/why-does-apache-mahout-frequent-pattern-minnig-algorithm-return-only-1-item-item
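
The log quoted below reports "Number of Nodes in the FP Tree: 0". That is consistent with the transaction iterator already being exhausted when the mining step runs. In the quoted code, `transactionStream` is a single Scala `Iterator`. `generateFList` walks it to the end, so `generateTopKFrequentPatterns` presumably receives no transactions to build the tree from. Treat this as an assumption about the cause; the linked answer has the authoritative details.

The sketch reproduces that failure mode in plain Scala, without Mahout. `IteratorReuse`, `countItems` and `countTransactions` are hypothetical stand-ins for the two passes. They are not Mahout methods.

```scala
// Sketch only: countItems and countTransactions are hypothetical stand-ins for
// the two passes (frequency list, then FP-tree construction), not Mahout methods.
object IteratorReuse {
  type Tx = (List[String], Long)

  // Same toy transactions as the quoted code: (items, count)
  val transactions: List[Tx] = List(
    (List("milk", "bread"), 1L),
    (List("butter"), 1L),
    (List("bier"), 10L),
    (List("milk", "bread", "butter"), 5L),
    (List("milk", "bread", "bier"), 5L),
    (List("bread"), 1L)
  )

  // First pass: weighted support per item (like building the frequency list).
  def countItems(txs: Iterator[Tx]): Map[String, Long] =
    txs.foldLeft(Map.empty[String, Long]) { case (acc, (items, n)) =>
      items.foldLeft(acc)((m, item) => m.updated(item, m.getOrElse(item, 0L) + n))
    }

  // Second pass: how many transactions actually arrive (like inserting into the tree).
  def countTransactions(txs: Iterator[Tx]): Int = txs.size

  def main(args: Array[String]): Unit = {
    // Buggy pattern: one Iterator shared by both passes.
    val shared = transactions.iterator
    val freqShared = countItems(shared)
    val seenShared = countTransactions(shared) // iterator already exhausted

    // Fixed pattern: keep the data in a List and take a fresh iterator per pass.
    val freqFresh = countItems(transactions.iterator)
    val seenFresh = countTransactions(transactions.iterator)

    println(s"shared iterator: $freqShared, second pass saw $seenShared transactions")
    println(s"fresh iterators: $freqFresh, second pass saw $seenFresh transactions")
  }
}
```

Applied to the quoted code, the same idea would mean declaring the transactions as a `List`. You would then pass `transactions.iterator` separately to `generateFList` and to `generateTopKFrequentPatterns`, so each call gets its own fresh iterator.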

Greetings from Switzerland,
Sébastien

> Hi again!
>
> I tried the command line. The output is NOT the same.
>
> Sample data (in file):
> 1 bier butter bread
> 2 bier bread
> 3 bier butter
> 4 bier milk bread butter
> 5 bread bier
> 6 bier milk butter
>
> Sample session / output (logging removed):
> $ ./mahout fpg -i /Users/snoir/Desktop/SampleFPData.txt -o patterns -k 50 -method sequential -regex '[\ ]' -s 2
>
> INFO: Dumping Patterns for Feature: milk
> ([butter, milk],2)
>
> INFO: Dumping Patterns for Feature: bread
> ([bread],3), ([bread, butter],2)
>
> INFO: Dumping Patterns for Feature: butter
> ([butter],4), ([butter, milk],2), ([bread, butter],2)
>
> To my understanding, the command-line output is correct. The code version
> gives a wrong result.
>
> Comments welcome!
>
> Best,
> Sébastien
>
> On 21 Nov 2011, at 21:59, Grant Ingersoll wrote:
>
>> Could you try comparing your dataset when using the bin/mahout process and
>> report back here?
>>
>> On Nov 21, 2011, at 4:49 AM, Sébastien Noir wrote:
>>
>>> Hi!
>>>
>>> I'm currently trying to understand how to use the implementation of the
>>> FPGrowth algorithm (see:
>>> https://cwiki.apache.org/MAHOUT/parallel-frequent-pattern-mining.html).
>>>
>>> Currently, I'm just trying it with toy data and Scala code. The problem
>>> is that it outputs only single-item itemsets.
>>> I probably missed something. Could you give me a hint?
>>>
>>> By the way, the code below is Scala (calling the Java implementation
>>> directly!). If that is a problem, I can translate it to Java...
>>>
>>> Sample output:
>>>
>>> freqList :Buffer((bier,15), (bread,12), (milk,11), (butter,6))
>>> 10:47:44,688 INFO ~ Number of unique items 4
>>> 10:47:44,688 INFO ~ Number of unique pruned items 4
>>> 10:47:44,688 INFO ~ Number of Nodes in the FP Tree: 0
>>> 10:47:44,688 INFO ~ Mining FTree Tree for all patterns with 3
>>> updater : FPGrowth Algorithm for a given feature: 3
>>> butter:[butter] : 6
>>> 10:47:44,690 INFO ~ Found 1 Patterns with Least Support 6
>>> 10:47:44,690 INFO ~ Mining FTree Tree for all patterns with 2
>>> updater : FPGrowth Algorithm for a given feature: 2
>>> updater : FPGrowth Algorithm for a given feature: 3
>>> milk:[milk] : 11
>>> 10:47:44,690 INFO ~ Found 1 Patterns with Least Support 11
>>> 10:47:44,690 INFO ~ Mining FTree Tree for all patterns with 1
>>> updater : FPGrowth Algorithm for a given feature: 1
>>> updater : FPGrowth Algorithm for a given feature: 2
>>> updater : FPGrowth Algorithm for a given feature: 3
>>> bread:[bread] : 12
>>> 10:47:44,690 INFO ~ Found 1 Patterns with Least Support 12
>>> 10:47:44,690 INFO ~ Mining FTree Tree for all patterns with 0
>>> updater : FPGrowth Algorithm for a given feature: 0
>>> updater : FPGrowth Algorithm for a given feature: 1
>>> updater : FPGrowth Algorithm for a given feature: 2
>>> updater : FPGrowth Algorithm for a given feature: 3
>>> bier:[bier] : 15
>>> 10:47:44,691 INFO ~ Found 1 Patterns with Least Support 15
>>>
>>> Code:
>>>
>>> import org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth
>>> import java.util.HashSet
>>> import org.apache.mahout.common.iterator.StringRecordIterator
>>> import org.apache.mahout.common.iterator.FileLineIterable
>>> import org.apache.mahout.fpm.pfpgrowth.convertors._
>>> import org.apache.mahout.fpm.pfpgrowth.convertors.integer._
>>> import org.apache.mahout.fpm.pfpgrowth.convertors.string._
>>> import org.apache.hadoop.io.SequenceFile.Writer
>>> import org.apache.mahout.fpm.pfpgrowth.convertors.StatusUpdater
>>> import org.apache.hadoop.mapred.OutputCollector
>>> import scala.collection.JavaConversions._
>>> import java.util.{ List => JList }
>>> import org.apache.mahout.common.{ Pair => JPair }
>>> import java.lang.{ Long => JLong }
>>> import org.apache.hadoop.io.{ Text => JText }
>>>
>>> val minSupport = 1L
>>> val k: Int = 50
>>> val fps: FPGrowth[String] = new FPGrowth[String]()
>>>
>>> val milk = "milk"
>>> val bread = "bread"
>>> val butter = "butter"
>>> val bier = "bier"
>>>
>>> val transactionStream: Iterator[JPair[JList[String], JLong]] = Iterator(
>>>   new JPair(List(milk, bread), 1L),
>>>   new JPair(List(butter), 1L),
>>>   new JPair(List(bier), 10L),
>>>   new JPair(List(milk, bread, butter), 5L),
>>>   new JPair(List(milk, bread, bier), 5L),
>>>   new JPair(List(bread), 1L)
>>> )
>>>
>>> val frequencies: Collection[JPair[String, JLong]] = fps.generateFList(
>>>   transactionStream, minSupport.toInt)
>>>
>>> println("freqList :" + frequencies)
>>>
>>> var returnableFeatures: Collection[String] = List(
>>>   milk, bread, butter, bier)
>>>
>>> var output: OutputCollector[String, JList[JPair[JList[String], JLong]]] = (
>>>   new OutputCollector[String, JList[JPair[JList[String], JLong]]] {
>>>     def collect(x1: String,
>>>                 x2: JList[JPair[JList[String], JLong]]) = {
>>>       println(x1 + ":" +
>>>         x2.map(pair => "[" + pair.getFirst.mkString(",") + "] : " +
>>>           pair.getSecond).mkString("; "))
>>>     }
>>>   }
>>> )
>>>
>>> val updater: StatusUpdater = new StatusUpdater {
>>>   def update(status: String) = println("updater : " + status)
>>> }
>>>
>>> fps.generateTopKFrequentPatterns(
>>>   transactionStream,
>>>   frequencies,
>>>   minSupport,
>>>   k,
>>>   null, //returnableFeatures
>>>   output,
>>>   updater)
>>
>> --------------------------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com
>>
>
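
This sketch cross-checks what the quoted code should report. It counts weighted supports by brute force over the same six transactions. It is plain subset enumeration, not FP-growth and not Mahout's API, and `BruteForceSupport` and `supports` are hypothetical names.

Its single-item supports (bier 15, bread 12, milk 11, butter 6) match the `freqList` line in the log. With minSupport = 1 it also reports multi-item itemsets, such as [bread, milk] with support 11 and [bread, butter, milk] with support 5. So output containing only single items, together with "Number of Nodes in the FP Tree: 0", points at an empty input to the mining step rather than at the data.

```scala
// Brute-force weighted support counting (NOT FP-growth): every non-empty subset
// of every transaction is enumerated, so this only suits tiny data sets used
// as a reference to cross-check a real miner's output.
object BruteForceSupport {
  type Tx = (List[String], Long)

  // Same toy transactions as the quoted code: (items, count)
  val transactions: List[Tx] = List(
    (List("milk", "bread"), 1L),
    (List("butter"), 1L),
    (List("bier"), 10L),
    (List("milk", "bread", "butter"), 5L),
    (List("milk", "bread", "bier"), 5L),
    (List("bread"), 1L)
  )

  // Returns every itemset whose weighted support is >= minSupport.
  def supports(txs: Seq[Tx], minSupport: Long): Map[Set[String], Long] = {
    val counts = scala.collection.mutable.Map.empty[Set[String], Long]
    for ((items, weight) <- txs) {
      val distinct = items.distinct
      // each non-empty subset of this transaction gains the transaction's weight
      for (size <- 1 to distinct.size; subset <- distinct.combinations(size)) {
        val key = subset.toSet
        counts.update(key, counts.getOrElse(key, 0L) + weight)
      }
    }
    counts.toMap.filter { case (_, support) => support >= minSupport }
  }

  def main(args: Array[String]): Unit = {
    // minSupport = 1, as in the quoted code; highest support first
    supports(transactions, 1L).toSeq
      .sortBy { case (items, support) => (-support, items.size) }
      .foreach { case (items, support) =>
        println(items.toList.sorted.mkString("[", ", ", "]") + " : " + support)
      }
  }
}
```

Because the enumeration is exponential in transaction length, it is only a reference check. It cannot replace FP-growth on real data.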
