Trouble understanding how to use the FP_Growth algorithm

Sébastien Noir Mon, 21 Nov 2011 01:50:10 -0800

Hi!

I'm currently trying to understand how to use the implementation of the 
FPGrowth algorithm (see: 
https://cwiki.apache.org/MAHOUT/parallel-frequent-pattern-mining.html).


For now, I'm just trying it with toy data and Scala code. The problem is 
that it outputs only single-item itemsets.
I have probably missed something. Could you give me a hint?

By the way, the code below is Scala (calling the Java implementation directly!). If 
that is a problem, I can translate it to Java...

sample output:

freqList :Buffer((bier,15), (bread,12), (milk,11), (butter,6))
10:47:44,688 INFO  ~ Number of unique items 4
10:47:44,688 INFO  ~ Number of unique pruned items 4
10:47:44,688 INFO  ~ Number of Nodes in the FP Tree: 0
10:47:44,688 INFO  ~ Mining FTree Tree for all patterns with 3
updater : FPGrowth Algorithm for a given feature: 3
butter:[butter] : 6
10:47:44,690 INFO  ~ Found 1 Patterns with Least Support 6
10:47:44,690 INFO  ~ Mining FTree Tree for all patterns with 2
updater : FPGrowth Algorithm for a given feature: 2
updater : FPGrowth Algorithm for a given feature: 3
milk:[milk] : 11
10:47:44,690 INFO  ~ Found 1 Patterns with Least Support 11
10:47:44,690 INFO  ~ Mining FTree Tree for all patterns with 1
updater : FPGrowth Algorithm for a given feature: 1
updater : FPGrowth Algorithm for a given feature: 2
updater : FPGrowth Algorithm for a given feature: 3
bread:[bread] : 12
10:47:44,690 INFO  ~ Found 1 Patterns with Least Support 12
10:47:44,690 INFO  ~ Mining FTree Tree for all patterns with 0
updater : FPGrowth Algorithm for a given feature: 0
updater : FPGrowth Algorithm for a given feature: 1
updater : FPGrowth Algorithm for a given feature: 2
updater : FPGrowth Algorithm for a given feature: 3
bier:[bier] : 15
10:47:44,691 INFO  ~ Found 1 Patterns with Least Support 15

code:


    import org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth
    import java.util.HashSet
    import org.apache.mahout.common.iterator.StringRecordIterator
    import org.apache.mahout.common.iterator.FileLineIterable
    import org.apache.mahout.fpm.pfpgrowth.convertors._
    import org.apache.mahout.fpm.pfpgrowth.convertors.integer._
    import org.apache.mahout.fpm.pfpgrowth.convertors.string._
    import org.apache.hadoop.io.SequenceFile.Writer
    import org.apache.mahout.fpm.pfpgrowth.convertors.StatusUpdater
    import org.apache.hadoop.mapred.OutputCollector
    import scala.collection.JavaConversions._
    import java.util.{ List => JList }
    import org.apache.mahout.common.{ Pair => JPair }
    import java.lang.{ Long => JLong }
    import org.apache.hadoop.io.{ Text => JText }

    val minSupport = 1L
    val k: Int = 50
    val fps: FPGrowth[String] = new FPGrowth[String]()

    val milk = "milk"
    val bread = "bread"
    val butter = "butter"
    val bier = "bier"

    val transactionStream: Iterator[JPair[JList[String], JLong]] = Iterator(
      new JPair(List(milk, bread), 1L),
      new JPair(List(butter), 1L),
      new JPair(List(bier), 10L),
      new JPair(List(milk, bread, butter), 5L),
      new JPair(List(milk, bread, bier), 5L),
      new JPair(List(bread), 1L)
    )

    val frequencies: Collection[JPair[String, JLong]] = fps.generateFList(
      transactionStream, minSupport.toInt)

    println("freqList :" + frequencies)

    var returnableFeatures: Collection[String] = List(
      milk, bread, butter, bier)

    var output: OutputCollector[String, JList[JPair[JList[String], JLong]]] = (
      new OutputCollector[String, JList[JPair[JList[String], JLong]]] {
        def collect(x1: String,
                    x2: JList[JPair[JList[String], JLong]]) = {
          println(x1 + ":" +
            x2.map(pair => "[" + pair.getFirst.mkString(",") + "] : " +
              pair.getSecond).mkString("; "))
        }
      }
    )

    val updater: StatusUpdater = new StatusUpdater {
      def update(status: String) = println("updater : " + status)
    }

    fps.generateTopKFrequentPatterns(
      transactionStream,
      frequencies,
      minSupport,
      k,
      null, //returnableFeatures
      output,
      updater)
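
A possible explanation, offered as an assumption rather than a confirmed diagnosis: transactionStream is a Scala Iterator, which can be traversed only once. If generateFList reads it to the end, the later call to generateTopKFrequentPatterns would receive an exhausted iterator, which would be consistent with the log line "Number of Nodes in the FP Tree: 0". The sketch below uses only the Scala standard library (no Mahout calls; SinglePassDemo and countTwice are hypothetical names) to show the single-pass behaviour:

```scala
object SinglePassDemo {
  // Toy (items, count) pairs, loosely mirroring the transactions above.
  val transactions: List[(List[String], Long)] = List(
    (List("milk", "bread"), 1L),
    (List("butter"), 1L),
    (List("milk", "bread", "butter"), 5L)
  )

  // Traverse the same Iterator twice and report how many
  // elements each traversal actually saw.
  def countTwice(): (Int, Int) = {
    val stream = transactions.iterator
    val first = stream.size   // consumes every element
    val second = stream.size  // the iterator is already exhausted
    (first, second)
  }

  def main(args: Array[String]): Unit = {
    val (first, second) = countTwice()
    println("first pass saw " + first + " transactions")
    println("second pass saw " + second + " transactions")
    // Asking the underlying List for a new iterator gives a fresh traversal.
    println("fresh iterator sees " + transactions.iterator.size + " transactions")
  }
}
```

If this is indeed the cause, keeping the transactions in a List and passing a fresh iterator to each of the two calls would be the thing to try.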
