Hi again!

The problem is solved: my code was buggy.
Details here:
http://stackoverflow.com/questions/8215375/why-does-apache-mahout-frequent-pattern-minnig-algorithm-return-only-1-item-item
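
For readers of this thread, one plausible cause is visible in the quoted code further down: transactionStream is a Scala Iterator, which can be traversed only once, yet it is passed first to generateFList and then to generateTopKFrequentPatterns. That would be consistent with the log line "Number of Nodes in the FP Tree: 0". The sketch below illustrates that mechanism only; it is an assumption, not a quote from the linked answer. It uses nothing but the Scala standard library and does not call Mahout, and the names IteratorReuseDemo, consume, reusedIterator and freshIterators are hypothetical.

```scala
// Hypothetical stand-alone demo; it does not call Mahout.
object IteratorReuseDemo {

  // The transactions from the quoted example, as (items, weight) pairs.
  val transactions: List[(List[String], Long)] = List(
    (List("milk", "bread"), 1L),
    (List("butter"), 1L),
    (List("bier"), 10L),
    (List("milk", "bread", "butter"), 5L),
    (List("milk", "bread", "bier"), 5L),
    (List("bread"), 1L)
  )

  // Stand-in for any method that walks the whole stream once,
  // as a frequency-counting pass has to do. Returns the number
  // of transactions it saw.
  def consume(stream: Iterator[(List[String], Long)]): Int = {
    var seen = 0
    while (stream.hasNext) {
      stream.next()
      seen += 1
    }
    seen
  }

  // Reusing ONE iterator for two passes: the first pass drains it,
  // so the second pass finds nothing left.
  def reusedIterator(): (Int, Int) = {
    val stream = transactions.iterator
    val firstPass = consume(stream)
    val secondPass = consume(stream)
    (firstPass, secondPass)
  }

  // Asking the underlying collection for a new iterator per pass:
  // both passes see every transaction.
  def freshIterators(): (Int, Int) = {
    val firstPass = consume(transactions.iterator)
    val secondPass = consume(transactions.iterator)
    (firstPass, secondPass)
  }

  def main(args: Array[String]): Unit = {
    println("reused iterator: " + reusedIterator())
    println("fresh iterators: " + freshIterators())
  }
}
```

If this is indeed what happened, building a new iterator for each call (for example from a List, as in freshIterators above) would give the second pass the full data instead of an empty stream.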

Greetings from Switzerland,
Sébastien

> Hi again!
> 
> I tried the command line. The output is NOT the same.
> 
> Sample data (in the input file):
> 1     bier butter bread
> 2     bier bread
> 3     bier butter
> 4     bier milk bread butter
> 5     bread bier
> 6     bier milk butter
> 
> Sample session / output (logging removed):
> $ ./mahout fpg -i /Users/snoir/Desktop/SampleFPData.txt -o patterns -k 50 -method sequential -regex '[\ ]' -s 2
> 
> INFO: Dumping Patterns for Feature: milk 
> ([butter, milk],2)
> 
> INFO: Dumping Patterns for Feature: bread 
> ([bread],3), ([bread, butter],2)
> 
> INFO: Dumping Patterns for Feature: butter 
> ([butter],4), ([butter, milk],2), ([bread, butter],2)
> 
> As far as I can tell, the command-line output is correct, while the code 
> version gives the wrong result.
> 
> Comments welcome!
> 
> Best,
> Sébastien
> 
> 
> 
> On 21 nov. 2011, at 21:59, Grant Ingersoll wrote:
> 
>> Could you try comparing your dataset when using the bin/mahout process and 
>> report back here?
>> 
>> On Nov 21, 2011, at 4:49 AM, Sébastien Noir wrote:
>> 
>>> Hi!
>>> 
>>> I'm currently trying to understand how to use the implementation of the 
>>> FPGrowth algorithm (see:
>>> https://cwiki.apache.org/MAHOUT/parallel-frequent-pattern-mining.html).
>>> 
>>> For now, I'm just trying it with toy data and Scala code. The problem 
>>> is that it outputs only single-item itemsets.
>>> I probably missed something. Could you give me a hint?
>>> 
>>> By the way, the code below is Scala (calling the Java implementation 
>>> directly!). If that is a problem, I can translate it to Java...
>>> 
>>> Sample output:
>>> 
>>> freqList :Buffer((bier,15), (bread,12), (milk,11), (butter,6))
>>> 10:47:44,688 INFO  ~ Number of unique items 4
>>> 10:47:44,688 INFO  ~ Number of unique pruned items 4
>>> 10:47:44,688 INFO  ~ Number of Nodes in the FP Tree: 0
>>> 10:47:44,688 INFO  ~ Mining FTree Tree for all patterns with 3
>>> updater : FPGrowth Algorithm for a given feature: 3
>>> butter:[butter] : 6
>>> 10:47:44,690 INFO  ~ Found 1 Patterns with Least Support 6
>>> 10:47:44,690 INFO  ~ Mining FTree Tree for all patterns with 2
>>> updater : FPGrowth Algorithm for a given feature: 2
>>> updater : FPGrowth Algorithm for a given feature: 3
>>> milk:[milk] : 11
>>> 10:47:44,690 INFO  ~ Found 1 Patterns with Least Support 11
>>> 10:47:44,690 INFO  ~ Mining FTree Tree for all patterns with 1
>>> updater : FPGrowth Algorithm for a given feature: 1
>>> updater : FPGrowth Algorithm for a given feature: 2
>>> updater : FPGrowth Algorithm for a given feature: 3
>>> bread:[bread] : 12
>>> 10:47:44,690 INFO  ~ Found 1 Patterns with Least Support 12
>>> 10:47:44,690 INFO  ~ Mining FTree Tree for all patterns with 0
>>> updater : FPGrowth Algorithm for a given feature: 0
>>> updater : FPGrowth Algorithm for a given feature: 1
>>> updater : FPGrowth Algorithm for a given feature: 2
>>> updater : FPGrowth Algorithm for a given feature: 3
>>> bier:[bier] : 15
>>> 10:47:44,691 INFO  ~ Found 1 Patterns with Least Support 15
>>> 
>>> Code:
>>> 
>>> 
>>>  import org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth
>>>  import java.util.HashSet
>>>  import org.apache.mahout.common.iterator.StringRecordIterator
>>>  import org.apache.mahout.common.iterator.FileLineIterable
>>>  import org.apache.mahout.fpm.pfpgrowth.convertors._
>>>  import org.apache.mahout.fpm.pfpgrowth.convertors.integer._
>>>  import org.apache.mahout.fpm.pfpgrowth.convertors.string._
>>>  import org.apache.hadoop.io.SequenceFile.Writer
>>>  import org.apache.mahout.fpm.pfpgrowth.convertors.StatusUpdater
>>>  import org.apache.hadoop.mapred.OutputCollector
>>>  import scala.collection.JavaConversions._
>>>  import java.util.{ List => JList }
>>>  import org.apache.mahout.common.{ Pair => JPair }
>>>  import java.lang.{ Long => JLong }
>>>  import org.apache.hadoop.io.{ Text => JText }
>>> 
>>>  val minSupport = 1L
>>>  val k: Int = 50
>>>  val fps: FPGrowth[String] = new FPGrowth[String]()
>>> 
>>>  val milk = "milk"
>>>  val bread = "bread"
>>>  val butter = "butter"
>>>  val bier = "bier"
>>> 
>>>  val transactionStream: Iterator[JPair[JList[String], JLong]] = Iterator(
>>>    new JPair(List(milk, bread), 1L),
>>>    new JPair(List(butter), 1L),
>>>    new JPair(List(bier), 10L),
>>>    new JPair(List(milk, bread, butter), 5L),
>>>    new JPair(List(milk, bread, bier), 5L),
>>>    new JPair(List(bread), 1L)
>>>  )
>>> 
>>>  val frequencies: Collection[JPair[String, JLong]] = fps.generateFList(
>>>    transactionStream, minSupport.toInt)
>>> 
>>>  println("freqList :" + frequencies)
>>> 
>>>  var returnableFeatures: Collection[String] = List(
>>>    milk, bread, butter, bier)
>>> 
>>>  var output: OutputCollector[String, JList[JPair[JList[String], JLong]]] = (
>>>    new OutputCollector[String, JList[JPair[JList[String], JLong]]] {
>>>      def collect(x1: String,
>>>                  x2: JList[JPair[JList[String], JLong]]) = {
>>>        println(x1 + ":" +
>>>          x2.map(pair => "[" + pair.getFirst.mkString(",") + "] : " +
>>>            pair.getSecond).mkString("; "))
>>>      }
>>>    }
>>>  )
>>> 
>>>  val updater: StatusUpdater = new StatusUpdater {
>>>    def update(status: String) = println("updater : " + status)
>>>  }
>>> 
>>>  fps.generateTopKFrequentPatterns(
>>>    transactionStream,
>>>    frequencies,
>>>    minSupport,
>>>    k,
>>>    null, //returnableFeatures
>>>    output,
>>>    updater)
>>> 
>>> 
>> 
>> --------------------------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com
>> 
>> 
>> 
> 
