[GitHub] spark pull request: [SPARK-6264] [MLLIB] Support FPGrowth algorith...

2015-04-09 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/5213#issuecomment-91347701
  
test this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-6264] [MLLIB] Support FPGrowth algorith...

2015-04-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5213#issuecomment-91265851
  
[Test build #29941 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29941/consoleFull) for PR 5213 at commit [`8ce0359`](https://github.com/apache/spark/commit/8ce0359e42d05b095147ec121a3d868e580bae7d).





[GitHub] spark pull request: [SPARK-6264] [MLLIB] Support FPGrowth algorith...

2015-04-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5213#issuecomment-91348808
  
[Test build #29961 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29961/consoleFull) for PR 5213 at commit [`ed62ead`](https://github.com/apache/spark/commit/ed62eadccc83599855bed162103dbafdc59d8226).





[GitHub] spark pull request: [SPARK-6264] [MLLIB] Support FPGrowth algorith...

2015-04-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5213#issuecomment-91275448
  
[Test build #29945 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29945/consoleFull) for PR 5213 at commit [`8ce0359`](https://github.com/apache/spark/commit/8ce0359e42d05b095147ec121a3d868e580bae7d).





[GitHub] spark pull request: [SPARK-6264] [MLLIB] Support FPGrowth algorith...

2015-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5213#issuecomment-91274549
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29941/
Test FAILed.





[GitHub] spark pull request: [SPARK-6264] [MLLIB] Support FPGrowth algorith...

2015-04-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5213#issuecomment-91363126
  
[Test build #29961 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29961/consoleFull) for PR 5213 at commit [`ed62ead`](https://github.com/apache/spark/commit/ed62eadccc83599855bed162103dbafdc59d8226).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class FPGrowthModel(JavaModelWrapper):`
  * `class FPGrowth(object):`

 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-6264] [MLLIB] Support FPGrowth algorith...

2015-04-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5213#issuecomment-91328549
  
[Test build #29952 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29952/consoleFull) for PR 5213 at commit [`ed62ead`](https://github.com/apache/spark/commit/ed62eadccc83599855bed162103dbafdc59d8226).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class FPGrowthModel(JavaModelWrapper):`
  * `class FPGrowth(object):`

 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-6264] [MLLIB] Support FPGrowth algorith...

2015-04-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5213#issuecomment-91311173
  
[Test build #29952 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29952/consoleFull) for PR 5213 at commit [`ed62ead`](https://github.com/apache/spark/commit/ed62eadccc83599855bed162103dbafdc59d8226).





[GitHub] spark pull request: [SPARK-6264] [MLLIB] Support FPGrowth algorith...

2015-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5213#issuecomment-91302140
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29945/
Test FAILed.





[GitHub] spark pull request: [SPARK-6264] [MLLIB] Support FPGrowth algorith...

2015-04-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5213#issuecomment-91302117
  
[Test build #29945 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29945/consoleFull) for PR 5213 at commit [`8ce0359`](https://github.com/apache/spark/commit/8ce0359e42d05b095147ec121a3d868e580bae7d).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class FPGrowthModel(JavaModelWrapper):`
  * `class FPGrowth(object):`

 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-6264] [MLLIB] Support FPGrowth algorith...

2015-04-09 Thread shaneknapp
Github user shaneknapp commented on the pull request:

https://github.com/apache/spark/pull/5213#issuecomment-91274674
  
jenkins, test this please





[GitHub] spark pull request: [SPARK-6264] [MLLIB] Support FPGrowth algorith...

2015-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5213#issuecomment-91328594
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29952/
Test FAILed.





[GitHub] spark pull request: [SPARK-6264] [MLLIB] Support FPGrowth algorith...

2015-04-05 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/5213#issuecomment-89868426
  
The implementation looks good to me. There are some minor issues with the
docstring style. Please fix them and it should be good to go. Thanks!





[GitHub] spark pull request: [SPARK-6264] [MLLIB] Support FPGrowth algorith...

2015-04-05 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/5213#discussion_r27781308
  
--- Diff: python/pyspark/mllib/fpm.py ---
@@ -0,0 +1,67 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from pyspark import SparkContext
+from pyspark.mllib.common import JavaModelWrapper, callMLlibFunc, inherit_doc
+
+__all__ = ['FPGrowth', 'FPGrowthModel']
+
+
+@inherit_doc
+class FPGrowthModel(JavaModelWrapper):
+    """
+    A FP-Growth model for mining frequent itemsets using the Parallel FP-Growth algorithm.
+
+    >>> data = [["a", "b", "c"], ["a", "b", "d", "e"], ["a", "c", "e"], ["a", "c", "f"]]
+    >>> rdd = sc.parallelize(data, 2)
+    >>> model = FPGrowth.train(rdd, 0.6, 2)
+    >>> result = model.freqItemsets().collect()
+    >>> sorted(model.freqItemsets().collect())
+    [([u'a'], 4), ([u'c'], 3), ([u'c', u'a'], 3)]
+    """
+    def freqItemsets(self):
--- End diff --

Add a blank line before `def ...` and add a docstring to this function.





[GitHub] spark pull request: [SPARK-6264] [MLLIB] Support FPGrowth algorith...

2015-04-05 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/5213#discussion_r27781310
  
--- Diff: python/pyspark/mllib/fpm.py ---
@@ -0,0 +1,67 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from pyspark import SparkContext
+from pyspark.mllib.common import JavaModelWrapper, callMLlibFunc, inherit_doc
+
+__all__ = ['FPGrowth', 'FPGrowthModel']
+
+
+@inherit_doc
+class FPGrowthModel(JavaModelWrapper):
+    """
+    A FP-Growth model for mining frequent itemsets using the Parallel FP-Growth algorithm.
+
+    >>> data = [["a", "b", "c"], ["a", "b", "d", "e"], ["a", "c", "e"], ["a", "c", "f"]]
+    >>> rdd = sc.parallelize(data, 2)
+    >>> model = FPGrowth.train(rdd, 0.6, 2)
+    >>> result = model.freqItemsets().collect()
+    >>> sorted(model.freqItemsets().collect())
+    [([u'a'], 4), ([u'c'], 3), ([u'c', u'a'], 3)]
+    """
+    def freqItemsets(self):
+        return self.call("getFreqItemsets")
+
+
+class FPGrowth(object):
+
+    @classmethod
+    def train(cls, data, minSupport=0.3, numPartitions=-1):
+        """
+        Computes an FP-Growth model that contains frequent itemsets.
+        :param data: The input data set, each element contains a transaction.
--- End diff --

line too wide





[GitHub] spark pull request: [SPARK-6264] [MLLIB] Support FPGrowth algorith...

2015-04-05 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/5213#discussion_r27781306
  
--- Diff: python/pyspark/mllib/fpm.py ---
@@ -0,0 +1,67 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from pyspark import SparkContext
+from pyspark.mllib.common import JavaModelWrapper, callMLlibFunc, inherit_doc
+
+__all__ = ['FPGrowth', 'FPGrowthModel']
+
+
+@inherit_doc
+class FPGrowthModel(JavaModelWrapper):
+    """
+    A FP-Growth model for mining frequent itemsets using the Parallel FP-Growth algorithm.
--- End diff --

In Python docstrings, we limit the line width to 72 characters (following
PEP 8). This limit doesn't apply to the code examples in the docstring.
Please update the docstrings in your PR.
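A minimal sketch of how the 72-character docstring rule could be checked mechanically, using only the standard-library `ast` module (Python 3.8+, for `end_lineno`). The helper name `wide_docstring_lines` is hypothetical and not part of Spark's lint tooling; it also assumes that only doctest prompt lines (`>>>` and `...`) count as the exempt code example, so doctest output lines are still measured.

```python
import ast

DOC_WIDTH = 72  # docstring line limit quoted in the review (PEP 8)


def wide_docstring_lines(source, limit=DOC_WIDTH):
    """Return (lineno, text) for docstring lines wider than `limit`.

    Physical source lines are measured, indentation included. Lines that
    start with a doctest prompt ('>>>' or '...') are skipped.
    """
    lines = source.splitlines()
    problems = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.Module, ast.ClassDef,
                                 ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        if not node.body:
            continue
        first = node.body[0]
        # A docstring is a bare string constant as the first statement.
        if not (isinstance(first, ast.Expr)
                and isinstance(first.value, ast.Constant)
                and isinstance(first.value.value, str)):
            continue
        for lineno in range(first.lineno, first.end_lineno + 1):
            text = lines[lineno - 1]
            if text.strip().startswith((">>>", "...")):
                continue
            if len(text) > limit:
                problems.append((lineno, text.strip()))
    return problems
```

Run over the module quoted in this diff, a check like this would flag the `A FP-Growth model ...` prose line while leaving the long `>>> data = ...` example line alone.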





[GitHub] spark pull request: [SPARK-6264] [MLLIB] Support FPGrowth algorith...

2015-04-05 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/5213#discussion_r27781309
  
--- Diff: python/pyspark/mllib/fpm.py ---
@@ -0,0 +1,67 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from pyspark import SparkContext
+from pyspark.mllib.common import JavaModelWrapper, callMLlibFunc, inherit_doc
+
+__all__ = ['FPGrowth', 'FPGrowthModel']
+
+
+@inherit_doc
+class FPGrowthModel(JavaModelWrapper):
+    """
+    A FP-Growth model for mining frequent itemsets using the Parallel FP-Growth algorithm.
+
+    >>> data = [["a", "b", "c"], ["a", "b", "d", "e"], ["a", "c", "e"], ["a", "c", "f"]]
+    >>> rdd = sc.parallelize(data, 2)
+    >>> model = FPGrowth.train(rdd, 0.6, 2)
+    >>> result = model.freqItemsets().collect()
+    >>> sorted(model.freqItemsets().collect())
+    [([u'a'], 4), ([u'c'], 3), ([u'c', u'a'], 3)]
+    """
+    def freqItemsets(self):
+        return self.call("getFreqItemsets")
+
+
+class FPGrowth(object):
--- End diff --

add doc





[GitHub] spark pull request: [SPARK-6264] [MLLIB] Support FPGrowth algorith...

2015-04-05 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/5213#discussion_r27781307
  
--- Diff: python/pyspark/mllib/fpm.py ---
@@ -0,0 +1,67 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from pyspark import SparkContext
+from pyspark.mllib.common import JavaModelWrapper, callMLlibFunc, inherit_doc
+
+__all__ = ['FPGrowth', 'FPGrowthModel']
+
+
+@inherit_doc
+class FPGrowthModel(JavaModelWrapper):
+    """
+    A FP-Growth model for mining frequent itemsets using the Parallel FP-Growth algorithm.
+
+    >>> data = [["a", "b", "c"], ["a", "b", "d", "e"], ["a", "c", "e"], ["a", "c", "f"]]
+    >>> rdd = sc.parallelize(data, 2)
+    >>> model = FPGrowth.train(rdd, 0.6, 2)
+    >>> result = model.freqItemsets().collect()
--- End diff --

remove this line





[GitHub] spark pull request: [SPARK-6264] [MLLIB] Support FPGrowth algorith...

2015-04-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5213#issuecomment-89633708
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29713/
Test PASSed.





[GitHub] spark pull request: [SPARK-6264] [MLLIB] Support FPGrowth algorith...

2015-03-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5213#issuecomment-87968740
  
[Test build #29463 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29463/consoleFull) for PR 5213 at commit [`a2d7cf7`](https://github.com/apache/spark/commit/a2d7cf797d7fc681ecf3a8dfd0908100d282f4ce).





[GitHub] spark pull request: [SPARK-6264] [MLLIB] Support FPGrowth algorith...

2015-03-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5213#issuecomment-87993418
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29463/
Test PASSed.





[GitHub] spark pull request: [SPARK-6264] [MLLIB] Support FPGrowth algorith...

2015-03-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5213#issuecomment-87993400
  
[Test build #29463 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29463/consoleFull) for PR 5213 at commit [`a2d7cf7`](https://github.com/apache/spark/commit/a2d7cf797d7fc681ecf3a8dfd0908100d282f4ce).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class FPGrowthModel(JavaModelWrapper):`
  * `class FPGrowth(object):`

 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-6264] [MLLIB] Support FPGrowth algorith...

2015-03-31 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/5213#discussion_r27506245
  
--- Diff: python/pyspark/mllib/fpm.py ---
@@ -0,0 +1,81 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from pyspark import SparkContext
+from pyspark.mllib.common import JavaModelWrapper, callMLlibFunc, inherit_doc
+
+__all__ = ['FPGrowth', 'FPGrowthModel']
+
+
+@inherit_doc
+class FPGrowthModel(JavaModelWrapper):
+    """
+    A FP-Growth model for mining frequent itemsets using the Parallel FP-Growth algorithm.
+
+    >>> r1 = ["r","z","h","k","p"]
+    >>> r2 = ["z","y","x","w","v","u","t","s"]
+    >>> r3 = ["s","x","o","n","r"]
+    >>> r4 = ["x","z","y","m","t","s","q","e"]
+    >>> r5 = ["z"]
+    >>> r6 = ["x","z","y","r","q","t","p"]
+    >>> rdd = sc.parallelize([r1,r2,r3,r4,r5,r6], 2)
+    >>> model = FPGrowth.train(rdd, 0.5, 2)
+    >>> result = model.freqItemsets().collect()
--- End diff --

Maybe we should increase the threshold to make the expected output shorter.
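To see how much a higher threshold shortens the expected output, an exhaustive count over the six transactions in this doctest is enough. This is not FP-Growth, only brute-force subset counting that is practical for tiny inputs; it assumes the minimum count is `ceil(minSupport * numTransactions)`, which is how MLlib's FPGrowth is understood to turn `minSupport` into a count (treat that as an assumption). `brute_force_itemsets` is a hypothetical helper name.

```python
import math
from collections import Counter
from itertools import combinations


def brute_force_itemsets(transactions, min_support):
    """Exhaustively count itemsets; only practical for tiny examples.

    Returns sorted (items, count) pairs, items as sorted tuples, keeping
    those whose count is at least ceil(min_support * len(transactions)).
    """
    min_count = math.ceil(min_support * len(transactions))
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        # Every non-empty subset of a transaction is supported by it.
        for size in range(1, len(items) + 1):
            counts.update(combinations(items, size))
    return sorted((items, n) for items, n in counts.items() if n >= min_count)


transactions = [
    ["r", "z", "h", "k", "p"],
    ["z", "y", "x", "w", "v", "u", "t", "s"],
    ["s", "x", "o", "n", "r"],
    ["x", "z", "y", "m", "t", "s", "q", "e"],
    ["z"],
    ["x", "z", "y", "r", "q", "t", "p"],
]

for min_support in (0.5, 0.6, 0.7):
    # prints "0.5 18", then "0.6 2", then "0.7 1"
    print(min_support, len(brute_force_itemsets(transactions, min_support)))
```

With these transactions, 0.5 keeps the 18 itemsets listed in the current `expected` value, while 0.6 keeps only `x` (4) and `z` (5), which would make the doctest output fit on one line.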





[GitHub] spark pull request: [SPARK-6264] [MLLIB] Support FPGrowth algorith...

2015-03-31 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/5213#discussion_r27506251
  
--- Diff: python/pyspark/mllib/fpm.py ---
@@ -0,0 +1,81 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from pyspark import SparkContext
+from pyspark.mllib.common import JavaModelWrapper, callMLlibFunc, inherit_doc
+
+__all__ = ['FPGrowth', 'FPGrowthModel']
+
+
+@inherit_doc
+class FPGrowthModel(JavaModelWrapper):
+    """
+    A FP-Growth model for mining frequent itemsets using the Parallel FP-Growth algorithm.
+
+    >>> r1 = ["r","z","h","k","p"]
+    >>> r2 = ["z","y","x","w","v","u","t","s"]
+    >>> r3 = ["s","x","o","n","r"]
+    >>> r4 = ["x","z","y","m","t","s","q","e"]
+    >>> r5 = ["z"]
+    >>> r6 = ["x","z","y","r","q","t","p"]
+    >>> rdd = sc.parallelize([r1,r2,r3,r4,r5,r6], 2)
+    >>> model = FPGrowth.train(rdd, 0.5, 2)
+    >>> result = model.freqItemsets().collect()
+    >>> expected = [([u"s"], 3), ([u"z"], 5), ([u"x"], 4), ([u"t"], 3), ([u"y"], 3), ([u"r"],3),
+    ... ([u"x", u"z"], 3), ([u"y", u"t"], 3), ([u"t", u"x"], 3), ([u"s",u"x"], 3),
+    ... ([u"y", u"x"], 3), ([u"y", u"z"], 3), ([u"t", u"z"], 3), ([u"y", u"x", u"z"], 3),
+    ... ([u"t", u"x", u"z"], 3), ([u"y", u"t", u"z"], 3), ([u"y", u"t", u"x"], 3),
+    ... ([u"y", u"t", u"x", u"z"], 3)]
+    >>> diff1 = [x for x in result if x not in expected]
+    >>> len(diff1)
+    0
+    >>> diff2 = [x for x in expected if x not in result]
+    >>> len(diff2)
+    0
+    """
+    def freqItemsets(self):
--- End diff --

An empty line before this line and a docstring are needed. It might be
convenient to follow the Java/Scala implementation and use a namedtuple to
wrap the result, so users can access `items` and `freq` instead of `[0]` and `[1]`.
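A minimal sketch of the namedtuple idea on the Python side, assuming the wrapped model hands back `(items, freq)` pairs as in the doctests in this thread. `FreqItemset` and `to_freq_itemsets` are hypothetical names chosen to mirror the `items`/`freq` fields the comment mentions, not a confirmed PySpark API.

```python
from collections import namedtuple

# Hypothetical Python counterpart of the Scala FreqItemset: the fields are
# named so callers can write `.items` / `.freq` instead of [0] / [1].
FreqItemset = namedtuple("FreqItemset", ["items", "freq"])


def to_freq_itemsets(pairs):
    """Wrap raw (items, freq) pairs, e.g. from collect(), as FreqItemset."""
    return [FreqItemset(list(items), int(freq)) for items, freq in pairs]


rows = to_freq_itemsets([([u"a"], 4), ([u"c"], 3), ([u"c", u"a"], 3)])
for row in rows:
    print(row.items, row.freq)
```

Because a namedtuple is still a tuple, `row[0]`, `row[1]` and `sorted(...)` keep working, so doctests that index or sort the collected result would not need to change.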





[GitHub] spark pull request: [SPARK-6264] [MLLIB] Support FPGrowth algorith...

2015-03-31 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/5213#discussion_r27506202
  
--- Diff: python/pyspark/mllib/fpm.py ---
@@ -0,0 +1,81 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from pyspark import SparkContext
+from pyspark.mllib.common import JavaModelWrapper, callMLlibFunc, inherit_doc
+
+__all__ = ['FPGrowth', 'FPGrowthModel']
+
+
+@inherit_doc
+class FPGrowthModel(JavaModelWrapper):
+    """
+    A FP-Growth model for mining frequent itemsets using the Parallel FP-Growth algorithm.
+
+    >>> r1 = ["r","z","h","k","p"]
+    >>> r2 = ["z","y","x","w","v","u","t","s"]
+    >>> r3 = ["s","x","o","n","r"]
+    >>> r4 = ["x","z","y","m","t","s","q","e"]
+    >>> r5 = ["z"]
+    >>> r6 = ["x","z","y","r","q","t","p"]
+    >>> rdd = sc.parallelize([r1,r2,r3,r4,r5,r6], 2)
+    >>> model = FPGrowth.train(rdd, 0.5, 2)
+    >>> result = model.freqItemsets().collect()
+    >>> expected = [([u"s"], 3), ([u"z"], 5), ([u"x"], 4), ([u"t"], 3), ([u"y"], 3), ([u"r"],3),
+    ... ([u"x", u"z"], 3), ([u"y", u"t"], 3), ([u"t", u"x"], 3), ([u"s",u"x"], 3),
+    ... ([u"y", u"x"], 3), ([u"y", u"z"], 3), ([u"t", u"z"], 3), ([u"y", u"x", u"z"], 3),
+    ... ([u"t", u"x", u"z"], 3), ([u"y", u"t", u"z"], 3), ([u"y", u"t", u"x"], 3),
+    ... ([u"y", u"t", u"x", u"z"], 3)]
+    >>> diff1 = [x for x in result if x not in expected]
--- End diff --

This is not necessary if the test above works.
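The two `diff` passes amount to checking that `result` and `expected` hold the same entries regardless of order. A minimal plain-Python sketch of a single check that does the same; the helper `normalize` is hypothetical, and it assumes the item order inside each itemset is not significant either, so it sorts at both levels.

```python
def normalize(itemsets):
    """Sort items within each itemset, then sort the itemsets themselves."""
    return sorted((sorted(items), freq) for items, freq in itemsets)


result = [(["z"], 5), (["y", "t"], 3), (["x"], 4)]
expected = [(["x"], 4), (["t", "y"], 3), (["z"], 5)]

# One equality check replaces diff1/diff2. Unlike the two membership-based
# passes, it also notices a duplicated entry on either side.
print(normalize(result) == normalize(expected))  # True
```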





[GitHub] spark pull request: [SPARK-6264] [MLLIB] Support FPGrowth algorith...

2015-03-31 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/5213#discussion_r27506158
  
--- Diff: python/pyspark/mllib/fpm.py ---
@@ -0,0 +1,81 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from pyspark import SparkContext
+from pyspark.mllib.common import JavaModelWrapper, callMLlibFunc, inherit_doc
+
+__all__ = ['FPGrowth', 'FPGrowthModel']
+
+
+@inherit_doc
+class FPGrowthModel(JavaModelWrapper):
+    """
+    A FP-Growth model for mining frequent itemsets using the Parallel FP-Growth algorithm.
+
+    >>> r1 = ["r","z","h","k","p"]
--- End diff --

Add a space after `,`, and make `r1, r2, ...` a single list. The doctests
are also used as code examples in the generated docs, so we should try to
keep them simple.





[GitHub] spark pull request: [SPARK-6264] [MLLIB] Support FPGrowth algorith...

2015-03-31 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/5213#discussion_r27506120
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala ---
@@ -407,6 +406,35 @@ private[python] class PythonMLLibAPI extends Serializable {
   }
 
   /**
+   * A Wrapper of FPGrowthModel to provide helper method for Python
+   */
+  private[python] class FPGrowthModelWrapper(model: FPGrowthModel[Any])
--- End diff --

It might be simpler to move this out of `class PythonMLLibAPI` to use with 
py4j. See #5243.





[GitHub] spark pull request: [SPARK-6264] [MLLIB] Support FPGrowth algorith...

2015-03-31 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/5213#discussion_r27506124
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala ---
@@ -407,6 +406,35 @@ private[python] class PythonMLLibAPI extends Serializable {
   }
 
   /**
+   * A Wrapper of FPGrowthModel to provide helper method for Python
+   */
+  private[python] class FPGrowthModelWrapper(model: FPGrowthModel[Any])
+    extends FPGrowthModel(model.freqItemsets) {
+
+    def getFreqItemsets: RDD[Array[Any]] = {
+      SerDe.fromTuple2RDD(model.freqItemsets.map(x => (x.javaItems, x.freq)))
+    }
+  }
+
+  /**
+   * Java stub for Python mllib FPGrowth.train().  This stub returns a handle
+   * to the Java object instead of the content of the Java object.  Extra care
+   * needs to be taken in the Python code to ensure it gets freed on exit; see
+   * the Py4J documentation.
+   */
+  def trainFPGrowthModel(
+      data: JavaRDD[java.lang.Iterable[Any]],
+      minSupport: Double,
+      numPartitions: Int): FPGrowthModel[Any] = {
+    val fpm = new FPGrowth()
--- End diff --

`fpm` -> `fpg`





[GitHub] spark pull request: [SPARK-6264] [MLLIB] Support FPGrowth algorith...

2015-03-31 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/5213#discussion_r27506168
  
--- Diff: python/pyspark/mllib/fpm.py ---
@@ -0,0 +1,81 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from pyspark import SparkContext
+from pyspark.mllib.common import JavaModelWrapper, callMLlibFunc, inherit_doc
+
+__all__ = ['FPGrowth', 'FPGrowthModel']
+
+
+@inherit_doc
+class FPGrowthModel(JavaModelWrapper):
+    """
+    A FP-Growth model for mining frequent itemsets using the Parallel FP-Growth algorithm.
+
+    >>> r1 = ["r","z","h","k","p"]
+    >>> r2 = ["z","y","x","w","v","u","t","s"]
+    >>> r3 = ["s","x","o","n","r"]
+    >>> r4 = ["x","z","y","m","t","s","q","e"]
+    >>> r5 = ["z"]
+    >>> r6 = ["x","z","y","r","q","t","p"]
+    >>> rdd = sc.parallelize([r1,r2,r3,r4,r5,r6], 2)
+    >>> model = FPGrowth.train(rdd, 0.5, 2)
+    >>> result = model.freqItemsets().collect()
--- End diff --

Use
~~~
>>> sorted(model.freqItemsets().collect())
~~~

and put the results in the expected output to verify them.
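One way to produce that expected output is to compute it independently on the small doctest input. The sketch below is a brute-force counter, not FP-Growth: it enumerates every subset of every transaction, so it only suits tiny test data. It assumes the minimum count is `ceil(minSupport * numTransactions)`; the name `brute_force_freq_itemsets` is hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def brute_force_freq_itemsets(transactions, min_support):
    """Return sorted (items, freq) pairs for itemsets meeting min_support."""
    min_count = math.ceil(min_support * len(transactions))
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        # Exponential in transaction length: fine for doctest-sized inputs only.
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                counts[combo] += 1
    return sorted((list(c), n) for c, n in counts.items() if n >= min_count)


data = [["r", "z", "h", "k", "p"],
        ["z", "y", "x", "w", "v", "u", "t", "s"],
        ["s", "x", "o", "n", "r"],
        ["x", "z", "y", "m", "t", "s", "q", "e"],
        ["z"],
        ["x", "z", "y", "r", "q", "t", "p"]]
for itemset in brute_force_freq_itemsets(data, 0.5):
    print(itemset)
```

Items are sorted within each itemset here, whereas the model may return them in another order, so compare after normalizing both sides.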





[GitHub] spark pull request: [SPARK-6264] [MLLIB] Support FPGrowth algorith...

2015-03-30 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/5213#issuecomment-87813779
  
@yanboliang  Thanks for the updates.  Can you please fix the merge issues?  
(Rebasing off of the current master is often easiest.)

Also, can you please add documentation to FPGrowth.train()?  Copying 
algorithm + parameter documentation from the Scala docs should be fine.  That 
should be it.





[GitHub] spark pull request: [SPARK-6264] [MLLIB] Support FPGrowth algorith...

2015-03-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5213#issuecomment-87372913
  
  [Test build #29364 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29364/consoleFull)
 for   PR 5213 at commit 
[`e3f17cb`](https://github.com/apache/spark/commit/e3f17cbafea25013c98f389782d4d3c37f4e1dca).
 * This patch **does not merge cleanly**.





[GitHub] spark pull request: [SPARK-6264] [MLLIB] Support FPGrowth algorith...

2015-03-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5213#issuecomment-87388652
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29364/
Test PASSed.





[GitHub] spark pull request: [SPARK-6264] [MLLIB] Support FPGrowth algorith...

2015-03-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5213#issuecomment-87388637
  
  [Test build #29364 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29364/consoleFull)
 for   PR 5213 at commit 
[`e3f17cb`](https://github.com/apache/spark/commit/e3f17cbafea25013c98f389782d4d3c37f4e1dca).
 * This patch **passes all tests**.

 * This patch **does not merge cleanly**.
 * This patch adds the following public classes _(experimental)_:
  * `class FPGrowthModel(JavaModelWrapper):`
  * `class FPGrowth(object):`






[GitHub] spark pull request: [SPARK-6264] [MLLIB] Support FPGrowth algorith...

2015-03-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/5213#discussion_r27331205
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala ---
@@ -425,6 +426,33 @@ private[python] class PythonMLLibAPI extends Serializable {
   }
 
   /**
+   * A Wrapper of FPGrowthModel to provide helpfer method for Python
+   */
+  private[python] class FPGrowthModelWrapper(model: FPGrowthModel[Any])
+    extends FPGrowthModel(model.freqItemsets) {
+    def getFreqItemsets: RDD[Array[Any]] = {
--- End diff --

style: insert newline between class header & method def (since the class 
header spans multiple lines)





[GitHub] spark pull request: [SPARK-6264] [MLLIB] Support FPGrowth algorith...

2015-03-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/5213#discussion_r27331216
  
--- Diff: python/pyspark/mllib/fpm.py ---
@@ -0,0 +1,74 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from pyspark import SparkContext
+from pyspark.mllib.common import JavaModelWrapper, callMLlibFunc, inherit_doc
+
+__all__ = ['FPGrowth', 'FPGrowthModel']
+
+
+@inherit_doc
+class FPGrowthModel(JavaModelWrapper):
+    """
+    A FP-Growth model for mining frequent itemsets using the Parallel FP-Growth algorithm.
+
+    >>> r1 = ["r","z","h","k","p"]
+    >>> r2 = ["z","y","x","w","v","u","t","s"]
+    >>> r3 = ["s","x","o","n","r"]
+    >>> r4 = ["x","z","y","m","t","s","q","e"]
+    >>> r5 = ["z"]
+    >>> r6 = ["x","z","y","r","q","t","p"]
+    >>> rdd = sc.parallelize([r1,r2,r3,r4,r5,r6], 2)
+    >>> model = FPGrowth.train(rdd, 0.5, 2)
+    >>> result = model.freqItemsets().collect()
+    >>> expected = [([u"s"], 3), ([u"z"], 5), ([u"x"], 4), ([u"t"], 3), ([u"y"], 3), ([u"r"],3),
+    ... ([u"x", u"z"], 3), ([u"y", u"t"], 3), ([u"t", u"x"], 3), ([u"s",u"x"], 3),
+    ... ([u"y", u"x"], 3), ([u"y", u"z"], 3), ([u"t", u"z"], 3), ([u"y", u"x", u"z"], 3),
+    ... ([u"t", u"x", u"z"], 3), ([u"y", u"t", u"z"], 3), ([u"y", u"t", u"x"], 3),
+    ... ([u"y", u"t", u"x", u"z"], 3)]
+    >>> diff1 = [x for x in result if x not in expected]
+    >>> len(diff1)
+    0
+    >>> diff2 = [x for x in expected if x not in result]
+    >>> len(diff2)
+    0
+    """
+    def freqItemsets(self):
+        return self.call("getFreqItemsets")
+
+
+class FPGrowth(object):
+
+    @classmethod
+    def train(cls, data, minSupport=0.3, numPartition=-1):
--- End diff --

numPartition -> numPartitions (with an s)





[GitHub] spark pull request: [SPARK-6264] [MLLIB] Support FPGrowth algorith...

2015-03-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/5213#discussion_r27331201
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala ---
@@ -425,6 +426,33 @@ private[python] class PythonMLLibAPI extends Serializable {
   }
 
   /**
+   * A Wrapper of FPGrowthModel to provide helpfer method for Python
--- End diff --

typo: helper





[GitHub] spark pull request: [SPARK-6264] [MLLIB] Support FPGrowth algorith...

2015-03-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/5213#discussion_r27331208
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala ---
@@ -425,6 +426,33 @@ private[python] class PythonMLLibAPI extends Serializable {
   }
 
   /**
+   * A Wrapper of FPGrowthModel to provide helpfer method for Python
+   */
+  private[python] class FPGrowthModelWrapper(model: FPGrowthModel[Any])
+    extends FPGrowthModel(model.freqItemsets) {
+    def getFreqItemsets: RDD[Array[Any]] = {
+      SerDe.fromTuple2RDD(model.freqItemsets.map(x => (x.javaItems, x.freq)))
+    }
+  }
+
+  /**
+   * Java stub for Python mllib FPGrowth.train().  This stub returns a handle
+   * to the Java object instead of the content of the Java object.  Extra care
+   * needs to be taken in the Python code to ensure it gets freed on exit; see
+   * the Py4J documentation.
+   */
+  def trainFPGrowthModel(data: JavaRDD[java.lang.Iterable[Any]],
--- End diff --

style: put data on next line





[GitHub] spark pull request: [SPARK-6264] [MLLIB] Support FPGrowth algorith...

2015-03-27 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/5213#issuecomment-87092628
  
Let's keep it Experimental for now; we can hopefully remove that tag before 
the 1.4 release if no issues come up before then.

Also, can you please add doc to match the Scaladoc?  (We've been lazy about 
this with Python but should be better about making the docs match.)

Please edit python/docs/pyspark.mllib.rst to generate docs for Python.  I'd 
follow the pyspark.mllib.recommendation module for settings.

Thanks!
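A sketch of what the requested docs could look like, assuming Sphinx-style `:param:` fields and a `.. note:: Experimental` marker as used elsewhere in pyspark.mllib. The parameter descriptions paraphrase the Scala `FPGrowth` semantics as we understand them (a minimum count of `ceil(minSupport * count)`, and a non-positive `numPartitions` falling back to the input's partitioning) and should be checked against the Scaladoc. The class here is a hypothetical stub; the PR's real `train` delegates to the JVM.

```python
class FPGrowth(object):
    """
    .. note:: Experimental

    A parallel FP-Growth algorithm to mine frequent itemsets.
    """

    @classmethod
    def train(cls, data, minSupport=0.3, numPartitions=-1):
        """
        Computes an FP-Growth model that contains frequent itemsets.

        :param data: RDD of transactions; each element is an iterable of items.
        :param minSupport: minimal support level; an itemset is frequent if it
            appears in at least ceil(minSupport * count) transactions
            (default: 0.3).
        :param numPartitions: number of partitions used by parallel FP-Growth;
            a non-positive value keeps the input's partitioning (default: -1).
        """
        # Documentation stub only: no Spark call is made here.
        raise NotImplementedError("documentation sketch")
```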





[GitHub] spark pull request: [SPARK-6264] [MLLIB] Support FPGrowth algorith...

2015-03-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5213#issuecomment-86592836
  
  [Test build #29237 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29237/consoleFull)
 for   PR 5213 at commit 
[`4f26944`](https://github.com/apache/spark/commit/4f269441c7cd6b1396342f19dad804a98dbb5350).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-6264] [MLLIB] Support FPGrowth algorith...

2015-03-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5213#issuecomment-86644206
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29240/
Test PASSed.





[GitHub] spark pull request: [SPARK-6264] [MLLIB] Support FPGrowth algorith...

2015-03-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5213#issuecomment-86644183
  
  [Test build #29240 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29240/consoleFull)
 for   PR 5213 at commit 
[`546494a`](https://github.com/apache/spark/commit/546494a4c5e0ee245f3a2e30ab0378d7462bf46c).
 * This patch **passes all tests**.

 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class FPGrowthModel(JavaModelWrapper):`
  * `class FPGrowth(object):`






[GitHub] spark pull request: [SPARK-6264] [MLLIB] Support FPGrowth algorith...

2015-03-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5213#issuecomment-86596561
  
  [Test build #29237 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29237/consoleFull)
 for   PR 5213 at commit 
[`4f26944`](https://github.com/apache/spark/commit/4f269441c7cd6b1396342f19dad804a98dbb5350).
 * This patch **fails to build**.

 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class FPGrowthModel(JavaModelWrapper):`
  * `class FPGrowth(object):`






[GitHub] spark pull request: [SPARK-6264] [MLLIB] Support FPGrowth algorith...

2015-03-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5213#issuecomment-86596572
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29237/
Test FAILed.





[GitHub] spark pull request: [SPARK-6264] [MLLIB] Support FPGrowth algorith...

2015-03-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5213#issuecomment-86604567
  
  [Test build #29240 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29240/consoleFull)
 for   PR 5213 at commit 
[`546494a`](https://github.com/apache/spark/commit/546494a4c5e0ee245f3a2e30ab0378d7462bf46c).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-6264] [MLLIB] Support FPGrowth algorith...

2015-03-26 Thread yanboliang
GitHub user yanboliang opened a pull request:

https://github.com/apache/spark/pull/5213

[SPARK-6264] [MLLIB] Support FPGrowth algorithm in Python API

Support FPGrowth algorithm in Python API
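For readers new to the algorithm behind this API, here is a compact single-machine FP-Growth sketch in plain Python. It is not Spark's implementation (Spark runs a parallel FP-Growth variant across partitions); it only illustrates the core idea: build a prefix tree of frequency-ordered transactions, then recursively mine each item's conditional pattern base. The minimum-count rule `ceil(minSupport * numTransactions)` and all names (`fp_growth`, `_Node`, `_build_tree`, `_mine`) are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict


class _Node:
    """An FP-tree node: an item, its (weighted) count, its parent and children."""
    __slots__ = ("item", "count", "parent", "children")

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def _build_tree(weighted, min_count):
    """Build an FP-tree from (items, weight) pairs; return counts and header table."""
    counts = Counter()
    for items, w in weighted:
        for it in set(items):
            counts[it] += w
    frequent = {it: c for it, c in counts.items() if c >= min_count}
    root = _Node(None, None)
    header = defaultdict(list)  # item -> every tree node holding that item
    for items, w in weighted:
        # Frequent items only, most frequent first (ties broken by item value),
        # so transactions sharing a prefix share a path in the tree.
        path = sorted((it for it in set(items) if it in frequent),
                      key=lambda it: (-frequent[it], it))
        node = root
        for it in path:
            child = node.children.get(it)
            if child is None:
                child = _Node(it, node)
                node.children[it] = child
                header[it].append(child)
            child.count += w
            node = child
    return frequent, header


def _mine(weighted, min_count, suffix, out):
    frequent, header = _build_tree(weighted, min_count)
    for item, count in frequent.items():
        itemset = [item] + suffix
        out.append((sorted(itemset), count))
        # Conditional pattern base: the prefix path above each node of `item`,
        # weighted by that node's count.
        base = []
        for node in header[item]:
            prefix, p = [], node.parent
            while p.item is not None:
                prefix.append(p.item)
                p = p.parent
            if prefix:
                base.append((prefix, node.count))
        _mine(base, min_count, itemset, out)


def fp_growth(transactions, min_support):
    """Return sorted (items, freq) pairs for all frequent itemsets."""
    min_count = math.ceil(min_support * len(transactions))
    out = []
    _mine([(t, 1) for t in transactions], min_count, [], out)
    return sorted(out)


data = [["r", "z", "h", "k", "p"],
        ["z", "y", "x", "w", "v", "u", "t", "s"],
        ["s", "x", "o", "n", "r"],
        ["x", "z", "y", "m", "t", "s", "q", "e"],
        ["z"],
        ["x", "z", "y", "r", "q", "t", "p"]]
for itemset, freq in fp_growth(data, 0.5):
    print(itemset, freq)
```

Each frequent itemset is emitted exactly once because a conditional base only contains items ordered before the item being mined, which is what lets FP-Growth avoid the candidate generation of Apriori-style approaches.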

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yanboliang/spark spark-6264

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5213.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5213


commit a924861b3a6e587d944b0f45b4cdd9229efa9e5a
Author: Yanbo Liang <yblia...@gmail.com>
Date:   2015-03-26T15:42:17Z

Support FPGrowth algorithm in Python API

commit 59b1feee3ad7e744a1ec4454c9e54b0d2edcffd1
Author: Yanbo Liang <yblia...@gmail.com>
Date:   2015-03-26T15:55:40Z

add fpm to __init__.py

commit 4f269441c7cd6b1396342f19dad804a98dbb5350
Author: Yanbo Liang <yblia...@gmail.com>
Date:   2015-03-26T15:56:52Z

fix typos



