Re: The portability of MAHOUT platform to python
Hey Ted,

For maths, Python has the NumPy and SciPy libraries, which perform numerical calculations in Python. I am stressing Python because it has a large collection of libraries that future Mahout users could exploit if the platform were available in Python. This is similar to what the Spark project is doing.

Vibhanshu

On Tue, Oct 14, 2014 at 9:06 AM, Ted Dunning ted.dunn...@gmail.com wrote:

It is plausible to port some of the newer Scala stuff to Python. It would take some thought about the right way to do it. The kicker is going to be that a lot of what Mahout does bottoms out in math that is written in Java. How that would work from Python is mysterious to me.

On Mon, Oct 13, 2014 at 9:18 PM, Vibhanshu Prasad vibhanshugs...@gmail.com wrote:

Hello everyone,

I am a college student who wants to contribute to the development of the Mahout library. I have been using it for the last year and have been mesmerized by its features. I would like to know whether anyone is working on porting the whole platform to Python. If not, is there any way I can start doing it, given that I am not a committer yet?

Regards,
Vibhanshu
Re: The portability of MAHOUT platform to python
Sure, and please do the same if you find something new.

Vibhanshu

On Tue, Oct 14, 2014 at 10:45 AM, uday kiran marupati.udayki...@gmail.com wrote:

Hi Prasad,

I am also interested in Mahout development. Could you please let me know when work begins or when development of a new module starts?

Regards,
Udaykiran Reddy
[jira] [Created] (MAHOUT-1621) k-fold cross-validation in MapReduce Random Forest example?
Tawfiq Hasanin created MAHOUT-1621:
--

Summary: k-fold cross-validation in MapReduce Random Forest example?
Key: MAHOUT-1621
URL: https://issues.apache.org/jira/browse/MAHOUT-1621
Project: Mahout
Issue Type: Question
Components: Examples
Environment: Ubuntu Linux 14.04
Reporter: Tawfiq Hasanin
Fix For: 1.0

My goal is to modify the MapReduce Random Forest example by combining BuildForest.java and TestForest.java into a new class called RandomForest.java. The main point is to take a single input data file and use it for both training and testing, with k-fold cross-validation. I have a big dataset with high-dimensional features and a small number of instances.

This seems to be a frustrating dead end. Is this process achievable, or is it against the nature of MapReduce?

Thanks.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
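The k-fold splitting step asked about above can be sketched independently of MapReduce. The following is a minimal illustration in plain Python, not Mahout code: `k_fold_indices` is a hypothetical helper name, and the idea that each training subset would be handed to something like BuildForest and each held-out subset to something like TestForest is an assumption about how the pieces might fit, not a description of the existing example.

```python
import random


def k_fold_indices(n_rows, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Hypothetical helper for illustration; row indices stand in for the
    records of a single input data file.
    """
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and n_rows")
    indices = list(range(n_rows))
    # Shuffle once with a fixed seed so the folds are reproducible.
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at offset i,
    # so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        # Training set is the union of all folds except the held-out one.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


if __name__ == "__main__":
    for fold, (train, test) in enumerate(k_fold_indices(10, 5)):
        print(fold, len(train), len(test))
```

Each row lands in exactly one test fold, so every instance is used for testing once and for training k - 1 times, which matters when instances are scarce. Whether the k training runs can be scheduled efficiently as MapReduce jobs is a separate question that this sketch does not address.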
Build failed in Jenkins: Mahout-Examples-Cluster-Reuters-II #976
See https://builds.apache.org/job/Mahout-Examples-Cluster-Reuters-II/976/

--
[...truncated 196 lines...]
    at org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPConnection.java:382)
    ... 31 more
Caused by: svn: E175002: OPTIONS request failed on '/repos/asf/mahout/trunk'
    at org.tmatesoft.svn.core.SVNErrorMessage.create(SVNErrorMessage.java:208)
    at org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection._request(HTTPConnection.java:775)
    ... 32 more
Caused by: svn: E175002: timed out waiting for server
    at org.tmatesoft.svn.core.SVNErrorMessage.create(SVNErrorMessage.java:208)
    at org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection._request(HTTPConnection.java:514)
    ... 32 more
Caused by: java.net.SocketTimeoutException: connect timed out
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:579)
    at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:618)
    at org.tmatesoft.svn.core.internal.util.SVNSocketConnection.run(SVNSocketConnection.java:57)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    ... 4 more
java.io.IOException: remote file operation failed: https://builds.apache.org/job/Mahout-Examples-Cluster-Reuters-II/ws/ at hudson.remoting.Channel@4c1e596d:ubuntu-6
    at hudson.FilePath.act(FilePath.java:910)
    at hudson.FilePath.act(FilePath.java:887)
    at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:936)
    at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:871)
    at hudson.model.AbstractProject.checkout(AbstractProject.java:1414)
    at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:671)
    at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:88)
    at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:580)
    at hudson.model.Run.execute(Run.java:1676)
    at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
    at hudson.model.ResourceController.execute(ResourceController.java:88)
    at hudson.model.Executor.run(Executor.java:231)
Caused by: java.io.IOException: Failed to check out https://svn.apache.org/repos/asf/mahout/trunk
    at hudson.scm.subversion.CheckoutUpdater$1.perform(CheckoutUpdater.java:110)
    at hudson.scm.subversion.WorkspaceUpdater$UpdateTask.delegateTo(WorkspaceUpdater.java:161)
    at hudson.scm.SubversionSCM$CheckOutTask.perform(SubversionSCM.java:1030)
    at hudson.scm.SubversionSCM$CheckOutTask.invoke(SubversionSCM.java:1011)
    at hudson.scm.SubversionSCM$CheckOutTask.invoke(SubversionSCM.java:987)
    at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2462)
    at hudson.remoting.UserRequest.perform(UserRequest.java:118)
    at hudson.remoting.UserRequest.perform(UserRequest.java:48)
    at hudson.remoting.Request$2.run(Request.java:328)
    at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.tmatesoft.svn.core.SVNException: svn: E175002: OPTIONS /repos/asf/mahout/trunk failed
    at org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPConnection.java:388)
    at org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPConnection.java:373)
    at org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPConnection.java:361)
    at org.tmatesoft.svn.core.internal.io.dav.DAVConnection.performHttpRequest(DAVConnection.java:707)
    at org.tmatesoft.svn.core.internal.io.dav.DAVConnection.exchangeCapabilities(DAVConnection.java:627)
    at org.tmatesoft.svn.core.internal.io.dav.DAVConnection.open(DAVConnection.java:102)
    at org.tmatesoft.svn.core.internal.io.dav.DAVRepository.openConnection(DAVRepository.java:1020)
    at org.tmatesoft.svn.core.internal.io.dav.DAVRepository.getLatestRevision(DAVRepository.java:180)
    at org.tmatesoft.svn.core.internal.wc16.SVNBasicDelegate.getRevisionNumber(SVNBasicDelegate.java:480)
    at org.tmatesoft.svn.core.internal.wc16.SVNBasicDelegate.getLocations(SVNBasicDelegate.java:833)
    at
Re: How to build a recommendation system based on mahout serving millions even billions of users ?
You should move forward to version 0.9.

Take a look at more recent methods in this book: https://www.mapr.com/practical-machine-learning

On Tue, Oct 14, 2014 at 2:43 AM, 王建国 jordanhao...@gmail.com wrote:

Hi Owen and all,

I am a developer from China. I am building a recommendation system based on Mahout version 0.9. Since the user IDs and item IDs are strings, I need to map them to long. But I found that there is a long-to-int mapping provided by the function int TasteHadoopUtils.idToIndex(long). Considering that there may be millions or even billions of users, I wonder whether many longs could be mapped to the same int. If so, that would seem to do a lot of harm. This is quite confusing. What solution should I choose in this situation? Meanwhile, I read your answer, quoted below. Could you please tell me which long-indexed data structure you use in Myrrix?

Thanks in advance.
wangjiangwei

Question:

I have read some code about item-based recommendation in version 0.6, starting from org.apache.mahout.cf.taste.hadoop.item.RecommenderJob. I found that there is a long-to-int mapping provided by the function int TasteHadoopUtils.idToIndex(long). The long-to-int mapping is performed on both userId and itemId. I wonder whether it is possible for two longs to be mapped to the same int. If so, we would be likely to merge vectors from different item IDs or user IDs, right? This is quite confusing. Would it be better to provide a RandomAccessSparseVector implemented with OpenLongDoubleHashMap instead of OpenIntDoubleHashMap?

Thanks in advance.
Wei Feng

Answer:

That's right. It ought to be uncommon but can happen. For recommenders, it only means that you start to treat two users or two items as the same thing. That doesn't do much harm though. Maybe one user's recs are a little funny.

I do think it would have been useful to index by long, but that would have significantly increased memory requirements too. (In developing Myrrix I have switched to use a data structure indexed by long though, because it becomes more necessary to avoid the mapping.)

Sean Owen
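The collision concern in the question and answer above can be made concrete with a small sketch. The `id_to_index` function below is a hypothetical stand-in that folds a 64-bit ID into a non-negative 31-bit index by XOR-ing its two halves; it is an assumption for illustration, not a verified copy of `TasteHadoopUtils.idToIndex`. The estimate in `expected_colliding_pairs` further assumes that IDs spread uniformly over the 2^31 available slots, which real ID distributions may not do.

```python
def id_to_index(long_id: int) -> int:
    """Fold a 64-bit ID into a non-negative 31-bit index.

    Hypothetical stand-in for a long-to-int mapping; not Mahout's code.
    """
    u = long_id & 0xFFFFFFFFFFFFFFFF       # view the ID as unsigned 64-bit
    folded = (u ^ (u >> 32)) & 0xFFFFFFFF  # XOR the high half into the low half
    return folded & 0x7FFFFFFF             # clear the sign bit


def expected_colliding_pairs(n_ids: int, n_slots: int = 2**31) -> float:
    """Expected number of ID pairs sharing a slot under uniform hashing.

    There are n(n-1)/2 pairs, and each collides with probability 1/n_slots.
    """
    return n_ids * (n_ids - 1) / (2 * n_slots)


if __name__ == "__main__":
    # Two distinct IDs that this folding maps to the same index.
    print(id_to_index(1), id_to_index(1 << 32))
    for n in (10**6, 10**8, 10**9):
        print(n, expected_colliding_pairs(n))
```

Under these assumptions the expected number of colliding pairs grows quadratically with the number of IDs: roughly 233 pairs for one million IDs, but hundreds of millions of pairs for one billion IDs. That is consistent with the answer's view that collisions are uncommon at modest scale, and it suggests why a long-indexed structure becomes more attractive as the ID count approaches the size of the int range.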
[jira] [Commented] (MAHOUT-1516) run classify-20newsgroups.sh failed cause by /tmp/mahout-work-jpan/20news-all does not exists in hdfs.
[ https://issues.apache.org/jira/browse/MAHOUT-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171809#comment-14171809 ]

Tim Groeneveld commented on MAHOUT-1516:

This bug has been tagged as 'PATCH AVAILABLE'. Where is this patch? Or is it just that the issue has now been resolved in trunk?

run classify-20newsgroups.sh failed cause by /tmp/mahout-work-jpan/20news-all does not exists in hdfs.
--

Key: MAHOUT-1516
URL: https://issues.apache.org/jira/browse/MAHOUT-1516
Project: Mahout
Issue Type: Bug
Components: Examples
Affects Versions: 0.9
Environment: hadoop2.2.0 mahout0.9 ubuntu12.04
Reporter: Jian Pan
Priority: Minor
Labels: patch
Fix For: 1.0

+ echo 'Copying 20newsgroups data to HDFS'
Copying 20newsgroups data to HDFS
+ set +e
+ /home/jpan/Software/hadoop-2.2.0/bin/hadoop dfs -rmr /tmp/mahout-work-jpan/20news-all
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
rmr: DEPRECATED: Please use 'rm -r' instead.
14/04/17 10:26:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
rmr: `/tmp/mahout-work-jpan/20news-all': No such file or directory
+ set -e
+ /home/jpan/Software/hadoop-2.2.0/bin/hadoop dfs -put /tmp/mahout-work-jpan/20news-all /tmp/mahout-work-jpan/20news-all
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
14/04/17 10:26:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
put: `/tmp/mahout-work-jpan/20news-all': No such file or directory

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: How to build a recommendation system based on mahout serving millions even billions of users ?
Thank you very much! It is version 0.9 that I am learning now. I will read the book as you advised.

2014-10-15 5:47 GMT+08:00 Ted Dunning ted.dunn...@gmail.com:

You should move forward to version 0.9.

Take a look at more recent methods in this book: https://www.mapr.com/practical-machine-learning
Re: How to build a recommendation system based on mahout serving millions even billions of users ?
Hi Ted,

I don't know why I can't download the book. Maybe my network connection is very poor. Could you send it to me? I am looking forward to reading it. Thanks.

2014-10-15 5:47 GMT+08:00 Ted Dunning ted.dunn...@gmail.com:

You should move forward to version 0.9.

Take a look at more recent methods in this book: https://www.mapr.com/practical-machine-learning