Re: The portability of MAHOUT platform to python

2014-10-14 Thread Vibhanshu Prasad
Hey Ted,

For math, Python has the NumPy and SciPy libraries, which perform the
calculations in Python.
The reason I am stressing Python is that Python has a large
collection of libraries that future Mahout users could exploit if they were
given this platform in Python. This is similar to what the Spark
project is doing.
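To make the NumPy point concrete, here is a minimal sketch (assuming NumPy is installed; the vectors are made up for illustration) of the kind of linear algebra Mahout's math layer provides:

```python
import numpy as np

# Two made-up item vectors (illustrative only).
a = np.array([1.0, 0.0, 2.0])
b = np.array([2.0, 1.0, 0.0])

# Cosine similarity -- the kind of vector math Mahout's Java math
# layer does -- expressed directly with NumPy.
cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))  # cos is approx. 0.4
```

SciPy adds sparse matrices and solvers on top of this, which is much of what a Python port would lean on.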

Vibhanshu


On Tue, Oct 14, 2014 at 9:06 AM, Ted Dunning ted.dunn...@gmail.com wrote:

 It is plausible to port some of the newer Scala code to Python.  It would
 take some thought about the right way to do it.

 The kicker is going to be that a lot of what Mahout does bottoms out in
 math that is written in Java.  How that would work from Python is
 mysterious to me.


 On Mon, Oct 13, 2014 at 9:18 PM, Vibhanshu Prasad 
 vibhanshugs...@gmail.com
 wrote:

  Hello Everyone,
 
  I am a college student who wants to contribute towards the development of
  the Mahout library. I have been using it for the last year and was
  mesmerized by its features.
 
  I wanted to know if someone is working on porting this whole
  platform to Python.
 
  If not, is there any way I can start doing it, provided
  that I am not a committer yet?
 
  Regards
  Vibhanshu
 



Re: The portability of MAHOUT platform to python

2014-10-14 Thread Vibhanshu Prasad
Sure,
please do the same if you also find something new.

Vibhanshu

On Tue, Oct 14, 2014 at 10:45 AM, uday kiran marupati.udayki...@gmail.com
wrote:

 Hi Prasad,

 I am also interested in Mahout development. Could you please let me know when
 the work starts or when development of a new module begins?

 Regards,
 Udaykiran Reddy

 On Tue, Oct 14, 2014 at 6:48 AM, Vibhanshu Prasad 
 vibhanshugs...@gmail.com
 wrote:

  Hello Everyone,
 
  I am a college student who wants to contribute towards the development of
  the Mahout library. I have been using it for the last year and was
  mesmerized by its features.
 
  I wanted to know if someone is working on porting this whole
  platform to Python.
 
  If not, is there any way I can start doing it, provided
  that I am not a committer yet?
 
  Regards
  Vibhanshu
 



[jira] [Created] (MAHOUT-1621) k-fold cross-validation in MapReduce Random Forest example?

2014-10-14 Thread Tawfiq Hasanin (JIRA)
Tawfiq Hasanin created MAHOUT-1621:
--

 Summary: k-fold cross-validation in MapReduce Random Forest example?
 Key: MAHOUT-1621
 URL: https://issues.apache.org/jira/browse/MAHOUT-1621
 Project: Mahout
  Issue Type: Question
  Components: Examples
 Environment: Ubuntu Linux 14.04
Reporter: Tawfiq Hasanin
 Fix For: 1.0


My goal is to modify the MapReduce Random Forest example by combining 
BuildForest.java and TestForest.java into a new class called RandomForest.java.

The main point is to input one data file which is to be used for both training 
and testing, with k-fold cross-validation. 

I have a large dataset with high-dimensional features and a small number of 
instances. 

This seems to be a frustrating dead end. Is this process achievable, or is it 
against the nature of MapReduce? 

Thanks.
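Independent of how Mahout's classes would wire this up, the index bookkeeping behind k-fold cross-validation is straightforward. A driver-side sketch (the function names are hypothetical, not Mahout API; BuildForest/TestForest would consume the resulting splits):

```python
import random

def k_fold_indices(n_rows, k, seed=42):
    """Shuffle row indices and deal them into k disjoint folds."""
    rng = random.Random(seed)
    idx = list(range(n_rows))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validation_splits(n_rows, k):
    """Yield (train, test) index lists; each fold serves as the test set once."""
    folds = k_fold_indices(n_rows, k)
    for i, test in enumerate(folds):
        # Train on the k-1 folds that are not the held-out fold.
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        yield train, test
```

With few instances and high-dimensional features, this split loop runs k small training jobs rather than one, which is extra MapReduce overhead but not against its nature.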



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Build failed in Jenkins: Mahout-Examples-Cluster-Reuters-II #976

2014-10-14 Thread Apache Jenkins Server
See https://builds.apache.org/job/Mahout-Examples-Cluster-Reuters-II/976/

--
[...truncated 196 lines...]
	at org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPConnection.java:382)
	... 31 more
Caused by: svn: E175002: OPTIONS request failed on '/repos/asf/mahout/trunk'
	at org.tmatesoft.svn.core.SVNErrorMessage.create(SVNErrorMessage.java:208)
	at org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection._request(HTTPConnection.java:775)
	... 32 more
Caused by: svn: E175002: timed out waiting for server
	at org.tmatesoft.svn.core.SVNErrorMessage.create(SVNErrorMessage.java:208)
	at org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection._request(HTTPConnection.java:514)
	... 32 more
Caused by: java.net.SocketTimeoutException: connect timed out
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:579)
	at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:618)
	at org.tmatesoft.svn.core.internal.util.SVNSocketConnection.run(SVNSocketConnection.java:57)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	... 4 more
java.io.IOException: remote file operation failed: https://builds.apache.org/job/Mahout-Examples-Cluster-Reuters-II/ws/ at hudson.remoting.Channel@4c1e596d:ubuntu-6
	at hudson.FilePath.act(FilePath.java:910)
	at hudson.FilePath.act(FilePath.java:887)
	at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:936)
	at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:871)
	at hudson.model.AbstractProject.checkout(AbstractProject.java:1414)
	at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:671)
	at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:88)
	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:580)
	at hudson.model.Run.execute(Run.java:1676)
	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
	at hudson.model.ResourceController.execute(ResourceController.java:88)
	at hudson.model.Executor.run(Executor.java:231)
Caused by: java.io.IOException: Failed to check out https://svn.apache.org/repos/asf/mahout/trunk
	at hudson.scm.subversion.CheckoutUpdater$1.perform(CheckoutUpdater.java:110)
	at hudson.scm.subversion.WorkspaceUpdater$UpdateTask.delegateTo(WorkspaceUpdater.java:161)
	at hudson.scm.SubversionSCM$CheckOutTask.perform(SubversionSCM.java:1030)
	at hudson.scm.SubversionSCM$CheckOutTask.invoke(SubversionSCM.java:1011)
	at hudson.scm.SubversionSCM$CheckOutTask.invoke(SubversionSCM.java:987)
	at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2462)
	at hudson.remoting.UserRequest.perform(UserRequest.java:118)
	at hudson.remoting.UserRequest.perform(UserRequest.java:48)
	at hudson.remoting.Request$2.run(Request.java:328)
	at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.tmatesoft.svn.core.SVNException: svn: E175002: OPTIONS /repos/asf/mahout/trunk failed
	at org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPConnection.java:388)
	at org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPConnection.java:373)
	at org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPConnection.java:361)
	at org.tmatesoft.svn.core.internal.io.dav.DAVConnection.performHttpRequest(DAVConnection.java:707)
	at org.tmatesoft.svn.core.internal.io.dav.DAVConnection.exchangeCapabilities(DAVConnection.java:627)
	at org.tmatesoft.svn.core.internal.io.dav.DAVConnection.open(DAVConnection.java:102)
	at org.tmatesoft.svn.core.internal.io.dav.DAVRepository.openConnection(DAVRepository.java:1020)
	at org.tmatesoft.svn.core.internal.io.dav.DAVRepository.getLatestRevision(DAVRepository.java:180)
	at org.tmatesoft.svn.core.internal.wc16.SVNBasicDelegate.getRevisionNumber(SVNBasicDelegate.java:480)
	at org.tmatesoft.svn.core.internal.wc16.SVNBasicDelegate.getLocations(SVNBasicDelegate.java:833)
	at 

Re: How to build a recommendation system based on mahout serving millions even billions of users ?

2014-10-14 Thread Ted Dunning
You should move forward to version 0.9.

Take a look at more recent methods in this book:

https://www.mapr.com/practical-machine-learning



On Tue, Oct 14, 2014 at 2:43 AM, 王建国 jordanhao...@gmail.com wrote:

 Hi, Owen and all:
 I am a developer from China. I am building a recommendation system
 based on Mahout version 0.9. Since the user IDs and item IDs are strings,
 I need to map them to longs. But I found that there is a long-to-int
 mapping provided by the function int TasteHadoopUtils.idToIndex(long).
 Considering there may be millions or even billions of users, I wonder if
 it is possible to have many longs mapped to one int? If true, that would
 do much harm.
 This is quite confusing. What solution should I choose in this
 situation? Meanwhile, I read your answer, quoted below. Could you please
 tell me which data structure indexed by long you use in Myrrix? Thanks
 in advance.
 wangjiangwei

 Question:
 I have read some code about item-based recommendation in version 0.6,
 starting from org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.
 I found that there is a long-to-int mapping provided by the function
 int TasteHadoopUtils.idToIndex(long). The mapping is applied to both
 userId and itemId. I wonder if it is possible to have two longs mapped
 to one int? If that is the case, then we would likely merge vectors
 from different itemIds/userIds, right? This is quite confusing.
 Is it better to provide a RandomAccessSparseVector implemented by
 OpenLongDoubleHashMap instead of OpenIntDoubleHashMap? Thanks in advance.
 Wei Feng
 Answer:
 That's right. It ought to be uncommon but can happen. For recommenders,
 it only means that you start to treat two users or two items as the same
 thing. That doesn't do much harm, though. Maybe one user's recs are a
 little funny.
 I do think it would have been useful to index by long, but that would
 have significantly increased memory requirements too.
 (In developing Myrrix I have switched to a data structure indexed by
 long, though, because it becomes more necessary to avoid the mapping.)
 Sean Owen
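The long-to-int question comes down to hashing 2^64 possible IDs into 2^31 buckets. A sketch in Python (the XOR-fold hash here is an illustrative assumption, not necessarily the exact hash Mahout's idToIndex uses), plus a birthday-problem estimate of how many colliding ID pairs to expect:

```python
def id_to_index(id64: int) -> int:
    """Fold a 64-bit ID into a non-negative 31-bit int.

    The XOR-fold below is an illustrative assumption; the exact hash
    inside TasteHadoopUtils.idToIndex may differ.
    """
    id64 &= (1 << 64) - 1                  # treat the ID as unsigned 64-bit
    return (id64 ^ (id64 >> 32)) & 0x7FFFFFFF

def expected_collisions(n_ids: int, buckets: int = 2 ** 31) -> float:
    """Birthday-problem estimate of colliding ID pairs: ~ n^2 / (2 * buckets)."""
    return n_ids * (n_ids - 1) / (2 * buckets)
```

For a million users the estimate is a couple of hundred colliding pairs, matching Sean's "uncommon but can happen"; for a billion users it is on the order of 10^8 pairs, so a long-indexed structure (e.g. a vector backed by OpenLongDoubleHashMap, as the quoted question suggests) becomes the safer choice.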



[jira] [Commented] (MAHOUT-1516) run classify-20newsgroups.sh failed cause by /tmp/mahout-work-jpan/20news-all does not exists in hdfs.

2014-10-14 Thread Tim Groeneveld (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171809#comment-14171809
 ] 

Tim Groeneveld commented on MAHOUT-1516:


This bug has been tagged as 'PATCH AVAILABLE'. Where is this patch, or has the 
issue now been resolved in trunk?

 run classify-20newsgroups.sh failed cause by /tmp/mahout-work-jpan/20news-all 
 does not exists in hdfs.
 --

 Key: MAHOUT-1516
 URL: https://issues.apache.org/jira/browse/MAHOUT-1516
 Project: Mahout
  Issue Type: Bug
  Components: Examples
Affects Versions: 0.9
 Environment: hadoop2.2.0 mahout0.9 ubuntu12.04 
Reporter: Jian Pan
Priority: Minor
  Labels: patch
 Fix For: 1.0


 + echo 'Copying 20newsgroups data to HDFS'
 Copying 20newsgroups data to HDFS
 + set +e
 + /home/jpan/Software/hadoop-2.2.0/bin/hadoop dfs -rmr 
 /tmp/mahout-work-jpan/20news-all
 DEPRECATED: Use of this script to execute hdfs command is deprecated.
 Instead use the hdfs command for it.
 rmr: DEPRECATED: Please use 'rm -r' instead.
 14/04/17 10:26:25 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 rmr: `/tmp/mahout-work-jpan/20news-all': No such file or directory
 + set -e
 + /home/jpan/Software/hadoop-2.2.0/bin/hadoop dfs -put 
 /tmp/mahout-work-jpan/20news-all /tmp/mahout-work-jpan/20news-all
 DEPRECATED: Use of this script to execute hdfs command is deprecated.
 Instead use the hdfs command for it.
 14/04/17 10:26:26 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 put: `/tmp/mahout-work-jpan/20news-all': No such file or directory
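The deprecation warnings in the log above are harmless; the real failure is that the local /tmp/mahout-work-jpan/20news-all directory was never created, so the subsequent -put has nothing to copy. A sketch of a guard (put_command is a hypothetical helper, not part of the Mahout scripts):

```python
import os

def put_command(local_path: str, hdfs_path: str, hadoop: str = "hadoop"):
    """Build the copy-to-HDFS command, failing fast if the local input is missing.

    Hypothetical helper: it captures the failure mode in the log above,
    where -put fails because the local directory was never created.
    """
    if not os.path.isdir(local_path):
        raise FileNotFoundError("local input missing: " + local_path)
    # 'hadoop fs' / 'hdfs dfs' replace the deprecated 'hadoop dfs' form.
    return [hadoop, "fs", "-put", local_path, hdfs_path]
```

Running the 20newsgroups download/extract step first, so the local directory exists, avoids the error entirely.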





Re: How to build a recommendation system based on mahout serving millions even billions of users ?

2014-10-14 Thread 王建国
Thank you very much! It is version 0.9 I am learning now. I will read the
book as you advise.

2014-10-15 5:47 GMT+08:00 Ted Dunning ted.dunn...@gmail.com:

 You should move forward to version 0.9.

 Take a look at more recent methods in this book:

 https://www.mapr.com/practical-machine-learning



 On Tue, Oct 14, 2014 at 2:43 AM, 王建国 jordanhao...@gmail.com wrote:

  Hi, Owen and all:
  I am a developer from China. I am building a recommendation system
  based on Mahout version 0.9. Since the user IDs and item IDs are strings,
  I need to map them to longs. But I found that there is a long-to-int
  mapping provided by the function int TasteHadoopUtils.idToIndex(long).
  Considering there may be millions or even billions of users, I wonder if
  it is possible to have many longs mapped to one int? If true, that would
  do much harm.
  This is quite confusing. What solution should I choose in this
  situation? Meanwhile, I read your answer, quoted below. Could you please
  tell me which data structure indexed by long you use in Myrrix? Thanks
  in advance.
  wangjiangwei
 
  Question:
  I have read some code about item-based recommendation in version 0.6,
  starting from org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.
  I found that there is a long-to-int mapping provided by the function
  int TasteHadoopUtils.idToIndex(long). The mapping is applied to both
  userId and itemId. I wonder if it is possible to have two longs mapped
  to one int? If that is the case, then we would likely merge vectors
  from different itemIds/userIds, right? This is quite confusing.
  Is it better to provide a RandomAccessSparseVector implemented by
  OpenLongDoubleHashMap instead of OpenIntDoubleHashMap? Thanks in advance.
  Wei Feng
  Answer:
  That's right. It ought to be uncommon but can happen. For recommenders,
  it only means that you start to treat two users or two items as the same
  thing. That doesn't do much harm, though. Maybe one user's recs are a
  little funny.
  I do think it would have been useful to index by long, but that would
  have significantly increased memory requirements too.
  (In developing Myrrix I have switched to a data structure indexed by
  long, though, because it becomes more necessary to avoid the mapping.)
  Sean Owen
 



Re: How to build a recommendation system based on mahout serving millions even billions of users ?

2014-10-14 Thread 王建国
Hi, Ted.
   I don't know why I can't download the book. Maybe the network is very
poor. Can you send it to me? I am looking forward to reading it. Thanks.

2014-10-15 5:47 GMT+08:00 Ted Dunning ted.dunn...@gmail.com:

 You should move forward to version 0.9.

 Take a look at more recent methods in this book:

 https://www.mapr.com/practical-machine-learning



 On Tue, Oct 14, 2014 at 2:43 AM, 王建国 jordanhao...@gmail.com wrote:

  Hi, Owen and all:
  I am a developer from China. I am building a recommendation system
  based on Mahout version 0.9. Since the user IDs and item IDs are strings,
  I need to map them to longs. But I found that there is a long-to-int
  mapping provided by the function int TasteHadoopUtils.idToIndex(long).
  Considering there may be millions or even billions of users, I wonder if
  it is possible to have many longs mapped to one int? If true, that would
  do much harm.
  This is quite confusing. What solution should I choose in this
  situation? Meanwhile, I read your answer, quoted below. Could you please
  tell me which data structure indexed by long you use in Myrrix? Thanks
  in advance.
  wangjiangwei
 
  Question:
  I have read some code about item-based recommendation in version 0.6,
  starting from org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.
  I found that there is a long-to-int mapping provided by the function
  int TasteHadoopUtils.idToIndex(long). The mapping is applied to both
  userId and itemId. I wonder if it is possible to have two longs mapped
  to one int? If that is the case, then we would likely merge vectors
  from different itemIds/userIds, right? This is quite confusing.
  Is it better to provide a RandomAccessSparseVector implemented by
  OpenLongDoubleHashMap instead of OpenIntDoubleHashMap? Thanks in advance.
  Wei Feng
  Answer:
  That's right. It ought to be uncommon but can happen. For recommenders,
  it only means that you start to treat two users or two items as the same
  thing. That doesn't do much harm, though. Maybe one user's recs are a
  little funny.
  I do think it would have been useful to index by long, but that would
  have significantly increased memory requirements too.
  (In developing Myrrix I have switched to a data structure indexed by
  long, though, because it becomes more necessary to avoid the mapping.)
  Sean Owen