Re: Fwd: PCA with ssvd leads to StackOverFlowError

2014-03-05 Thread Suneel Marthi
) at org.apache.hadoop.util.RunJar.main(RunJar.java:208) Changing the hadoop home to /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop-mapreduce doesn't change the output, nor does /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop-0.20-mapreduce Any idea now ? 2014-03-05 15:45 GMT+01:00 Suneel Marthi suneel_mar...@yahoo.com: Are u

Re: Rework our website

2014-03-05 Thread Suneel Marthi
+1 for Option# 2. On Wednesday, March 5, 2014 7:11 AM, Sebastian Schelter s...@apache.org wrote: Hi everyone, In our latest discussion, I argued that the lack (and errors) of documentation on our website is one of the main pain points of Mahout atm. To be honest, I'm also not very happy

Re: Fwd: PCA with ssvd leads to StackOverFlowError

2014-03-05 Thread Suneel Marthi
there has been a patch in even just the past few weeks that makes it work even better with 2.x. So I suppose I would build from HEAD if possible to take advantage. On Wed, Mar 5, 2014 at 4:30 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: Not sure if the CDH4 patches on top of 0.7 has fixes for M

Re: PCA with ssvd leads to StackOverFlowError

2014-03-04 Thread Suneel Marthi
I have not seen the stackoverflow error, but this code has been fixed since .8 Sent from my iPhone On Mar 4, 2014, at 12:40 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: It doesn't look like -us has been removed. At least i see it on the head of the trunk, SSVDCli.java, line 62:

Re: PCA with ssvd leads to StackOverFlowError

2014-03-04 Thread Suneel Marthi
The -us option was fixed for Mahout 0.8, seems like u r using Mahout 0.7 which had this issue (from ur stacktrace, its apparent u r using Mahout 0.7).  Please upgrade to the latest mahout version. On Tuesday, March 4, 2014 8:54 AM, Kevin Moulart kevinmoul...@gmail.com wrote: Hi, I'm

Re: wikipedia bayes quickstart example on EC2 (cloudera)

2014-03-01 Thread Suneel Marthi
Please work off of the latest Mahout 0.9, most of these issues from Mahout 0.7 have been addressed in later releases. On Saturday, March 1, 2014 12:14 PM, Jessie Wright jessie.wri...@gmail.com wrote: Hi, I'm a noob and trying to run the wikipedia bayes example on EC2 (using a cdh4.5

Re: Installation question

2014-02-24 Thread Suneel Marthi
You run mvn install in the root folder only to build the entire project, the instructions could be wrong for all u know and may need to be updated. On Monday, February 24, 2014 2:32 AM, Mahmood Naderan nt_mahm...@yahoo.com wrote: Yes you are right. One more question. I ran mvn install in

Re: Cluster Dumper in 0.9

2014-02-23 Thread Suneel Marthi
also attached Cluster Metadata On Wed, Feb 19, 2014 at 9:21 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: R u running clusterdump or seqdumper? Could u paste the commands that u had run and their respective outputs? On Wednesday, February 19, 2014 6:16 AM, Bikash Gupta

Re: complementary naive bayes classifier

2014-02-23 Thread Suneel Marthi
the naive bayes. Will debug the code later on to discover more details. A general question, what are the options available in Mahout when we have very imbalanced data sets? Regards, On Fri, Feb 21, 2014 at 12:09 AM, Suneel Marthi suneel_mar...@yahoo.com wrote: Complementary Naive Bayes does exist

Re: Cluster Dumper in 0.9

2014-02-23 Thread Suneel Marthi
, 2014 at 6:05 AM, Suneel Marthi suneel_mar...@yahoo.com wrote: The key in the CSV is the clusterId (and not the named vector). Here's the complete code snippet which should make sense. {Code}     Cluster cluster = clusterWritable.getValue();     line.append(cluster.getId

Re: Use Naïve Bayes on a large CSV

2014-02-20 Thread Suneel Marthi
To convert input CSV to vectors, u can either: a) Use CSVIterator b) use InputDriver Either of the above should generate vectors from input CSV that could then be fed into Mahout classifier/clustering jobs. On Thursday, February 20, 2014 5:57 AM, Kevin Moulart kevinmoul...@gmail.com

Re: Mapreduce job failed

2014-02-20 Thread Suneel Marthi
Seems like u r running this on Hadoop 2.2 (officially not supported for Mahout 0.8 or 0.9), workaround is to run this in sequential mode with -xm sequential. On Thursday, February 20, 2014 1:36 PM, Zhang, Pengchu pzh...@sandia.gov wrote: Hello, I am trying to seqdirectory with mahout

Re: Mapreduce job failed

2014-02-20 Thread Suneel Marthi
... and the reason for this failing is that 'TaskAttemptContext' which was a Class in Hadoop 1.x has now become an interface in Hadoop 2.2. Suggest that u execute this job in non-MR mode with '-xm sequential'.  On Thursday, February 20, 2014 2:26 PM, Suneel Marthi suneel_mar...@yahoo.com

Re: [EXTERNAL] Re: Mapreduce job failed

2014-02-20 Thread Suneel Marthi
mode? 2. It is too bad that Hadoop 2.2 is not supported by newer versions of Mahout. Are you aware of Hadoop 1.x working with Mahout 0.8 or 0.9 on MR? I do have a large dataset to be clustered. Thanks. Pengchu -Original Message- From: Suneel Marthi [mailto:suneel_mar

Re: [EXTERNAL] Re: Mapreduce job failed

2014-02-20 Thread Suneel Marthi
that its trackable?  As we r now working towards Mahout 1.0 and Hadoop 2.x compatibility its good that u have reported this issue. Thanks. Thanks. Pengchu -Original Message- From: Suneel Marthi [mailto:suneel_mar...@yahoo.com] Sent: Thursday, February 20, 2014 1:17 PM To: user

Re: [EXTERNAL] Re: Mapreduce job failed

2014-02-20 Thread Suneel Marthi
- From: Suneel Marthi [mailto:suneel_mar...@yahoo.com] Sent: Thursday, February 20, 2014 2:35 PM To: user@mahout.apache.org Subject: Re: [EXTERNAL] Re: Mapreduce job failed On Thursday, February 20, 2014 4:26 PM, Zhang, Pengchu pzh...@sandia.gov wrote: Thanks, it has been executed

Re: complementary naive bayes classifier

2014-02-20 Thread Suneel Marthi
Complementary Naive Bayes does exist in Mahout (invoked with -c option when running BayesDriver). The code for ThetaSummer job does exist and the code being still commented out (been that way since Mahout 0.7) could be either due to oversight or due to not having tested Theta Normalization

Re: complementary naive bayes classifier

2014-02-20 Thread Suneel Marthi
, February 21, 2014 12:10 AM, Suneel Marthi suneel_mar...@yahoo.com wrote: Complementary Naive Bayes does exist in Mahout (invoked with -c option when running BayesDriver). The code for ThetaSummer job does exist and the code being still commented out (been that way since Mahout 0.7) could be either

Re: [Edit] Approach for Clustering Data

2014-02-18 Thread Suneel Marthi
On Tuesday, February 18, 2014 3:37 AM, Bikash Gupta bikash.gupt...@gmail.com wrote: Ted/Peter, Thanks for the response. This is exactly what I am trying to achieve. May be I was not able to put my questions clearly. I am clustering on few variables of Customer/User(except their

Re: Mahout 0.8, Hadoop 1.2.1 and Lucene version

2014-02-18 Thread Suneel Marthi
You definitely don't have to mess with hadoop source. On Tuesday, February 18, 2014 10:28 AM, Stamatis Rapanakis stamrapana...@gmail.com wrote: I try to run an example and get the following error: Feb 18, 2014 4:31:28 PM org.apache.hadoop.mapred.LocalJobRunner$Job run WARNING:

Re: reduce is too slow in StreamingKmeans

2014-02-18 Thread Suneel Marthi
Streaming KMeans runs with a single reducer that runs Ball KMeans and hence the slow performance that you have been experiencing. How did u come up with -km 63000? Given that u would like 10000 clusters (= k) and have 2,000,000 datapoints (= n) so k * ln(n) = 10000 * ln(2 * 10^6) = 145087
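
As a quick check of the arithmetic in this reply, here is a minimal Python sketch of the k * ln(n) rule of thumb. The function name is made up for illustration and is not part of Mahout; the figure 145087 quoted above is what the formula gives for k = 10000 and n = 2,000,000.

```python
import math


def estimated_map_clusters(k: int, n: int) -> int:
    """Rule of thumb from the thread: roughly k * ln(n) intermediate clusters."""
    return round(k * math.log(n))


# k = 10000 desired clusters, n = 2,000,000 data points
print(estimated_map_clusters(10_000, 2_000_000))
```

Under this estimate a -km value of 63000 is well below what the formula suggests for this k and n.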

Apache Mahout 0.9 released

2014-02-18 Thread Suneel Marthi
The Apache Mahout PMC is pleased to announce the release of Mahout 0.9. Mahout's goal is to build scalable machine learning libraries focused primarily in the areas of collaborative filtering (recommenders), clustering and classification (known collectively as the 3Cs), as well as the necessary

Mahout unavailable from mirrors

2014-02-15 Thread Suneel Marthi
Apache Mahout (all releases) are presently unavailable for download as all the Mahout releases were accidentally blown out from all the mirrors during Infrastructure maintenance. Anyone looking to download Mahout latest or older releases can do so from the archives at

Re: Mahout unavailable from mirrors

2014-02-15 Thread Suneel Marthi
. On Saturday, February 15, 2014 8:08 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: Apache Mahout (all releases) are presently unavailable for download as all the Mahout releases were accidentally blown out from all the mirrors during Infrastructure maintenance. Anyone looking to download Mahout

Re: seqdumper output?

2014-02-11 Thread Suneel Marthi
You should run the clusterdump on /home/r9r/seqTest/seqKmeans/clusters-1-final/part-x to see the points that are in the cluster. But u need a dictionary for that which wouldn't be available if the vectors were generated from CSV. So one way to generate a dictionary for a CSV and verify the

Re: Popularity of recommender items

2014-02-08 Thread Suneel Marthi
I am not fulltime on Mahout either and have a fulltime job which is unrelated to Mahout. Its just that I have been sacrificing personal time to keep things moving on Mahout. On Saturday, February 8, 2014 3:13 PM, Ted Dunning ted.dunn...@gmail.com wrote: Thompson sampling doesn't

Re: Clustering CSV

2014-02-07 Thread Suneel Marthi
You wouldn't have a dictionary when creating vectors from CSV (via CsvIterator). If u would like to see the documents that are part of cluster, try running the cluster output thru a seqdumper and that should give the document names (or points) that belong to a cluster. You need to be working

Re: Extracting the topics of documents (LDA, Mahout 0.7)

2014-02-06 Thread Suneel Marthi
Sent from my iPhone On Feb 6, 2014, at 10:08 AM, Ted Dunning ted.dunn...@gmail.com wrote: I can't comment on the specific question that you ask, but it should not necessarily be expected that LDA will reconstruct the categories that you have in mind. It will develop categories that

Re: Using EnglishAnalyzer in KMeans

2014-02-05 Thread Suneel Marthi
You must stop using Mahout 0.5 and switch to using Mahout 0.8 or 0.9, the reasons being:- a)  Mahout 0.5 is past its shelf life and has been purged from all Apache mirrors and hence is not available for download. b)  Mahout 0.5 was using Lucene 3.x.  Mahout 0.8 and above use Lucene 4.x, Lucene

Re: Mapping from docId to clusters in the clusterdump

2014-02-02 Thread Suneel Marthi
This is an issue that was very recently fixed (in fact fixed last week). Please work off of present trunk, u should see the name of the text files that r part of clusters. On Sunday, February 2, 2014 5:09 AM, Sznajder ForMailingList bs4mailingl...@gmail.com wrote: Hi, I have a directory

Re: Mapping from docId to clusters in the clusterdump

2014-02-02 Thread Suneel Marthi
This was fixed as part of JIRA MAHOUT-1410. Sent from my iPhone On Feb 2, 2014, at 5:11 AM, Suneel Marthi suneel_mar...@yahoo.com wrote: This is an issue that was very recently fixed (in fact fixed last week). Please work off of present trunk, u should see the name of the text files

Re: Mapping from docId to clusters in the clusterdump

2014-02-02 Thread Suneel Marthi
:13 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: This was fixed as part of JIRA MAHOUT-1410. Sent from my iPhone On Feb 2, 2014, at 5:11 AM, Suneel Marthi suneel_mar...@yahoo.com wrote: This is an issue that was very recently fixed (in fact fixed last week). Please work off

Re: Mahout 0.9 Release

2014-02-02 Thread Suneel Marthi
Mahout 0.9 has been pushed to the mirrors and is available for download at http://www.apache.org/dyn/closer.cgi/mahout/ On Friday, January 31, 2014 11:21 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: The release has passed with the required votes from PMC, will be pushing 0.9

Re: Mahout 0.9 Release

2014-02-02 Thread Suneel Marthi
/2014 10:22 PM, Suneel Marthi wrote: Mahout 0.9 has been pushed to the mirrors and is available for download at http://www.apache.org/dyn/closer.cgi/mahout/ On Friday, January 31, 2014 11:21 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: The release has passed with the required votes from

Re: Mahout 0.9 Release

2014-02-02 Thread Suneel Marthi
:42 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: Someone's got to update the web site to the latest release, I don't see a login or edit link to make the changes myself. Isabel??? On Sunday, February 2, 2014 4:30 PM, Sebastian Schelter s...@apache.org wrote: Hi Suneel, That's great

Re: Using Mahout to cluster a large CSV file

2014-01-31 Thread Suneel Marthi
Use Mahout's CSVVectorIterator.java to read ur input CSV file and generate vectors. You pass in a java.io.Reader to your CSV file and it generates Dense Vectors (from CSV). U could then feed the generated vectors into KMeans clustering. On Friday, January 31, 2014 7:55 AM, Allen, Ronald L.
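
The idea in this reply (wrap the CSV in a reader, get one dense vector per row, then hand the vectors to clustering) can be sketched in a few lines. This is a language-neutral Python illustration and not Mahout's CSVVectorIterator API; the function name and the sample data are assumptions made for the example.

```python
import csv
import io
from typing import Iterator, List, TextIO


def csv_to_dense_vectors(reader: TextIO) -> Iterator[List[float]]:
    """Yield one dense vector (a list of floats) per non-empty CSV row."""
    for row in csv.reader(reader):
        if row:  # csv.reader returns [] for blank lines; skip those
            yield [float(cell) for cell in row]


# Hypothetical two-row input standing in for the user's CSV file.
SAMPLE_CSV = io.StringIO("1.0,2.0,3.0\n4.0,5.5,6.0\n")
VECTORS = list(csv_to_dense_vectors(SAMPLE_CSV))
print(VECTORS)
```

Any file-like object works as the reader, which mirrors passing a java.io.Reader in the Java version.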

Re: Mahout 0.9 Release

2014-01-31 Thread Suneel Marthi
Thanks Dmitriy. That makes it +2 Sent from my iPhone On Jan 31, 2014, at 8:13 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: +1. Some specific parts I am concerned about look good. -d On Tue, Jan 28, 2014 at 4:45 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: Fixed

Re: Mahout 0.9 Release

2014-01-31 Thread Suneel Marthi
On Fri, Jan 31, 2014 at 5:32 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: Thanks Dmitriy. That makes it +2 Sent from my iPhone On Jan 31, 2014, at 8:13 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: +1. Some specific parts I am concerned about look good. -d On Tue

Re: How KMeans clustering works in Mahout 0.8?

2014-01-30 Thread Suneel Marthi
No Sent from my iPhone On Jan 30, 2014, at 10:57 AM, Pat Ferrel p...@occamsmachete.com wrote: Is there any qualitative difference sequential v MR? On Jan 28, 2014, at 10:11 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: All of Mahout's clustering algos can be run in both MR and non

Re: Visualizing output of k-means

2014-01-29 Thread Suneel Marthi
Run clusterdump on the output clustered points and export that to Graphml format. Tools like Gephi, Graphviz etc should be able to then read the Graphml file to display visualizations. Sent from my iPhone On Jan 29, 2014, at 6:06 AM, mandeep singh mandeep.ma.si...@oracle.com wrote: Hi

Re: mahout command seq2sparse

2014-01-29 Thread Suneel Marthi
That's a bash script that invokes a Java class - MahoutDriver which reads the props file mentioned earlier. The props file is a mapping of the commandName to the actual Java program. E.g., seq2sparse would be mapped to SparseVectorsFromSequenceFiles in the props. On Wednesday, January
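
To make the commandName-to-class mapping concrete, here is a small Python sketch that parses props-style lines into a lookup table. The line layout used below (class name, then "=", then short name, then an optional ":" and description) and the sample text are assumptions for illustration; check the props file shipped with your Mahout version for the actual layout.

```python
from typing import Dict


def parse_driver_props(text: str) -> Dict[str, str]:
    """Map short command names to class names from props-style text."""
    mapping: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        class_name, _, rest = line.partition("=")
        short_name = rest.split(":", 1)[0].strip()
        if short_name:
            mapping[short_name] = class_name.strip()
    return mapping


# Hypothetical excerpt; the layout is an assumption, not copied from Mahout.
SAMPLE = """\
# short name : description
org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles = seq2sparse : Sparse vectors from sequence files
"""

COMMANDS = parse_driver_props(SAMPLE)
print(COMMANDS["seq2sparse"])
```

The driver does the equivalent lookup before invoking the main method of the class it finds.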

Re: How KMeans clustering works in Mahout 0.8?

2014-01-28 Thread Suneel Marthi
Look at KMeansDriver.java in the specified package and trace thru the code. You should see both MR and non-MR versions of kmeans impl. On Tuesday, January 28, 2014 2:35 PM, Saeed Adel Mehraban s.ade...@gmail.com wrote: I see the package, but I couldn't find anything related to map-reduce.

Mahout 0.9 Release

2014-01-28 Thread Suneel Marthi
Fixed the issues that were reported with Clustering code this past week, upgraded codebase to Lucene 4.6.1 that was released today. Here's the URL for the 0.9 release in staging:- https://repository.apache.org/content/repositories/orgapachemahout-1004/org/apache/mahout/mahout-distribution/0.9/

Re: Problem converting tokenized documents into TFIDF vectors

2014-01-26 Thread Suneel Marthi
Scott, FYI... 0.9 Release is not official yet. The project trunk's still at 0.9-SNAPSHOT. Please feel free to update the documentation. On Sunday, January 26, 2014 1:34 PM, Scott C. Cote scottcc...@gmail.com wrote: Drew, I'm sorry - I'm derelict (as opposed to dirichlet) in responding

Re: generic latent variable recommender question

2014-01-25 Thread Suneel Marthi
N(0, log\epsilon) =   Normal Distribution with Mean = 0 and Variance = log(epsilon) On Saturday, January 25, 2014 7:33 PM, Pat Ferrel p...@occamsmachete.com wrote: For anti-flood and in the vein of “UI” you can build a recommender that recommends categories or genres then get
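
A minimal Python sketch of drawing from N(0, log(epsilon)) as defined in this reply. It assumes the natural logarithm and epsilon >= 1 so that the variance is non-negative; the function name is hypothetical. Note that the standard library's gauss takes a standard deviation, so the variance is square-rooted first.

```python
import math
import random


def sample_noise(epsilon: float, rng: random.Random) -> float:
    """Draw one sample from a normal with mean 0 and variance log(epsilon)."""
    variance = math.log(epsilon)
    if variance < 0:
        raise ValueError("epsilon must be >= 1 so that log(epsilon) >= 0")
    return rng.gauss(0.0, math.sqrt(variance))


rng = random.Random(42)  # seeded only to make the run repeatable
print(sample_noise(math.e, rng))  # variance 1, i.e. a standard normal draw
```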

Re: Clustering in Mahout 0.9 candidate

2014-01-24 Thread Suneel Marthi
Pat, Andrew's not filed a JIRA for this, so thanks for filing M-1410 to track this. The fix would be to modify ClusterIterator.iterateSeq() - (for the Sequential mode) to read the vector key along with the vector. For the MR mode, CIMapper.java needs to be modified to read the vector key

Re: Running Mahout Example

2014-01-22 Thread Suneel Marthi
Try examples/bin/cluster-reuters.sh Sent from my iPhone On Jan 22, 2014, at 9:56 AM, Sznajder ForMailingList bs4mailingl...@gmail.com wrote: Hi, I wished to run the mahout example for Kmeans algorithm. I suppose that it is: org.apache.mahout.clustering.syntheticcontrol.kmeans.Job

Re: Running Mahout Example

2014-01-22 Thread Suneel Marthi
)         at java.lang.ClassLoader.loadClass(ClassLoader.java:619) Could not find the main class: classpath.  Program will exit. Running on hadoop, using /mnt/hdgpfs/shared_home/hadoop/IHC-0.20.2/bin/hadoop and HADOOP_CONF_DIR=/mnt/hdgpfs/shared_home/hadoop/IHC-0.20.2/conf Benjamin On Wed, Jan 22, 2014 at 4:59 PM, Suneel Marthi

Re: MAHOUT 0.9 Release - New URL

2014-01-22 Thread Suneel Marthi
Fixed the issues that were reported this week and restored FP mining into the codebase. Here's the URL for the final release in staging:- https://repository.apache.org/content/repositories/orgapachemahout-1003/org/apache/mahout/mahout-distribution/0.9/ The artifacts have been signed with the

Re: Mahout 0.9 Release - Call for Volunteers

2014-01-22 Thread Suneel Marthi
Sekine, The only thing u r doing differently is that u r on Java 1.7u51.  I am not seeing these issues as are many others who have been testing this release. Could what you r seeing be related to http://jaxenter.com/java-security-patch-breaks-guava-library-49360.html ? On Thursday,

Re: The maintainer of FPG algorithm

2014-01-22 Thread Suneel Marthi
Now that FPG has been resurrected for 0.9, there is another FPG implementation that was submitted and is pending review. See https://issues.apache.org/jira/browse/MAHOUT-1355. On Wednesday, January 22, 2014 10:15 PM, Ted Dunning ted.dunn...@gmail.com wrote: There is no assignment

Re: MAHOUT 0.9 Release - New URL

2014-01-20 Thread Suneel Marthi
Hmmm... that's an issue. Since both Dirichlet and Meanshift clustering have been removed from 0.9, cluster-syntheticcontrol.sh options 4,5 are not gonna work and should have been removed for 0.9. To PMC,  - rollback the release, fix this issue (and other patches that were submitted in the

Re: MAHOUT 0.9 Release - New URL

2014-01-20 Thread Suneel Marthi
This is an issue (trivial one though) that needs to be fixed for 0.9 Release, will be rerolling the release today (in the next few hrs) and putting out a new release candidate in staging. Thanks for reporting this Andrew P. On Monday, January 20, 2014 12:34 AM, Andrew Palumbo

Re: About Parallel Frequent Growth algorithm

2014-01-20 Thread Suneel Marthi
I was asked this question too and I had no clear answer. Maybe it wasn't right to remove FP from the codebase. Not having this may well be another reason for users to look at options other than Mahout. Given the issues that Frank's reported with Streaming KMeans (and I am seeing them too)

Re: Mahout 0.9 Release - Call for Volunteers

2014-01-18 Thread Suneel Marthi
from my local maven repo and indeed the tests that were failing due to that succeed. Now I just get the good ole: Unable to load realm mapping info from SCDynamicStore and the subsequently expected  KrbException Thanks, Andrew From: Suneel Marthi suneel_mar...@yahoo.com Reply-To: Suneel Marthi

Mahout 0.9 Release - Call for Volunteers

2014-01-16 Thread Suneel Marthi
Here's the new URL for Mahout 0.9 Release: https://repository.apache.org/content/repositories/orgapachemahout-1001/org/apache/mahout/mahout-buildtools/0.9/ For those volunteering to test this, some of the things to be verified: a) Verify that u can unpack the release (tar or zip) b) Verify u r

Re: Mahout 0.9 Release Candidate - VOTE

2014-01-16 Thread Suneel Marthi
. On Thu, Jan 16, 2014 at 7:04 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: It would be .tar.gz file and you would find it under mahout/distribution. On Wednesday, January 15, 2014 11:45 PM, Chameera Wijebandara chameerawijeband...@gmail.com wrote: Ok let's see after fixed the URL Thank

Re: Mahout 0.9 Release - Call for Volunteers

2014-01-16 Thread Suneel Marthi
/mahout/mahout-buildtools/0.9/ koji -- http://soleami.com/blog/mahout-and-machine-learning-training-course-is-here.html (14/01/16 23:23), Chameera Wijebandara wrote: Hi Suneel, Still it getting 404 error. Thanks,       Chameera On Thu, Jan 16, 2014 at 7:11 PM, Suneel Marthi suneel_mar

Re: Mahout 0.9 Release - Call for Volunteers

2014-01-16 Thread Suneel Marthi
: Suneel Marthi; user@mahout.apache.org; priv...@mahout.apache.org Subject: Re: Mahout 0.9 Release - Call for Volunteers Tests for Mahout Core fail on OS X 10.8.5 (12F45) java version 1.7.0_17 Java(TM) SE Runtime Environment (build 1.7.0_17-b02) Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed

Re: mahout text mining

2014-01-16 Thread Suneel Marthi
See http://chimpler.wordpress.com/2013/03/13/using-the-mahout-naive-bayes-classifier-to-automatically-classify-twitter-messages/ for classifying twitter messages. Lucene has support for ngrams, stopwords, porter stemmer, snowball stemmer, language specific analyzers etc... Mahout uses Lucene

Re: Mahout 0.9 Release Candidate - VOTE

2014-01-15 Thread Suneel Marthi
? Cheers, .S On Tue, Jan 14, 2014 at 7:03 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: Here's the link to Release artifacts for Mahout 0.9: https://repository.apache.org/content/repositories/orgapachemahout-1000/ For those volunteering to test this, some of the stuff to look out

Re: Mahout 0.9 Release Candidate - VOTE

2014-01-14 Thread Suneel Marthi
to volunteer to test this release. What is the procedure/steps to get started and what pre-reqs I need to have? Cheers .S On Tue, Jan 14, 2014 at 6:52 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: Calling for volunteers to test this Release. On Friday, January 10, 2014 7:39 PM, Suneel Marthi

Re: Mahout 0.9 Release Candidate - VOTE

2014-01-14 Thread Suneel Marthi
before the installation so I assumed maven dependencies are all available. On Tue, Jan 14, 2014 at 7:03 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: Here's the link to Release artifacts for Mahout 0.9: https://repository.apache.org/content/repositories/orgapachemahout-1000/ For those

Re: Logistic Regression cost function

2014-01-13 Thread Suneel Marthi
Mahout's impl is based off of Leon Bottou's paper on this subject. I don't have the link handy but it's referenced in the code or try google search Sent from my iPhone On Jan 13, 2014, at 7:14 AM, Frank Scholten fr...@frankscholten.nl wrote: Hi, I followed the Coursera Machine Learning

Re: Logistic Regression cost function

2014-01-13 Thread Suneel Marthi
publications. I don't see any mention of one of his papers in the code. I only see www.eecs.tufts.edu/~dsculley/papers/combined-ranking-and-regression.pdf in MixedGradient but this is something different. On Mon, Jan 13, 2014 at 1:27 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: Mahout's impl

Re: Logistic Regression cost function

2014-01-13 Thread Suneel Marthi
/~dsculley/papers/combined-ranking-and-regression.pdf in MixedGradient but this is something different. On Mon, Jan 13, 2014 at 1:27 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: Mahout's impl is based off of Leon Bottou's paper on this subject. I don't have the link handy but it's

Re: ArrayIndexOutOfBoundsException with mahout vectordump and cvb ?

2014-01-13 Thread Suneel Marthi
: stylesheet: Value: 9243 Key: tim46679: Value: 9244 Key: topnav.search_where: Value: 9245 Key: www.expedia.com: Value: 9246 Key: xv: Value: 9247 Count: 9248 14/01/13 17:35:39 INFO driver.MahoutDriver: Program took 54565 ms (Minutes: 0.90941667) On Thu, Jan 9, 2014 at 4:12 PM, Suneel Marthi

Re: ArrayIndexOutOfBoundsException with mahout vectordump and cvb ?

2014-01-09 Thread Suneel Marthi
The issue seems to be with ur dictionary. What is the length of dictionary? On Thursday, January 9, 2014 6:49 PM, Yang tedd...@gmail.com wrote: I am trying to run the lda (now called cvb) function, I followed the steps listed in many online sources. the final step after getting the lda

Re: Hidden Markov Models in Mahout

2014-01-09 Thread Suneel Marthi
HMM implementations still exist in Mahout today but I don't think there are any examples of its usage. Please see package org.apache.mahout.classifier.sequencelearning.hmm.* On Thursday, January 9, 2014 10:40 PM, Koji Sekiguchi k...@r.email.ne.jp wrote: It should exist somewhere as I

Re: K-means: No input clusters found

2013-12-24 Thread Suneel Marthi
kmeans-init-clusters should be in a file with a name like 'part-' and not the way you have it (kmeans-init-clusters). On Tuesday, December 24, 2013 2:15 PM, Sameer Tilak ssti...@live.com wrote: Hi all, I get the following problem when I run k-means clustering on my real data. Any

Re: Problem with mahout classpath after update

2013-12-23 Thread Suneel Marthi
Which version of Mahout are you running? From ur pastebin stacktrace it seems like Mahout 0.7. Please upgrade to the latest version of Mahout. On Monday, December 23, 2013 8:59 AM, Kevin Moulart kevinmoul...@gmail.com wrote: Hi I had mahout working

Re: mahout svd OOM error

2013-12-20 Thread Suneel Marthi
DistributedLanczosSolver has been deprecated (and the blog post u mention is old). Use Stochastic SVD (SSVD) instead. On Friday, December 20, 2013 12:41 AM, Partha Pratim Talukdar partha.taluk...@cs.cmu.edu wrote: Hello, I am running mahout (v0.8) svd over a sparse matrix of size

Re: unexpected results in seqdump of reuters-matrix in quick tour of text analysis

2013-12-20 Thread Suneel Marthi
(uname -a): Darwin Scotts-MacBook-Air.local 12.5.0 Darwin Kernel Version 12.5.0: Sun Sep 29 13:33:47 PDT 2013; root:xnu-2050.48.12~1/RELEASE_X86_64 x86_64 On 12/19/13 1:08 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: I don't see a need for uploading ur commands.  Clean up HDFS (both output

Re: clusterdump

2013-12-20 Thread Suneel Marthi
Are you working off of trunk? 'clusterdump' is being used in examples/bin/cluster-reuters.sh. On Friday, December 20, 2013 5:33 PM, Sameer Tilak ssti...@live.com wrote: Hi All, I was able to do the clustering and need some help with viewing the result. I get the following problem.

Re: clusterdump

2013-12-20 Thread Suneel Marthi
I would investigate all of those 'Unable to add .' messages first. Checkout the latest code and run a clean build. On Friday, December 20, 2013 5:58 PM, Sameer Tilak ssti...@live.com wrote: Suneel: Yes, I am working off of trunk. I saw that example. In my case the data is numeric -- I

Re: unexpected results in seqdump of reuters-matrix in quick tour of text analysis

2013-12-20 Thread Suneel Marthi
; root:xnu-2050.48.12~1/RELEASE_X86_64 x86_64 On 12/19/13 1:08 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: I don't see a need for uploading ur commands.  Clean up HDFS (both output and temp folders) and try running the 5 steps again - extract reuters, seqdirectory, seq2sparse, rowid job

Re: unexpected results in seqdump of reuters-matrix in quick tour of text analysis

2013-12-20 Thread Suneel Marthi
Which cdump.txt ? On Friday, December 20, 2013 7:29 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: You could use clusterdump to see the output of your clusters. Eg:   $MAHOUT clusterdump \     -i ${WORK_DIR}/reuters-kmeans/clusters-*-final \     -o ${WORK_DIR}/reuters-kmeans

Re: unexpected results in seqdump of reuters-matrix in quick tour of text analysis

2013-12-20 Thread Suneel Marthi
of the vectors that are members of the cluster.  Do I have it?  Am I getting this? Thanks, SCott  On 12/20/13 6:32 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: Which cdump.txt ? On Friday, December 20, 2013 7:29 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: You could use clusterdump

Re: unexpected results in seqdump of reuters-matrix in quick tour of text analysis

2013-12-19 Thread Suneel Marthi
What you are seeing is the output matrix of the RowSimilarity job.  You are right there should be 21578 documents only in the reuters corpus. a) How many documents do you have in your docIndex?  DocIndex is one of the artifacts of the RowIDJob and should have been executed prior to the

Re: Exception in thread main java.lang.NoClassDefFoundError: org/apache/mahout/math/Vector

2013-12-19 Thread Suneel Marthi
U r missing mahout-math.jar from your classpath because u r only keying off mahout-core. Include mahout-math.jar in your javac classpath. On Thursday, December 19, 2013 1:04 PM, Sameer Tilak ssti...@live.com wrote: Hi everyone, I used the following commands to generate the jar file: javac

Re: unexpected results in seqdump of reuters-matrix in quick tour of text analysis

2013-12-19 Thread Suneel Marthi
, Suneel Marthi suneel_mar...@yahoo.com wrote: What you are seeing is the output matrix of the RowSimilarity job.  You are right there should be 21578 documents only in the reuters corpus. a) How many documents do you have in your docIndex?  DocIndex is one of the artifacts of the RowIDJob

Re: unexpected results in seqdump of reuters-matrix in quick tour of text analysis

2013-12-19 Thread Suneel Marthi
would I do it? Thanks, SCott On 12/19/13 1:00 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: Yep, that's what has happened in ur case. The wiki doesn't mention it, but please specify the -ow (overwrite) option while running the RowSimilarityJob. That should clear up both the output and temp folders

Re: unexpected results in seqdump of reuters-matrix in quick tour of text analysis

2013-12-19 Thread Suneel Marthi
Yep, that's what has happened in ur case. The wiki doesn't mention it, but please specify the -ow (overwrite) option while running the RowSimilarityJob. That should clear up both the output and temp folders before running the job. On Thursday, December 19, 2013 1:50 PM, Suneel Marthi suneel_mar

Re: Visualizing cluster trough command line

2013-12-13 Thread Suneel Marthi
U can use clusterdump to generate GraphML, CSV, Text and JSON outputs. mahout clusterdump -i cluster-output/clusters-0-final -of GRAPH_ML -o xyz.graphml -p cluster-output/clusteredPoints On Friday, December 13, 2013 7:58 AM, David G davidgr...@gmail.com wrote: I find your idea

Re: K-means clustering: clusterdump

2013-12-12 Thread Suneel Marthi
It should be -i (--input). Thanks for pointing this out, will update the online documentation. On Thursday, December 12, 2013 3:14 PM, Sameer Tilak ssti...@live.com wrote: Hi, I am running K-means clustering following the script on Wiki:

Re: Mahout and Hadoop 2.2.0

2013-12-11 Thread Suneel Marthi
the intended release date of the next mahout release that will be compatible with Hadoop 2.2.0? On Thursday, November 21, 2013 12:36 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: Targeted for Dec 2013. On Thursday, November 21, 2013 3:26 PM, Hi There srudamas...@yahoo.com wrote

Re: Mahout and Hadoop 2.2.0

2013-12-11 Thread Suneel Marthi
Sebastian, R we still using SplitInputJob, seems like its been replaced by a much newer SplitInput. Do u think this needs to be purged from the codebase for 0.9, its been marked as deprecated anyways? On Wednesday, December 11, 2013 2:08 PM, Suneel Marthi suneel_mar...@yahoo.com wrote

Re: Mahout and Hadoop 2.2.0

2013-12-11 Thread Suneel Marthi
On Dec 9, 2013, at 19:54, Hi There srudamas...@yahoo.com wrote: Is Dec 2013 still the intended release date of the next mahout release that will be compatible with Hadoop 2.2.0? On Thursday, November 21, 2013 12:36 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: Targeted for Dec

Re: SVM Implementation for mahout?

2013-12-07 Thread Suneel Marthi
Any specific reasons u r looking for an SVM implementation only?  R u sure that those patches r still relevant given the codebase today? On Saturday, December 7, 2013 2:58 PM, Fernando Santos fernandoleandro1...@gmail.com wrote: Thanks Manuel. It seems that these two

Re: Mahout Web Service GlassFish 4 Deployment Error

2013-12-04 Thread Suneel Marthi
Thinking out loud here. Mahout's still using servlet 2.5 and jsp 2.1 api (from what's in the pom today), maybe it's time to upgrade to be JEE 6 compliant - viz. support servlet 3.x and jsp 2.2. Looking at the web.xml, it still refers to web app 2.3 DTD, which could be the reason for the CDI

Re: Avoiding OOM for large datasets

2013-12-04 Thread Suneel Marthi
Amir, This has been reported before by several others (and has been my experience too). The OOM happens during Canopy Generation phase of Canopy clustering because it only runs with a single reducer. If you are using Mahout 0.8 (or trunk), suggest that u look at the new Streaming Kmeans

Re: Mahout Web Service GlassFish 4 Deployment Error

2013-12-04 Thread Suneel Marthi
On 04.12.2013, at 18:04, Suneel Marthi wrote: Thinking out loud here. Mahout's still using servlet 2.5 and jsp 2.1 api (from what's in the pom today), maybe it's time to upgrade to be JEE 6 compliant - viz. support servlet 3.x and jsp 2.2. Looking at the web.xml, it still refers to web app

Re: Clustering without Hadoop

2013-12-01 Thread Suneel Marthi
Shan, All of Mahout's implementations use the Hadoop API, but if u r trying to run kmeans in sequential (non-MapReduce) mode, pass in runSequential = true instead of false as the last parameter to KMeansDriver.run(), or run them in LOCAL_MODE as pointed out earlier by Amit. On Sunday,

Re: Info about KMEans clustering

2013-11-28 Thread Suneel Marthi
This is not an issue with Mahout and has more to do with ur environment. U seem to be missing Hadoop in its path. Also, Mahout 0.8 is not officially supported on Hadoop 2.2. Sent from my iPhone On Nov 28, 2013, at 4:39 AM, Angelo Immediata angelo...@gmail.com wrote: Hi all I'm pretty new to

Re: java.lang.NoClassDefFoundError: com/google/common/base/Preconditions

2013-11-27 Thread Suneel Marthi
You r missing the Google Guava library, which has these classes. R u running a mvn build on Mahout snapshot? On Thursday, November 28, 2013 1:56 AM, Tharindu Rusira tharindurus...@gmail.com wrote: Hi all, I'm working on Mahout 0.9-SNAPSHOT version checked out from the svn trunk. The following

Re: java.lang.NoClassDefFoundError: com/google/common/base/Preconditions

2013-11-27 Thread Suneel Marthi
with the code to find a workaround so that it does not require these Precondition checks. (I've attached a patch if you are interested) :) Thanks a lot.  -Tharindu On Thu, Nov 28, 2013 at 12:29 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: you r missing Google Guava library which has these classes

Re: Only one reducer running on canopy generator

2013-11-25 Thread Suneel Marthi
Canopy Clustering is a 2-step process: Canopy Generation followed by Canopy Clustering. For Canopy Generation, it uses a single reducer (and this cannot be overridden), while the Clustering task uses multiple reducers. You seem to be hitting OOM during the Canopy generation phase. On
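
For readers unfamiliar with the generation step mentioned above, here is a minimal Python sketch of the textbook canopy procedure. It is not Mahout's code. The function name `canopy_centers` and the thresholds `t1` (loose) and `t2` (tight) are assumptions made for illustration. Each pass depends on which points earlier centers already removed, which may help explain why the step is hard to spread across reducers.

```python
import math

def canopy_centers(points, t1, t2):
    """Textbook canopy generation (illustrative only, not Mahout's code).

    points: list of equal-length numeric tuples
    t1: loose threshold, points closer than t1 join the canopy
    t2: tight threshold, points closer than t2 stop being candidates
    """
    if t1 <= t2:
        raise ValueError("t1 must be greater than t2")
    remaining = list(points)
    canopies = []
    while remaining:
        # Take the next candidate as a new canopy center.
        center = remaining.pop(0)
        members = [center]
        still_remaining = []
        for p in remaining:
            d = math.dist(center, p)
            if d < t1:
                members.append(p)          # inside the loose radius
            if d >= t2:
                still_remaining.append(p)  # may still seed or join other canopies
        remaining = still_remaining
        canopies.append((center, members))
    return canopies
```

Every candidate point stays in memory until some center removes it, so in this sketch memory use grows with the number of points that reach this step.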

Re: Mahout fpg

2013-11-22 Thread Suneel Marthi
On Friday, November 22, 2013 4:55 AM, Jason Lee wua...@gmail.com wrote: I noticed lots of algorithm implementations were deprecated in Mahout 0.8 and removed in 0.9, but no reasons or comments were given. Can I ask why? I was asked this question before. Most of the algorithms that were

Re: Mahout and Hadoop 2.2.0

2013-11-21 Thread Suneel Marthi
Targeted for Dec 2013. On Thursday, November 21, 2013 3:26 PM, Hi There srudamas...@yahoo.com wrote: Thanks for the reply! Is there a timeline for when the next release will be? Thanks, Victor On Tuesday, November 19, 2013 7:30 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: Hi

Re: Mahout fpg

2013-11-20 Thread Suneel Marthi
From the stacktrace: FAILED java.lang.NumberFormatException: For input string: A1234567 at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) Obviously, the input's incorrect. On Wednesday, November 20, 2013 6:02 PM, Sameer Tilak ssti...@live.com wrote:
