question

2016-06-01 Thread Khurrum Nasim
Hello All, Seeking some advice regarding the following: I have a JSON ETL task. You know we all done some ETL in our lives before - extract data, apply some transformation to it, and load it back. I have a fairly huge amount of JSON that I need to iterate over and check for the

Re: Question on how and should the Scala code change from => to ⇒ symbol?

2015-11-06 Thread Henry Saputra
Hi Dmitriy, Thanks for the reply. So were you doing it manually or did you use tools to convert it as part of commit or build flow? - Henry On Fri, Nov 6, 2015 at 2:08 PM, Dmitriy Lyubimov wrote: > We, well, I was trying to migrate to the unicode one. Perhaps a bit >

Re: Question on how and should the Scala code change from => to ⇒ symbol?

2015-11-06 Thread Dmitriy Lyubimov
like i said, only if i happened to make a substantial change to a file. On Fri, Nov 6, 2015 at 3:10 PM, Dmitriy Lyubimov wrote: > find-and-replace in idea :) > > On Fri, Nov 6, 2015 at 2:40 PM, Henry Saputra > wrote: > >> Hi Dmitriy, >> >> Thanks for

Re: Question on how and should the Scala code change from => to ⇒ symbol?

2015-11-06 Thread Dmitriy Lyubimov
find-and-replace in idea :) On Fri, Nov 6, 2015 at 2:40 PM, Henry Saputra wrote: > Hi Dmitriy, > > Thanks for the reply. So were you doing it manually or did you use > tools to convert it as part of commit or build flow? > > - Henry > > On Fri, Nov 6, 2015 at 2:08 PM,

Re: Question on how and should the Scala code change from => to ⇒ symbol?

2015-11-06 Thread Henry Saputra
Cool, thanks for the info :) On Friday, November 6, 2015, Dmitriy Lyubimov wrote: > like i said, only if i happened to make a substantial change to a file. > > On Fri, Nov 6, 2015 at 3:10 PM, Dmitriy Lyubimov > wrote: > > > find-and-replace

Question on how and should the Scala code change from => to ⇒ symbol?

2015-11-06 Thread Henry Saputra
Hi Mahout devs, I am looking at the Mahout Scala code and some parts of the code use ⇒ symbol and some use =>. I know we can use scalariform to format the Scala code in sbt, but I am not sure it is done with maven? And also should we use ⇒ to represent => in Scala code for patches and

Re: Question on how and should the Scala code change from => to ⇒ symbol?

2015-11-06 Thread Dmitriy Lyubimov
We, well, I was trying to migrate to the unicode one. Perhaps a bit unilaterally, simply because that's how I have been writing the rest of my code for the past couple years. But of course I did that on occasional basis. The rule I was following was on the file uniformity basis. If I replace one

Question with contributing first steps

2015-03-04 Thread Олег Зотов
Hi I want to contribute to the Mahout and I have two questions: 1) What about Mahout and Google Summer of Code this year? 2) To take the first step, I fixed one not so difficult bug, and already more than 10 days ago sent pull request, but still did not see any response - I did something wrong?

Re: Question with contributing first steps

2015-03-04 Thread Dmitriy Lyubimov
(1) no mentors this year. (2) what was the PR #? On Wed, Mar 4, 2015 at 2:35 PM, Олег Зотов olegzoto...@gmail.com wrote: Hi I want to contribute to the Mahout and I have two questions: 1) What about Mahout and Google Summer of Code this year? 2) To take the first step, I fixed one not so

Re: Question with contributing first steps

2015-03-04 Thread Олег Зотов
question re:scala/java: yes, with exception of math module, which is the only non-deprecated module containing any java, i think it is fair to say the rest of non-deprecated stuff modules are Scala only (or almost only, i think h20 has some java code mixed in). On Wed, Mar 4, 2015 at 2:41 PM

Re: Question with contributing first steps

2015-03-04 Thread Suneel Marthi
, and nobody these days cares so much maintaining these. Second, the author of this code, who would be qualified to assess the validity of the fix, has not come forward, which kind of goes back to the first point To answer your further question re:scala/java: yes, with exception

Re: Question with contributing first steps

2015-03-04 Thread Andrew Musselman
these. Second, the author of this code, who would be qualified to assess the validity of the fix, has not come forward, which kind of goes back to the first point To answer your further question re:scala/java: yes, with exception of math module, which is the only non-deprecated module containing any

Re: Question with contributing first steps

2015-03-04 Thread Suneel Marthi
, the author of this code, who would be qualified to assess the validity of the fix, has not come forward, which kind of goes back to the first point To answer your further question re:scala/java: yes, with exception of math module, which is the only non-deprecated

Re: Question with contributing first steps

2015-03-04 Thread Олег Зотов
to the first point To answer your further question re:scala/java: yes, with exception of math module, which is the only non-deprecated module containing any java, i think it is fair to say the rest of non-deprecated stuff modules are Scala only (or almost only, i think h20 has some

Re: Question with contributing first steps

2015-03-04 Thread Dmitriy Lyubimov
forward, which kind of goes back to the first point To answer your further question re:scala/java: yes, with exception of math module, which is the only non-deprecated module containing any java, i think it is fair to say the rest of non-deprecated stuff modules are Scala only (or almost only, i

Re: Question with contributing first steps

2015-03-04 Thread Олег Зотов
of goes back to the first point To answer your further question re:scala/java: yes, with exception of math module, which is the only non-deprecated module containing any java, i think it is fair to say the rest of non-deprecated stuff modules are Scala only (or almost only, i think

Re: Question with contributing first steps

2015-03-04 Thread Andrew Musselman
, and nobody these days cares so much maintaining these. Second, the author of this code, who would be qualified to assess the validity of the fix, has not come forward, which kind of goes back to the first point To answer your further question re:scala/java: yes, with exception

Question about Spark versions

2015-02-26 Thread Pat Ferrel
Spark releases every few weeks. In the meantime some users will have chosen a version to stay with for awhile. Now that we are moving to 1.2.1 what does that mean for users who are working with the version of Mahout that is using 1.1.0? Should we be releasing or tagging builds to sync with

Re: Question about Spark versions

2015-02-26 Thread Dmitriy Lyubimov
algebraic optimizer binary should be compatible with pretty wide range of spark. At very least, current head is backward compatible with 1.1.x. The only thing that locked it to that is using unpersist api. Before that it should've been compatible all the way to at least 0.9. spark 0.8.something

Re: drmFromHDFS rowLabelBindings question

2014-09-14 Thread Dmitriy Lyubimov
@mahout.apache.org Subject: Re: drmFromHDFS rowLabelBindings question The serialization can be in engine-specific modules, as with cooccurrence and ItemSimilarity. cooccurrence is in math-scala, ItemSimilarity is the engine-specific driver. There is nothing engine specific about

Re: drmFromHDFS rowLabelBindings question

2014-09-13 Thread Dmitriy Lyubimov
support for somebody as pragmatical as myself. For folks that are looking for a nice thesis project this idea might be indefinitely more attractive though. Even then though, question comes if they'd be able to match the amount of effort poured into Spark QL, and therefore, at least match its

RE: drmFromHDFS rowLabelBindings question

2014-09-13 Thread Saikat Kanjilal
One question based on this discussion, is there anything we can provide on top of spark ddf that would be useful in working within mahout DSL, maybe what we really need to do is to build a thin layer with mahout nice-ties that links in spark ddf and nicely serves as a translation layer between

Re: drmFromHDFS rowLabelBindings question

2014-09-13 Thread Dmitriy Lyubimov
On Sat, Sep 13, 2014 at 10:01 AM, Saikat Kanjilal sxk1...@hotmail.com wrote: One question based on this discussion, is there anything we can provide on top of spark ddf that would be useful in working within mahout DSL, maybe what we really need to do is to build a thin layer with mahout nice

RE: drmFromHDFS rowLabelBindings question

2014-09-13 Thread Saikat Kanjilal
directly into the engine specific code, in general I think there may be some complexity in directly incorporating with spark ddf. Thoughts? Date: Sat, 13 Sep 2014 10:21:18 -0700 Subject: Re: drmFromHDFS rowLabelBindings question From: dlie...@gmail.com To: dev@mahout.apache.org On Sat, Sep

Re: drmFromHDFS rowLabelBindings question

2014-09-13 Thread Dmitriy Lyubimov
:21:18 -0700 Subject: Re: drmFromHDFS rowLabelBindings question From: dlie...@gmail.com To: dev@mahout.apache.org On Sat, Sep 13, 2014 at 10:01 AM, Saikat Kanjilal sxk1...@hotmail.com wrote: One question based on this discussion, is there anything we can provide on top of spark

RE: drmFromHDFS rowLabelBindings question

2014-09-13 Thread Saikat Kanjilal
understanding because I still feel like an adaptation layer is needed. Date: Sat, 13 Sep 2014 10:44:05 -0700 Subject: Re: drmFromHDFS rowLabelBindings question From: dlie...@gmail.com To: dev@mahout.apache.org sorry. doesn't make sense to me. too abstract. On Sat, Sep 13, 2014 at 10:28 AM

RE: drmFromHDFS rowLabelBindings question

2014-09-13 Thread Andrew Palumbo
@mahout.apache.org Subject: Re: drmFromHDFS rowLabelBindings question The serialization can be in engine-specific modules, as with cooccurrence and ItemSimilarity. cooccurrence is in math-scala, ItemSimilarity is the engine-specific driver. There is nothing engine specific about

drmFromHDFS rowLabelBindings question

2014-09-12 Thread Andrew Palumbo
I'm having some trouble getting the rowLabelBindings from a String-keyed (Checkpointed...Spark)Drm read in from HDFS. I'm reading in a sequence file of form <Text,VectorWritable> which is output from seq2sparse. The Drm has 7598 rows and the vectors seem to be read in properly. When I try

Re: drmFromHDFS rowLabelBindings question

2014-09-12 Thread Anand Avati
Did you try a simple: mahout> val rowlabelbindings = drmTFIDF.getRowLabelBindings mahout> rowlabelbindings.size If the new HashMap constructor is not taking in all the entries from its parameter, try rowlabelbindings.clone instead? I know this doesn't answer why the new HashMap has only 1 entry.

RE: drmFromHDFS rowLabelBindings question

2014-09-12 Thread Andrew Palumbo
To: dev@mahout.apache.org Subject: drmFromHDFS rowLabelBindings question Date: Fri, 12 Sep 2014 13:36:43 -0400 I'm having some trouble getting the rowLabelBindings from a String-keyed (Checkpointed...Spark)Drm read in from HDFS. I'm reading in a sequence file of form Text

Re: drmFromHDFS rowLabelBindings question

2014-09-12 Thread Dmitriy Lyubimov
)- I think i was just going the long (wrong) way to get them. Is there an easy way to extract these? Date: Fri, 12 Sep 2014 11:30:37 -0700 Subject: Re: drmFromHDFS rowLabelBindings question From: dlie...@gmail.com To: dev@mahout.apache.org Actually, as it stands, collect doesn't

Re: drmFromHDFS rowLabelBindings question

2014-09-12 Thread Anand Avati
On Fri, Sep 12, 2014 at 11:30 AM, Dmitriy Lyubimov dlie...@gmail.com wrote: Actually, as it stands, collect doesn't support labels (either as keys or Named Vectors). There are 2 considerations: (1) I chose to ignore any use of NamedVectors in DRM since DRM already has row keys, and two

Re: drmFromHDFS rowLabelBindings question

2014-09-12 Thread Dmitriy Lyubimov
Subject: Re: drmFromHDFS rowLabelBindings question From: dlie...@gmail.com To: dev@mahout.apache.org Actually, as it stands, collect doesn't support labels (either as keys or Named Vectors). There are 2 considerations: (1) I chose to ignore any use of NamedVectors in DRM since DRM

Re: drmFromHDFS rowLabelBindings question

2014-09-12 Thread Dmitriy Lyubimov
On Fri, Sep 12, 2014 at 11:56 AM, Anand Avati av...@gluster.org wrote: On Fri, Sep 12, 2014 at 11:30 AM, Dmitriy Lyubimov dlie...@gmail.com wrote: Actually, as it stands, collect doesn't support labels (either as keys or Named Vectors). There are 2 considerations: (1) I chose to

RE: drmFromHDFS rowLabelBindings question

2014-09-12 Thread Andrew Palumbo
I didnt realize that that is what you were referring to earlier, Anand. I was looking at that too. I tried changing it around a bit, but like I said, my scala sucks. Date: Fri, 12 Sep 2014 12:00:47 -0700 Subject: Re: drmFromHDFS rowLabelBindings question From: av...@gluster.org

Re: drmFromHDFS rowLabelBindings question

2014-09-12 Thread Anand Avati
On Fri, Sep 12, 2014 at 12:00 PM, Anand Avati av...@gluster.org wrote: On Fri, Sep 12, 2014 at 11:57 AM, Dmitriy Lyubimov dlie...@gmail.com wrote: bit i you are really compelled that it is something that might be needed, the best way probably would be indeed create an optional parameter to

Re: drmFromHDFS rowLabelBindings question

2014-09-12 Thread Dmitriy Lyubimov
i guess if the code is there but doesn't work then it is jira-bug worthy then. I can look at it but not soon enough for Andrew i guess. On Fri, Sep 12, 2014 at 12:17 PM, Anand Avati av...@gluster.org wrote: On Fri, Sep 12, 2014 at 12:00 PM, Anand Avati av...@gluster.org wrote: On Fri,

RE: drmFromHDFS rowLabelBindings question

2014-09-12 Thread Andrew Palumbo
not a pressing issue. Appreciate it. Date: Fri, 12 Sep 2014 12:35:21 -0700 Subject: Re: drmFromHDFS rowLabelBindings question From: av...@gluster.org To: dev@mahout.apache.org On Fri, Sep 12, 2014 at 12:17 PM, Anand Avati av...@gluster.org wrote: On Fri, Sep 12, 2014 at 12:00 PM, Anand Avati

RE: drmFromHDFS rowLabelBindings question

2014-09-12 Thread Andrew Palumbo
It doesn't look like it has anything to do with the conversion. after: val rowBindings = d.map(t => (t._1._1.toString, t._2: java.lang.Integer)).toMap rowBindings.size is one From: ap@outlook.com To: dev@mahout.apache.org Subject: RE: drmFromHDFS rowLabelBindings question Date: Fri

RE: drmFromHDFS rowLabelBindings question

2014-09-12 Thread Andrew Palumbo
: drmFromHDFS rowLabelBindings question Date: Fri, 12 Sep 2014 16:14:38 -0400 It doesn't look like it has anything to do with the conversion. after: val rowBindings = d.map(t => (t._1._1.toString, t._2: java.lang.Integer)).toMap rowBindings.size is one From: ap@outlook.com To: dev

Re: drmFromHDFS rowLabelBindings question

2014-09-12 Thread Pat Ferrel
@outlook.com To: dev@mahout.apache.org Subject: RE: drmFromHDFS rowLabelBindings question Date: Fri, 12 Sep 2014 15:53:48 -0400 Thanks guys, I was wondering about the java.util.Map conversion too. I'll try copying everything into a java.util.HashMap and passing that to setRowBindings. I'll play

RE: drmFromHDFS rowLabelBindings question

2014-09-12 Thread Andrew Palumbo
: drmFromHDFS rowLabelBindings question From: pat.fer...@gmail.com Date: Fri, 12 Sep 2014 14:41:35 -0700 To: dev@mahout.apache.org Not sure if this helps but we (Sebastian and I) created an IndexedDataset which maintains row and column HashBiMaps that use the Int key to map to/from Strings
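
The IndexedDataset described above keeps HashBiMaps between Int row/column keys and String labels. As a rough illustration of that idea only, here is a minimal sketch of a label <-> index dictionary built from plain JDK collections. The class name `LabelIndex` and its methods are hypothetical; this is not Mahout's IndexedDataset or Guava's HashBiMap.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a two-way String label <-> int index dictionary,
// not Mahout's IndexedDataset and not Guava's HashBiMap.
final class LabelIndex {
  private final Map<String, Integer> toIndex = new HashMap<>();
  private final List<String> toLabel = new ArrayList<>(); // slot i holds the label of row i

  // Returns the existing index for a label, or assigns the next free one.
  int indexOf(String label) {
    Integer idx = toIndex.get(label);
    if (idx == null) {
      idx = toLabel.size();
      toIndex.put(label, idx);
      toLabel.add(label);
    }
    return idx;
  }

  // Reverse lookup: row index back to its label.
  String labelOf(int index) {
    return toLabel.get(index);
  }

  int size() {
    return toLabel.size();
  }

  // Independent copy as a plain java.util.Map of label -> Integer.
  Map<String, Integer> asMap() {
    return new HashMap<>(toIndex);
  }
}
```

Keeping the reverse direction as a list assumes indices are dense and assigned in order; a second hash map would be needed if they are not.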

Re: drmFromHDFS rowLabelBindings question

2014-09-12 Thread Anand Avati
a reduce/aggregate operator in as engine neutral/close to algebraic way as possible, or keep any kind of reduction/aggregate phase of operation backend specific (which kind of sucks) Thanks Subject: Re: drmFromHDFS rowLabelBindings question From: pat.fer...@gmail.com Date: Fri, 12 Sep 2014

Re: drmFromHDFS rowLabelBindings question

2014-09-12 Thread Pat Ferrel
of sucks) Thanks Subject: Re: drmFromHDFS rowLabelBindings question From: pat.fer...@gmail.com Date: Fri, 12 Sep 2014 14:41:35 -0700 To: dev@mahout.apache.org Not sure if this helps but we (Sebastian and I) created an IndexedDataset which maintains row and column HashBiMaps that use

Re: drmFromHDFS rowLabelBindings question

2014-09-12 Thread Dmitriy Lyubimov
(which kind of sucks) Thanks Subject: Re: drmFromHDFS rowLabelBindings question From: pat.fer...@gmail.com Date: Fri, 12 Sep 2014 14:41:35 -0700 To: dev@mahout.apache.org Not sure if this helps but we (Sebastian and I) created an IndexedDataset which maintains row

RE: drmFromHDFS rowLabelBindings question

2014-09-12 Thread Andrew Palumbo
2014 16:53:11 -0700 Subject: Re: drmFromHDFS rowLabelBindings question From: dlie...@gmail.com To: dev@mahout.apache.org Note that there is no way (yet) to perform aggregate or reduce like operation through the DSL. Though the backends (both spark and h2o) support reduce-like operations

RE: drmFromHDFS rowLabelBindings question

2014-09-12 Thread ap . dev
: drmFromHDFS rowLabelBindings question The serialization can be in engine-specific modules, as with cooccurrence and ItemSimilarity. cooccurrence is in math-scala, ItemSimilarity is the engine-specific driver. There is nothing engine specific about IndexedDatasets and an optimization

General DSL idiom question

2014-07-13 Thread Ted Dunning
I have a program that I am trying to build that has this pattern: broadcast state to all blocks block map to do a bit of computation, create local state merge all of the local states back to the global state repeat What is the suggestion for merging the local state back to the

Re: General DSL idiom question

2014-07-13 Thread Anand Avati
On Sun, Jul 13, 2014 at 4:22 PM, Ted Dunning ted.dunn...@gmail.com wrote: I have a program that I am trying to build that has this pattern: broadcast state to all blocks block map to do a bit of computation, create local state merge all of the local states back to the global

Re: General DSL idiom question

2014-07-13 Thread Ted Dunning
Yeah. Collect was where I had gotten, and was rather sulky about the results. It does seem like a reduce is going to be necessary. Anybody else have thoughts on this? Sent from my iPhone On Jul 13, 2014, at 17:58, Anand Avati av...@gluster.org wrote: collect(), hoping the result fits

Re: General DSL idiom question

2014-07-13 Thread Anand Avati
How about a new drm API: type ReduceFunc = (Vector, Vector) => Vector def reduce(rf: ReduceFunc): Vector = { ... } The row keys in this case are ignored/erased, but I'm not sure if they are useful (or even meaningful) for reduction. Such an API should be sufficient for kmeans (in
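
The proposed `reduce(rf: ReduceFunc)` merges per-block partial states with a `(Vector, Vector) => Vector` function and drops the row keys. The following minimal sketch illustrates that shape outside any engine. It assumes `double[]` stands in for Mahout's `Vector`, and the names `VectorReduce`, `SUM` and `reduce` are hypothetical. The merge function must be associative, because a backend may combine partial results in any grouping.

```java
import java.util.List;
import java.util.function.BinaryOperator;

// Hypothetical sketch of the proposed reduce idea on plain arrays:
// row keys are gone, only the per-block partial vectors are merged.
final class VectorReduce {
  // Elementwise sum: associative and commutative, so the order in which
  // blocks are combined does not change the result.
  static final BinaryOperator<double[]> SUM = (a, b) -> {
    if (a.length != b.length) {
      throw new IllegalArgumentException("length mismatch: " + a.length + " vs " + b.length);
    }
    double[] out = new double[a.length];
    for (int i = 0; i < a.length; i++) {
      out[i] = a[i] + b[i];
    }
    return out;
  };

  // Folds all partial states into one; fails if there is nothing to merge.
  static double[] reduce(List<double[]> partials, BinaryOperator<double[]> rf) {
    return partials.stream()
        .reduce(rf)
        .orElseThrow(() -> new IllegalArgumentException("no partial states to merge"));
  }
}
```

As noted later in this thread, k-means would want a matrix-shaped state rather than a single vector; the same pattern applies with a different element type.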

Re: General DSL idiom question

2014-07-13 Thread Dmitriy Lyubimov
On Sun, Jul 13, 2014 at 4:22 PM, Ted Dunning ted.dunn...@gmail.com wrote: I have a program that I am trying to build that has this pattern: broadcast state to all blocks block map to do a bit of computation, create local state merge all of the local states back to the global

Re: General DSL idiom question

2014-07-13 Thread Dmitriy Lyubimov
the only problem with that I see is that would not be algebra any more. that would be functional programming, and as such there are probably better frameworks to address these kind of things than a DRM. Drm currently suggest just to exist to engine level primitives, i.e. do something like

Re: General DSL idiom question

2014-07-13 Thread Dmitriy Lyubimov
strictly speaking, it would be A.rdd.reduce(_._2 + _._2). oh well On Sun, Jul 13, 2014 at 10:08 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: the only problem with that I see is that would not be algebra any more. that would be functional programming, and as such there are probably better

Re: General DSL idiom question

2014-07-13 Thread Ted Dunning
K-means needs Matrix oriented reducer. K-sparse encoding needs a Matrix. Micro-batch SGD needs Vector oriented. Streaming k-means might be able to do with a matrix, but a structure with a scalar and a matrix would be handier. Also, in all the algorithms I have looked at, the reduce follows a

Question on handling Mahout CHANGELOG

2013-07-26 Thread Suneel Marthi
Given the Mahout 0.8 has been released, how do we handle the CHANGELOG file? Do we mark 0.8 as Released and start a new 0.9 - unreleased section? or Do we blow out all entries in present CHANGELOG for 0.8 and start afresh for 0.9? Looking for guidance and what the best practices are to handle

Re: Question on handling Mahout CHANGELOG

2013-07-26 Thread Sebastian Schelter
I like the first option, thats also what Giraph is doing. 2013/7/26 Suneel Marthi suneel_mar...@yahoo.com Given the Mahout 0.8 has been released, how do we handle the CHANGELOG file? Do we mark 0.8 as Released and start a new 0.9 - unreleased section? or Do we blow out all entries in

Re: Question on handling Mahout CHANGELOG

2013-07-26 Thread Suneel Marthi
So be it. Thanks Sebastian. From: Sebastian Schelter s...@apache.org To: dev@mahout.apache.org; Suneel Marthi suneel_mar...@yahoo.com Sent: Friday, July 26, 2013 11:58 AM Subject: Re: Question on handling Mahout CHANGELOG I like the first option, thats

Re: Question on handling Mahout CHANGELOG

2013-07-26 Thread Stevo Slavić
Schelter s...@apache.org To: dev@mahout.apache.org; Suneel Marthi suneel_mar...@yahoo.com Sent: Friday, July 26, 2013 11:58 AM Subject: Re: Question on handling Mahout CHANGELOG I like the first option, thats also what Giraph is doing. 2013/7/26 Suneel Marthi suneel_mar...@yahoo.com Given

[jira] [Updated] (MAHOUT-952) ARFFVectorIterable/MapBackedArffModel doesn't handle question mark '?', other ARFF issues

2013-06-01 Thread Grant Ingersoll (JIRA)
? ARFFVectorIterable/MapBackedArffModel doesn't handle question mark '?', other ARFF issues - Key: MAHOUT-952 URL: https://issues.apache.org/jira/browse

[jira] [Resolved] (MAHOUT-952) ARFFVectorIterable/MapBackedArffModel doesn't handle question mark '?', other ARFF issues

2013-06-01 Thread Sebastian Schelter (JIRA)
ARFFVectorIterable/MapBackedArffModel doesn't handle question mark '?', other ARFF issues - Key: MAHOUT-952 URL: https://issues.apache.org/jira/browse/MAHOUT-952

Re: Gsoc 2013 question

2013-04-10 Thread Isabel Drost-Fromm
On Tuesday, April 09, 2013 10:46:04 PM George Zografos wrote: Do you mind telling me how much knowledge of java one must have in order to handle mahout issue #1177? (or #1179) I think for those you need a decent amount of Java knowledge - on idioms, patterns and best practices commonly used in

Gsoc 2013 question

2013-04-09 Thread George Zografos
Hello mahout dev community. I have a question regarding a project idea for GSOC 2013. Should I post it here or to JIRA as a comment?

Re: Gsoc 2013 question

2013-04-09 Thread Shannon Quinn
Hi there. If you don't have a fully-formed project idea or are otherwise looking for suggestions, feel free to post your question here. Shannon On 4/9/13 1:38 PM, George Zografos wrote: Hello mahout dev community. I have a question regarding a project idea for GSOC 2013. Should I post

Re: Gsoc 2013 question

2013-04-09 Thread George Zografos
. On Tue, Apr 9, 2013 at 9:37 PM, Shannon Quinn squ...@gatech.edu wrote: Hi there. If you don't have a fully-formed project idea or are otherwise looking for suggestions, feel free to post your question here. Shannon On 4/9/13 1:38 PM, George Zografos wrote: Hello mahout dev community

Small question with Kmeans Clustering

2013-02-26 Thread 冯胜
Hi, Dear all I’ve been running the Kmeans Clustering algorithm with mahout for a few days. A small question here: The number of clusters have been set as 8, Why the middle output always have 16 clusters? Anybody can share some knowledge on this? very appreciated. Thanks & Regards Feng.

Re: Small question with Kmeans Clustering

2013-02-26 Thread Grant Ingersoll
with mahout for a few days. A small question here: The number of clusters have been set as 8, Why the middle output always have 16 clusters? Anybody can share some knowledge on this? very appreciated. Thanks & Regards Feng. Grant Ingersoll

Re: PCA doc question for devs:

2012-09-05 Thread Dmitriy Lyubimov
Also: if yes, U\Sigma product may be desired as PCA output, would it make sense to do a patch to produce it right out of SSVD? Thanks. On Wed, Sep 5, 2012 at 4:10 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: Hello, I have a question w.r.t what to advise people in the SSVD manual for PCA

Re: Working on PCA tutorial. Question

2012-02-24 Thread Dmitriy Lyubimov
Ok i made edits suggested by Nathan. I don't think i see an error in fold-in fold-out formulas though. Anyone else wants to take a look before it goes on wiki and sends my credibility out of the window? https://github.com/dlyubimov/mahout-commits/blob/ssvd-docs/SSVD-CLI.pdf?raw=true Thanks. -d

[jira] [Commented] (MAHOUT-952) ARFFVectorIterable/MapBackedArffModel doesn't handle question mark '?', other ARFF issues

2012-02-24 Thread Dave Kor (Commented) (JIRA)
data values 99.9% of the time. On a related note, I just found another bug with ArffVectorIterable breaking down when the .arff file contain instance weights. I will open a new issue for this. ARFFVectorIterable/MapBackedArffModel doesn't handle question mark '?', other ARFF

[jira] [Issue Comment Edited] (MAHOUT-952) ARFFVectorIterable/MapBackedArffModel doesn't handle question mark '?', other ARFF issues

2012-02-24 Thread Dave Kor (Issue Comment Edited) (JIRA)
since, if I am not wrong (and I could be), only missing numeric values are encoded as '?'. ARFFVectorIterable/MapBackedArffModel doesn't handle question mark '?', other ARFF issues

[jira] [Issue Comment Edited] (MAHOUT-952) ARFFVectorIterable/MapBackedArffModel doesn't handle question mark '?', other ARFF issues

2012-02-24 Thread Dave Kor (Issue Comment Edited) (JIRA)
question mark '?', other ARFF issues - Key: MAHOUT-952 URL: https://issues.apache.org/jira/browse/MAHOUT-952 Project: Mahout Issue Type: Bug

[jira] [Issue Comment Edited] (MAHOUT-952) ARFFVectorIterable/MapBackedArffModel doesn't handle question mark '?', other ARFF issues

2012-02-24 Thread Dave Kor (Issue Comment Edited) (JIRA)
it makes sense to only do something when the parsing in processNumeric fails. ARFFVectorIterable/MapBackedArffModel doesn't handle question mark '?', other ARFF issues
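
The suggestion above is to handle '?' only when numeric parsing fails. A minimal sketch of that control flow is shown below. It assumes a missing value should become `Double.NaN`. The class `ArffNumeric` and its `parse` method are hypothetical and are not Mahout's `processNumeric`.

```java
// Hypothetical sketch: try the numeric parse first, and only on failure
// check for the ARFF missing-value marker '?'.
final class ArffNumeric {
  static double parse(String token) {
    String t = token.trim();
    try {
      return Double.parseDouble(t);
    } catch (NumberFormatException e) {
      if ("?".equals(t)) {
        return Double.NaN; // assumption: represent a missing value as NaN
      }
      throw new IllegalArgumentException("not a numeric ARFF value: " + token, e);
    }
  }
}
```

The common case (a well-formed number) pays no extra cost. Only malformed tokens reach the '?' check.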

Re: Working on PCA tutorial. Question

2012-02-23 Thread Dmitriy Lyubimov
Thank you, Nathan. On Wed, Feb 22, 2012 at 7:01 PM, Nathan Halko nat...@spotinfluence.com wrote: Hi Dmitriy,  Just a few comments: --the computed factors are approximate  A \approx U\SigmaV^{T} Thanks, agreed. -- the projection steps seemed transposed to me but they are consistent

Re: Working on PCA tutorial. Question

2012-02-23 Thread Dmitriy Lyubimov
Wow. Cantor patterns for Givens rotations. I wondered if it already had a name or somebody already figured to do something similar. It looks like you really got into that level of details there. That's extremely cool, sir ! On Thu, Feb 23, 2012 at 4:45 PM, Dmitriy Lyubimov dlie...@gmail.com

Working on PCA tutorial. Question

2012-02-22 Thread Dmitriy Lyubimov
Hi, working on PCA section in SSVD usage . Just to confirm, if we run and svd over input with mean subtracted, then U matrix presents original data points converted to PCA space, right? thanks. -d

Re: Working on PCA tutorial. Question

2012-02-22 Thread Nathan Halko
Hi Dmitriy, Just a few comments: --the computed factors are approximate A \approx U\SigmaV^{T} -- the projection steps seemed transposed to me but they are consistent throughout ie. (2) \tilde{u} = \tilde{c}_{r} V \Sigma^{-1} p. 3: transpose \xi to emphasize row vector - 'mean of all
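
Formula (2) above, \tilde{u} = \tilde{c}_{r} V \Sigma^{-1}, folds a new row into the U space. The following minimal sketch shows that arithmetic under stated assumptions: \tilde{c}_r is the raw row `c` minus the column-mean vector `mean` (the \xi of the tutorial), `v` is the n x k matrix whose columns are right singular vectors, and `sigma` holds the k singular values. The names `FoldIn` and `foldIn` are hypothetical. Multiplying the result by \Sigma again gives \tilde{c}_r V, the coordinates along the principal directions.

```java
// Hypothetical sketch of the fold-in step u = (c - mean) * V * Sigma^{-1}.
final class FoldIn {
  // u_j = sum_i (c_i - mean_i) * V[i][j] / sigma_j
  static double[] foldIn(double[] c, double[] mean, double[][] v, double[] sigma) {
    int n = c.length;
    int k = sigma.length;
    if (mean.length != n || v.length != n) {
      throw new IllegalArgumentException("dimension mismatch");
    }
    double[] u = new double[k];
    for (int j = 0; j < k; j++) {
      double dot = 0.0;
      for (int i = 0; i < n; i++) {
        dot += (c[i] - mean[i]) * v[i][j]; // centered row times column j of V
      }
      u[j] = dot / sigma[j]; // scale by the inverse singular value
    }
    return u;
  }
}
```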

[jira] [Commented] (MAHOUT-952) ARFFVectorIterable/MapBackedArffModel doesn't handle question mark '?', other ARFF issues

2012-02-19 Thread Dave Kor (Commented) (JIRA)
values aren't very important, you can create a .arff that does not contain missing values using Weka's ReplaceMissingValues filter to replace missing values with the attribute's mean. ARFFVectorIterable/MapBackedArffModel doesn't handle question mark '?', other ARFF issues

[jira] [Commented] (MAHOUT-952) ARFFVectorIterable/MapBackedArffModel doesn't handle question mark '?', other ARFF issues

2012-02-19 Thread Joe Prasanna Kumar (Commented) (JIRA)
Joe. ARFFVectorIterable/MapBackedArffModel doesn't handle question mark '?', other ARFF issues - Key: MAHOUT-952 URL: https://issues.apache.org/jira/browse

[jira] [Updated] (MAHOUT-952) ARFFVectorIterable/MapBackedArffModel doesn't handle question mark '?', other ARFF issues

2012-02-19 Thread Joe Prasanna Kumar (Updated) (JIRA)
ARFFVectorIterable/MapBackedArffModel doesn't handle question mark '?', other ARFF issues - Key: MAHOUT-952 URL: https://issues.apache.org/jira/browse/MAHOUT

[jira] [Issue Comment Edited] (MAHOUT-952) ARFFVectorIterable/MapBackedArffModel doesn't handle question mark '?', other ARFF issues

2012-02-19 Thread Joe Prasanna Kumar (Issue Comment Edited) (JIRA)
: the attached patch addresses the ? issue for the case of dense arff files was (Author: joekumar): address the ? issue for the case of dense arff files ARFFVectorIterable/MapBackedArffModel doesn't handle question mark '?', other ARFF

Re: CIMapper Question

2012-02-12 Thread Paritosh Ranjan
Can something like this help? public class CIMapper<T extends Cluster> extends Mapper<WritableComparable<?>,VectorWritable,IntWritable,T> { ... } On 12-02-2012 06:48, Jeff Eastman wrote: I'm wondering how to tease the elephant into accepting any concrete instance of the interface

Re: CIMapper Question

2012-02-12 Thread Sean Owen
The problem really arises when you have to tell the Job what the class of the Mapper key/value is. It needs something concrete. The issue is not here in the Mapper declaration. The general answer is, no, it has to somehow know what it's reading before it reads it. You can accomplish this by, say,

Re: CIMapper Question

2012-02-12 Thread Sean Owen
Exactly right, and that's exactly the answer in some form. PolymorphicWritable isn't suitable if you're writing a lot of records as the overhead of writing a 40-byte string is too much at scale. On Sun, Feb 12, 2012 at 4:01 PM, Ted Dunning ted.dunn...@gmail.com wrote: But this sounds like a

Re: CIMapper Question

2012-02-12 Thread Jeff Eastman
This approach worked out, not exactly as below, but I was able to create a ClusterWritable which used PolymorphicWritable to read and write its Cluster value field. This makes it through the mapper and reducer but I'm still working on getting it all to fly in the ClusterIterator. On 2/12/12
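
The approach discussed here writes enough type information with each record that the reader can rebuild the right concrete class. The cost, as noted in this thread, is a class-name string on every record. The following minimal sketch uses only `java.io` to show that idea. It assumes `SimpleWritable` as a stand-in for Hadoop's `Writable`. `PolymorphicCodec` and `CenterCluster` are hypothetical and are not Mahout's PolymorphicWritable or ClusterWritable.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Stand-in for Hadoop's Writable contract (assumption, not the real interface).
interface SimpleWritable {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}

// Hypothetical codec: class name first, then the instance's own fields.
final class PolymorphicCodec {
  static void write(DataOutput out, SimpleWritable value) throws IOException {
    out.writeUTF(value.getClass().getName()); // per-record overhead
    value.write(out);
  }

  static SimpleWritable read(DataInput in) throws IOException {
    String className = in.readUTF();
    try {
      // Requires an accessible no-arg constructor on the concrete class.
      SimpleWritable v = (SimpleWritable) Class.forName(className)
          .getDeclaredConstructor().newInstance();
      v.readFields(in);
      return v;
    } catch (ReflectiveOperationException e) {
      throw new IOException("cannot instantiate " + className, e);
    }
  }
}

// Hypothetical concrete "cluster": an id plus a center vector.
final class CenterCluster implements SimpleWritable {
  int id;
  double[] center = new double[0];

  public CenterCluster() {}

  CenterCluster(int id, double[] center) {
    this.id = id;
    this.center = center;
  }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeInt(id);
    out.writeInt(center.length);
    for (double d : center) {
      out.writeDouble(d);
    }
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    id = in.readInt();
    center = new double[in.readInt()];
    for (int i = 0; i < center.length; i++) {
      center[i] = in.readDouble();
    }
  }
}
```

If the concrete type is fixed for a whole file, writing it once in a header instead of per record avoids the overhead.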

Re: CIMapper Question

2012-02-12 Thread Lance Norskog
Another option is TupleWritable. But pull the source and make sure it works, I had problems. On Sun, Feb 12, 2012 at 9:22 AM, Jeff Eastman j...@windwardsolutions.com wrote: This approach worked out, not exactly as below, but I was able to create a ClusterWritable which used PolymorphicWritable

CIMapper Question

2012-02-11 Thread Jeff Eastman
I'm wondering how to tease the elephant into accepting any concrete instance of the interface o.a.m.clustering.Cluster when writing trained clusters in the cleanup() method of CIMapper. I've gotten the MR version of the ClusterIterator to get to that point in testing but it blows chunks with

[jira] [Created] (MAHOUT-952) ARFFVectorIterable/MapBackedArffModel doesn't handle question mark '?', other ARFF issues

2012-01-19 Thread Stuart Smith (Created) (JIRA)
ARFFVectorIterable/MapBackedArffModel doesn't handle question mark '?', other ARFF issues - Key: MAHOUT-952 URL: https://issues.apache.org/jira/browse/MAHOUT-952

[jira] [Updated] (MAHOUT-952) ARFFVectorIterable/MapBackedArffModel doesn't handle question mark '?', other ARFF issues

2012-01-19 Thread Stuart Smith (Updated) (JIRA)
, and call setLabel() (Apparently just throws that away). Looks like the DenseVectors keep thinking the cardinality is 534, when it should be 1800+ when I know more, I'll create a new issue ARFFVectorIterable/MapBackedArffModel doesn't handle question mark '?', other ARFF issues

Re: Question regarding sequence file iterators

2011-08-31 Thread Sean Owen
Yeah this could be done -- we'd have to save off all the iterators that get created in the transform() method and then close them later. Let me have a go at it. On Tue, Aug 30, 2011 at 11:15 PM, Dmitriy Lyubimov dlie...@gmail.comwrote: I guess this is a question for Sean. I see
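
The fix described above saves every iterator created while walking the inputs and closes them all later. The following minimal sketch shows that bookkeeping with a generic "open a source" function. `ClosingConcatIterator` and its nested `CloseableIterator` are hypothetical names. This is not Mahout's SequenceFileDirIterator.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

// Hypothetical sketch: chains per-source iterators and remembers every one
// it opens, so that close() can release all of them.
final class ClosingConcatIterator<S, T> implements Iterator<T>, Closeable {

  /** An iterator that holds a resource (e.g. an open file) until closed. */
  interface CloseableIterator<E> extends Iterator<E>, Closeable {}

  private final Iterator<S> sources;
  private final Function<S, CloseableIterator<T>> opener;
  private final List<CloseableIterator<T>> opened = new ArrayList<>();
  private CloseableIterator<T> current;

  ClosingConcatIterator(Iterator<S> sources, Function<S, CloseableIterator<T>> opener) {
    this.sources = sources;
    this.opener = opener;
  }

  @Override
  public boolean hasNext() {
    // Lazily open the next source whenever the current one is exhausted.
    while ((current == null || !current.hasNext()) && sources.hasNext()) {
      current = opener.apply(sources.next());
      opened.add(current); // keep a handle so close() can release it
    }
    return current != null && current.hasNext();
  }

  @Override
  public T next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return current.next();
  }

  @Override
  public void close() throws IOException {
    // Close everything, keeping the first failure and attaching the rest.
    IOException first = null;
    for (CloseableIterator<T> it : opened) {
      try {
        it.close();
      } catch (IOException e) {
        if (first == null) {
          first = e;
        } else {
          first.addSuppressed(e);
        }
      }
    }
    opened.clear();
    current = null;
    if (first != null) {
      throw first;
    }
  }
}
```

A variant could also close each source as soon as it is exhausted, so that long globs do not keep every file handle open until the end.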

Question regarding sequence file iterators

2011-08-30 Thread Dmitriy Lyubimov
I guess this is a question for Sean. I see that SequenceFileValueIterator implements Closeable, so i can release it. There's a very convenient implementation for globs, SequenceFileDirIterator. But i don't see any close() or Closeable. how do i release all those handles it holds? Thanks

Re: Mahout newbiw question

2011-07-26 Thread Sean Owen
...@yahoo.com To: dev@mahout.apache.org dev@mahout.apache.org Sent: Tuesday, July 26, 2011 3:58 AM Subject: Mahout newbiw question Trying to run the examples in Chapter 13 of the book, get the following error when trying to execute bash $MAHOUT_HOME/bin/mahout cat donut.csv Running

Re: Mahout newbiw question

2011-07-26 Thread Suneel Marthi
To: dev@mahout.apache.org; Suneel Marthi suneel_mar...@yahoo.com Sent: Tuesday, July 26, 2011 4:38 AM Subject: Re: Mahout newbiw question Yes, 0.5 goes with 0.20.2. HEAD/0.6 goes with 0.20.203.0 On Tue, Jul 26, 2011 at 9:02 AM, Suneel Marthi suneel_mar...@yahoo.com wrote: Never mind, figured

Re: Mahout newbiw question

2011-07-26 Thread Sean Owen
(Downgraded Hadoop to *0.20.2* you mean?) (I'd suggest you actually upgrade Mahout to 0.6/HEAD instead.) This means you have incompatible versions of the compiled code lying around. Make sure you aren't including different versions twice, and/or recompile your local copy. The method in question

Re: File format question about Random forest.

2011-07-15 Thread Xiaobo Gu
But if we use CSV files, how can we generate descriptors for datasets? Cheers Xiaobo Gu On Thu, Jul 14, 2011 at 1:27 AM, deneche abdelhakim adene...@gmail.com wrote: I guess yes. as long as you don't use quotes or double quotes to embed the fields. On Wed, Jul 13, 2011 at 2:58 PM, Xiaobo Gu

Re: File format question about Random forest.

2011-07-15 Thread Xiaobo Gu
Can we make the file descriptor as following: 1. make a small csv file with the same format as the actual dataset, say a CSV file with header and only one record, 2. Use java weka.core.converters.CSVLoader filename.csv > filename.arff to convert the small CSV into an ARFF file, see

Re: File format question about Random forest.

2011-07-15 Thread Xiaobo Gu
Do the -p and -f option of org.apache.mahout.df.tools.Describe have to be HDFS URLs, can they be local file system paths? On Fri, Jul 15, 2011 at 9:28 PM, Xiaobo Gu guxiaobo1...@gmail.com wrote: Can we make the file descriptor as following: 1. make a small csv file with the same format as the

Question on entropy calculation

2011-07-15 Thread Sean Owen
I stumped myself looking at the implementation of LogLikelihood.entropy(). This is Shannon entropy right? just the sum of -x*log(x) for all x in the input? I understand why it could be desirable to normalize the input to sum to 1, but we don't since it doesn't matter in most contexts. So if N =

Re: Question on entropy calculation

2011-07-15 Thread Ted Dunning
On Fri, Jul 15, 2011 at 9:38 AM, Sean Owen sro...@gmail.com wrote: I stumped myself looking at the implementation of LogLikelihood.entropy(). This is Shannon entropy right? just the sum of -x*log(x) for all x in the input? Sort of. It would be Shannon entropy if the sum x_i = 1. I

Re: Question on entropy calculation

2011-07-15 Thread Sean Owen
On Fri, Jul 15, 2011 at 6:47 PM, Ted Dunning ted.dunn...@gmail.com wrote: Sort of.  It would be Shannon entropy if the sum x_i = 1. Right, yes that's why one would divide by N = sum(x) to make that so. But what it computes now is the sum of -x * log(x/N). Seems like a bit My question
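
The relationship in this exchange can be checked numerically. With N = sum x_i and p_i = x_i / N, the unnormalized form satisfies sum -x_i log(x_i / N) = N log N - sum x_i log x_i = N * H(p). In other words, it is N times the Shannon entropy of the normalized counts. The following minimal sketch computes all three forms with natural logs (an assumption). `EntropySketch` and its methods are hypothetical helpers, not Mahout's LogLikelihood.

```java
// Hypothetical sketch comparing Shannon entropy of normalized counts with
// the unnormalized sum of -x * log(x / N).
final class EntropySketch {
  private static long total(long... counts) {
    long n = 0;
    for (long x : counts) {
      n += x;
    }
    return n;
  }

  // x log x with the usual convention 0 log 0 = 0.
  private static double xLogX(long x) {
    return x == 0 ? 0.0 : x * Math.log(x);
  }

  /** Shannon entropy (nats) of p_i = x_i / N. */
  static double shannon(long... counts) {
    long n = total(counts);
    double h = 0.0;
    for (long x : counts) {
      if (x > 0) {
        double p = (double) x / n;
        h -= p * Math.log(p);
      }
    }
    return h;
  }

  /** Sum of -x_i * log(x_i / N); equals N * shannon(counts). */
  static double unnormalized(long... counts) {
    long n = total(counts);
    double s = 0.0;
    for (long x : counts) {
      if (x > 0) {
        s -= x * Math.log((double) x / n);
      }
    }
    return s;
  }

  /** The same quantity written as N log N - sum x_i log x_i. */
  static double viaXLogX(long... counts) {
    double s = xLogX(total(counts));
    for (long x : counts) {
      s -= xLogX(x);
    }
    return s;
  }
}
```

For counts (1, 1, 2), N = 4 and H(p) = 1.5 ln 2, so both unnormalized forms come to 6 ln 2.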
