reply: I want to contribute two quality measures (ARHR and HR) for top-N recommendation systems to MLlib. Is this meaningful?

2014-08-27 Thread Lizhengbing (bing, BIPA)
In fact, prec@k is similar to HR and ndcg@k is similar to ARHR. After my study, I cannot find a single best measure for evaluating recommendation systems. Xiangrui, do you think it is reasonable to create a class that provides popular measures for evaluating recommendation systems? Popular measures of
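For reference, a minimal sketch of how HR (hit rate) and ARHR (average reciprocal hit rank) could be computed over top-N recommendations. The input layout (user id, top-N item list, single held-out item) and the helper name are illustrative assumptions, not an existing MLlib API:

```scala
import org.apache.spark.rdd.RDD

// Hypothetical input: (userId, top-N recommended items, single held-out item) per user.
def hitRateMetrics(recs: RDD[(Long, Seq[Int], Int)]): (Double, Double) = {
  val n = recs.count().toDouble
  // HR: fraction of users whose held-out item appears anywhere in their top-N list.
  val hits = recs.filter { case (_, topN, heldOut) => topN.contains(heldOut) }.count().toDouble
  // ARHR: each hit is weighted by 1 / rank of the hit, so earlier positions count more.
  val reciprocalRankSum = recs.map { case (_, topN, heldOut) =>
    val rank = topN.indexOf(heldOut) + 1
    if (rank > 0) 1.0 / rank else 0.0
  }.fold(0.0)(_ + _)
  (hits / n, reciprocalRankSum / n)
}
```

With exactly one held-out item per user, HR is prec@N scaled by N, which is the similarity to prec@k alluded to above.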

Re: Contributing to MLlib: Proposal for Clustering Algorithms

2014-08-27 Thread RJ Nowling
Hi Yu, A standardized API has not been implemented yet. I think it would be better to implement the other clustering algorithms and then extract a common API. Others may feel differently. :) Just a note, there was a pre-existing JIRA for hierarchical KMeans, SPARK-2429

Re: Contributing to MLlib: Proposal for Clustering Algorithms

2014-08-27 Thread Jeremy Freeman
Hey RJ, Sorry for the delay, I'd be happy to take a look at this if you can post the code! I think splitting the largest cluster in each round is fairly common, but ideally it would be an option to do it one way or the other. -- Jeremy - jeremy freeman, phd neuroscientist
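As a rough illustration of the "split the largest cluster each round" strategy mentioned here, a sketch built on the existing MLlib KMeans; the driver-side loop and the repeated count()/predict() calls are for clarity rather than efficiency, and none of this reflects the proposed implementation:

```scala
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

// Illustrative bisecting loop: repeatedly pick the largest cluster and split it with k=2 KMeans.
def splitLargest(data: RDD[Vector], targetClusters: Int): Seq[RDD[Vector]] = {
  var clusters: Seq[RDD[Vector]] = Seq(data.cache())
  while (clusters.size < targetClusters) {
    val largest = clusters.maxBy(_.count())        // largest cluster by point count
    val model = new KMeans().setK(2).run(largest)  // bisect it into two sub-clusters
    val children = (0 until 2).map(i => largest.filter(v => model.predict(v) == i).cache())
    clusters = clusters.filterNot(_ eq largest) ++ children
  }
  clusters
}
```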

Re: Contributing to MLlib: Proposal for Clustering Algorithms

2014-08-27 Thread RJ Nowling
Thanks, Jeremy. I'm abandoning my initial approach, and I'll work on optimizing your example (so it doesn't do the breeze-vector conversions every time KMeans is called). I need to finish a few other projects first, though, so it may be a couple of weeks. In the meantime, Yu also created a JIRA

Re: Adding support for a new object store

2014-08-27 Thread Reynold Xin
Hi Rajendran, I'm assuming you have some concept of schema and you are intending to integrate with SchemaRDD instead of normal RDDs. More responses inline below. On Fri, Aug 22, 2014 at 2:21 AM, Rajendran Appavu appra...@in.ibm.com wrote: I am new to Spark source code and looking to see if
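To make the SchemaRDD route concrete, a hedged sketch of one way data from an external store could be exposed to Spark SQL with the 1.1-era applySchema API; ObjectStoreClient and its methods are hypothetical stand-ins for the new store's client, not a real Spark API:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql._

// Hypothetical client for the new object store (placeholder implementation).
object ObjectStoreClient {
  def listKeys(bucket: String): Seq[String] = Seq.empty
  def sizeOf(bucket: String, key: String): Long = 0L
}

def registerStoreTable(sc: SparkContext, sqlContext: SQLContext, bucket: String): Unit = {
  // One Row per stored object, built from whatever the store's client returns.
  val rowRDD = sc.parallelize(ObjectStoreClient.listKeys(bucket)).map { key =>
    Row(key, ObjectStoreClient.sizeOf(bucket, key))
  }
  val schema = StructType(Seq(
    StructField("key", StringType, nullable = false),
    StructField("size", LongType, nullable = false)))
  // Attach the schema and expose the result to Spark SQL as a temporary table.
  sqlContext.applySchema(rowRDD, schema).registerTempTable("store_objects")
}
```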

Re: RDD replication in Spark

2014-08-27 Thread Cheng Lian
You may start from here https://github.com/apache/spark/blob/4fa2fda88fc7beebb579ba808e400113b512533b/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L706-L712 . ​ On Mon, Aug 25, 2014 at 9:05 PM, rapelly kartheek kartheek.m...@gmail.com wrote: Hi, I've exercised multiple
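For context, block replication is driven by the storage level's replication factor; a minimal example using the standard caching API (the input path is a placeholder):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val sc = new SparkContext(new SparkConf().setAppName("replication-example"))

// The "_2" storage levels request a replication factor of 2, so the BlockManager
// copies each cached block to a peer executor when the block is first stored.
val data = sc.textFile("hdfs:///some/path")
data.persist(StorageLevel.MEMORY_ONLY_2)
data.count()  // materializing the RDD caches (and replicates) its blocks
```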

Re: Handling stale PRs

2014-08-27 Thread Nicholas Chammas
On Tue, Aug 26, 2014 at 2:21 PM, Josh Rosen rosenvi...@gmail.com wrote: Last weekend, I started hacking on a Google App Engine app for helping with pull request review (screenshot: http://i.imgur.com/wwpZKYZ.png). BTW Josh, how can we stay up-to-date on your work on this tool? A JIRA issue,

[GraphX] JIRA / PR to fix breakage in GraphGenerator.logNormalGraph in PR #720

2014-08-27 Thread RJ Nowling
Hi all, PR #720 https://github.com/apache/spark/pull/720 made multiple changes to GraphGenerator.logNormalGraph, including: - Replacing the calls to the functions for generating random vertices and edges with in-line implementations that use different equations. Based on reading the Pregel
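For readers following the thread, a minimal call to the generator in question; the extra parameters (mu, sigma, seed) vary across versions, so only the basic form is shown:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.util.GraphGenerators

val sc = new SparkContext(new SparkConf().setAppName("lognormal-graph-example"))

// Generates a graph whose out-degree distribution is approximately log-normal;
// how the vertex degrees and edges are sampled is exactly what PR #720 changed.
val graph = GraphGenerators.logNormalGraph(sc, 100)
println(s"edges: ${graph.edges.count()}")
```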

Re: Adding support for a new object store

2014-08-27 Thread Reynold Xin
Linking to the JIRA tracking APIs to hook into the planner: https://issues.apache.org/jira/browse/SPARK-3248 On Wed, Aug 27, 2014 at 1:56 PM, Reynold Xin r...@databricks.com wrote: Hi Rajendran, I'm assuming you have some concept of schema and you are intending to integrate with SchemaRDD

Re: Handling stale PRs

2014-08-27 Thread Nishkam Ravi
Wonder if it would make sense to introduce a notion of 'Reviewers' as an intermediate tier to help distribute the load? While anyone can review and comment on an open PR, reviewers would be able to say aye or nay subject to confirmation by a committer? Thanks, Nishkam On Wed, Aug 27, 2014 at

Re: HiveContext, schemaRDD.printSchema get different dataTypes, feature or a bug? really strange and surprised...

2014-08-27 Thread Cheng Lian
I believe in your case, the “magic” happens in TableReader.fillObject. Here we unwrap the field value according to the object inspector of that
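For anyone reproducing the comparison, a minimal way to check the data types Spark SQL reports for a Hive table (standard HiveContext usage; the table name is a placeholder):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("schema-check"))
val hiveContext = new HiveContext(sc)

// printSchema shows the Catalyst data types seen after Hive column values have
// been unwrapped through their ObjectInspectors during the table scan.
hiveContext.sql("SELECT * FROM some_table").printSchema()
```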

Re: Handling stale PRs

2014-08-27 Thread Patrick Wendell
Hey Nishkam, To some extent we already have this process - many community members help review patches and some earn a reputation where committers will take an LGTM from them seriously. I'd be interested in seeing if any other projects recognize people who do this. - Patrick On Wed, Aug 27,

Re: Handling stale PRs

2014-08-27 Thread Josh Rosen
I have a very simple dashboard running at http://spark-prs.appspot.com/.   Currently, this mirrors the functionality of Patrick’s github-shim, but it should be very easy to extend with other features. The source is at https://github.com/databricks/spark-pr-dashboard (pull requests and issues

Re: Handling stale PRs

2014-08-27 Thread Nicholas Chammas
Alright! That was quick. :) On Wed, Aug 27, 2014 at 6:48 PM, Josh Rosen rosenvi...@gmail.com wrote: I have a very simple dashboard running at http://spark-prs.appspot.com/. Currently, this mirrors the functionality of Patrick’s github-shim, but it should be very easy to extend with other

Re: jenkins maintenance/downtime, aug 28th, 730am-9am PDT

2014-08-27 Thread Nicholas Chammas
Looks like we're currently at 1.568 so we should be getting a nice slew of UI tweaks and bug fixes. Neat! On Wed, Aug 27, 2014 at 7:13 PM, shane knapp skn...@berkeley.edu wrote: tomorrow morning i will be upgrading jenkins to the latest/greatest (1.577). at 730am, i will put jenkins in to a

Re: Handling stale PRs

2014-08-27 Thread Nishkam Ravi
I see. Yeah, it would be interesting to know if any other project has considered formalizing this notion. It may also enable assignment of reviews (potentially automated using Josh's system) and maybe anonymity as well? On the downside, it isn't easily implemented and probably doesn't come without

Update on Pig on Spark initiative

2014-08-27 Thread Mayur Rustagi
Hi, We have migrated Pig functionality on top of Spark, passing 100% of the e2e tests for success cases in the Pig test suite. That means UDFs, joins, and other functionality are working quite nicely. We are in the process of merging with Apache Pig trunk (something that should happen over the next 2 weeks). Meanwhile if

Re: Update on Pig on Spark initiative

2014-08-27 Thread Matei Zaharia
Awesome to hear this, Mayur! Thanks for putting this together. Matei On August 27, 2014 at 10:04:12 PM, Mayur Rustagi (mayur.rust...@gmail.com) wrote: Hi, We have migrated Pig functionality on top of Spark, passing 100% of the e2e tests for success cases in the Pig test suite. That means UDFs, joins, and other

[Spark SQL] query nested structure data

2014-08-27 Thread wenchen
I am going to dig into this issue: https://issues.apache.org/jira/browse/SPARK-2096. However, I noticed that there is already a NestedSqlParser in sql/core/test under org.apache.spark.sql.parquet. I checked this parser and it could solve the issue I mentioned before. But why the author of the parser
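For context on SPARK-2096, the issue concerns parsing dot-notation access into nested structures; a small sketch of the kind of query involved (the exact failing forms are described in the JIRA, and the file path and field names here are illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("nested-query-example"))
val sqlContext = new SQLContext(sc)

// Example record: {"name": "alice", "address": {"city": "sf", "zip": "94105"}}
val people = sqlContext.jsonFile("people.json")
people.registerTempTable("people")

// Dot-notation access into a nested struct; parsing this kind of path expression
// is the area SPARK-2096 covers.
sqlContext.sql("SELECT address.city FROM people").collect().foreach(println)
```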