[jira] Commented: (MAHOUT-305) Combine both cooccurrence-based CF M/R jobs

2010-04-26 Thread Ankur (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12860882#action_12860882 ] Ankur commented on MAHOUT-305: -- CooccurrenceCombiner caches items internally and increments

[jira] Commented: (MAHOUT-305) Combine both cooccurrence-based CF M/R jobs

2010-04-26 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12860914#action_12860914 ] Sean Owen commented on MAHOUT-305: -- OK, I think I get the (item1,item2) - (item2,count)

[jira] Commented: (MAHOUT-305) Combine both cooccurrence-based CF M/R jobs

2010-04-26 Thread Ankur (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12860939#action_12860939 ] Ankur commented on MAHOUT-305: -- But the answer is the partitioner ? Yes Am I right that
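The MAHOUT-305 exchange above is about how cooccurrence pairs get routed. One plausible reading (an assumption, since the comments are truncated here) is a secondary-sort layout: the composite key (item1,item2) is sorted on both items, but a partitioner routes on item1 alone, so every pair for a given item1 reaches the same reducer, which can then emit item1 with its list of (item2,count). The sketch below simulates that in plain Python. It uses no Hadoop or Mahout API, and the names map_user, combine, partition_for and run are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def map_user(items):
    """Hypothetical mapper: emit ((item1, item2), 1) for every ordered pair
    of distinct items that one user expressed a preference for."""
    for item1, item2 in permutations(sorted(set(items)), 2):
        yield (item1, item2), 1


def combine(pairs):
    """Hypothetical combiner: sum counts per (item1, item2) key locally,
    before anything is shuffled."""
    counts = Counter()
    for key, n in pairs:
        counts[key] += n
    return counts.items()


def partition_for(key, num_reducers):
    """Hypothetical partitioner: route on item1 only, so all pairs that
    share item1 land on the same reducer regardless of item2."""
    item1, _ = key
    return hash(item1) % num_reducers


def run(users, num_reducers=2):
    # Shuffle: assign each combined (key, count) record to a reducer shard.
    shards = defaultdict(list)
    for items in users:
        for key, n in combine(map_user(items)):
            shards[partition_for(key, num_reducers)].append((key, n))

    # Reduce: total counts per full key, sort by (item1, item2),
    # then emit item1 -> [(item2, count), ...].
    rows = {}
    for shard in shards.values():
        totals = Counter()
        for key, n in shard:
            totals[key] += n
        for (item1, item2), count in sorted(totals.items()):
            rows.setdefault(item1, []).append((item2, count))
    return rows


print(run([[1, 2, 3], [2, 3], [1, 3]]))
```

Partitioning on item1 is what makes each per-item row complete on one reducer. Sorting on the full key only orders the item2 entries within that row.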

[jira] Created: (MAHOUT-385) Unify Vector Writables

2010-04-26 Thread Sean Owen (JIRA)
Unify Vector Writables -- Key: MAHOUT-385 URL: https://issues.apache.org/jira/browse/MAHOUT-385 Project: Mahout Issue Type: Improvement Components: Math Affects Versions: 0.3 Reporter: Sean Owen

[jira] Updated: (MAHOUT-385) Unify Vector Writables

2010-04-26 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated MAHOUT-385: - Attachment: MAHOUT-385.patch Unify Vector Writables -- Key:

[jira] Commented: (MAHOUT-236) Cluster Evaluation Tools

2010-04-26 Thread Jeff Eastman (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12860981#action_12860981 ] Jeff Eastman commented on MAHOUT-236: - Ok, the above patch was committed on the 21st

[GSOC] Congrats to all students

2010-04-26 Thread Grant Ingersoll
Looks like student GSOC announcements are up (http://socghop.appspot.com/gsoc/program/list_projects/google/gsoc2010). Mahout got quite a few projects (5) accepted this year, which is a true credit to the ASF, Mahout, the mentors, and most of all the students! We had a good number of very

Re: [GSOC] Congrats to all students

2010-04-26 Thread Sisir Koppaka
Thanks everyone! This is a fantastic opportunity, and I'll try to make the best of this for myself, as well as Mahout. Hopefully, we'll have a great compilation of deep learning networks within the next few releases. BTW, congrats to everyone on Mahout becoming a TLP! On Tue, Apr 27, 2010 at

[jira] Commented: (MAHOUT-371) [GSoC] Proposal to implement Distributed SVD++ Recommender using Hadoop

2010-04-26 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861087#action_12861087 ] Sean Owen commented on MAHOUT-371: -- Looks like this was accepted to GSoC, nice. Let the

[jira] Commented: (MAHOUT-305) Combine both cooccurrence-based CF M/R jobs

2010-04-26 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861095#action_12861095 ] Sean Owen commented on MAHOUT-305: -- I'm about to commit another pass at this since it's

Re: [jira] Commented: (MAHOUT-305) Combine both cooccurrence-based CF M/R jobs

2010-04-26 Thread Ted Dunning
On Mon, Apr 26, 2010 at 1:46 PM, Sean Owen (JIRA) j...@apache.org wrote: Ted how do you like to pick which items to pay attention to for co-occurrence? I'm looking for something simple to start. LLR is my standard answer. Though it's running pretty well (well a lot better than it was) at
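"LLR" here is the log-likelihood ratio (G²) test on a 2x2 contingency table, used to score whether two items cooccur more often than chance. Below is a minimal sketch for a single item pair A, B. The function names and the cell layout (k11 = users with both A and B, k12 = A without B, k21 = B without A, k22 = neither) are assumptions for illustration, not Mahout's API, and choosing a cutoff score is left open.

```python
import math


def x_log_x(x):
    # By convention 0 * log(0) = 0, so empty cells contribute nothing.
    return 0.0 if x == 0 else x * math.log(x)


def unnormalized_entropy(*counts):
    # N * H written with raw counts: N log N - sum(k log k).
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 score for a 2x2 cooccurrence table (hypothetical helper).

    k11: users with both A and B   k12: users with A but not B
    k21: users with B but not A    k22: users with neither
    """
    row = unnormalized_entropy(k11 + k12, k21 + k22)  # split by "has A"
    col = unnormalized_entropy(k11 + k21, k12 + k22)  # split by "has B"
    mat = unnormalized_entropy(k11, k12, k21, k22)    # full table
    # Clamp tiny negative values caused by floating-point round-off.
    return max(0.0, 2.0 * (row + col - mat))


print(log_likelihood_ratio(10, 10, 10, 10))   # independent: about 0
print(log_likelihood_ratio(100, 0, 0, 100))   # perfectly associated: 400 ln 2
```

A simple way to pick which cooccurrences to pay attention to is then to keep, per item, the partners with the highest scores; whether to threshold or take a top-N is a separate design choice.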

[jira] Commented: (MAHOUT-305) Combine both cooccurrence-based CF M/R jobs

2010-04-26 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861144#action_12861144 ] Sean Owen commented on MAHOUT-305: -- Ted says he likes LLR, and doesn't like throwing out

[jira] Commented: (MAHOUT-371) [GSoC] Proposal to implement Distributed SVD++ Recommender using Hadoop

2010-04-26 Thread Richard Simon Just (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861149#action_12861149 ] Richard Simon Just commented on MAHOUT-371: --- Awesome! I won't lie, I'm super

[jira] Commented: (MAHOUT-371) [GSoC] Proposal to implement Distributed SVD++ Recommender using Hadoop

2010-04-26 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861154#action_12861154 ] Sean Owen commented on MAHOUT-371: -- Your schedule maps it out well. In the next month, get

[jira] Commented: (MAHOUT-371) [GSoC] Proposal to implement Distributed SVD++ Recommender using Hadoop

2010-04-26 Thread Richard Simon Just (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861159#action_12861159 ] Richard Simon Just commented on MAHOUT-371: --- Excellent! I haven't downloaded the

[jira] Commented: (MAHOUT-305) Combine both cooccurrence-based CF M/R jobs

2010-04-26 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861171#action_12861171 ] Ted Dunning commented on MAHOUT-305: {quote} Ted says he ... doesn't like throwing out

[jira] Commented: (MAHOUT-297) Canopy and Kmeans clustering slows down on using SeqAccVector for center

2010-04-26 Thread Jeff Eastman (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861194#action_12861194 ] Jeff Eastman commented on MAHOUT-297: - I don't understand why the constructors for

[jira] Issue Comment Edited: (MAHOUT-297) Canopy and Kmeans clustering slows down on using SeqAccVector for center

2010-04-26 Thread Jeff Eastman (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861194#action_12861194 ] Jeff Eastman edited comment on MAHOUT-297 at 4/26/10 9:15 PM: --

Re: [VOTE] Apache Nutch 1.1 Release Candidate #2

2010-04-26 Thread Grant Ingersoll
Might I suggest, that since Nutch is now a TLP that you delay this release by a few weeks and have the vote done under the auspices of the Nutch PMC? Cheers, Grant On Apr 26, 2010, at 1:55 AM, Mattmann, Chris A (388J) wrote: Hi Folks, I have posted an updated candidate for the Apache Nutch

Re: Running ANT; was -- Re: [VOTE] Apache Nutch 1.1 Release Candidate #2

2010-04-26 Thread Mattmann, Chris A (388J)
Hi David, Thanks. In fact, running ant is probably simpler than running Nutch. The steps would be: * what OS are you on (Ant is available for all of them to my knowledge)? * if you need ant, grab a distro from ant.apache.org, otherwise, I'll assume that you've got ant installed and

Re: [VOTE] Apache Nutch 1.1 Release Candidate #2

2010-04-26 Thread Mattmann, Chris A (388J)
Hi Grant, Thanks. I think it actually makes sense to finish off 1.1, and since there is overlap with the Nutch PMC and the Lucene PMC and since the thread started in Lucene before the TLP, I think it would be great e.g., if Andrzej, and Sami could check the release and that way we still have

[jira] Closed: (NUTCH-808) Evaluate ORM Frameworks which support non-relational column-oriented datastores and RDBMs

2010-04-26 Thread Enis Soztutar (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar closed NUTCH-808. --- Resolution: Fixed We have decided to go on with implementing an ORM layer as per the discussion on

Re: [VOTE] Apache Nutch 1.1 Release Candidate #2

2010-04-26 Thread Mattmann, Chris A (388J)
Hey Andrzej, Okey dokey, np! Let's get the patch in first :) I can cut as many RCs as needed. Cheers, Chris On 4/26/10 11:30 AM, Andrzej Bialecki a...@getopt.org wrote: On 2010-04-26 17:19, Mattmann, Chris A (388J) wrote: Hi Grant, Thanks. I think it actually makes sense to finish off 1.1,

Re: NetCDF jars=Maven Central Repos?

2010-04-26 Thread Mattmann, Chris A (388J)
Hi Folks, I never heard back regarding this message. Any thoughts? Thanks! Cheers, Chris On 4/12/10 8:53 AM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: Hi there NetCDF'ers, We've been working [1] on integrating NetCDF support into Apache Tika [2]. Tika uses Maven2 as its

Re: [netcdfgroup] NetCDF jars=Maven Central Repos?

2010-04-26 Thread Mattmann, Chris A (388J)
(copying tika-dev@lucene.apache.org so that we can get some dev help in Tika-land) Hi John, Thanks for the information - we have a number of experienced developers over in Tika-ville that likely can help provide a patch to your ant build scripts to upload via Ant (and likely Ivy) to the Maven

Re: [netcdfgroup] NetCDF jars=Maven Central Repos?

2010-04-26 Thread Mattmann, Chris A (388J)
FYI, [1] from here refers to: [1] http://issues.apache.org/jira/browse/TIKA-153 On 4/26/10 6:46 PM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: (copying tika-dev@lucene.apache.org so that we can get some dev help in Tika-land) Hi John, Thanks for the information - we have

[jira] Commented: (LUCENE-1585) Allow to control how payloads are merged

2010-04-26 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12860891#action_12860891 ] Shai Erera commented on LUCENE-1585: Michael, I would like to take a stab at it if you

Re: Proposal about Version API relaxation

2010-04-26 Thread Robert Muir
On Sun, Apr 25, 2010 at 4:31 PM, Mark Miller markrmil...@gmail.com wrote: I still don't like just seeing what happens. What we all agree should be best for users as well as devs is not always going to be in alignment with letting the chips fall where they may. I don't think seeing whether

Re: Proposal about Version API relaxation

2010-04-26 Thread Mark Miller
On 4/26/10 7:57 AM, Robert Muir wrote: On Sun, Apr 25, 2010 at 4:31 PM, Mark Miller markrmil...@gmail.com wrote: I still don't like just seeing what happens. What we all agree should be best for users as well as devs is not always going to be in

Re: Proposal about Version API relaxation

2010-04-26 Thread Robert Muir
On Mon, Apr 26, 2010 at 8:15 AM, Mark Miller markrmil...@gmail.com wrote: It's not that simple. If you want to commit a patch without having it reverted, you *do* have to do certain things - currently, you have to attempt back compat. Or don't commit. You guys seem to think it's a free for

Re: announcing new TLPs [was: ASF Board Meeting Summary - April 21, 2010 - new TLP reporting schedule?]

2010-04-26 Thread Grant Ingersoll
My edits inline. On Apr 26, 2010, at 3:45 AM, Sean Owen wrote: Here's my suggested boilerplate -- see below and please suggest edits if desired. There's a 150 word limit. Apache Mahout provides scalable implementations of machine learning algorithms on top of Apache Hadoop and other

Re: Proposal about Version API relaxation

2010-04-26 Thread Mark Miller
On 4/26/10 8:23 AM, Robert Muir wrote: On Mon, Apr 26, 2010 at 8:15 AM, Mark Miller markrmil...@gmail.com wrote: It's not that simple. If you want to commit a patch without having it reverted, you *do* have to do certain things - currently, you have to

[jira] Updated: (LUCENE-2412) Architecture Diagrams needed for Lucene, Solr and Nutch

2010-04-26 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated LUCENE-2412: Attachment: solr-arch.pdf Solr arch. Architecture Diagrams needed for Lucene, Solr and

Re: Proposal about Version API relaxation

2010-04-26 Thread Michael McCandless
OK I think a pretty clear proposal is taking shape -- I'll call a vote. We've kinda discussed it to death now... Mike On Mon, Apr 26, 2010 at 8:25 AM, Mark Miller markrmil...@gmail.com wrote: On 4/26/10 8:23 AM, Robert Muir wrote: On Mon, Apr 26, 2010 at 8:15 AM, Mark Miller

[VOTE] Open up a separate line for unstable Solr/Lucene development

2010-04-26 Thread Michael McCandless
This is a vote for the proposal discussed on the 'Proposal about Version API relaxation' thread. The vote is to open up a separate parallel line of development, called unstable (on trunk), where non-back-compatible changes, slated for the next major release, may be safely developed. But it's not

Re: [VOTE] Open up a separate line for unstable Solr/Lucene development

2010-04-26 Thread Mark Miller
On 4/26/10 8:54 AM, Michael McCandless wrote: Please vote! +1 -- - Mark http://www.lucidimagination.com

ANNOUNCE: Nutch becomes an Apache Top-Level Project (TLP)

2010-04-26 Thread Andrzej Bialecki
Hi all, I'm happy to announce that the ASF Board has accepted the resolution to separate Nutch from the Lucene project and make it into a top-level project (full text of the resolution can be viewed here [1]). Thanks to all who voted and who worked on preparing this proposal! This means that in

Re: [VOTE] Open up a separate line for unstable Solr/Lucene development

2010-04-26 Thread Robert Muir
On Mon, Apr 26, 2010 at 8:54 AM, Michael McCandless luc...@mikemccandless.com wrote: Changes that go into stable need to be merged forwards to unstable -- this may happen commit by commit, or be periodically swept, or some combination (like flex) -- we can hash out this logistical detail out

RE: [VOTE] Open up a separate line for unstable Solr/Lucene development

2010-04-26 Thread Uwe Schindler
Hi, This is a vote for the proposal discussed on the 'Proposal about Version API relaxation' thread. The vote is to open up a separate parallel line of development, called unstable (on trunk), where non-back-compatible changes, slated for the next major release, may be safely developed.

[jira] Commented: (MAHOUT-305) Combine both cooccurrence-based CF M/R jobs

2010-04-26 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12860931#action_12860931 ] Ted Dunning commented on MAHOUT-305: I think that the key to speed with a cooccurrence

Re: [VOTE] Open up a separate line for unstable Solr/Lucene development

2010-04-26 Thread Michael McCandless
OK let's cancel this vote. I'll change the proposal, to dev on trunk and port back to stable, and call a new vote. Mike On Mon, Apr 26, 2010 at 11:51 AM, Shai Erera ser...@gmail.com wrote: I'm also -1 on that particular point. I think that dev should happen on trunk always, and backporting is

[VOTE] Take 2: Open up a separate line for unstable Solr/Lucene development

2010-04-26 Thread Michael McCandless
This is a vote for the proposal discussed on the 'Proposal about Version API relaxation' thread. This thread replaces the first VOTE thread! The vote is to open up a separate parallel line of development, called unstable (on trunk), where non-back-compatible changes, slated for the next major

Re: [VOTE] Take 2: Open up a separate line for unstable Solr/Lucene development

2010-04-26 Thread Shai Erera
+1 Shai On Mon, Apr 26, 2010 at 7:06 PM, Robert Muir rcm...@gmail.com wrote: +1 On Mon, Apr 26, 2010 at 11:59 AM, Michael McCandless luc...@mikemccandless.com wrote: This is a vote for the proposal discussed on the 'Proposal about Version API relaxation' thread. This thread replaces

[jira] Assigned: (LUCENE-1585) Allow to control how payloads are merged

2010-04-26 Thread Michael Busch (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Busch reassigned LUCENE-1585: - Assignee: Shai Erera (was: Michael Busch) Allow to control how payloads are merged

Re: Lucene RAM buffer size limit

2010-04-26 Thread Michael Busch
With DocumentsWriterPerThread we can allow 2GB per thread, so that should be a good step forward. For realtime indexing on the RAM buffer I'm planning to remove even that per-thread limit, because then you really want to make use of all the RAM you have available on your machine. Michael

RE: [VOTE] Take 2: Open up a separate line for unstable Solr/Lucene development

2010-04-26 Thread Uwe Schindler
+1 - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Michael McCandless [mailto:luc...@mikemccandless.com] Sent: Monday, April 26, 2010 6:00 PM To: dev@lucene.apache.org Subject: [VOTE] Take 2: Open up a

Re: [VOTE] Take 2: Open up a separate line for unstable Solr/Lucene development

2010-04-26 Thread Michael Busch
+1 Michael On 4/26/10 8:59 AM, Michael McCandless wrote: This is a vote for the proposal discussed on the 'Proposal about Version API relaxation' thread. This thread replaces the first VOTE thread! The vote is to open up a separate parallel line of development, called unstable (on trunk),

Re: Solr core gives data dir precedence to SolrConfig over CoreDescriptor

2010-04-26 Thread Chris Hostetter
: Subject: Solr core gives data dir precedence to SolrConfig over CoreDescriptor : : I've run into issues with this in tests with the Cloud patch. : : Anyone know if this is planned behavior - it's fairly annoying. Doesn't it seem : that if you set the data dir on the Core descriptor, that should

Re: [VOTE] Take 2: Open up a separate line for unstable Solr/Lucene development

2010-04-26 Thread DM Smith
On 04/26/2010 02:43 PM, Chris Hostetter wrote: I didn't follow the Version API relaxation thread (my fault: i thought it was focused solely on how we were dealing with o.a.l.Version and lots of smart people were talking in earnest so i left it to them to make smart choices) but looking at this

Re: [VOTE] Take 2: Open up a separate line for unstable Solr/Lucene development

2010-04-26 Thread Mark Miller
On 4/26/10 2:43 PM, Chris Hostetter wrote: My best guess: that what this is really suggesting is that trunk *always* be targeted at the next major release (ie: 4.0, 5.0, 6.0, etc...) and that development of minor releases (ie: 3.2, 3.3, ...; 4.1, 4.2, etc...) happen on more stable branches off

Re: Solr core gives data dir precedence to SolrConfig over CoreDescriptor

2010-04-26 Thread Ryan McKinley
On Sun, Apr 25, 2010 at 3:20 PM, Mark Miller markrmil...@gmail.com wrote: I've run into issues with this in tests with the Cloud patch. Anyone know if this is planned behavior - it's fairly annoying. Doesn't it seem that if you set the data dir on the Core descriptor, that should take

Re: [VOTE] Take 2: Open up a separate line for unstable Solr/Lucene development

2010-04-26 Thread Earwin Burrfoot
I'd like to +1 on this with all my tiny non-committer might. On Mon, Apr 26, 2010 at 23:06, Michael McCandless luc...@mikemccandless.com wrote: This is exactly the intention behind the proposal we are voting on. Big changes, that'd be destabilizing if attempted on the stable branch, would be

Re: [VOTE] Take 2: Open up a separate line for unstable Solr/Lucene development

2010-04-26 Thread Shai Erera
An interesting point was made on Version - we cannot remove it from trunk just to reintroduce it when trunk is released as .0 and then followed by .1 .2 stable releases … otherwise it would appear/disappear constantly :)? So I guess Version should go away entirely? Shai On Monday, April 26,

[jira] Commented: (LUCENE-2373) Change StandardTermsDictWriter to work with streaming and append-only filesystems

2010-04-26 Thread Lance Norskog (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861214#action_12861214 ] Lance Norskog commented on LUCENE-2373: --- Does this make it possible to add a good

Re: [VOTE] Take 2: Open up a separate line for unstable Solr/Lucene development

2010-04-26 Thread Doron Cohen
+1 I would like to note (and depict) one aspect of this change (although it is most probably clear to active followers of these threads). Deprecation warnings had been quite useful for users when upgrading to a *major* release. The voted change cancels them. Example follows. Picture the

Re: Lucene RAM buffer size limit

2010-04-26 Thread Shai Erera
Hi Tom, I don't know of an easy way to understand the relationship between the max RAM and the buffer size. I ran the test w/ 8GB heap and 2048 MB RAM buffer. Indexing 16M documents (roughly 288GB of data) took 7400 seconds (with 8 threads). I will post the full benchmark output when I finish indexing
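For scale, the totals in Shai's message can be turned into rates. This is plain arithmetic on his rounded numbers, not a new measurement, and it assumes 1 GB = 1024 MB and that all 8 threads were busy for the whole run; summarize_run is a hypothetical helper.

```python
def summarize_run(docs, data_gb, seconds, threads):
    """Hypothetical helper: derive average rates from one run's totals.

    Treats 1 GB as 1024 MB and 1 MB as 1024 KB. The inputs are rounded
    totals, so every output is an approximation.
    """
    mb = data_gb * 1024
    return {
        "docs_per_sec": docs / seconds,
        # Naive per-thread average: assumes all threads worked the whole time.
        "docs_per_sec_per_thread": docs / seconds / threads,
        "mb_per_sec": mb / seconds,
        "kb_per_doc": mb * 1024 / docs,
    }


# Figures from the message: 16M docs, roughly 288GB, 7400 seconds, 8 threads.
stats = summarize_run(docs=16_000_000, data_gb=288, seconds=7400, threads=8)
for name, value in stats.items():
    print(f"{name}: {value:.1f}")
```

That works out to roughly 2,160 documents per second overall (about 270 per thread), about 40 MB/s of input, and an average document of about 19 KB.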