[jira] Commented: (MAHOUT-305) Combine both cooccurrence-based CF M/R jobs

2010-04-23 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12860223#action_12860223 ] Sean Owen commented on MAHOUT-305: -- And now more thoughts: Yes all the code is checked

[jira] Commented: (MAHOUT-305) Combine both cooccurrence-based CF M/R jobs

2010-04-23 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12860284#action_12860284 ] Sean Owen commented on MAHOUT-305: -- What do you mean about the secondary sort and is it

Re: Mahout In Action

2010-04-23 Thread Jeff Eastman
I also wonder how much my recent clustering changes have affected the examples in the clustering sections. I know the book is currently aimed at Mahout 0.3 but users trying the examples with trunk may be frustrated by the recent changes in file naming. Do the examples exist in an unannotated

Re: Mahout In Action

2010-04-23 Thread Robin Anil
It's not aimed at 0.3 per se. Right now it's evolving with the code. E.g., the quality factor is something that will go in there. I keep updating the code with the latest changes and so does Sean. There isn't much that got affected by your latest commit though (it compiles). Though I haven't fully

Re: Mahout In Action

2010-04-23 Thread Sean Owen
I think the goal is that the book is completely up to date with the code as of the day we have to send it to press. That will be right about 0.4, which I assume happens at the end of the summer after GSoC is digested. I just submitted changes today to match my changes this morning. If any of you

[jira] Commented: (TIKA-379) Html elements and attributes not available in XHTML representation

2010-04-23 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12860170#action_12860170 ] Jukka Zitting commented on TIKA-379: Re: second patch - Seems like a good approach.

Apache Tika is a top-level project!

2010-04-23 Thread Mattmann, Chris A (388J)
Hi All, The board has approved the Tika TLP. Yay! I've started the process of moving Tika to its TLP status, and filed INFRA issues [1] [2] and [3] for moving the mailing lists, SVN, and creating UNIX groups respectively. If there's anything I missed, let me know. I've asked that all current

Re: Directory.deleteFile confusingly throws IOException

2010-04-23 Thread Shai Erera
Ok, I'm good w/ leaving the IOE. We can wait 'till Lucene moves to Java 7 (2013 ?) and then we'll revisit this :). Shai On Fri, Apr 23, 2010 at 1:47 PM, Earwin Burrfoot ear...@gmail.com wrote: There's also place for alternate Directories, which can throw readable-loggable exceptions without

[jira] Created: (LUCENE-2415) Remove JakarteRegExCapabilities shim to access package protected field

2010-04-23 Thread Uwe Schindler (JIRA)
Remove JakarteRegExCapabilities shim to access package protected field -- Key: LUCENE-2415 URL: https://issues.apache.org/jira/browse/LUCENE-2415 Project: Lucene - Java

Re: Per-Thread DW and IW

2010-04-23 Thread Shai Erera
The big picture includes what you write, but also other usage, such as loading different slices into memory, introducing the complementary API to ParallelReader, querying a single slice only, etc. Shai On Thu, Apr 22, 2010 at 5:04 PM, Michael McCandless luc...@mikemccandless.com wrote: I like the

[jira] Updated: (LUCENE-2265) improve automaton performance by running on byte[]

2010-04-23 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2265: --- Attachment: LUCENE-2265.patch Checkpointing progress from Robert and I on this

[jira] Issue Comment Edited: (LUCENE-1458) Further steps towards flexible indexing

2010-04-23 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12781287#action_12781287 ] Robert Muir edited comment on LUCENE-1458 at 4/23/10 9:40 AM: --

[jira] Updated: (LUCENE-2265) improve automaton performance by running on byte[]

2010-04-23 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2265: --- Attachment: LUCENE-2265.patch Last patch was a bit stale -- this one is current,

HUG talk on Public Terabyte Dataset project

2010-04-23 Thread Ken Krugler
Hi all, Just wrote a blog post about the talk I gave on Wed night at the Hadoop Bay Area user group meetup: http://bixolabs.com/2010/04/22/hadoop-user-group-meetup-talk/ Key points for Tika are: 1. Tika worked well for processing the resulting HTML. 2. The sample analysis we did, on the

Mahout In Action

2010-04-23 Thread Jeff Eastman
Section 4.5.1 says: The third line shows how it is based on item-item similarities, not user-user similarities as before. The algorithms are similar, but not entirely symmetric. They do have notably different properties. For instance, the running time of an item-based recommender scales up as

SVN and Lucene 2.9.1

2010-04-23 Thread Jason Rutherglen
I'm browsing: http://lucene.apache.org/java/docs/developer-resources.html and there's http://svn.apache.org/repos/asf/lucene/dev/trunk however just beneath in the http://svn.apache.org/repos/asf/lucene/dev/branches/ directory there's nothing. Where's Lucene 2.9.1 source?

Re: Mahout In Action

2010-04-23 Thread Jeff Eastman
The APIs did not change but the clustered points directory changed from points to clusteredPoints and the various clusters directories changed from (e.g. canopies, clusters, clusters-n, canopies-n, state-n) to just clusters-n, where clusters-0 is used for the initial clusters needed for kmeans

RE: SVN and Lucene 2.9.1

2010-04-23 Thread Uwe Schindler
The old releases are in the old folder where they were always: http://svn.apache.org/repos/asf/lucene/java/branches/ - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Jason Rutherglen

Re: Mahout In Action

2010-04-23 Thread Robin Anil
If you are making more changes, do that; you are more than welcome to. Just fix a convention. For example, in the clustering algorithms chapter, it was points and clusters-[0-n] like you said, and in Dirichlet it was state-n. So it will be better if we stick to a single convention and the book will

RE: SVN and Lucene 2.9.1

2010-04-23 Thread Uwe Schindler
Robert, as you changed the SVN URLs on the website, maybe we should also have a pointer to the branches from before the Lucene-Solr merge (for both Lucene and Solr), so people can check out older branches? Jason, The new Lucene folder in SVN is for the combined Lucene/Solr development, so

Re: Mahout In Action

2010-04-23 Thread Jeff Eastman
My main goal for reworking the file nomenclature was to make the various clustering file names follow a consistent naming convention. I don't expect that to change again any time soon but I noticed that some of the examples need to be updated to work with trunk (0.4). On 4/23/10 11:11 AM,

[jira] Updated: (SOLR-571) LRUCache autowarmCount should support percentages

2010-04-23 Thread JIRA
[ https://issues.apache.org/jira/browse/SOLR-571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomás Fernández Löbbe updated SOLR-571: --- Attachment: SOLR-571.patch On the attached patch, I modified classes LRUCache and

RE: SVN and Lucene 2.9.1

2010-04-23 Thread Uwe Schindler
Versioned sites never contained SVN links in Lucene. This general developer information is only included in the unversioned site. As this general site also links to all release specific pages, we should maybe also provide a link to the old svn folder. Just for convenience, because people may

Re: Mahout In Action

2010-04-23 Thread Jeff Eastman
See ClusterBase for those constants. On 4/23/10 11:53 AM, Robin Anil wrote: May I suggest keeping constants in a public String value? That way people will not hard-code clusters-0 and so on and instead use Clusterer.CLUSTER_DIR On Fri, Apr 23, 2010 at 11:55 PM, Jeff Eastman

[jira] Commented: (SOLR-1531) Provide an option to remove the data directory on core unload

2010-04-23 Thread Hoss Man (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12860366#action_12860366 ] Hoss Man commented on SOLR-1531: Paolo: thanks for working on this. I gave your patch a

[jira] Resolved: (SOLR-1887) Improve log message at DefaultSolrHighlighter when use.FastVectorHighlighter=true

2010-04-23 Thread Koji Sekiguchi (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-1887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi resolved SOLR-1887. -- Resolution: Fixed Committed revision 937579. Improve log message at DefaultSolrHighlighter

[jira] Commented: (SOLR-1238) exception in solrJ when authentication is used

2010-04-23 Thread Jon Baer (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12860474#action_12860474 ] Jon Baer commented on SOLR-1238: FYI, As per what I read here @

[jira] Issue Comment Edited: (SOLR-1238) exception in solrJ when authentication is used

2010-04-23 Thread Jon Baer (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12860474#action_12860474 ] Jon Baer edited comment on SOLR-1238 at 4/24/10 12:09 AM: -- FYI, As