Re: GIT does not support empty directories

2010-04-16 Thread Ted Dunning
Put a readme file in the directory and be done with it. On Fri, Apr 16, 2010 at 8:40 AM, Robert Muir rcm...@gmail.com wrote: I don't like the idea of complicating lucene/solr's build system any more than it already is, unless it's absolutely necessary. It's already too complicated. Instead of
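Git tracks files, not directories, so a directory only survives a commit if it contains at least one file. That is why the readme trick works. A minimal Java sketch of automating it, assuming a placeholder file name chosen by the caller and skipping anything under .git (EmptyDirPlaceholders and its methods are hypothetical names, not part of any build tool):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: git records files, not directories, so each empty
// directory gets a small placeholder file that git can track.
class EmptyDirPlaceholders {

    // True if the directory has no entries at all.
    static boolean isEmptyDir(Path dir) {
        try (Stream<Path> entries = Files.list(dir)) {
            return !entries.findAny().isPresent();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // True if the path lies inside the repository's own .git directory.
    static boolean insideGitDir(Path root, Path p) {
        for (Path part : root.relativize(p)) {
            if (part.toString().equals(".git")) return true;
        }
        return false;
    }

    // Writes a placeholder file into every empty directory under root
    // (excluding .git) and returns the paths of the files it created.
    static List<Path> addPlaceholders(Path root, String fileName) throws IOException {
        List<Path> emptyDirs;
        try (Stream<Path> walk = Files.walk(root)) {
            emptyDirs = walk.filter(Files::isDirectory)
                            .filter(p -> !insideGitDir(root, p))
                            .filter(EmptyDirPlaceholders::isEmptyDir)
                            .collect(Collectors.toList());
        }
        List<Path> created = new ArrayList<>();
        for (Path dir : emptyDirs) {
            Path placeholder = dir.resolve(fileName);
            Files.write(placeholder,
                "Placeholder so git keeps this directory.\n".getBytes(StandardCharsets.UTF_8));
            created.add(placeholder);
        }
        return created;
    }
}
```

Empty directories are collected first and only then written to, so the walk never sees files it is creating. Calling addPlaceholders(repoRoot, "README") once before `git add` is enough; the directories then stay non-empty.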

Re: GIT does not support empty directories

2010-04-16 Thread Ted Dunning
That is where I learned the trick. On Fri, Apr 16, 2010 at 1:05 PM, Andrzej Bialecki a...@getopt.org wrote: On 2010-04-16 21:33, Ted Dunning wrote: Put a readme file in the directory and be done with it. That's a trick I used with CVS 15 years ago ... these newfangled gizmos aren't so

Re: Implementing near duplicate detection algorithm using IDF statistics

2010-03-24 Thread Ted Dunning
For reference, you can get a rental copy of this article for less than the cost of the full PDF download here: http://www.deepdyve.com/lp/association-for-computing-machinery/collection-statistics-for-fast-duplicate-document-detection-0o7i3Sx0Wd (joining the ACM is also a good thing to do) (and
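The thread concerns near-duplicate detection driven by IDF statistics. As a rough Java illustration of that general idea, and not a transcription of the linked article: keep only the terms whose IDF falls inside a chosen band, sort and de-duplicate them, and hash the result, so documents that differ only in very common or very rare words end up with the same signature. The tokenizer, the IDF formula ln(N/df), and the minIdf/maxIdf thresholds are all assumptions; IdfSignature is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch: signature = SHA-1 of the sorted set of "mid-IDF" terms.
class IdfSignature {

    // docFreq maps term -> number of documents containing it; numDocs is N.
    // Terms unknown to the collection statistics are skipped.
    static String signature(String text, Map<String, Integer> docFreq, int numDocs,
                            double minIdf, double maxIdf) {
        TreeSet<String> kept = new TreeSet<>(); // sorted and de-duplicated
        for (String term : text.toLowerCase().split("\\W+")) {
            Integer df = docFreq.get(term);
            if (term.isEmpty() || df == null || df == 0) continue;
            double idf = Math.log((double) numDocs / df);
            if (idf >= minIdf && idf <= maxIdf) kept.add(term);
        }
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] digest = sha1.digest(String.join(" ", kept).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // every JDK ships SHA-1
        }
    }
}
```

Two documents are then treated as near-duplicates when their signatures are equal, which turns detection into a hash lookup rather than a pairwise comparison.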

Re: Branding Solr+Lucene

2010-03-22 Thread Ted Dunning
On Mon, Mar 22, 2010 at 11:30 AM, Yonik Seeley ysee...@gmail.com wrote: On Mon, Mar 22, 2010 at 2:20 PM, Ryan McKinley ryan...@gmail.com wrote: I'm confused... what is the need for a new name? The only place where there is a conflict is in the top level svn tree... Agree, no need to

Re: rough outline of where Solr's going

2010-03-16 Thread Ted Dunning
The key word here is end-user. On Tue, Mar 16, 2010 at 10:57 AM, Kevin Osborn osbo...@yahoo.com wrote: I definitely agree with Chris here. Although Lucene and Solr are highly related, the version numbering should communicate whether Solr has changed in a significant or minor way to the

[jira] Commented: (SOLR-1375) BloomFilter on a field

2010-03-16 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846166#action_12846166 ] Ted Dunning commented on SOLR-1375: --- Sorry to comment late here, but when indexing

[jira] Commented: (SOLR-1814) select count(distinct fieldname) in SOLR

2010-03-10 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-1814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12843632#action_12843632 ] Ted Dunning commented on SOLR-1814: --- Trove is GPL. The Mahout project has a partial set

[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-02-19 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835963#action_12835963 ] Ted Dunning commented on SOLR-1724: --- Will this http access also allow a cluster

Re: SolrCloud - Using collections, slices and shards in the wild

2010-02-10 Thread Ted Dunning
query pattern involves most data sources while for you, the dominant pattern will likely involve a single data source. On Tue, Feb 9, 2010 at 9:02 PM, Jon Gifford jon.giff...@gmail.com wrote: 1) Support one index per customer, and many customers (thus, many independent indices) -- Ted Dunning

Re: SolrCloud - Using collections, slices and shards in the wild

2010-02-10 Thread Ted Dunning
, but the SolrCloud stuff seems simpler and closer to what I need. We shall see :-) -- Ted Dunning, CTO DeepDyve

Re: priority queue in query component

2010-02-09 Thread Ted Dunning
(srsp.shard) >= 0) { // TODO: remove previous from priority queue // continue; // } } Is there a ticket open for this issue? What would it take to fix? Thanks, Mike -- Lance Norskog goks...@gmail.com -- Ted
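The TODO in the quoted code is about taking a previously queued duplicate back out of the priority queue so that the copy kept does not depend on shard response order. As a hedged sketch only: java.util.PriorityQueue does support remove(Object), which is enough to make duplicate resolution deterministic. ShardHit and ShardMerger are hypothetical stand-ins, not Solr classes, and the rule "keep the copy from the lexicographically smallest shard name" is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical stand-in for one document returned by one shard.
final class ShardHit {
    final String id;
    final String shard;
    final float score;

    ShardHit(String id, String shard, float score) {
        this.id = id;
        this.shard = shard;
        this.score = score;
    }
}

class ShardMerger {

    // Merges per-shard hits into one ranking (highest score first), keeping
    // exactly one copy of each unique id: the one from the smallest shard name.
    static List<ShardHit> merge(List<ShardHit> hits) {
        PriorityQueue<ShardHit> queue = new PriorityQueue<>(
            Comparator.comparingDouble((ShardHit h) -> h.score).reversed());
        Map<String, ShardHit> seen = new HashMap<>();
        for (ShardHit hit : hits) {
            ShardHit prev = seen.get(hit.id);
            if (prev != null) {
                // Previous copy wins if its shard name sorts first (or ties).
                if (prev.shard.compareTo(hit.shard) <= 0) continue;
                // Otherwise take the previous copy back out: O(n), but correct.
                queue.remove(prev);
            }
            seen.put(hit.id, hit);
            queue.add(hit);
        }
        List<ShardHit> ranked = new ArrayList<>();
        while (!queue.isEmpty()) ranked.add(queue.poll());
        return ranked;
    }
}
```

Because the winner depends only on shard names, the same request returns the same copy of a duplicate no matter which shard answers first. A real merge would also cap the queue at the requested number of rows, which this sketch leaves out.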

Re: priority queue in query component

2010-02-06 Thread Ted Dunning
queue // continue; // } } Is there a ticket open for this issue? What would it take to fix? Thanks, Mike -- Ted Dunning, CTO DeepDyve

[jira] Commented: (SOLR-1301) Solr + Hadoop

2010-02-02 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828961#action_12828961 ] Ted Dunning commented on SOLR-1301: --- {quote} Based on these observations, I have few

[jira] Commented: (SOLR-1301) Solr + Hadoop

2010-01-29 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806547#action_12806547 ] Ted Dunning commented on SOLR-1301: --- It is critical to put indexes in the task local area

[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-01-21 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803371#action_12803371 ] Ted Dunning commented on SOLR-1724: --- {quote} ... I agree, I'm not really into ephemeral ZK

Re: Case-insensitive searches and facet case

2010-01-20 Thread Ted Dunning
and analyze it differently. -- Ted Dunning, CTO DeepDyve

Re: Solr Cloud wiki and branch notes

2010-01-17 Thread Ted Dunning
there is no need to change anything in Solr, just extend Katta to run Solr searchers ... :P -- Ted Dunning, CTO DeepDyve

Re: Solr Cloud wiki and branch notes

2010-01-17 Thread Ted Dunning
only be one copy of a physical shard, it seemed strange to call it a replica. Yeah .. it's a replica with a replication factor of 1 :) -- Ted Dunning, CTO DeepDyve

Re: Solr Cloud wiki and branch notes

2010-01-17 Thread Ted Dunning
some compelling cases for doing so). Seems like a good goal would be to support the customer having various levels of control. -- Ted Dunning, CTO DeepDyve

Re: Solr Cloud wiki and branch notes

2010-01-16 Thread Ted Dunning
names, leading users to assume that core == collection. Global index is two words but it's unambiguous. I'm fine with the collection if we clarify the definition and avoid using this term for other stuff. -- Ted Dunning, CTO DeepDyve

[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-01-16 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801296#action_12801296 ] Ted Dunning commented on SOLR-1724: --- {quote} We actually started out that way... (when

Re: [jira] Commented: (SOLR-1301) Solr + Hadoop

2010-01-15 Thread Ted Dunning
This can also be a big performance win. Jason Venner reports significant index and cluster start time improvements by indexing to local disk, zipping and then uploading the resulting zip file. Hadoop has significant file-open overhead, so moving one zip file wins big over moving many index component
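The local-disk-then-zip approach described here can be sketched with the JDK's java.util.zip alone. This minimal sketch packs a finished local index directory into one archive so that the upload moves a single file; the upload step itself (for example through Hadoop's FileSystem API) is deliberately left out, and IndexZipper is a hypothetical name.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: one archive instead of many small index files.
class IndexZipper {

    // Packs every regular file under indexDir into zipFile and returns the
    // number of files written. zipFile must live outside indexDir.
    static int zipDirectory(Path indexDir, Path zipFile) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(indexDir)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        try (OutputStream out = Files.newOutputStream(zipFile);
             ZipOutputStream zip = new ZipOutputStream(out)) {
            for (Path file : files) {
                // Zip entry names use '/' regardless of the local OS separator.
                String name = indexDir.relativize(file).toString().replace('\\', '/');
                zip.putNextEntry(new ZipEntry(name));
                Files.copy(file, zip);
                zip.closeEntry();
            }
        }
        return files.size();
    }
}
```

The per-file cost is paid once on local disk, where opening files is cheap; the distributed file system then sees one open and one sequential write.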

Re: [jira] Commented: (SOLR-1301) Solr + Hadoop

2010-01-15 Thread Ted Dunning
The reason I would expect a major speed win when indexing to local disk and copying later is that you get much more efficient reading of documents with normal Hadoop mechanisms. Throwing documents to the various Solr master indexers is bound to be slower than having 20 machines reading at local

Re: [jira] Commented: (SOLR-1301) Solr + Hadoop

2010-01-15 Thread Ted Dunning
. -- Ted Dunning, CTO DeepDyve

[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

2010-01-15 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801051#action_12801051 ] Ted Dunning commented on SOLR-1724: --- Katta had some interesting issues in the design

Re: Solr Cloud wiki and branch notes

2010-01-15 Thread Ted Dunning
On Fri, Jan 15, 2010 at 4:36 PM, Andrzej Bialecki a...@getopt.org wrote: My 0.02 PLN on the subject ... Polish currency seems pretty strong lately. There are a lot of good ideas for this small sum. Terminology * (global) search index * index shard: * partitioning: * search node: *

Re: SolrCloud logical shards

2010-01-14 Thread Ted Dunning
14, 2010 at 9:08 AM, Yonik Seeley yo...@lucidimagination.com wrote: Should we use logical shard for this, or does anyone have any better ideas? -- Ted Dunning, CTO DeepDyve

Re: SolrCloud logical shards

2010-01-14 Thread Ted Dunning
of replication, or due to merging shards, etc), and a separate word for a logical slice of the index seems desirable. -- Ted Dunning, CTO DeepDyve

Re: SolrCloud logical shards

2010-01-14 Thread Ted Dunning
that unambiguously identifies it), and then shard could be the logical entity. But I've kind of gotten used to thinking of shards as the actual physical queryable things... -Yonik http://www.lucidimagination.com -- Ted Dunning, CTO DeepDyve

Re: SolrCloud logical shards

2010-01-14 Thread Ted Dunning
lindex (and also enough of the URL that unambiguously identifies it), and then shard could be the logical entity. But I've kind of gotten used to thinking of shards as the actual physical queryable things... -Yonik http://www.lucidimagination.com -- Ted Dunning, CTO

Re: SolrCloud logical shards

2010-01-14 Thread Ted Dunning
...@gmail.com wrote: Logical-to-physical mapping should not assume that the logical has an integral number of the physical. Overlapping and partial physical shards should be addressable as a logical shard. If you're going to do something this major, do it right. -- Ted Dunning, CTO DeepDyve
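One way to read the requirement that overlapping and partial physical shards be addressable as a logical shard is to describe every shard as a range of a document-hash space. The following Java sketch is a hypothetical model under that assumption (HashRange, PhysicalShard and ShardMapper are illustrative names, not Solr APIs): resolving a logical shard returns every physical shard that overlaps it, without requiring a whole number of physical shards per logical one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: a half-open range [start, end) of a document-hash space.
final class HashRange {
    final long start;
    final long end;

    HashRange(long start, long end) {
        if (start >= end) throw new IllegalArgumentException("empty range");
        this.start = start;
        this.end = end;
    }

    boolean overlaps(HashRange o) {
        return start < o.end && o.start < end;
    }
}

final class PhysicalShard {
    final String name;
    final HashRange range;

    PhysicalShard(String name, HashRange range) {
        this.name = name;
        this.range = range;
    }
}

class ShardMapper {

    // Physical shards holding any part of the logical range. A shard may be
    // only partially inside it, and shards may overlap one another.
    static List<String> physicalFor(HashRange logical, List<PhysicalShard> shards) {
        List<String> out = new ArrayList<>();
        for (PhysicalShard s : shards) {
            if (s.range.overlaps(logical)) out.add(s.name);
        }
        return out;
    }

    // True if the union of the overlapping physical shards leaves no gap
    // anywhere in the logical range.
    static boolean fullyCovered(HashRange logical, List<PhysicalShard> shards) {
        List<HashRange> parts = new ArrayList<>();
        for (PhysicalShard s : shards) {
            if (s.range.overlaps(logical)) parts.add(s.range);
        }
        parts.sort((a, b) -> Long.compare(a.start, b.start));
        long reached = logical.start;
        for (HashRange r : parts) {
            if (r.start > reached) return false; // gap before this part
            reached = Math.max(reached, r.end);
            if (reached >= logical.end) return true;
        }
        return reached >= logical.end;
    }
}
```

A physical shard that is only partly inside the logical range would still need a per-document hash filter at query time so that documents outside the logical range are not returned; that step is not shown.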