[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15636656#comment-15636656 ]

Erick Erickson commented on SOLR-1632:
--------------------------------------

Please ask usage questions on the user's list, see "mailing lists" here: http://lucene.apache.org/solr/resources.html

You'll get a lot more eyeballs on the question and likely a much faster answer.

> Distributed IDF
> ---------------
>
> Key: SOLR-1632
> URL: https://issues.apache.org/jira/browse/SOLR-1632
> Project: Solr
> Issue Type: New Feature
> Components: search
> Affects Versions: 1.5
> Reporter: Andrzej Bialecki
> Assignee: Anshum Gupta
> Fix For: 5.0, 6.0
>
> Attachments: 3x_SOLR-1632_doesntwork.patch, SOLR-1632.patch,
> SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch,
> SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch,
> SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch,
> SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, distrib-2.patch,
> distrib.patch
>
>
> Distributed IDF is a valuable enhancement for distributed search across
> non-uniform shards. This issue tracks the proposed implementation of an API
> to support this functionality in Solr.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15635553#comment-15635553 ]

blackwing commented on SOLR-1632:
---------------------------------

I've activated distributed IDF. I have two shards for my collection: shard1 contains 1000 docs and shard2 contains 800 docs. So is the maxDoc used to calculate the IDF for a particular doc's score 1000+800?
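The arithmetic behind the question can be sketched as follows. This is a minimal illustration, not Solr code: it assumes that collection-wide statistics are formed by summing the per-shard counts, which is what the later docCount discussion in this thread expects. The `Stats` class, the `merge` helper and the per-shard docFreq values (40 and 10) are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class GlobalStatsSketch {

    /** Hypothetical holder for per-shard counts; not a Solr class. */
    static final class Stats {
        final long maxDoc;   // number of documents on the shard
        final long docFreq;  // documents on the shard containing the term

        Stats(long maxDoc, long docFreq) {
            this.maxDoc = maxDoc;
            this.docFreq = docFreq;
        }
    }

    /** Sums per-shard counts into collection-wide counts (assumed merge rule). */
    static Stats merge(List<Stats> shards) {
        long maxDoc = 0;
        long docFreq = 0;
        for (Stats s : shards) {
            maxDoc += s.maxDoc;
            docFreq += s.docFreq;
        }
        return new Stats(maxDoc, docFreq);
    }

    public static void main(String[] args) {
        // Shard sizes from the question; docFreq values are made up.
        Stats shard1 = new Stats(1000, 40);
        Stats shard2 = new Stats(800, 10);
        Stats global = merge(Arrays.asList(shard1, shard2));
        System.out.println("global maxDoc=" + global.maxDoc
                + " docFreq=" + global.docFreq);
    }
}
```

Under that assumption every shard would score against the same merged counts (1800 documents here) rather than against its own 1000 or 800.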
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14805205#comment-14805205 ]

Varun Thacker commented on SOLR-1632:
-------------------------------------

I think the check should be modified from {{controlScore.floatValue() > shardScore.floatValue()}} to {{controlScore.floatValue() >= shardScore.floatValue()}}.

I understand the motivation here: once a term starts getting 'rare', the score will be higher because the stats come only from the individual shards. The first part of the test doesn't seem to be triggering this though:

{code}
del("*:*");
for (int i = 0; i < clients.size(); i++) {
  int shard = i + 1;
  for (int j = 0; j <= i; j++) {
    index_specific(i, id, docId++, "a_t", "one two three", "shard_i", shard);
  }
}
{code}
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14737856#comment-14737856 ]

Yonik Seeley commented on SOLR-1632:
------------------------------------

LUCENE-6758 removed part of the test of this issue:

{code}
--- lucene/dev/trunk/solr/core/src/test/org/apache/solr/search/stats/TestDefaultStatsCache.java 2015/09/09 03:13:44 1701894
+++ lucene/dev/trunk/solr/core/src/test/org/apache/solr/search/stats/TestDefaultStatsCache.java 2015/09/09 03:16:15 1701895
@@ -79,10 +79,6 @@
     if (clients.size() == 1) {
       // only one shard
       assertEquals(controlScore, shardScore);
-    } else {
-      assertTrue("control:" + controlScore.floatValue() + " shard:"
-          + shardScore.floatValue(),
-          controlScore.floatValue() > shardScore.floatValue());
     }
   }
{code}

http://svn.apache.org/viewvc?view=revision=1701895

Was it testing something important, and can it be replaced with something else?
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14282225#comment-14282225 ]

Anshum Gupta commented on SOLR-1632:
------------------------------------

[~ysee...@gmail.com]: I did give it a thought, but it would be tricky to support something like stats=implementation for each request. We could, however, have something like 'stats=local' or 'stats=global' where, in the latter case, it uses the implementation specified in the config. But yes, we could evaluate that more.
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260436#comment-14260436 ]

ASF subversion and git services commented on SOLR-1632:
-------------------------------------------------------

Commit 1648428 from [~anshumg] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1648428 ]

SOLR-1632: Distributed IDF, finally. (merge from trunk)
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14259225#comment-14259225 ]

Yonik Seeley commented on SOLR-1632:
------------------------------------

bq. This isn't switched on by default as it certainly comes at some cost

What would be really nice is to enable this on a per-request basis. Perhaps via globalStats=true.
We can open up a new issue if it's difficult enough...
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255813#comment-14255813 ]

Erick Erickson commented on SOLR-1632:
--------------------------------------

WhoooHo!
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14256107#comment-14256107 ]

Shawn Heisey commented on SOLR-1632:
------------------------------------

The commit is too large to digest easily. I assume this is on by default? Can it be enabled and disabled?

I will likely be using this once it's available, but do we have any idea what the performance impact is?
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14256121#comment-14256121 ]

Anshum Gupta commented on SOLR-1632:
------------------------------------

This isn't switched on by default as it certainly comes at some cost (there are no free lunches, remember?) :)

It can be switched on by specifying which implementation you want via a top-level solrconfig setting or a System property, i.e.:

{code}
<statsCache class="org.apache.solr.search.stats.ExactStatsCache"/>
{code}

About the performance impact: I tested it on my machine (which is not really a great thing to do, as there's barely any possibility of network issues here) with about 6 million docs (a real and mocked-up Jeopardy questions dataset) and regular queries, and the performance impact was barely noticeable.

I still need to document this (I'll add it to the ref guide once this makes it into 5x), and I suppose things will be easier for the end user to understand then.
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14256378#comment-14256378 ]

Mark Miller commented on SOLR-1632:
-----------------------------------

We should get some results across real machines, but I also turned my micro bench work onto this. I didn't confirm that the settings are actually taking effect, or review the latest work, but I ran the benchmark twice, once with LocalStatsCache and once with ExactStatsCache.

bq. <statsCache class="org.apache.solr.search.stats.ExactStatsCache"/>
bq. <statsCache class="org.apache.solr.search.stats.LocalStatsCache"/>

The test uses two machines: one to create and send the docs/queries, another to run the Solr JVMs. I ran a query test using a ton of Wikipedia data across 6 JVM instances, 6 shards, no replication.

I indexed a ton of docs, and then used a bunch of threads and a bunch of CloudSolrServer instances to pound in some queries.

Performance appeared nearly identical.
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14256394#comment-14256394 ]

Anshum Gupta commented on SOLR-1632:
------------------------------------

Right, I saw similar behavior in my tests. I think the real impact would show up when there's a ton of query terms across multiple shards that actually use the network.
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255502#comment-14255502 ]

ASF subversion and git services commented on SOLR-1632:
-------------------------------------------------------

Commit 1647253 from [~anshumg] in branch 'dev/trunk'
[ https://svn.apache.org/r1647253 ]

SOLR-1632: Distributed IDF, finally.
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255504#comment-14255504 ]

Anshum Gupta commented on SOLR-1632:
------------------------------------

Thanks to everyone who's contributed on this one! The list is long :)

I've committed this to trunk; if all stays well, I'll commit it into 5x later in the (coming) week.
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250963#comment-14250963 ]

Anshum Gupta commented on SOLR-1632:
------------------------------------

I plan on committing this sometime over the weekend.
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14246473#comment-14246473 ]

Anshum Gupta commented on SOLR-1632:
------------------------------------

I think we should get this in now. This would not be enabled by default, i.e. the LocalStatsCache impl would be used anyway.
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14152329#comment-14152329 ]

Anshum Gupta commented on SOLR-1632:
------------------------------------

Thanks for updating the patch [~vzhovtiuk]. The tests pass now. I'm looking at the updated patch.
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14149902#comment-14149902 ]

Anshum Gupta commented on SOLR-1632:
------------------------------------

I've uploaded an updated patch to the review board. It applies to current trunk but has a failing TestLRUStatsCache.

[~vzhovtiuk] Can you have a look at it too if you have time?
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142964#comment-14142964 ]

Anshum Gupta commented on SOLR-1632:
------------------------------------

I'd created a reviewboard request to look and compare the last few patches. Thought I'd share that here.

https://reviews.apache.org/r/25855/
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13930406#comment-13930406 ]

Markus Jelsma commented on SOLR-1632:
-------------------------------------

No, but I think this happened when the QueryCommand code

{code}
public StatsSource getStatsSource() {
  return statsSource;
}

public QueryCommand setStatsSource(StatsSource dfSource) {
  this.statsSource = dfSource;
  return this;
}
{code}

got removed.
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925596#comment-13925596 ]

Markus Jelsma commented on SOLR-1632: -

Hi Vitaly, are you sure it still works? I tried your patch and a few older ones again, but the docCounts are no longer the sum over the whole cluster. The GET_STATS query is executed, though.

Two-node test cluster:

{code}
384841 [qtp1175813699-17] INFO org.apache.solr.core.SolrCore – [collection1] webapp=/solr path=/select params={distrib=false&debug=track&wt=javabin&requestPurpose=GET_TERM_STATS&version=2&rows=10&debugQuery=false&shard.url=http://127.0.1.1:8983/solr/collection1/&NOW=139039677&rid=-collection1-139039677-12&shards.purpose=2&q=wiki&isShard=true} status=0 QTime=1
384848 [qtp1175813699-17] INFO org.apache.solr.core.SolrCore – [collection1] webapp=/solr path=/select params={distrib=false&debug=track&wt=javabin&requestPurpose=GET_TOP_IDS,GET_STATS,GET_TERMS,GET_MLT_RESULTS,SET_TERM_STATS&version=2&rows=10&org.apache.solr.stats.colStats=content_nl,121630,115956,16436279,11372267&org.apache.solr.stats.terms=content_nl:wiki&NOW=139039677&shard.url=http://127.0.1.1:8983/solr/collection1/&debugQuery=false&fl=id,score&shards.purpose=5636&rid=-collection1-139039677-12&start=0&q=wiki&org.apache.solr.stats.termStats=content_nl:wiki,284,645&isShard=true&fsv=true} hits=138 status=0 QTime=1
384863 [qtp1175813699-17] INFO org.apache.solr.core.SolrCore – [collection1] webapp=/solr path=/select params={ids=http://nl.wikipedia.org/wiki/Overleg_sjabloon:Infobox_film,http://nl.wikipedia.org/wiki/Overleg_sjabloon:Navigatie_Bijbel,http://nl.wikipedia.org/wiki/Overleg_help:Gebruik_van_sjablonen,http://nl.wikipedia.org/wiki/Overleg_sjabloon:Citeer_boek,http://nl.wikipedia.org/wiki/Overleg_sjabloon:Wikt&distrib=false&debug=track&wt=javabin&requestPurpose=GET_FIELDS,GET_DEBUG&version=2&rows=10&debugQuery=true&shard.url=http://127.0.1.1:8983/solr/collection1/&NOW=139039677&rid=-collection1-139039677-12&shards.purpose=320&q=wiki&isShard=true} status=0 QTime=7
384870 [qtp1175813699-13] INFO org.apache.solr.core.SolrCore – [collection1] webapp=/solr path=/select params={debugQuery=true&q=wiki} rid=-collection1-139039677-12 hits=284 status=0 QTime=33
{code}

{code}
380242 [qtp1175813699-16] INFO org.apache.solr.core.SolrCore – [collection1] webapp=/solr path=/select params={distrib=false&debug=track&wt=javabin&requestPurpose=GET_TERM_STATS&version=2&rows=10&debugQuery=false&shard.url=http://127.0.1.1:7574/solr/collection1/&NOW=139039677&rid=-collection1-139039677-12&shards.purpose=2&q=wiki&isShard=true} status=0 QTime=0
380249 [qtp1175813699-16] INFO org.apache.solr.core.SolrCore – [collection1] webapp=/solr path=/select params={distrib=false&debug=track&wt=javabin&requestPurpose=GET_TOP_IDS,GET_STATS,GET_TERMS,GET_MLT_RESULTS,SET_TERM_STATS&version=2&rows=10&org.apache.solr.stats.colStats=content_nl,121630,115956,16436279,11372267&org.apache.solr.stats.terms=content_nl:wiki&NOW=139039677&shard.url=http://127.0.1.1:7574/solr/collection1/&debugQuery=false&fl=id,score&shards.purpose=5636&rid=-collection1-139039677-12&start=0&q=wiki&org.apache.solr.stats.termStats=content_nl:wiki,284,645&isShard=true&fsv=true} hits=146 status=0 QTime=2
380263 [qtp1175813699-16] INFO org.apache.solr.core.SolrCore – [collection1] webapp=/solr path=/select params={ids=http://nl.wikipedia.org/wiki/Overleg_sjabloon:Navigatie,http://nl.wikipedia.org/wiki/Overleg_help:Waarom_staat_mijn_bestand_op_de_beoordelingslijst,http://nl.wikipedia.org/wiki/Overleg_help:Wikipediachat,http://nl.wikipedia.org/wiki/Overleg_sjabloon:Coördinaten,http://nl.wikipedia.org/wiki/Overleg_sjabloon:Sjabdoc/doc&distrib=false&debug=track&wt=javabin&requestPurpose=GET_FIELDS,GET_DEBUG&version=2&rows=10&debugQuery=true&shard.url=http://127.0.1.1:7574/solr/collection1/&NOW=139039677&rid=-collection1-139039677-12&shards.purpose=320&q=wiki&isShard=true} status=0 QTime=6
{code}

But I get these scores, each computed from a single shard's docFreq and docCount:

{code}
12.8123455 = (MATCH) weight(content_nl:wiki in 18636) [], result of:
  12.8123455 = score(doc=18636,freq=33.0 = termFreq=33.0 ), product of:
    6.0355678 = idf(docFreq=138, docCount=57897)
    2.122807 = tfNorm, computed from:
      33.0 = termFreq=33.0
      1.2 = parameter k1
      0.0 = parameter b (norms omitted for field)
{code}

{code}
12.558066 = (MATCH) weight(content_nl:wiki in 60634) [], result of:
  12.558066 = score(doc=60634,freq=25.0 = termFreq=25.0 ), product of:
    5.982207 = idf(docFreq=146, docCount=58059)
    2.0992365 = tfNorm, computed from:
      25.0 = termFreq=25.0
      1.2 = parameter k1
      0.0 = parameter b (norms omitted for field)
{code}

Did it work for you?
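As a cross-check of the numbers in the comment above, the following is a minimal sketch that recomputes the idf values, assuming the BM25 form idf = ln(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)), which reproduces both reported local values. The class and method names are hypothetical and are not taken from any SOLR-1632 patch; the inputs are the docFreq and docCount values from the explain output.

```java
/** Illustrative sketch only; class and method names are hypothetical. */
class Bm25IdfSketch {

    /** BM25 idf: ln(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)). */
    static double idf(long docFreq, long docCount) {
        return Math.log(1 + (docCount - docFreq + 0.5d) / (docFreq + 0.5d));
    }

    /** Global statistic modelled as a plain sum of the per-shard values. */
    static long sum(long... perShard) {
        long total = 0;
        for (long v : perShard) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        long[] docFreq = {138, 146};       // per-shard docFreq from the explain output
        long[] docCount = {57897, 58059};  // per-shard docCount from the explain output

        System.out.println("shard 1 local idf: " + idf(docFreq[0], docCount[0]));
        System.out.println("shard 2 local idf: " + idf(docFreq[1], docCount[1]));
        // The sums (284 and 115956) match the termStats and colStats request parameters.
        System.out.println("idf from summed stats: " + idf(sum(docFreq), sum(docCount)));
    }
}
```

The per-shard sums equal the 284 and 115956 that appear in the `termStats` and `colStats` parameters of the logged requests, so the global statistics were sent to the shards; under the assumed formula they would give an idf of roughly 6.01 on both nodes rather than the two different local values shown in the explain output.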
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13925920#comment-13925920 ] Mark Miller commented on SOLR-1632: --- bq. I tried your and few older patches again but docCounts are no longer the sum of the cluster size. Do you see what is missing in the tests to catch this? Distributed IDF --- Key: SOLR-1632 URL: https://issues.apache.org/jira/browse/SOLR-1632 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.5 Reporter: Andrzej Bialecki Assignee: Mark Miller Fix For: 4.7, 5.0 Attachments: 3x_SOLR-1632_doesntwork.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, distrib-2.patch, distrib.patch Distributed IDF is a valuable enhancement for distributed search across non-uniform shards. This issue tracks the proposed implementation of an API to support this functionality in Solr. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843025#comment-13843025 ] Markus Jelsma commented on SOLR-1632: - It is much faster now, even usable. But i haven't tried it in a larger cluster yet. Distributed IDF --- Key: SOLR-1632 URL: https://issues.apache.org/jira/browse/SOLR-1632 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.5 Reporter: Andrzej Bialecki Assignee: Mark Miller Fix For: 5.0, 4.7 Attachments: 3x_SOLR-1632_doesntwork.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, distrib-2.patch, distrib.patch Distributed IDF is a valuable enhancement for distributed search across non-uniform shards. This issue tracks the proposed implementation of an API to support this functionality in Solr. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13842271#comment-13842271 ] Mark Miller commented on SOLR-1632: --- [~markus17], how was performance with your most recent patch compared to what you first reported? Distributed IDF --- Key: SOLR-1632 URL: https://issues.apache.org/jira/browse/SOLR-1632 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.5 Reporter: Andrzej Bialecki Assignee: Mark Miller Fix For: 5.0, 4.7 Attachments: 3x_SOLR-1632_doesntwork.patch, SOLR-1301.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, distrib-2.patch, distrib.patch Distributed IDF is a valuable enhancement for distributed search across non-uniform shards. This issue tracks the proposed implementation of an API to support this functionality in Solr. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13842077#comment-13842077 ] Mark Miller commented on SOLR-1632: --- I've got two main concerns - the thread local and it looks like the statscache is not thread safe but shared across threads. The threadlocal is concerning because you can have thousands of threads and each will cache how many stats? I wish we could do something better. Distributed IDF --- Key: SOLR-1632 URL: https://issues.apache.org/jira/browse/SOLR-1632 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.5 Reporter: Andrzej Bialecki Assignee: Mark Miller Fix For: 5.0, 4.7 Attachments: 3x_SOLR-1632_doesntwork.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, distrib-2.patch, distrib.patch Distributed IDF is a valuable enhancement for distributed search across non-uniform shards. This issue tracks the proposed implementation of an API to support this functionality in Solr. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
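One way to picture the thread-safety concern raised in the comment above is a single shared map that is safe for concurrent use, instead of per-thread copies. The sketch below is an assumption-laden illustration, not the StatsCache code from the patch: the class, the `TermStats` value type, and the sample numbers are all hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical illustration; not the StatsCache classes from the patch. */
class SharedTermStatsSketch {

    /** Immutable value object, so instances can be shared across threads safely. */
    static final class TermStats {
        final long docFreq;
        final long totalTermFreq;

        TermStats(long docFreq, long totalTermFreq) {
            this.docFreq = docFreq;
            this.totalTermFreq = totalTermFreq;
        }

        /** Returns a new object holding the element-wise sum. */
        TermStats plus(TermStats other) {
            return new TermStats(docFreq + other.docFreq, totalTermFreq + other.totalTermFreq);
        }
    }

    // One map shared by all request threads; no ThreadLocal copies.
    private final ConcurrentHashMap<String, TermStats> global = new ConcurrentHashMap<>();

    /** Atomically folds one shard's stats for a term into the global entry. */
    void addShardStats(String term, TermStats shardStats) {
        global.merge(term, shardStats, TermStats::plus);
    }

    /** Returns the merged stats, or null if no shard has reported the term. */
    TermStats get(String term) {
        return global.get(term);
    }

    public static void main(String[] args) {
        SharedTermStatsSketch cache = new SharedTermStatsSketch();
        // Sample values are made up for the demonstration.
        cache.addShardStats("content_nl:wiki", new TermStats(10, 25));
        cache.addShardStats("content_nl:wiki", new TermStats(5, 7));
        System.out.println(cache.get("content_nl:wiki").docFreq);
    }
}
```

`ConcurrentHashMap.merge` applies the combining function atomically per key, so two shard responses arriving on different threads cannot lose an update. Whether a shared map with eviction, or request-scoped state, is the better fit for Solr is a design question this sketch does not settle.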
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13833392#comment-13833392 ] Mark Miller commented on SOLR-1632: --- I'm looking at a couple of the test failures before I go to bed tonight:

{quote}
[junit4] Tests with failures:
[junit4] - org.apache.solr.handler.component.QueryElevationComponentTest.testGroupedQuery
[junit4] - org.apache.solr.TestDistributedSearch.testDistribSearch
[junit4] - org.apache.solr.search.stats.TestLRUStatsCache.testDistribSearch
[junit4] - org.apache.solr.TestGroupingSearch.testGroupingGroupSortingScore_basicWithGroupSortEqualToSort
[junit4] - org.apache.solr.TestGroupingSearch.testGroupingGroupSortingScore_withTotalGroupCount
[junit4] - org.apache.solr.TestGroupingSearch.testGroupingGroupSortingScore_basic
[junit4] - org.apache.solr.search.stats.TestExactStatsCache.testDistribSearch
[junit4] - org.apache.solr.update.AddBlockUpdateTest.testXML
[junit4] - org.apache.solr.update.AddBlockUpdateTest.testSolrJXML
{quote}

Distributed IDF --- Key: SOLR-1632 URL: https://issues.apache.org/jira/browse/SOLR-1632 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.5 Reporter: Andrzej Bialecki Fix For: 5.0 Attachments: 3x_SOLR-1632_doesntwork.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, distrib-2.patch, distrib.patch Distributed IDF is a valuable enhancement for distributed search across non-uniform shards. This issue tracks the proposed implementation of an API to support this functionality in Solr. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13833478#comment-13833478 ] Mark Miller commented on SOLR-1632: --- The config you need to use to turn this on is now: <statsCache class="org.apache.solr.search.stats.ExactStatsCache"/> It needs to go in the top-level config section (that is, directly under <config> in solrconfig.xml). Distributed IDF --- Key: SOLR-1632 URL: https://issues.apache.org/jira/browse/SOLR-1632 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.5 Reporter: Andrzej Bialecki Assignee: Mark Miller Fix For: 5.0, 4.7 Attachments: 3x_SOLR-1632_doesntwork.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, distrib-2.patch, distrib.patch Distributed IDF is a valuable enhancement for distributed search across non-uniform shards. This issue tracks the proposed implementation of an API to support this functionality in Solr. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13833483#comment-13833483 ] Mark Miller commented on SOLR-1632: --- The thread local still scares me ... need to look closer at that. Distributed IDF --- Key: SOLR-1632 URL: https://issues.apache.org/jira/browse/SOLR-1632 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.5 Reporter: Andrzej Bialecki Assignee: Mark Miller Fix For: 5.0, 4.7 Attachments: 3x_SOLR-1632_doesntwork.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, distrib-2.patch, distrib.patch Distributed IDF is a valuable enhancement for distributed search across non-uniform shards. This issue tracks the proposed implementation of an API to support this functionality in Solr. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13803043#comment-13803043 ] David commented on SOLR-1632: - is this patch currently working in 5.0? Distributed IDF --- Key: SOLR-1632 URL: https://issues.apache.org/jira/browse/SOLR-1632 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.5 Reporter: Andrzej Bialecki Fix For: 5.0 Attachments: 3x_SOLR-1632_doesntwork.patch, distrib-2.patch, distrib.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch Distributed IDF is a valuable enhancement for distributed search across non-uniform shards. This issue tracks the proposed implementation of an API to support this functionality in Solr. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13803040#comment-13803040 ] David commented on SOLR-1632: - It seems like this task should have a much higher priority. Distributed IDF is very important for scoring across non-uniform shards. I am currently using Solr Cloud with grouping and without distributed IDF my boost functions are rendered nearly useless in terms of the result ordering expected. Distributed IDF --- Key: SOLR-1632 URL: https://issues.apache.org/jira/browse/SOLR-1632 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.5 Reporter: Andrzej Bialecki Fix For: 5.0 Attachments: 3x_SOLR-1632_doesntwork.patch, distrib-2.patch, distrib.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch Distributed IDF is a valuable enhancement for distributed search across non-uniform shards. This issue tracks the proposed implementation of an API to support this functionality in Solr. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13803064#comment-13803064 ] Markus Jelsma commented on SOLR-1632: - No, it does not work at all. I did spend some time on it but had other things to do. In the end i removed my (not working) changes and uploaded a patch that at least compiles against the revision of that time. Distributed IDF --- Key: SOLR-1632 URL: https://issues.apache.org/jira/browse/SOLR-1632 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.5 Reporter: Andrzej Bialecki Fix For: 5.0 Attachments: 3x_SOLR-1632_doesntwork.patch, distrib-2.patch, distrib.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch Distributed IDF is a valuable enhancement for distributed search across non-uniform shards. This issue tracks the proposed implementation of an API to support this functionality in Solr. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13582022#comment-13582022 ] Markus Jelsma commented on SOLR-1632: - No, not yet. Please let me do some real tests, there must be issues, the patch is over a year old! :) Distributed IDF --- Key: SOLR-1632 URL: https://issues.apache.org/jira/browse/SOLR-1632 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.5 Reporter: Andrzej Bialecki Fix For: 5.0 Attachments: 3x_SOLR-1632_doesntwork.patch, distrib-2.patch, distrib.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch Distributed IDF is a valuable enhancement for distributed search across non-uniform shards. This issue tracks the proposed implementation of an API to support this functionality in Solr. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582142#comment-13582142 ]

Markus Jelsma commented on SOLR-1632: -

It doesn't really seem to work. We're seeing lots of NPEs, and when a response does come through, IDF is not consistent for all terms. Most requests return one of the NPEs below. Sometimes a request works, and then the second request just fails.

{code}
java.lang.NullPointerException
  at org.apache.solr.search.stats.ExactStatsCache.sendGlobalStats(LRUStatsCache.java:202)
  at org.apache.solr.handler.component.QueryComponent.createMainQuery(QueryComponent.java:783)
  at org.apache.solr.handler.component.QueryComponent.regularDistributedProcess(QueryComponent.java:618)
  at...
{code}

{code}
java.lang.NullPointerException
  at org.apache.solr.search.stats.LRUStatsCache.sendGlobalStats(LRUStatsCache.java:228)
  at org.apache.solr.handler.component.QueryComponent.createMainQuery(QueryComponent.java:783)
  at org.apache.solr.handler.component.QueryComponent.regularDistributedProcess(QueryComponent.java:618)
  at...
{code}

We also see this one from time to time; it looks like it is thrown if there are `no servers hosting shard`:

{code}
java.lang.NullPointerException
  at org.apache.solr.search.stats.LRUStatsCache.mergeToGlobalStats(LRUStatsCache.java:112)
  at org.apache.solr.handler.component.QueryComponent.updateStats(QueryComponent.java:743)
  at org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:659)
  at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:634)
  at ..
{code}

It also imposes a huge performance penalty with both LRUStatsCache and ExactStatsCache: if you're used to 40ms response times, you'll see the average jump to 2 seconds, with very frequent 5-second spikes. Performance stays poor if logging is disabled. The logs are also swamped with lines like:

{code}
2013-02-20 11:54:48,091 WARN [search.stats.LRUStatsCache] - [http-8080-exec-5] - : ## Missing global colStats info: FIELD, using local
2013-02-20 11:54:48,091 WARN [search.stats.LRUStatsCache] - [http-8080-exec-5] - : ## Missing global termStats info: FIELD:TERM, using local
{code}

Both StatsCacheImpls behave like this. Each query logs lines like the ones above. Maybe performance is poor because it tries to look up terms every time, but I'm not sure yet.

Finally, something crazy I'd like to share :)

{code}
-Infinity = (MATCH) sum of:
  -Infinity = (MATCH) max plus 0.35 times others of:
    -Infinity = (MATCH) weight(content_nl:amsterdam^1.6 in 449) [], result of:
      -Infinity = score(doc=449,freq=1.0 = termFreq=1.0 ), product of:
        1.6 = boost
        -Infinity = idf(docFreq=29800090, docCount=-1)
        1.0 = tfNorm, computed from:
          1.0 = termFreq=1.0
          1.2 = parameter k1
          0.0 = parameter b (norms omitted for field)
{code}

If someone happens to recognize the issues above, I'm all ears :)

Distributed IDF --- Key: SOLR-1632 URL: https://issues.apache.org/jira/browse/SOLR-1632 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.5 Reporter: Andrzej Bialecki Fix For: 5.0 Attachments: 3x_SOLR-1632_doesntwork.patch, distrib-2.patch, distrib.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch Distributed IDF is a valuable enhancement for distributed search across non-uniform shards. This issue tracks the proposed implementation of an API to support this functionality in Solr. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
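The -Infinity in the explain output above follows directly from the arithmetic if one assumes the BM25 idf form ln(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)): with docCount=-1 and docFreq=29800090 the fraction is exactly -1, so the logarithm is taken of zero. The sketch below demonstrates this and shows one possible guard that falls back to local statistics. The class, the `safeIdf` method, and the local sample values are hypothetical; what actually produced docCount=-1 is not established in this thread.

```java
/** Illustrative sketch only; names and the guard policy are hypothetical. */
class GlobalStatsGuardSketch {

    /** BM25 idf: ln(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)). */
    static double idf(long docFreq, long docCount) {
        return Math.log(1 + (docCount - docFreq + 0.5d) / (docFreq + 0.5d));
    }

    /** Uses the global stats only when they are plausible, otherwise the local ones. */
    static double safeIdf(long globalDocFreq, long globalDocCount,
                          long localDocFreq, long localDocCount) {
        boolean usable = globalDocCount > 0
                && globalDocFreq > 0
                && globalDocFreq <= globalDocCount;
        return usable ? idf(globalDocFreq, globalDocCount)
                      : idf(localDocFreq, localDocCount);
    }

    public static void main(String[] args) {
        // Values from the explain output: (-1 - 29800090 + 0.5) / (29800090 + 0.5) = -1,
        // so the formula evaluates ln(0).
        System.out.println(idf(29800090L, -1L));
        // Local values below are made up for the demonstration.
        System.out.println(safeIdf(29800090L, -1L, 10L, 1000L));
    }
}
```

A guard like this only hides the symptom; the missing or inconsistent global statistics would still need to be traced to their source.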
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13582178#comment-13582178 ] Mark Miller commented on SOLR-1632: --- Hmm, that makes it look like the current tests for this must be pretty weak then. Distributed IDF --- Key: SOLR-1632 URL: https://issues.apache.org/jira/browse/SOLR-1632 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.5 Reporter: Andrzej Bialecki Fix For: 5.0 Attachments: 3x_SOLR-1632_doesntwork.patch, distrib-2.patch, distrib.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch Distributed IDF is a valuable enhancement for distributed search across non-uniform shards. This issue tracks the proposed implementation of an API to support this functionality in Solr. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13582188#comment-13582188 ] Markus Jelsma commented on SOLR-1632: - Things have changed a lot in the past 13 months and i haven't figured it all out yet. I'll try to make sense out of it but some expert opinion and trial on the patch and all would be more than helpful. Is Andrzej not around? Distributed IDF --- Key: SOLR-1632 URL: https://issues.apache.org/jira/browse/SOLR-1632 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.5 Reporter: Andrzej Bialecki Fix For: 5.0 Attachments: 3x_SOLR-1632_doesntwork.patch, distrib-2.patch, distrib.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch Distributed IDF is a valuable enhancement for distributed search across non-uniform shards. This issue tracks the proposed implementation of an API to support this functionality in Solr. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13581945#comment-13581945 ] Mark Miller commented on SOLR-1632: --- Nice. I mentioned this to AB not too long ago, but I'm of the mind to simply commit this. It will default to off, and we can continue to work on it. So unless someone steps in, I'll commit what Markus has put up. Markus, have you tried this out at all beyond the unit tests - eg on a cluster? Distributed IDF --- Key: SOLR-1632 URL: https://issues.apache.org/jira/browse/SOLR-1632 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.5 Reporter: Andrzej Bialecki Fix For: 5.0 Attachments: 3x_SOLR-1632_doesntwork.patch, distrib-2.patch, distrib.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch Distributed IDF is a valuable enhancement for distributed search across non-uniform shards. This issue tracks the proposed implementation of an API to support this functionality in Solr. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13543008#comment-13543008 ] Markus Jelsma commented on SOLR-1632: - Any progress to report or does anyone have a patch that is updated for trunk? Distributed IDF --- Key: SOLR-1632 URL: https://issues.apache.org/jira/browse/SOLR-1632 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.5 Reporter: Andrzej Bialecki Attachments: 3x_SOLR-1632_doesntwork.patch, distrib-2.patch, distrib.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch Distributed IDF is a valuable enhancement for distributed search across non-uniform shards. This issue tracks the proposed implementation of an API to support this functionality in Solr. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195619#comment-13195619 ] Yonik Seeley commented on SOLR-1632: bq. There is nothing different from a MTQ generated BQ than a huge BQ a solr user submits. Multi-term queries like range query, prefix query, etc, do not depend on term stats, and can consist of millions of terms. It's a waste to attempt to return term stats for them (estimated or not). It would also be a shame to use estimates rather than exact numbers for what will be the common case (i.e. when there's really only a couple of terms you need stats for): +title:blue whale +title_whole:[a TO g} or +title:blue whale +date:[2001-01-01 TO 2010-01-01} Ideally, we wouldn't even do a rewrite in order to collect terms - rewrite itself has gotten much more expensive in some circumstances (i.e. iterating the first 350 terms to determine what style of rewrite should be used) Distributed IDF --- Key: SOLR-1632 URL: https://issues.apache.org/jira/browse/SOLR-1632 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.5 Reporter: Andrzej Bialecki Attachments: 3x_SOLR-1632_doesntwork.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, distrib-2.patch, distrib.patch Distributed IDF is a valuable enhancement for distributed search across non-uniform shards. This issue tracks the proposed implementation of an API to support this functionality in Solr. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195632#comment-13195632 ]

Robert Muir commented on SOLR-1632: ---

{quote} Multi-term queries like range query, prefix query, etc, do not depend on term stats, and can consist of millions of terms. {quote}

No, they cannot. It can't be millions of terms, because a million exceeds the boolean max clause count, in which case it will always use a filter.

{quote} Ideally, we wouldn't even do a rewrite in order to collect terms {quote}

You don't have to. Lucene's test case (ShardSearchingTestBase) doesn't do an extra rewrite to collect terms.

{code}
@Override
public Query rewrite(Query original) throws IOException {
  final Query rewritten = super.rewrite(original);
  final Set<Term> terms = new HashSet<Term>();
  rewritten.extractTerms(terms);
  // Make a single request to remote nodes for term
  // stats:
  ...
  return rewritten;
}
{code}

{quote} - rewrite itself has gotten much more expensive in some circumstances (i.e. iterating the first 350 terms to determine what style of rewrite should be used) {quote}

Got any benchmarks to back this up? It's incorrect to say rewrite has gotten more expensive. More expensive than what? It's the opposite: it's actually much faster when rewriting to boolean queries in 4.0, because it always works per-segment.

Distributed IDF --- Key: SOLR-1632 URL: https://issues.apache.org/jira/browse/SOLR-1632 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.5 Reporter: Andrzej Bialecki Attachments: 3x_SOLR-1632_doesntwork.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, distrib-2.patch, distrib.patch Distributed IDF is a valuable enhancement for distributed search across non-uniform shards. This issue tracks the proposed implementation of an API to support this functionality in Solr. -- This message is automatically generated by JIRA.
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195635#comment-13195635 ] Yonik Seeley commented on SOLR-1632: bq. it can't be millions of terms because a million exceeds the boolean max clause count, in which it will always use a filter. So depending on exactly how many terms the range query covers, extractTerms may or may not return any. So extractTerms() may return 300 terms the first time, and then after someone adds some docs to the index it may suddenly return 0. This just strengthens the case that we should be consistent and just always ignore the terms from these MTQs. bq. Its incorrect to say rewrite has gotten more expensive? More expensive than what? Sorry, I wasn't specific enough. I meant compared to back when Solr had it's own RangeFilter and PrefixFilter that it would wrap in a ConstantScoreQuery. There never was any rewrite-to-boolean-query or consulting the index, so it's obviously a faster rewrite(). But back to the original question - I still see no reason to request/return/cache terms/stats from these multi-term queries when by definition they should not change the results of the request. Distributed IDF --- Key: SOLR-1632 URL: https://issues.apache.org/jira/browse/SOLR-1632 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.5 Reporter: Andrzej Bialecki Attachments: 3x_SOLR-1632_doesntwork.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, distrib-2.patch, distrib.patch Distributed IDF is a valuable enhancement for distributed search across non-uniform shards. This issue tracks the proposed implementation of an API to support this functionality in Solr. -- This message is automatically generated by JIRA. 
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195637#comment-13195637 ] Uwe Schindler commented on SOLR-1632: - bq. Sorry, I wasn't specific enough. I meant compared to back when Solr had it's own RangeFilter and PrefixFilter that it would wrap in a ConstantScoreQuery. There never was any rewrite-to-boolean-query or consulting the index, so it's obviously a faster rewrite(). Just set in Solr the rewrite mode of MTQ to CONSTANT_SCORE_FILTER_REWRITE - done. There is no discussion needed and no custom RangeQuery in Solr. Distributed IDF --- Key: SOLR-1632 URL: https://issues.apache.org/jira/browse/SOLR-1632 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.5 Reporter: Andrzej Bialecki Attachments: 3x_SOLR-1632_doesntwork.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, distrib-2.patch, distrib.patch Distributed IDF is a valuable enhancement for distributed search across non-uniform shards. This issue tracks the proposed implementation of an API to support this functionality in Solr. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195638#comment-13195638 ] Yonik Seeley commented on SOLR-1632:

bq. Just set in Solr the rewrite mode of MTQ to CONSTANT_SCORE_FILTER_REWRITE - done.

Right - I was considering the best way to do this (passing that info around Solr about when to use which method). It solves both issues - relatively expensive rewrites that are not needed, and ignoring the MTQ terms.
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195644#comment-13195644 ] Robert Muir commented on SOLR-1632:

{quote} But back to the original question - I still see no reason to request/return/cache terms/stats from these multi-term queries when by definition they should not change the results of the request. {quote}

My original point (forgetting about the specifics of MTQ, how things are being scored, or anything) is still that it's a general case of a Query that can have lots of Terms. So if there are concerns about lots of terms, I still think it's worth considering some limits on how many Terms would be exchanged. Maybe BooleanQuery's max clause count is already good enough, but another way to do it would be to have an approximate implementation that approximates when the term count for a query gets too high.
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13194814#comment-13194814 ] Robert Muir commented on SOLR-1632:

Thanks Andrzej: I think it will be nice that all of Lucene's scoring algorithms can work in distributed mode.

Just one question about the patch: in StatsUtil I can't tell if termFromString matches termToString. termToString seems to base64-encode the term text (a good idea, since terms can be binary), but I don't see the corresponding decode in termFromString (there is an XXX: comment though).
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13194842#comment-13194842 ] Andrzej Bialecki commented on SOLR-1632:

Hmm, indeed... I must have switched to toString() for debugging (it's easier to eyeball an ASCII string than a base64 string ;) ). This should use base64 throughout. I'll prepare a patch shortly.

(BTW, I'm aware that passing around blobs of base64 inside SolrParams is ugly. I'm open to suggestions on how to handle this better.)
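The termToString/termFromString mismatch discussed in this exchange comes down to keeping the encode and decode sides symmetric. Below is a minimal sketch of such a round trip using only `java.util.Base64`. The class `TermCodecSketch`, the `field:payload` layout, and the method names `fieldOf` and `bytesOf` are assumptions made for illustration; they are not the patch's actual StatsUtil code, whose signatures are not shown in this thread.

```java
import java.util.Base64;

// Hypothetical stand-in for the term (de)serialization helpers discussed
// in the thread; NOT the actual StatsUtil from the patch.
final class TermCodecSketch {

    // Encodes a term as "field:base64(termBytes)".
    // Assumption: field names never contain ':'.
    static String termToString(String field, byte[] termBytes) {
        return field + ":" + Base64.getEncoder().encodeToString(termBytes);
    }

    // Recovers the field name: everything before the first ':'.
    static String fieldOf(String encoded) {
        return encoded.substring(0, encoded.indexOf(':'));
    }

    // Recovers the raw term bytes by base64-decoding everything after
    // the first ':'. The standard base64 alphabet contains no ':', so
    // the split is unambiguous under the assumption above.
    static byte[] bytesOf(String encoded) {
        return Base64.getDecoder().decode(encoded.substring(encoded.indexOf(':') + 1));
    }
}
```

Because both directions go through the same codec, arbitrary binary term bytes (including commas and colons inside the term) survive the trip unchanged.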
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13194860#comment-13194860 ] Yonik Seeley commented on SOLR-1632:

bq. (BTW, I'm aware that passing around blobs of base64 inside SolrParams is ugly. I'm open to suggestions how to handle this better).

I'd prefer non-base64 at the Solr transport level (e.g. termStats=how,now,brown,cow). It will be both smaller and much easier to debug. Although Lucene can technically index arbitrary binary now, Solr does not use that anywhere (and won't for 4.0). It would take a good amount of infrastructure work all over to truly allow that. If/when we allow arbitrary binary terms, it should be relatively easy to extend the syntax we pick today to allow selectively base64-encoded terms.

There are already a number of places in Solr where we use StrUtils.join (a comma-separated list of strings) to specify a list of terms (both in distrib faceting and distrib search, for example).
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13194878#comment-13194878 ] Robert Muir commented on SOLR-1632:

{quote} Although Lucene can technically index arbitrary binary now, Solr does not use that anywhere (and won't for 4.0). {quote}

That's not actually true. Collation uses it already.
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13194893#comment-13194893 ] Yonik Seeley commented on SOLR-1632:

bq. Thats not actually true. Collation uses it already.

Hmmm, that's normally just for sorting though. I wonder if that works with distributed search today?

Anyway, we have a schema - that can allow us to do what makes sense depending on the field (i.e. only use base64 or \x?? for fields where there will be non-character terms).
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13194899#comment-13194899 ] Robert Muir commented on SOLR-1632:

It's also used for locale-sensitive range queries (and of course TermQuery etc. works too, but that's not interesting).
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13194907#comment-13194907 ] Andrzej Bialecki commented on SOLR-1632:

\x or %xx escaping could be ok, I guess - it's safe, and in most cases it's still readable, unlike base64.
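As a rough illustration of the %xx idea combined with the comma-separated form suggested earlier in the thread (`termStats=how,now,brown,cow`), here is a minimal sketch. The class `TermListEscapeSketch` and its methods are hypothetical, and the choice of which bytes to escape (the comma, the percent sign, and anything outside printable ASCII 0x21..0x7E) is an assumption for this example, not something the patch specifies.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only: %xx-escapes term bytes so
// they can travel in a comma-separated parameter value.
final class TermListEscapeSketch {

    // Leaves printable ASCII as-is, except ',' (the separator) and '%'
    // (the escape character); every other byte becomes %XX.
    static String escape(byte[] term) {
        StringBuilder sb = new StringBuilder();
        for (byte b : term) {
            int v = b & 0xff;
            boolean plain = v > 0x20 && v < 0x7f && v != ',' && v != '%';
            if (plain) {
                sb.append((char) v);
            } else {
                sb.append('%').append(String.format("%02X", v));
            }
        }
        return sb.toString();
    }

    // Inverse of escape(); throws on malformed escapes.
    static byte[] unescape(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%') {
                out.write(Integer.parseInt(s.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                out.write(c);
            }
        }
        return out.toByteArray();
    }

    // Joins escaped terms with ','. Note: an empty list and a list holding
    // one empty term both produce "", so callers would need to treat the
    // empty case separately.
    static String join(List<byte[]> terms) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < terms.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(escape(terms.get(i)));
        }
        return sb.toString();
    }

    // Splits on ',' and unescapes each part.
    static List<byte[]> split(String joined) {
        List<byte[]> terms = new ArrayList<>();
        for (String part : joined.split(",", -1)) {
            terms.add(unescape(part));
        }
        return terms;
    }
}
```

Plain-text terms stay readable on the wire, and only fields with binary or separator-containing terms pay the escaping cost, which matches the readability argument made in the comment.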
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13194910#comment-13194910 ] Yonik Seeley commented on SOLR-1632:

bq. It's also used for locale-sensitive range queries

Given that range queries (and other multi-term queries) are constant scoring and may contain *many* terms, hopefully we avoid requesting term stats for these?
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13194915#comment-13194915 ] Andrzej Bialecki commented on SOLR-1632:

bq. hopefully we avoid requesting term stats for these?

There is no provision for this yet in the current patch.
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13194921#comment-13194921 ] Robert Muir commented on SOLR-1632:

{quote} There is no provision for this yet in the current patch. {quote}

An MTQ-generated BQ is no different from a huge BQ that a Solr user submits.

In my opinion, instead of saying screw scoring certain types of queries, this stuff should be done by InExact implementations (and maybe that should be the default, fine).

E.g. a nice heuristic could look at the local stats and say: sure, there are 100 terms but 50 are low-freq, let's assume an additive constant C for those, batch the other terms into e.g. 5 ranges, and only request stats on 5 surrogate terms representative of those groups.

Just make sure any heuristic is always *added* to what is surely present locally, e.g. distributed docfreq is always >= local docfreq. Then no scoring algorithms will break.
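The invariant in this comment (any estimate is added on top of the local count, so the distributed docfreq never drops below the local docfreq) can be sketched in a few lines. Everything here is hypothetical: the class `InexactDocFreqSketch`, the threshold, the additive constant, and the idea of passing in a surrogate term's remote docfreq are assumptions chosen to illustrate the heuristic, not an implementation from the patch.

```java
// Hypothetical illustration of an "inexact" distributed docfreq estimate.
final class InexactDocFreqSketch {

    // localDocFreq:           docfreq of the term on this shard (known exactly)
    // lowFreqThreshold:       at or below this, the term is treated as low-freq
    //                         and no remote stats are requested for it
    // additiveConstant:       assumed remote contribution for low-freq terms
    // surrogateRemoteDocFreq: remote docfreq of the surrogate term that
    //                         represents this term's frequency group
    static long estimateGlobalDocFreq(long localDocFreq,
                                      long lowFreqThreshold,
                                      long additiveConstant,
                                      long surrogateRemoteDocFreq) {
        long remote = localDocFreq <= lowFreqThreshold
                ? additiveConstant
                : surrogateRemoteDocFreq;
        // Clamp at zero so a bad estimate can never push the result
        // below what is surely present locally.
        return localDocFreq + Math.max(0L, remote);
    }
}
```

Because the remote part is clamped and only ever added, a similarity that assumes docfreq is at least the locally observed count keeps working even when the estimate is poor.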
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13192357#comment-13192357 ] Shawn Heisey commented on SOLR-1632:

Is this something that can be added to branch_3x? With high fuzz and ignore whitespace, the patch applies, but then fails to compile. It also fails to compile when I set fuzz to zero, pay attention to whitespace, and manually fix the patch rejects. I couldn't figure out how to fix the problems.
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13192374#comment-13192374 ] Andrzej Bialecki commented on SOLR-1632:

bq. Is this something that can be added to branch_3x?

Not without porting - Lucene / Solr APIs have changed significantly, and this patch uses some low-level APIs that are different between trunk and 3x.
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13192394#comment-13192394 ] Yonik Seeley commented on SOLR-1632:

Haven't had time to look this over that closely, but this did jump out at me:

+public class CollectionStats {
+  public String field;
+  public int maxDoc;
+  public int docCount;

Shouldn't we be using longs here so we can support more than 2B docs?
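To make the concern concrete: once per-shard counts are summed across a large collection, the total can exceed `Integer.MAX_VALUE` even though each shard fits in an int. The following is a minimal sketch with long fields. The class `CollectionStatsSketch` and its `merge` method are assumptions for illustration; only the three field names are taken from the patch excerpt quoted above.

```java
import java.util.List;

// Hypothetical variant of the quoted CollectionStats, using longs so that
// sums over many shards cannot overflow an int.
final class CollectionStatsSketch {
    final String field;
    final long maxDoc;
    final long docCount;

    CollectionStatsSketch(String field, long maxDoc, long docCount) {
        this.field = field;
        this.maxDoc = maxDoc;
        this.docCount = docCount;
    }

    // Sums per-shard stats for one field. Math.addExact throws instead of
    // silently wrapping if even a long were to overflow.
    static CollectionStatsSketch merge(List<CollectionStatsSketch> shards) {
        if (shards.isEmpty()) {
            throw new IllegalArgumentException("no shard stats to merge");
        }
        String field = shards.get(0).field;
        long maxDoc = 0;
        long docCount = 0;
        for (CollectionStatsSketch s : shards) {
            if (!field.equals(s.field)) {
                throw new IllegalArgumentException("mixed fields: " + field + " vs " + s.field);
            }
            maxDoc = Math.addExact(maxDoc, s.maxDoc);
            docCount = Math.addExact(docCount, s.docCount);
        }
        return new CollectionStatsSketch(field, maxDoc, docCount);
    }
}
```

For example, two shards with 1,500,000,000 documents each merge to a maxDoc of 3,000,000,000, which an int field could not hold.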
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13192407#comment-13192407 ] Andrzej Bialecki commented on SOLR-1632:

Yeah, I was curious about this too. However, this is how CollectionStatistics is defined in Lucene, so it's something that we have to change in Lucene too.
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13192438#comment-13192438 ] Robert Muir commented on SOLR-1632:

{quote} However, this is how CollectionStatistics is defined in Lucene, so it's something that we have to change in Lucene too. {quote}

TermStatistics too. Let's open a separate issue for this.
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142603#comment-13142603 ] Mark Miller commented on SOLR-1632:

Recently I updated this patch to trunk and got rid of the ThreadLocal usage and Query rewriting that was the reason we had to pull this from trunk long ago. Then I attempted to override stats on IndexSearcher with global stats - this is when I realized that this no longer had any effect on scoring - this will now be addressed in LUCENE-3555.

Unfortunately, I didn't pay attention and lost that code. That's a pity, because it would have been a nice head start on this issue - I think we may want to make other changes/improvements, but it would have been a start with something working.

It was a bit of a pain to do since the patch has to be manually applied, but perhaps doing it a second time is faster...
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142634#comment-13142634 ] Mark Miller commented on SOLR-1632:

Correction: I got rid of the rewrite that was added for the multi-searcher type behavior - I hadn't solved the issue of the rewrite needed to get the terms to retrieve stats for - that patch was not yet going to work with multi-term queries.
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142641#comment-13142641 ] Mark Miller commented on SOLR-1632:

Although, actually I'm not even sure if that rewrite is really a problem - I almost don't think it will tickle the same issue as the rewrite that was happening before the search. I didn't have a chance to test it or look into it in depth or anything yet though.
[jira] Commented: (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12998262#comment-12998262 ] Thorsten Scherler commented on SOLR-1632:

Regarding the comment "Perhaps one idea is to use a visitor pattern to decouple tree traversal with the operations being performed.": can you please explain where to implement the listener/visitor?

I had a quick look at the patch and it seems to me that the main functionality is in trunk/src/java/org/apache/solr/search/SolrIndexSearcher.java and the rest is more about caching concerns, right?
[jira] Commented: (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892167#action_12892167 ]

LiLi commented on SOLR-1632:

My Solr version is 1.4. I applied the patch, but it failed. For the line

  SolrCache<String, Integer> cache = perShardCache.get(shard);

the compiler reports:

  The type SolrCache is not generic; it cannot be parameterized with arguments <String, Integer>

SolrCache is an interface:

  public interface SolrCache extends SolrInfoMBean

Output of the patch command:

  patching file src/common/org/apache/solr/common/params/ShardParams.java
  patching file src/java/org/apache/solr/core/SolrConfig.java
  Hunk #1 succeeded at 30 with fuzz 2 (offset 2 lines).
  Hunk #2 FAILED at 197.
  1 out of 2 hunks FAILED -- saving rejects to file src/java/org/apache/solr/core/SolrConfig.java.rej
  patching file src/java/org/apache/solr/core/SolrCore.java
  Hunk #5 succeeded at 821 (offset 3 lines).
  patching file src/java/org/apache/solr/handler/component/QueryComponent.java
  Hunk #1 succeeded at 40 with fuzz 2 (offset -2 lines).
  Hunk #6 succeeded at 302 (offset 13 lines).
  Hunk #7 succeeded at 324 with fuzz 2 (offset 12 lines).
  Hunk #8 succeeded at 343 (offset 21 lines).
  Hunk #9 succeeded at 367 (offset 21 lines).
  Hunk #10 succeeded at 423 (offset 28 lines).
  patching file src/java/org/apache/solr/handler/component/SearchHandler.java
  patching file src/java/org/apache/solr/handler/component/ShardRequest.java
  Hunk #1 FAILED at 37.
  1 out of 1 hunk FAILED -- saving rejects to file src/java/org/apache/solr/handler/component/ShardRequest.java.rej
  patching file src/java/org/apache/solr/search/DFCache.java
  patching file src/java/org/apache/solr/search/DFSource.java
  patching file src/java/org/apache/solr/search/DefaultDFCache.java
  patching file src/java/org/apache/solr/search/ExactDFCache.java
  patching file src/java/org/apache/solr/search/LRUDFCache.java
  patching file src/java/org/apache/solr/search/SolrIndexSearcher.java
  Hunk #1 succeeded at 77 (offset 3 lines).
  Hunk #2 succeeded at 149 (offset 3 lines).
  Hunk #3 succeeded at 699 (offset 46 lines).
  Hunk #4 succeeded at 927 (offset 59 lines).
  Hunk #5 succeeded at 1041 (offset 59 lines).
  Hunk #6 succeeded at 1190 with fuzz 1 (offset 180 lines).
  Hunk #7 FAILED at 1276.
  Hunk #8 FAILED at 1311.
  Hunk #9 succeeded at 1608 (offset 104 lines).
  Hunk #10 succeeded at 1716 (offset 113 lines).
  Hunk #11 succeeded at 1774 (offset 113 lines).
  2 out of 11 hunks FAILED -- saving rejects to file src/java/org/apache/solr/search/SolrIndexSearcher.java.rej
  patching file src/java/org/apache/solr/util/SolrPluginUtils.java
  can't find file to patch at input line 1206
  Perhaps you used the wrong -p or --strip option?
  The text leading up to this was:
  --
  |Index: trunk/src/test/org/apache/solr/BaseDistributedSearchTestCase.java
  |===
  |--- trunk/src/test/org/apache/solr/BaseDistributedSearchTestCase.java (revision 893413)
  |+++ trunk/src/test/org/apache/solr/BaseDistributedSearchTestCase.java (working copy)
  --
  File to patch:
  Skip this patch? [y] n
  File to patch:
  Skip this patch? [y]
  Skipping patch.
  4 out of 4 hunks ignored
  patching file src/test/org/apache/solr/search/TestDefaultDFCache.java
  patching file src/test/org/apache/solr/search/TestExactDFCache.java
  patching file src/test/org/apache/solr/search/TestLRUDFCache.java
  patching file src/test/test-files/solr/conf/solrconfig-defaultdfcache.xml
  patching file src/test/test-files/solr/conf/solrconfig-exactdfcache.xml
  patching file src/test/test-files/solr/conf/solrconfig-lrudfcache.xml
[jira] Commented: (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856517#action_12856517 ]

Yonik Seeley commented on SOLR-1632:

Rewrite not working through function queries is not the end of the problems either... there is also stuff like extractTerms. There is also the issue of Lucene changing rapidly... and the difficulty of adding new methods to ValueSource and making sure that all implementations correctly propagate them through to sub ValueSources.

Perhaps one idea is to use a visitor pattern to decouple tree traversal from the operations being performed.
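To make the visitor suggestion above concrete, here is a minimal Java sketch. It assumes a simplified tree with only two node kinds; SourceNode, FunctionNode, EmbeddedQueryNode, NodeVisitor and EmbeddedQueryCollector are hypothetical names for illustration and are not Lucene or Solr classes. The traversal is written once in accept(), and each operation (here, collecting embedded queries) lives in its own visitor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical visitor: one callback per node kind. */
interface NodeVisitor {
    void visitFunction(FunctionNode node);
    void visitQuery(EmbeddedQueryNode node);
}

/** Hypothetical stand-in for a ValueSource-like tree node. */
interface SourceNode {
    void accept(NodeVisitor visitor);
}

/** A function over sub-sources; it owns the traversal into its arguments. */
final class FunctionNode implements SourceNode {
    final String name;
    final List<SourceNode> args;

    FunctionNode(String name, SourceNode... args) {
        this.name = name;
        this.args = Arrays.asList(args);
    }

    @Override
    public void accept(NodeVisitor visitor) {
        visitor.visitFunction(this);
        for (SourceNode arg : args) {
            arg.accept(visitor); // propagation happens here, once, for every operation
        }
    }
}

/** A leaf wrapping a "real" query embedded in a function. */
final class EmbeddedQueryNode implements SourceNode {
    final String queryText;

    EmbeddedQueryNode(String queryText) {
        this.queryText = queryText;
    }

    @Override
    public void accept(NodeVisitor visitor) {
        visitor.visitQuery(this);
    }
}

/** One operation expressed as a visitor: collect all embedded queries. */
final class EmbeddedQueryCollector implements NodeVisitor {
    final List<String> queries = new ArrayList<>();

    @Override
    public void visitFunction(FunctionNode node) {
        // nothing to collect on function nodes
    }

    @Override
    public void visitQuery(EmbeddedQueryNode node) {
        queries.add(node.queryText);
    }
}
```

With this shape, a new operation such as term extraction would be another NodeVisitor implementation, and the node classes would not need a new method for it.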
[jira] Commented: (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856220#action_12856220 ]

Yonik Seeley commented on SOLR-1632:

I was looking into this a little offline with Mark, who noticed that some queries were not being rewritten and would thus throw an exception during weighting.

It looks like the issue is this: rewrite() doesn't work for function queries (there is no propagation mechanism to go through value sources). This is a problem when real queries are embedded in function queries. Solr function queries do have a mechanism to weight (via ValueSource.createWeight()). QueryValueSource does

  Weight w = q.weight(searcher);

and that implementation of weight calls

  Query query = searcher.rewrite(this);

This patch calls rewrite explicitly (which does nothing for embedded queries), and then, when the DFSource implementation of searcher is used, rewrite does nothing, so the embedded query is never rewritten and the subsequent createWeight() throws an exception.
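The failure mode described above can be modelled in a few lines of Java. This is only a toy model under stated assumptions: SketchQuery, PrimitiveQuery, PrefixLikeQuery, FunctionLikeQuery and rewriteDeep are hypothetical names, not Lucene or Solr APIs, and the exception type is chosen for illustration. It shows a wrapper whose rewrite() does not reach its embedded query, so weighting the embedded query fails, and a variant that propagates the rewrite.

```java
/** Hypothetical stand-in for a query; not a Lucene or Solr type. */
interface SketchQuery {
    SketchQuery rewrite();   // returns a form that can be weighted
    String createWeight();   // fails for queries that still need rewriting
}

/** A query that can be weighted directly. */
final class PrimitiveQuery implements SketchQuery {
    final String term;

    PrimitiveQuery(String term) {
        this.term = term;
    }

    @Override
    public SketchQuery rewrite() {
        return this;
    }

    @Override
    public String createWeight() {
        return "weight(" + term + ")";
    }
}

/** A query that must be rewritten before it can be weighted. */
final class PrefixLikeQuery implements SketchQuery {
    final String prefix;

    PrefixLikeQuery(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public SketchQuery rewrite() {
        return new PrimitiveQuery(prefix + "*");
    }

    @Override
    public String createWeight() {
        throw new UnsupportedOperationException("query must be rewritten first");
    }
}

/** A function-style wrapper around an embedded query. */
final class FunctionLikeQuery implements SketchQuery {
    final SketchQuery embedded;

    FunctionLikeQuery(SketchQuery embedded) {
        this.embedded = embedded;
    }

    /** Models the gap: the rewrite is not propagated to the embedded query. */
    @Override
    public SketchQuery rewrite() {
        return this;
    }

    /** Hypothetical fix: rewrite the embedded query as well. */
    FunctionLikeQuery rewriteDeep() {
        return new FunctionLikeQuery(embedded.rewrite());
    }

    @Override
    public String createWeight() {
        return "func(" + embedded.createWeight() + ")";
    }
}
```

Calling rewrite() followed by createWeight() on a FunctionLikeQuery that wraps a PrefixLikeQuery throws, while rewriteDeep() followed by createWeight() succeeds.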
[jira] Commented: (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793283#action_12793283 ]

Marc Sturlese commented on SOLR-1632:

Which value should the shard.purpose parameter have to enable or disable the exact version of global IDF?
[jira] Commented: (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789174#action_12789174 ]

Andrzej Bialecki commented on SOLR-1632:

I'm not sure what approach you are referring to. Following the terminology in that thread, this implementation follows the approach where there is a single merged big idf map at the master, and it's sent out to slaves on each query. However, when exactly this merging and sending happens is implementation-specific - in the ExactDFSource it happens on every query, but I hope the API can support other scenarios as well.
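As a rough illustration of the "single merged idf map" idea described above, the following minimal Java sketch sums per-shard document frequencies and derives a global IDF from the merged values. ShardStats and GlobalIdfSketch are hypothetical names, not classes from the patch, and the formula 1 + ln(numDocs / (docFreq + 1)) is assumed here as a classic TF-IDF style definition; the patch may compute it differently.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical per-shard statistics; not a class from the patch. */
final class ShardStats {
    final int maxDoc;
    final Map<String, Integer> docFreq;

    ShardStats(int maxDoc, Map<String, Integer> docFreq) {
        this.maxDoc = maxDoc;
        this.docFreq = docFreq;
    }
}

final class GlobalIdfSketch {
    /** Sums the document frequency of each term across all shards. */
    static Map<String, Integer> mergeDocFreq(List<ShardStats> shards) {
        Map<String, Integer> merged = new HashMap<>();
        for (ShardStats shard : shards) {
            for (Map.Entry<String, Integer> e : shard.docFreq.entrySet()) {
                merged.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return merged;
    }

    /** Sums maxDoc across all shards to get the collection-wide document count. */
    static int mergeMaxDoc(List<ShardStats> shards) {
        int total = 0;
        for (ShardStats shard : shards) {
            total += shard.maxDoc;
        }
        return total;
    }

    /** Assumed idf formula: 1 + ln(numDocs / (docFreq + 1)). */
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }
}
```

In this model the master would call mergeDocFreq and mergeMaxDoc on the statistics returned by the shards and send the merged values back, so that every shard scores with the same idf for a given term.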
[jira] Commented: (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789379#action_12789379 ]

Otis Gospodnetic commented on SOLR-1632:

I didn't look at the patch, but from your comments it looks like you already have that one merged big idf map, which is really what I was aiming at, so that's good! I was just thinking that this map (file) would be periodically updated and pushed to slaves, so that slaves can compute the global IDF *locally* instead of making any kind of extra requests.
[jira] Commented: (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789607#action_12789607 ]

Andrzej Bialecki commented on SOLR-1632:

I believe the API that I propose would support such an implementation as well. Please note that it's usually not feasible to compute and distribute the complete IDF table for all terms - you would have to replicate a union of all term dictionaries across the cluster. In practice, you limit the amount of information by various means, e.g. only distributing data related to the current request (this implementation), or reducing the frequency of updates (e.g. LRU caching), or approximating global DF with a constant for frequent terms (where the contribution of their IDF to the score would be negligible anyway).
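The LRU caching option mentioned above can be sketched with the standard java.util.LinkedHashMap in access-order mode. LruDfCacheSketch is a hypothetical name and is not the LRUDFCache class from the patch; the sketch only shows how a bounded term-to-DF map could evict the least recently used entry.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical bounded cache of global document frequencies, keyed by term. */
final class LruDfCacheSketch extends LinkedHashMap<String, Integer> {
    private static final long serialVersionUID = 1L;

    private final int capacity;

    LruDfCacheSketch(int capacity) {
        // accessOrder = true: iteration order runs from least to most recently accessed
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    /** Evicts the least recently used entry once the capacity is exceeded. */
    @Override
    protected boolean removeEldestEntry(Map.Entry<String, Integer> eldest) {
        return size() > capacity;
    }
}
```

A term that misses in such a cache would need its global DF fetched again (or approximated), which is the trade-off between update frequency and accuracy that the comment describes.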
[jira] Commented: (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789120#action_12789120 ]

Otis Gospodnetic commented on SOLR-1632:

What about this approach: http://markmail.org/message/mjfmpzfspguepixx ?