[Lucene.Net] [jira] Commented: (LUCENENET-399) Port changes from Java Lucene 2.9.3 and 2.9.4 releases
[ https://issues.apache.org/jira/browse/LUCENENET-399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007711#comment-13007711 ] Digy commented on LUCENENET-399: Forget that question, I already committed. (First commit, then think about it :) ) DIGY Port changes from Java Lucene 2.9.3 and 2.9.4 releases -- Key: LUCENENET-399 URL: https://issues.apache.org/jira/browse/LUCENENET-399 Project: Lucene.Net Issue Type: Task Components: Lucene.Net Core, Lucene.Net Test Reporter: Troy Howard Assignee: Scott Lombard Fix For: Lucene.Net 2.9.4 Time Spent: 2h Remaining Estimate: 40h Port changes from Java Lucene 2.9.3 and 2.9.4 releases. The Lucene.Net 2.9.4 release will roll up the changes from both of those releases into one. Unit tests should be added or updated to reflect the changes. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (LUCENE-2945) Surround Query doesn't properly handle equals/hashcode
[ https://issues.apache.org/jira/browse/LUCENE-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Elschot updated LUCENE-2945: - Attachment: LUCENE-2945d.patch Basically, the 2945d patch of 16 March 2011 is a refactoring of the 2945c patch. The static inner classes have been moved to package-private classes, and their common functionality was moved to a new superclass. A few more test cases were also added. Test cases for not-equals might still be added, but I don't see a real need to do that. As this adds equals/hashcode handling and has hardly any redundancy, I think this is committable. Surround Query doesn't properly handle equals/hashcode -- Key: LUCENE-2945 URL: https://issues.apache.org/jira/browse/LUCENE-2945 Project: Lucene - Java Issue Type: Bug Affects Versions: 3.0.3, 3.1, 4.0 Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 3.1.1, 4.0 Attachments: LUCENE-2945-partial1.patch, LUCENE-2945.patch, LUCENE-2945.patch, LUCENE-2945.patch, LUCENE-2945c.patch, LUCENE-2945d.patch While looking into using the surround queries with Solr, I am hitting issues caused by collisions, because equals/hashcode is not implemented on the anonymous inner classes that are created by things like DistanceQuery (branch 3.x, near line 76). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
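The collision described in LUCENE-2945 can be illustrated with a minimal Java sketch. As a simplified stand-in (not Lucene's actual Query classes), assume a base class whose equals()/hashCode() look only at the runtime class and a boost. Any subclass, named or anonymous, that adds fields without overriding both methods then makes distinct queries compare equal, and a cache keyed on the query would mix them up. All class names here are hypothetical.

```java
import java.util.Objects;

// Hypothetical, simplified classes for illustration; not Lucene's Query hierarchy.
class EqualsHashCodeDemo {

    // Base class whose equals()/hashCode() consider only the runtime class and a boost.
    static class BaseQuery {
        float boost = 1.0f;

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || getClass() != o.getClass()) return false;
            return Float.compare(boost, ((BaseQuery) o).boost) == 0;
        }

        @Override
        public int hashCode() {
            return Float.floatToIntBits(boost);
        }
    }

    // Adds state but inherits equals()/hashCode(), so different queries compare equal.
    static class BrokenDistanceQuery extends BaseQuery {
        final String field;
        final int distance;

        BrokenDistanceQuery(String field, int distance) {
            this.field = field;
            this.distance = distance;
        }
    }

    // Folds its own fields into equals()/hashCode() on top of the base-class checks.
    static class FixedDistanceQuery extends BaseQuery {
        final String field;
        final int distance;

        FixedDistanceQuery(String field, int distance) {
            this.field = field;
            this.distance = distance;
        }

        @Override
        public boolean equals(Object o) {
            if (!super.equals(o)) return false; // same class and same boost
            FixedDistanceQuery other = (FixedDistanceQuery) o;
            return distance == other.distance && field.equals(other.field);
        }

        @Override
        public int hashCode() {
            return Objects.hash(super.hashCode(), field, distance);
        }
    }
}
```

Two BrokenDistanceQuery instances with different fields and distances compare equal. The fixed variant only matches when field and distance agree, and equal instances share a hash code.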
[jira] Issue Comment Edited: (LUCENE-2945) Surround Query doesn't properly handle equals/hashcode
[ https://issues.apache.org/jira/browse/LUCENE-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007381#comment-13007381 ] Paul Elschot edited comment on LUCENE-2945 at 3/16/11 8:27 AM: --- Basically, the 2945d patch of 16 March 2011 is a refactoring of the 2945c patch. The static inner classes have been moved to package-private classes, and their common functionality was moved to a new superclass. A few more test cases were also added. Test cases for not-equals might still be added, but I don't see a real need to do that. As this adds equals/hashcode handling and has hardly any redundancy, I think this is close to committable. The patch also deprecates a compare..() method; I don't know whether the comments there are to the point. was (Author: paul.elsc...@xs4all.nl): Basically, the 2945d patch of 16 March 2011 is a refactoring of the 2945c patch. The static inner classes have been moved to package-private classes, and their common functionality was moved to a new superclass. A few more test cases were also added. Test cases for not-equals might still be added, but I don't see a real need to do that. As this adds equals/hashcode handling and has hardly any redundancy, I think this is committable.
[jira] Commented: (LUCENE-2968) SurroundQuery doesn't support SpanNot
[ https://issues.apache.org/jira/browse/LUCENE-2968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007385#comment-13007385 ] Paul Elschot commented on LUCENE-2968: -- SpanNot filters on no(t) overlap. Any idea for an operator name? spn, nov, nto, ...? SurroundQuery doesn't support SpanNot - Key: LUCENE-2968 URL: https://issues.apache.org/jira/browse/LUCENE-2968 Project: Lucene - Java Issue Type: Improvement Reporter: Grant Ingersoll Priority: Minor It would be nice if we could do span not in the surround query, as SpanNot queries are quite useful for keeping searches within a boundary (say, a sentence).
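In Lucene, SpanNotQuery takes an include and an exclude span query and removes include matches that overlap an exclude match. The overlap filter itself can be modeled in plain Java as below. This sketch assumes spans are half-open token-position intervals, and the class name is hypothetical. Indexing a (hypothetical) boundary token at each sentence end and excluding it is one way such a filter could keep a match inside a sentence.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of SpanNot's "no overlap" filter; not Lucene's SpanNotQuery API.
class SpanNotSketch {

    // A span covering token positions [start, end), end exclusive.
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }

        boolean overlaps(Span other) {
            return start < other.end && other.start < end;
        }
    }

    // Keeps each include span that overlaps none of the exclude spans.
    static List<Span> spanNot(List<Span> include, List<Span> exclude) {
        List<Span> kept = new ArrayList<>();
        for (Span in : include) {
            boolean overlapsAny = false;
            for (Span ex : exclude) {
                if (in.overlaps(ex)) {
                    overlapsAny = true;
                    break;
                }
            }
            if (!overlapsAny) kept.add(in);
        }
        return kept;
    }
}
```

Suppose a boundary token sits at position 5 (span 5..6). A near-match spanning positions 3 to 8 is then dropped, while one spanning 1 to 3 survives.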
[jira] Commented: (LUCENE-2968) SurroundQuery doesn't support SpanNot
[ https://issues.apache.org/jira/browse/LUCENE-2968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007388#comment-13007388 ] Paul Elschot commented on LUCENE-2968: -- This could also be an opportunity to port Surround to the new query parser in Lucene.
[jira] Created: (SOLR-2430) Swapping cores with persistent switched on should save swapped core to defaultCoreName
Swapping cores with persistent switched on should save swapped core to defaultCoreName -- Key: SOLR-2430 URL: https://issues.apache.org/jira/browse/SOLR-2430 Project: Solr Issue Type: Bug Components: multicore Affects Versions: 4.0 Environment: CentOS Reporter: bidorbuy I am running the latest trunk version with multiple cores configured, persistent turned on, and a default core set. When swapping cores, I would have expected the default behavior to be that the swapped core name is persisted as the new defaultCoreName, i.e. if swapping from primary to staging, defaultCoreName should be rewritten to staging. When swapping out cores (i.e. from primary to staging) and then restarting Jetty, Solr falls back to the currently configured default core (=primary) instead of the previously swapped one (=staging). If this is intended, could the swap command perhaps be extended to force rewriting solr.xml? Current config file:
<?xml version="1.0" encoding="UTF-8" ?>
<solr sharedLib="lib" persistent="true">
  <cores adminPath="/admin/cores" shareSchema="true" defaultCoreName="primary">
    <core name="primary" instanceDir="conf/primary/" dataDir="../../data/primary"/>
    <core name="staging" instanceDir="conf/staging/" dataDir="../../data/staging"/>
  </cores>
</solr>
[jira] Commented: (SOLR-2412) Multipath hierarchical faceting
[ https://issues.apache.org/jira/browse/SOLR-2412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007411#comment-13007411 ] Toke Eskildsen commented on SOLR-2412: -- The syntax for calling is kept close to SOLR-64 and SOLR-792. The essential commands are {{qt=exprh&efacet=true}} to activate faceting and {{efacet.hierarchical=true&efacet.field=mypath}} for hierarchical faceting. Sorting is controlled with {{efacet.sort=count|index|locale}}; if locale is chosen, the locale is selected with {{efacet.sort.locale=da}}. The result set is limited with {{efacet.hierarchical.levels=99}} and {{efacet.limit=100}}, which control the maximum depth and the maximum number of entries at each level, respectively. Example:
{code}
http://localhost:8983/solr/select/?q=*:*&rows=0&fl=id&indent=0n&qt=exprh&efacet=true&efacet.field=path_ss&efacet.hierarchical=true&efacet.hierarchical.levels=99&efacet.limit=10
{code}
{code}
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">204</int>
</lst>
<result name="response" numFound="100" start="0">
  <doc>
    <str name="id">1</str>
  </doc>
</result>
<lst name="efacet_counts">
  <lst name="efacet_fields">
    <lst name="path_ss">
      <str name="field">path_ss</str>
      <lst name="paths">
        <long name="recursivecount">100</long>
        <long name="potentialtags">100</long>
        <long name="totaltags">101</long>
        <long name="count">101</long>
        <int name="level">0</int>
        <lst name="sub">
          <lst name="L0_T1">
            <int name="count">1</int>
            <lst name="sub">
              <long name="recursivecount">9901</long>
              <long name="potentialtags">9901</long>
              <long name="totaltags">103</long>
              <long name="count">103</long>
              <int name="level">1</int>
              <lst name="sub">
                <lst name="L1_T1">
                  <int name="count">1</int>
                  <lst name="sub">
                    <long name="recursivecount">97</long>
                    <long name="potentialtags">97</long>
                    <long name="totaltags">97</long>
                    <long name="count">97</long>
                    <int name="level">2</int>
                    <lst name="sub">
                      <lst name="L2_T1">
                        <int name="count">1</int>
                      </lst>
...
{code}
I'm currently doing some performance (memory and speed) comparisons of SOLR-64, SOLR-792 and SOLR-2412, which will be added later.
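The request parameters listed in the SOLR-2412 comment can be assembled by a small Java sketch, which URL-encodes each value. The parameter names are copied from the comment. The class and method names are hypothetical, and fl/indent are left out for brevity.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Builds an efacet request from the parameters named in the comment (sketch only).
class EfacetUrl {

    // Joins URL-encoded key=value pairs with '&' after the base URL.
    static String build(String base, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return base + "?" + query;
    }

    // Parameters for hierarchical faceting on one field, with depth and per-level limits.
    static Map<String, String> hierarchicalParams(String field, int levels, int limit) {
        Map<String, String> p = new LinkedHashMap<>(); // keeps parameter order stable
        p.put("q", "*:*");
        p.put("rows", "0");
        p.put("qt", "exprh");
        p.put("efacet", "true");
        p.put("efacet.field", field);
        p.put("efacet.hierarchical", "true");
        p.put("efacet.hierarchical.levels", Integer.toString(levels));
        p.put("efacet.limit", Integer.toString(limit));
        return p;
    }
}
```

The sketch encodes ":" in q=*:* as %3A, which servlet containers decode back before Solr sees it.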
Multipath hierarchical faceting --- Key: SOLR-2412 URL: https://issues.apache.org/jira/browse/SOLR-2412 Project: Solr Issue Type: New Feature Components: SearchComponents - other Affects Versions: 4.0 Environment: Fast IO when huge hierarchies are used Reporter: Toke Eskildsen Labels: contrib, patch Attachments: SOLR-2412.patch Hierarchical faceting with slow startup, low memory overhead and fast response. Distinguishing features as compared to SOLR-64 and SOLR-792 are:
* Multiple paths per document
* Query-time analysis of the facet field; no special requirements for indexing besides retaining separator characters in the terms used for faceting
* Optional custom sorting of tag values
* Recursive counting of references to tags at all levels of the output
This is a shell around LUCENE-2369, making it work with the Solr API. The underlying principle is to reference terms by their ordinals and create an index-wide documents-to-tags map, augmented with a compressed representation of hierarchical levels.
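The principle stated in SOLR-2412 (terms referenced by ordinal, plus an index-wide documents-to-tags map) can be sketched in plain Java. This is a simplified illustration under our own assumptions, not the SOLR-2412/LUCENE-2369 code. It omits the compressed level representation, keeps everything in memory, and the class name is hypothetical. Because separator characters are kept in the terms, paths sharing a prefix (A/B, A/C) end up with adjacent ordinals in sorted order.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Illustrative sketch: sorted terms -> ordinals, plus a documents-to-tag-ordinals map.
class OrdinalTagMap {
    final String[] terms;     // ordinal -> term, in sorted (index) order
    final int[][] docToOrds;  // document id -> ordinals of the tags it references

    // docs.get(d) holds the distinct facet paths of document d, e.g. "A/B".
    OrdinalTagMap(List<List<String>> docs) {
        TreeSet<String> sorted = new TreeSet<>();
        for (List<String> paths : docs) sorted.addAll(paths);
        terms = sorted.toArray(new String[0]);
        docToOrds = new int[docs.size()][];
        for (int d = 0; d < docs.size(); d++) {
            List<String> paths = docs.get(d);
            docToOrds[d] = new int[paths.size()];
            for (int i = 0; i < paths.size(); i++) {
                // terms is sorted, so binary search yields the ordinal
                docToOrds[d][i] = Arrays.binarySearch(terms, paths.get(i));
            }
        }
    }

    // Counts how many of the given documents reference each tag ordinal.
    int[] countTags(int[] docIds) {
        int[] counts = new int[terms.length];
        for (int d : docIds) {
            for (int ord : docToOrds[d]) counts[ord]++;
        }
        return counts;
    }
}
```

Counting then touches only integer ordinals. The terms themselves are needed only when the top tags are resolved for output.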
Facing Problem in making query for File Based Spell Checker
Hello Guys, I am facing a problem in making a query for the file-based spell checker. Following is the spellchecker configuration:
<lst name="spellchecker">
  <str name="classname">solr.FileBasedSpellChecker</str>
  <str name="name">file</str>
  <str name="sourceLocation">spellings.txt</str>
  <str name="characterEncoding">UTF-8</str>
  <str name="spellcheckIndexDir">./spellcheckerFile</str>
</lst>
I am not able to access the spellings.txt file in spite of doing all the configuration described on the Solr website. Please help me. Regards Saurabh Srivastava
[jira] Commented: (SOLR-2430) Swapping cores with persistent switched on should save swapped core to defaultCoreName
[ https://issues.apache.org/jira/browse/SOLR-2430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007446#comment-13007446 ] Mark Miller commented on SOLR-2430: --- How about calling persist after calling swap? Swapping cores with persistent switched on should save swapped core to defaultCoreName -- Key: SOLR-2430 URL: https://issues.apache.org/jira/browse/SOLR-2430 Project: Solr Issue Type: Bug Components: multicore Affects Versions: 4.0 Environment: CentOS Reporter: bidorbuy Labels: core, multicore I am running the latest trunk version with multiple cores configured, persistent turned on, and a default core set. When swapping cores, I would have expected the default behavior to be that the swapped core name is persisted as the new defaultCoreName, i.e. if swapping from primary to staging, defaultCoreName should be rewritten to staging. When swapping out cores (i.e. from primary to staging) and then restarting Jetty, Solr falls back to the currently configured default core (=primary) instead of the previously swapped one (=staging). If this is intended, could the swap command perhaps be extended to force rewriting solr.xml? Current config file:
<?xml version="1.0" encoding="UTF-8" ?>
<solr sharedLib="lib" persistent="true">
  <cores adminPath="/admin/cores" shareSchema="true" defaultCoreName="primary">
    <core name="primary" instanceDir="conf/primary/" dataDir="../../data/primary"/>
    <core name="staging" instanceDir="conf/staging/" dataDir="../../data/staging"/>
  </cores>
</solr>
[jira] Created: (LUCENE-2970) SpecialOperations.isFinite can have TERRIBLE TERRIBLE runtime in certain situations
SpecialOperations.isFinite can have TERRIBLE TERRIBLE runtime in certain situations --- Key: LUCENE-2970 URL: https://issues.apache.org/jira/browse/LUCENE-2970 Project: Lucene - Java Issue Type: Bug Affects Versions: 4.0 Reporter: Robert Muir Assignee: Robert Muir Fix For: 4.0 In an application of mine, I experienced some very slow query times with finite automata (all the DFAs are acyclic). It turned out that the slowdown was caused by terrible runtime in SpecialOperations.isFinite, which is used to determine whether the DFA is acyclic (in this case I am talking about up to minutes of CPU).
[jira] Updated: (LUCENE-2970) SpecialOperations.isFinite can have TERRIBLE TERRIBLE runtime in certain situations
[ https://issues.apache.org/jira/browse/LUCENE-2970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2970: Attachment: LUCENE-2970.patch Attached is a patch. Imagine a regexp with lots of optionals, e.g. [abcd]e?f?[gh]a?b? ... In this case the existing code is not linear in the number of states. If we are at state A, and it has a transition to B, we determine that B is finite; then later, if we are at C and it leads to B too, we need not determine whether B is finite again, as we already did so. So I keep a visited set for this. Additionally, I changed it to use a BitSet instead of a HashSet, which helps the speed (but only as a constant-factor speedup). I took the old code, dumped it into AutomatonTestUtil as isFiniteSimple, and the test generates random automata and compares its results against the new implementation. I'd appreciate any reviews/suggestions any automaton hackers want to give here. SpecialOperations.isFinite can have TERRIBLE TERRIBLE runtime in certain situations --- Key: LUCENE-2970 URL: https://issues.apache.org/jira/browse/LUCENE-2970 Project: Lucene - Java Issue Type: Bug Affects Versions: 4.0 Reporter: Robert Muir Assignee: Robert Muir Fix For: 4.0 Attachments: LUCENE-2970.patch In an application of mine, I experienced some very slow query times with finite automata (all the DFAs are acyclic). It turned out that the slowdown was caused by terrible runtime in SpecialOperations.isFinite, which is used to determine whether the DFA is acyclic (in this case I am talking about up to minutes of CPU).
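The two strategies described for LUCENE-2970 can be compared in a Java sketch. It uses a hypothetical adjacency-array DFA, not Lucene's Automaton/State classes. Both checks walk depth-first with a path BitSet, and only the second remembers states already proven finite in a visited BitSet. The hypothetical diamondChain helper builds the kind of automaton an optional-heavy regexp produces, where each optional step offers two routes to the same next state.

```java
import java.util.BitSet;

// Hypothetical DFA representation: next[s] lists the target states of state s.
class IsFiniteSketch {
    static long calls; // counts recursive calls, to compare the two versions

    // Path-only check: a state reachable by many routes is re-explored every time.
    static boolean isFiniteSimple(int[][] next, int s, BitSet path) {
        calls++;
        path.set(s);
        for (int t : next[s]) {
            if (path.get(t) || !isFiniteSimple(next, t, path)) return false; // cycle found
        }
        path.clear(s);
        return true;
    }

    // Memoized check: a state proven finite is recorded in 'visited' and never re-entered.
    static boolean isFinite(int[][] next, int s, BitSet path, BitSet visited) {
        calls++;
        path.set(s);
        for (int t : next[s]) {
            if (path.get(t) || (!visited.get(t) && !isFinite(next, t, path, visited))) return false;
        }
        path.clear(s);
        visited.set(s);
        return true;
    }

    // A chain of k "diamonds": state 2i reaches 2i+2 directly or via 2i+1.
    static int[][] diamondChain(int k) {
        int[][] next = new int[2 * k + 1][];
        for (int i = 0; i < k; i++) {
            int a = 2 * i;
            next[a] = new int[] {a + 1, a + 2}; // take the optional char, or skip it
            next[a + 1] = new int[] {a + 2};
        }
        next[2 * k] = new int[0]; // final state, no outgoing transitions
        return next;
    }
}
```

On diamondChain(20) the path-only version makes 3,145,726 calls (3*2^k - 2 for k diamonds), while the memoized version makes 41 (one per state). Both still return false as soon as a transition leads back onto the current path.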
[jira] Commented: (SOLR-2430) Swapping cores with persistent switched on should save swapped core to defaultCoreName
[ https://issues.apache.org/jira/browse/SOLR-2430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007457#comment-13007457 ] bidorbuy commented on SOLR-2430: I don't think this is necessary, as solr.xml has persistent="true" set. Before the swap the admin interface shows:
cwd=/home/prodza/jetty SolrHome=/home/prodza/solr/conf/primary/
and solr.xml looks like this:
-rw-rw-r-- 1 prodza prodza 348 Mar 11 22:19 solr.xml
[prodza@localhost solr]$ cat solr.xml
<?xml version="1.0" encoding="UTF-8" ?>
<solr sharedLib="lib" persistent="true">
  <cores adminPath="/admin/cores" shareSchema="true" defaultCoreName="primary">
    <core name="primary" instanceDir="conf/primary/" dataDir="../../data/primary"/>
    <core name="staging" instanceDir="conf/staging/" dataDir="../../data/staging"/>
  </cores>
</solr>
After the swap (from primary to staging) via:
http://MYHOST:8983/solr/admin/cores?action=SWAP&core=primary&other=staging
the admin interface shows:
cwd=/home/prodza/jetty SolrHome=/home/prodza/solr/conf/staging/
solr.xml has been updated (see the file timestamp):
-rw-rw-r-- 1 prodza prodza 348 Mar 11 22:26 solr.xml
[prodza@localhost solr]$ cat solr.xml
<?xml version="1.0" encoding="UTF-8" ?>
<solr sharedLib="lib" persistent="true">
  <cores adminPath="/admin/cores" shareSchema="true" defaultCoreName="primary">
    <core name="primary" instanceDir="conf/staging/" dataDir="../../data/staging"/>
    <core name="staging" instanceDir="conf/primary/" dataDir="../../data/primary"/>
  </cores>
</solr>
And the Solr log shows:
2011-03-11 22:26:11,421 INFO [solr.core.CoreContainer] [qtp2026549-22] : swaped: with staging
2011-03-11 22:26:11,421 INFO [solr.core.CoreContainer] [qtp2026549-22] : Persisting cores config to /home/prodza/solr/solr.xml
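The two solr.xml listings in bidorbuy's comment can be read with a toy Java model. It uses a hypothetical class, not Solr's CoreContainer, and tracks only name-to-instanceDir. The SWAP exchanges which instanceDir each core name points at, while the persisted defaultCoreName keeps referring to the name primary.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of the two solr.xml files: core name -> instanceDir (hypothetical class).
class CoreSwapModel {
    final Map<String, String> instanceDirs = new LinkedHashMap<>();
    String defaultCoreName;

    // SWAP exchanges what the two names point at; defaultCoreName is left untouched.
    void swap(String core, String other) {
        String tmp = instanceDirs.get(core);
        instanceDirs.put(core, instanceDirs.get(other));
        instanceDirs.put(other, tmp);
    }

    // Directory that the default core name resolves to, as the persisted file reads.
    String defaultInstanceDir() {
        return instanceDirs.get(defaultCoreName);
    }
}
```

Under this model, defaultCoreName is still primary after swap("primary", "staging"), and defaultInstanceDir() returns conf/staging/, matching the second listing.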
[jira] Commented: (LUCENE-2960) Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter
[ https://issues.apache.org/jira/browse/LUCENE-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007482#comment-13007482 ] Steven Rowe commented on LUCENE-2960: - {quote} bq. How about an IWC base class, extended by IWCinit and IWClive. IWCinit has setters for everything, and IW.getConfig() returns IWClive, which has no setters for things you can't set on the fly. I tried to implement this, but couldn't figure out a way to avoid code and javadoc duplication and/or separation for the live setters, which need to be on both the init and live versions. {quote} An annotation processor that looks for @Live annotations on setters, then generates source for a LiveIWC class, an instance of which would be returned by IW.getConfig(), would solve the duplication/separation problem. No extension required: LiveIWC could forward all getters and the live setters to a cloned IWC. Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter -- Key: LUCENE-2960 URL: https://issues.apache.org/jira/browse/LUCENE-2960 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Shay Banon Priority: Blocker Fix For: 3.1, 4.0 Attachments: LUCENE-2960.patch In 3.1 the ability to setRAMBufferSizeMB is deprecated, and it is removed in trunk. It would be great to be able to control that on a live IndexWriter. Two other methods that would be great to bring back are setTermIndexInterval and setReaderTermsIndexDivisor. Most of the other setters can actually be set on the MergePolicy itself, so there is no need for setters for those (I think).
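The marker-annotation part of Steven Rowe's proposal can be sketched in Java. @Live and SketchConfig are hypothetical. The live setter names come from the issue text, and setAnalyzerName stands in for an init-only setter. Reflection is used here only to show which methods would be picked up. A real annotation processor would read the same marks at compile time and emit the LiveIWC source.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical marker for setters that may be changed on an open writer.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Live {}

// Hypothetical config class; only the @Live setters would be exposed on the live view.
class SketchConfig {
    private double ramBufferSizeMB = 16.0;
    private int termIndexInterval = 128;
    private String analyzerName = "standard";

    @Live
    public SketchConfig setRAMBufferSizeMB(double mb) { ramBufferSizeMB = mb; return this; }

    @Live
    public SketchConfig setTermIndexInterval(int interval) { termIndexInterval = interval; return this; }

    // Init-only: not marked, so a generated live view would not forward it.
    public SketchConfig setAnalyzerName(String name) { analyzerName = name; return this; }

    public double getRAMBufferSizeMB() { return ramBufferSizeMB; }
    public int getTermIndexInterval() { return termIndexInterval; }
    public String getAnalyzerName() { return analyzerName; }

    // Names of methods marked @Live, sorted because getDeclaredMethods() order is unspecified.
    static List<String> liveSetters(Class<?> c) {
        List<String> names = new ArrayList<>();
        for (Method m : c.getDeclaredMethods()) {
            if (m.isAnnotationPresent(Live.class)) names.add(m.getName());
        }
        Collections.sort(names);
        return names;
    }
}
```

Keeping the mark on the single config class means the javadoc lives in one place. This is the duplication/separation concern quoted above.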
[jira] Commented: (LUCENE-2970) SpecialOperations.isFinite can have TERRIBLE TERRIBLE runtime in certain situations
[ https://issues.apache.org/jira/browse/LUCENE-2970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007488#comment-13007488 ] Michael McCandless commented on LUCENE-2970: Patch looks correct to me! The algo you impl'd is the same one described in Cormen, Leiserson and Rivest's Introduction to Algorithms book, as a side effect of doing a depth-first walk through the DFA. Their description of DFS colors the nodes: white is unvisited, black is visited, gray is being visited (i.e. on the current path). A DFA then has a cycle if ever you recurse and find a gray node. In your patch, the combination of path and visited maps to these colors, and you detect a cycle when path is set and visited is not. Maybe rename the test-only isFiniteSimple to isFiniteSLOW or something? Does the new random test case tend not to hit the super-slow cases...?
[jira] Commented: (SOLR-2403) Problem with facet.sort=lex, shards, and facet.mincount
[ https://issues.apache.org/jira/browse/SOLR-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007497#comment-13007497 ] Toke Eskildsen commented on SOLR-2403: -- Dividing by shard count is fairly risky. An example could be the shards
{code}
Shard 1: A(9) B(6) C(10) D(8)
Shard 2: A(4) B(5) C(4) D(3)
{code}
where requesting the top-3 elements with mincount=5 from each shard would give the merged result
{code}
B(11) C(10)
{code}
whereas the correct result would be
{code}
A(13) B(11) C(14) D(11)
{code}
The problem with using mincount=1 for each shard call is, of course, that the single-shard result sets need to be humongous in order to ensure that the correct values are returned when the field contains many values with low counts and few values with high counts. With shards like
{code}
Shard 1: A(1) B(1) C(1) D(1) E(1) F(9) G(1) H(1)
Shard 2: A(1) B(1) C(1) D(1) E(1) F(1) G(1) H(10)
{code}
and a request for mincount=10, all terms must be returned from both shards in order to get the result
{code}
F(10) H(11)
{code}
As you, Yonik, point out, a variant of the problem exists when sorting on count. However, for count it is mitigated by the fact that the results from the individual shards are sorted by the selecting key (count). This means that the chance of missing or miscounting tags is low and can be lowered further by relatively little over-requesting. With lexical sorting, the selecting key (count again) is independent of the sorting key. Over-requesting helps, but only linearly in the fraction of the full result set that is requested from each shard. Furthermore, the need for over-requesting grows with the number of shards, as the overlapping hills can be smaller while still summing up to mincount. I do not have any real solution for the problem. One minor improvement would be a collector that kept collecting terms with mincount=y until limit=n or until the number of collected terms with mincount=x was equal to m, where x is the original mincount and y is dependent on the number of shards. This would at least stop the collection process when the result set was guaranteed to contain enough values above the given threshold. It would work well with spikes but poorly with hills just below mincount x, and it would still not guarantee correctness of the sums of the counts, only correctness of the terms. Problem with facet.sort=lex, shards, and facet.mincount --- Key: SOLR-2403 URL: https://issues.apache.org/jira/browse/SOLR-2403 Project: Solr Issue Type: Bug Components: search Affects Versions: 4.0 Environment: RHEL5, Ubuntu 10.04 Reporter: Peter Cline I tested this on a recent trunk snapshot (2/25) but haven't verified it with 3.1 or 1.4.1; I can do so if necessary and update. Solr is not returning the proper number of facet values when sorting alphabetically, using distributed search, and using a facet.mincount that excludes some of the values in the first facet.limit values. Easiest explained by example. 
Sorting alphabetically, the first 20 values for my subject_facet field have few documents: 19 facet values have only 1 document associated, and 1 has 2 documents. There are plenty after that have more than 2.
{code}
http://localhost:8082/solr/select?q=*:*&facet=true&facet.field=subject_facet&facet.limit=20&facet.sort=lex&facet.mincount=2
{code}
comes back with the expected 20 facet values with >= 2 documents associated. If I add a shards parameter that points back to itself, the result is different.
{code}
http://localhost:8082/solr/select?q=*:*&facet=true&facet.field=subject_facet&facet.limit=20&facet.sort=lex&facet.mincount=2&shards=localhost:8082/solr
{code}
comes back with only 1 facet value: the single value in the first 20 that had more than 1 document. It appears to me that mincount is ignored when doing the original query to the shards, then applied afterwards. Let me know if you need any more info. Thanks, Peter -- This message is automatically generated by JIRA.
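Toke's proposed two-threshold collector can be sketched in Java, run on one shard's counts in index order. The parameter names x, y, n and m follow his comment. The class name, method names and the parse helper are hypothetical. The collector stops at the hard limit n, or as soon as m collected terms have reached the original mincount x.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the proposed shard-side collector with two mincounts (hypothetical names).
class TwoThresholdCollector {

    // terms: this shard's term -> count, in index (lexical) order.
    // x: original mincount, y: lowered per-shard mincount (y <= x),
    // n: hard cap on collected terms, m: stop once this many terms reach x.
    static Map<String, Integer> collect(Map<String, Integer> terms, int x, int y, int n, int m) {
        Map<String, Integer> out = new LinkedHashMap<>();
        int atLeastX = 0;
        for (Map.Entry<String, Integer> e : terms.entrySet()) {
            if (out.size() >= n || atLeastX >= m) break;
            int count = e.getValue();
            if (count < y) continue; // below the lowered threshold: skip
            out.put(e.getKey(), count);
            if (count >= x) atLeastX++;
        }
        return out;
    }

    // Parses the "A(9) B(6) ..." notation used in the comments, preserving order.
    static Map<String, Integer> parse(String spec) {
        Map<String, Integer> terms = new LinkedHashMap<>();
        for (String tok : spec.trim().split("\\s+")) {
            int open = tok.indexOf('(');
            terms.put(tok.substring(0, open),
                      Integer.parseInt(tok.substring(open + 1, tok.length() - 1)));
        }
        return terms;
    }
}
```

On shard 1 of the spikes example (x=10, y=5) it returns only F(9). On A(12) B(11) C(3) D(15) with m=2 it stops after A and B.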
[jira] Commented: (SOLR-2403) Problem with facet.sort=lex, shards, and facet.mincount
[ https://issues.apache.org/jira/browse/SOLR-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007508#comment-13007508 ] Yonik Seeley commented on SOLR-2403:
bq. Dividing by shard count is fairly risky.
Actually, it seems like it should help? (When mincount is relatively high, at least.) Let's take your example of facet.mincount=10, facet.limit=2, facet.sort=index
{code}
Shard 1: A(1) B(1) C(1) D(1) E(1) F(9) G(1) H(1)
Shard 2: A(1) B(1) C(1) D(1) E(1) F(1) G(1) H(10)
{code}
mincount / nShards = 5, so the shard requests sent will be along the lines of facet.mincount=5, facet.limit=5, facet.sort=index (some over-requesting), and we will get back F(9), H(10). The second phase (facet refinement... to true up counts) will retrieve counts from each shard for constraints in the list that it didn't return the first time. So shard1 will be asked about H, and shard2 will be asked about F. The final response will be F(10), H(11).
bq. Over-requesting helps, but only linear to the fraction of the full result-set from each shard that is requested.
Yes, I think you're correct that over-requesting is less useful for sort=index than for sort=count. Luckily, we can fix the mincount=1 problem and get exact answers for that case, which is the most important case. I think mincount > 1 is relatively rare.
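Yonik's two-phase scheme can be checked with a toy Java simulation. The class is hypothetical, not Solr's distributed faceting code. Phase 1 asks each shard for index-ordered terms with mincount / nShards and an over-requested limit. Phase 2 refines by summing exact counts for every candidate, and the real mincount and limit are applied last. Dividing is safe for phase 1: a term whose total reaches mincount must reach mincount / nShards on at least one shard, so only the per-shard limit can hide it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model of distributed facet.sort=index with refinement; not Solr's implementation.
class DistributedFacetSketch {

    // Phase 1 on one shard: terms in index order with count >= minCount, up to limit.
    static List<String> phase1(TreeMap<String, Integer> shard, int minCount, int limit) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : shard.entrySet()) {
            if (out.size() >= limit) break;
            if (e.getValue() >= minCount) out.add(e.getKey());
        }
        return out;
    }

    static TreeMap<String, Integer> facet(List<TreeMap<String, Integer>> shards,
                                          int minCount, int limit, int shardLimit) {
        int shardMin = Math.max(1, minCount / shards.size()); // e.g. 10 / 2 = 5
        TreeSet<String> candidates = new TreeSet<>();
        for (TreeMap<String, Integer> shard : shards) {
            candidates.addAll(phase1(shard, shardMin, shardLimit));
        }
        // Phase 2 (refinement): exact count of every candidate, summed over all shards.
        TreeMap<String, Integer> totals = new TreeMap<>();
        for (String term : candidates) {
            int sum = 0;
            for (TreeMap<String, Integer> shard : shards) sum += shard.getOrDefault(term, 0);
            totals.put(term, sum);
        }
        // Apply the requested mincount and limit to the exact totals, in index order.
        TreeMap<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, Integer> e : totals.entrySet()) {
            if (result.size() >= limit) break;
            if (e.getValue() >= minCount) result.put(e.getKey(), e.getValue());
        }
        return result;
    }

    // Parses the "A(9) B(6) ..." notation used in the comments.
    static TreeMap<String, Integer> parse(String spec) {
        TreeMap<String, Integer> shard = new TreeMap<>();
        for (String tok : spec.trim().split("\\s+")) {
            int open = tok.indexOf('(');
            shard.put(tok.substring(0, open),
                      Integer.parseInt(tok.substring(open + 1, tok.length() - 1)));
        }
        return shard;
    }
}
```

For the spikes example (mincount=10, limit=2, per-shard limit 5), the sketch returns F(10) H(11). For the hills example (mincount=10, limit=3, per-shard limit 3), it returns A(13) B(11) C(14). Here refinement trues up the counts that a plain merge would have left at A(9) and C(10).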
[jira] Commented: (LUCENE-2970) SpecialOperations.isFinite can have TERRIBLE TERRIBLE runtime in certain situations
[ https://issues.apache.org/jira/browse/LUCENE-2970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007511#comment-13007511 ] Robert Muir commented on LUCENE-2970: -
bq. A DFA then has a cycle if every you recurse and find a gray node
Well, it seems it might work for an NFA too? Though I'm not sure how good the NFAs that AutomatonTestUtil.randomAutomaton generates are. If all else fails, we can determinize as a side effect (this won't hurt Lucene), but I'd like to know for sure, and to send the patch upstream.
{quote} Maybe rename the test-only isFiniteSimple to isFiniteSLOW or something? Does the new random test case tend not to hit the super-slow cases...? {quote}
The test definitely got faster, but maybe the type of DFAs I generate is not represented fairly by the random generator? In other words, they are worst-case for the old method, but they are reasonable as queries: finite, and bounded in the number of terms they accept.
[jira] Commented: (SOLR-2403) Problem with facet.sort=lex, shards, and facet.mincount
[ https://issues.apache.org/jira/browse/SOLR-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007513#comment-13007513 ] Toke Eskildsen commented on SOLR-2403: -- My first example was "hills", while the second was "spikes", where I agree that the divide-mincount-by-shard# or something similar works well. As it comes down to the distribution of counts vs. mincount, we seem to be left with the unsatisfying "it depends, but avoid using mincounts around the average count-answer". I forgot about the refinement phase. That would ensure that my suggestion of a collector with two separate mincounts would return the correct result for counts as well as terms, as long as it did not exceed the given limits. Alas, it still only helps somewhat and might not be worth the hassle. Problem with facet.sort=lex, shards, and facet.mincount --- Key: SOLR-2403 URL: https://issues.apache.org/jira/browse/SOLR-2403 Project: Solr Issue Type: Bug Components: search Affects Versions: 4.0 Environment: RHEL5, Ubuntu 10.04 Reporter: Peter Cline I tested this on a recent trunk snapshot (2/25), haven't verified with 3.1 or 1.4.1. I can if necessary and update. Solr is not returning the proper number of facet values when sorting alphabetically, using distributed search, and using a facet.mincount that excludes some of the values in the first facet.limit values. Easiest explained by example. Sorting alphabetically, the first 20 values for my subject_facet field have few documents. 19 facet values have only 1 document associated, and 1 has 2 documents. There are plenty after that have more than 2. {code} http://localhost:8082/solr/select?q=*:*&facet=true&facet.field=subject_facet&facet.limit=20&facet.sort=lex&facet.mincount=2 {code} comes back with the expected 20 facet values with >= 2 documents associated. If I add a shards parameter that points back to itself, the result is different. 
{code} http://localhost:8082/solr/select?q=*:*&facet=true&facet.field=subject_facet&facet.limit=20&facet.sort=lex&facet.mincount=2&shards=localhost:8082/solr {code} comes back with only 1 facet value: the single value in the first 20 that had more than 1 document. It appears to me that mincount is ignored when doing the original query to the shards, then applied afterwards. Let me know if you need any more info. Thanks, Peter
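The divide-mincount-by-shard# idea can be justified with the pigeonhole principle: if a term's total count over n shards is at least mincount, some shard must hold at least ceil(mincount / n) of it, so a per-shard threshold of that size never drops a qualifying term from every shard. The merged count can still be too low until the refinement phase fetches exact counts, which fits the conclusion that it "only helps somewhat". A sketch with hypothetical helper names, not Solr code:

```java
/** Sketch of a per-shard mincount derived from the global one; helper names are hypothetical. */
final class ShardMinCount {

    /** ceil(mincount / numShards): by pigeonhole, a term whose total count is >= mincount
     *  has at least this count on some shard. */
    static int perShardMinCount(int mincount, int numShards) {
        return (mincount + numShards - 1) / numShards;
    }

    /** True if at least one shard would still report the term under the per-shard threshold. */
    static boolean reportedBySomeShard(int[] countsPerShard, int mincount) {
        int threshold = perShardMinCount(mincount, countsPerShard.length);
        for (int c : countsPerShard) {
            if (c >= threshold) return true;
        }
        return false;
    }

    /** Sum over the shards that pass the threshold: what a merge sees before refinement. */
    static int unrefinedCount(int[] countsPerShard, int mincount) {
        int threshold = perShardMinCount(mincount, countsPerShard.length);
        int sum = 0;
        for (int c : countsPerShard) {
            if (c >= threshold) sum += c;
        }
        return sum;
    }
}
```

For counts {3, 1} on two shards and mincount=4, the per-shard threshold is 2: the term survives on the first shard, but the unrefined merge sees only 3 and would need refinement to recover the true total of 4.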
[jira] Commented: (LUCENE-2970) SpecialOperations.isFinite can have TERRIBLE TERRIBLE runtime in certain situations
[ https://issues.apache.org/jira/browse/LUCENE-2970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007517#comment-13007517 ] Michael McCandless commented on LUCENE-2970: bq. well it seems it might work for an NFA too? Sorry, yes -- the algo doesn't care if it's N or D. It works for both.
[jira] Commented: (SOLR-1499) SolrEntityProcessor - DIH EntityProcessor that queries an external Solr via SolrJ
[ https://issues.apache.org/jira/browse/SOLR-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007516#comment-13007516 ] Ahmet Arslan commented on SOLR-1499: Hi Lance, I applied the patch to the latest trunk; it required some changes, though. I pointed it at a Solr URL (version 1.4.0) to upgrade from 1.4.0 to trunk. I received: Caused by: java.lang.RuntimeException: Invalid version (expected 2, but 1) or the data in not in 'javabin' format at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:99) at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:41) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:478) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:245) What would be a workaround to overcome this? SolrEntityProcessor - DIH EntityProcessor that queries an external Solr via SolrJ - Key: SOLR-1499 URL: https://issues.apache.org/jira/browse/SOLR-1499 Project: Solr Issue Type: New Feature Components: contrib - DataImportHandler Reporter: Lance Norskog Assignee: Erik Hatcher Fix For: Next Attachments: SOLR-1499.patch, SOLR-1499.patch, SOLR-1499.patch, SOLR-1499.patch, SOLR-1499.patch The SolrEntityProcessor queries an external Solr instance. The Solr documents returned are unpacked and emitted as DIH fields. The SolrEntityProcessor uses the following attributes: * solr='http://localhost:8983/solr/sms' ** This gives the URL of the target Solr instance. *** Note: the connection to the target Solr uses the binary SolrJ format. * query='Jefferson&sort=id+asc' ** This gives the base query string used with Solr. It can include any standard Solr request parameter. This attribute is processed under the variable resolution rules and can be driven in an inner stage of the indexing pipeline. * rows='10' ** This gives the number of rows to fetch per request.
** The SolrEntityProcessor always fetches every document that matches the request. * fields='id,tag' ** This selects the fields to be returned from the Solr request. ** These must also be declared as field elements. ** As with all fields, template processors can be used to alter the contents to be passed downwards. * timeout='30' ** This limits the query to 30 seconds. This can be used as a fail-safe to prevent the indexing session from freezing up. By default the timeout is 5 minutes. Limitations: * Solr errors are not handled correctly. * Loop control constructs have not been tested. * Multi-valued returned fields have not been tested. The unit tests give examples of how to use it as the root entity and an inner entity.
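Together, the `rows` attribute and "always fetches every document that matches the request" imply a paging loop. A sketch of that loop, assuming the processor advances `start` by the number of documents returned until it reaches `numFound`; `PageSource` and `Page` are hypothetical stand-ins for a SolrJ query and its response, not the patch's actual code.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of fetching every matching document, `rows` at a time. All names are hypothetical. */
final class PagedFetch {

    /** One page of results plus the total number of matches reported by the server. */
    static final class Page {
        final List<String> docs;
        final long numFound;

        Page(List<String> docs, long numFound) {
            this.docs = docs;
            this.numFound = numFound;
        }
    }

    /** Stand-in for "run the query with start/rows and return the response". */
    interface PageSource {
        Page fetch(long start, int rows);
    }

    static List<String> fetchAll(PageSource source, int rows) {
        List<String> all = new ArrayList<>();
        long start = 0;
        while (true) {
            Page page = source.fetch(start, rows);
            all.addAll(page.docs);
            start += page.docs.size();
            // Stop once everything has been read, or if a page unexpectedly comes back empty
            if (start >= page.numFound || page.docs.isEmpty()) break;
        }
        return all;
    }
}
```

Stopping on an empty page as well as on `numFound` guards against looping forever if the remote index shrinks during the import.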
[jira] Commented: (LUCENE-2970) SpecialOperations.isFinite can have TERRIBLE TERRIBLE runtime in certain situations
[ https://issues.apache.org/jira/browse/LUCENE-2970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007524#comment-13007524 ] Robert Muir commented on LUCENE-2970: - Ok, I feel better now. I think I have an explanation for why the test doesn't hang: the automata we generate are pretty damn small (mine are significantly larger). I think for our testing this is just fine, and actually desirable, as it helps debugging. The only largeish automata Lucene tests through this stuff are for Levenshtein, and we supply 'true' here (since we know it's finite) and avoid this method entirely... and even those are special in that they always have the same general shape.
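One way to see why small random automata don't hang while larger ones can: a depth-first search that remembers only the current path (assumed here to resemble the slow variant) re-explores a state once for every distinct path that reaches it, and a chain of k "diamonds" has 2^k such paths. The sketch below counts calls for both variants on a hypothetical adjacency-list graph; it is illustrative, not Lucene's code.

```java
/**
 * Compares two acyclicity checks on a chain of k "diamonds".
 * pathOnly remembers only the current DFS path; withVisited also remembers finished states.
 * Both return the same answer; calls[0] counts recursive invocations.
 */
final class DiamondChain {

    /** Joints 0..k; joint j branches to two middle states that both lead to joint j+1. */
    static int[][] build(int k) {
        int n = 3 * k + 1;                       // k+1 joints plus 2k middle states
        int[][] next = new int[n][];
        for (int j = 0; j < k; j++) {
            int a = k + 1 + 2 * j, b = a + 1;    // the two middle states of diamond j
            next[j] = new int[] {a, b};
            next[a] = new int[] {j + 1};
            next[b] = new int[] {j + 1};
        }
        next[k] = new int[0];                    // the last joint has no transitions
        return next;
    }

    static boolean pathOnly(int[][] next, int s, boolean[] onPath, long[] calls) {
        calls[0]++;
        onPath[s] = true;
        for (int t : next[s]) {
            if (onPath[t] || !pathOnly(next, t, onPath, calls)) return false;
        }
        onPath[s] = false;                       // forgets that s was already fully explored
        return true;
    }

    static boolean withVisited(int[][] next, int s, boolean[] onPath, boolean[] done, long[] calls) {
        calls[0]++;
        onPath[s] = true;
        for (int t : next[s]) {
            if (onPath[t]) return false;         // cycle
            if (!done[t] && !withVisited(next, t, onPath, done, calls)) return false;
        }
        onPath[s] = false;
        done[s] = true;                          // explored and acyclic: never entered again
        return true;
    }
}
```

For k = 10 the path-only search makes 4,093 calls (2^(k+2) - 3) against 31 (3k + 1) for the variant that remembers finished states, and each additional diamond roughly doubles the former.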
[jira] Commented: (LUCENE-2960) Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter
[ https://issues.apache.org/jira/browse/LUCENE-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007528#comment-13007528 ] Michael McCandless commented on LUCENE-2960: {quote} bq. Oh yeah. But then we'd clone the full IWC on every set... this seems like overkill in the name of purity. So what? What exactly is overkill? Few wasted bytes and CPU ns for an object that's created a couple of times during application lifetime? There are also builders, which are very similar to what Steven is proposing. {quote} I don't like that this'd be an O(N*M) cost API when you use it. Sure, N and M are tiny, and you use this API very rarely, but I still don't like it ;) In addition, this is all in the name of purity, which as far as I can see has no real value besides purity. It's... incestuous. And, I'm a pragmatist, I guess. {quote} An annotation processor that looks for @Live annotations on setters, then generates source for a LiveIWC class, an instance of which would be returned by IW.getConfig(), would solve the duplication/separation problem. No extension required: LiveIWC could forward all getters and the live setters to a cloned IWC. {quote} I think this is overkill? (I.e. to have @Live compile to LiveIWC vs. InitIWC.) Though, @Live would be nice for jdocs? bq. You win the fact that this is such an expert thing, and it should not confuse 99% of users who won't need to change these settings in a live way. Right -- simple things should be simple and complex things should be possible. The current patch achieves this -- the 99% of simple users that just want to config IW and create it find all of the config in one place. The 1% complex cases (need to change live settings) are able to do so, but must read the jdocs for each setter to verify it's supported. The API should be designed around the simple users, not the complex ones, as the current patch does. So...
I think the current patch is ready to commit (except, I'll remove the /html tag for infoStream defaultInfoStream). Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter -- Key: LUCENE-2960 URL: https://issues.apache.org/jira/browse/LUCENE-2960 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Shay Banon Priority: Blocker Fix For: 3.1, 4.0 Attachments: LUCENE-2960.patch In 3.1 the ability to setRAMBufferSizeMB is deprecated, and removed in trunk. It would be great to be able to control that on a live IndexWriter. Two other methods that would be great to bring back are setTermIndexInterval and setReaderTermsIndexDivisor. Most of the other setters can actually be set on the MergePolicy itself, so no need for setters for those (I think).
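To make the @Live idea concrete: a runtime-retained annotation can mark the setters that are safe to call on an open writer, and reflection can list them, for example for javadocs or a restricted live view. This swaps in runtime reflection for the compile-time annotation processor proposed above; `Live`, `DemoWriterConfig` and `LiveSettings` are hypothetical names, and the field defaults are illustrative only.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical marker for settings that may be changed on an open writer. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Live {}

/** Hypothetical config class; only its shape matters here, not Lucene's real IndexWriterConfig. */
class DemoWriterConfig {
    private double ramBufferSizeMB = 16.0;   // illustrative default
    private int termIndexInterval = 128;     // illustrative default

    @Live
    public DemoWriterConfig setRAMBufferSizeMB(double mb) { this.ramBufferSizeMB = mb; return this; }

    public DemoWriterConfig setTermIndexInterval(int interval) { this.termIndexInterval = interval; return this; }

    public double getRAMBufferSizeMB() { return ramBufferSizeMB; }

    public int getTermIndexInterval() { return termIndexInterval; }
}

final class LiveSettings {
    /** Names of the public methods marked @Live, sorted because getMethods() order is unspecified. */
    static List<String> liveSetters(Class<?> configClass) {
        List<String> names = new ArrayList<>();
        for (Method m : configClass.getMethods()) {
            if (m.isAnnotationPresent(Live.class)) names.add(m.getName());
        }
        Collections.sort(names);
        return names;
    }
}
```

Because the marker lives on the setter itself, the "which settings are live?" answer stays next to the code, which is the documentation benefit mentioned above.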
Re: FieldType API change proposal -- SOLR-2423
Any concerns with this proposal? If not, I would like to commit soon. After 3.1 is released, I would merge with the 3.x branch and add a deprecation. On Mon, Mar 14, 2011 at 12:57 PM, Ryan McKinley ryan...@gmail.com wrote: the default implementation would just use toString() For things that could use the type directly (Date/Numbers) they check instanceof. This is actually identical to what currently happens in DocumentBuilder, but would happen in the FieldType and would not check everything if it is: 1. instanceof BinaryField 2. instanceof Date ryan On Mon, Mar 14, 2011 at 12:52 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Mon, Mar 14, 2011 at 12:45 PM, Ryan McKinley ryan...@gmail.com wrote: Any opinions on this? I've been focused on getting this 3.1 release out (reviewing/fixing docs, packaging, etc). I'm not sure about Object... does that mean most FieldTypes would be doing instanceof checks? -Yonik http://lucidimagination.com thanks ryan On Sat, Mar 12, 2011 at 2:29 AM, Ryan McKinley ryan...@gmail.com wrote: I think FieldType should take an Object input rather than String -- this gives FieldTypes the option of using (and reusing) explicit types in addition to String. For embedded apps that fill SolrInputDocuments with real objects, the fields can use objects directly -- this means that Date does not have to get converted to a String and then back to a Date. This is a major API change, but I think the value is worth the trouble. Thoughts? ryan
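A sketch of what an Object-accepting field type could look like, following Ryan's description: the base implementation falls back to `toString()`, and types that can use a value directly check `instanceof`. The class and method names are hypothetical, not Solr's FieldType API, and the date type assumes ISO-8601 text when it is handed a String.

```java
import java.time.Instant;
import java.util.Date;

/** Hypothetical field type taking Object input; not Solr's FieldType API. */
class ObjectFieldType {
    /** Default: fall back to the string form, matching today's String-based behaviour. */
    Object toInternal(Object value) {
        return value.toString();
    }
}

/** A date type that uses a Date directly and only parses when handed text. */
class DateObjectFieldType extends ObjectFieldType {
    @Override
    Object toInternal(Object value) {
        if (value instanceof Date) {
            return ((Date) value).getTime();                    // no Date -> String -> Date round trip
        }
        return Instant.parse(value.toString()).toEpochMilli();   // assumes ISO-8601 text
    }
}
```

An embedded caller passing a `java.util.Date` skips the String round trip, while a String value still works through parsing; this is the instanceof pattern Yonik asks about, confined to the types that benefit from it.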
[jira] Commented: (LUCENE-2960) Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter
[ https://issues.apache.org/jira/browse/LUCENE-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007537#comment-13007537 ] Mark Miller commented on LUCENE-2960: - {quote} The current patch achieves this – the 99% of simple users that just want to config IW and create it find all of the config in one place. The 1% complex cases (need to change live settings) are able to do so, but must read the jdocs for each setter to verify it's supported. {quote} The proposed alternatives sound just as good as this? In the proposed compromises, the 99% of simple users will still have one place to configure IW for the average 'set up front' use case. Perhaps the complex users could also have an API with a better pattern; it doesn't have to be either/or, as you seem to imply... An IWC that is 'partially' live and can be changed externally after being passed to the IW is just an inferior pattern, plain and simple, and doesn't gain you much. {quote} The API should be designed around the simple users not the complex ones, as the current patch does. {quote} This really depends. If the simple users can be satisfied, and the API can also be decent for complex users, win/win. I guess I would bet that there will not be a ton of deprecation loops for settings that want to become live.
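One reading of the alternative argued for here: the writer copies its config when constructed, so later changes to the caller's object have no effect, and the few live settings get their own explicit API on the writer. All names are hypothetical and the defaults are illustrative; this is not the committed LUCENE-2960 design.

```java
/** Hypothetical config object; fields are package-private for brevity. */
final class SketchConfig {
    double ramBufferSizeMB = 16.0;   // illustrative default
    int termIndexInterval = 128;     // illustrative default

    SketchConfig copy() {
        SketchConfig c = new SketchConfig();
        c.ramBufferSizeMB = ramBufferSizeMB;
        c.termIndexInterval = termIndexInterval;
        return c;
    }
}

/** Hypothetical writer: snapshot config plus an explicit API for the live settings. */
final class SketchWriter {
    private final SketchConfig config;            // private snapshot taken at construction
    private volatile double liveRamBufferSizeMB;   // readable by indexing threads without locking

    SketchWriter(SketchConfig userConfig) {
        this.config = userConfig.copy();
        this.liveRamBufferSizeMB = config.ramBufferSizeMB;
    }

    /** The only way to change this setting on an open writer. */
    void setRAMBufferSizeMB(double mb) {
        if (mb <= 0) throw new IllegalArgumentException("ramBufferSizeMB must be > 0");
        liveRamBufferSizeMB = mb;
    }

    double getRAMBufferSizeMB() { return liveRamBufferSizeMB; }

    int getTermIndexInterval() { return config.termIndexInterval; }
}
```

Settings without a setter on the writer are fixed at construction, so users never need to check the javadocs to learn whether mutating the config object still has an effect.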
[jira] Commented: (LUCENE-2960) Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter
[ https://issues.apache.org/jira/browse/LUCENE-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007543#comment-13007543 ] Mark Miller commented on LUCENE-2960: - Though don't take the fact that I don't agree as a hold-up to finishing 3.1.
[jira] Resolved: (LUCENE-2970) SpecialOperations.isFinite can have TERRIBLE TERRIBLE runtime in certain situations
[ https://issues.apache.org/jira/browse/LUCENE-2970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-2970. - Resolution: Fixed Committed revision 1082200. Thanks for the review Mike!
[jira] Resolved: (SOLR-1822) SEVERE: Unable to move index file from: tempfile to: indexfile
[ https://issues.apache.org/jira/browse/SOLR-1822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller resolved SOLR-1822. --- Resolution: Duplicate Fix Version/s: (was: Next) 4.0 3.1 Assignee: Mark Miller SEVERE: Unable to move index file from: tempfile to: indexfile -- Key: SOLR-1822 URL: https://issues.apache.org/jira/browse/SOLR-1822 Project: Solr Issue Type: Bug Components: replication (java) Affects Versions: 1.4 Environment: Linux, JDK6, SOLR 1.4 Reporter: wyhw whon Assignee: Mark Miller Priority: Critical Fix For: 3.1, 4.0 Attachments: SnapPuller.patch The Solr index directory was removed, but I do not know the reason for this. Adding some code at line 577 of SnapPuller.java resolves this bug: line 576 File indexFileInIndex = new File(indexDir, fname); + if (!indexDir.exists()) indexDir.mkdir(); boolean success = indexFileInTmpDir.renameTo(indexFileInIndex);
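The reporter's workaround can also be written with java.nio: `Files.createDirectories` recreates the whole directory chain (whereas `File.mkdir()` creates only one level), and `Files.move` throws an `IOException` instead of returning `false` the way `File.renameTo` does. This is a sketch with a hypothetical helper name, not SnapPuller's code, and it does not explain why the directory disappeared; the issue itself was resolved as a duplicate.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: move a downloaded file into the index dir, recreating the dir if needed. */
final class IndexFileMover {
    static Path moveIntoIndex(Path tmpFile, Path indexDir) throws IOException {
        Files.createDirectories(indexDir);    // no-op if it already exists; also creates parents
        Path target = indexDir.resolve(tmpFile.getFileName());
        // Throws on failure instead of silently returning false like File.renameTo
        return Files.move(tmpFile, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```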
[jira] Commented: (SOLR-1725) Script based UpdateRequestProcessorFactory
[ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007627#comment-13007627 ] Grant Ingersoll commented on SOLR-1725: --- bq. As time passes, the case for moving to Java 6 increases. Solr trunk is on 1.6. Script based UpdateRequestProcessorFactory -- Key: SOLR-1725 URL: https://issues.apache.org/jira/browse/SOLR-1725 Project: Solr Issue Type: New Feature Components: update Affects Versions: 1.4 Reporter: Uri Boness Attachments: SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch A script based UpdateRequestProcessorFactory (uses JDK6 script engine support). The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code. The update request processor factory enables writing update processors in scripts located in the {{solr.solr.home}} directory. The factory accepts one (mandatory) configuration parameter named {{scripts}} which accepts a comma-separated list of file names. It will look for these files under the {{conf}} directory in solr home. When multiple scripts are defined, their execution order is defined by the lexicographical order of the script file names (so {{scriptA.js}} will be executed before {{scriptB.js}}). The script language is resolved based on the script file extension (that is, *.js files will be treated as JavaScript scripts), therefore an extension is mandatory. Each script file is expected to have one or more methods with the same signature as the methods in the {{UpdateRequestProcessor}} interface. It is *not* required to define all methods, only those that are required by the processing logic. The following variables are defined as global variables for each script: * {{req}} - The SolrQueryRequest * {{rsp}} - The SolrQueryResponse * {{logger}} - A logger that can be used for logging purposes in the script
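The ordering and language-resolution rules in the description are easy to pin down in Java. A sketch with hypothetical method names: split the comma-separated `scripts` parameter, sort the names lexicographically, and require an extension. In a real factory the extension would presumably be handed to `javax.script.ScriptEngineManager#getEngineByExtension`, which returns null when no engine matches; that step is left out here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helpers for the `scripts` parameter rules described above. */
final class ScriptPlan {

    /** Splits the comma-separated parameter and sorts the names lexicographically. */
    static List<String> executionOrder(String scriptsParam) {
        List<String> names = new ArrayList<>();
        for (String s : scriptsParam.split(",")) {
            String name = s.trim();
            if (!name.isEmpty()) names.add(name);
        }
        names.sort(null); // natural (lexicographic) String order
        return names;
    }

    /** Returns the extension used to pick the script engine; an extension is mandatory. */
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot <= 0 || dot == fileName.length() - 1) {
            throw new IllegalArgumentException("script has no extension: " + fileName);
        }
        return fileName.substring(dot + 1);
    }
}
```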
Licenses files, Notice files and LUCENE-2952
As Robert can no doubt attest, we often scramble to make sure i's are dotted and t's are crossed when it comes to filling out LICENSE.txt and NOTICE.txt right before releases, thereby burdening the RM with way too much work in validating what dependency has which license. Thus, we've been working to resolve this. In prep for the landing of LUCENE-2952 and to make life easier on release managers going forward, we've adopted the following conventions for dealing with licenses: 1. For every dependency (i.e. jar file), there needs to be a corresponding file-LICENSE-LICENSE_TYPE.txt file, as in: foo-2.3.1.jar has the corresponding foo-LICENSE-BSD.txt file (assuming foo is BSD licensed) in the same directory as the jar file. 2. _IF_ the license requires a NOTICE entry, then there must be a file of the name file-NOTICE.txt, as in foo-NOTICE.txt. Failing to meet either one will break the build once L-2952 is committed (which should be soon for trunk and will be backported to 3.2). Consider yourself notified. -Grant
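A sketch of the LICENSE check in convention 1, run over the file names of one lib directory. How the build derives `foo` from `foo-2.3.1.jar` is an assumption here (strip a trailing `-<version>` that starts with a digit), and convention 2 is not checked because only the license text says whether a NOTICE entry is required. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical check: every jar needs a <name>-LICENSE-<TYPE>.txt next to it. */
final class LicenseCheck {
    // group(1) is the jar name with an optional trailing "-<digit>..." version stripped
    private static final Pattern JAR = Pattern.compile("(.+?)(-\\d[\\w.\\-]*)?\\.jar");

    static List<String> missingLicenses(List<String> fileNames) {
        List<String> problems = new ArrayList<>();
        for (String name : fileNames) {
            Matcher m = JAR.matcher(name);
            if (!m.matches()) continue;                   // only jars need a license file
            String prefix = m.group(1) + "-LICENSE-";      // e.g. foo-LICENSE-BSD.txt
            boolean found = fileNames.stream()
                    .anyMatch(f -> f.startsWith(prefix) && f.endsWith(".txt"));
            if (!found) problems.add(name);
        }
        return problems;
    }
}
```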
Re: Solr Cell DataImport Tika handler broken - fails to index Zip file contents
: I had raised a jira for the Data Import handler part with the patch : and the testcase - https://issues.apache.org/jira/browse/SOLR-2332. : The same fix is needed for the Solr Cell as well. : : I can raise a jira and provide the patch for the same, if the patch : seems good enough. Jayendra: I'm not very familiar with your patch (or tika!) but by all means please open a jira for the bug, even if you are hesitant to work on a patch ... if you mention each issue in the comments of the other, people will see that they are related. -Hoss
[jira] Commented: (LUCENE-2960) Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter
[ https://issues.apache.org/jira/browse/LUCENE-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007671#comment-13007671 ] Steven Rowe commented on LUCENE-2960: - bq. Though don't take the fact that I don't agree as a hold-up to finishing 3.1. +1
[jira] Updated: (SOLR-2382) DIH Cache Improvements
[ https://issues.apache.org/jira/browse/SOLR-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Dyer updated SOLR-2382: - Attachment: SOLR-2382.patch Updated patch with 2 fixes for things I missed when porting this from 1.4.1 to Trunk. Also added 1 more unit test. I think this is ready for someone else to evaluate if anyone has the time and desire. I do believe this would be a nice addition to the DIH product. DIH Cache Improvements -- Key: SOLR-2382 URL: https://issues.apache.org/jira/browse/SOLR-2382 Project: Solr Issue Type: New Feature Components: contrib - DataImportHandler Reporter: James Dyer Priority: Minor Attachments: SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch Functionality: 1. Provide a pluggable caching framework for DIH so that users can choose a cache implementation that best suits their data and application. 2. Provide a means to temporarily cache a child Entity's data without needing to create a special cached implementation of the Entity Processor (such as CachedSqlEntityProcessor). 3. Provide a means to write the final (root entity) DIH output to a cache rather than to Solr. Then provide a way for a subsequent DIH call to use the cache as an Entity input. Also provide the ability to do delta updates on such persistent caches. 4. Provide the ability to partition data across multiple caches that can then be fed back into DIH and indexed either to varying Solr Shards, or to the same Core in parallel. Use Cases: 1. We needed a flexible, scalable way to temporarily cache child-entity data prior to joining to parent entities. - Using SqlEntityProcessor with Child Entities can cause an n+1 select problem. - CachedSqlEntityProcessor only supports an in-memory HashMap as a Caching mechanism and does not scale. - There is no way to cache non-SQL inputs (ex: flat files, xml, etc). 2. We needed the ability to gather data from long-running entities by a process that runs separate from our main indexing process. 3.
We wanted the ability to do a delta import of only the entities that changed. - Lucene/Solr requires entire documents to be re-indexed, even if only a few fields changed. - Our data comes from 50+ complex sql queries and/or flat files. - We do not want to incur overhead re-gathering all of this data if only 1 entity's data changed. - Persistent DIH caches solve this problem. 4. We want the ability to index several documents in parallel (using 1.4.1, which did not have the threads parameter). 5. In the future, we may need to use Shards, creating a need to easily partition our source data into Shards. Implementation Details: 1. De-couple EntityProcessorBase from caching. - Created a new interface, DIHCache, and two implementations: - SortedMapBackedCache - An in-memory cache, used as default with CachedSqlEntityProcessor (now deprecated). - BerkleyBackedCache - A disk-backed cache, dependent on bdb-je, tested with je-4.1.6.jar - NOTE: the existing Lucene Contrib db project uses je-3.3.93.jar. I believe this may be incompatible due to Generic Usage. - NOTE: I did not modify the ant script to automatically get this jar, so to use or evaluate this patch, download bdb-je from http://www.oracle.com/technetwork/database/berkeleydb/downloads/index.html 2. Allow Entity Processors to take a cacheImpl parameter to cause the entity data to be cached (see EntityProcessorBase and DIHCacheProperties). 3. Partially De-couple SolrWriter from DocBuilder - Created a new interface, DIHWriter, and two implementations: - SolrWriter (refactored) - DIHCacheWriter (allows DIH to write ultimately to a Cache). 4. Create a new Entity Processor, DIHCacheProcessor, which reads a persistent Cache as DIH Entity Input. 5. Support a partition parameter with both DIHCacheWriter and DIHCacheProcessor to allow for easy partitioning of source entity data. 6. Change the semantics of entity.destroy() - Previously, it was being called on each iteration of DocBuilder.buildDocument().
- Now it does one-time cleanup tasks (like closing or deleting a disk-backed cache) once the entity processor is completed. - The only out-of-the-box entity processor that previously implemented destroy() was LineEntityProcessor, so this is not a very invasive change. General Notes: We are near completion in converting our search functionality from a legacy search engine to Solr. However, I found that DIH did not support caching to the level of our prior product's data import utility. In order to get our data into Solr, I created these caching enhancements. Because I believe this has broad application, and because we would like this feature
Re: Solr Cell DataImport Tika handler broken - fails to index Zip file contents
Thanks Chris, I have already opened a JIRA issue, https://issues.apache.org/jira/browse/SOLR-2416, with the patch attached. Regards, Jayendra On Wed, Mar 16, 2011 at 3:57 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : I had raised a jira for the Data Import handler part with the patch : and the testcase - https://issues.apache.org/jira/browse/SOLR-2332. : The same fix is needed for the Solr Cell as well. : : I can raise a jira and provide the patch for the same, if the patch : seems good enough. Jayendra: I'm not very familiar with your patch (or tika!) but by all means please open a jira for the bug, even if you are hesitant to work on a patch ... if you mention each issue in the comments of the other, people will see that they are related. -Hoss
[jira] Commented: (SOLR-2415) Change XMLWriter version parameter to wt.xml.version
[ https://issues.apache.org/jira/browse/SOLR-2415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007688#comment-13007688 ] Hoss Man commented on SOLR-2415: I'm with Ryan. If it had always been respversion or something, I wouldn't mind, and would encourage other response writers to use it for their own versioning purposes (e.g. the JSON writer could have changed the default for json.nl based on version). But version is just so damn generic, it's really hard to have any idea what it's talking about (even xml.version is ambiguous: is it the format coming in, or going out?). I'd suggest adding either wt.version or wt.xml.version (depending on how people feel about the idea that it should/can be reused by all response writers in their own way) to 3.x, with a fallback to using version if it's specified, and marking version deprecated ... then removing it completely at a much later date (maybe 4.0, depending on when it comes out and how many 3.x releases come first). Change XMLWriter version parameter to wt.xml.version -- Key: SOLR-2415 URL: https://issues.apache.org/jira/browse/SOLR-2415 Project: Solr Issue Type: Improvement Reporter: Ryan McKinley Priority: Trivial Fix For: 4.0 The XMLWriter has a parameter called 'version'. This controls some specifics about how the XMLWriter works. Using the parameter name 'version' made sense back when the XMLWriter was the only option, but with all the various writers and different places where 'version' makes sense, I think we should change this parameter name to wt.xml.version so that it specifically refers to the XMLWriter.
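The migration suggested above (prefer the specific parameter, fall back to the deprecated generic `version`) is a small lookup. A sketch using `wt.xml.version` from the issue title; the helper class is hypothetical, and the default version is passed in rather than assumed.

```java
import java.util.Map;

/** Hypothetical lookup: new specific parameter first, deprecated generic one as fallback. */
final class XmlVersionParam {
    static final String NEW_NAME = "wt.xml.version";
    static final String OLD_NAME = "version";   // deprecated

    static String resolve(Map<String, String> params, String defaultVersion) {
        String v = params.get(NEW_NAME);
        if (v == null) {
            v = params.get(OLD_NAME);   // a real implementation might log a deprecation warning here
        }
        return v != null ? v : defaultVersion;
    }
}
```

Once the deprecation period ends, removing the `OLD_NAME` branch is the only change needed.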
[jira] Issue Comment Edited: (SOLR-2399) Solr Admin Interface, reworked
[ https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13007691#comment-13007691 ] Stefan Matheis (steffkes) edited comment on SOLR-2399 at 3/16/11 9:08 PM: -- A bigger step compared with the ones we had so far: we are talking about the *Schema-Browser*: * [The current one (screenshot)|http://files.mathe.is/solr-admin/06_schema-browser_cur.png] needs a lot of space (especially for the navigation) * [The new one (screenshot)|http://files.mathe.is/solr-admin/06_schema-browser.png] tries to put more focus on detail information * [The new Field/Dynamic Field/Type-Selection (screenshot)|http://files.mathe.is/solr-admin/06_schema-browser_nav.png] is displayed in a simple dropdown, which offers keyboard navigation for quick access * [The list of relations (screenshot)|http://files.mathe.is/solr-admin/06_schema-browser_var.png] depends on the selected F/DF/T Just to say: * CopyFields are also displayed in the area below the selection, if defined. * Every F/DF/T is clickable, linked to its detail page (Explicit) Questions for the Community: * TopTerms are currently limited to 50; is that enough? Or is there a need to browse _all_ TopTerms? * Analyzers-Detail: hide it by default, with a toggle button (as it is currently)? * Analyzers-Detail: is the presentation okay, or does it need too much space? * F/DF/T-Selection: currently there is no possibility to filter (as e.g. in iTunes: an additional field where you start typing and the list is restricted). Would that help those of us who have a lot of fields? 
Solr Admin Interface, reworked -- Key: SOLR-2399 URL: https://issues.apache.org/jira/browse/SOLR-2399 Project: Solr Issue Type: Improvement Components: web gui Reporter: Stefan Matheis (steffkes) Priority: Minor *The idea was to create a new, fresh (and hopefully clean) Solr Admin Interface.* [Based on this [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]] I've quickly created a GitHub repository (just for me, to keep track of the changes) » https://github.com/steffkes/solr-admin [This commit shows the differences|https://github.com/steffkes/solr-admin/commit/5f80bb0ea9deb4b94162632912fe63386f869e0d] between the old/existing index.jsp and my new one (which is copy-cut/paste'd from the existing one). The main action takes place in [js/script.js|https://github.com/steffkes/solr-admin/blob/master/js/script.js], which is currently neither clean nor pretty .. just work in progress. It's a work in progress, so ... give it a try. It's developed with Firefox as the browser, so, for a first impression .. please don't use _things_ like Internet Explorer or so ;o Jan already suggested a bunch of good things; I'm sure there are more ideas out there :) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-2399) Solr Admin Interface, reworked
[ https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13007691#comment-13007691 ] Stefan Matheis (steffkes) commented on SOLR-2399: - A bigger step compared with the ones we had so far: we are talking about the *Schema-Browser*: * [The current one (screenshot)|http://files.mathe.is/solr-admin/06_schema-browser_cur.png] needs a lot of space (especially for the navigation) * [The new one (screenshot)|http://files.mathe.is/solr-admin/06_schema-browser.png] tries to put more focus on detail information * [The new Field/Dynamic Field/Type-Selection (screenshot)|http://files.mathe.is/solr-admin/06_schema-browser_nav.png] is displayed in a simple dropdown, which offers keyboard navigation for quick access * [The list of relations (screenshot)|http://files.mathe.is/solr-admin/06_schema-browser_var.png] depends on the selected F/DF/T Just to say: * CopyFields are also displayed in the area below the selection, if defined. * Every F/DF/T is clickable, linked to its detail page (Explicit) Questions for the Community: * TopTerms are currently limited to 50; is that enough? Or is there a need to browse _all_ TopTerms? * Analyzers-Detail: hide it by default, with a toggle button (as it is currently)? * Analyzers-Detail: is the presentation okay, or does it need too much space? * F/DF/T-Selection: currently there is no possibility to filter (as e.g. in iTunes: an additional field where you start typing and the list is restricted). Would that help those of us who have a lot of fields? 
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-2415) Change XMLWriter version parameter to wt.xml.version
[ https://issues.apache.org/jira/browse/SOLR-2415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13007695#comment-13007695 ] Yonik Seeley commented on SOLR-2415: On a highly related question, how should we handle the desire to change the faceting format (to make it easier to add metadata like total number of constraints, etc)? version would be one way. facet.format would be another way. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[Lucene.Net] [jira] Commented: (LUCENENET-399) Port changes from Java Lucene 2.9.3 and 2.9.4 releases
[ https://issues.apache.org/jira/browse/LUCENENET-399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13007698#comment-13007698 ] Digy commented on LUCENENET-399: Current status of my local work: * All core files for the 2.9.2 - 2.9.4 transition are ported. 16 modified/added test files are still waiting to be fixed under Lucene.Net.Index + Lucene.Net.Store * 12 test cases under Lucene.Net.Index and 1 case under Lucene.Net.Util fail (better to look at these after the remaining test files are ported). So, what do you think? Should I commit this huge patch or wait till everything is completed? DIGY. Port changes from Java Lucene 2.9.3 and 2.9.4 releases -- Key: LUCENENET-399 URL: https://issues.apache.org/jira/browse/LUCENENET-399 Project: Lucene.Net Issue Type: Task Components: Lucene.Net Core, Lucene.Net Test Reporter: Troy Howard Assignee: Scott Lombard Fix For: Lucene.Net 2.9.4 Time Spent: 2h Remaining Estimate: 40h Port changes from Java Lucene 2.9.3 and 2.9.4 releases. The Lucene.Net 2.9.4 release will roll up the changes from both of those releases into one. Unit tests should be added or updated to reflect the changes. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Resolved: (SOLR-2251) use facet key as override for field name when looking for per field facet options
[ https://issues.apache.org/jira/browse/SOLR-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man resolved SOLR-2251. Resolution: Duplicate I just realized this is actually a dup of SOLR-1351 use facet key as override for field name when looking for per field facet options --- Key: SOLR-2251 URL: https://issues.apache.org/jira/browse/SOLR-2251 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4.1 Reporter: Tim Priority: Minor The key parameter that is used for aliasing output is very helpful in simplifying the readability of complex facets. However, it doesn't seem that this same alias can be used when configuring facets of individual fields. The following example, which does not use the key parameter, works fine under 1.4.1: rows=0&q=*:*+NOT+customers.blocked:1&facet=true&f.customers_name.facet.mincount=2&facet.field=customers_name <lst name="customers_name"><int name="jone">2</int></lst> The example below also works and does use the key parameter; however, note that we're still using the original field name when referring to f.customers_name.facet.mincount: rows=0&q=*:*+NOT+customers.blocked:1&facet=true&f.customers_name.facet.mincount=2&facet.field={!key=alt_name}customers_name <lst name="customers_name"><int name="jone">2</int></lst> The final example below does not work. It uses the alias established by the key parameter to configure the mincount setting for the customers_name field. rows=0&q=*:*+NOT+customers.blocked:1&facet=true&f.alt_name.facet.mincount=2&facet.field={!key=alt_name}customers_name <lst name="alt_name"><int name="jone">2</int><int name="tim">1</int><int name="sami">0</int></lst> This is a trivial example. The behavior becomes much more important when talking about facet queries. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
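The behavior Tim asks for amounts to a lookup order for per-field facet options. Below is one possible order, sketched under the assumption that the alias should take precedence over the real field name, which in turn beats the global default. It is a hypothetical illustration with parameters modeled as a plain `Map`; it is not how SOLR-1351 or Solr's faceting code actually resolves these options.

```java
import java.util.Map;

public class FacetParamLookup {
    /**
     * Hypothetical lookup order for a per-field facet option:
     *   1. f.<key>.<param>   (alias set via {!key=...}), if a key was given
     *   2. f.<field>.<param> (real field name)
     *   3. <param>           (global default)
     * Returns null if none of them is set.
     */
    static String fieldParam(Map<String, String> params, String key, String field, String param) {
        if (key != null) {
            String byKey = params.get("f." + key + "." + param);
            if (byKey != null) {
                return byKey;
            }
        }
        String byField = params.get("f." + field + "." + param);
        if (byField != null) {
            return byField;
        }
        return params.get(param);
    }

    public static void main(String[] args) {
        // The third request from the issue: mincount is configured via the alias.
        Map<String, String> params = Map.of(
            "facet.field", "{!key=alt_name}customers_name",
            "f.alt_name.facet.mincount", "2");
        System.out.println(fieldParam(params, "alt_name", "customers_name", "facet.mincount"));
    }
}
```

Falling back to the field name keeps the first two requests from the issue working unchanged, while the third one starts honoring `f.alt_name.facet.mincount`.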
[jira] Commented: (SOLR-1499) SolrEntityProcessor - DIH EntityProcessor that queries an external Solr via SolrJ
[ https://issues.apache.org/jira/browse/SOLR-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13007700#comment-13007700 ] Ahmet Arslan commented on SOLR-1499: Eric, thanks for the pointer. As you said, when I used new CommonsHttpSolrServer(new URL("http://solr1.4.0Instance:8080/solr"), null, new XMLResponseParser(), false); I was able to communicate with a Solr 1.4.0 instance using solrj-trunk. Do you recommend modifying this patch in this manner? Any performance hits? Also, what do you think about copy-pasting JavaBinCodec.java from the source version to the destination version and using a custom BinaryResponseParser that uses the copy-pasted class? It seems to work for 1.4.0 to trunk. Or should I stick with writing a little script to do it? P.S. I am just trying to use a feature that will already be maintained by the Solr community. SolrEntityProcessor - DIH EntityProcessor that queries an external Solr via SolrJ - Key: SOLR-1499 URL: https://issues.apache.org/jira/browse/SOLR-1499 Project: Solr Issue Type: New Feature Components: contrib - DataImportHandler Reporter: Lance Norskog Assignee: Erik Hatcher Fix For: Next Attachments: SOLR-1499.patch, SOLR-1499.patch, SOLR-1499.patch, SOLR-1499.patch, SOLR-1499.patch The SolrEntityProcessor queries an external Solr instance. The Solr documents returned are unpacked and emitted as DIH fields. The SolrEntityProcessor uses the following attributes: * solr='http://localhost:8983/solr/sms' ** This gives the URL of the target Solr instance. *** Note: the connection to the target Solr uses the binary SolrJ format. * query='Jefferson&sort=id+asc' ** This gives the base query string used with Solr. It can include any standard Solr request parameter. This attribute is processed under the variable resolution rules and can be driven in an inner stage of the indexing pipeline. * rows='10' ** This gives the number of rows to fetch per request.
** The SolrEntityProcessor always fetches every document that matches the request. * fields='id,tag' ** This selects the fields to be returned from the Solr request. ** These must also be declared as field elements. ** As with all fields, template processors can be used to alter the contents to be passed downwards. * timeout='30' ** This limits the query to 30 seconds. This can be used as a fail-safe to prevent the indexing session from freezing up. By default, the timeout is 5 minutes. Limitations: * Solr errors are not handled correctly. * Loop control constructs have not been tested. * Multi-valued returned fields have not been tested. The unit tests give examples of how to use it as the root entity and as an inner entity. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
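The `rows` attribute together with "always fetches every document that matches" implies a paging loop: request `rows` documents at a time and advance `start` until `numFound` is reached. A minimal sketch of that loop follows, assuming plain start/rows paging. `Page`, `PageFetcher` and `fakeIndex` are hypothetical stand-ins, not DIH or SolrJ classes, and the patch may implement this differently.

```java
import java.util.ArrayList;
import java.util.List;

public class PagedFetch {
    /** One page of results plus the total hit count (hypothetical type). */
    static final class Page {
        final List<String> docs;
        final long numFound;

        Page(List<String> docs, long numFound) {
            this.docs = docs;
            this.numFound = numFound;
        }
    }

    /** Hypothetical stand-in for one Solr request with start/rows paging. */
    interface PageFetcher {
        Page fetch(long start, int rows);
    }

    /** Requests successive pages of `rows` docs until every match has been read. */
    static List<String> fetchAll(PageFetcher fetcher, int rows) {
        List<String> all = new ArrayList<>();
        long start = 0;
        long numFound;
        do {
            Page page = fetcher.fetch(start, rows);
            numFound = page.numFound;
            all.addAll(page.docs);
            if (page.docs.isEmpty()) {
                break; // guard against numFound changing underneath us
            }
            start += page.docs.size();
        } while (start < numFound);
        return all;
    }

    /** Fake "index" of n documents, served in pages, for trying the loop out. */
    static PageFetcher fakeIndex(int n) {
        List<String> index = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            index.add("doc" + i);
        }
        return (start, rows) -> {
            int from = (int) Math.min(start, index.size());
            int to = (int) Math.min(start + rows, index.size());
            return new Page(index.subList(from, to), index.size());
        };
    }

    public static void main(String[] args) {
        // 25 matching documents fetched with rows=10 takes three requests.
        System.out.println(fetchAll(fakeIndex(25), 10).size());
    }
}
```

Advancing by the number of documents actually returned (rather than by `rows`) keeps the loop correct if a server ever returns a short page.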
[jira] Commented: (LUCENE-1823) QueryParser with new features for Lucene 3
[ https://issues.apache.org/jira/browse/LUCENE-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13007737#comment-13007737 ] Adriano Crestani commented on LUCENE-1823: -- Hi Robert, I completely agree with your statement; the config API scares me too. I would love to submit a patch for it, but I am working for IBM now, and, as a committer, I need to go through some bureaucratic paperwork before contributing any new feature to Lucene, and it might still take some time :( I have a better idea: I will propose it as a GSoC project for this year. This way we can also get one more contributor to contrib QP. QueryParser with new features for Lucene 3 -- Key: LUCENE-1823 URL: https://issues.apache.org/jira/browse/LUCENE-1823 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Reporter: Michael Busch Assignee: Luis Alves Priority: Minor Fix For: 4.0 Attachments: lucene_1823_any_opaque_precedence_fuzzybug_v2.patch, lucene_1823_foo_bug_08_26_2009.patch I'd like to have a new QueryParser implementation in Lucene 3.1, ideally based on the new QP framework in contrib. It should share as much code as possible with the current StandardQueryParser implementation for easy maintainability. Wish list (feel free to extend): 1. *Operator precedence*: Support operator precedence for boolean operators 2. *Opaque terms*: Ability to plug in an external parser for certain syntax extensions, e.g. XML query terms 3. *Improved RangeQuery syntax*: Use more intuitive =, =, = instead of [] and {} 4. *Support for trierange queries*: See LUCENE-1768 5. *Complex phrases*: See LUCENE-1486 6. *ANY operator*: E.g. (a b c d) ANY 3 should match if 3 of the 4 terms occur in the same document 7. *New syntax for Span queries*: I think the surround parser supports this? 8. *Escaped wildcards*: See LUCENE-588 -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
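The *ANY operator* item has simple matching semantics: a document matches `(a b c d) ANY 3` when at least 3 of the 4 terms occur in it. The sketch below shows only those semantics on plain sets of terms, not a parser. As far as I know, Lucene's `BooleanQuery` can already express roughly this for optional clauses via `setMinimumNumberShouldMatch(int)`, so a parser could presumably map `ANY n` onto that; the class and method names here are hypothetical.

```java
import java.util.List;
import java.util.Set;

public class AnyOperator {
    /**
     * True if at least minMatch of the query terms occur in the document.
     * Each entry in queryTerms is counted separately, so duplicates count twice.
     */
    static boolean matchesAny(List<String> queryTerms, Set<String> docTerms, int minMatch) {
        if (minMatch <= 0) {
            return true; // nothing required
        }
        int hits = 0;
        for (String term : queryTerms) {
            if (docTerms.contains(term) && ++hits >= minMatch) {
                return true; // stop as soon as the threshold is reached
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> query = List.of("a", "b", "c", "d"); // (a b c d) ANY 3
        System.out.println(matchesAny(query, Set.of("a", "b", "d", "x"), 3));
        System.out.println(matchesAny(query, Set.of("a", "x"), 3));
    }
}
```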
[HUDSON] Lucene-Solr-tests-only-trunk - Build # 6021 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/6021/ 1 tests failed. FAILED: org.apache.solr.cloud.CloudStateUpdateTest.testCoreRegistration Error Message: expected:<2> but was:<3> Stack Trace: junit.framework.AssertionFailedError: expected:<2> but was:<3> at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1214) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1146) at org.apache.solr.cloud.CloudStateUpdateTest.testCoreRegistration(CloudStateUpdateTest.java:208) Build Log (for compile errors): [...truncated 8570 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Solr Config XML DTDs
Hi, this is my first post to the mailing list. I'm working on a commercial implementation of a Solr project and would like to share some of my work, although it's not really much. I wrote a halting DTD for the Solr config file queryElevation.xml and would like to eventually write a DTD for the config file. Who do I need to talk to about reviewing my work and perhaps getting a little help? My DTD works for our internal version of queryElevation.xml, but since the ATTRIB name of the <doc/> tag could be anything, I'm not sure how to write a DTD that would validate any valid query elevation file. Anyway, thanks. I put pressure on our company to redo our customer-facing search using Solr. It launches soon and I've impressed everyone all the way up to the CEO; most of the credit goes to the Solr and Lucene devs for making it so easy on me. Daniel
[jira] Commented: (SOLR-2415) Change XMLWriter version parameter to wt.xml.version
[ https://issues.apache.org/jira/browse/SOLR-2415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13007805#comment-13007805 ] Ryan McKinley commented on SOLR-2415: - I see two approaches to the general problem: 1. each component gets its own version (wt.xml.version, facet.version, hl.version, etc.) 2. a single 'version' param that multiple components use. I think option #2 makes more sense; perhaps we should add a getVersion() method on SolrQueryRequest and have that used across all components. For the facet format (SOLR-2242) this should work, but I also hope that major versions (4.0 etc.) can drop old formats, since maintaining these for a long time can be a PIA. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
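Option #2 could look something like the following: a single request-level version is parsed once, and each component decides for itself what that version means. `Request`, `facetLayout`, the 4.0 cut-over and the layout names are all invented for illustration; this is not an existing `SolrQueryRequest` API, and it assumes simple "major.minor" version strings.

```java
import java.util.Map;

public class SharedVersion {
    /** Hypothetical stand-in for a request that exposes one shared version. */
    static final class Request {
        private final Map<String, String> params;
        private final double defaultVersion;

        Request(Map<String, String> params, double defaultVersion) {
            this.params = params;
            this.defaultVersion = defaultVersion;
        }

        /** Parses the single 'version' param; missing or bad values use the default. */
        double getVersion() {
            String v = params.get("version");
            if (v == null) {
                return defaultVersion;
            }
            try {
                return Double.parseDouble(v);
            } catch (NumberFormatException e) {
                return defaultVersion;
            }
        }
    }

    /** Example component: picks a facet output layout from the shared version. */
    static String facetLayout(Request req) {
        // Hypothetical cut-over point between the old and new facet formats.
        return req.getVersion() >= 4.0 ? "structured" : "legacy";
    }

    public static void main(String[] args) {
        System.out.println(facetLayout(new Request(Map.of("version", "2.2"), 2.2)));
        System.out.println(facetLayout(new Request(Map.of("version", "4.0"), 2.2)));
    }
}
```

The trade-off against option #1 is visible here: every component shares one number, so a client cannot ask for the new facet format while keeping an old writer format.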
[jira] Commented: (SOLR-2415) Change XMLWriter version parameter to wt.xml.version
[ https://issues.apache.org/jira/browse/SOLR-2415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13007814#comment-13007814 ] Chris A. Mattmann commented on SOLR-2415: - At the rate of release cycles on this project, I'd seriously recommend against actually specifying versions, and fallbacks, etc., specifically for response writers other than the existing Solr version. Look at how long the existing response writers have hung around in their current format, independent of the version # changes (1.2, 1.3, 1.4, and now 3.1). In all of these cases, you could simply keep docs that say 1.2 is compatible (forwards) with 1.x, etc., and 3.x is compatible (backwards) with 1.x, etc. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-2399) Solr Admin Interface, reworked
[ https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13007824#comment-13007824 ] Mark Miller commented on SOLR-2399: --- Hey Stefan, I had seen this issue in passing, but had not yet taken a closer look... Fantastic stuff! I think this is a sorely needed face lift, and your screen shots look like a brilliant upgrade. Really nice to see some effort put into this area of Solr. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (SOLR-2399) Solr Admin Interface, reworked
[ https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-2399: -- Fix Version/s: 4.0 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Licenses files, Notice files and LUCENE-2952
We do use Apache RAT and it does not do these kinds of license checks. On Mar 16, 2011, at 9:19 PM, Mattmann, Chris A (388J) wrote: Have you guys thought about using Apache RAT [1]? It's not perfect, but it implements a lot of license checks, and as far as I know, integrates nicely into Ant and Maven. Cheers, Chris [1] http://incubator.apache.org/rat/ On Mar 16, 2011, at 5:54 PM, Robert Muir wrote: On Wed, Mar 16, 2011 at 3:57 PM, Grant Ingersoll gsing...@apache.org wrote: As Robert can no doubt attest, we often scramble to make sure i's are dotted and t's are crossed when it comes to filling out LICENSE.txt and NOTICE.txt right before releases, thereby burdening the RM with way too much work in validating which dependency has which license. Thus, we've been working to resolve this. In prep for the landing of LUCENE-2952 and to make life easier on release managers going forward, we've adopted the following conventions for dealing with licenses: 1. For every dependency (i.e. jar file), there needs to be a corresponding file-LICENSE-LICENSE_TYPE.txt file, as in: foo-2.3.1.jar has the corresponding foo-LICENSE-BSD.txt file (assuming foo is BSD licensed) in the same directory as the jar file. 2. _IF_ the license requires a NOTICE entry, then there must be a file of the name file-NOTICE.txt, as in foo-NOTICE.txt. Failing to meet either one will break the build once L-2952 is committed (which should be soon for trunk and will be backported to 3.2). Consider yourself notified. +1 I think we can all agree: we want our licensing to be rock-solid and we should strive to raise the standards here for our project. It's actually more important than whether our code even compiles. Automated checks go a long way; thank you, Grant, for working on this, because we have a lot of third-party dependencies and it's difficult to verify that everything is in proper order. 
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org ++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 171-266B, Mailstop: 171-246 Email: chris.a.mattm...@nasa.gov WWW: http://sunset.usc.edu/~mattmann/ ++ Adjunct Assistant Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA ++ - Mark Miller lucidimagination.com Lucene/Solr User Conference May 25-26, San Francisco www.lucenerevolution.org
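Grant's first convention (every jar has a sibling `<name>-LICENSE-<TYPE>.txt`) is mechanical enough to check in a few lines. The following is a minimal sketch working on a plain list of file names from one directory. The version-stripping regex and the class and method names are my own assumptions, not the check that LUCENE-2952 adds to the build. The NOTICE rule is left out because whether a NOTICE file is needed depends on the license itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LicenseCheck {
    // "foo-2.3.1.jar" -> "foo": strip an optional trailing "-<version>" before ".jar".
    private static final Pattern JAR = Pattern.compile("(.+?)(-\\d[\\w.\\-]*)?\\.jar");

    static String baseName(String jarName) {
        Matcher m = JAR.matcher(jarName);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a jar: " + jarName);
        }
        return m.group(1);
    }

    /** Returns the jars in `files` that have no <base>-LICENSE-<TYPE>.txt sibling. */
    static List<String> missingLicenses(List<String> files) {
        List<String> missing = new ArrayList<>();
        for (String f : files) {
            if (!f.endsWith(".jar")) {
                continue;
            }
            String prefix = baseName(f) + "-LICENSE-";
            boolean found = files.stream()
                .anyMatch(other -> other.startsWith(prefix) && other.endsWith(".txt"));
            if (!found) {
                missing.add(f);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> dir = List.of("foo-2.3.1.jar", "foo-LICENSE-BSD.txt", "bar-1.0.jar");
        System.out.println(missingLicenses(dir)); // bar-1.0.jar has no license file
    }
}
```

A real build check would list the directory itself and fail the build instead of returning a list, but keeping the naming logic separate from the file system makes the convention easy to test.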