Re: [VOTE] Release PyLucene 3.1.0
On Wed, 6 Apr 2011, Bill Janssen wrote:

>> Andi Vajda <va...@apache.org> wrote:
>>>> Unless I'm missing something here, you've got two options before you break your users: 1. fix your code before you ship it to them
>>>
>>> Unfortunately, the code is out there for building, and the instructions, also already out there, say "PyLucene 2.4 to 3.X". I should be more careful :-).
>>
>> Given that APIs changed quite a bit between 2.x and 3.0, and that 2.x deprecated APIs are removed from 3.1+ (unless I'm confused about Lucene's deprecation policy (*)), your statement is a bit optimistic.
>
> My Python code looks for the differences and handles it. Of course, it can't do that for the future :-). Is there some ABI version # that I should be checking, instead?

There are two versions available from the lucene module:

    >>> import lucene
    >>> [(v, lucene.__dict__[v]) for v in dir(lucene) if 'VERSION' in v]
    [('JCC_VERSION', '2.8'), ('VERSION', '3.1.0')]

There is also the lucene.Version object.

Andi..
Re: [VOTE] Release PyLucene 3.1.0
Andi Vajda <va...@apache.org> wrote:

> There are two versions available from the lucene module:
>
>     >>> import lucene
>     >>> [(v, lucene.__dict__[v]) for v in dir(lucene) if 'VERSION' in v]
>     [('JCC_VERSION', '2.8'), ('VERSION', '3.1.0')]

I suppose I could make a list of all the (JCC_VERSION, VERSION) pairs that I've personally verified the code works with, and raise an error if a user attempts to install UpLib using a PyLucene that isn't on that list... But that seems like a sub-optimal solution :-).

Bill
Re: [VOTE] Release PyLucene 3.1.0
On Wed, 6 Apr 2011, Bill Janssen wrote:

> I suppose I could make a list of all the (JCC_VERSION, VERSION) pairs that I've personally verified the code works with, and raise an error if a user attempts to install UpLib using a PyLucene that isn't on that list... But that seems like a sub-optimal solution :-).

Seems like the best solution to me. How can you be sure your code works otherwise?

Andi..
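The whitelist approach discussed in this thread can be sketched in a few lines of Python. This is a hypothetical sketch, not UpLib code: the verified pairs below are made up, and in real use the two strings would come from the lucene module itself (lucene.JCC_VERSION and lucene.VERSION).

```python
# Hypothetical install-time gate: only accept (JCC_VERSION, VERSION)
# pairs that have actually been tested. In real use the values would
# come from: import lucene; lucene.JCC_VERSION, lucene.VERSION
VERIFIED_PAIRS = {
    ("2.8", "3.1.0"),   # made-up examples of verified combinations
    ("2.7", "3.0.3"),
}

def check_pylucene(jcc_version, lucene_version, verified=VERIFIED_PAIRS):
    """Raise RuntimeError unless this (JCC, Lucene) pair was tested."""
    pair = (jcc_version, lucene_version)
    if pair not in verified:
        raise RuntimeError(
            "Untested PyLucene combination: JCC %s / Lucene %s" % pair)
    return pair

check_pylucene("2.8", "3.1.0")  # a verified pair passes silently
```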
Re: My GSOC proposal
Hey Varun,

On Tue, Apr 5, 2011 at 11:07 PM, Michael McCandless <luc...@mikemccandless.com> wrote:

Hi Varun,

Those two issues would make a great GSoC! Comments below...

+1

On Tue, Apr 5, 2011 at 1:56 PM, Varun Thacker <varunthacker1...@gmail.com> wrote:

I would like to combine two tasks as part of my project, namely "Directory createOutput and openInput should take an IOContext" (LUCENE-2793), and complement it with "Generalize DirectIOLinuxDir to UnixDir" (LUCENE-2795). The first part of the project is aimed at significantly reducing the time taken to search during indexing, by adding an IOContext which would store buffer size and have options to bypass the OS's buffer cache (this is what causes the slowdown in search) and other hints. Once completed, I would move on to LUCENE-2795 and generalize the Directory implementation to make a UnixDirectory.

So, the first part (LUCENE-2793) should cause no change at all to performance, functionality, etc., because it's merely installing the plumbing (IOContext threaded throughout the low-level store APIs in Lucene) so that higher levels can send important details down to the Directory. We'd fix IndexWriter/IndexReader to fill out this IOContext with the details (merging, flushing, new reader, etc.). There's some fun/freedom here in figuring out just what details should be included in IOContext... (e.g.: is it low level "set buffer size to 4 KB", or is it high level "I am opening a new near-real-time reader"). This first step is a rote cutover, just changing APIs but in no way taking advantage of the new APIs.

The 2nd step (LUCENE-2795) would then take advantage of this plumbing, by creating a UnixDir impl that, using JNI (C code), passes advanced flags when opening files, based on the incoming IOContext. The goal is a single UnixDir that has ifdefs so that it's usable across multiple Unices, and e.g. would use direct IO if the context is merging. If we are ambitious we could rope Windows into the mix, too, and then this would be NativeDir...
We can measure success by validating that a big merge while searching does not hurt search performance (i.e. we should be able to reproduce the results from http://blog.mikemccandless.com/2010/06/lucene-and-fadvisemadvise.html).

Thanks for the summary, Mike! I have spoken to Michael McCandless and Simon Willnauer about undertaking these tasks. Michael McCandless has agreed to mentor me. I would love to be able to contribute to and learn from the Apache Lucene community this summer. Also, I would love suggestions on how to make my application proposal stronger.

I think either Simon or I can be the official mentor, and then the other one of us (and other Lucene committers) will support/chime in...

I will take the official responsibility here once we are there!

simon

This is an important change for Lucene!

Mike

- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
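The IOContext plumbing described in this thread lives in Java inside Lucene, but the idea can be illustrated with a small Python sketch; every name and field below is hypothetical, chosen only to show how a high-level hint (a merge vs. an ordinary read) could be translated into low-level open flags by a Directory.

```python
# Illustrative Python model of the IOContext idea discussed above.
# The real API is Java (LUCENE-2793/2795); all names here are hypothetical.
from dataclasses import dataclass

@dataclass(frozen=True)
class IOContext:
    context: str               # e.g. "merge", "flush", "read", "nrt_reader"
    buffer_size: int = 4096    # low-level buffer-size hint
    use_direct_io: bool = False

def open_input_flags(ctx):
    """Translate the high-level context into low-level open flags."""
    if ctx.context == "merge":
        # A big merge bypasses the OS buffer cache (direct IO) so that
        # concurrent searches don't lose their hot pages.
        return {"direct": True, "buffer": ctx.buffer_size}
    return {"direct": ctx.use_direct_io, "buffer": ctx.buffer_size}

print(open_input_flags(IOContext("merge")))
```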
[jira] [Updated] (SOLR-2455) admin/index.jsp double submit on IE
[ https://issues.apache.org/jira/browse/SOLR-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeffrey Chang updated SOLR-2455:

    Attachment: SOLR-2455.patch

Modified both index.jsp and form.jsp to return false upon JS submit.

admin/index.jsp double submit on IE
-----------------------------------
Key: SOLR-2455
URL: https://issues.apache.org/jira/browse/SOLR-2455
Project: Solr
Issue Type: Bug
Affects Versions: 3.1
Environment: IE8
Reporter: Jeffrey Chang
Priority: Minor
Labels: patch
Attachments: SOLR-2455.patch, SOLR-2455.patch

/admin/index.jsp could issue a double submit on IE, causing Jetty to error out. Here are the steps to reproduce on IE8 (only applies to IE8 on an occasional basis; really more of an IE8 bug...):

1. Open IE8
2. Browse to http://localhost:8983/solr/admin
3. Submit a query
4. Displayed in the Jetty log due to the double submit:
   SEVERE: org.mortbay.jetty.EofException at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:791)

This can be fixed easily by modifying index.jsp's JavaScript submit to return false:

    ... queryForm.submit(); return false; ...

I will try to submit a patch for this easy fix; new to all this, so please bear with me...

--
This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
Inquiries on SOLR DEV Contribution (SOLR-2455)
Hi All,

I'd like to start small and see how I can contribute to SOLR development. By following http://wiki.apache.org/solr/HowToContribute, I've created a new defect (SOLR-2455) and created a patch for it. Not sure if I've done the right steps - can someone provide me some guidance on whether I'm on the right track to make some contributions?

I'm still confused about how the committers decide which release to include the fixes in. E.g. for the fixes I contribute, since I modified from trunk, I'd assume they go into SOLR 4.0.x? Also, should I change the JIRA case status to Resolved myself?

Thanks,
Jeff
[jira] [Commented] (SOLR-2455) admin/index.jsp double submit on IE
[ https://issues.apache.org/jira/browse/SOLR-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016281#comment-13016281 ]

Uwe Schindler commented on SOLR-2455:

Hi Jeffrey, thanks for the fix. This is really an issue and has nothing to do with Internet Explorer; the timing of the JavaScript calls in this browser just makes it happen. In general: onclick handlers in JavaScript *must* return false to prevent the default action. This is true in all browsers. You can try this out with a simple web page link: <a href="gohere" onclick="window.alert('clicked'); return true;">..</a>. This link will first display the message box and then go to "gohere" (in all browsers!), whereas <a href="gohere" onclick="window.alert('clicked'); return false;">..</a> will only display the message box. Another fix for this would be to simply remove form.submit() and explicitly return true.
[jira] [Assigned] (SOLR-2455) admin/index.jsp double submit on IE
[ https://issues.apache.org/jira/browse/SOLR-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler reassigned SOLR-2455:

    Assignee: Uwe Schindler
[jira] [Updated] (SOLR-2455) admin/index.jsp double submit on IE
[ https://issues.apache.org/jira/browse/SOLR-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated SOLR-2455:

    Fix Version/s: 4.0
                   3.2
                   3.1.1
RE: Inquiries on SOLR DEV Contribution (SOLR-2455)
Hi Jeffrey,

You don't have to do anything on this issue. I already assigned it to myself, and I will commit your patch to 4.0 (trunk) and backport it through simple merges. In general, to bring fixes in, simply open issues; we will take care of them. If a fix is broken or not valid, somebody will notify you!

Thanks for helping to improve Solr!

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
[jira] [Commented] (SOLR-2455) admin/index.jsp double submit on IE
[ https://issues.apache.org/jira/browse/SOLR-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016291#comment-13016291 ]

Uwe Schindler commented on SOLR-2455:

Committed trunk revision 1089335, branch 3.x revision 1089340. I will keep this open for a possible backport to 3.1.1.
[jira] [Created] (SOLR-2458) post.jar fails on non-XML updateHandlers
post.jar fails on non-XML updateHandlers
----------------------------------------
Key: SOLR-2458
URL: https://issues.apache.org/jira/browse/SOLR-2458
Project: Solr
Issue Type: Bug
Components: clients - java
Affects Versions: 3.1
Reporter: Jan Høydahl

SimplePostTool.java by default tries to issue a commit after posting. The problem is that it does this by appending <commit/> to the stream. This does not work when using a non-XML request handler, such as CSV.
[jira] [Commented] (SOLR-2458) post.jar fails on non-XML updateHandlers
[ https://issues.apache.org/jira/browse/SOLR-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016314#comment-13016314 ]

Jan Høydahl commented on SOLR-2458:

Example:

{code}
lap:exampledocs janhoy$ java -Durl=http://localhost:8983/solr/update/csv -jar post.jar books.csv
SimplePostTool: version 1.3
SimplePostTool: POSTing files to http://localhost:8983/solr/update/csv..
SimplePostTool: POSTing file books.csv
SimplePostTool: COMMITting Solr index changes..
SimplePostTool: FATAL: Solr returned an error #400 undefined field <commit/>
{code}

The commit should be sent in a different way; the problem is how to know where and how to send the commit in the case of non-standard URLs, such as http://localhost:8983/solr/my/custom/updatehandler
[jira] [Commented] (SOLR-2458) post.jar fails on non-XML updateHandlers
[ https://issues.apache.org/jira/browse/SOLR-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016326#comment-13016326 ]

Uwe Schindler commented on SOLR-2458:

The commit could be sent at the end as a single XML document in a separate request if the content type of the data is different.
[jira] [Commented] (SOLR-2458) post.jar fails on non-XML updateHandlers
[ https://issues.apache.org/jira/browse/SOLR-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016329#comment-13016329 ]

Jan Høydahl commented on SOLR-2458:

How would post.jar know the URL of the XmlUpdateRequestHandler?

A) We could assume .*/solr/update, as 99% would not modify the defaults? Or
B) Assume that all UpdateRequestHandlers support a GET parameter commit=true. In that case, we could append ?commit=true to the given URL. I know for a fact that /solr/update/csv?commit=true will work.
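Option B above amounts to a small URL rewrite. A minimal Python sketch of appending the parameter while preserving any existing query string (with_commit is a hypothetical helper, not part of SimplePostTool, which is Java):

```python
# Sketch of option B: append commit=true to whatever update URL the
# user passed, preserving any query string already present.
# with_commit is a hypothetical helper, not SimplePostTool code.
from urllib.parse import urlsplit, urlunsplit

def with_commit(url):
    parts = urlsplit(url)
    query = parts.query + "&commit=true" if parts.query else "commit=true"
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       query, parts.fragment))

print(with_commit("http://localhost:8983/solr/update/csv"))
# http://localhost:8983/solr/update/csv?commit=true
```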
[jira] [Created] (LUCENE-3015) UDIDIndexWriter keeps write lock on corrupt index
UDIDIndexWriter keeps write lock on corrupt index
-------------------------------------------------
Key: LUCENE-3015
URL: https://issues.apache.org/jira/browse/LUCENE-3015
Project: Lucene - Java
Issue Type: Bug
Components: Index
Affects Versions: 2.9.3
Environment: Lucene 2.9.3
Reporter: Christian Danninger

Trying to open an index writer with new UDIDIndexWriter(directory, new FakeAnalyzer(), false); keeps a write.lock. Creating the IndexWriter will succeed, but a subsequent call to UDIDIndexWriter.getCounter() in the constructor fails. There are no possibilities to remove write.lock via an API call. The index writer is used to optimize the index; the index itself will be created by a different index. So after some time the index will be valid again, but the write lock still exists. So the process has to be ended first, and afterward the write lock can be removed.
[jira] [Commented] (SOLR-2378) FST-based Lookup (suggestions) for prefix matches.
[ https://issues.apache.org/jira/browse/SOLR-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016372#comment-13016372 ]

Dawid Weiss commented on SOLR-2378:

I've been waiting for somebody to look at this patch, guys, just to confirm that everything is fine with it. If so, I'd like to commit it and move on to infix suggestion support, maybe.

FST-based Lookup (suggestions) for prefix matches.
--------------------------------------------------
Key: SOLR-2378
URL: https://issues.apache.org/jira/browse/SOLR-2378
Project: Solr
Issue Type: New Feature
Components: spellchecker
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Labels: lookup, prefix
Fix For: 4.0
Attachments: SOLR-2378.patch

Implement a subclass of Lookup based on finite state automata/transducers (the Lucene FST package). This issue is for implementing a relatively basic prefix matcher; we will handle infixes and other types of input matches gradually. Impl. phases:

- -write a DFA based suggester effectively identical to the ternary tree based solution right now-
- -baseline benchmark against tern. tree (memory consumption, rebuilding speed, indexing speed; reuse Andrzej's benchmark code)-
- -modify DFA to encode term weights directly in the automaton (optimize for onlyMostPopular case)-
- -benchmark again-
- add infix suggestion support with prefix matches boosted higher (?)
- benchmark again
- modify the tutorial on the wiki [http://wiki.apache.org/solr/Suggester]
[jira] [Commented] (LUCENE-3015) UDIDIndexWriter keeps write lock on corrupt index
[ https://issues.apache.org/jira/browse/LUCENE-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016373#comment-13016373 ]

Uwe Schindler commented on LUCENE-3015:

What are you talking about? Lucene has no class UDIDIndexWriter, so maybe that's an external customization. If this is the case, I will close the issue.
[jira] [Commented] (LUCENE-3015) UDIDIndexWriter keeps write lock on corrupt index
[ https://issues.apache.org/jira/browse/LUCENE-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016375#comment-13016375 ]

Christian Danninger commented on LUCENE-3015:

Sorry about that, you are right. I'll close the ticket.
[jira] [Commented] (SOLR-2378) FST-based Lookup (suggestions) for prefix matches.
[ https://issues.apache.org/jira/browse/SOLR-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016378#comment-13016378 ]

Robert Muir commented on SOLR-2378:

Took a quick look: Builder.add(char[], int, int, ..) adds code points (Character.codePointAt/Character.charCount) [UTF-32 order], but the comparator you use when building the automaton compares characters [UTF-16 order]. So if someone has a term in the supplementary range in their index, the order will be inconsistent. So I think the comparator should just compare code points (it should iterate with codePointAt/charCount too)?
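The ordering mismatch Robert describes can be demonstrated outside Java. In this Python sketch, string comparison is by code point (the UTF-32 order that Builder.add produces), while sorting by UTF-16 code units (what a char-by-char comparator in Java sees) reverses a BMP code point and a supplementary one:

```python
# Demonstration of the UTF-16 vs. code point ordering mismatch.
# Python compares strings by code point; a Java char comparator
# compares UTF-16 code units, and the two disagree above the BMP.
bmp = "\uffff"        # U+FFFF, a BMP code point
supp = "\U00010000"   # U+10000, surrogate pair D800 DC00 in UTF-16

# Code point (UTF-32) order: U+FFFF sorts before U+10000.
print(sorted([supp, bmp]) == [bmp, supp])

# UTF-16 code-unit order: the lead surrogate 0xD800 sorts before
# 0xFFFF, so the order flips.
print(sorted([supp, bmp], key=lambda s: s.encode("utf-16-be")) == [supp, bmp])
```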
[jira] [Commented] (SOLR-2378) FST-based Lookup (suggestions) for prefix matches.
[ https://issues.apache.org/jira/browse/SOLR-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016381#comment-13016381 ]

Yonik Seeley commented on SOLR-2378:

If it causes too much of a lookup performance hit, the Builder could just build in UTF-16 order too?
RE: Wiki docs about compression
Yes, you're right, Eric. The feature was removed in Solr 1.4.1, much to my chagrin. On my todo list in the next couple of months or so, I intend to bring it back -- at least for 4.0.

~ David

From: Eric Pugh [ep...@opensourceconnections.com]
Sent: Wednesday, April 06, 2011 12:27 AM
To: solr-...@lucene.apache.org
Subject: Wiki docs about compression

Correct me if I am wrong, but isn't compression of fields removed from Solr 3.1? I think the docs about compression on the wiki at http://wiki.apache.org/solr/SchemaXml need to clarify that in 3.1 these features were removed! Just wanted to confirm my understanding of this.

Eric

-
Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com
Co-Author: Solr 1.4 Enterprise Search Server, available from http://www.packtpub.com/solr-1-4-enterprise-search-server

This e-mail and all contents, including attachments, is considered to be Company Confidential unless explicitly stated otherwise, regardless of whether attachments are marked as such.
[jira] [Commented] (SOLR-1155) Change DirectUpdateHandler2 to allow concurrent adds during an autocommit
[ https://issues.apache.org/jira/browse/SOLR-1155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016382#comment-13016382 ]

Jayson Minard commented on SOLR-1155:

Is there interest in me updating this for 3.1? It is a huge performance improvement over DirectUpdateHandler2 under heavy indexing load...

Change DirectUpdateHandler2 to allow concurrent adds during an autocommit
-------------------------------------------------------------------------
Key: SOLR-1155
URL: https://issues.apache.org/jira/browse/SOLR-1155
Project: Solr
Issue Type: Improvement
Components: search
Affects Versions: 1.3, 1.4
Reporter: Jayson Minard
Fix For: Next
Attachments: SOLR-1155-release1.4-rev834789.patch, SOLR-1155-trunk-rev834706.patch, Solr-1155.patch, Solr-1155.patch

Currently DirectUpdateHandler2 will block adds during a commit, and it seems to be possible with recent changes to Lucene to allow them to run concurrently. See: http://www.nabble.com/Autocommit-blocking-adds---AutoCommit-Speedup--td23435224.html
[jira] [Commented] (SOLR-2378) FST-based Lookup (suggestions) for prefix matches.
[ https://issues.apache.org/jira/browse/SOLR-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016384#comment-13016384 ] Robert Muir commented on SOLR-2378: --- I am referring to build-time, not runtime here. run-time can handle supplementary characters wrong and I wouldn't object to committing it, but currently if someone has terms 0x in their index it will prevent the FST from being built at all and suggesting will not work? (as i think the FST will throw exc?) FST-based Lookup (suggestions) for prefix matches. -- Key: SOLR-2378 URL: https://issues.apache.org/jira/browse/SOLR-2378 Project: Solr Issue Type: New Feature Components: spellchecker Reporter: Dawid Weiss Assignee: Dawid Weiss Labels: lookup, prefix Fix For: 4.0 Attachments: SOLR-2378.patch Implement a subclass of Lookup based on finite state automata/ transducers (Lucene FST package). This issue is for implementing a relatively basic prefix matcher, we will handle infixes and other types of input matches gradually. Impl. phases: - -write a DFA based suggester effectively identical to ternary tree based solution right now,- - -baseline benchmark against tern. tree (memory consumption, rebuilding speed, indexing speed; reuse Andrzej's benchmark code)- - -modify DFA to encode term weights directly in the automaton (optimize for onlyMostPopular case)- - -benchmark again- - add infix suggestion support with prefix matches boosted higher (?) - benchmark again - modify the tutorial on the wiki [http://wiki.apache.org/solr/Suggester] -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
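The build-time ordering concern is easier to see with a concrete example: Java compares strings by UTF-16 code units, so a supplementary character (outside the BMP) sorts differently than it does by Unicode code point, and a builder that expects sorted input can be handed terms that look out of order. A hedged illustration in plain Python (not the Lucene FST API):

```python
# U+FFFF vs. the supplementary character U+10400 (encoded in UTF-16 as the
# surrogate pair D801 DC00). By code point, U+FFFF sorts first; by UTF-16
# code units -- Java's String order -- the surrogate pair sorts first.
terms = ["\uffff", "\U00010400"]

by_code_point = sorted(terms)                                   # Python default
by_utf16_unit = sorted(terms, key=lambda s: s.encode("utf-16-be"))

# The two orders disagree, which is exactly the kind of mismatch that can
# trip a sorted-input automaton builder.
print(by_code_point != by_utf16_unit)  # → True
```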
Re: Lucene Spatial Future
On Wed, Apr 6, 2011 at 9:30 AM, Grant Ingersoll grant.ingers...@gmail.com wrote: By all means go for it. I don't see any reason not to. I guess in the end, I'm not sure what you are asking us to do. Do you want Lucene/Solr to remove all of our spatial support in favor of incorporating this new project or do you just want those who are interested in spatial to join the new project and it can be seen as an add on? Let's not confuse the issue... what is being discussed really has no impact on the basic spatial search that was added to Solr. As you said yourself, the Solr geo stuff uses very little of the spatial contrib stuff. This is about building and maintaining a spatial module, and the best place to do it (which I'll leave up to those doing the work... I'm pretty happy with basic point, radius, bounding-box). -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Lucene Spatial Future
The spatial API in google code takes a pretty different approach to spatial search in general. It is organized into three key packages: 1. core stuff, no lucene dependencies. Most of the math is here Aren't you just replicating what SIS is doing for this piece? If you don't have a JTS requirement, that means you are going to need equivalent math, right? Isn't that what SIS is about? This package defines the general interfaces and concepts used in the project. Things like SpatialOperations, Shape, PrefixGrid and DistanceCalculator -- these can then be backed by simple math, JTS, or eventually maybe SIS. The other key stuff in this package is the client side objects used to build spatial queries. Essentially everything that could be bundled with solrj. I could suggest a new ASF project, but there seems like too much overlap with SIS and very different philosophy on 3rd party libraries. In the end, osgeo.com seems like a more natural home and has better branding for spatial related work anyway. By all means go for it. I don't see any reason not to. I guess in the end, I'm not sure what you are asking us to do. Do you want Lucene/Solr to remove all of our spatial support in favor of incorporating this new project or do you just want those who are interested in spatial to join the new project and it can be seen as an add on? I'm trying to have an open discussion about what makes sense for spatial development. I don't *want* to start a new project... but I think we need a dev/test environment that can support the whole range of spatial needs -- without reinventing many wheels, this includes JTS. Lucene currently has LGPL compile dependencies, but they are on the way out, and (unless I'm missing something) i don't think folks are open to adding a JTS build/test dependency -- Maybe I should call a vote on the JTS issue, though i suspect most binding votes are -0 or -1.
I *totally* understand if other people don't want JTS in the build system -- it is not a core concern to most people involved. If the lucene build/test environment does not support spatial development, this leads me to think about other places to host the project... wherever it makes the most sense. I would prefer staying within lucene because it is easiest for me. I don't want this to be competition or duplicate effort. I hope it lets us clean up the broken stuff from lucene and over time deprecate the parts that are better supported elsewhere. I want the best spatial support available in solr out-of-the-box. If this project is eventually built and maintained outside of lucene, i would like the .jar distributed in solr. ryan - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2458) post.jar fails on non-XML updateHandlers
[ https://issues.apache.org/jira/browse/SOLR-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016395#comment-13016395 ] Yonik Seeley commented on SOLR-2458: bq. Assume that all UpdateRequestHandlers support a GET parameter commit=true I think we should assume this, and fix anything where it doesn't work. post.jar fails on non-XML updateHandlers Key: SOLR-2458 URL: https://issues.apache.org/jira/browse/SOLR-2458 Project: Solr Issue Type: Bug Components: clients - java Affects Versions: 3.1 Reporter: Jan Høydahl Labels: post.jar SimplePostTool.java by default tries to issue a commit after posting. Problem is that it does this by appending <commit/> to the stream. This does not work when using a non-XML requesthandler, such as CSV. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[HUDSON] Lucene-Solr-tests-only-trunk - Build # 6796 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/6796/

2 tests failed.

REGRESSION: org.apache.solr.cloud.BasicDistributedZkTest.testDistribSearch

Error Message: Error executing query

Stack Trace:
org.apache.solr.client.solrj.SolrServerException: Error executing query
at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:119)
at org.apache.solr.cloud.BasicDistributedZkTest.queryServer(BasicDistributedZkTest.java:274)
at org.apache.solr.BaseDistributedSearchTestCase.query(BaseDistributedSearchTestCase.java:335)
at org.apache.solr.cloud.BasicDistributedZkTest.doTest(BasicDistributedZkTest.java:128)
at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:593)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1232)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1160)
Caused by: org.apache.solr.common.SolrException: no servers hosting shard: no servers hosting shard: request: http://127.0.0.1:33773/solr/select?q=*:*&sort=n_ti1 desc&shards=shard3,shard4,shard5,shard6&distrib=true&wt=javabin&version=2
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:436)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:245)
at org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:249)
at org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:152)
at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)

REGRESSION: org.apache.solr.cloud.CloudStateUpdateTest.testCoreRegistration

Error Message: null

Stack Trace:
junit.framework.AssertionFailedError:
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1232)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1160)
at org.apache.solr.cloud.CloudStateUpdateTest.testCoreRegistration(CloudStateUpdateTest.java:227)

Build Log (for compile errors):
[...truncated 8761 lines...]

- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Lucene Spatial Future
On Apr 6, 2011, at 10:45 AM, Ryan McKinley wrote: I'm trying to have an open discussion about what makes sense for spatial development. I don't *want* to start a new project... but I think we need a dev/test environment that can support the whole range of spatial needs -- without reinventing many wheels, this includes JTS. Lucene currently has LGPL compile dependencies, but they are on the way out, and (unless I'm missing something) i don't think folks are open to adding a JTS build/test dependency -- Maybe I should call a vote on the JTS issue, though i suspect most binding votes are -0 or -1. I *totally* understand if other people don't want JTS in the build system -- it is not a core concern to most people involved. Until there is a specific patch that brings in and shows how JTS would be incorporated (via reflection and as a totally optional piece, presumably, per the ASF LGPL guidelines), there really isn't anything to vote on. I don't want this to be competition or duplicate effort. I hope it lets us clean up the broken stuff from lucene and overtime deprecate the parts that are better supported elsewhere. I totally agree. I hope I wasn't framing it that way. I'm just trying to understand what's being proposed. I can see advantages to both. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: My GSOC proposal
Hi. I wrote some sample code to test out the speed difference between SEQUENTIAL and O_DIRECT (I used the madvise flag MADV_DONTNEED) reads. This is the link to the code: http://pastebin.com/8QywKGyS There was a speed difference when I switched between the two flags. I have not used the O_DIRECT flag because Linus had criticized it. Is this what the flags are intended to be used for? This is just sample code with a test file. On Wed, Apr 6, 2011 at 12:11 PM, Simon Willnauer simon.willna...@googlemail.com wrote: Hey Varun, On Tue, Apr 5, 2011 at 11:07 PM, Michael McCandless luc...@mikemccandless.com wrote: Hi Varun, Those two issues would make a great GSoC! Comments below... +1 On Tue, Apr 5, 2011 at 1:56 PM, Varun Thacker varunthacker1...@gmail.com wrote: I would like to combine two tasks as part of my project namely-Directory createOutput and openInput should take an IOContext (Lucene-2793) and complement it by Generalize DirectIOLinuxDir to UnixDir (Lucene-2795). The first part of the project is aimed at significantly reducing time taken to search during indexing by adding an IOContext which would store buffer size and have options to bypass the OS’s buffer cache (this is what causes the slowdown in search) and other hints. Once completed I would move on to Lucene-2795 and generalize the Directory implementation to make a UnixDirectory. So, the first part (LUCENE-2793) should cause no change at all to performance, functionality, etc., because it's merely installing the plumbing (IOContext threaded throughout the low-level store APIs in Lucene) so that higher levels can send important details down to the Directory. We'd fix IndexWriter/IndexReader to fill out this IOContext with the details (merging, flushing, new reader, etc.). There's some fun/freedom here in figuring out just what details should be included in IOContext... (eg: is it low level set buffer size to 4 KB or is it high level I am opening a new near-real-time reader). 
This first step is a rote cutover, just changing APIs but in no way taking advantage of the new APIs. The 2nd step (LUCENE-2795) would then take advantage of this plumbing, by creating a UnixDir impl that, using JNI (C code), passes advanced flags when opening files, based on the incoming IOContext. The goal is a single UnixDir that has ifdefs so that it's usable across multiple Unices, and eg would use direct IO if the context is merging. If we are ambitious we could rope Windows into the mix, too, and then this would be NativeDir... We can measure success by validating that a big merge while searching does not hurt search performance? (Ie we should be able to reproduce the results from http://blog.mikemccandless.com/2010/06/lucene-and-fadvisemadvise.html). Thanks for the summary mike! I have spoken to Michael McCandless and Simon Willnauer about undertaking these tasks. Michael McCandless has agreed to mentor me. I would love to be able to contribute and learn from the Apache Lucene community this summer. Also I would love suggestions on how to make my application proposal stronger. I think either Simon or I can be the official mentor, and then the other one of us (and other Lucene committers) will support/chime in... I will take the official responsibility here once we are there! simon This is an important change for Lucene! Mike - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org -- Regards, Varun Thacker http://varunthacker.wordpress.com
Re: [VOTE] Release PyLucene 3.1.0
I'm seeing parse failures on this query string: categories:RSSReader AND id:[0-00--000 TO 01299-51-3142-795] AND NOT categories:RSSReader/_noexpire_
thr002: RSSReader: Traceback (most recent call last):
thr002: File /local/lib/UpLib-1.7.11/site-extensions/RSSReader.py, line 271, in _scan_rss_sites
thr002: hits = repo.do_query("categories:RSSReader AND id:[0-00--000 TO %s] AND NOT categories:RSSReader/_noexpire_" % old_id)
thr002: File /local/share/UpLib-1.7.11/code/uplib/repository.py, line 1172, in do_query
thr002: results = self.do_full_query(query_string, searchtype)
thr002: File /local/share/UpLib-1.7.11/code/uplib/repository.py, line 1196, in do_full_query
thr002: results = self.pylucene_search(searchtype, query_string)
thr002: File /local/share/UpLib-1.7.11/code/uplib/repository.py, line 1081, in pylucene_search
thr002: v = self.__search_context.search(query_string)
thr002: File /local/share/UpLib-1.7.11/code/uplib/indexing.py, line 913, in search
thr002: parsed_query = query_parser.parseQ(query)
thr002: File /local/share/UpLib-1.7.11/code/uplib/indexing.py, line 550, in parseQ
thr002: query = QueryParser.parse(self, querystring)
thr002: JavaError: org.apache.jcc.PythonException: getFieldQuery_quoted
thr002: AttributeError: getFieldQuery_quoted
thr002: Java stacktrace:
thr002: org.apache.jcc.PythonException: getFieldQuery_quoted
thr002: AttributeError: getFieldQuery_quoted
thr002: at org.apache.pylucene.queryParser.PythonMultiFieldQueryParser.getFieldQuery_quoted(Native Method)
thr002: at org.apache.pylucene.queryParser.PythonMultiFieldQueryParser.getFieldQuery(Unknown Source)
thr002: at org.apache.lucene.queryParser.QueryParser.Term(QueryParser.java:1421)
thr002: at org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1309)
thr002: at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1237)
thr002: at org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1226)
thr002: at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:206)
Bill
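The AttributeError suggests the Java bridge is looking up a hook name on the Python subclass (getFieldQuery_quoted, visible in the stack trace) that code written against an older PyLucene never defined. One hedged way to stay compatible across versions is to define both names on the subclass; the base class and signatures below are stand-ins for illustration, not PyLucene's actual API:

```python
# Stand-in for lucene.PythonMultiFieldQueryParser; in real code you would
# subclass that. The signatures here are assumptions.
class FakeParserBase:
    pass

class CompatQueryParser(FakeParserBase):
    def getFieldQuery(self, field, text):
        # Version-independent implementation lives here.
        return (field, text)

    # PyLucene 3.1's bridge appears to call getFieldQuery_quoted (per the
    # traceback); delegate so either hook name resolves.
    def getFieldQuery_quoted(self, field, text, quoted=True):
        return self.getFieldQuery(field, text)

parser = CompatQueryParser()
```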
Re: Lucene Spatial Future
On Apr 6, 2011, at 11:38 AM, Grant Ingersoll wrote: Until there is a specific patch that brings in and shows how JTS would be incorporated (via reflection and as a totally optional piece, presumably, per the ASF LGPL guidelines), there really isn't anything to vote on. I think what is being asked to vote on is deprecation/removal of Lucene's spatial contrib module with its replacement being an externally hosted ASL-licensed module expressly designed to work with Lucene/Solr 4.0 and beyond (temporarily known as lucene-spatial-playground). What would stay is the _basic_ spatial support that got into Lucene/Solr 3.1. Furthermore, no future spatial work would be accepted on Lucene/Solr aside from support of the basic capability. This module isn't quite ready so perhaps the vote should wait till it is. ~ David Smiley Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/ - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Inquiries on SOLR DEV Contribution (SOLR-2455)
Thanks Uwe. I'll dig in more to see how I can help further now that I understand the contribution process more. Thanks, Jeff On Wed, Apr 6, 2011 at 3:19 PM, Uwe Schindler u...@thetaphi.de wrote: Hi Jeffrey, You don’t have to do anything on this issue. I already assigned it to myself and I will commit your patch to 4.0 (trunk) and backport through simple merges. In general to bring fixes in, simply open issues, we will take care. If a fix is broken or not valid, somebody will notify you! Thanks for helping to improve Solr! Thanks! Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de *From:* Jeffrey Chang [mailto:jclal...@gmail.com] *Sent:* Wednesday, April 06, 2011 8:51 AM *To:* dev@lucene.apache.org *Subject:* Inquiries on SOLR DEV Contribution (SOLR-2455) Hi All, I'd like to start small and see how I can contribute to SOLR development. By following http://wiki.apache.org/solr/HowToContribute, I've created a new defect (SOLR-2455) and created a patch for it. Not sure if I've done the right steps - can someone provide me some guidance if I'm on the right track to make some contributions? I'm still confused on how the committers decide which patch to include the fixes into. E.g. for the fixes I contribute, since I modified from Trunk, I'd assume it goes to SOLR 4.0.x? Also, should I modify the JIRA case status to Resolve myself? Thanks, Jeff
Re: Lucene Spatial Future
-1. I *totally* understand if other people don't want JTS in the build system -- it is not a core concern to most people involved. Until there is a specific patch that brings in and shows how JTS would be incorporated (via reflection and as a totally optional piece, presumably, per the ASF LGPL guidelines), there really isn't anything to vote on. fair point -- the optional logistics are working in a maven build. I'm reluctant to convert to the ant build system if there is already strong opposition to the idea. If folks are OK with the idea, I will happily make a concrete patch/branch that we could vote on. so maybe i'm just looking for a POLL not a vote -- find out if this is a non-starter or not (i am under the impression that it might be) FYI, the optional support is now handled by a static 'SpatialContextProvider' that you can ask for a SpatialContext. By default it makes a SimpleSpatialContext -- if you set some system properties, it uses reflection to load a different instance. Eventually, this should be replaced with the standard java service loader stuff (i think) ryan - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
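The provider pattern Ryan describes -- default to a simple context, optionally load an alternative implementation by reflection -- can be sketched in a few lines. This is an illustration in Python with invented names; the real code is Java and keys off system properties rather than an environment variable:

```python
import importlib
import os

class SimpleSpatialContext:
    """Default context backed by simple math (no JTS dependency)."""
    name = "simple"

def get_spatial_context():
    # An env var stands in for the Java system property that would name an
    # alternative class, e.g. "mypkg.JtsSpatialContext" (hypothetical).
    spec = os.environ.get("SPATIAL_CONTEXT_CLASS")
    if not spec:
        return SimpleSpatialContext()
    # "Reflection": resolve the dotted name to a module and class at runtime.
    module_name, _, class_name = spec.rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls()

context = get_spatial_context()
```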
Re: [VOTE] Release PyLucene 3.1.0
Bill Janssen jans...@parc.com wrote: I'm seeing parse failures on this query string: categories:RSSReader AND id:[0-00--000 TO 01299-51-3142-795] AND NOT categories:RSSReader/_noexpire_ 3.0.3 works just fine. Bill
[jira] [Commented] (SOLR-2438) Case Insensitive Search for Wildcard Queries
[ https://issues.apache.org/jira/browse/SOLR-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016464#comment-13016464 ] David Smiley commented on SOLR-2438: Nice Peter. So why did you create another JIRA issue instead of putting your patch on SOLR-219? This is yet another issue and there is already a quasi-community of commenters (including me) on that other issue. Case Insensitive Search for Wildcard Queries Key: SOLR-2438 URL: https://issues.apache.org/jira/browse/SOLR-2438 Project: Solr Issue Type: Improvement Reporter: Peter Sturge Attachments: SOLR-2438.patch This patch adds support to allow case-insensitive queries on wildcard searches for configured TextField field types. This patch extends the excellent work done by Yonik and Michael in SOLR-219. The approach here is different enough (imho) to warrant a separate JIRA issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Lucene Spatial Future
On Apr 6, 2011, at 12:06 PM, Smiley, David W. wrote: On Apr 6, 2011, at 11:38 AM, Grant Ingersoll wrote: Until there is a specific patch that brings in and shows how JTS would be incorporated (via reflection and as a totally optional piece, presumably, per the ASF LGPL guidelines), there really isn't anything to vote on. I think what is being asked to vote on is deprecation/removal of Lucene's spatial contrib module Just FYI, It is already deprecated in 3.x and slated for removal in 4.0. Someone just needs to axe the appropriate bits (and either move what's needed to Solr or to modules) with its replacement being an externally hosted ASL-licensed module expressly designed to work with Lucene/Solr 4.0 and beyond (temporarily known as lucene-spatial-playground). What would stay is the _basic_ spatial support that got into Lucene/Solr 3.1. Furthermore, no future spatial work would be accepted on Lucene/Solr aside from support of the basic capability. That is the piece I was wondering about and why I said yesterday it isn't likely to work, as it will just fork. How do you tell people not to put in patches to L/S, especially when part of it is native and part of it isn't? - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: My GSOC proposal
That test code looks good -- you really should have seen awful performance had you used O_DIRECT since you read byte by byte. A more realistic test is to read a whole buffer (eg 4 KB is what Lucene now uses during merging, but we'd probably up this to like 1 MB when using O_DIRECT). Linus does hate O_DIRECT (see http://kerneltrap.org/node/7563), and for good reason: its existence means projects like ours can use it to work around limitations in the Linux IO apis that control the buffer cache when, otherwise, we might conceivably make patches to fix Linux correctly. It's an escape hatch, and we all use the escape hatch instead of trying to fix Linux for real... For example the NOREUSE flag is a no-op now in Linux, which is a shame, because that's precisely the flag we'd want to use for merging (along with SEQUENTIAL). Had that flag been implemented well, it'd give better results than our workaround using O_DIRECT. Anyway, given how things are, until we can get more control (way up in Javaland) over the buffer cache, O_DIRECT (via native directory impl through JNI) is our only real option, today. More details here: http://blog.mikemccandless.com/2010/06/lucene-and-fadvisemadvise.html Note that other OSs likely do a better job and actually implement NOREUSE, and similar APIs, so the generic Unix/WindowsNativeDirectory would simply use NOREUSE on these platforms for I/O during segment merging. Mike http://blog.mikemccandless.com On Wed, Apr 6, 2011 at 11:56 AM, Varun Thacker varunthacker1...@gmail.com wrote: Hi. I wrote some sample code to test out the speed difference between SEQUENTIAL and O_DIRECT (I used the madvise flag MADV_DONTNEED) reads. This is the link to the code: http://pastebin.com/8QywKGyS There was a speed difference when I switched between the two flags. I have not used the O_DIRECT flag because Linus had criticized it. Is this what the flags are intended to be used for? This is just sample code with a test file. 
On Wed, Apr 6, 2011 at 12:11 PM, Simon Willnauer simon.willna...@googlemail.com wrote: Hey Varun, On Tue, Apr 5, 2011 at 11:07 PM, Michael McCandless luc...@mikemccandless.com wrote: Hi Varun, Those two issues would make a great GSoC! Comments below... +1 On Tue, Apr 5, 2011 at 1:56 PM, Varun Thacker varunthacker1...@gmail.com wrote: I would like to combine two tasks as part of my project namely-Directory createOutput and openInput should take an IOContext (Lucene-2793) and compliment it by Generalize DirectIOLinuxDir to UnixDir (Lucene-2795). The first part of the project is aimed at significantly reducing time taken to search during indexing by adding an IOContext which would store buffer size and have options to bypass the OS’s buffer cache (This is what causes the slowdown in search ) and other hints. Once completed I would move on to Lucene-2795 and generalize the Directory implementation to make a UnixDirectory . So, the first part (LUCENE-2793) should cause no change at all to performance, functionality, etc., because it's merely installing the plumbing (IOContext threaded throughout the low-level store APIs in Lucene) so that higher levels can send important details down to the Directory. We'd fix IndexWriter/IndexReader to fill out this IOContext with the details (merging, flushing, new reader, etc.). There's some fun/freedom here in figuring out just what details should be included in IOContext... (eg: is it low level set buffer size to 4 KB or is it high level I am opening a new near-real-time reader). This first step is a rote cutover, just changing APIs but in no way taking advantage of the new APIs. The 2nd step (LUCENE-2795) would then take advantage of this plumbing, by creating a UnixDir impl that, using JNI (C code), passes advanced flags when opening files, based on the incoming IOContext. The goal is a single UnixDir that has ifdefs so that it's usable across multiple Unices, and eg would use direct IO if the context is merging. 
If we are ambitious we could rope Windows into the mix, too, and then this would be NativeDir... We can measure success by validating that a big merge while searching does not hurt search performance? (Ie we should be able to reproduce the results from http://blog.mikemccandless.com/2010/06/lucene-and-fadvisemadvise.html). Thanks for the summary mike! I have spoken to Micheal McCandless and Simon Willnauer about undertaking these tasks. Micheal McCandless has agreed to mentor me . I would love to be able to contribute and learn from Apache Lucene community this summer. Also I would love suggestions on how to make my application proposal stronger. I think either Simon or I can be the official mentor, and then the other one of us (and other Lucene committers) will support/chime in... I will take the official responsibility here once we are there! simon This is an important change for Lucene! Mike
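The advisory hints discussed in this thread are reachable from Python on Linux via os.posix_fadvise, which makes them easy to experiment with before writing the JNI version. A rough sketch using the flags from the discussion (this is not how Lucene's native directory is implemented):

```python
import os
import tempfile

# Write a 1 MB scratch file to read back with advisory hints.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * (1 << 20))
    path = f.name

fd = os.open(path, os.O_RDONLY)
# Declare a one-pass sequential read, as a merge would do over a segment.
os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
data = os.read(fd, 1 << 20)
# After the pass, ask the kernel to drop these pages from the buffer cache,
# so the bulk read does not evict pages that searchers are using.
os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
os.close(fd)
os.remove(path)
```

Note that os.posix_fadvise is Linux/POSIX-only, and like the thread says, these are hints: the kernel is free to ignore them, so timing runs are the only way to see an effect.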
[jira] [Commented] (SOLR-2458) post.jar fails on non-XML updateHandlers
[ https://issues.apache.org/jira/browse/SOLR-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016471#comment-13016471 ] Yonik Seeley commented on SOLR-2458: part of the reason it works the way it does now is that when commit=true it POSTs a single commit at the end of multiple file POSTs, if we use the param based commit it would either need to specify commit on all of them, or keep track of the last one and only add the param there. Although only adding a commit on the last update should be easy, we could also just do it via the URL. I believe posting ?commit=true to update handlers w/o a body works? post.jar fails on non-XML updateHandlers Key: SOLR-2458 URL: https://issues.apache.org/jira/browse/SOLR-2458 Project: Solr Issue Type: Bug Components: clients - java Affects Versions: 3.1 Reporter: Jan Høydahl Labels: post.jar SimplePostTool.java by default tries to issue a commit after posting. Problem is that it does this by appending <commit/> to the stream. This does not work when using a non-XML requesthandler, such as CSV. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Issue Comment Edited] (SOLR-2458) post.jar fails on non-XML updateHandlers
[ https://issues.apache.org/jira/browse/SOLR-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016471#comment-13016471 ] Yonik Seeley edited comment on SOLR-2458 at 4/6/11 6:13 PM: bq. part of the reason it works the way it does now is that when commit=true it POSTs a single commit at the end of multiple file POSTs, if we use the param based commit it would either need to specify commit on all of them, or keep track of the last one and only add the param there. Although only adding a commit on the last update should be easy, we could also just do it via the URL. I believe posting ?commit=true to update handlers w/o a body works? was (Author: ysee...@gmail.com): part of the reason it works the way it does now is that when commit=true it POSTs a single commit at the end of multiple file POSTs, if we use the param based commit it would either need to specify commit on all of them, or keep track of the last one and only add the param there. Although only adding a commit on the last update should be easy, we could also just do it via the URL. I believe posting ?commit=true to update handlers w/o a body works? post.jar fails on non-XML updateHandlers Key: SOLR-2458 URL: https://issues.apache.org/jira/browse/SOLR-2458 Project: Solr Issue Type: Bug Components: clients - java Affects Versions: 3.1 Reporter: Jan Høydahl Labels: post.jar SimplePostTool.java by default tries to issue a commit after posting. Problem is that it does this by appending <commit/> to the stream. This does not work when using a non-XML requesthandler, such as CSV. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2458) post.jar fails on non-XML updateHandlers
[ https://issues.apache.org/jira/browse/SOLR-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016468#comment-13016468 ] Hoss Man commented on SOLR-2458: post.jar has hardcoded assumptions about what URL you want to hit and how it should behave -- if you want to change those assumptions there are documented params for changing it. -Durl=... and -Dcommit=false. if you want to post to something that isn't the XmlRequestHandler, you should specify -Dcommit=false, and then you can follow that with an explicit execution to commit... java -Durl=... -jar post.jar *.csv java -jar post.jar part of the reason it works the way it does now is that when commit=true it POSTs a single commit at the end of multiple file POSTs, if we use the param based commit it would either need to specify commit on all of them, or keep track of the last one and only add the param there. i don't object to changing post.jar to use a commit request param instead of sending the XML form, but this isn't a bug -- it's working as it was intended. post.jar fails on non-XML updateHandlers Key: SOLR-2458 URL: https://issues.apache.org/jira/browse/SOLR-2458 Project: Solr Issue Type: Bug Components: clients - java Affects Versions: 3.1 Reporter: Jan Høydahl Labels: post.jar SimplePostTool.java by default tries to issue a commit after posting. Problem is that it does this by appending <commit/> to the stream. This does not work when using a non-XML requesthandler, such as CSV. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Lucene Spatial Future
On Apr 6, 2011, at 2:08 PM, Grant Ingersoll wrote: with its replacement being an externally hosted ASL-licensed module expressly designed to work with Lucene/Solr 4.0 and beyond (temporarily known as lucene-spatial-playground). What would stay is the _basic_ spatial support that got into Lucene/Solr 3.1. Furthermore, no future spatial work would be accepted on Lucene/Solr aside from support of the basic capability. That is the piece I was wondering about and why I said yesterday it isn't likely to work, as it will just fork. How do you tell people not to put in patches to L/S, especially when part of it is native and part of it isn't? I think the risk of this is mitigated if the proposed external module is highly visible in L/S -- in other words, it's downloaded and packaged up as part of the distribution -- a jar sitting alongside the other contrib module jars (no JTS of course!). Users would be referred to this module for non-basic spatial via the wiki and the community in general. Of course I would prominently mention this module in the 2nd edition of my book ;-) which is well underway. ~ David Smiley Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/
Re: [POLL] JTS compile/test dependency
On Apr 6, 2011, at 2:12 PM, Ryan McKinley wrote: [ ] OK with JTS compile dependency. Spatial support should be a module [X] OK with JTS, but think this spatial stuff should happen elsewhere [ ] Please, no LGPL dependencies in lucene build ~ David Smiley Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/ - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Lucene Spatial Future
On Wed, Apr 6, 2011 at 2:08 PM, Grant Ingersoll grant.ingers...@gmail.com wrote: On Apr 6, 2011, at 12:06 PM, Smiley, David W. wrote: with its replacement being an externally hosted ASL-licensed module expressly designed to work with Lucene/Solr 4.0 and beyond (temporarily known as lucene-spatial-playground). What would stay is the _basic_ spatial support that got into Lucene/Solr 3.1. Furthermore, no future spatial work would be accepted on Lucene/Solr aside from support of the basic capability. That is the piece I was wondering about and why I said yesterday it isn't likely to work, as it will just fork. How do you tell people not to put in patches to L/S, especially when part of it is native and part of it isn't? Right - there's no need to try and make promises about the future. It seems unrelated to the questions at hand here. -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
[jira] [Commented] (SOLR-2378) FST-based Lookup (suggestions) for prefix matches.
[ https://issues.apache.org/jira/browse/SOLR-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016481#comment-13016481 ] Dawid Weiss commented on SOLR-2378: --- Oh, right -- I didn't peek at the inside of Builder.add(char[],...), but I will verify this by trying to add something that has multilingual stuff in it -- will update the patch tomorrow, hopefully. I would also love to have somebody who actually uses suggestions to try to compile it and use it on a production data set to see if my benchmark was approximately right with respect to the speed differences between the different available implementations. FST-based Lookup (suggestions) for prefix matches. -- Key: SOLR-2378 URL: https://issues.apache.org/jira/browse/SOLR-2378 Project: Solr Issue Type: New Feature Components: spellchecker Reporter: Dawid Weiss Assignee: Dawid Weiss Labels: lookup, prefix Fix For: 4.0 Attachments: SOLR-2378.patch Implement a subclass of Lookup based on finite state automata/ transducers (Lucene FST package). This issue is for implementing a relatively basic prefix matcher, we will handle infixes and other types of input matches gradually. Impl. phases: - -write a DFA based suggester effectively identical to ternary tree based solution right now,- - -baseline benchmark against tern. tree (memory consumption, rebuilding speed, indexing speed; reuse Andrzej's benchmark code)- - -modify DFA to encode term weights directly in the automaton (optimize for onlyMostPopular case)- - -benchmark again- - add infix suggestion support with prefix matches boosted higher (?) - benchmark again - modify the tutorial on the wiki [http://wiki.apache.org/solr/Suggester] -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
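For readers unfamiliar with what a Lookup implementation does, a sorted-array baseline (not the FST implementation under discussion, and with purely illustrative names) sketches the contract: given a prefix, return stored terms starting with it, optionally ordered by weight for the onlyMostPopular case:

```python
import bisect

class SortedPrefixLookup:
    """Baseline prefix suggester: binary-search a sorted term list,
    then take the contiguous run of terms sharing the prefix."""
    def __init__(self, weighted_terms):
        self.terms = sorted(weighted_terms)           # [(term, weight), ...]
        self.keys = [t for t, _ in self.terms]        # sorted terms only

    def lookup(self, prefix, only_most_popular=False, count=5):
        lo = bisect.bisect_left(self.keys, prefix)
        hi = bisect.bisect_right(self.keys, prefix + "\uffff")
        hits = self.terms[lo:hi]
        if only_most_popular:
            hits.sort(key=lambda tw: -tw[1])          # heaviest suggestions first
        return [t for t, _ in hits[:count]]

suggest = SortedPrefixLookup([("solr", 10), ("solaris", 3), ("sole", 7), ("lucene", 9)])
print(suggest.lookup("sol", only_most_popular=True))  # ['solr', 'sole', 'solaris']
```

An FST-based implementation provides the same interface while sharing prefixes and suffixes in the automaton, which is where the memory savings in Dawid's benchmark come from.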
Re: [POLL] JTS compile/test dependency
On Wed, Apr 6, 2011 at 2:12 PM, Ryan McKinley ryan...@gmail.com wrote: Some may be following the thread on spatial development... here is a quick summary, and a poll to help decide what may be the best next move. I'm hoping to introduce a high level spatial API that can be used for a variety of indexing strategies and computational needs. For simple point in BBox and point in WGS84 radius, this does not require any external libraries. To support more complex queries -- point in polygon, complex geometry intersections, etc -- we need an LGPL library, JTS. The LGPL dependency is only needed to compile/test; there is no runtime requirement for JTS. To enable the more complicated options you would need to add JTS to the classpath and perhaps set an environment variable. This is essentially what we are now doing with the (soon to be removed) bdb contrib. I am trying to figure out the best home for this code and development to live. I think it is essential for the JTS support to be part of the core build/test -- splitting it into a separate module that is tested elsewhere is not an option. This raises the basic question of whether people are willing to have the LGPL build dependency as part of the main lucene build. I think it is, but am sympathetic to the idea that it might not be. I'm sorta confused about this (i'll probably offend someone here, but so be it) We have a contrib module for spatial that is experimental, people want to deprecate, and say has problems. Why must the super-expert-polygon stuff sit with the basic capability that probably most users want: the ability to do basic searches (probably in combination with text too) in their app? It's hard for me to tell, i hope the reason isn't elegance, but why aren't we working on making a simple, supported, 80-20 case in lucene that non-spatial-gurus (and users) understand and can maintain... then it would seem ideal for the complex stuff to be outside of this project with any dependencies it wants? 
Users are probably really confused about the spatial situation: is it because we are floundering around this expert stuff?
Re: [POLL] JTS compile/test dependency
On Wed, Apr 6, 2011 at 2:39 PM, Grant Ingersoll grant.ingers...@gmail.com wrote: I don't see why a compile/test dependency is needed at all: We provide a factory based spatial module where one specifies a SpatialProvider. We have our own implementation of that which works for some set (or all) of the features. An external project (Apache Extras?) This is the non-starter for me. This would split the dev across multiple places and mean that the implementations I use (JTS) would not be a first class citizen in testing. This is the point of the whole debate... and why i think elsewhere may be a better option. ryan
Re: [POLL] JTS compile/test dependency
On Apr 6, 2011, at 2:44 PM, Ryan McKinley wrote: On Wed, Apr 6, 2011 at 2:39 PM, Grant Ingersoll grant.ingers...@gmail.com wrote: I don't see why we need a compile/test dependency is needed at all: We provide a factory based spatial module where one specifies a SpatialProvider. We have our own implementation of that which works for some set (or all) of the features. An external project (Apache Extras?) This is the non-starter for me. This would split the dev across multiple places and mean that the implementations I use (JTS) would not be a first class citizen in testing. This is the point of the whole debate... and why i think elsewhere may be a better option. That's a bit contradictory, though, isn't it? By definition, elsewhere means split too, b/c we have stated the point search stuff isn't going anywhere. And even if it does, you will still need to have a separate factory based implementation and ship a non-JTS provider, otherwise none of it can be packaged into a L/S release, so it's still the same amount of work. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [POLL] JTS compile/test dependency
On Wed, Apr 6, 2011 at 2:54 PM, Ryan McKinley ryan...@gmail.com wrote: The code can be separated so that the dependencies are as you suggest -- i have done this, but it makes testing more difficult and less robust. As part of the framework I've introduced a robust way to use the same data and tests with different strategies and implementations. For me to work on it, i need the stuff i use to be a first class citizen in testing. Right, but this creates a problem for our testing too: if we open this can of worms with optional LGPL stuff I think it's going to actually complicate build and testing. I already stated my concerns about this here: http://s.apache.org/vE I don't think the bdb should be used as justification that the can of worms is already open. Personally I didn't realize the license it had, and for these same reasons, when i found this out i put up a patch on Grant's issue.
[HUDSON] Lucene-Solr-tests-only-trunk - Build # 6805 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/6805/ 1 tests failed. REGRESSION: org.apache.solr.client.solrj.embedded.SolrExampleStreamingTest.testCommitWithin Error Message: expected:<1> but was:<0> Stack Trace: junit.framework.AssertionFailedError: expected:<1> but was:<0> at org.apache.solr.client.solrj.SolrExampleTests.testCommitWithin(SolrExampleTests.java:380) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1232) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1160) Build Log (for compile errors): [...truncated 8713 lines...]
Re: [POLL] JTS compile/test dependency
On Wed, Apr 6, 2011 at 2:48 PM, Grant Ingersoll grant.ingers...@gmail.com wrote: On Apr 6, 2011, at 2:44 PM, Ryan McKinley wrote: On Wed, Apr 6, 2011 at 2:39 PM, Grant Ingersoll grant.ingers...@gmail.com wrote: I don't see why a compile/test dependency is needed at all: We provide a factory based spatial module where one specifies a SpatialProvider. We have our own implementation of that which works for some set (or all) of the features. An external project (Apache Extras?) This is the non-starter for me. This would split the dev across multiple places and mean that the implementations I use (JTS) would not be a first class citizen in testing. This is the point of the whole debate... and why i think elsewhere may be a better option. That's a bit contradictory, though, isn't it? By definition, elsewhere means split too, I'm looking at the proposed spatial strategy stuff as a unit. It is obviously related to existing stuff, but is a very different thing. b/c we have stated the point search stuff isn't going anywhere. Agree -- i think the two would live happily together. Parts of existing point stuff may be deprecated if that seems appropriate. But other parts -- especially the general vector based function queries -- would never map to a high level spatial API anyway. And even if it does, you will still need to have a separate factory based implementation and ship a non-JTS provider, otherwise none of it can be packaged into a L/S release, so it's still the same amount of work. IIUC, we can distribute classes that were compiled against the JTS API, but not JTS itself. People could register what provider should get used and if JTS is available, it would load that one via reflection. ryan
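The register-a-provider-and-load-it-via-reflection idea Ryan describes can be sketched in Python with importlib (a Java version would use Class.forName); the module names here are purely illustrative, with stdlib `json` standing in for the optional dependency:

```python
import importlib

def load_spatial_provider(preferred="jts_shapes", fallback="simple_shapes"):
    """Try to load an optional provider module by name; fall back to the
    bundled baseline when the optional dependency is not on the path."""
    for name in (preferred, fallback):
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("no spatial provider available")

# 'no_such_module' stands in for an absent optional dependency,
# 'json' for the always-shipped baseline provider.
provider = load_spatial_provider(preferred="no_such_module", fallback="json")
print(provider.__name__)  # json
```

The point of the pattern is exactly what Grant and Ryan are negotiating: classes compiled against the optional API can ship, while the dependency itself is only loaded if the user supplies it.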
Re: [POLL] JTS compile/test dependency
On Wed, Apr 6, 2011 at 3:01 PM, Robert Muir rcm...@gmail.com wrote: On Wed, Apr 6, 2011 at 2:54 PM, Ryan McKinley ryan...@gmail.com wrote: The code can be separated so that the dependencies are as you suggest -- i have done this, but it makes testing more difficult and less robust. As part of the framework I've introduced a robust way to use the same data and tests with different strategies and implementations. For me to work on it, i need the stuff i use to be a first class citizen in testing. Right, but this creates a problem for our testing too: if we open this can of worms with optional LGPL stuff I think it's going to actually complicate build and testing. I already stated my concerns about this here: http://s.apache.org/vE I don't think the bdb should be used as justification that the can of worms is already open. Personally I didn't realize the license it had, and for these same reasons, when i found this out i put up a patch on Grant's issue. I totally agree -- this was my preface to the whole discussion, and why i think it may be more appropriate to move spatial dev to an environment that can have different compile time choices. I'd like to figure a way that this is a win for everyone -- this is why i'm bothering with the prolonged discussion, so that at least the motivations are clear and all that. ryan
Re: Lucene Spatial Future
Right - there's no need to try and make promises about the future. It seems unrelated to the questions at hand here. To be clear... I don't see any of this as promises -- obviously nothing happens until there is something concrete to evaluate. The point of this thread (for me anyway) is to raise my concerns, see what people are thinking, and be transparent about my choices. This discussion has made me feel like the right choice (for me) is to pursue spatial development somewhere else -- likely osgeo -- and down the road figure out how that could/should fit with solr. ryan
Re: [VOTE] Release PyLucene 3.1.0
Hi Bill, The QueryParser class changed a bit. More overloads were introduced on the Lucene side. You probably have a Python 'subclass' of QueryParser that needs a bit of work to adapt to the changes. Look at the new version in apache/pylucene-3.1/java/org/apache/pylucene/queryParser/PythonQueryParser.java and see the native methods that you're missing on your Python implementation. Also take a look at test/test_PythonQueryParser.py for an example of what the new methods look like (hint: getFieldQuery_quoted()). With a default QueryParser instance, your query parses just fine: >>> from lucene import * >>> initVM() <jcc.JCCEnv object at 0x10029d0f0> >>> qp = QueryParser(Version.LUCENE_CURRENT, "foo", StandardAnalyzer(Version.LUCENE_CURRENT)) >>> qp.parse("categories:RSSReader AND id:[0-00--000 TO 01299-51-3142-795] AND NOT categories:RSSReader/_noexpire_") <Query: +categories:rssreader +id:[0-00--000 TO 01299-51-3142-795] -(categories:rssreader categories:_noexpire_)> Andi.. On Wed, 6 Apr 2011, Bill Janssen wrote: I'm seeing parse failures on this query string: categories:RSSReader AND id:[0-00--000 TO 01299-51-3142-795] AND NOT categories:RSSReader/_noexpire_ thr002: RSSReader: Traceback (most recent call last): thr002: File /local/lib/UpLib-1.7.11/site-extensions/RSSReader.py, line 271, in _scan_rss_sites thr002: hits = repo.do_query("categories:RSSReader AND id:[0-00--000 TO %s] AND NOT categories:RSSReader/_noexpire_" % old_id) thr002: File /local/share/UpLib-1.7.11/code/uplib/repository.py, line 1172, in do_query thr002: results = self.do_full_query(query_string, searchtype) thr002: File /local/share/UpLib-1.7.11/code/uplib/repository.py, line 1196, in do_full_query thr002: results = self.pylucene_search(searchtype, query_string) thr002: File /local/share/UpLib-1.7.11/code/uplib/repository.py, line 1081, in pylucene_search thr002: v = self.__search_context.search(query_string) thr002: File /local/share/UpLib-1.7.11/code/uplib/indexing.py, line 913, in search thr002: parsed_query = 
query_parser.parseQ(query) thr002: File /local/share/UpLib-1.7.11/code/uplib/indexing.py, line 550, in parseQ thr002: query = QueryParser.parse(self, querystring) thr002: JavaError: org.apache.jcc.PythonException: getFieldQuery_quoted thr002: AttributeError: getFieldQuery_quoted thr002: Java stacktrace: thr002: org.apache.jcc.PythonException: getFieldQuery_quoted thr002: AttributeError: getFieldQuery_quoted thr002: at org.apache.pylucene.queryParser.PythonMultiFieldQueryParser.getFieldQuery_quoted(Native Method) thr002: at org.apache.pylucene.queryParser.PythonMultiFieldQueryParser.getFieldQuery(Unknown Source) thr002: at org.apache.lucene.queryParser.QueryParser.Term(QueryParser.java:1421) thr002: at org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1309) thr002: at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1237) thr002: at org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1226) thr002: at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:206) Bill
Re: [POLL] JTS compile/test dependency
On Apr 6, 2011, at 2:54 PM, Ryan McKinley wrote: I'm sorta confused about this (i'll probably offend someone here, but so be it) Don't worry It's hard for me to tell, i hope the reason isn't elegance, but why aren't we working on making a simple, supported, 80-20 case in lucene that non-spatial-gurus (and users) understand and can maintain... for me it is all about testing and development. For my needs I can't use the simple stuff, and *need* the features that many users won't care about. I have not done any work on the existing spatial contrib because it does not meet my needs. The code can be separated so that the dependencies are as you suggest -- i have done this, but it makes testing more difficult and less robust. As part of the framework I've introduced a robust way to use the same data and tests with different strategies and implementations. For me to work on it, i need the stuff i use to be a first class citizen in testing. I don't follow why testing is any harder. The core interfaces and baseline implementation (along w/ point search) are tested here. The JTS project does its own tests. You can certainly, on your machine, run the tests together. As I voted earlier, I think we should just define the interfaces here along w/ a baseline implementation that meets the 80/20 rule, and the JTS project (or whatever else) lives somewhere else. I just don't see any valid way to bring in a compile/test dependency on JTS that we can support as a first class citizen, but that doesn't mean we can't support the framework which makes it easy to drop in and test on an individual's machine.
[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked
[ https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016500#comment-13016500 ] Stefan Matheis (steffkes) commented on SOLR-2399: - So, long time silence .. :) I've updated my github repo with a few things .. fixes and also a new threaddump-list, which was originally created by Upayavira, thanks! Also started a new Wiki-Page [http://wiki.apache.org/solr/ReworkedSolrAdminGUI] - i'm not really good at marketing, so the Page is really basic and everybody is invited to update it. As Upayavira stated last Tuesday there are still a few Things missing, compared to the current Admin-UI .. but i'd like to know: Which are the Features that *you* will need to give the reworked UI a try? One of the listed features? Or, will it be easier if the Code would work from /solr/admin? Please let me know -- Feedback is really appreciated :) Solr Admin Interface, reworked -- Key: SOLR-2399 URL: https://issues.apache.org/jira/browse/SOLR-2399 Project: Solr Issue Type: Improvement Components: web gui Reporter: Stefan Matheis (steffkes) Priority: Minor Fix For: 4.0 *The idea was to create a new, fresh (and hopefully clean) Solr Admin Interface.* [Based on this [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]] I've quickly created a Github-Repository (Just for me, to keep track of the changes) » https://github.com/steffkes/solr-admin [This commit shows the differences|https://github.com/steffkes/solr-admin/commit/5f80bb0ea9deb4b94162632912fe63386f869e0d] between old/existing index.jsp and my new one (which is copy-cut/paste'd from the existing one). Main Action takes place in [js/script.js|https://github.com/steffkes/solr-admin/blob/master/js/script.js] which is actually neither clean nor pretty .. just work-in-progress. Actually it's Work in Progress, so ... give it a try. 
It's developed with Firefox as Browser, so, for a first impression .. please don't use _things_ like Internet Explorer or so ;o Jan already suggested a bunch of good things, i'm sure there are more ideas over there :) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2400) FieldAnalysisRequestHandler; add information about token-relation
[ https://issues.apache.org/jira/browse/SOLR-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016502#comment-13016502 ] Stefan Matheis (steffkes) commented on SOLR-2400: - I've checked out the current trunk-Revision .. but could not see any change on that, especially the raw-Term thing. Did i miss something else? Special Setting required for getting this property? FieldAnalysisRequestHandler; add information about token-relation - Key: SOLR-2400 URL: https://issues.apache.org/jira/browse/SOLR-2400 Project: Solr Issue Type: Improvement Components: Schema and Analysis Reporter: Stefan Matheis (steffkes) Priority: Minor Attachments: 110303_FieldAnalysisRequestHandler_output.xml, 110303_FieldAnalysisRequestHandler_view.png The XML-Output (simplified example attached) is missing one small piece of information .. which could be very useful to build a nice Analysis-Output, and that's Token-Relation (if there is a special/correct word for this, please correct me). Meaning, it is actually not possible to follow the Analysis-Process (completely) when the Tokenizers/Filters drop out Tokens (e.g. StopWord) or split them into multiple Tokens (e.g. WordDelimiter). Would it be possible to include this Information? If so, it would be possible to create an improved Analysis-Page for the new Solr Admin (SOLR-2399) - short scribble attached
[jira] [Commented] (LUCENE-2308) Separately specify a field's type
[ https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016504#comment-13016504 ] Nikola Tankovic commented on LUCENE-2308: - Hi folks, I wrote a GSoC proposal for this issue, but I am missing a mentor for it. Any volunteers? :) Separately specify a field's type - Key: LUCENE-2308 URL: https://issues.apache.org/jira/browse/LUCENE-2308 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael McCandless Labels: gsoc2011, lucene-gsoc-11, mentor Fix For: 4.0 Attachments: LUCENE-2308.patch This came up from discussions on IRC. I'm summarizing here... Today when you make a Field to add to a document you can set things like indexed or not, stored or not, analyzed or not, details like omitTfAP, omitNorms, index term vectors (separately controlling offsets/positions), etc. I think we should factor these out into a new class (FieldType?). Then you could re-use this FieldType instance across multiple fields. The Field instance would still hold the actual value. We could then do per-field analyzers by adding a setAnalyzer on the FieldType, instead of the separate PerFieldAnalyzerWrapper (likewise for per-field codecs (with flex), where we now have PerFieldCodecWrapper). This would NOT be a schema! It's just refactoring what we already specify today. EG it's not serialized into the index. This has been discussed before, and I know Michael Busch opened a more ambitious (I think?) issue. I think this is a good first baby step. We could consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold off on that for starters...
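The refactoring Michael proposes — pulling the per-field flags out into a reusable type object that many Field instances share — can be sketched as follows (the names are illustrative, not the eventual Lucene API, and the sketch is in Python for brevity rather than Java):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FieldType:
    """Reusable bundle of indexing options, shared across many fields."""
    stored: bool = True
    indexed: bool = True
    analyzed: bool = True
    omit_norms: bool = False

@dataclass
class Field:
    """A field holds only its name and value; the options live in the type."""
    name: str
    value: str
    type: FieldType

# One FieldType instance reused across multiple fields of a document.
keyword = FieldType(analyzed=False, omit_norms=True)
doc = [Field("id", "42", keyword), Field("sku", "A-7", keyword)]
print(doc[0].type is doc[1].type)  # True
```

This is the "not a schema" point: the type object is a convenience for the indexing call, not something serialized into the index.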
[jira] [Created] (SOLR-2459) LogLevelSelection Servlet outputs plain HTML
LogLevelSelection Servlet outputs plain HTML Key: SOLR-2459 URL: https://issues.apache.org/jira/browse/SOLR-2459 Project: Solr Issue Type: Wish Components: web gui Reporter: Stefan Matheis (steffkes) Priority: Trivial The only available Output of the LogLevelSelection Servlet is plain HTML, which makes it impossible to integrate the Logging-Information into the new Admin-UI. Format-Agnostic Output (like every [?] other Servlet offers) would be really nice! Just as an Idea for a future structure, the new admin-ui is [actually based on that json-structure|https://github.com/steffkes/solr-admin/blob/master/logging.json] :)
[jira] [Commented] (LUCENE-2308) Separately specify a field's type
[ https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016522#comment-13016522 ] Nikola Tankovic commented on LUCENE-2308: - I submitted the first draft of my proposal (LUCENE-2308: Separately specify a field's type); I hope you can see it and give me some further pointers if needed. Thank you!
[jira] [Commented] (SOLR-2458) post.jar fails on non-XML updateHandlers
[ https://issues.apache.org/jira/browse/SOLR-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016527#comment-13016527 ] Jan Høydahl commented on SOLR-2458: --- It might not be a bug according to the original design intentions. But the first thing we tell users is to try out post.jar to post stuff, and now we've even included csv and json examples for it. Then it's unnecessary to get the error "Solr returned an error #400 undefined field commit/" thrown in your face - the error does not even explain the problem. I'll try to assemble a first patch for this next week some time, adding a separate POST with ?commit=true after the last file is POSTed.
Re: [VOTE] Release PyLucene 3.1.0
Andi Vajda va...@apache.org wrote: Hi Bill, The QueryParser class changed a bit. More overloads were introduced on the Lucene side. You probably have a Python 'subclass' of QueryParser that needs a bit of work to adapt to the changes. Thanks, but... All that adds up to breakage for my users. Bill
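Bill's whitelist idea — refuse to install against a (JCC_VERSION, VERSION) pair nobody has verified — is a few lines of code; the pairs and names below are examples for illustration, not an actual verified set (in real use the two values would come from lucene.JCC_VERSION and lucene.VERSION, as Andi showed):

```python
# Pairs the packager has actually tested; illustrative values only.
VERIFIED_PAIRS = {("2.8", "3.1.0"), ("2.6", "3.0.3")}

def check_pylucene(jcc_version, lucene_version, verified=VERIFIED_PAIRS):
    """Fail loudly at install time rather than obscurely at runtime
    when the installed PyLucene has not been verified against this app."""
    if (jcc_version, lucene_version) not in verified:
        raise RuntimeError(
            f"PyLucene {lucene_version} / JCC {jcc_version} has not been "
            f"verified with this application")

check_pylucene("2.8", "3.1.0")  # a verified pair passes silently
```

Sub-optimal or not, the check turns a confusing runtime AttributeError (like the getFieldQuery_quoted failure above) into an explicit install-time message.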
Re: [POLL] JTS compile/test dependency
On Wed, Apr 6, 2011 at 22:43, Robert Muir rcm...@gmail.com wrote: On Wed, Apr 6, 2011 at 2:12 PM, Ryan McKinley ryan...@gmail.com wrote: Some may be following the thread on spatial development... here is a quick summary, and a poll to help decide what may be the best next move. I'm hoping to introduce a high level spatial API that can be used for a variety of indexing strategies and computational needs. For simple point in BBox and point in WGS84 radius, this does not require any external libraries. To support more complex queries -- point in polygon, complex geometry intersections, etc -- we need an LGPL library JTS. The LGPL dependency is only needed to compile/test, there is no runtime requirement for JTS. To enable the more complicated options you would need to add JTS to the classpath and perhaps set a environment variable. This is essentially what we are now doing with the (soon to be removed) bdb contrib. I am trying to figure out the best home for this code and development to live. I think it is essential for the JTS support to be part of the core build/test -- splitting it into a separate module that is tested elsewhere is not an option. This raises the basic question of if people are willing to have the LGPL build dependency as part of the main lucene build. I think it is, but am sympathetic to the idea that it might not be. I'm sorta confused about this (i'll probably offend someone here, but so be it) We have a contrib module for spatial that is experimental, people want to deprecate, and say has problems. Why must the super-expert-polygon stuff sit with the basic capability that probably most users want: the ability to do basic searches (probably in combination with text too) in their app? Its hard for me to tell, i hope the reason isn't elegance, but why aren't we working on making a simple,supported,80-20 case in lucene that non-spatial-gurus (and users) understand and can maintain... 
then it would seem ideal for the complex stuff to be outside of this project with any dependencies it wants? Users are probably really confused about the spatial situation: is it because we are floundering around this expert stuff? Handling Unicode code points outside the BMP is highly expert stuff as well, and is totally unneeded by 80% of the users for any reason other than elegance. I think you two guys can really understand each other here : ) -- Kirill Zakharenko/Кирилл Захаренко E-Mail/Jabber: ear...@gmail.com Phone: +7 (495) 683-567-4 ICQ: 104465785 - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [POLL] JTS compile/test dependency
On Wed, Apr 6, 2011 at 5:07 PM, Earwin Burrfoot ear...@gmail.com wrote: Handling Unicode code points outside the BMP is highly expert stuff as well, and is totally unneeded by 80% of the users for any reason other than elegance. I think you two guys can really understand each other here : ) You are wrong: you either support Unicode, or your application is buggy. It's not an optional feature; it's the text standard used by the Java programming language.
Re: [VOTE] Release PyLucene 3.1.0
On Wed, 6 Apr 2011, Bill Janssen wrote: Andi Vajda va...@apache.org wrote: Hi Bill, The QueryParser class changed a bit. More overloads were introduced on the Lucene side. You probably have a Python 'subclass' of QueryParser that needs a bit of work to adapt to the changes. Thanks, but... All that adds up to breakage for my users. Unless I'm missing something here, you've got two options before you break your users: 1. fix your code before you ship it to them 2. don't upgrade Yes, you could say that the same applies to PyLucene, of course :-) I'm not exactly sure what kind of backwards compat promises Lucene Java made going from 3.0 to 3.1 but the new QueryParser method overloads and the fact that there is no support for method overloads in Python make PythonQueryParser a bit stuck between a rock and a hard place. If you see a better way to fix the mess with the _quoted and _slop variants for getFieldQuery, a patch is welcome. Andi..
Re: [VOTE] Release PyLucene 3.1.0
Andi Vajda va...@apache.org wrote: Look at the new version in apache/pylucene-3.1/java/org/apache/pylucene/queryParser/PythonQueryParser.java and see the native methods that you're missing on your Python implementation. Wow, looks like a lot. My subclasses only implement getFieldQuery() and getRangeQuery(). Also take a look at test/test_PythonQueryParser.py for an example of what the new methods look like (hint: getFieldQuery_quoted()). Looking at that, it seems that one needn't provide implementations for most of the native methods -- your example classes don't. How should one know which to implement? The ones that could get called, I suppose. Bill With a default QueryParser instance, your query parses just fine: >>> from lucene import * >>> initVM() <jcc.JCCEnv object at 0x10029d0f0> >>> qp = QueryParser(Version.LUCENE_CURRENT, "foo", StandardAnalyzer(Version.LUCENE_CURRENT)) >>> qp.parse('categories:RSSReader AND id:[0-00--000 TO 01299-51-3142-795] AND NOT categories:RSSReader/_noexpire_') <Query: +categories:rssreader +id:[0-00--000 TO 01299-51-3142-795] -(categories:rssreader categories:_noexpire_)> Andi..
On Wed, 6 Apr 2011, Bill Janssen wrote: I'm seeing parse failures on this query string: categories:RSSReader AND id:[0-00--000 TO 01299-51-3142-795] AND NOT categories:RSSReader/_noexpire_ thr002: RSSReader: Traceback (most recent call last): thr002: File /local/lib/UpLib-1.7.11/site-extensions/RSSReader.py, line 271, in _scan_rss_sites thr002: hits = repo.do_query(categories:RSSReader AND id:[0-00--000 TO %s] AND NOT categories:RSSReader/_noexpire_ % old_id) thr002: File /local/share/UpLib-1.7.11/code/uplib/repository.py, line 1172, in do_query thr002: results = self.do_full_query(query_string, searchtype) thr002: File /local/share/UpLib-1.7.11/code/uplib/repository.py, line 1196, in do_full_query thr002: results = self.pylucene_search(searchtype, query_string) thr002: File /local/share/UpLib-1.7.11/code/uplib/repository.py, line 1081, in pylucene_search thr002: v = self.__search_context.search(query_string) thr002: File /local/share/UpLib-1.7.11/code/uplib/indexing.py, line 913, in search thr002: parsed_query = query_parser.parseQ(query) thr002: File /local/share/UpLib-1.7.11/code/uplib/indexing.py, line 550, in parseQ thr002: query = QueryParser.parse(self, querystring) thr002: JavaError: org.apache.jcc.PythonException: getFieldQuery_quoted thr002: AttributeError: getFieldQuery_quoted thr002: Java stacktrace: thr002: org.apache.jcc.PythonException: getFieldQuery_quoted thr002: AttributeError: getFieldQuery_quoted thr002: at org.apache.pylucene.queryParser.PythonMultiFieldQueryParser.getFieldQuery_quoted(Native Method) thr002: at org.apache.pylucene.queryParser.PythonMultiFieldQueryParser.getFieldQuery(Unknown Source) thr002: at org.apache.lucene.queryParser.QueryParser.Term(QueryParser.java:1421) thr002: at org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1309) thr002: at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1237) thr002: at org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1226) thr002: at 
org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:206) Bill
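The AttributeError in the traceback above comes from Java-side overloading: Lucene 3.1's QueryParser gained overloads of getFieldQuery, and since Python has no method overloading, JCC exposes them to Python subclasses under suffixed names such as getFieldQuery_quoted. A pure-Python sketch of the forwarding fix follows; the base class here is only a stand-in for the JCC-generated PythonQueryParser (so this runs without PyLucene), and UpLibQueryParser is a hypothetical name:

```python
class PythonQueryParserStub:
    """Stand-in for the JCC-generated base class: the suffixed native
    methods fail with AttributeError unless overridden in Python."""
    def getFieldQuery_quoted(self, field, text, quoted):
        raise AttributeError("getFieldQuery_quoted")

class UpLibQueryParser(PythonQueryParserStub):
    # the kind of override an existing PyLucene 3.0 subclass would have
    def getFieldQuery(self, field, text):
        return "%s:%s" % (field, text.lower())

    # new in PyLucene 3.1: forward the quoted variant to the existing
    # implementation so parsing no longer fails
    def getFieldQuery_quoted(self, field, text, quoted):
        return self.getFieldQuery(field, text)
```

With a real PyLucene 3.1 base class, the same forwarding override would make the query above parse again.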
Re: [POLL] JTS compile/test dependency
I love this idea! Bill Bell Sent from mobile On Apr 6, 2011, at 2:39 PM, Grant Ingersoll grant.ingers...@gmail.com wrote: I don't see why a compile/test dependency is needed at all: We provide a factory-based spatial module where one specifies a SpatialProvider. We have our own implementation of that which works for some set (or all) of the features. An external project (Apache Extras?) could then go and implement that provider using JTS and can easily leverage all of our existing tests as well as its own (using the handy-dandy test framework). Users who wish to use this would then simply include the external JAR (accepting that it is LGPL of their own free will) and tell L/S to use a different Provider. I thought this is what you already proposed. This allows innovation on our stuff (which may well surpass JTS at some point) and satisfies the short-term win of JTS w/o violating ASF legal issues (per http://www.apache.org/legal/3party.html#options-optional). It would also make it easy for SIS to add its own provider if and when it is mature enough. -Grant On Apr 6, 2011, at 2:12 PM, Ryan McKinley wrote: [] OK with JTS compile dependency. Spatial support should be a module [] OK with JTS, but think this spatial stuff should happen elsewhere [] Please, no LGPL dependencies in lucene build [x] Please no LGPL in Lucene build, please keep spatial framework here, please implement JTS piece in Apache Extras per a well-defined (and hosted in Lucene) SpatialProvider/Factory mechanism that is completely pluggable. The compile dependency is then JTS needs the Lucene spatial module, not the Lucene spatial module needs JTS. :-) -Grant
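Grant's pluggable-provider idea can be sketched abstractly (in Python for brevity; SpatialProvider is his name, everything else here is hypothetical): the core module ships a dependency-free implementation and a registry, and a JTS-backed provider living in an external project would simply register itself at load time.

```python
class SpatialProvider:
    """Interface the core spatial module would define."""
    def point_in_bbox(self, point, bbox):
        raise NotImplementedError
    def point_in_polygon(self, point, polygon):
        # would need a geometry library such as JTS
        raise NotImplementedError

class SimpleProvider(SpatialProvider):
    """Core implementation: no external dependency, bbox only."""
    def point_in_bbox(self, point, bbox):
        x, y = point
        x0, y0, x1, y1 = bbox
        return x0 <= x <= x1 and y0 <= y <= y1

_PROVIDERS = {"simple": SimpleProvider}

def register_provider(name, cls):
    # an external JTS-backed module would call this when loaded
    _PROVIDERS[name] = cls

def get_provider(name="simple"):
    return _PROVIDERS[name]()
```

The compile-time arrow then points the way Grant wants: the external provider depends on the core interface, never the reverse.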
Re: [POLL] JTS compile/test dependency
On Thu, Apr 7, 2011 at 01:11, Robert Muir rcm...@gmail.com wrote: On Wed, Apr 6, 2011 at 5:07 PM, Earwin Burrfoot ear...@gmail.com wrote: Handling Unicode code points outside the BMP is highly expert stuff as well, and is totally unneeded by 80% of the users for any reason other than elegance. I think you two guys can really understand each other here : ) You are wrong: you either support Unicode, or your application is buggy. It's not an optional feature; it's the text standard used by the Java programming language. You either handle the Earth as a proper somewhat-ellipsoid, or your application is buggy. It's not an optional feature; it's even stronger than a standard - it is a physical fact experienced by all of us earthlings. Though 80% of the users can throw geoids and Unicode planes out the window and live happily with some stupid local coordinate system and two-byte characters (some even manage with one byte!). Yeah, they don't really care about being buggy in any geo/unicode zealot's eyes. Having said that, it's cool that people like you two exist :) Because the Earth is round, maps are ugly, there are lots of different writing systems, and someone has to deal with all that.
Solr-2242
SOLR-2242 inquiry. Who is going to help me get this committed? Issues? Bill Bell Sent from mobile On Apr 6, 2011, at 2:51 AM, Jeffrey Chang jclal...@gmail.com wrote: Hi All, I'd like to start small and see how I can contribute to Solr development. Following http://wiki.apache.org/solr/HowToContribute, I've created a new defect (SOLR-2455) and created a patch for it. Not sure if I've done the right steps - can someone give me some guidance on whether I'm on the right track to make some contributions? I'm still confused about how the committers decide which release to include the fixes in. E.g., for the fixes I contribute, since I modified from trunk, I'd assume they go into Solr 4.0.x? Also, should I set the JIRA case status to Resolved myself? Thanks, Jeff
Indexing Non-Textual Data
Hi, I'm new to PyLucene, so forgive me if this is a newbie question. I have a dataset composed of several thousand lists of 128 integer features, each list associated with a class label. Would it be possible to use Lucene as a classifier, by indexing the label with respect to these integer features, and then classify a new list by finding the most similar labels with Lucene? I've been going through the PyLucene samples, but they only seem to involve indexing text, not continuous features (understandably). Could anyone point me to an example that indexes non-textual data? I think the project Lire (http://www.semanticmetadata.net/lire/) is using Lucene to do something similar to this, although with an emphasis on image features. I've dug into their code a little, but I'm not a strong Java programmer, so I'm not sure how they're pulling it off, nor how I might translate this into the PyLucene API. In your opinion, is this a practical use of Lucene? Regards, Chris
Re: My GSOC proposal
I have drafted the proposal on the official GSoC website. This is the link to my proposal: http://goo.gl/uYXrV . Please do let me know if anything needs to be changed, added, or removed. I will keep working on it till the deadline on the 8th. On Wed, Apr 6, 2011 at 11:41 PM, Michael McCandless luc...@mikemccandless.com wrote: That test code looks good -- you really should have seen awful performance had you used O_DIRECT, since you read byte by byte. A more realistic test is to read a whole buffer (eg 4 KB is what Lucene now uses during merging, but we'd probably up this to something like 1 MB when using O_DIRECT). Linus does hate O_DIRECT (see http://kerneltrap.org/node/7563), and for good reason: its existence means projects like ours can use it to work around limitations in the Linux IO APIs that control the buffer cache when, otherwise, we might conceivably make patches to fix Linux correctly. It's an escape hatch, and we all use the escape hatch instead of trying to fix Linux for real... For example, the NOREUSE flag is a no-op now in Linux, which is a shame, because that's precisely the flag we'd want to use for merging (along with SEQUENTIAL). Had that flag been implemented well, it'd give better results than our workaround using O_DIRECT. Anyway, given how things are, until we can get more control (way up in Javaland) over the buffer cache, O_DIRECT (via a native directory impl through JNI) is our only real option today. More details here: http://blog.mikemccandless.com/2010/06/lucene-and-fadvisemadvise.html Note that other OSs likely do a better job and actually implement NOREUSE and similar APIs, so the generic Unix/WindowsNativeDirectory would simply use NOREUSE on those platforms for I/O during segment merging. Mike http://blog.mikemccandless.com On Wed, Apr 6, 2011 at 11:56 AM, Varun Thacker varunthacker1...@gmail.com wrote: Hi. I wrote sample code to test the speed difference between SEQUENTIAL and O_DIRECT (I used the madvise flag MADV_DONTNEED) reads.
This is the link to the code: http://pastebin.com/8QywKGyS There was a speed difference when I switched between the two flags. I have not used the O_DIRECT flag because Linus had criticized it. Is this what the flags are intended to be used for? This is just sample code with a test file. On Wed, Apr 6, 2011 at 12:11 PM, Simon Willnauer simon.willna...@googlemail.com wrote: Hey Varun, On Tue, Apr 5, 2011 at 11:07 PM, Michael McCandless luc...@mikemccandless.com wrote: Hi Varun, Those two issues would make a great GSoC! Comments below... +1 On Tue, Apr 5, 2011 at 1:56 PM, Varun Thacker varunthacker1...@gmail.com wrote: I would like to combine two tasks as part of my project, namely Directory createOutput and openInput should take an IOContext (LUCENE-2793), and complement it with Generalize DirectIOLinuxDir to UnixDir (LUCENE-2795). The first part of the project is aimed at significantly reducing the time taken to search during indexing by adding an IOContext which would store buffer size and have options to bypass the OS's buffer cache (this is what causes the slowdown in search) and other hints. Once completed I would move on to LUCENE-2795 and generalize the Directory implementation to make a UnixDirectory. So, the first part (LUCENE-2793) should cause no change at all to performance, functionality, etc., because it's merely installing the plumbing (IOContext threaded throughout the low-level store APIs in Lucene) so that higher levels can send important details down to the Directory. We'd fix IndexWriter/IndexReader to fill out this IOContext with the details (merging, flushing, new reader, etc.). There's some fun/freedom here in figuring out just what details should be included in IOContext... (eg: is it low-level set buffer size to 4 KB or is it high-level I am opening a new near-real-time reader). This first step is a rote cutover, just changing APIs but in no way taking advantage of the new APIs.
The 2nd step (LUCENE-2795) would then take advantage of this plumbing, by creating a UnixDir impl that, using JNI (C code), passes advanced flags when opening files, based on the incoming IOContext. The goal is a single UnixDir that has ifdefs so that it's usable across multiple Unices, and eg would use direct IO if the context is merging. If we are ambitious we could rope Windows into the mix, too, and then this would be NativeDir... We can measure success by validating that a big merge while searching does not hurt search performance (ie we should be able to reproduce the results from http://blog.mikemccandless.com/2010/06/lucene-and-fadvisemadvise.html ). Thanks for the summary, Mike! I have spoken to Michael McCandless and Simon Willnauer about undertaking these tasks. Michael McCandless has agreed to mentor me. I would love to be able to contribute
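The fadvise behavior discussed above (SEQUENTIAL for merge reads, with DONTNEED as the usual workaround for Linux's no-op NOREUSE) can be sketched from Python's standard library, without JNI. This is an illustration of the I/O pattern, not the proposed UnixDir implementation; os.posix_fadvise is only available on platforms that support it (e.g. Linux), hence the hasattr guards.

```python
import os

def drop_behind_read(path, bufsize=1 << 20):
    """Read `path` sequentially in large buffers, advising the kernel
    that access is sequential and the pages will not be reused."""
    fd = os.open(path, os.O_RDONLY)
    try:
        if hasattr(os, "posix_fadvise"):
            # whole file (len=0 means "to EOF"), sequential access
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        total = 0
        while True:
            buf = os.read(fd, bufsize)
            if not buf:
                break
            total += len(buf)
        if hasattr(os, "posix_fadvise"):
            # NOREUSE is a no-op on Linux, so DONTNEED is the usual
            # workaround: evict the pages this read just pulled in
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        return total
    finally:
        os.close(fd)
```

The 1 MB buffer matches the size Mike suggests for merge-style reads; a byte-by-byte loop, as in the original benchmark, would swamp any advantage from the advice flags.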
[jira] [Commented] (LUCENE-2308) Separately specify a field's type
[ https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016571#comment-13016571 ] Simon Willnauer commented on LUCENE-2308: - bq. I wrote a GSoC proposal for this issue, but I am missing a mentor for this issue. Any volunteers? don't worry, we will find somebody to mentor! bq. I submitted the first draft of my proposal (LUCENE-2308 Separately specify a field's type), hope you can see it and give me some further pointers if needed. yep, I can see it - looks good so far. Separately specify a field's type - Key: LUCENE-2308 URL: https://issues.apache.org/jira/browse/LUCENE-2308 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael McCandless Labels: gsoc2011, lucene-gsoc-11, mentor Fix For: 4.0 Attachments: LUCENE-2308.patch This came up from discussions on IRC. I'm summarizing here... Today when you make a Field to add to a document you can set things index or not, stored or not, analyzed or not, details like omitTfAP, omitNorms, index term vectors (separately controlling offsets/positions), etc. I think we should factor these out into a new class (FieldType?). Then you could re-use this FieldType instance across multiple fields. The Field instance would still hold the actual value. We could then do per-field analyzers by adding a setAnalyzer on the FieldType, instead of the separate PerFieldAnalyzerWrapper (likewise for per-field codecs (with flex), where we now have PerFieldCodecWrapper). This would NOT be a schema! It's just refactoring what we already specify today. EG it's not serialized into the index. This has been discussed before, and I know Michael Busch opened a more ambitious (I think?) issue. I think this is a good first baby step. We could consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold off on that for starters... -- This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
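The refactoring LUCENE-2308 proposes can be illustrated with a tiny Python sketch. The names mirror the proposal (FieldType, Field), but this is not the eventual Lucene API: one FieldType instance carries the per-field settings and is shared across many Field instances, which hold only the name and value.

```python
class FieldType:
    """Per-field settings, factored out of Field (cf. LUCENE-2308).
    Not a schema: nothing here is serialized into the index."""
    def __init__(self, indexed=True, stored=False, analyzed=True,
                 omit_norms=False):
        self.indexed = indexed
        self.stored = stored
        self.analyzed = analyzed
        self.omit_norms = omit_norms

class Field:
    """Holds just the name and value; settings live in the type."""
    def __init__(self, name, value, field_type):
        self.name = name
        self.value = value
        self.type = field_type

# one FieldType instance reused across multiple fields
keyword = FieldType(analyzed=False, stored=True)
f1 = Field("id", "doc-1", keyword)
f2 = Field("sku", "A-42", keyword)
```

A per-field analyzer would then be one more attribute on FieldType, replacing the separate PerFieldAnalyzerWrapper, just as the issue suggests.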
Re: Indexing Non-Textual Data
Hi, On Wed, 6 Apr 2011, Chris Spencer wrote: I'm new to PyLucene, so forgive me if this is a newbie question. I have a dataset composed of several thousand lists of 128 integer features, each list associated with a class label. Would it be possible to use Lucene as a classifier, by indexing the label with respect to these integer features, and then classify a new list by finding the most similar labels with Lucene? I believe there is support in Lucene for indexing numeric values using a Trie. Please ask on java-u...@lucene.apache.org (subscribe first by sending mail to java-user-subscr...@lucene.apache.org). There are many more Lucene experts with answers there. For example, this class may be relevant: http://lucene.apache.org/java/3_1_0/api/core/org/apache/lucene/document/NumericField.html Andi.. I've been going through the PyLucene samples, but they only seem to involve indexing text, not continuous features (understandably). Could anyone point me to an example that indexes non-textual data? I think the project Lire (http://www.semanticmetadata.net/lire/) is using Lucene to do something similar to this, although with an emphasis on image features. I've dug into their code a little, but I'm not a strong Java programmer, so I'm not sure how they're pulling it off, nor how I might translate this into the PyLucene API. In your opinion, is this a practical use of Lucene? Regards, Chris
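Independent of Lucene, Chris's classify-by-similarity idea can be sketched in a few lines of plain Python: encode each (position, value) integer feature as a discrete term, then pick the label whose stored vector shares the most terms with the query -- roughly what a big OR query over such terms, scored by overlap, would return. Everything here is illustrative (not PyLucene API); with Lucene one would index the tokens and let the scorer do the counting.

```python
from collections import Counter

def feature_terms(vec):
    # encode each (position, value) pair as a token, the way one
    # might index integer features as Lucene terms like "f17:42"
    return {"f%d:%d" % (i, v) for i, v in enumerate(vec)}

def classify(query_vec, labeled_vecs):
    # label of the stored vector sharing the most terms with the
    # query -- a stand-in for "top hit of a big OR query"
    scores = Counter()
    q = feature_terms(query_vec)
    for label, vec in labeled_vecs:
        scores[label] += len(q & feature_terms(vec))
    return scores.most_common(1)[0][0]
```

For truly continuous features this exact-match encoding is too brittle; quantizing values into buckets before encoding (or using Lucene's numeric trie fields, per Andi's pointer) would be the next step.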
[jira] [Assigned] (LUCENE-2308) Separately specify a field's type
[ https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-2308: -- Assignee: Michael McCandless Separately specify a field's type - Key: LUCENE-2308 URL: https://issues.apache.org/jira/browse/LUCENE-2308 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael McCandless Assignee: Michael McCandless Labels: gsoc2011, lucene-gsoc-11, mentor Fix For: 4.0 Attachments: LUCENE-2308.patch -- This message is automatically generated by JIRA.
[jira] [Commented] (LUCENE-2308) Separately specify a field's type
[ https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016582#comment-13016582 ] Michael McCandless commented on LUCENE-2308: Hi Nikola, I'd be happy to mentor for this issue! Your proposal looks great. Separately specify a field's type - Key: LUCENE-2308 URL: https://issues.apache.org/jira/browse/LUCENE-2308 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael McCandless Assignee: Michael McCandless Labels: gsoc2011, lucene-gsoc-11, mentor Fix For: 4.0 Attachments: LUCENE-2308.patch -- This message is automatically generated by JIRA.
Re: [VOTE] Release PyLucene 3.1.0
Andi Vajda va...@apache.org wrote: Unless I'm missing something here, you've got two options before you break your users: 1. fix your code before you ship it to them Unfortunately, the code is out there for building, and the instructions, also already out there, say, PyLucene 2.4 to 3.X. I should be more careful :-). 2. don't upgrade It's the users that upgrade, not me. Yes, you could say that the same applies to PyLucene, of course :-) :-) I'm not exactly sure what kind of backwards compat promises Lucene Java made going from 3.0 to 3.1 but the new QueryParser method overloads and the fact that there is no support for method overloads in Python make PythonQueryParser a bit stuck between a rock and a hard place. If you see a better way to fix the mess with the _quoted and _slop variants for getFieldQuery, a patch is welcome. Sure. Bill
Re: [VOTE] Release PyLucene 3.1.0
On Wed, 6 Apr 2011, Bill Janssen wrote: Andi Vajda va...@apache.org wrote: Unless I'm missing something here, you've got two options before you break your users: 1. fix your code before you ship it to them Unfortunately, the code is out there for building, and the instructions, also already out there, say, PyLucene 2.4 to 3.X. I should be more careful :-). Given that APIs changed quite a bit between 2.x and 3.0 and that 2.x deprecated APIs are removed from 3.1+ (unless I'm confused about Lucene's deprecation policy (*)), your statement is a bit optimistic. (*) maybe it's not until 4.0 that they're going to be removed ? I can't remember at the moment. Mike, if you read this, can you please correct me if I'm wrong ? Andi.. 2. don't upgrade It's the users that upgrade, not me. Yes, you could say that the same applies to PyLucene, of course :-) :-) I'm not exactly sure what kind of backwards compat promises Lucene Java made going from 3.0 to 3.1 but the new QueryParser method overloads and the fact that there is no support for method overloads in Python make PythonQueryParser a bit stuck between a rock and a hard place. If you see a better way to fix the mess with the _quoted and _slop variants for getFieldQuery, a patch is welcome. Sure. Bill
Re: [VOTE] Release PyLucene 3.1.0
On Wed, Apr 6, 2011 at 6:38 PM, Andi Vajda va...@apache.org wrote: On Wed, 6 Apr 2011, Bill Janssen wrote: Andi Vajda va...@apache.org wrote: Unless I'm missing something here, you've got two options before you break your users: 1. fix your code before you ship it to them Unfortunately, the code is out there for building, and the instructions, also already out there, say, PyLucene 2.4 to 3.X. I should be more careful :-). Given that APIs changed quite a bit between 2.x and 3.0 and that 2.x deprecated APIs are removed from 3.1+ (unless I'm confused about Lucene's deprecation policy (*)), your statement is a bit optimistic. (*) maybe it's not until 4.0 that they're going to be removed ? I can't remember at the moment. Mike, if you read this, can you please correct me if I'm wrong ? Actually, any API deprecated in any Lucene 2.x release is removed in 3.0. (Same for 3.x to 4.0, etc.). Mike http://blog.mikemccandless.com
Re: [VOTE] Release PyLucene 3.1.0
On Wed, 6 Apr 2011, Michael McCandless wrote: On Wed, Apr 6, 2011 at 6:38 PM, Andi Vajda va...@apache.org wrote: On Wed, 6 Apr 2011, Bill Janssen wrote: Andi Vajda va...@apache.org wrote: Unless I'm missing something here, you've got two options before you break your users: 1. fix your code before you ship it to them Unfortunately, the code is out there for building, and the instructions, also already out there, say, PyLucene 2.4 to 3.X. I should be more careful :-). Given that APIs changed quite a bit between 2.x and 3.0 and that 2.x deprecated APIs are removed from 3.1+ (unless I'm confused about Lucene's deprecation policy (*)), your statement is a bit optimistic. (*) maybe it's not until 4.0 that they're going to be removed ? I can't remember at the moment. Mike, if you read this, can you please correct me if I'm wrong ? Actually, any API deprecated in any Lucene 2.x release is removed in 3.0. (Same for 3.x to 4.0, etc.). Ah, I thought 3.0 had them both. Ok, duly noted. Thanks ! Andi..
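Given the deprecation policy Mike confirms above, the version gate Bill proposes earlier in this thread -- comparing the (JCC_VERSION, VERSION) pair the lucene module exposes against a list of verified combinations -- is easy to sketch in plain Python. The pair list below is illustrative, not a real UpLib compatibility matrix; in UpLib it would be called as check_pylucene(lucene.JCC_VERSION, lucene.VERSION) at install or startup time.

```python
# Hypothetical list of (JCC_VERSION, VERSION) pairs verified to work.
VERIFIED_PAIRS = {
    ("2.8", "3.1.0"),
    ("2.7", "3.0.3"),
}

def check_pylucene(jcc_version, lucene_version):
    """Raise if this PyLucene build has not been verified to work."""
    if (jcc_version, lucene_version) not in VERIFIED_PAIRS:
        raise RuntimeError(
            "Unverified PyLucene build: JCC %s / Lucene %s"
            % (jcc_version, lucene_version))
```

Since deprecated 3.x APIs survive until 4.0, a pair list keyed on the Lucene major version (rather than exact releases) would be a reasonable relaxation.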
[jira] [Updated] (LUCENE-2959) [GSoC] Implementing State of the Art Ranking for Lucene
[ https://issues.apache.org/jira/browse/LUCENE-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2959: Assignee: Robert Muir setting myself as assignee as I'd like to mentor this one. [GSoC] Implementing State of the Art Ranking for Lucene --- Key: LUCENE-2959 URL: https://issues.apache.org/jira/browse/LUCENE-2959 Project: Lucene - Java Issue Type: New Feature Components: Examples, Javadocs, Query/Scoring Reporter: David Mark Nemeskey Assignee: Robert Muir Labels: gsoc2011, lucene-gsoc-11, mentor Attachments: LUCENE-2959_mockdfr.patch, implementation_plan.pdf, proposal.pdf Lucene employs the Vector Space Model (VSM) to rank documents, which compares unfavorably to state-of-the-art algorithms such as BM25. Moreover, the architecture is tailored specifically to VSM, which makes the addition of new ranking functions a non-trivial task. This project aims to bring state-of-the-art ranking methods to Lucene and to implement a query architecture with pluggable ranking functions. -- This message is automatically generated by JIRA.
Re: My GSOC proposal
Hi Varun, Nice proposal, very complete. Only one thing is missing: you should mention somewhere how many hours a week you are willing to spend working on the project, and whether there are any holidays during which you won't be able to work. Good luck ;) On Wed, Apr 6, 2011 at 5:57 PM, Varun Thacker varunthacker1...@gmail.com wrote: I have drafted the proposal on the official GSoC website. This is the link to my proposal: http://goo.gl/uYXrV . Please do let me know if anything needs to be changed, added, or removed. I will keep working on it till the deadline on the 8th. On Wed, Apr 6, 2011 at 11:41 PM, Michael McCandless luc...@mikemccandless.com wrote: That test code looks good -- you really should have seen awful performance had you used O_DIRECT, since you read byte by byte. A more realistic test is to read a whole buffer (eg 4 KB is what Lucene now uses during merging, but we'd probably up this to something like 1 MB when using O_DIRECT). Linus does hate O_DIRECT (see http://kerneltrap.org/node/7563), and for good reason: its existence means projects like ours can use it to work around limitations in the Linux IO APIs that control the buffer cache when, otherwise, we might conceivably make patches to fix Linux correctly. It's an escape hatch, and we all use the escape hatch instead of trying to fix Linux for real... For example, the NOREUSE flag is a no-op now in Linux, which is a shame, because that's precisely the flag we'd want to use for merging (along with SEQUENTIAL). Had that flag been implemented well, it'd give better results than our workaround using O_DIRECT. Anyway, given how things are, until we can get more control (way up in Javaland) over the buffer cache, O_DIRECT (via a native directory impl through JNI) is our only real option today.
More details here: http://blog.mikemccandless.com/2010/06/lucene-and-fadvisemadvise.html Note that other OSs likely do a better job and actually implement NOREUSE and similar APIs, so the generic Unix/WindowsNativeDirectory would simply use NOREUSE on those platforms for I/O during segment merging. Mike http://blog.mikemccandless.com On Wed, Apr 6, 2011 at 11:56 AM, Varun Thacker varunthacker1...@gmail.com wrote: Hi. I wrote sample code to test the speed difference between SEQUENTIAL and O_DIRECT (I used the madvise flag MADV_DONTNEED) reads. This is the link to the code: http://pastebin.com/8QywKGyS There was a speed difference when I switched between the two flags. I have not used the O_DIRECT flag because Linus had criticized it. Is this what the flags are intended to be used for? This is just sample code with a test file. On Wed, Apr 6, 2011 at 12:11 PM, Simon Willnauer simon.willna...@googlemail.com wrote: Hey Varun, On Tue, Apr 5, 2011 at 11:07 PM, Michael McCandless luc...@mikemccandless.com wrote: Hi Varun, Those two issues would make a great GSoC! Comments below... +1 On Tue, Apr 5, 2011 at 1:56 PM, Varun Thacker varunthacker1...@gmail.com wrote: I would like to combine two tasks as part of my project, namely Directory createOutput and openInput should take an IOContext (LUCENE-2793), and complement it with Generalize DirectIOLinuxDir to UnixDir (LUCENE-2795). The first part of the project is aimed at significantly reducing the time taken to search during indexing by adding an IOContext which would store buffer size and have options to bypass the OS's buffer cache (this is what causes the slowdown in search) and other hints. Once completed I would move on to LUCENE-2795 and generalize the Directory implementation to make a UnixDirectory.
So, the first part (LUCENE-2793) should cause no change at all to performance, functionality, etc., because it's merely installing the plumbing (IOContext threaded throughout the low-level store APIs in Lucene) so that higher levels can send important details down to the Directory. We'd fix IndexWriter/IndexReader to fill out this IOContext with the details (merging, flushing, new reader, etc.). There's some fun/freedom here in figuring out just what details should be included in IOContext... (eg: is it low level set buffer size to 4 KB or is it high level I am opening a new near-real-time reader). This first step is a rote cutover, just changing APIs but in no way taking advantage of the new APIs. The 2nd step (LUCENE-2795) would then take advantage of this plumbing, by creating a UnixDir impl that, using JNI (C code), passes advanced flags when opening files, based on the incoming IOContext. The goal is a single UnixDir that has ifdefs so that it's usable across multiple Unices, and eg would use direct IO if the context is merging. If we are ambitious we could rope Windows into the mix, too, and then this would be NativeDir... We can measure success by validating that a big merge while searching does not hurt search performance? (Ie we should be
GSoC Lucene proposals
Hi students, We are receiving very good proposals this year, I am sure mentors are very happy :) I have one suggestion to make our (mentors') lives easier. Please add the JIRA identifier to your proposal's title, for example: LUCENE-2883: Consolidate Solr Lucene FunctionQuery into modules. This will let mentors quickly search for Lucene and Solr proposals, as all Apache proposals are mixed together and there is no way to sort by project. Thanks! -- Adriano Crestani
Re: GSoC Lucene proposals
Done! --- On Wed, 4/6/11, Adriano Crestani adrianocrest...@apache.org wrote: From: Adriano Crestani adrianocrest...@apache.org Subject: GSoC Lucene proposals To: dev@lucene.apache.org Date: Wednesday, April 6, 2011, 22:43 Hi students, We are receiving very good proposals this year, I am sure mentors are very happy :) I have one suggestion to make our (mentors') lives easier. Please add the JIRA identifier to your proposal's title, for example: LUCENE-2883: Consolidate Solr Lucene FunctionQuery into modules. This will let mentors quickly search for Lucene and Solr proposals, as all Apache proposals are mixed together and there is no way to sort by project. Thanks! -- Adriano Crestani
[jira] [Commented] (LUCENE-1768) NumericRange support for new query parser
[ https://issues.apache.org/jira/browse/LUCENE-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016653#comment-13016653 ] Vinicius Barros commented on LUCENE-1768: - Thanks for reviewing it, Adriano. I updated the proposal to clarify it's the contrib query parser. NumericRange support for new query parser - Key: LUCENE-1768 URL: https://issues.apache.org/jira/browse/LUCENE-1768 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Affects Versions: 2.9 Reporter: Uwe Schindler Assignee: Adriano Crestani Labels: contrib, gsoc, gsoc2011, lucene-gsoc-11, mentor Fix For: 4.0 It would be good to specify some type of schema for the query parser in the future, so it can automatically create NumericRangeQuery for different numeric types. It would then be possible to index a numeric value (double, float, long, int) using NumericField; the query parser would then know which type the field is and correctly create a NumericRangeQuery for strings like [1.567..*] or (1.787..19.5]. There is currently no way to determine from the index whether a field is numeric, so the user will have to configure the FieldConfig objects in the ConfigHandler. But once this is done, it will not be that difficult to implement the rest. The only difference from the current handling of RangeQuery is then the instantiation of the correct Query type and conversion of the entered numeric values (a simple Number.valueOf(...) conversion of the user-entered numbers). Everything else is identical; NumericRangeQuery also supports the MTQ rewrite modes (as it is a MTQ). Another thing is a change in Date semantics. There are some strange flags in the current parser that tell it how to handle dates. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: My GSOC proposal
I have updated my proposal online to mention the time I would be able to dedicate to the project. On Thu, Apr 7, 2011 at 7:05 AM, Adriano Crestani adrianocrest...@gmail.com wrote: Hi Varun, Nice proposal, very complete. Only one thing missing: you should mention somewhere how many hours a week you are willing to spend working on the project and whether there is any holiday when you won't be able to work. Good luck ;) On Wed, Apr 6, 2011 at 5:57 PM, Varun Thacker varunthacker1...@gmail.com wrote: I have drafted the proposal on the official GSoC website. This is the link to my proposal: http://goo.gl/uYXrV . Please do let me know if anything needs to be changed, added, or removed. I will keep working on it until the deadline on the 8th. On Wed, Apr 6, 2011 at 11:41 PM, Michael McCandless luc...@mikemccandless.com wrote: That test code looks good -- you really would have seen awful performance had you used O_DIRECT, since you read byte by byte. A more realistic test is to read a whole buffer (e.g. 4 KB is what Lucene now uses during merging, but we'd probably up this to something like 1 MB when using O_DIRECT). Linus does hate O_DIRECT (see http://kerneltrap.org/node/7563), and for good reason: its existence means projects like ours can use it to work around limitations in the Linux IO APIs that control the buffer cache when, otherwise, we might conceivably contribute patches to fix Linux properly. It's an escape hatch, and we all use the escape hatch instead of trying to fix Linux for real... For example, the NOREUSE flag is a no-op now in Linux, which is a shame, because that's precisely the flag we'd want to use for merging (along with SEQUENTIAL). Had that flag been implemented well, it'd give better results than our workaround using O_DIRECT. Anyway, given how things are, until we can get more control (way up in Javaland) over the buffer cache, O_DIRECT (via a native directory impl through JNI) is our only real option today. 
More details here: http://blog.mikemccandless.com/2010/06/lucene-and-fadvisemadvise.html Note that other OSs likely do a better job and actually implement NOREUSE and similar APIs, so the generic Unix/WindowsNativeDirectory would simply use NOREUSE on those platforms for I/O during segment merging. Mike http://blog.mikemccandless.com On Wed, Apr 6, 2011 at 11:56 AM, Varun Thacker varunthacker1...@gmail.com wrote: Hi. I wrote some sample code to test the speed difference between SEQUENTIAL and O_DIRECT-style reads (I used the madvise flag MADV_DONTNEED). This is the link to the code: http://pastebin.com/8QywKGyS There was a speed difference when I switched between the two flags. I have not used the O_DIRECT flag itself because Linus had criticized it. Is this what the flags are intended to be used for? This is just sample code with a test file. On Wed, Apr 6, 2011 at 12:11 PM, Simon Willnauer simon.willna...@googlemail.com wrote: Hey Varun, On Tue, Apr 5, 2011 at 11:07 PM, Michael McCandless luc...@mikemccandless.com wrote: Hi Varun, Those two issues would make a great GSoC! Comments below... +1 On Tue, Apr 5, 2011 at 1:56 PM, Varun Thacker varunthacker1...@gmail.com wrote: I would like to combine two tasks as part of my project, namely 'Directory createOutput and openInput should take an IOContext' (LUCENE-2793), and complement it with 'Generalize DirectIOLinuxDir to UnixDir' (LUCENE-2795). The first part of the project is aimed at significantly reducing the search slowdown during indexing, by adding an IOContext which would store buffer size and have options to bypass the OS's buffer cache (this is what causes the slowdown in search) and other hints. Once completed, I would move on to LUCENE-2795 and generalize the Directory implementation to make a UnixDirectory. 
So, the first part (LUCENE-2793) should cause no change at all to performance, functionality, etc., because it's merely installing the plumbing (IOContext threaded throughout the low-level store APIs in Lucene) so that higher levels can send important details down to the Directory. We'd fix IndexWriter/IndexReader to fill out this IOContext with the details (merging, flushing, new reader, etc.). There's some fun/freedom here in figuring out just what details should be included in IOContext... (eg: is it low level "set buffer size to 4 KB" or is it high level "I am opening a new near-real-time reader"). This first step is a rote cutover, just changing APIs but in no way taking advantage of the new APIs. The second step (LUCENE-2795) would then take advantage of this plumbing, by creating a UnixDir impl that, using JNI (C code), passes advanced flags when opening files, based on the incoming IOContext. The goal is a single UnixDir that has ifdefs so that it's usable across multiple Unices, and eg would use direct IO if the context is merging. If we are ambitious we could rope Windows into the mix, too, and then this would be NativeDir... We can measure success by validating that a big merge while searching does not hurt search performance? (Ie we should be