[Fwd: TermEnum usage]
Without any answers, I'm reposting once. Do I have to post a bug report? Let me know. Thanks a lot Vincent DARON ASK ---BeginMessage--- Hi all, I'm using Lucene.NET 2.9.2.2 from SVN. I'm trying to iterate the terms of a field in my index; to do so, I'm using IndexReader.Terms(f), which returns a TermEnum. The classic usage of an iterator is the following pattern:

    TermEnum enu = reader.Terms(new Term(myfield));
    while (enu.Next())
    {
        ProcessTerm(enu.Term());
    }

But it seems that the TermEnum is already positioned on the first item BEFORE the first call to Next. The previous code will therefore always skip the first Term. Bug? Thanks Vincent DARON ASK ---End Message---
Re: [Fwd: TermEnum usage]
Hey Vincent, I am not a dev, but for example look at FuzzyQuery.cs (starting at line 148):

    do
    {
        float score = 0.0f;
        Term t = enumerator.Term();
        if (t != null)
        {
            // some stuff with t
        }
    }
    while (enumerator.Next());

You can see that it is expecting the enumerator to have a term in it before it calls Next [i.e. it is using do...while rather than just while]. So I think this is expected behavior, although it may not be intuitive. Hope this helps, -Ben --- On Thu, 7/22/10, Vincent DARON vda...@ask.be wrote: From: Vincent DARON vda...@ask.be Subject: [Fwd: TermEnum usage] To: lucene-net-dev lucene-net-dev@lucene.apache.org Date: Thursday, July 22, 2010, 10:10 AM Without any answers, I'm reposting once. Do I have to post a bug report? Let me know Thanks a lot Vincent DARON ASK
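Since the enumerator is pre-positioned on its first term, the do...while shape above is the correct way to consume it. Here is a minimal sketch of Vincent's loop rewritten that way, in Java Lucene syntax (Lucene.NET mirrors it with capitalized method names; myfield and processTerm are placeholders from his snippet), including the customary check that stops once the enum walks past the requested field:

    TermEnum enu = reader.terms(new Term(myfield));
    try {
        do {
            Term t = enu.term();
            // terms(t) positions the enum at the first term >= t, so term()
            // is already valid before any next(); stop once we leave our field
            if (t == null || !t.field().equals(myfield)) {
                break;
            }
            processTerm(t);
        } while (enu.next());
    } finally {
        enu.close();
    }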
Re: API changes between 2.9.2 and 2.9.3
On Jul 22, 2010, at 2:09, Bill Janssen jans...@parc.com wrote: Andi Vajda va...@apache.org wrote: Porting your stuff to 3.0 is thus highly recommended instead of complaining about broken (my bad) long-deprecated APIs. Hey, take 2.9.3 down, and announce no further pylucene support for 2.x, and I'll stop talking about it. The value in 2.9.3 is really just in the Lucene fixes since 2.9.2. If you want them without the new JCC which is tripping you up, take a 2.9.2 build tree and change the Lucene svn url near the top of the Makefile to point at the 2.9.3 sources. This should just work (tm). Andi.. Bill
Re: API changes between 2.9.2 and 2.9.3
Andi Vajda va...@apache.org wrote: On Jul 22, 2010, at 2:09, Bill Janssen jans...@parc.com wrote: Andi Vajda va...@apache.org wrote: Porting your stuff to 3.0 is thus highly recommended instead of complaining about broken (my bad) long-deprecated APIs. Hey, take 2.9.3 down, and announce no further pylucene support for 2.x, and I'll stop talking about it. The value in 2.9.3 is really just in the Lucene fixes since 2.9.2. If you want them without the new JCC which is tripping you up, take a 2.9.2 build tree and change the Lucene svn url near the top of the Makefile to point at the 2.9.3 sources. This should just work (tm). Another fix is to edit the common-build.xml file in the Lucene subtree to remove the 1.4 restriction. That lets it build with Java 5, which makes the Iterable interface available, and things work as they did, even with jcc 2.6. Bill
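For reference, a hedged sketch of the kind of edit Bill describes -- in the 2.9-era common-build.xml the compiler level is pinned by properties along these lines (the exact property names and values here are from memory, so double-check them against your checkout):

    <!-- before: pinned to Java 1.4 -->
    <property name="javac.source" value="1.4"/>
    <property name="javac.target" value="1.4"/>

    <!-- after: allow Java 5, which makes java.lang.Iterable available -->
    <property name="javac.source" value="1.5"/>
    <property name="javac.target" value="1.5"/>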
Re: API changes between 2.9.2 and 2.9.3
On Jul 22, 2010, at 17:52, Bill Janssen jans...@parc.com wrote: Andi Vajda va...@apache.org wrote: On Jul 22, 2010, at 2:09, Bill Janssen jans...@parc.com wrote: Andi Vajda va...@apache.org wrote: Porting your stuff to 3.0 is thus highly recommended instead of complaining about broken (my bad) long-deprecated APIs. Hey, take 2.9.3 down, and announce no further pylucene support for 2.x, and I'll stop talking about it. The value in 2.9.3 is really just in the Lucene fixes since 2.9.2. If you want them without the new JCC which is tripping you up, take a 2.9.2 build tree and change the Lucene svn url near the top of the Makefile to point at the 2.9.3 sources. This should just work (tm). Another fix is to edit the common-build.xml file in the Lucene subtree to remove the 1.4 restriction. That lets it build with Java 5, which makes the Iterable interface available, and things work as they did, even with jcc 2.6. Even better. Still, none of the Lucene 2.9 code uses any of the Java 1.5 features directly, which is why Lucene 3.0 is a still better choice. Andi.. Bill
[jira] Commented: (SOLR-1731) ArrayIndexOutOfBoundsException when highlighting
[ https://issues.apache.org/jira/browse/SOLR-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891018#action_12891018 ] Leonhard Maylein commented on SOLR-1731: We have the same problem whenever we search for a word which has synonyms defined. ArrayIndexOutOfBoundsException when highlighting Key: SOLR-1731 URL: https://issues.apache.org/jira/browse/SOLR-1731 Project: Solr Issue Type: Bug Components: highlighter Affects Versions: 1.4 Reporter: Tim Underwood Priority: Minor I'm seeing a java.lang.ArrayIndexOutOfBoundsException when trying to highlight for certain queries. The error seems to be an issue with the combination of the ShingleFilterFactory, PositionFilterFactory and the LengthFilterFactory. Here's my fieldType definition:

<fieldType name="textSku" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.LengthFilterFactory" min="2" max="100"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="8" outputUnigrams="true"/>
    <filter class="solr.PositionFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.LengthFilterFactory" min="2" max="100"/> <!-- works if this is commented out -->
  </analyzer>
</fieldType>

Here's the field definition:

<field name="sku_new" type="textSku" indexed="true" stored="true" omitNorms="true"/>

Here's a sample doc:

<add>
  <doc>
    <field name="id">1</field>
    <field name="sku_new">A 1280 C</field>
  </doc>
</add>

Doing a query for sku_new:"A 1280 C" and requesting highlighting throws the exception (full stack trace below): http://localhost:8983/solr/select/?q=sku_new%3A%22A+1280+C%22&version=2.2&start=0&rows=10&indent=on&hl=on&hl.fl=sku_new&fl=* If I comment out the LengthFilterFactory from my query analyzer section everything seems to work. Commenting out just the PositionFilterFactory also makes the exception go away and seems to work for this specific query.
Full stack trace:

java.lang.ArrayIndexOutOfBoundsException: -1
	at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:202)
	at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.getWeightedSpanTerms(WeightedSpanTermExtractor.java:414)
	at org.apache.lucene.search.highlight.QueryScorer.initExtractor(QueryScorer.java:216)
	at org.apache.lucene.search.highlight.QueryScorer.init(QueryScorer.java:184)
	at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:226)
	at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:335)
	at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:89)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
	at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
	at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
	at org.mortbay.jetty.Server.handle(Server.java:285)
	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
	at
[jira] Updated: (SOLR-1804) Upgrade Carrot2 to 3.2.0
[ https://issues.apache.org/jira/browse/SOLR-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stanislaw Osinski updated SOLR-1804: Attachment: SOLR-1804-carrot2-3.4.0-dev.patch Ok, here's another shot. This time, the language model factory includes support for Chinese. To avoid compilation issues, the classes are loaded through reflection. Not pretty, but it works. If there's a way to have access to Smart Chinese at compilation time, let me know; I can remove the reflection stuff so that the refactoring is more reliable. Upgrade Carrot2 to 3.2.0 Key: SOLR-1804 URL: https://issues.apache.org/jira/browse/SOLR-1804 Project: Solr Issue Type: Improvement Components: contrib - Clustering Reporter: Grant Ingersoll Assignee: Grant Ingersoll Attachments: SOLR-1804-carrot2-3.4.0-dev.patch http://project.carrot2.org/release-3.2.0-notes.html Carrot2 is now LGPL-free, which means we should be able to bundle the binary! -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
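For illustration, a hedged sketch of the reflection approach described above -- the class name is Lucene's actual smartcn analyzer, but the wiring and fallback here are assumptions, not code from the attached patch:

{code}
// imports assumed: org.apache.lucene.analysis.Analyzer,
// org.apache.lucene.analysis.standard.StandardAnalyzer, org.apache.lucene.util.Version
Analyzer analyzer;
try {
    // no compile-time dependency on the lucene-smartcn jar
    Class<?> clazz = Class.forName("org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer");
    analyzer = (Analyzer) clazz.newInstance(); // assumes a public no-arg constructor
} catch (ClassNotFoundException e) {
    // smartcn not on the classpath: fall back to a generic analyzer
    analyzer = new StandardAnalyzer(Version.LUCENE_29);
} catch (Exception e) {
    throw new RuntimeException("Could not instantiate the smartcn analyzer", e);
}
{code}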
[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891085#action_12891085 ] Michael McCandless commented on LUCENE-2324: This is looking awesome Michael! I love the removal of *PerThread -- they are all logically absorbed into DWPT, so everything is now per thread. I still see usage of docStoreOffset, but aren't we doing away with shared doc stores with the cutover to DWPT? I think you can further simplify DocumentsWriterPerThread.DocWriter; in fact I think you can remove it and all subclasses in consumers! The consumers can simply directly write their files. The only reason this class was created was because we have to interleave docs when writing the doc stores; this is no longer needed since doc stores are again private to the segment. I think we don't need PerDocBuffer, either. And this also simplifies RAM usage tracking! Also, we don't need a separate closeDocStore; it should just be closed during flush. I like the ThreadAffinityDocumentsWriterThreadPool; it's the default, right (I see some tests explicitly setting it on IWC; not sure why)? We should make the in-RAM deletes impl somehow pluggable? Per thread DocumentsWriters that write their own private segments - Key: LUCENE-2324 URL: https://issues.apache.org/jira/browse/LUCENE-2324 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael Busch Assignee: Michael Busch Priority: Minor Fix For: Realtime Branch Attachments: lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch See LUCENE-2293 for motivation and more details. I'm copying here Mike's summary he posted on 2293: Change the approach for how we buffer in RAM to a more isolated approach, whereby IW has N fully independent RAM segments in-process and when a doc needs to be indexed it's added to one of them. Each segment would also write its own doc stores and normal segment merging (not the inefficient merge we now do on flush) would merge them. This should be a good simplification in the chain (eg maybe we can remove the *PerThread classes). The segments can flush independently, letting us make much better concurrent use of IO & CPU. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-1799) Unicode compression
[ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-1799: Attachment: LUCENE-1799_big.patch Attached is a really, really rough patch that sets BOCU-1 as the default encoding. Beware: it's a work in progress and a lot of the patch is auto-generated (eclipse), so some things need to be reverted. Most tests pass; the idea is to find bugs in tests etc. that abuse BytesRef/assume UTF-8 encoding, things like that. Unicode compression --- Key: LUCENE-1799 URL: https://issues.apache.org/jira/browse/LUCENE-1799 Project: Lucene - Java Issue Type: New Feature Components: Store Affects Versions: 2.4.1 Reporter: DM Smith Priority: Minor Attachments: LUCENE-1799.patch, LUCENE-1799.patch, LUCENE-1799.patch, LUCENE-1799.patch, LUCENE-1799.patch, LUCENE-1799.patch, LUCENE-1799_big.patch In LUCENE-1793, there is the off-topic suggestion to provide compression of Unicode data. The motivation was a custom encoding in a Russian analyzer. The original supposition was that it provided a more compact index. This led to the comment that a different or compressed encoding would be a generally useful feature. BOCU-1 was suggested as a possibility. This is a patented algorithm by IBM with an implementation in ICU. If Lucene provides its own implementation, a freely available, royalty-free license would need to be obtained. SCSU is another Unicode compression algorithm that could be used. An advantage of these methods is that they work on the whole of Unicode. If that is not needed, an encoding such as ISO-8859-1 (or whatever covers the input) could be used. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1799) Unicode compression
[ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891101#action_12891101 ] Robert Muir commented on LUCENE-1799: - Btw, that patch is huge because I just sucked in the ICU charset stuff to have an implementation that works for testing... it's not intended to ever be that way, as we would just implement the stuff we need without this code, but it makes it easier to test since you don't need any external jars or muck with the build system at all. Unicode compression --- Key: LUCENE-1799 URL: https://issues.apache.org/jira/browse/LUCENE-1799 Project: Lucene - Java Issue Type: New Feature Components: Store Affects Versions: 2.4.1 Reporter: DM Smith Priority: Minor Attachments: LUCENE-1799.patch, LUCENE-1799.patch, LUCENE-1799.patch, LUCENE-1799.patch, LUCENE-1799.patch, LUCENE-1799.patch, LUCENE-1799_big.patch In LUCENE-1793, there is the off-topic suggestion to provide compression of Unicode data. The motivation was a custom encoding in a Russian analyzer. The original supposition was that it provided a more compact index. This led to the comment that a different or compressed encoding would be a generally useful feature. BOCU-1 was suggested as a possibility. This is a patented algorithm by IBM with an implementation in ICU. If Lucene provides its own implementation, a freely available, royalty-free license would need to be obtained. SCSU is another Unicode compression algorithm that could be used. An advantage of these methods is that they work on the whole of Unicode. If that is not needed, an encoding such as ISO-8859-1 (or whatever covers the input) could be used. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
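For anyone wanting to try the idea without the patch, a hedged sketch of encoding a term with ICU's BOCU-1 charset (this uses the real icu4j charset API, but pulling in icu4j this way is exactly the external-jar dependency the patch avoids by inlining the code):

{code}
// imports assumed: java.nio.ByteBuffer, java.nio.charset.Charset,
// com.ibm.icu.charset.CharsetProviderICU (from icu4j's charset module)
Charset bocu1 = new CharsetProviderICU().charsetForName("BOCU-1");
ByteBuffer compact = bocu1.encode("пример");                   // BOCU-1 bytes
ByteBuffer utf8 = Charset.forName("UTF-8").encode("пример");   // compare remaining() sizes
{code}

A nice property for terms: BOCU-1 preserves code point order, so byte-wise comparisons of encoded terms sort the same way the UTF-8 bytes do.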
[jira] Issue Comment Edited: (LUCENE-2537) FSDirectory.copy() impl is unsafe
[ https://issues.apache.org/jira/browse/LUCENE-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12887935#action_12887935 ] Shai Erera edited comment on LUCENE-2537 at 7/22/10 8:09 AM: - Oh .. found the thread we discussed that on the list, to which I've actually last posted w/ the following text: {quote} I've Googled around a bit and came across this: http://markmail.org/message/l67bierbmmedrfw5. Apparently, there's a long-standing bug against SUN since May 2006 (http://bugs.sun.com/view_bug.do?bug_id=6431344) that's still open and reports the exact same behavior that I'm seeing. If I understand correctly, this might be a Windows limitation and is expected to work well on Linux. I'll give it a try. But this makes me think if we should keep the current behavior for Linux-based directories, and fall back to the chunks approach for Windows ones? Since eventually I'll be running on Linux, I don't want to lose performance ... This isn't the first time that we've witnessed the write once, run everywhere misconception of Java :). I'm thinking if in general we should have a Windows/Linux FSDirectory impl, or handlers, to prepare for future cases as well. Mike already started this with LUCENE-2500 (DirectIOLinuxDirectory). Instead of writing a Directory, perhaps we could have a handler object or something, or a generic LinuxDirectory that impls some stuff the 'linux' way. In FSDirectory we already have code which detects the OS and JRE used to decide between Simple, NIO and MMAP Directories ... {quote} FSDirectory.copy() impl is unsafe - Key: LUCENE-2537 URL: https://issues.apache.org/jira/browse/LUCENE-2537 Project: Lucene - Java Issue Type: Bug Components: Store Reporter: Shai Erera Assignee: Shai Erera Fix For: 3.1, 4.0 There are a couple of issues with it: # FileChannel.transferFrom documents that it may not copy the number of bytes requested, however we don't check the return value. So we need to fix the code to read in a loop until all bytes are copied.
# When calling addIndexes() w/ very large segments (few hundred MBs in size), I ran into the following exception (Java 1.6 -- Java 1.5's exception was cryptic):

{code}
Exception in thread "main" java.io.IOException: Map failed
	at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:770)
	at sun.nio.ch.FileChannelImpl.transferToTrustedChannel(FileChannelImpl.java:450)
	at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:523)
	at org.apache.lucene.store.FSDirectory.copy(FSDirectory.java:450)
	at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:3019)
Caused by: java.lang.OutOfMemoryError: Map failed
	at sun.nio.ch.FileChannelImpl.map0(Native Method)
	at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:767)
	... 7 more
{code}

I changed the impl to something like this:

{code}
long numWritten = 0;
long numToWrite = input.size();
long bufSize = 1 << 26;
while (numWritten < numToWrite) {
  numWritten += output.transferFrom(input, numWritten, bufSize);
}
{code}

And the code successfully adds the indexes. This code uses chunks of 64MB, however that might be too large for some applications, so we definitely need a smaller one. The question is how small so that performance won't be affected, and it'd be great if we can let it be configurable, however since that API is
[jira] Reopened: (SOLR-1999) Download HEADER should not have pointer to nightly builds
[ https://issues.apache.org/jira/browse/SOLR-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebb reopened SOLR-1999: Sorry, but that is still advertising nightly builds to the general public, albeit indirectly. If a developer really wants to find nightly builds, they should be able to do so via the developer pages, not the pages intended for all users. Download HEADER should not have pointer to nightly builds - Key: SOLR-1999 URL: https://issues.apache.org/jira/browse/SOLR-1999 Project: Solr Issue Type: Bug Environment: http://www.apache.org/dist/lucene/solr/HEADER.html Reporter: Sebb Assignee: Hoss Man The file HEADER.html should not have a pointer to nightly builds. Nightly builds should be reserved for developers, and not advertised to the general public. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2537) FSDirectory.copy() impl is unsafe
[ https://issues.apache.org/jira/browse/LUCENE-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-2537: --- Attachment: FileCopyTest.java I wrote a test which compares the FileChannel API to intermediate buffer copies. The test runs each method 3 times and reports the best time of each. It can be run w/ different file and chunk sizes. Here are the results of copying a 1GB file using different chunk sizes (the chunk is used as the intermediate buffer size as well). Machine spec:
* Linux, 64-bit (IBM) JVM
* 2xQuad (+hyper-threading) - 16 cores overall
* 16GB RAM
* SAS HD

||Chunk Size||FileChannel (ms)||Intermediate Buffer (ms)||Diff||
|64K|1865|1528|{color:red}-18%{color}|
|128K|1660|1526|{color:red}-9%{color}|
|512K|1514|1493|{color:red}-2%{color}|
|1M|1552|2072|{color:green}+33%{color}|
|2M|1488|1559|{color:green}+5%{color}|
|4M|1596|1831|{color:green}+13%{color}|
|16M|1563|1964|{color:green}+21%{color}|
|64M|1494|2442|{color:green}+39%{color}|
|128M|1469|2445|{color:green}+40%{color}|

For small buffer sizes, intermediate byte[] copies are preferable. However, the FileChannel method performs pretty much consistently, regardless of the buffer size (except for the first run), while the byte[] approach degrades a lot as the buffer size increases. I think, given these results, we can use the FileChannel method w/ a chunk size of 4 (or even 2) MB, to be on the safe side and not eat up too much RAM? FSDirectory.copy() impl is unsafe - Key: LUCENE-2537 URL: https://issues.apache.org/jira/browse/LUCENE-2537 Project: Lucene - Java Issue Type: Bug Components: Store Reporter: Shai Erera Assignee: Shai Erera Fix For: 3.1, 4.0 Attachments: FileCopyTest.java There are a couple of issues with it: # FileChannel.transferFrom documents that it may not copy the number of bytes requested, however we don't check the return value. So we need to fix the code to read in a loop until all bytes are copied. # When calling addIndexes() w/ very large segments (few hundred MBs in size), I ran into the following exception (Java 1.6 -- Java 1.5's exception was cryptic):

{code}
Exception in thread "main" java.io.IOException: Map failed
	at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:770)
	at sun.nio.ch.FileChannelImpl.transferToTrustedChannel(FileChannelImpl.java:450)
	at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:523)
	at org.apache.lucene.store.FSDirectory.copy(FSDirectory.java:450)
	at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:3019)
Caused by: java.lang.OutOfMemoryError: Map failed
	at sun.nio.ch.FileChannelImpl.map0(Native Method)
	at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:767)
	... 7 more
{code}

I changed the impl to something like this:

{code}
long numWritten = 0;
long numToWrite = input.size();
long bufSize = 1 << 26;
while (numWritten < numToWrite) {
  numWritten += output.transferFrom(input, numWritten, bufSize);
}
{code}

And the code successfully adds the indexes. This code uses chunks of 64MB, however that might be too large for some applications, so we definitely need a smaller one. The question is how small so that performance won't be affected, and it'd be great if we can let it be configurable, however since that API is called by other API, such as addIndexes, not sure it's easily controllable. Also, I read somewhere (can't remember now where) that on Linux the native impl is better and does copy in chunks. So perhaps we should make a Linux specific impl? -- This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2537) FSDirectory.copy() impl is unsafe
[ https://issues.apache.org/jira/browse/LUCENE-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891123#action_12891123 ] Michael McCandless commented on LUCENE-2537: Nice results Shai! bq. I think, given these results, we can use the FileChannel method w/ a chunk size of 4 (or even 2) MB, to be on the safe side and not eat up too much RAM? +1 FSDirectory.copy() impl is unsafe - Key: LUCENE-2537 URL: https://issues.apache.org/jira/browse/LUCENE-2537 Project: Lucene - Java Issue Type: Bug Components: Store Reporter: Shai Erera Assignee: Shai Erera Fix For: 3.1, 4.0 Attachments: FileCopyTest.java There are a couple of issues with it: # FileChannel.transferFrom documents that it may not copy the number of bytes requested, however we don't check the return value. So we need to fix the code to read in a loop until all bytes are copied. # When calling addIndexes() w/ very large segments (few hundred MBs in size), I ran into the following exception (Java 1.6 -- Java 1.5's exception was cryptic):

{code}
Exception in thread "main" java.io.IOException: Map failed
	at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:770)
	at sun.nio.ch.FileChannelImpl.transferToTrustedChannel(FileChannelImpl.java:450)
	at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:523)
	at org.apache.lucene.store.FSDirectory.copy(FSDirectory.java:450)
	at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:3019)
Caused by: java.lang.OutOfMemoryError: Map failed
	at sun.nio.ch.FileChannelImpl.map0(Native Method)
	at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:767)
	... 7 more
{code}

I changed the impl to something like this:

{code}
long numWritten = 0;
long numToWrite = input.size();
long bufSize = 1 << 26;
while (numWritten < numToWrite) {
  numWritten += output.transferFrom(input, numWritten, bufSize);
}
{code}

And the code successfully adds the indexes. This code uses chunks of 64MB, however that might be too large for some applications, so we definitely need a smaller one. The question is how small so that performance won't be affected, and it'd be great if we can let it be configurable, however since that API is called by other API, such as addIndexes, not sure it's easily controllable. Also, I read somewhere (can't remember now where) that on Linux the native impl is better and does copy in chunks. So perhaps we should make a Linux specific impl? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2553) IOException: read past EOF
[ https://issues.apache.org/jira/browse/LUCENE-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle L. updated LUCENE-2553: Description: We have been getting an {{IOException}} with the following stack trace: \\ \\
{noformat}
java.io.IOException: read past EOF
	at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:154)
	at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
	at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:69)
	at org.apache.lucene.store.IndexInput.readLong(IndexInput.java:92)
	at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:218)
	at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:901)
	at com.cargurus.search.IndexManager$AllHitsUnsortedCollector.collect(IndexManager.java:520)
	at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:275)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:212)
	at org.apache.lucene.search.Searcher.search(Searcher.java:67)
	...
{noformat}
\\ \\ We have implemented a basic custom collector that collects all hits in an unordered manner:

{code}
private class AllHitsUnsortedCollector extends Collector {
  private Log logger = LogFactory.getLog(AllHitsUnsortedCollector.class);

  private IndexReader reader;
  private int baselineDocumentId;
  private List<Document> matchingDocuments = new ArrayList<Document>();

  @Override
  public boolean acceptsDocsOutOfOrder() {
    return true;
  }

  @Override
  public void collect(int docId) throws IOException {
    int documentId = baselineDocumentId + docId;
    Document document = reader.document(documentId, getFieldSelector());
    if (document == null) {
      logger.info("Null document from search results!");
    } else {
      matchingDocuments.add(document);
    }
  }

  @Override
  public void setNextReader(IndexReader segmentReader, int baseDocId) throws IOException {
    this.reader = segmentReader;
    this.baselineDocumentId = baseDocId;
  }

  @Override
  public void setScorer(Scorer scorer) throws IOException {
    // do nothing
  }

  public List<Document> getMatchingDocuments() {
    return matchingDocuments;
  }
}
{code}

The exception arises when users perform searches while indexing/optimization is occurring. Our {{IndexReader}} is read-only. From the documentation I have read, a read-only {{IndexReader}} instance should be immune from any uncommitted index changes and should return consistent results during indexing and optimization. As this exception occurs during indexing/optimization, it seems to me that the read-only {{IndexReader}} is somehow stumbling upon the uncommitted content? The problem is difficult to replicate as it is sporadic in nature and so far has only occurred in Production. We have rebuilt the indexes a number of times, but that does not seem to alleviate the issue. Any other information I can provide that will help isolate the issue? The most likely other possibility is that the {{Collector}} we have written is doing something it shouldn't. Any pointers?
[jira] Updated: (LUCENE-2537) FSDirectory.copy() impl is unsafe
[ https://issues.apache.org/jira/browse/LUCENE-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-2537: --- Attachment: LUCENE-2537.patch Patch copies the files in chunks of 2MB. All core tests pass. I'll wait a day or two in case someone wants to suggest a different approach, or chunk size limit, before I commit. FSDirectory.copy() impl is unsafe - Key: LUCENE-2537 URL: https://issues.apache.org/jira/browse/LUCENE-2537 Project: Lucene - Java Issue Type: Bug Components: Store Reporter: Shai Erera Assignee: Shai Erera Fix For: 3.1, 4.0 Attachments: FileCopyTest.java, LUCENE-2537.patch There are a couple of issues with it: # FileChannel.transferFrom documents that it may not copy the number of bytes requested, however we don't check the return value. So we need to fix the code to read in a loop until all bytes are copied. # When calling addIndexes() w/ very large segments (few hundred MBs in size), I ran into the following exception (Java 1.6 -- Java 1.5's exception was cryptic):

{code}
Exception in thread "main" java.io.IOException: Map failed
	at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:770)
	at sun.nio.ch.FileChannelImpl.transferToTrustedChannel(FileChannelImpl.java:450)
	at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:523)
	at org.apache.lucene.store.FSDirectory.copy(FSDirectory.java:450)
	at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:3019)
Caused by: java.lang.OutOfMemoryError: Map failed
	at sun.nio.ch.FileChannelImpl.map0(Native Method)
	at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:767)
	... 7 more
{code}

I changed the impl to something like this:

{code}
long numWritten = 0;
long numToWrite = input.size();
long bufSize = 1 << 26;
while (numWritten < numToWrite) {
  numWritten += output.transferFrom(input, numWritten, bufSize);
}
{code}

And the code successfully adds the indexes. This code uses chunks of 64MB, however that might be too large for some applications, so we definitely need a smaller one. The question is how small so that performance won't be affected, and it'd be great if we can let it be configurable, however since that API is called by other API, such as addIndexes, not sure it's easily controllable. Also, I read somewhere (can't remember now where) that on Linux the native impl is better and does copy in chunks. So perhaps we should make a Linux specific impl? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
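For readers following along, a hedged sketch of the loop shape being discussed (not the attached patch itself): cap each request at a small chunk and trust transferFrom()'s return value rather than assuming it copied everything:

{code}
import java.io.IOException;
import java.nio.channels.FileChannel;

final class ChunkedCopy {
  // 2 MB, per the benchmark above; the patch's actual constant may differ
  private static final long CHUNK_SIZE = 2L * 1024 * 1024;

  static void copy(FileChannel input, FileChannel output) throws IOException {
    long size = input.size();
    long written = 0;
    while (written < size) {
      // transferFrom may copy fewer bytes than requested, so loop on its return value
      written += output.transferFrom(input, written, Math.min(CHUNK_SIZE, size - written));
    }
  }
}
{code}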
[jira] Commented: (SOLR-64) strict hierarchical facets
[ https://issues.apache.org/jira/browse/SOLR-64?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891207#action_12891207 ] SolrFan commented on SOLR-64: - Can the patch please be updated to the latest trunk? Thanks strict hierarchical facets -- Key: SOLR-64 URL: https://issues.apache.org/jira/browse/SOLR-64 Project: Solr Issue Type: New Feature Components: search Reporter: Yonik Seeley Fix For: Next Attachments: SOLR-64.patch, SOLR-64.patch, SOLR-64.patch, SOLR-64.patch Strict Facet Hierarchies... each tag has at most one parent (a tree). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-792) Tree Faceting Component
[ https://issues.apache.org/jira/browse/SOLR-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891208#action_12891208 ] SolrFan commented on SOLR-792: -- Hi, can this patch please be updated against the current 1.4 trunk? Thanks. Tree Faceting Component --- Key: SOLR-792 URL: https://issues.apache.org/jira/browse/SOLR-792 Project: Solr Issue Type: New Feature Reporter: Erik Hatcher Assignee: Erik Hatcher Priority: Minor Attachments: SOLR-792.patch, SOLR-792.patch, SOLR-792.patch, SOLR-792.patch, SOLR-792.patch A component to do multi-level faceting. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891228#action_12891228 ] Michael Busch commented on LUCENE-2324: --- Thanks, Mike - great feedback! (as always) {quote} I still see usage of docStoreOffset, but aren't we doing away with shared doc stores with the cutover to DWPT? {quote} Do we want all segments that one DWPT writes to share the same doc store, i.e. one doc store per DWPT, or remove doc stores entirely? {quote} I think you can further simplify DocumentsWriterPerThread.DocWriter; in fact I think you can remove it and all subclasses in consumers! {quote} I agree! Now that a high number of testcases pass it's less scary to modify even more code :) - will do this next. {quote} Also, we don't need a separate closeDocStore; it should just be closed during flush. {quote} OK sounds good. {quote} I like the ThreadAffinityDocumentsWriterThreadPool; it's the default, right (I see some tests explicitly setting it on IWC; not sure why)? {quote} It's actually only TestStressIndexing2, and it sets it to use a different number of max thread states than the default. {quote} We should make the in-RAM deletes impl somehow pluggable? {quote} Do you mean so that it's customizable how deletes are handled? E.g. doing live deletes vs. lazy deletes on flush? I think that's a good idea. E.g. at Twitter we'll do live deletes always to get the lowest latency (and we don't have too many deletes), but that's probably not the best default for everyone. So I agree that making this customizable is a good idea. It'd also be nice to have a more efficient data structure to buffer the deletes. With many buffered deletes the Java HashMap approach will not be very efficient. Terms could be written into a byte pool, but what should we do with queries? Per thread DocumentsWriters that write their own private segments - Key: LUCENE-2324 URL: https://issues.apache.org/jira/browse/LUCENE-2324 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael Busch Assignee: Michael Busch Priority: Minor Fix For: Realtime Branch Attachments: lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch See LUCENE-2293 for motivation and more details. I'm copying here Mike's summary he posted on 2293: Change the approach for how we buffer in RAM to a more isolated approach, whereby IW has N fully independent RAM segments in-process and when a doc needs to be indexed it's added to one of them. Each segment would also write its own doc stores and normal segment merging (not the inefficient merge we now do on flush) would merge them. This should be a good simplification in the chain (eg maybe we can remove the *PerThread classes). The segments can flush independently, letting us make much better concurrent use of IO & CPU. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: [Fwd: TermEnum usage]
It is expected behavior. Please see http://lucene.apache.org/java/2_9_2/api/all/org/apache/lucene/index/IndexReader.html#terms%28org.apache.lucene.index.Term%29 DIGY -Original Message- From: Vincent DARON [mailto:vda...@ask.be] Sent: Thursday, July 22, 2010 6:10 PM To: lucene-net-dev Subject: [Fwd: TermEnum usage] Without any answers, I'm reposting once. Do I have to post bug report ? Let me know Thanks a lot Vincent DARON ASK
[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891241#action_12891241 ] Yonik Seeley commented on LUCENE-2324: -- bq. It'd also be nice to have a more efficient data structure to buffer the deletes. With many buffered deletes the Java HashMap approach will not be very efficient. Terms could be written into a byte pool, but what should we do with queries? IMO, terms are an order of magnitude more important than queries. Most deletes will be by some sort of unique id, and will be in the same field. Perhaps a single byte[] with length prefixes (like the field cache has). A single int could then represent a term (it would just be an offset into the byte[], which is field-specific, so no need to store the field each time). We could then build a treemap or hashmap that natively used an int[]... but that may not be necessary (depending on how deletes are applied). Perhaps a sort could be done right before applying, and duplicate terms could be handled at that time. Anyway, I'm only casually following this issue, but it's looking like really cool stuff! Per thread DocumentsWriters that write their own private segments - Key: LUCENE-2324 URL: https://issues.apache.org/jira/browse/LUCENE-2324 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael Busch Assignee: Michael Busch Priority: Minor Fix For: Realtime Branch Attachments: lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch See LUCENE-2293 for motivation and more details. I'm copying here Mike's summary he posted on 2293: Change the approach for how we buffer in RAM to a more isolated approach, whereby IW has N fully independent RAM segments in-process and when a doc needs to be indexed it's added to one of them. Each segment would also write its own doc stores and normal segment merging (not the inefficient merge we now do on flush) would merge them. This should be a good simplification in the chain (eg maybe we can remove the *PerThread classes). The segments can flush independently, letting us make much better concurrent use of IO & CPU. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
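To make that concrete, a hedged sketch of the length-prefixed pool Yonik describes (the names and sizing here are illustrative, not from any patch): each buffered delete term becomes a single int offset into one growing byte[]:

{code}
// A minimal append-only term pool: add() returns the int that stands in for the term.
final class DeleteTermPool {
  private byte[] pool = new byte[4096];
  private int used = 0;

  int add(byte[] term) {                       // term = e.g. the UTF-8 bytes of the deleted id
    int needed = used + 2 + term.length;
    if (needed > pool.length) {
      pool = java.util.Arrays.copyOf(pool, Math.max(needed, pool.length * 2));
    }
    int offset = used;
    pool[used++] = (byte) (term.length >>> 8); // two-byte length prefix,
    pool[used++] = (byte) term.length;         // like the field cache's packed values
    System.arraycopy(term, 0, pool, used, term.length);
    used += term.length;
    return offset;
  }
}
{code}

Per the follow-up below, duplicates could simply be appended and resolved with a sort right before the deletes are applied, so no hash over the pool is strictly required.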
[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891256#action_12891256 ] Michael McCandless commented on LUCENE-2324: {quote} bq. I still see usage of docStoreOffset, but aren't we doing away with shared doc stores with the cutover to DWPT? Do we want all segments that one DWPT writes to share the same doc store, i.e. one doc store per DWPT, or remove doc stores entirely? {quote} Oh good question... a single DWPT can in fact continue to share a doc store across the segments it flushes. Hmm, but... this opto only helps in that we don't have to merge the doc stores if we merge segments that already share their doc stores. But if (say) I have 2 threads indexing, and I'm indexing lots of docs and each DWPT has written 5 segments, we will then merge these 10 segments, and must merge the doc stores at that point. So the sharing isn't really buying us much (just not closing old files opening new ones, which is presumably negligible)? {quote} bq. I think you can further simplify DocumentsWriterPerThread.DocWriter; in fact I think you can remove it and all subclasses in consumers! I agree! Now that a high number of testcases pass it's less scary to modify even more code - will do this next. bq. Also, we don't need a separate closeDocStore; it should just be closed during flush. OK sounds good. {quote} Super :) {quote} bq. I like the ThreadAffinityDocumentsWriterThreadPool; it's the default, right (I see some tests explicitly setting it on IWC; not sure why)? It's actually only TestStressIndexing2, and it sets it to use a different number of max thread states than the default. {quote} Ahh OK great. {quote} bq. We should make the in-RAM deletes impl somehow pluggable? Do you mean so that it's customizable how deletes are handled? {quote} Actually I was worried about the long[] sequenceIDs (adding 8 bytes RAM per buffered doc) -- this could be a biggish hit to RAM efficiency for small docs. {quote} E.g. doing live deletes vs. lazy deletes on flush? I think that's a good idea. E.g. at Twitter we'll do live deletes always to get the lowest latency (and we don't have too many deletes), but that's probably not the best default for everyone. So I agree that making this customizable is a good idea. {quote} Yeah, this too :) Actually deletions today are not applied on flush -- they continue to be buffered beyond flush, and then get applied just before a merge kicks off. I think we should keep this (as an option and probably as the default) -- it's important for apps w/ large indices that don't use NRT (and don't pool readers) because it's costly to open readers. So it sounds like we should support lazy (apply-before-merge like today) and live (live means resolve deleted Term/Query -> docID(s) synchronously inside deleteDocuments, right?). Live should also be less performant because of less temporal locality (vs lazy). {quote} It'd also be nice to have a more efficient data structure to buffer the deletes. With many buffered deletes the Java HashMap approach will not be very efficient. Terms could be written into a byte pool, but what should we do with queries? {quote} I agree w/ Yonik: let's worry only about delete by Term (not Query) for now. Maybe we could reuse (factor out) TermsHashPerField's custom hash here, for the buffered Terms? It efficiently maps a BytesRef -> int. Another thing: it looks like finishFlushedSegment is sync'd on the IW instance, but, it need not be sync'd for all of that?
EG readerPool.get(), applyDeletes, building the CFS, may not need to be inside the sync block? Per thread DocumentsWriters that write their own private segments - Key: LUCENE-2324 URL: https://issues.apache.org/jira/browse/LUCENE-2324 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael Busch Assignee: Michael Busch Priority: Minor Fix For: Realtime Branch Attachments: lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch See LUCENE-2293 for motivation and more details. I'm copying here Mike's summary he posted on 2293: Change the approach for how we buffer in RAM to a more isolated approach, whereby IW has N fully independent RAM segments in-process and when a doc needs to be indexed it's added to one of them. Each segment would also write its own doc stores and normal segment merging (not the inefficient merge we now do on flush) would merge them. This should be a good simplification in the chain (eg maybe we can remove the *PerThread classes). The segments can flush independently, letting us make much better concurrent use of IO & CPU. -- This message is automatically generated by JIRA. - You can reply to this email to
[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891262#action_12891262 ] Michael Busch commented on LUCENE-2324: --- {quote} Perhaps a single byte[] with length prefixes (like the field cache has). A single int could then represent a term (it would just be an offset into the byte[], which is field-specific, so no need to store the field each time). {quote} Yeah, that's pretty much how TermsHashPerField works. I agree with Mike, let's reuse that code. {quote} Hmm, but... this opto only helps in that we don't have to merge the doc stores if we merge segments that already share their doc stores. But if (say) I have 2 threads indexing, and I'm indexing lots of docs and each DWPT has written 5 segments, we will then merge these 10 segments, and must merge the doc stores at that point. So the sharing isn't really buying us much (just not closing old files opening new ones, which is presumably negligible)? {quote} Yeah that's true. I agree it won't help much. I think we should just remove the doc stores - a great simplification (which should also make parallel indexing a bit easier :) ). {quote} Another thing: it looks like finishFlushedSegment is sync'd on the IW instance, but, it need not be sync'd for all of that? EG readerPool.get(), applyDeletes, building the CFS, may not need to be inside the sync block? {quote} Thanks for the hint. I need to carefully go over all the synchronization; there are likely more problems. Per thread DocumentsWriters that write their own private segments - Key: LUCENE-2324 URL: https://issues.apache.org/jira/browse/LUCENE-2324 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael Busch Assignee: Michael Busch Priority: Minor Fix For: Realtime Branch Attachments: lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch See LUCENE-2293 for motivation and more details. I'm copying here Mike's summary he posted on 2293: Change the approach for how we buffer in RAM to a more isolated approach, whereby IW has N fully independent RAM segments in-process and when a doc needs to be indexed it's added to one of them. Each segment would also write its own doc stores and normal segment merging (not the inefficient merge we now do on flush) would merge them. This should be a good simplification in the chain (eg maybe we can remove the *PerThread classes). The segments can flush independently, letting us make much better concurrent use of IO & CPU. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891264#action_12891264 ] Yonik Seeley commented on LUCENE-2324: -- bq. Yeah that's pretty much how TermsHashPerField works. I agree with Mike, let's reuse that code. Do we even need to maintain a hash over it though, or can we simply keep a list (and allow dup terms until it's time to apply them)?
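A toy sketch of this alternative, with String standing in for the raw term bytes (illustrative only, not Lucene code): deletes are appended to a plain list, duplicates and all, and only sorted and deduplicated when it is time to apply them:

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class BufferedDeleteTerms {
    // buffered deleted terms; duplicates are simply appended
    private final List<String> terms = new ArrayList<String>();

    void bufferDelete(String term) {
        terms.add(term); // O(1) append, no per-delete hash maintenance
    }

    // sort and dedup once, only when the deletes are actually applied
    TreeSet<String> termsToApply() {
        return new TreeSet<String>(terms);
    }
}
{code}

The trade-off is memory for CPU: a hash bounds the buffer to unique terms, while the list grows with every delete but makes each individual delete cheaper.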
[jira] Commented: (SOLR-752) Allow better Field Compression options
[ https://issues.apache.org/jira/browse/SOLR-752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891284#action_12891284 ] David Smiley commented on SOLR-752: --- I spent some time today attempting to implement this with my own Solr FieldType that extends TextField. As I tried to implement it, I realized that I couldn't really do it. FieldType has a method createField(...) that must be implemented in order to set binary data (i.e. byte[]) on a Field. This method demands I return an org.apache.lucene.document.Field, which is final. If I create the field with binary data, by default it's not indexed or tokenized. I can get those booleans to flip by simply invoking f.setTokenStream(null). However, I can't set omitNorms() to false, nor can I set the booleans for the term vector fields. There may be other issues, but at this point I gave up to work on other, more important priorities of mine. Allow better Field Compression options -- Key: SOLR-752 URL: https://issues.apache.org/jira/browse/SOLR-752 Project: Solr Issue Type: Improvement Reporter: Grant Ingersoll Priority: Minor See http://lucene.markmail.org/message/sd4mgwud6caevb35?q=compression It would be good if Solr handled field compression outside of Lucene's Field.COMPRESS capabilities, since those capabilities are less than ideal when it comes to control over compression.
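A hedged sketch of the attempt described above; the createField signature and the Field behavior are recalled from Solr 1.4 / Lucene 2.9 and should be treated as approximate, and the compress() helper is just one possible implementation:

{code:java}
import java.io.ByteArrayOutputStream;
import java.util.zip.DeflaterOutputStream;
import org.apache.lucene.document.Field;
import org.apache.solr.schema.SchemaField;
import org.apache.solr.schema.TextField;

public class CompressedTextField extends TextField {
    @Override
    public Field createField(SchemaField field, String externalVal, float boost) {
        // the binary constructor returns a field that is stored but neither
        // indexed nor tokenized, and Field is final so it can't be subclassed
        Field f = new Field(field.getName(), compress(externalVal), Field.Store.YES);
        f.setTokenStream(null); // flips the indexed/tokenized booleans back on
        // ...but from here there is no way to control norms or the
        // term-vector flags, which is exactly the wall described above
        f.setBoost(boost);
        return f;
    }

    private static byte[] compress(String s) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DeflaterOutputStream dos = new DeflaterOutputStream(bos);
            dos.write(s.getBytes("UTF-8"));
            dos.close();
            return bos.toByteArray();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
{code}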
[jira] Commented: (SOLR-752) Allow better Field Compression options
[ https://issues.apache.org/jira/browse/SOLR-752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891305#action_12891305 ] David Smiley commented on SOLR-752: --- I already looked at BinaryField and TrieField for inspiration. BinaryField assumes you're not going to index the data. And TrieField doesn't set a binary data value on the Field. Yes, I think the next step is to make createField() return Fieldable. But I'm not a committer... Instead, or in addition... I have to wonder: why not modify Lucene's Field class to allow me to set the Index, Store, and TermVector enums AND specify binary data on a suitable constructor? Arguably an existing constructor taking String would be hijacked to take Object and then do the right thing. That would be a small change, whereas implementing another subclass of AbstractField is more complex and would likely reproduce much of what's in Field already.
[jira] Created: (SOLR-2009) Contrib ant test targets do not respect sys props testcase, testpackage, and testpackageroot
Contrib ant test targets do not respect sys props testcase, testpackage, and testpackageroot -- Key: SOLR-2009 URL: https://issues.apache.org/jira/browse/SOLR-2009 Project: Solr Issue Type: Bug Components: Build Reporter: Mark Miller Assignee: Mark Miller Priority: Minor Fix For: Next Very annoying using these props with core tests unless you use the junit target rather than test. Also would be nice if they worked regardless for future dev.
[jira] Created: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality
Improvements to SpellCheckComponent Collate functionality - Key: SOLR-2010 URL: https://issues.apache.org/jira/browse/SOLR-2010 Project: Solr Issue Type: New Feature Components: clients - java, spellchecker Affects Versions: 1.4.1 Environment: Tested against trunk revision 966633 Reporter: James Dyer Priority: Minor Improvements to SpellCheckComponent Collate functionality Our project requires a better Spell Check Collator. I'm contributing this as a patch to get suggestions for improvements and in case there is a broader need for these features. 1. Only return collations that are guaranteed to result in hits if re-queried (applying original fq params also). This is especially helpful when there is more than one correction per query. The 1.4 behavior does not verify that a particular combination will actually return hits. 2. Provide the option to get multiple collation suggestions 3. Provide extended collation results including the # of hits re-querying will return and a breakdown of each misspelled word and its correction. This patch is similar to what is described in SOLR-507 item #1. Also, this patch provides a viable workaround for the problem discussed in SOLR-1074. A dictionary could be created that combines the terms from the multiple fields. The collator then would prune out any spurious suggestions this would cause. This patch adds the following spellcheck parameters: 1. spellcheck.maxCollationTries - maximum # of collation possibilities to try before giving up. Lower values ensure better performance. Higher values may be necessary to find a collation that can return results. Default is 0, which maintains backwards-compatible behavior (do not check collations). 2. spellcheck.maxCollations - maximum # of collations to return. Default is 1, which maintains backwards-compatible behavior. 3. spellcheck.collateExtendedResult - if true, returns an expanded response format detailing collations found. default is false, which maintains backwards-compatible behavior. When true, output is like this (in context):

<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="hopq">
      <int name="numFound">94</int>
      <int name="startOffset">7</int>
      <int name="endOffset">11</int>
      <arr name="suggestion">
        <str>hope</str>
        <str>how</str>
        <str>hope</str>
        <str>chops</str>
        <str>hoped</str>
        etc
      </arr>
    </lst>
    <lst name="faill">
      <int name="numFound">100</int>
      <int name="startOffset">16</int>
      <int name="endOffset">21</int>
      <arr name="suggestion">
        <str>fall</str>
        <str>fails</str>
        <str>fail</str>
        <str>fill</str>
        <str>faith</str>
        <str>all</str>
        etc
      </arr>
    </lst>
    <lst name="collation">
      <str name="collationQuery">Title:(how AND fails)</str>
      <int name="hits">2</int>
      <lst name="misspellingsAndCorrections">
        <str name="hopq">how</str>
        <str name="faill">fails</str>
      </lst>
    </lst>
    <lst name="collation">
      <str name="collationQuery">Title:(hope AND faith)</str>
      <int name="hits">2</int>
      <lst name="misspellingsAndCorrections">
        <str name="hopq">hope</str>
        <str name="faill">faith</str>
      </lst>
    </lst>
    <lst name="collation">
      <str name="collationQuery">Title:(chops AND all)</str>
      <int name="hits">1</int>
      <lst name="misspellingsAndCorrections">
        <str name="hopq">chops</str>
        <str name="faill">all</str>
      </lst>
    </lst>
  </lst>
</lst>

In addition, SOLRJ is updated to include SpellCheckResponse.getCollatedResults(), which will return the expanded Collation format. getCollatedResult(), which returns a single String, is retained for backwards-compatibility. Other APIs were not changed but will still work provided that spellcheck.collateExtendedResult is false. This likely will not return valid results if using Shards. Rather, a more robust interaction with the index would be necessary than what exists in
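A hedged SolrJ sketch of how the new parameters would be used. The parameter names come from the description above; getCollatedResults() and the extended format exist only with this patch applied, so none of this runs against a stock 1.4.1 install:

{code:java}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.response.SpellCheckResponse;

public class CollateDemo {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery("Title:(hopq AND faill)");
        q.set("spellcheck", true);
        q.set("spellcheck.collate", true);
        q.set("spellcheck.maxCollationTries", 10); // try up to 10 candidate collations
        q.set("spellcheck.maxCollations", 3);      // return up to 3 that yield hits
        q.set("spellcheck.collateExtendedResult", true);
        QueryResponse rsp = server.query(q);
        SpellCheckResponse scr = rsp.getSpellCheckResponse();
        // getCollatedResult() predates the patch and returns a single String;
        // the patch adds getCollatedResults() for the extended format
        System.out.println(scr.getCollatedResult());
    }
}
{code}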
[jira] Updated: (SOLR-2009) Contrib ant test targets do not respect sys props testcase, testpackage, and testpackageroot
[ https://issues.apache.org/jira/browse/SOLR-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-2009: -- Attachment: SOLR-2009.patch
[jira] Updated: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality
[ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Dyer updated SOLR-2010: - Attachment: SOLR-2010.patch Tested against branch revision 966633
[jira] Commented: (SOLR-1240) Numerical Range faceting
[ https://issues.apache.org/jira/browse/SOLR-1240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891321#action_12891321 ] Hoss Man commented on SOLR-1240: bq. Rather than embedding meta to the list containing the counts, perhaps we should bite the bullet and add an additional level for the counts. yeah ... i'm on board with that idea. it's a trivial change. any comments on the implementation? i think it's fairly solid -- the one wish i have though is to try and gut the existing date faceting code to just use the new code -- but i can't see a very easy way to do that while dealing with the different param names .. suggestions? Numerical Range faceting Key: SOLR-1240 URL: https://issues.apache.org/jira/browse/SOLR-1240 Project: Solr Issue Type: New Feature Components: search Reporter: Gijs Kunze Priority: Minor Attachments: SOLR-1240.patch, SOLR-1240.patch, SOLR-1240.patch, SOLR-1240.patch, SOLR-1240.patch, SOLR-1240.patch, SOLR-1240.patch For faceting numerical ranges, using many facet.query query arguments leads to unmanageably large queries as the fields you facet over increase. Adding the same faceting parameter for numbers which already exists for dates should fix this.
[jira] Created: (SOLR-2011) Solr should get its temp dir like lucene - first checking the tempDir sys prop
Solr should get its temp dir like lucene - first checking the tempDir sys prop --- Key: SOLR-2011 URL: https://issues.apache.org/jira/browse/SOLR-2011 Project: Solr Issue Type: Improvement Components: Build Reporter: Mark Miller Assignee: Mark Miller Priority: Minor Fix For: Next
[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891334#action_12891334 ] Jason Rutherglen commented on LUCENE-2324: -- {quote}I think we should just remove the doc stores{quote} Right, I think we should remove sharing of doc stores between segments. And in general, RT apps will likely not want to use doc stores if they are performing numerous updates and/or deletes. We can explicitly state this in the javadocs. I'm thinking we could explore efficient deleted docs as sequence ids in a different issue, specifically storing them in a short[] and wrapping around.
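A purely illustrative sketch of the sequence-id idea; no such class exists in Lucene, and the wrap-around policy shown is just one assumption about how a short[] could be made to work:

{code:java}
class SequenceDeletes {
    private final short[] seqIds; // one slot per buffered doc; 0 = never deleted
    private short current = 0;    // current sequence id, wraps before Short.MAX_VALUE

    SequenceDeletes(int maxDocs) {
        seqIds = new short[maxDocs];
    }

    short nextSeq() {
        current++;
        if (current == Short.MAX_VALUE) {
            current = 1; // wrap around; on wrap everything must be rebased
            java.util.Arrays.fill(seqIds, (short) 0);
        }
        return current;
    }

    void markDeleted(int docId, short seq) {
        seqIds[docId] = seq;
    }

    // a reader opened at sequence id readerSeq sees the delete iff it
    // happened at or before that point
    boolean isDeleted(int docId, short readerSeq) {
        short s = seqIds[docId];
        return s != 0 && s <= readerSeq;
    }
}
{code}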
[jira] Created: (SOLR-2012) stats component, min/max on a field with no values
stats component, min/max on a field with no values -- Key: SOLR-2012 URL: https://issues.apache.org/jira/browse/SOLR-2012 Project: Solr Issue Type: Bug Affects Versions: 1.4 Reporter: Jonathan Rochkind : : When I use the stats component on a field that has no values in the result set : (i.e., stats.missing == rowCount), I'd expect that 'min' and 'max' would be : blank. : : Instead, they seem to be the smallest and largest float values or something: : min = 1.7976931348623157E308, max = 4.9E-324. : : Is this a bug? off the top of my head it sounds like it ... would you mind opening an issue in Jira please? -Hoss
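The values reported above are the accumulator's starting sentinels leaking out: 1.7976931348623157E308 is Double.MAX_VALUE and 4.9E-324 is Double.MIN_VALUE. A minimal sketch of the pattern (not Solr's actual code):

{code:java}
class Stats {
    double min = Double.MAX_VALUE; // sentinel: any real value is smaller
    double max = Double.MIN_VALUE; // sentinel, and itself suspect: MIN_VALUE is
                                   // the smallest POSITIVE double, so all-negative
                                   // data would also misreport the max
    long count = 0;

    void accumulate(double v) {
        min = Math.min(min, v);
        max = Math.max(max, v);
        count++;
    }

    // with count == 0 the sentinels are returned as-is, which is the bug;
    // the fix is to report nothing (or NaN) when no values were seen
    Double getMin() { return count == 0 ? null : min; }
    Double getMax() { return count == 0 ? null : max; }
}
{code}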
[jira] Created: (LUCENE-2555) Remove shared doc stores
Remove shared doc stores Key: LUCENE-2555 URL: https://issues.apache.org/jira/browse/LUCENE-2555 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael Busch Assignee: Michael Busch Priority: Minor Fix For: Realtime Branch With per-thread DocumentsWriters, sharing doc stores across segments doesn't make much sense anymore. See also LUCENE-2324.
[jira] Resolved: (SOLR-2009) Contrib ant test targets do not respect sys props testcase, testpackage, and testpackageroot
[ https://issues.apache.org/jira/browse/SOLR-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller resolved SOLR-2009. --- Resolution: Fixed more to do here later, but this initial fix is in.
[jira] Commented: (LUCENE-2554) preflex codec doesn't order terms correctly
[ https://issues.apache.org/jira/browse/LUCENE-2554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891364#action_12891364 ] Robert Muir commented on LUCENE-2554: - The perf issues here are really from our contrived tests... it's good to use _TestUtil.randomUnicodeString, but it gives you the impression there is something wrong with this dance, and there really isn't. I added _TestUtil.randomRealisticUnicodeString in r966878; you can swap this into some of these slow tests and see it's definitely the problem. preflex codec doesn't order terms correctly --- Key: LUCENE-2554 URL: https://issues.apache.org/jira/browse/LUCENE-2554 Project: Lucene - Java Issue Type: Test Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.0 Attachments: LUCENE-2554.patch The surrogate dance in the preflex codec (which must dynamically remap terms from UTF16 order to unicode code point order) is buggy. To better test it, I want to add a test-only codec, preflexrw, that is able to write indices in the pre-flex format. Then we should also fix tests to randomly pick codecs (including preflexrw) so we better test all of our codecs.
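A minimal sketch of the swap being suggested; the randomRealisticUnicodeString signature is assumed from the commit message in r966878:

{code:java}
import java.util.Random;
import org.apache.lucene.util._TestUtil;

public class RandomStringDemo {
    public static void main(String[] args) {
        Random random = new Random(42);
        // pathological: surrogates and block-hopping chars, great for coverage
        // but unrepresentative of real terms and slow through the surrogate dance
        String worst = _TestUtil.randomUnicodeString(random);
        // realistic: all chars drawn from a single unicode block
        String realistic = _TestUtil.randomRealisticUnicodeString(random);
        System.out.println(worst + " vs " + realistic);
    }
}
{code}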
[jira] Commented: (SOLR-1999) Download HEADER should not have pointer to nightly builds
[ https://issues.apache.org/jira/browse/SOLR-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891369#action_12891369 ] Sebb commented on SOLR-1999: See: http://www.apache.org/dev/release.html#what Do not include any links on the project website that might encourage non-developers to download and use nightly builds, snapshots, release candidates, or any other similar package. Download HEADER should not have pointer to nightly builds - Key: SOLR-1999 URL: https://issues.apache.org/jira/browse/SOLR-1999 Project: Solr Issue Type: Bug Environment: http://www.apache.org/dist/lucene/solr/HEADER.html Reporter: Sebb Assignee: Hoss Man The file HEADER.html should not have a pointer to nightly builds. Nightly builds should be reserved for developers, and not advertised to the general public.
[jira] Commented: (SOLR-1999) Download HEADER should not have pointer to nightly builds
[ https://issues.apache.org/jira/browse/SOLR-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891376#action_12891376 ] Hoss Man commented on SOLR-1999: Developers are members of the general public -- any page a developer can see can be seen by anybody else as well. While i agree the previous link was bad, i quite frankly don't understand your concern with the current situation: HEADER.html doesn't even mention nightly builds -- it directs people interested in (unofficial, unreleased) source code for Solr to [a wiki page|http://wiki.apache.org/solr/HackingSolr] which makes it very clear its audience is developers, and which has info on how to check out the development branches. Admittedly that HackingSolr page does mention that we have a nightly build system, so a non-developer might click the link about hacking on the source and then get interested in the nightly builds -- but it doesn't even link directly to any builds -- instead it links to a [hudson page|http://hudson.zones.apache.org/hudson/view/Lucene/] where there is a list of branches that have builds, and if you click on one of those you can get a [branch build status page|http://hudson.zones.apache.org/hudson/view/Lucene/job/Solr-trunk/] and from there you can scroll all the way to the bottom to click on [an artifacts link|http://hudson.zones.apache.org/hudson/view/Lucene/job/Solr-trunk/lastSuccessfulBuild/artifact/] and from *there* you can actually click on a link to download something that could be called a nightly build. That seems like it fits the definition of developer pages, not the pages intended for all users. I'm hard pressed to imagine a way to make it harder for non-developers to find those builds while still linking to those hudson pages for developers.
[jira] Commented: (SOLR-1999) Download HEADER should not have pointer to nightly builds
[ https://issues.apache.org/jira/browse/SOLR-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891378#action_12891378 ] Robert Muir commented on SOLR-1999: --- bq. Do not include any links on the project website that might encourage non-developers to download and use nightly builds, snapshots, release candidates, or any other similar package. Personally I think this is a load of crap. How should we get quality releases without encouraging users to test things before it's officially released? Getting feedback from users who are willing to deal with trunk and patches, and letting things bake in trunk, is really valuable, and I think it's also a step towards encouraging them to participate in development.
[jira] Commented: (SOLR-1999) Download HEADER should not have pointer to nightly builds
[ https://issues.apache.org/jira/browse/SOLR-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891379#action_12891379 ] Sebb commented on SOLR-1999: The download pages are intended for all users of the software, and must only include released (voted on) software. It is not appropriate to mention non-released code on the official page for releases.
[jira] Commented: (SOLR-1999) Download HEADER should not have pointer to nightly builds
[ https://issues.apache.org/jira/browse/SOLR-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891384#action_12891384 ] Hoss Man commented on SOLR-1999: bq. It is not appropriate to mention non-released code on the official page for releases. why? i can (moderately) understand that we should not encourage non-developers to use unofficial versions, and i recognize that linking directly to nightlies from the official release page is a very bad idea .. but how far down the rabbit hole do we have to go to avoid links to links to links to links for nightly builds? Even following the letter of the policy you linked to, i don't see how anyone could possibly construe that we are encourag(ing) non-developers to download and use nightly builds, snapshots, release candidates, or any other similar package.
[jira] Updated: (SOLR-2011) Solr should get its temp dir like lucene - first checking the tempDir sys prop
[ https://issues.apache.org/jira/browse/SOLR-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated SOLR-2011: -- Attachment: SOLR-2011.patch Attached is an initial patch... (it only fixes solr core, but I think we can fix the contrib build.xml's the same way). One benefit is that since temp stuff goes in build/ like lucene: on windows, 'ant clean' will remove spellchecker indexes or other leftover stuff that couldn't be deleted in tearDown(), rather than littering your system temp directory.
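A minimal sketch of the lookup order being described, assuming the same "tempDir" system property that Lucene's build sets (the helper class below is hypothetical):

{code:java}
import java.io.File;

public class TempDirDemo {
    // prefer the build-provided tempDir (pointing under build/), falling
    // back to the JVM default when run outside the build
    static File resolveTempDir() {
        String dir = System.getProperty("tempDir", System.getProperty("java.io.tmpdir"));
        File f = new File(dir);
        f.mkdirs(); // ensure it exists so 'ant clean' can sweep it later
        return f;
    }

    public static void main(String[] args) {
        System.out.println("temp dir: " + resolveTempDir());
    }
}
{code}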
[jira] Commented: (LUCENE-2555) Remove shared doc stores
[ https://issues.apache.org/jira/browse/LUCENE-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891414#action_12891414 ] Michael Busch commented on LUCENE-2555: --- What shall we do about index backward-compatibility? I guess 4.0 has to be able to read shared doc stores? So a lot of that code we can't remove? :(
[jira] Commented: (LUCENE-2555) Remove shared doc stores
[ https://issues.apache.org/jira/browse/LUCENE-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891422#action_12891422 ] Jason Rutherglen commented on LUCENE-2555: -- Maybe we should break backwards-compatibility for the RT branch? Or just ship an RT-specific JAR to keep things simple?
[jira] Updated: (LUCENE-2554) preflex codec doesn't order terms correctly
[ https://issues.apache.org/jira/browse/LUCENE-2554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2554: --- Attachment: LUCENE-2554.patch Fixed the test failures -- all tests should pass.
[jira] Commented: (SOLR-1999) Download HEADER should not have pointer to nightly builds
[ https://issues.apache.org/jira/browse/SOLR-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891440#action_12891440 ] Yonik Seeley commented on SOLR-1999: I've been around the ASF long enough now to know that what seems like iron-clad policy often isn't. It's often just someone editing a page to reflect what they think should be the policy, and no one else complaining too much - even in cases when there clearly was no consensus. Related to this issue, I remember the last big thread back in '06 on the infra list. And in that case too, it was a single individual who took it upon themselves to add the text you now see (and there certainly was no previous consensus or even discussion on the text added). Trying to draw sharp lines between developers and users is a lost cause... lucene and solr are for developers themselves and it's one big continuum between user and developer. Having people use nightly builds is very important for lucene/solr development. Having a pointer to developer resources from *anywhere* should be fine. The *only* important point I see is to clearly communicate that a nightly build is not an official ASF release.
Build failed in Hudson: Lucene-trunk #1246
See http://hudson.zones.apache.org/hudson/job/Lucene-trunk/1246/changes

Changes:

[rmuir] add randomRealisticUnicodeString, all chars in the same unicode block
[uschindler] As BytesRef has now native order use them in numeric tests. The contents are raw byte[] and no strings, it should compare native
[rmuir] add random prefixquerytest (hopefully easy to debug preflex issues with)
[rmuir] fix some bytesref abuse in these tests

--
[...truncated 2710 lines...]
[junit] Testsuite: org.apache.lucene.search.TestPrefixRandom
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 23.153 sec
[junit] Testsuite: org.apache.lucene.search.TestQueryTermVector
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.019 sec
[junit] Testsuite: org.apache.lucene.search.TestQueryWrapperFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.009 sec
[junit] Testsuite: org.apache.lucene.search.TestRegexpQuery
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.028 sec
[junit] Testsuite: org.apache.lucene.search.TestRegexpRandom
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 107.362 sec
[junit] Testsuite: org.apache.lucene.search.TestRegexpRandom2
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 35.416 sec
[junit] Testsuite: org.apache.lucene.search.TestScoreCachingWrappingScorer
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.02 sec
[junit] Testsuite: org.apache.lucene.search.TestScorerPerf
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.572 sec
[junit] Testsuite: org.apache.lucene.search.TestSetNorm
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.006 sec
[junit] Testsuite: org.apache.lucene.search.TestSimilarity
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.008 sec
[junit] Testsuite: org.apache.lucene.search.TestSimpleExplanations
[junit] Tests run: 53, Failures: 0, Errors: 0, Time elapsed: 2.778 sec
[junit] Testsuite: org.apache.lucene.search.TestSimpleExplanationsOfNonMatches
[junit] Tests run: 53, Failures: 0, Errors: 0, Time elapsed: 0.133 sec
[junit] Testsuite: org.apache.lucene.search.TestSloppyPhraseQuery
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.253 sec
[junit] Testsuite: org.apache.lucene.search.TestSort
[junit] Tests run: 24, Failures: 0, Errors: 0, Time elapsed: 6.451 sec
[junit] Testsuite: org.apache.lucene.search.TestSpanQueryFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.012 sec
[junit] Testsuite: org.apache.lucene.search.TestTermRangeFilter
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 18.939 sec
[junit] Testsuite: org.apache.lucene.search.TestTermRangeQuery
[junit] Tests run: 11, Failures: 0, Errors: 0, Time elapsed: 0.047 sec
[junit] Testsuite: org.apache.lucene.search.TestTermScorer
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.011 sec
[junit] Testsuite: org.apache.lucene.search.TestTermVectors
[junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 0.321 sec
[junit] Testsuite: org.apache.lucene.search.TestThreadSafe
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 6.786 sec
[junit] Testsuite: org.apache.lucene.search.TestTimeLimitingCollector
[junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 1.123 sec
[junit] Testsuite: org.apache.lucene.search.TestTopDocsCollector
[junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 0.013 sec
[junit] Testsuite: org.apache.lucene.search.TestTopScoreDocCollector
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.004 sec
[junit] Testsuite: org.apache.lucene.search.TestWildcard
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.038 sec
[junit] Testsuite: org.apache.lucene.search.TestWildcardRandom
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 26.908 sec
[junit] Testsuite: org.apache.lucene.search.function.TestCustomScoreQuery
[junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 7.058 sec
[junit] Testsuite: org.apache.lucene.search.function.TestDocValues
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.007 sec
[junit] Testsuite: org.apache.lucene.search.function.TestFieldScoreQuery
[junit] Tests run: 12, Failures: 0, Errors: 0, Time elapsed: 0.21 sec
[junit] Testsuite: org.apache.lucene.search.function.TestOrdValues
[junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed:
[jira] Created: (LUCENE-2556) CharTermAttribute cloning memory consumption
CharTermAttribute cloning memory consumption Key: LUCENE-2556 URL: https://issues.apache.org/jira/browse/LUCENE-2556 Project: Lucene - Java Issue Type: Improvement Components: Analysis Affects Versions: 3.0.2 Reporter: Adriano Crestani Priority: Minor Fix For: 3.1 The memory consumption problem with cloning a CharTermAttributeImpl object was raised on thread http://markmail.org/thread/bybuerugbk5w2u6z
[jira] Updated: (LUCENE-2556) CharTermAttribute cloning memory consumption
[ https://issues.apache.org/jira/browse/LUCENE-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adriano Crestani updated LUCENE-2556: - Attachment: CharTermAttributeMemoryConsumptionDemo.java This Java application demonstrates how much memory CharTermAttributeImpl.clone() might consume in some scenarios.
[jira] Updated: (LUCENE-2556) CharTermAttribute cloning memory consumption
[ https://issues.apache.org/jira/browse/LUCENE-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adriano Crestani updated LUCENE-2556: - Attachment: lucene_2556_adriano_crestani_07_23_2010.patch This patch optimizes the cloning of the CharTermAttributeImpl internal buffer. It keeps using clone() to clone the internal buffer when CharTermAttribute.length() is at least 150 and at least 75% of the internal buffer length; otherwise, it uses System.arraycopy(...) to clone it, using CharTermAttribute.length() as the new internal buffer size. It performs the optimization this way because in some scenarios, such as cloning long arrays, clone() is usually faster than System.arraycopy(...).
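A hedged reconstruction of the described heuristic, with the thresholds taken from the comment above (this is not the literal patch):

{code:java}
class TermBufferCopy {
    static char[] copyForClone(char[] termBuffer, int length) {
        if (length >= 150 && length >= 0.75 * termBuffer.length) {
            // the term fills most of a large buffer: cloning wholesale is fine
            return termBuffer.clone();
        }
        // otherwise right-size the copy so the clone does not retain a
        // possibly huge, mostly empty buffer
        char[] copy = new char[length];
        System.arraycopy(termBuffer, 0, copy, 0, length);
        return copy;
    }
}
{code}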
[jira] Updated: (LUCENE-2556) CharTermAttribute cloning memory consumption
[ https://issues.apache.org/jira/browse/LUCENE-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2556: -- Attachment: LUCENE-2556.patch Here is the patch; I see no problem with applying it to 3.x and trunk.
[jira] Assigned: (LUCENE-2556) CharTermAttribute cloning memory consumption
[ https://issues.apache.org/jira/browse/LUCENE-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler reassigned LUCENE-2556: - Assignee: Uwe Schindler
[jira] Commented: (LUCENE-2556) CharTermAttribute cloning memory consumption
[ https://issues.apache.org/jira/browse/LUCENE-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891481#action_12891481 ] Uwe Schindler commented on LUCENE-2556: --- {quote} This patch optimizes the cloning of the CharTermAttributeImpl internal buffer. It keeps using clone() to clone the internal buffer when CharTermAttribute.length() is at least 150 and at least 75% of the internal buffer length; otherwise, it uses System.arraycopy(...) to clone it, using CharTermAttribute.length() as the new internal buffer size. It performs the optimization this way because in some scenarios, such as cloning long arrays, clone() is usually faster than System.arraycopy(...). {quote} I haven't seen your patch yet. I don't know if the two extra calculations justify the branching, because terms are mostly short... If we take your patch, the allocations should in all cases be done with ArrayUtil.oversize() to be consistent with the allocation strategy of the rest of CTA.
Solr debugging using Eclipse
Hi, Can anyone help me with instructions on how to use Eclipse for Solr development? I want to configure Solr in Eclipse and be able to debug. Thanks. Regards, Pavan
Re: Solr debugging using Eclipse
1. Create a web project
2. Copy all source code to src
3. Copy all JSPs to WebContent
4. Configure Tomcat with -Dsolr.solr.home=