Hudson build is back to normal : Solr-3.x #96
See https://hudson.apache.org/hudson/job/Solr-3.x/96/changes
[jira] Updated: (SOLR-2002) improve build/tests
[ https://issues.apache.org/jira/browse/SOLR-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated SOLR-2002:
------------------------------
    Attachment: SOLR-2002_merged.patch

Since we merged Lucene and Solr, the build system has been somewhat of a mess. Attached is a very early patch that is basically a reboot of the Solr build:
* it reuses the logic from Lucene's build
* it is significantly faster, especially around dependencies, thanks to Lucene's up-to-date macros
* it is nowhere near committable yet

One interesting thing found so far: the Solr contribs basically have their own build systems, and they are hiding exceptions that occur behind the scenes when running tests (try the patch to see).

The patch doesn't yet work for targets like 'dist' or 'example'; at the moment only targets like 'ant compile', 'ant test', and 'ant javadocs' work correctly. Additionally, the contrib/dataimporthandler 'extras' isn't compiled or tested yet. I would like to propose instead that we make a contrib/dataimporthandler-extras that depends on the main dataimporthandler; this would really simplify the build.

improve build/tests
-------------------
Key: SOLR-2002
URL: https://issues.apache.org/jira/browse/SOLR-2002
Project: Solr
Issue Type: Task
Components: Build
Reporter: Robert Muir
Assignee: Robert Muir
Priority: Minor
Fix For: 3.1, 4.0
Attachments: SOLR-2002.patch, SOLR-2002_core_contrib.patch, SOLR-2002_localization.patch, SOLR-2002_lucenetestcase.patch, SOLR-2002_merged.patch, SOLR-2002_replication.patch, SOLR-2002_testiter.patch, SOLR-2002_testmethod.patch, SOLR-2002_timeout.patch, SOLR-2002setupteardown.patch

We are working on improving some functionality in Lucene's build/tests; it would be good to improve the Solr side to take advantage of it. Currently it is only sorta-kinda integrated and a bit messy. I'd like to do some incremental improvements piece-by-piece on this issue.
[jira] Commented: (SOLR-2002) improve build/tests
[ https://issues.apache.org/jira/browse/SOLR-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906689#action_12906689 ]

Robert Muir commented on SOLR-2002:
-----------------------------------
By the way, I think this really simplifies the contrib builds. It's probably hard to see in the patch, but here is the entire contrib/clustering build now:

{noformat}
<project name="solr-clustering" default="default">
  <description>Clustering Integration.</description>
  <property name="src.dir" location="src/main/java"/>
  <property name="tests.src.dir" location="src/test/java"/>
  <property name="tests.userdir" location="src/test/resources"/>
  <import file="../contrib-build.xml"/>
</project>
{noformat}
solr getUniqueTermCount() when multiple segments?
Hello-

I'm looking at using the new terms.getUniqueTermCount() to give a quick count for the LukeRequestHandler rather than needing to walk all the terms. When the Solr index reader has just one segment, it works great. However, with more segments I get:

java.lang.UnsupportedOperationException: this reader does not implement getUniqueTermCount()
  at org.apache.lucene.index.Terms.getUniqueTermCount(Terms.java:84)

Is this expected? Is there any way around it? I am getting the terms using:

Terms terms = MultiFields.getTerms(reader, fieldName);
long cnt = (terms == null) ? 0 : terms.getUniqueTermCount();

Thanks
ryan
[jira] Created: (SOLR-2106) Spelling Checking for Multiple Fields
Spelling Checking for Multiple Fields
-------------------------------------
Key: SOLR-2106
URL: https://issues.apache.org/jira/browse/SOLR-2106
Project: Solr
Issue Type: Bug
Components: spellchecker
Affects Versions: 1.4
Environment: Linux Environment
Reporter: JAYABAALAN V
Fix For: 1.4

Need to enable spellchecking for five different fields and their configuration. I am using the dismax query parser for searching the different fields. If a user has entered a wrong spelling in the front end, it should check the five different fields, give a collated spelling suggestion in the front end, and return results based on the spelling suggestion. Do provide your configuration details for the same.
Hudson build is back to normal : Solr-trunk #1240
See https://hudson.apache.org/hudson/job/Solr-trunk/1240/changes
[jira] Commented: (LUCENE-2464) FastVectorHighlighter: add a FragmentBuilder to return entire field contents
[ https://issues.apache.org/jira/browse/LUCENE-2464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906709#action_12906709 ]

Lukas Vlcek commented on LUCENE-2464:
-------------------------------------
I found that even if the SingleFragListBuilder is used, the client has to explicitly ensure that numberOfFragments > 0, otherwise the highlighter produces empty output. The thing is that

{noformat}
FastVectorHighlighter.getBestFragments(
    final FieldQuery fieldQuery, IndexReader reader, int docId,
    String fieldName, int fragCharSize, int maxNumFragments);
{noformat}

delegates to

{noformat}
BaseFragmentsBuilder.createFragments(
    IndexReader reader, int docId, String fieldName,
    FieldFragList fieldFragList, int maxNumFragments,
    String[] preTags, String[] postTags, Encoder encoder);
{noformat}

which needs to be passed maxNumFragments > 0 in order to produce any non-empty output.

FastVectorHighlighter: add a FragmentBuilder to return entire field contents
-----------------------------------------------------------------------------
Key: LUCENE-2464
URL: https://issues.apache.org/jira/browse/LUCENE-2464
Project: Lucene - Java
Issue Type: Improvement
Components: contrib/highlighter
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
Fix For: 3.1
Attachments: LUCENE-2464.patch

In Highlighter, there is a NullFragmenter. There is a requirement for its counterpart in FastVectorHighlighter.
Re: solr getUniqueTermCount() when multiple segments?
This is expected/intentional, because computing the true unique term count across multiple segments is exceptionally costly (you have to do the merge sort to de-dup).

If you really want the true count, you can pull the TermsEnum and call .next() until exhaustion. Alternatively, you can use IndexReader.getSequentialSubReaders(), then step through each SegmentReader calling its .getUniqueTermCount() and somehow approximate (e.g. the sum will be an upper bound on the total unique count).

Mike

On Tue, Sep 7, 2010 at 2:34 AM, Ryan McKinley ryan...@gmail.com wrote:
> [quoted message above]
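A minimal sketch of both suggestions, assuming the 2010-era flex trunk API (exact method signatures may have differed across revisions, so treat this as an illustration rather than copy-paste code):

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;

public class UniqueTermCounts {
  // Exact count: walk the merged TermsEnum to exhaustion (costly).
  static long exactUniqueTermCount(IndexReader reader, String field) throws IOException {
    Terms terms = MultiFields.getTerms(reader, field);
    if (terms == null) return 0;
    TermsEnum te = terms.iterator();
    long count = 0;
    while (te.next() != null) {
      count++;
    }
    return count;
  }

  // Upper bound: sum the per-segment counts. A term that appears in k
  // segments is counted k times, so the sum can only overestimate.
  static long upperBoundUniqueTermCount(IndexReader reader, String field) throws IOException {
    long sum = 0;
    for (IndexReader sub : reader.getSequentialSubReaders()) {
      Terms terms = MultiFields.getTerms(sub, field);
      if (terms != null) {
        sum += terms.getUniqueTermCount();
      }
    }
    return sum;
  }
}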
Re: Build failed in Hudson: Lucene-trunk #1281
The failure was in TestIndexWriter.testThreadInterruptDeadlock:

[junit] java.lang.NoClassDefFoundError: org/apache/lucene/util/ThreadInterruptedException$__CLR2_6_3c0c0gds5twgh
[junit]   at org.apache.lucene.util.ThreadInterruptedException.<init>(ThreadInterruptedException.java:28)
[junit]   at org.apache.lucene.index.ConcurrentMergeScheduler.merge(ConcurrentMergeScheduler.java:304)
[junit]   at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2543)
[junit]   at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2538)
[junit]   at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2534)
[junit]   at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3212)
[junit]   at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2025)
[junit]   at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1979)
[junit]   at org.apache.lucene.index.TestIndexWriter$IndexerThreadInterrupt.run(TestIndexWriter.java:4398)

I think it's a false failure. I'm pretty sure the cause is that an interrupt arrived as the class loader was trying to init ThreadInterruptedException; somehow this (receiving thread interrupts) screws up the class loader. The test already prevents interrupts until things are warmed up first, but this class only gets loaded on the first interrupt.

I'll commit a fix to make sure this class is loaded before any interrupts are sent. Thread interrupting is dangerous!!

Mike

On Tue, Sep 7, 2010 at 1:40 AM, Apache Hudson Server hud...@hudson.apache.org wrote:
> See https://hudson.apache.org/hudson/job/Lucene-trunk/1281/
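The fix Mike describes can be pictured as a one-liner in the test's warm-up phase. This is a hypothetical illustration of the idea, not the committed change:

// Force the class to load before any Thread.interrupt() calls are made,
// so the class loader cannot itself be hit by an interrupt mid-load.
Class.forName("org.apache.lucene.util.ThreadInterruptedException");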
Re: Transient TestIndexWriterMergePolicy failure under IntelliJ
Thanks for reporting, Steven! This is LUCENE-2118, striking again, taunting me. This particular failure bugs me!!

Mike

On Mon, Sep 6, 2010 at 8:10 PM, Steven A Rowe sar...@syr.edu wrote:
> While testing changes for LUCENE-2611, I saw TestIndexWriterMergePolicy.testMaxBufferedDocsChange() fail, but I wasn't able to replicate it either from IntelliJ or from Ant after adding the seed to the newRandom() call in TestIndexWriterMergePolicy.setUp().
>
> Environment: Sun JDK 1.6.0_13, Windows Vista, both 64-bit; IntelliJ IDEA 9.0.3.
>
> When I saw this error, I was running two modules' tests in parallel from IntelliJ, and was working on adding tempDir sysprop setting to test invocations from IntelliJ, so the probability that there was something weird about my local setup is non-trivial.
>
> Here is the output from IntelliJ:
> -----
> NOTE: random codec of testcase 'testMaxBufferedDocsChange' was: MockSep
> NOTE: random locale of testcase 'testMaxBufferedDocsChange' was: en_PH
> NOTE: random timezone of testcase 'testMaxBufferedDocsChange' was: America/Indianapolis
> NOTE: random seed of testcase 'testMaxBufferedDocsChange' was: 4118460220441676374
>
> junit.framework.AssertionFailedError: maxMergeDocs=2147483647; numSegments=11; upperBound=10; mergeFactor=10; segs=_65:c5950 _5t:c10->_32 _5u:c10->_32 _5v:c10->_32 _5w:c10->_32 _5x:c10->_32 _5y:c10->_32 _5z:c10->_32 _60:c10->_32 _61:c10->_32 _62:c1->_32 _63:c9->_62
>   at org.apache.lucene.index.TestIndexWriterMergePolicy.checkInvariants(TestIndexWriterMergePolicy.java:251)
>   at org.apache.lucene.index.TestIndexWriterMergePolicy.testMaxBufferedDocsChange(TestIndexWriterMergePolicy.java:177)
>   at org.apache.lucene.util.LuceneTestCase.runBare(LuceneTestCase.java:395)
>   at org.apache.lucene.util.LuceneTestCase.run(LuceneTestCase.java:387)
>   [remaining JUnit/IntelliJ runner frames omitted]
> -----
>
> Steve
[jira] Commented: (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks
[ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906723#action_12906723 ]

Michael McCandless commented on LUCENE-2573:
--------------------------------------------

bq. We probably need a test that delays the flush process, otherwise flushing to RAM occurs too fast to proceed to the next tier.

We can modify MockRAMDir to optionally take its sweet time when writing certain files?

{quote}
I'm not sure if, after a DWPT is flushing, we need to decrement what would effectively be a projected RAM usage post current DWPT flush completion. Otherwise we could in many cases start the flush of most/all of the DWPTs.
{quote}

But shouldn't tiered flushing take care of this? Ie, you only decrement RAM consumed when the flush of the DWPT finishes, not before?

bq. The DWPT that happens to exceed the first tier is flushed out. This was easier to implement than finding the highest RAM-consuming DWPT and flushing it from a different thread.

Hmm, but this won't be most efficient in general? Ie, we could end up creating tiny segments depending on luck-of-the-thread-scheduling?

bq. I did a search through the code and ByteBlockAllocator.perDocAllocator has no references; it can probably be removed, unless there was some other intention for it.

I think this makes sense -- each DWPT now immediately flushes to its private doc store files, so there's no longer a need to track per-doc pending RAM?

{quote}
In DocumentsWriterRAMAllocator, we're only recording the addition of more bytes when a new block is created; however, because previous blocks may be recycled, it is the recycled blocks that are not being recorded as bytes used. Should we record all allocated blocks as in use, ie count them as bytes used, or wait until they are in use again to be counted as consuming RAM?
{quote}

I think we have to track both. If a buffer is not in the pool (ie not free), then it's in use and we count that as RAM used, and that counter is used to trigger tiered flushing. Separately, we have to track net allocated, in order to trim the buffers (drop them, so GC can reclaim) when we are over setRAMBufferSizeMB.

Tiered flushing of DWPTs by RAM with low/high water marks
---------------------------------------------------------
Key: LUCENE-2573
URL: https://issues.apache.org/jira/browse/LUCENE-2573
Project: Lucene - Java
Issue Type: Improvement
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
Fix For: Realtime Branch
Attachments: LUCENE-2573.patch

Now that we have DocumentsWriterPerThreads we need to track total consumed RAM across all DWPTs. A flushing strategy idea that was discussed in LUCENE-2324 was to use a tiered approach:
- Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM)
- Flush all DWPTs at a high water mark (e.g. at 110%)
- Use linear steps in between the high and low water marks: e.g. when 5 DWPTs are used, flush at 90%, 95%, 100%, 105% and 110%.

Should we allow the user to configure the low and high water mark values explicitly using total values (e.g. low water mark at 120MB, high water mark at 140MB)? Or shall we keep, for simplicity, the single setRAMBufferSizeMB() config method and use something like 90% and 110% for the water marks?
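To make the linear-steps idea concrete, here is a hypothetical sketch of computing the flush trigger for the i-th tier. It illustrates the scheme in the issue description, not code from the attached patch:

{noformat}
// With n DWPTs, tier i (0-based) flushes once total RAM use crosses a
// threshold spaced evenly between the low and high water marks.
static long flushTriggerBytes(int tier, int numDWPTs, long ramBufferBytes,
                              double lowPct, double highPct) { // e.g. 0.90, 1.10
  if (numDWPTs <= 1) {
    return (long) (ramBufferBytes * highPct);
  }
  double step = (highPct - lowPct) / (numDWPTs - 1);
  return (long) (ramBufferBytes * (lowPct + tier * step));
}
// Example: 5 DWPTs and a 100MB buffer yield triggers at 90, 95, 100, 105 and 110 MB.
{noformat}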
Re: solr getUniqueTermCount() when multiple segments?
Ahh -- this makes sense. I thought it was too good to be true!

On Tue, Sep 7, 2010 at 4:45 AM, Michael McCandless luc...@mikemccandless.com wrote:
> This is expected/intentional, because computing the true unique term count across multiple segments is exceptionally costly (you have to do the merge sort to de-dup).
> [rest of quoted message above]
[jira] Commented: (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks
[ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906798#action_12906798 ]

Jason Rutherglen commented on LUCENE-2573:
------------------------------------------

bq. shouldn't tiered flushing take care of this

Faulty thinking for a few minutes.

{quote}but this won't be most efficient, in general? Ie we could end up creating tiny segments depending on luck-of-the-thread-scheduling?{quote}

True. Instead, we may want to simply not flush the current DWPT if it is in fact not the highest RAM user. When addDocument is called on the thread with the highest RAM usage, we can then flush it.

bq. there's no longer a need to track per-doc pending RAM

I'll remove it from the code.

{quote}If a buffer is not in the pool (ie not free), then it's in use and we count that as RAM used{quote}

Ok, I'll make the change.

{quote}we have to track net allocated, in order to trim the buffers (drop them, so GC can reclaim) when we are over the .setRAMBufferSizeMB{quote}

I haven't seen this in the realtime branch. Reclamation of extra allocated free blocks may need to be reimplemented. I'll increment num bytes used when a block is returned for use.

On this topic, do you have any thoughts yet about how to make the block pools concurrent? I'm still leaning towards a random-access-file (seek-style) interface, because it is easy to make concurrent and hides the underlying block management mechanism, rather than directly exposing it like today, which can lend itself to problematic usage in the future.
[jira] Commented: (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks
[ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906801#action_12906801 ]

Jason Rutherglen commented on LUCENE-2573:
------------------------------------------

bq. We can modify MockRAMDir to optionally take its sweet time when writing certain files?

Yes, I think we need to implement something of this nature. We *could* even randomly assign a different delay value per flush. Of course, how the test would instigate this from outside of DW is somewhat of a different issue.
[jira] Commented: (SOLR-2002) improve build/tests
[ https://issues.apache.org/jira/browse/SOLR-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906806#action_12906806 ]

Yonik Seeley commented on SOLR-2002:
------------------------------------
Sounds cool! Whatever those strong in ant-foo come up with is fine with me!
[jira] Commented: (SOLR-1316) Create autosuggest component
[ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906812#action_12906812 ]

Andrzej Bialecki commented on SOLR-1316:
----------------------------------------
I added license headers and committed the patch in rev. 993367 - thank you!

Create autosuggest component
----------------------------
Key: SOLR-1316
URL: https://issues.apache.org/jira/browse/SOLR-1316
Project: Solr
Issue Type: New Feature
Components: search
Affects Versions: 1.4
Reporter: Jason Rutherglen
Assignee: Andrzej Bialecki
Priority: Minor
Fix For: Next
Attachments: SOLR-1316.patch, SOLR-1316.patch, SOLR-1316.patch, SOLR-1316.patch, SOLR-1316.patch, SOLR-1316.patch, SOLR-1316.patch, SOLR-1316.patch, SOLR-1316.patch, SOLR-1316.patch, SOLR-1316.patch, SOLR-1316_3x-2.patch, SOLR-1316_3x.patch, suggest.patch, suggest.patch, suggest.patch, TST.zip
Original Estimate: 96h
Remaining Estimate: 96h

Autosuggest is a common search function that can be integrated into Solr as a SearchComponent. Our first implementation will use the TernaryTree found in Lucene contrib.
* Enable creation of the dictionary from the index or via Solr's RPC mechanism
* What types of parameters and settings are desirable?
* Hopefully in the future we can include user click-through rates to boost those terms/phrases higher
[jira] Commented: (SOLR-2002) improve build/tests
[ https://issues.apache.org/jira/browse/SOLR-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906826#action_12906826 ]

Robert Muir commented on SOLR-2002:
-----------------------------------
Thanks. The major thing left is to consolidate release-management-type things (e.g. rat reporting tasks, dist/packaging, artifact signing, checksumming, etc).

Most of this is really inappropriate the way it is in Lucene's build, because it's standalone in Lucene's build.xml and not reusable by modules and Solr. For example, 'rat-sources' just runs on a hardcoded src/java for lucene-core. So we need to fix this kind of stuff anyway so that things in modules/ can actually ever release; there is no way to do this at the moment.
[jira] Updated: (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks
[ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-2573:
-------------------------------------
    Attachment: LUCENE-2573.patch

* perDocAllocator is removed from DocumentsWriterRAMAllocator
* getByteBlock and getIntBlock always increment numBytesUsed

The test that simply prints out debugging messages looks better. I need to figure out unit tests.
[jira] Commented: (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks
[ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906918#action_12906918 ]

Jason Rutherglen commented on LUCENE-2573:
------------------------------------------
The last patch also only flushes a DWPT if it's the highest RAM consumer.
bug triggered by TestIndexWriter.testRandomStoredFields
Hello,

I've tripped on this a few times lately but have never been able to reproduce it; it seems I am now able to reproduce it semi-consistently with the configuration below. It would be great if someone else could try this out and see if it's a real problem, or if it's just my machine.

Occasionally I see a very nasty result from TestIndexWriter.testRandomStoredFields: either a read past EOF, IndexOutOfBounds, NegativeArraySizeException, or "field X is wrong, expected <nonsense unicode> actual <different nonsense unicode>".

Here are my steps to reproduce:

1. Edit line 87 of TestIndexWriter to plug in the seed:
   random = newRandom(3312389322103990899L);

2. Run this command:
   ant clean test-core -Dtestcase=TestIndexWriter -Dtestmethod=testRandomStoredFields -Dtests.iter=10 -Dtests.codec=MockVariableIntBlock(29)

I used 10 iterations here, as it will usually fail with this seed and number of iterations for me. Furthermore, if I comment out lines 5179 and 5180 of TestIndexWriter so that it no longer randomly deletes documents, the test always passes:

   //w.deleteDocuments(new Term("id", delID));
   //docs.remove(delID);

--
Robert Muir
rcm...@gmail.com
Re: Getting facets for a field from within a SearchComponent
: I'm writing my first SearchComponent to do custom calculations on search
: results. Is it possible to get the facet values for a field from within a
: SearchComponent? I've thought of adapting the StatsComponent and
: FieldFacetStats classes to try and accomplish this. But before I try that,
: is there an API call I could make instead?

1) If you configure your component to run after the FacetComponent, then the result will already have the facet values available for you to access.

2) The faceting code in the StatsComponent makes a lot of bad assumptions, so it has some known bugs -- I would not recommend borrowing that code.

-Hoss
--
http://lucenerevolution.org/ ... October 7-8, Boston
http://bit.ly/stump-hoss ... Stump The Chump!
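For option 1, the ordering is just solrconfig.xml wiring. A hedged sketch (handler and class names here are hypothetical) that appends a custom component after the stock ones, so the FacetComponent has already run:

<searchComponent name="myStats" class="com.example.MyStatsComponent"/>

<requestHandler name="/mysearch" class="solr.SearchHandler">
  <!-- last-components run after the defaults (query, facet, etc.),
       so the facet results are already in the response -->
  <arr name="last-components">
    <str>myStats</str>
  </arr>
</requestHandler>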
[jira] Updated: (SOLR-2052) Allow for a list of filter queries and a single docset filter in QueryComponent
[ https://issues.apache.org/jira/browse/SOLR-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephen Green updated SOLR-2052:
--------------------------------
    Attachment: SOLR-2052-2.patch

Updated patch that fixes a bug when combining filter docsets and filter queries.

Allow for a list of filter queries and a single docset filter in QueryComponent
--------------------------------------------------------------------------------
Key: SOLR-2052
URL: https://issues.apache.org/jira/browse/SOLR-2052
Project: Solr
Issue Type: Improvement
Components: search
Affects Versions: 4.0
Environment: Mac OS X, Java 1.6
Reporter: Stephen Green
Priority: Minor
Fix For: 1.4.2
Attachments: SOLR-2052-2.patch, SOLR-2052.patch

SolrIndexSearcher.QueryCommand allows you to specify a list of filter queries or a single filter (as a DocSet), but not both. This restriction seems arbitrary, and there are cases where we can have both a list of filter queries and a DocSet generated by some other non-query process (e.g., filtering documents according to IDs pulled from some other source like a database). Fixing this requires a few small changes to SolrIndexSearcher to allow both of these to be set for a QueryCommand and to take both into account when evaluating the query. It also requires a modification to ResponseBuilder to allow setting the single filter at query time. I've run into this against 1.4, but the same holds true for the trunk.
[jira] Updated: (SOLR-2105) RequestHandler param update.processor is confusing
[ https://issues.apache.org/jira/browse/SOLR-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Høydahl updated SOLR-2105:
------------------------------
    Attachment: SOLR-2105.patch

The attached patch renames the parameter, both in code and config. Tests run after applying it, but I have not done regression testing of the functionality.

RequestHandler param update.processor is confusing
--------------------------------------------------
Key: SOLR-2105
URL: https://issues.apache.org/jira/browse/SOLR-2105
Project: Solr
Issue Type: Improvement
Components: update
Affects Versions: 1.4.1
Reporter: Jan Høydahl
Priority: Minor
Attachments: SOLR-2105.patch

Today we reference a custom updateRequestProcessorChain using the update request parameter update.processor. See http://wiki.apache.org/solr/SolrConfigXml#UpdateRequestProcessorChain_section
This is confusing, since what we are really referencing is not an UpdateProcessor but an updateRequestProcessorChain. I propose that update.processor be renamed to update.chain or similar.
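After the rename, usage would look like the following hedged sketch (the chain name is hypothetical; before the patch, the request parameter was update.processor):

{noformat}
<!-- solrconfig.xml: define a custom chain -->
<updateRequestProcessorChain name="mychain">
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
{noformat}

selected per request with the renamed parameter, e.g. /update?update.chain=mychain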
[jira] Updated: (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks
[ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-2573:
-------------------------------------
    Attachment: LUCENE-2573.patch

There was a small bug in the choice of the max DWPT, in that all DWPTs, including ones that were already scheduled to flush, were being compared against the current DWPT (ie the one being examined for possible flushing).
[jira] Created: (SOLR-2108) ReversedWildcardFilter can create false positives
ReversedWildcardFilter can create false positives
-------------------------------------------------
Key: SOLR-2108
URL: https://issues.apache.org/jira/browse/SOLR-2108
Project: Solr
Issue Type: Bug
Reporter: Robert Muir
Priority: Minor
Fix For: 4.0

Reported from the user list:
{noformat}
For instance, the query *zemog* matches documents that contain Gomez
{noformat}
http://www.lucidimagination.com/search/document/35abfdabfcec99b7/false_matches_with_reversedwildcardfilterfactory
[jira] Updated: (SOLR-2108) ReversedWildcardFilter can create false positives
[ https://issues.apache.org/jira/browse/SOLR-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated SOLR-2108:
------------------------------
    Attachment: SOLR-2108.patch

Simple fix: if we are doing a wildcard query on a reversed field, but we *are not* going to reverse it, we must subtract the set of reversed terms (markerChar*) from the query DFA, as these could be false positives. I also added a basic test.
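A hedged sketch of the subtraction idea using the trunk automaton package of the time; the queryAutomaton and markerChar variables, and the exact helper names, are assumptions for illustration, not code from the attached patch:

{noformat}
import org.apache.lucene.util.automaton.Automaton;
import org.apache.lucene.util.automaton.BasicAutomata;
import org.apache.lucene.util.automaton.BasicOperations;

// Language of all reversed terms: the reverse marker char followed by anything.
Automaton reversed = BasicOperations.concatenate(
    BasicAutomata.makeChar(markerChar),
    BasicAutomata.makeAnyString());

// Subtract it from the wildcard query's DFA so reversed terms cannot
// be matched as false positives.
Automaton safe = BasicOperations.minus(queryAutomaton, reversed);
{noformat}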
[jira] Updated: (SOLR-2107) MoreLikeThisHandler doesn't work with alternate qparsers
[ https://issues.apache.org/jira/browse/SOLR-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-2107:
-------------------------------
    Attachment: SOLR-2107.patch

Here's a patch that adds qparser support for the q and fq params.

MoreLikeThisHandler doesn't work with alternate qparsers
--------------------------------------------------------
Key: SOLR-2107
URL: https://issues.apache.org/jira/browse/SOLR-2107
Project: Solr
Issue Type: Bug
Reporter: Yonik Seeley
Attachments: SOLR-2107.patch

In the MoreLikeThisHandler, Lucene syntax is assumed, and no other query parser can be invoked.
[jira] Resolved: (SOLR-2107) MoreLikeThisHandler doesn't work with alternate qparsers
[ https://issues.apache.org/jira/browse/SOLR-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley resolved SOLR-2107.
--------------------------------
    Fix Version/s: 4.0
       Resolution: Fixed
Re: Re: About Solr DataImportHandler
Thank you for your reply; it is very important to me.

1. I agree with you. By reading Solr's source code, I found that this problem can be solved by configuring db-data-config.xml like this (my database is SQL Server 2005; this will not apply to other databases):

<dataSource name="dsSqlServer" type="JdbcDataSource"
    driver="com.microsoft.sqlserver.jdbc.SQLServerDriver" batchSize="3000"
    url="jdbc:sqlserver://192.168.1.5:1433;DatabaseName=testDatabase;responseBuffering=adaptive;selectMethod=cursor"
    user="sa" password="12345"/>

Adding responseBuffering=adaptive;selectMethod=cursor to the url attribute makes Solr set these parameters by itself:

c.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);

With these settings, Solr can import a big table's data into the index dir.

2. But there are some problems: if the table is very big, Solr will spend a long time importing and indexing, maybe a day or more. So if a network problem or anything else occurs during this time, Solr may not remember which documents have been processed, and if we continue the data import, we do not know where to start.

3. I am sorry for my bad English; I hope you understand what I mean.

2010-09-08
郭芸

From: Alexey Serba
Sent: 2010-09-07 16:07:49
To: dev
Subject: Re: About Solr DataImportHandler

>> i found that Solr import the datas to memory first, then write them to index dir.
> That's not really true. DataImportHandler streams the result from the database query, adding documents into the index as it goes, so it shouldn't load all the database data into memory. Disabling autoCommit, warming queries and spellcheckers usually decreases the required amount of memory during the indexing process. Please share your hardware details, JVM options, solrconfig and schema configuration, etc.
[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations
[ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907069#action_12907069 ]

Jason Rutherglen commented on LUCENE-2575:
------------------------------------------

bq. every term has its own open IndexOutput

I'm not seeing IndexOutput in use with the RAM buffer; do you mean the write* methods (writeVInt, writeBytes, writeByte) of TermsHashPerField?

Included in this patch will need to be a way to concurrently grow other arrays such as ParallelPostingsArray. PPA is used to store pointers to data stored in the block pools. Maybe we need a class that concurrently manages growing arrays and block pools. Or we may need to slightly re-architect how we're storing the RAM buffer data so that concurrency can be guaranteed; ie, I think we'll need to write to temporary arrays, which are then flushed to primary readable arrays. The flush would occur after adding a document, or, probably for better efficiency, only when getReader is called.

Concurrent byte and int block implementations
---------------------------------------------
Key: LUCENE-2575
URL: https://issues.apache.org/jira/browse/LUCENE-2575
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Affects Versions: Realtime Branch
Reporter: Jason Rutherglen
Fix For: Realtime Branch

The current *BlockPool implementations aren't quite concurrent. We really need something that has a locking flush method, where flush is called at the end of adding a document. Once flushed, the newly written data would be available to all other reading threads (ie, postings etc). I'm not sure I understand the slices concept; it seems like it'd be easier to implement a seekable random-access-file-like API. One would seek to a given position, then read or write from there. The underlying management of byte arrays could then be hidden?
[jira] Commented: (SOLR-1665) Add debugTimings param so that timings for components can be retrieved without having to do explains(), as in debugQuery
[ https://issues.apache.org/jira/browse/SOLR-1665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907080#action_12907080 ]

Yonik Seeley commented on SOLR-1665:
------------------------------------
Due to the cost of distributed search tests, I removed DistributedDebugComponentTest and moved the debug tests to TestDistributedSearch.

Add debugTimings param so that timings for components can be retrieved without having to do explains(), as in debugQuery
-------------------------------------------------------------------------------------------------------------------------
Key: SOLR-1665
URL: https://issues.apache.org/jira/browse/SOLR-1665
Project: Solr
Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
Fix For: 4.0
Attachments: SOLR-1665.patch, SOLR-1665.patch, SOLR-1665.patch, SOLR-1665.patch

As the title says, it would be great if we could just get back component timings without having to do the full boat of explains and other stuff.
Re: About Solr DataImportHandler
Try setting the batchSize to -1.

2010-09-08
傅顺开
苏州广达友讯技术有限公司
江苏苏州工业园区金鸡湖大道1355号 国际科技园151A, 215021
Tel: (512) 6288-8255 (ext. 612)
Fax: (512) 6288-8155
Mobile: (0) 158-5018-8480
Email: f...@peptalk.cn
http://www.bedo.cn, http://k.ai, http://www.lbs.org.cn

From: 郭芸
Sent: 2010-09-07 09:55:05
To: Solr Lucene
Subject: About Solr DataImportHandler

> Dear all:
> I use Solr DataImportHandler's JdbcDataSource to import SQL Server 2005 data into Solr, but my table is very big, about 300G, and I found that Solr imports the data into memory first, then writes it to the index dir. So if the data is too big, an OutOfMemoryException will be triggered. I want to solve this problem; how can it be done? Can anybody help me? Thank you.
>
> 2010-09-07
> 郭芸
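For reference, a hedged sketch of where that setting goes, reusing the dataSource from the earlier message in this thread (batchSize="-1" asks the JDBC driver to stream rows rather than buffer them, though the exact behavior is driver-dependent):

<dataSource name="dsSqlServer" type="JdbcDataSource"
    driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
    batchSize="-1"
    url="jdbc:sqlserver://192.168.1.5:1433;DatabaseName=testDatabase"
    user="sa" password="12345"/>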
Re: Re: About Solr DataImportHandler
> 2. But there are some problems: if the table is very big, Solr will spend a long time importing and indexing, maybe a day or more. So if a network problem or anything else occurs during this time, Solr may not remember which documents have been processed, and if we continue the data import, we do not know where to start.

You can _batch_ import your data using the full-import command by providing an additional request parameter (see http://wiki.apache.org/solr/DataImportHandler#Accessing_request_parameters ), i.e.

query="SELECT * FROM my_table ORDER BY id LIMIT 100 OFFSET ${dataimporter.request.offset}"

and then calling the full-import command several times:

1) /dataimport?clean=true&offset=0
2) /dataimport?clean=false&offset=100
3) /dataimport?clean=false&offset=200
etc.

// Please use the solr-u...@lucene.apache.org mailing list for such questions. _dev_ is not the appropriate place for this.
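A hedged sketch of how that query might be wired into data-config.xml; the entity, dataSource, and column names are hypothetical, and the LIMIT/OFFSET syntax follows the message above and is database-dependent:

<document>
  <entity name="item" dataSource="dsSqlServer"
          query="SELECT * FROM my_table ORDER BY id
                 LIMIT 100 OFFSET ${dataimporter.request.offset}">
    <!-- map columns to schema fields as usual -->
    <field column="id" name="id"/>
  </entity>
</document>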