Hudson build is back to normal : Solr-3.x #96

2010-09-07 Thread Apache Hudson Server
See https://hudson.apache.org/hudson/job/Solr-3.x/96/changes






[jira] Updated: (SOLR-2002) improve build/tests

2010-09-07 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated SOLR-2002:
--

Attachment: SOLR-2002_merged.patch

since we merged lucene & solr, the build system has been somewhat of a mess.

attached is a very early patch that's basically a reboot of the solr build:
* it reuses the logic from lucene's build
* it's significantly faster, especially for dependencies, thanks to lucene's up2date macros
* it's nowhere near committable yet

One interesting thing found so far: the solr contribs basically have their own 
build systems, and they are hiding exceptions going on behind the scenes when 
running tests (try the patch to see).

The patch doesn't yet work for things like 'dist' or 'example'; at the moment 
only things like 'ant compile', 'ant test', and 'ant javadocs' work correctly.

Additionally, the contrib/dataimporthandler 'extras' isn't compiled or tested 
yet. Instead, I would like to propose that we make a separate 
contrib/dataimporthandler-extras that depends on the main dataimporthandler; 
this would really simplify the build.


 improve build/tests
 ---

 Key: SOLR-2002
 URL: https://issues.apache.org/jira/browse/SOLR-2002
 Project: Solr
  Issue Type: Task
  Components: Build
Reporter: Robert Muir
Assignee: Robert Muir
Priority: Minor
 Fix For: 3.1, 4.0

 Attachments: SOLR-2002.patch, SOLR-2002_core_contrib.patch, 
 SOLR-2002_localization.patch, SOLR-2002_lucenetestcase.patch, 
 SOLR-2002_merged.patch, SOLR-2002_replication.patch, 
 SOLR-2002_testiter.patch, SOLR-2002_testmethod.patch, 
 SOLR-2002_timeout.patch, SOLR-2002setupteardown.patch


 we are working on improving some functionality in lucene's build/tests, it 
 would be good to improve the solr side to take advantage of it.
 currently its only sorta-kinda integrated and a bit messy.
 i'd like to do some incremental improvements piece-by-piece on this issue.




[jira] Commented: (SOLR-2002) improve build/tests

2010-09-07 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12906689#action_12906689
 ] 

Robert Muir commented on SOLR-2002:
---

by the way, i think this really simplifies the contrib builds.

it's probably hard to see in the patch, but here is the entire 
contrib/clustering build now:

{noformat}
<project name="solr-clustering" default="default">
  <description>Clustering Integration.</description>

  <property name="src.dir" location="src/main/java"/>
  <property name="tests.src.dir" location="src/test/java"/>
  <property name="tests.userdir" location="src/test/resources"/>

  <import file="../contrib-build.xml"/>
</project>
{noformat}


 improve build/tests
 ---

 Key: SOLR-2002
 URL: https://issues.apache.org/jira/browse/SOLR-2002
 Project: Solr
  Issue Type: Task
  Components: Build
Reporter: Robert Muir
Assignee: Robert Muir
Priority: Minor
 Fix For: 3.1, 4.0

 Attachments: SOLR-2002.patch, SOLR-2002_core_contrib.patch, 
 SOLR-2002_localization.patch, SOLR-2002_lucenetestcase.patch, 
 SOLR-2002_merged.patch, SOLR-2002_replication.patch, 
 SOLR-2002_testiter.patch, SOLR-2002_testmethod.patch, 
 SOLR-2002_timeout.patch, SOLR-2002setupteardown.patch


 we are working on improving some functionality in lucene's build/tests, it 
 would be good to improve the solr side to take advantage of it.
 currently its only sorta-kinda integrated and a bit messy.
 i'd like to do some incremental improvements piece-by-piece on this issue.




solr getUniqueTermCount() when multiple segments?

2010-09-07 Thread Ryan McKinley
Hello-

I'm looking at using the new terms.getUniqueTermCount() to give a
quick count for the LukeRequestHandler rather than needing to walk all
the terms.

When solr index reader has just one segment, it works great.  However
with more segments I get:

java.lang.UnsupportedOperationException: this reader does not
implement getUniqueTermCount()
at org.apache.lucene.index.Terms.getUniqueTermCount(Terms.java:84)

Is this expected?  Is there any way around that?

I am getting the terms using:

  Terms terms = MultiFields.getTerms(reader, fieldName);
  long cnt = (terms==null) ? 0 : terms.getUniqueTermCount();

Thanks
ryan




[jira] Created: (SOLR-2106) Spelling Checking for Multiple Fields

2010-09-07 Thread JAYABAALAN V (JIRA)
Spelling Checking for Multiple Fields
-

 Key: SOLR-2106
 URL: https://issues.apache.org/jira/browse/SOLR-2106
 Project: Solr
  Issue Type: Bug
  Components: spellchecker
Affects Versions: 1.4
 Environment: Linux Environment
Reporter: JAYABAALAN V
 Fix For: 1.4


Need to enable spellchecking for five different fields, along with the 
configuration for it. I am using the dismax query parser for searching across the 
different fields. If the user has entered a wrong spelling in the front end, it 
should check the five different fields, give a collated spelling suggestion in the 
front end, and return results based on the spelling suggestion. Do provide your 
configuration details for the same...




Hudson build is back to normal : Solr-trunk #1240

2010-09-07 Thread Apache Hudson Server
See https://hudson.apache.org/hudson/job/Solr-trunk/1240/changes






[jira] Commented: (LUCENE-2464) FastVectorHighlighter: add a FragmentBuilder to return entire field contents

2010-09-07 Thread Lukas Vlcek (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12906709#action_12906709
 ] 

Lukas Vlcek commented on LUCENE-2464:
-

I found that even if the SingleFragListBuilder is used, the client has to 
explicitly ensure that numberOfFragments > 0, otherwise the highlighter produces 
empty output.

The thing is that
{noformat} FastVectorHighlighter.getBestFragments( final FieldQuery fieldQuery, 
IndexReader reader, int docId, String fieldName, int fragCharSize, int 
maxNumFragments );{noformat} 
delegates to
{noformat} BaseFragmentsBuilder.createFragments( IndexReader reader, int docId, 
String fieldName, FieldFragList fieldFragList, int maxNumFragments, String[] 
preTags, String[] postTags, Encoder encoder);{noformat}
which needs to be passed maxNumFragments > 0 in order to produce any non-empty 
output.
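
In practice that means something like this (just a sketch; the field name and 
fragCharSize value are placeholders, not from the actual code):

{noformat}
// even with SingleFragListBuilder, pass maxNumFragments >= 1,
// otherwise the returned array is empty
String[] fragments = highlighter.getBestFragments(
    fieldQuery, reader, docId, "content", Integer.MAX_VALUE, 1);
{noformat}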

 FastVectorHighlighter: add a FragmentBuilder to return entire field contents
 

 Key: LUCENE-2464
 URL: https://issues.apache.org/jira/browse/LUCENE-2464
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/highlighter
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
 Fix For: 3.1

 Attachments: LUCENE-2464.patch


 In Highlighter, there is a NullFragmenter. There is a requirement for its 
 counterpart in FastVectorHighlighter.




Re: solr getUniqueTermCount() when multiple segments?

2010-09-07 Thread Michael McCandless
This is expected/intentional, because computing the true unique term
count across multiple segments is exceptionally costly (you have to do
the merge sort to de-dup).

If you really want the true count, you can pull the TermsEnum and
.next() until exhaustion.

Alternatively, you can use IndexReader.getSequentialSubReaders(), then
step through each SegReader calling its .getUniqueTermCount() and then
somehow approximate (eg the sum will be an upper bound of the total
unique count).
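
Roughly, both approaches look like this (just a sketch against the current flex
APIs; exact signatures may differ):

  // exact count: exhaust the merged TermsEnum (costly)
  Terms terms = MultiFields.getTerms(reader, fieldName);
  long exact = 0;
  if (terms != null) {
    TermsEnum termsEnum = terms.iterator();
    while (termsEnum.next() != null) {
      exact++;
    }
  }

  // cheap upper bound: sum the per-segment counts
  // (assumes a multi-segment reader, so getSequentialSubReaders() is non-null)
  long upperBound = 0;
  for (IndexReader sub : reader.getSequentialSubReaders()) {
    Terms segTerms = MultiFields.getTerms(sub, fieldName);
    if (segTerms != null) {
      upperBound += segTerms.getUniqueTermCount();
    }
  }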

Mike

On Tue, Sep 7, 2010 at 2:34 AM, Ryan McKinley ryan...@gmail.com wrote:
 Hello-

 I'm looking at using the new terms.getUniqueTermCount() to give a
 quick count for the LukeRequestHandler rather then needing to walk all
 the terms.

 When solr index reader has just one segment, it works great.  However
 with more segments I get:

 java.lang.UnsupportedOperationException: this reader does not
 implement getUniqueTermCount()
        at org.apache.lucene.index.Terms.getUniqueTermCount(Terms.java:84)

 Is this expected?  Is there any way around that?

 I am getting the terms using:

          Terms terms = MultiFields.getTerms(reader, fieldName);
          long cnt = (terms==null) ? 0 : terms.getUniqueTermCount();

 Thanks
 ryan




Re: Build failed in Hudson: Lucene-trunk #1281

2010-09-07 Thread Michael McCandless
The failure was in TestIndexWriter.testThreadInterruptDeadlock:

[junit] java.lang.NoClassDefFoundError:
org/apache/lucene/util/ThreadInterruptedException$__CLR2_6_3c0c0gds5twgh
[junit] at
org.apache.lucene.util.ThreadInterruptedException.init(ThreadInterruptedException.java:28)
[junit] at
org.apache.lucene.index.ConcurrentMergeScheduler.merge(ConcurrentMergeScheduler.java:304)
[junit] at
org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2543)
[junit] at
org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2538)
[junit] at
org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2534)
[junit] at 
org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3212)
[junit] at
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2025)
[junit] at
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1979)
[junit] at
org.apache.lucene.index.TestIndexWriter$IndexerThreadInterrupt.run(TestIndexWriter.java:4398)

I think it's a false failure.

I'm pretty sure the cause is that an interrupt arrived as the class loader
was trying to init the ThreadInterruptedException... somehow this
(receiving thread interrupts) screws up the class loader.  The test
already prevents interrupts until things are warmed up first, but
this class only gets loaded on the first interrupt.

I'll commit a fix, to make sure this class is loaded before any
interrupts are sent.
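
Something along these lines (just a sketch; the actual fix may look different):

  // force ThreadInterruptedException to load up front, before the test
  // starts sending interrupts, so the class loader never runs while interrupted
  Class.forName("org.apache.lucene.util.ThreadInterruptedException");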

Thread interrupting is dangerous!!

Mike

On Tue, Sep 7, 2010 at 1:40 AM, Apache Hudson Server
hud...@hudson.apache.org wrote:
 See https://hudson.apache.org/hudson/job/Lucene-trunk/1281/

 --
 [...truncated 13264 lines...]
  [javadoc] Standard Doclet version 1.5.0_22
  [javadoc] Building tree for all the packages and classes...
  [javadoc] 
 https://hudson.apache.org/hudson/job/Lucene-trunk/ws/lucene/contrib/misc/src/java/org/apache/lucene/index/MultiPassIndexSplitter.java:43:
  warning - Tag @link: reference not found: 
 IndexWriter#addIndexes(IndexReader[])
  [javadoc] 
 https://hudson.apache.org/hudson/job/Lucene-trunk/ws/lucene/contrib/misc/src/java/org/apache/lucene/store/DirectIOLinuxDirectory.java:44:
  warning - Tag @link: reference not found: Directory
  [javadoc] 
 https://hudson.apache.org/hudson/job/Lucene-trunk/ws/lucene/contrib/misc/src/java/org/apache/lucene/store/DirectIOLinuxDirectory.java:63:
  warning - Tag @link: reference not found: NativeFSLockFactory
  [javadoc] 
 https://hudson.apache.org/hudson/job/Lucene-trunk/ws/lucene/contrib/misc/src/java/org/apache/lucene/store/DirectIOLinuxDirectory.java:44:
  warning - Tag @link: reference not found: Directory
  [javadoc] Building index for all the packages and classes...
  [javadoc] 
 https://hudson.apache.org/hudson/job/Lucene-trunk/ws/lucene/contrib/misc/src/java/org/apache/lucene/store/DirectIOLinuxDirectory.java:44:
  warning - Tag @link: reference not found: Directory
  [javadoc] Building index for all classes...
  [javadoc] Generating 
 https://hudson.apache.org/hudson/job/Lucene-trunk/ws/lucene/build/docs/api/contrib-misc/stylesheet.css...
  [javadoc] Note: Custom tags that were not seen: @lucene.internal
  [javadoc] 5 warnings
      [jar] Building jar: 
 https://hudson.apache.org/hudson/job/Lucene-trunk/ws/lucene/build/contrib/misc/lucene-misc-4.0-2010-09-07_02-03-49-javadoc.jar
     [echo] Building queries...

 javadocs:
    [mkdir] Created dir: 
 https://hudson.apache.org/hudson/job/Lucene-trunk/ws/lucene/build/docs/api/contrib-queries
  [javadoc] Generating Javadoc
  [javadoc] Javadoc execution
  [javadoc] Loading source files for package org.apache.lucene.search...
  [javadoc] Loading source files for package org.apache.lucene.search.regex...
  [javadoc] Loading source files for package 
 org.apache.lucene.search.similar...
  [javadoc] Constructing Javadoc information...
  [javadoc] Standard Doclet version 1.5.0_22
  [javadoc] Building tree for all the packages and classes...
  [javadoc] 
 https://hudson.apache.org/hudson/job/Lucene-trunk/ws/lucene/contrib/queries/src/java/org/apache/lucene/search/regex/JakartaRegexpCapabilities.java:35:
  warning - Tag @link: can't find prefix in 
 org.apache.lucene.search.regex.JakartaRegexpCapabilities
  [javadoc] 
 https://hudson.apache.org/hudson/job/Lucene-trunk/ws/lucene/contrib/queries/src/java/org/apache/lucene/search/regex/RegexCapabilities.java:36:
  warning - Tag @link: reference not found: RegexTermEnum
  [javadoc] 
 https://hudson.apache.org/hudson/job/Lucene-trunk/ws/lucene/contrib/queries/src/java/org/apache/lucene/search/regex/RegexCapabilities.java:36:
  warning - Tag @link: reference not found: RegexTermEnum
  [javadoc] 
 https://hudson.apache.org/hudson/job/Lucene-trunk/ws/lucene/contrib/queries/src/java/org/apache/lucene/search/regex/JavaUtilRegexCapabilities.java:33:
  warning - Tag @link: can't find prefix in 
 org.apache.lucene.search.regex.JavaUtilRegexCapabilities
  

Re: Transient TestIndexWriterMergePolicy failure under IntelliJ

2010-09-07 Thread Michael McCandless
Thanks for reporting Steven!

This is LUCENE-2118, striking again, taunting me.  This particular
failure bugs me!!

Mike

On Mon, Sep 6, 2010 at 8:10 PM, Steven A Rowe sar...@syr.edu wrote:
 While testing changes for LUCENE-2611, I saw 
 TestIndexWriterMergePolicy.testMaxBufferedDocsChange() fail, but I wasn't 
 able to replicate it either from IntelliJ or from Ant after adding the seed 
 to the newRandom() call in TestIndexWriterMergePolicy.setUp().

 Environment: Sun JDK 1.6.0_13, Windows Vista, both 64-bit; IntelliJ IDEA 
 9.0.3.

 When I saw this error, I was running two modules' tests in parallel from 
 IntelliJ, and was working on adding tempDir sysprop setting to test 
 invocations from IntelliJ, so the probability that there was something weird 
 about my local setup is non-trivial.

 Here is the output from IntelliJ:

 -
 NOTE: random codec of testcase 'testMaxBufferedDocsChange' was: MockSep
 NOTE: random locale of testcase 'testMaxBufferedDocsChange' was: en_PH
 NOTE: random timezone of testcase 'testMaxBufferedDocsChange' was: 
 America/Indianapolis
 NOTE: random seed of testcase 'testMaxBufferedDocsChange' was: 
 4118460220441676374

 junit.framework.AssertionFailedError: maxMergeDocs=2147483647; 
 numSegments=11; upperBound=10; mergeFactor=10; segs=_65:c5950 _5t:c10-_32 
 _5u:c10-_32 _5v:c10-_32 _5w:c10-_32 _5x:c10-_32 _5y:c10-_32 _5z:c10-_32 
 _60:c10-_32 _61:c10-_32 _62:c1-_32 _63:c9-_62
        at 
 org.apache.lucene.index.TestIndexWriterMergePolicy.checkInvariants(TestIndexWriterMergePolicy.java:251)
        at 
 org.apache.lucene.index.TestIndexWriterMergePolicy.testMaxBufferedDocsChange(TestIndexWriterMergePolicy.java:177)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at 
 org.apache.lucene.util.LuceneTestCase.runBare(LuceneTestCase.java:395)
        at org.apache.lucene.util.LuceneTestCase.run(LuceneTestCase.java:387)
        at 
 org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83)
        at org.junit.runners.Suite.runChild(Suite.java:128)
        at org.junit.runners.Suite.runChild(Suite.java:24)
        at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
        at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
        at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
        at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
        at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
        at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
        at org.junit.runner.JUnitCore.run(JUnitCore.java:157)
        at 
 com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:94)
        at 
 com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:192)
        at 
 com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:64)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at com.intellij.rt.execution.application.AppMain.main(AppMain.java:115)
 -

 Steve






[jira] Commented: (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks

2010-09-07 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12906723#action_12906723
 ] 

Michael McCandless commented on LUCENE-2573:


bq. We probably need a test that delays the flush process, otherwise flushing 
to RAM occurs too fast to proceed to the next tier.

We can modify MockRAMDir to optionally take its sweet time when writing 
certain files?

{quote}
I'm not sure if after a DWPT is flushing we need to decrement what would 
effectively be a projected RAM usage post current DWPT flush completion. 
Otherwise we could in many cases, start the flush of most/all of the DWPTs.
{quote}

But shouldn't tiered flushing take care of this?  Ie you only decr RAM consumed 
when the flush of the DWPT finishes, not before?

bq. The DWPT that happens to exceed the first tier, is flushed out. This was 
easier to implement than finding the highest RAM consuming DWPT and flushing 
it, from a different thread.

Hmm but this won't be most efficient, in general?  Ie we could end up creating 
tiny segments depending on luck-of-the-thread-scheduling?

bq. I did a search through the code and ByteBlockAllocator.perDocAllocator has 
no references, it can probably be removed, unless there was some other 
intention for it.

I think this makes sense -- each DWPT now immediately flushes to its private 
doc store files, so there's no longer a need to track per-doc pending RAM?

{quote}
In DocumentsWriterRAMAllocator, we're only recording the addition of more bytes 
when a new block is created, however because previous blocks may be recycled, 
it is the recycled blocks that are not being recorded as bytes used. Should we 
record all allocated blocks as in use ie, count them as bytes used, or wait 
until they are in use again to be counted as consuming RAM?
{quote}

I think we have to track both.  If a buffer is not in the pool (ie not free), 
then it's in use and we count that as RAM used, and that counter is used to 
trigger tiered flushing.  Separately we have to track net allocated, in order 
to trim the buffers (drop them, so GC can reclaim) when we are over the 
.setRAMBufferSizeMB.
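
In other words, something like this per allocator (a hypothetical sketch; the 
names are invented, not from the patch):

{noformat}
long bytesInUse;                                  // drives tiered flushing
long bytesAllocated;                              // drives trimming of recycled buffers
final List<byte[]> freeByteBlocks = new ArrayList<byte[]>();

byte[] getByteBlock() {
  bytesInUse += BYTE_BLOCK_SIZE;                  // block leaves the free pool -> in use
  if (freeByteBlocks.isEmpty()) {
    bytesAllocated += BYTE_BLOCK_SIZE;            // brand new allocation
    return new byte[BYTE_BLOCK_SIZE];
  }
  return freeByteBlocks.remove(freeByteBlocks.size() - 1);
}

void recycleByteBlock(byte[] block) {
  bytesInUse -= BYTE_BLOCK_SIZE;
  if (bytesAllocated > ramBufferSizeBytes) {      // over setRAMBufferSizeMB: drop so GC can reclaim
    bytesAllocated -= BYTE_BLOCK_SIZE;
  } else {
    freeByteBlocks.add(block);                    // keep it for reuse
  }
}
{noformat}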

 Tiered flushing of DWPTs by RAM with low/high water marks
 -

 Key: LUCENE-2573
 URL: https://issues.apache.org/jira/browse/LUCENE-2573
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: Realtime Branch

 Attachments: LUCENE-2573.patch


 Now that we have DocumentsWriterPerThreads we need to track total consumed 
 RAM across all DWPTs.
 A flushing strategy idea that was discussed in LUCENE-2324 was to use a 
 tiered approach:  
 - Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM)
 - Flush all DWPTs at a high water mark (e.g. at 110%)
 - Use linear steps in between high and low watermark:  E.g. when 5 DWPTs are 
 used, flush at 90%, 95%, 100%, 105% and 110%.
 Should we allow the user to configure the low and high water mark values 
 explicitly using total values (e.g. low water mark at 120MB, high water mark 
 at 140MB)?  Or shall we keep for simplicity the single setRAMBufferSizeMB() 
 config method and use something like 90% and 110% for the water marks?




Re: solr getUniqueTermCount() when multiple segments?

2010-09-07 Thread Ryan McKinley
Ahh -- this makes sense.  I thought it was too good to be true!


On Tue, Sep 7, 2010 at 4:45 AM, Michael McCandless
luc...@mikemccandless.com wrote:
 This is expected/intentional, because computing the true unique term
 count across multiple segments is exceptionally costly (you have to do
 the merge sort to de-dup).

 If you really want the true count, you can pull the TermsEnum and
 .next() until exhaustion.

 Alternatively, you can use IndexReader.getSequentialSubReaders(), then
 step through each SegReader calling its .getUniqueTermCount() and then
 somehow approximate (eg the sum will be an upper bound of the total
 unique count).

 Mike

 On Tue, Sep 7, 2010 at 2:34 AM, Ryan McKinley ryan...@gmail.com wrote:
 Hello-

 I'm looking at using the new terms.getUniqueTermCount() to give a
 quick count for the LukeRequestHandler rather then needing to walk all
 the terms.

 When solr index reader has just one segment, it works great.  However
 with more segments I get:

 java.lang.UnsupportedOperationException: this reader does not
 implement getUniqueTermCount()
        at org.apache.lucene.index.Terms.getUniqueTermCount(Terms.java:84)

 Is this expected?  Is there any way around that?

 I am getting the terms using:

          Terms terms = MultiFields.getTerms(reader, fieldName);
          long cnt = (terms==null) ? 0 : terms.getUniqueTermCount();

 Thanks
 ryan




[jira] Commented: (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks

2010-09-07 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12906798#action_12906798
 ] 

Jason Rutherglen commented on LUCENE-2573:
--

bq. shouldn't tiered flushing take care of this

Faulty thinking for a few minutes.

{quote}but this won't be most efficient, in general? Ie we could end up 
creating tiny segments depending on luck-of-the-thread-scheduling?{quote}

True.  Instead, we may want to simply not-flush the current DWPT if it is in 
fact not the highest RAM user.  When addDoc is called on the thread with the 
highest RAM usage, we can then flush it.
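
Something like this in the addDocument path, perhaps (a hypothetical sketch; 
the names are invented):

{noformat}
// only flush the DWPT we are currently on if it is the largest RAM
// consumer among the DWPTs that are not already flushing
DocumentsWriterPerThread largest = null;
long largestBytes = -1;
for (DocumentsWriterPerThread dwpt : activeDWPTs) {   // assumed: non-flushing DWPTs
  long used = dwpt.bytesUsed();                        // assumed accessor
  if (used > largestBytes) {
    largestBytes = used;
    largest = dwpt;
  }
}
if (largest == currentDWPT) {
  flush(currentDWPT);                                  // assumed flush hook
}
{noformat}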

bq. there's no longer a need to track per-doc pending RAM

I'll remove it from the code.

{quote}If a buffer is not in the pool (ie not free), then it's in use and we 
count that as RAM used{quote}

Ok, I'll make the change.  

{quote}we have to track net allocated, in order to trim the buffers (drop them, 
so GC can reclaim) when we are over the .setRAMBufferSizeMB{quote}

I haven't seen this in the realtime branch.  Reclamation of extra allocated 
free blocks may need to be reimplemented.  

I'll increment num bytes used when a block is returned for use.

On this topic, do you have any thoughts yet about how to make the block pools 
concurrent?  I'm still leaning towards a random access file (seek style) 
interface because this is easy to make concurrent, and hides the underlying 
block management mechanism, rather than directly exposes it like today, which 
can lend itself to problematic usage in the future.

 Tiered flushing of DWPTs by RAM with low/high water marks
 -

 Key: LUCENE-2573
 URL: https://issues.apache.org/jira/browse/LUCENE-2573
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: Realtime Branch

 Attachments: LUCENE-2573.patch


 Now that we have DocumentsWriterPerThreads we need to track total consumed 
 RAM across all DWPTs.
 A flushing strategy idea that was discussed in LUCENE-2324 was to use a 
 tiered approach:  
 - Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM)
 - Flush all DWPTs at a high water mark (e.g. at 110%)
 - Use linear steps in between high and low watermark:  E.g. when 5 DWPTs are 
 used, flush at 90%, 95%, 100%, 105% and 110%.
 Should we allow the user to configure the low and high water mark values 
 explicitly using total values (e.g. low water mark at 120MB, high water mark 
 at 140MB)?  Or shall we keep for simplicity the single setRAMBufferSizeMB() 
 config method and use something like 90% and 110% for the water marks?




[jira] Commented: (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks

2010-09-07 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12906801#action_12906801
 ] 

Jason Rutherglen commented on LUCENE-2573:
--

bq. We can modify MockRAMDir to optionally take its sweet time when writing 
certain files?

Yes, I think we need to implement something of this nature.  We *could* even 
randomly assign a different delay value per flush.  Of course how the test 
would instigate this from outside of DW, is somewhat of a different issue.
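
A rough sketch of what such a delaying directory could look like (hypothetical, 
not in the patch; a true per-write delay would need a delegating IndexOutput 
instead of just slowing down file creation):

{noformat}
import java.io.IOException;
import java.util.Random;
import org.apache.lucene.store.IndexOutput;
import org.apache.lucene.store.MockRAMDirectory;

public class SlowMockRAMDirectory extends MockRAMDirectory {
  private final Random random;

  public SlowMockRAMDirectory(Random random) {
    this.random = random;
  }

  @Override
  public IndexOutput createOutput(String name) throws IOException {
    // crudely slow down "certain files", e.g. postings written during flush
    if (name.endsWith(".frq") || name.endsWith(".prx")) {
      try {
        Thread.sleep(random.nextInt(500));   // random delay per flush
      } catch (InterruptedException ie) {
        throw new RuntimeException(ie);
      }
    }
    return super.createOutput(name);
  }
}
{noformat}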

 Tiered flushing of DWPTs by RAM with low/high water marks
 -

 Key: LUCENE-2573
 URL: https://issues.apache.org/jira/browse/LUCENE-2573
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: Realtime Branch

 Attachments: LUCENE-2573.patch


 Now that we have DocumentsWriterPerThreads we need to track total consumed 
 RAM across all DWPTs.
 A flushing strategy idea that was discussed in LUCENE-2324 was to use a 
 tiered approach:  
 - Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM)
 - Flush all DWPTs at a high water mark (e.g. at 110%)
 - Use linear steps in between high and low watermark:  E.g. when 5 DWPTs are 
 used, flush at 90%, 95%, 100%, 105% and 110%.
 Should we allow the user to configure the low and high water mark values 
 explicitly using total values (e.g. low water mark at 120MB, high water mark 
 at 140MB)?  Or shall we keep for simplicity the single setRAMBufferSizeMB() 
 config method and use something like 90% and 110% for the water marks?




[jira] Commented: (SOLR-2002) improve build/tests

2010-09-07 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12906806#action_12906806
 ] 

Yonik Seeley commented on SOLR-2002:


Sounds cool!  Whatever those strong in ant-foo come up with is fine with me!

 improve build/tests
 ---

 Key: SOLR-2002
 URL: https://issues.apache.org/jira/browse/SOLR-2002
 Project: Solr
  Issue Type: Task
  Components: Build
Reporter: Robert Muir
Assignee: Robert Muir
Priority: Minor
 Fix For: 3.1, 4.0

 Attachments: SOLR-2002.patch, SOLR-2002_core_contrib.patch, 
 SOLR-2002_localization.patch, SOLR-2002_lucenetestcase.patch, 
 SOLR-2002_merged.patch, SOLR-2002_replication.patch, 
 SOLR-2002_testiter.patch, SOLR-2002_testmethod.patch, 
 SOLR-2002_timeout.patch, SOLR-2002setupteardown.patch


 we are working on improving some functionality in lucene's build/tests, it 
 would be good to improve the solr side to take advantage of it.
 currently its only sorta-kinda integrated and a bit messy.
 i'd like to do some incremental improvements piece-by-piece on this issue.




[jira] Commented: (SOLR-1316) Create autosuggest component

2010-09-07 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12906812#action_12906812
 ] 

Andrzej Bialecki  commented on SOLR-1316:
-

I added license headers and committed the patch in rev.  993367 - thank you!

 Create autosuggest component
 

 Key: SOLR-1316
 URL: https://issues.apache.org/jira/browse/SOLR-1316
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.4
Reporter: Jason Rutherglen
Assignee: Andrzej Bialecki 
Priority: Minor
 Fix For: Next

 Attachments: SOLR-1316.patch, SOLR-1316.patch, SOLR-1316.patch, 
 SOLR-1316.patch, SOLR-1316.patch, SOLR-1316.patch, SOLR-1316.patch, 
 SOLR-1316.patch, SOLR-1316.patch, SOLR-1316.patch, SOLR-1316.patch, 
 SOLR-1316_3x-2.patch, SOLR-1316_3x.patch, suggest.patch, suggest.patch, 
 suggest.patch, TST.zip

   Original Estimate: 96h
  Remaining Estimate: 96h

 Autosuggest is a common search function that can be integrated
 into Solr as a SearchComponent. Our first implementation will
 use the TernaryTree found in Lucene contrib. 
 * Enable creation of the dictionary from the index or via Solr's
 RPC mechanism
 * What types of parameters and settings are desirable?
 * Hopefully in the future we can include user click through
 rates to boost those terms/phrases higher




[jira] Commented: (SOLR-2002) improve build/tests

2010-09-07 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12906826#action_12906826
 ] 

Robert Muir commented on SOLR-2002:
---

thanks, the major thing left is to consolidate release management-type things 
(e.g. rat reporting tasks, dist/packaging, artifact signing, checksumming, etc).

most of this is really inappropriate the way it is in lucene's build, because 
it's standalone in lucene's build.xml and not reusable by modules and solr. for 
example: 'rat-sources' just runs on a hardcoded src/java for lucene-core.

so we need to fix this kind of stuff anyway so that things in modules/ can 
actually ever be released; there is no way to do this at the moment.


 improve build/tests
 ---

 Key: SOLR-2002
 URL: https://issues.apache.org/jira/browse/SOLR-2002
 Project: Solr
  Issue Type: Task
  Components: Build
Reporter: Robert Muir
Assignee: Robert Muir
Priority: Minor
 Fix For: 3.1, 4.0

 Attachments: SOLR-2002.patch, SOLR-2002_core_contrib.patch, 
 SOLR-2002_localization.patch, SOLR-2002_lucenetestcase.patch, 
 SOLR-2002_merged.patch, SOLR-2002_replication.patch, 
 SOLR-2002_testiter.patch, SOLR-2002_testmethod.patch, 
 SOLR-2002_timeout.patch, SOLR-2002setupteardown.patch


 we are working on improving some functionality in lucene's build/tests, it 
 would be good to improve the solr side to take advantage of it.
 currently its only sorta-kinda integrated and a bit messy.
 i'd like to do some incremental improvements piece-by-piece on this issue.




[jira] Updated: (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks

2010-09-07 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-2573:
-

Attachment: LUCENE-2573.patch

* perDocAllocator is removed from DocumentsWriterRAMAllocator

* getByteBlock and getIntBlock always increment numBytesUsed

The test that simply prints out debugging messages looks better.  I need to 
figure out unit tests.

 Tiered flushing of DWPTs by RAM with low/high water marks
 -

 Key: LUCENE-2573
 URL: https://issues.apache.org/jira/browse/LUCENE-2573
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: Realtime Branch

 Attachments: LUCENE-2573.patch, LUCENE-2573.patch


 Now that we have DocumentsWriterPerThreads we need to track total consumed 
 RAM across all DWPTs.
 A flushing strategy idea that was discussed in LUCENE-2324 was to use a 
 tiered approach:  
 - Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM)
 - Flush all DWPTs at a high water mark (e.g. at 110%)
 - Use linear steps in between high and low watermark:  E.g. when 5 DWPTs are 
 used, flush at 90%, 95%, 100%, 105% and 110%.
 Should we allow the user to configure the low and high water mark values 
 explicitly using total values (e.g. low water mark at 120MB, high water mark 
 at 140MB)?  Or shall we keep for simplicity the single setRAMBufferSizeMB() 
 config method and use something like 90% and 110% for the water marks?




[jira] Commented: (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks

2010-09-07 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12906918#action_12906918
 ] 

Jason Rutherglen commented on LUCENE-2573:
--

The last patch also only flushes a DWPT if it's the highest RAM consumer.

 Tiered flushing of DWPTs by RAM with low/high water marks
 -

 Key: LUCENE-2573
 URL: https://issues.apache.org/jira/browse/LUCENE-2573
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: Realtime Branch

 Attachments: LUCENE-2573.patch, LUCENE-2573.patch


 Now that we have DocumentsWriterPerThreads we need to track total consumed 
 RAM across all DWPTs.
 A flushing strategy idea that was discussed in LUCENE-2324 was to use a 
 tiered approach:  
 - Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM)
 - Flush all DWPTs at a high water mark (e.g. at 110%)
 - Use linear steps in between high and low watermark:  E.g. when 5 DWPTs are 
 used, flush at 90%, 95%, 100%, 105% and 110%.
 Should we allow the user to configure the low and high water mark values 
 explicitly using total values (e.g. low water mark at 120MB, high water mark 
 at 140MB)?  Or shall we keep for simplicity the single setRAMBufferSizeMB() 
 config method and use something like 90% and 110% for the water marks?




bug triggered by TestIndexWriter.testRandomStoredFields

2010-09-07 Thread Robert Muir
Hello,

I've tripped on this a few times lately, but never been able to reproduce
it: it seems i am now able to reproduce it semi-consistently with the
below configuration.
It would be great if someone else could try this out and see if it's a real
problem, or if it's just my machine.

occasionally i see a very nasty result
from TestIndexWriter.testRandomStoredFields:
either a read past EOF, an IndexOutOfBounds, a NegativeArraySizeException, or
a "field X is wrong" failure where the expected and actual values are different
nonsense unicode strings.

Here are my steps to reproduce:
1. edit line 87 of TestIndexWriter to plug in the seed:
random = newRandom(3312389322103990899L);
2. run this command:
ant clean test-core -Dtestcase=TestIndexWriter
-Dtestmethod=testRandomStoredFields -Dtests.iter=10
-Dtests.codec=MockVariableIntBlock(29)

I used 10 iterations here, as it will usually fail with this seed and # of
iterations for me.

furthermore, if i comment out lines 5179 and 5180 from TestIndexWriter so
that it no longer randomly deletes documents, the test will always pass:
//w.deleteDocuments(new Term("id", delID));
//docs.remove(delID);

-- 
Robert Muir
rcm...@gmail.com


Re: Getting facets for a field from within a SearchComponent

2010-09-07 Thread Chris Hostetter

: I'm writing my first SearchComponent to do custom calculations on search
: results. Is it possible to get the facet values for a field from within a
: SearchComponent? I've thought of adapting the StatsComponent and
: FieldFacetStats classes to try and accomplish this. But before I try that,
: is there an API call I could make instead?

1) if you configure your component to run after the FacetComponent, then 
the result will already have the facet values available for you to access 
(see the sketch below).

2) the faceting code in the StatsComponent makes a lot of bad assumptions, 
so it has some known bugs -- i would not recommend borrowing that code.
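
For example, in solrconfig.xml (a sketch only; the handler, component name and 
class are hypothetical -- components listed in last-components run after the 
default chain, which includes the FacetComponent):

<searchComponent name="myComponent" class="com.example.MyCustomComponent"/>

<requestHandler name="/mysearch" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="facet">true</str>
  </lst>
  <arr name="last-components">
    <str>myComponent</str>
  </arr>
</requestHandler>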


-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss  ...  Stump The Chump!





[jira] Updated: (SOLR-2052) Allow for a list of filter queries and a single docset filter in QueryComponent

2010-09-07 Thread Stephen Green (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Green updated SOLR-2052:


Attachment: SOLR-2052-2.patch

Updated patch that fixes a bug when combining filter docsets and filter queries.

 Allow for a list of filter queries and a single docset filter in 
 QueryComponent
 ---

 Key: SOLR-2052
 URL: https://issues.apache.org/jira/browse/SOLR-2052
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 4.0
 Environment: Mac OS X, Java 1.6
Reporter: Stephen Green
Priority: Minor
 Fix For: 1.4.2

 Attachments: SOLR-2052-2.patch, SOLR-2052.patch


 SolrIndexSearcher.QueryCommand allows you to specify a list of filter queries 
 or a single filter (as a DocSet), but not both.  This restriction seems 
 arbitrary, and there are cases where we can have both a list of filter 
 queries and a DocSet generated by some other non-query process (e.g., 
 filtering documents according to IDs pulled from some other source like a 
 database.)
 Fixing this requires a few small changes to SolrIndexSearcher to allow both 
 of these to be set for a QueryCommand and to take both into account when 
 evaluating the query.  It also requires a modification to ResponseBuilder to 
 allow setting the single filter at query time.
 I've run into this against 1.4, but the same holds true for the trunk.




[jira] Updated: (SOLR-2105) RequestHandler param update.processor is confusing

2010-09-07 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-2105:
--

Attachment: SOLR-2105.patch

The attached patch renames the parameter, both in code and config. Tests run 
after applying it, but I have not done regression testing of the functionality.

 RequestHandler param update.processor is confusing
 --

 Key: SOLR-2105
 URL: https://issues.apache.org/jira/browse/SOLR-2105
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 1.4.1
Reporter: Jan Høydahl
Priority: Minor
 Attachments: SOLR-2105.patch


 Today we reference a custom updateRequestProcessorChain using the update 
 request parameter update.processor.
 See 
 http://wiki.apache.org/solr/SolrConfigXml#UpdateRequestProcessorChain_section
 This is confusing, since what we are really referencing is not an 
 UpdateProcessor, but an updateRequestProcessorChain.
 I propose that update.processor is renamed as update.chain or similar




[jira] Updated: (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks

2010-09-07 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-2573:
-

Attachment: LUCENE-2573.patch

There was a small bug in the choice of the max DWPT, in that all DWPTs, 
including ones that were already scheduled to flush, were being compared against 
the current DWPT (i.e. the one being examined for possible flushing).

 Tiered flushing of DWPTs by RAM with low/high water marks
 -

 Key: LUCENE-2573
 URL: https://issues.apache.org/jira/browse/LUCENE-2573
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: Realtime Branch

 Attachments: LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch


 Now that we have DocumentsWriterPerThreads we need to track total consumed 
 RAM across all DWPTs.
 A flushing strategy idea that was discussed in LUCENE-2324 was to use a 
 tiered approach:  
 - Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM)
 - Flush all DWPTs at a high water mark (e.g. at 110%)
 - Use linear steps in between high and low watermark:  E.g. when 5 DWPTs are 
 used, flush at 90%, 95%, 100%, 105% and 110%.
 Should we allow the user to configure the low and high water mark values 
 explicitly using total values (e.g. low water mark at 120MB, high water mark 
 at 140MB)?  Or shall we keep for simplicity the single setRAMBufferSizeMB() 
 config method and use something like 90% and 110% for the water marks?




[jira] Created: (SOLR-2108) ReversedWildcardFilter can create false positives

2010-09-07 Thread Robert Muir (JIRA)
ReversedWildcardFilter can create false positives
-

 Key: SOLR-2108
 URL: https://issues.apache.org/jira/browse/SOLR-2108
 Project: Solr
  Issue Type: Bug
Reporter: Robert Muir
Priority: Minor
 Fix For: 4.0


Reported from the userlist: 

{noformat}
For instance, the query *zemog* matches documents that contain Gomez
{noformat}

http://www.lucidimagination.com/search/document/35abfdabfcec99b7/false_matches_with_reversedwildcardfilterfactory





[jira] Updated: (SOLR-2108) ReversedWildcardFilter can create false positives

2010-09-07 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated SOLR-2108:
--

Attachment: SOLR-2108.patch

Simple fix: if we are doing a wildcard query on a reversed field, but we 
*are not* going to reverse it, we must subtract the set of reversed terms 
(markerChar*) from the query dfa as these could be false positives.
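
The idea looks roughly like this (a sketch only; the variable names are invented 
and the actual patch may use different automaton utilities):

{noformat}
// build an automaton matching <markerChar> followed by anything (i.e. all
// reversed terms), and subtract it from the wildcard query's automaton so
// reversed terms can no longer match as false positives
Automaton reversedTerms = BasicOperations.concatenate(
    BasicAutomata.makeChar(markerChar), BasicAutomata.makeAnyString());
Automaton fixed = BasicOperations.minus(queryAutomaton, reversedTerms);
{noformat}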

I also added a basic test.

 ReversedWildcardFilter can create false positives
 -

 Key: SOLR-2108
 URL: https://issues.apache.org/jira/browse/SOLR-2108
 Project: Solr
  Issue Type: Bug
Reporter: Robert Muir
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-2108.patch


 Reported from the userlist: 
 {noformat}
 For instance, the query *zemog* matches documents that contain Gomez
 {noformat}
 http://www.lucidimagination.com/search/document/35abfdabfcec99b7/false_matches_with_reversedwildcardfilterfactory




[jira] Updated: (SOLR-2107) MoreLikeThisHandler doesn't work with alternate qparsers

2010-09-07 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-2107:
---

Attachment: SOLR-2107.patch

Here's a patch that adds qparser support for q and fq params.

 MoreLikeThisHandler doesn't work with alternate qparsers
 

 Key: SOLR-2107
 URL: https://issues.apache.org/jira/browse/SOLR-2107
 Project: Solr
  Issue Type: Bug
Reporter: Yonik Seeley
 Attachments: SOLR-2107.patch


 In the MoreLikeThisHandler, Lucene syntax is assumed, and no other query 
 parser can be invoked.




[jira] Resolved: (SOLR-2107) MoreLikeThisHandler doesn't work with alternate qparsers

2010-09-07 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley resolved SOLR-2107.


Fix Version/s: 4.0
   Resolution: Fixed

 MoreLikeThisHandler doesn't work with alternate qparsers
 

 Key: SOLR-2107
 URL: https://issues.apache.org/jira/browse/SOLR-2107
 Project: Solr
  Issue Type: Bug
Reporter: Yonik Seeley
 Fix For: 4.0

 Attachments: SOLR-2107.patch


 In the MoreLikeThisHandler, Lucene syntax is assumed, and no other query 
 parser can be invoked.




Re: Re: About Solr DataImportHandler

2010-09-07 Thread 郭芸
Thank you for your reply, it is very important to me.
1. I agree with you. Reading Solr's source code, I found that this problem can be 
resolved by configuring db-data-config.xml like this (my database is 
SQL Server 2005; other databases won't work with this):

<dataSource name="dsSqlServer" type="JdbcDataSource" 
driver="com.microsoft.sqlserver.jdbc.SQLServerDriver" batchSize="3000"
url="jdbc:sqlserver://192.168.1.5:1433;DatabaseName=testDatabase;responseBuffering=adaptive;selectMethod=cursor" 
user="sa" password="12345" />

Adding responseBuffering=adaptive;selectMethod=cursor to the url attribute makes 
Solr set these parameters by itself:
c.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
With these configs, Solr can import a big table's data into the index dir.


2. But there are some problems:
if the table is very big, Solr will spend a long time to import and index it, maybe 
one day or more. So if network problems or other failures occur during this 
time, Solr may not remember which documents have been processed, and if we 
continue the data import, we do not know where to start.

3. I am sorry for my bad English. I hope you can understand what I mean.


2010-09-08 



郭芸 



From: Alexey Serba 
Sent: 2010-09-07 16:07:49 
To: dev 
Cc: 
Subject: Re: About Solr DataImportHandler 
 
 i found that Solr imports the data into memory first, then writes it to the index 
 dir.
That's not really true. DataImportHandler streams the results from the
database query and adds documents to the index, so it shouldn't load
all the database data into memory. Disabling autoCommit, warming queries
and spellcheckers usually decreases the amount of memory required during the
indexing process.
Please share your hardware details, jvm options, solrconfig and schema
configuration, etc.
2010/9/7 郭芸 mickey.guo...@gmail.com:
 Dear all:
 I use Solr DataImportHandler's JdbcDataSource to import SQL Server 2005
 data into Solr, but my table is very big, about 300G, and I found that Solr
 imports the data into memory first, then writes it to the index dir. So if the
 data is too big, it will trigger an OutOfMemoryException.
 I want to solve this problem; how can I do it? Can anybody help me? Thank
 you.

 2010-09-07
 
 郭芸


[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

2010-09-07 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12907069#action_12907069
 ] 

Jason Rutherglen commented on LUCENE-2575:
--

bq. every term has its own open IndexOutput

I'm not seeing IndexOutput in use with the RAM buffer; do you
mean the write* (writeVInt, writeBytes, writeByte) methods
of TermsHashPerField? 

Included in this patch will need to be a way to concurrently
grow other arrays such as ParallelPostingsArray. PPA is used to
store pointers to data stored in the block pools. Maybe we need
a class that concurrently manages growing arrays and block
pools. 

Or we may need to slightly re-architect how we're storing the
RAM buffer data so that concurrency can be guaranteed, ie, I
think we'll need to write to temporary arrays, which are then
flushed to primary readable arrays. The flush would occur after
adding a document, or probably for better efficiency, only when
getReader is called.

 Concurrent byte and int block implementations
 -

 Key: LUCENE-2575
 URL: https://issues.apache.org/jira/browse/LUCENE-2575
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: Realtime Branch
Reporter: Jason Rutherglen
 Fix For: Realtime Branch


 The current *BlockPool implementations aren't quite concurrent.
 We really need something that has a locking flush method, where
 flush is called at the end of adding a document. Once flushed,
 the newly written data would be available to all other reading
 threads (ie, postings etc). I'm not sure I understand the slices
 concept, it seems like it'd be easier to implement a seekable
 random access file like API. One'd seek to a given position,
 then read or write from there. The underlying management of byte
 arrays could then be hidden?




[jira] Commented: (SOLR-1665) Add debugTimings param so that timings for components can be retrieved without having to do explains(), as in debugQuery

2010-09-07 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12907080#action_12907080
 ] 

Yonik Seeley commented on SOLR-1665:


Due to the cost of distributed search tests, I removed 
DistributedDebugComponentTest and moved the debug tests to 
TestDistributedSearch.

 Add debugTimings param so that timings for components can be retrieved 
 without having to do explains(), as in debugQuery
 --

 Key: SOLR-1665
 URL: https://issues.apache.org/jira/browse/SOLR-1665
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-1665.patch, SOLR-1665.patch, SOLR-1665.patch, 
 SOLR-1665.patch


 As the title says, it would be great if we could just get back component 
 timings w/o having to do the full boat of explains and other stuff.




Re: About Solr DataImportHandler

2010-09-07 Thread Fu Shunkai 傅顺开
Try setting the batchSize to -1. 


2010-09-08 



Fu Shunkai (傅顺开)
Suzhou Guangda Youxun Technology Co., Ltd. (苏州广达友讯技术有限公司)
1355 Jinjihu Avenue, Suzhou Industrial Park, Jiangsu
International Science Park 151A, 215021
Tel: (512) 6288-8255 (ext. 612)
Fax: (512) 6288-8155
Mobile: (0) 158-5018-8480
email: f...@peptalk.cn
http://www.bedo.cn, http://k.ai, http://www.lbs.org.cn 
 



From: 郭芸 
Sent: 2010-09-07 09:55:05 
To: Solr Lucene 
Cc: 
Subject: About Solr DataImportHandler 
 
Dear all:
I use Solr DataImportHandler's JdbcDataSource to import SQL Server 2005 data 
into Solr, but my table is very big, about 300G, and I found that Solr imports 
the data into memory first, then writes it to the index dir. So if the data is 
too big, it will trigger an OutOfMemoryException.
I want to solve this problem; how can I do it? Can anybody help me? Thank you.

2010-09-07 



郭芸 


Re: Re: About Solr DataImportHandler

2010-09-07 Thread Alexey Serba
 2. But there are some problems:
 if the table is very big, Solr will spend a long time to import and index it,
 maybe one day or more. So if network problems or other failures occur during this
 time, Solr may not remember which documents have been processed, and if we
 continue the data import, we do not know where to start.

You can _batch_ import your data using the full-import command by
providing an additional request parameter (see
http://wiki.apache.org/solr/DataImportHandler#Accessing_request_parameters
), i.e.

query="SELECT * FROM my_table ORDER BY id LIMIT 100 OFFSET
${dataimporter.request.offset}"

and then calling the full-import command several times:
1) /dataimport?clean=true&offset=0
2) /dataimport?clean=false&offset=100
3) /dataimport?clean=false&offset=200
etc
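
In db-data-config.xml the query would sit inside the entity definition, roughly
like this (a sketch; the entity and table names are just placeholders):

<document>
  <entity name="my_table"
          query="SELECT * FROM my_table ORDER BY id LIMIT 100 OFFSET ${dataimporter.request.offset}">
    <!-- field mappings here -->
  </entity>
</document>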

// Please use the solr-u...@lucene.apache.org mailing list for such
questions. _dev_ is not the appropriate place for this.
